unsloth

mirror of https://github.com/unslothai/unsloth synced 2026-04-21 13:37:39 +00:00

Author	SHA1	Message	Date
Roland Tannous	21e9a91a57	Studio: forward standard OpenAI tools / tool_choice on /v1/responses (Codex compat) (#5122 ) * Studio: forward standard OpenAI tools / tool_choice on /v1/responses Mirrors the /v1/chat/completions client-side tool pass-through from #5099 so clients (OpenAI Codex CLI, OpenAI Python SDK, ...) that target the Responses API receive structured function_call output items instead of plain text with tool-call tokens leaking into content. - ResponsesRequest: type tools/tool_choice properly, add parallel_tool_calls; accept function_call and function_call_output input items for multi-turn - Translate flat Responses tool / tool_choice shape to the nested Chat Completions shape before forwarding to llama-server - _normalise_responses_input: map function_call_output -> role="tool", function_call -> assistant tool_calls (preserving call_id) - Non-streaming: map returned tool_calls -> top-level function_call output items keyed by call_id - Streaming: emit response.output_item.added (function_call), response.function_call_arguments.delta/.done, and response.output_item.done per tool call while keeping the text message at output_index 0 - Pytest coverage: tools/tool_choice translation, multi-turn input mapping, non-streaming tool_calls mapping, response round-trip * Studio: merge system messages and close inner stream on /v1/responses Fixes two issues surfacing when OpenAI Codex CLI drives /v1/responses against a GGUF with a strict chat template (gpt-oss harmony, Qwen3, ...). 1. "System message must be at the beginning" upstream errors Codex sends `instructions` AND a `role:"developer"` message in `input`, producing two separate system-role messages. Strict templates raise when a second system message exists or when one appears after a user turn. _normalise_responses_input now hoists all instructions / system / developer content into a single merged system message at the top of the Chat Completions message list. 2. 
"async generator ignored GeneratorExit" / "Attempted to exit cancel scope in a different task" _responses_stream consumed the inner chat-completions body_iterator without an explicit aclose() in a finally block. On client disconnect (Codex frequently cancels mid-stream), Python 3.13 finalized the inner async generator on a different task, tripping anyio's cancel-scope check. Mirrored the same try/finally + aclose pattern used by the /v1/messages, /v1/chat/completions, and /v1/completions passthroughs. Tests: hoisting of instructions + developer, developer mid-conversation, multiple system messages in input, no-system passthrough. * Studio: accept Codex multi-turn shapes and fix cross-task stream close on /v1/responses Two issues observed driving /v1/responses from OpenAI Codex CLI against a GGUF backend. 1. 422 on every turn after the first Codex replays prior assistant turns with `content:[{"type":"output_text","text":...,"annotations":[],"logprobs":[]}]` and carries forward `reasoning` items (o-series / gpt-5) between turns. Our `ResponsesContentPart` union only accepted input_text / input_image, and `ResponsesInputItem` only message / function_call / function_call_output, so Pydantic failed the whole list and FastAPI returned `"Input should be a valid string"` against the `str` branch of the outer union. - Add `ResponsesOutputTextPart` for assistant-replay content. - Add `ResponsesUnknownContentPart` and `ResponsesUnknownInputItem` as permissive catch-alls (drop during normalisation). - Wire an explicit `Discriminator` so dispatch is deterministic and the fallthrough reaches the catch-all instead of misreporting via the outer `Union[str, list[...]]`. - `_normalise_responses_input` now accepts output_text parts, flattens single-part assistant text to a plain string (keeps legacy chat templates happy), and silently drops reasoning / unknown items. 2. 
"async generator ignored GeneratorExit" / cross-task cancel scope `_responses_stream` awaited `openai_chat_completions` in the parent route-handler task, which opens the httpx client for the inner passthrough on that task. The outer `StreamingResponse` then iterates in a child task, so the asyncgen GC finalises the inner httpcore byte stream on the child task, tripping anyio's "Attempted to exit cancel scope in a different task". Move the `await` inside `event_generator` so the httpx lifecycle stays within the single streaming child task, and surface any HTTPException as a `response.failed` SSE frame. Tests: assistant output_text replay, reasoning-item tolerance, unknown content-part tolerance, end-to-end Codex-shape payload (developer + user + reasoning + function_call + function_call_output + assistant output_text + user), and single-part assistant flattening to plain string. * Studio: call llama-server directly from streaming /v1/responses The previous fix (running the inner await inside event_generator) was not enough. Wrapping the existing `openai_chat_completions` pass-through still stacks two async generators: when the outer generator is closed, the innermost `HTTP11ConnectionByteStream.__aiter__` in httpcore doesn't receive GeneratorExit before Python's asyncgen GC finalises it in a sibling task, tripping "Attempted to exit cancel scope in a different task" and "async generator ignored GeneratorExit" — the same Python 3.13 + httpcore 1.0.x interaction already seen in PRs #4956, #4981, #5099. Cure both pass-throughs had: a single same-task httpx lifecycle with explicit `aiter_lines().aclose()` BEFORE `resp.aclose()` / `client.aclose()` in the generator's finally block. Apply it at the Responses layer by dropping the wrapper entirely for GGUF: open httpx, consume `resp.aiter_lines()`, parse `chat.completion.chunk`, emit Responses SSE events, close everything in finally — all in the single StreamingResponse child task. 
Non-GGUF streaming is rejected with a 400 (wrapping the transformers backend would re-introduce the double-layer pattern and isn't a Codex-compatible path today anyway). Also surfaces upstream httpx.RequestError / non-200 as a `response.failed` SSE frame rather than a dropped stream now that the request is dispatched after SSE headers have gone out. * Studio: silence benign httpcore asyncgen GC warnings on Python 3.13 The streaming pass-throughs (/v1/chat/completions, /v1/messages, /v1/responses, /v1/completions) all use the proven #4981 / #5099 pattern — single-task httpx lifecycle with explicit aiter_lines().aclose() ahead of resp.aclose() / client.aclose() in the generator's finally block. That handles our own iterators correctly. The residual noise ("async generator ignored GeneratorExit" / "Attempted to exit cancel scope in a different task") comes from an innermost HTTP11ConnectionByteStream.__aiter__ that httpcore creates internally inside its pool. We hold no reference to it, so we cannot aclose it ourselves. Python 3.13's asyncgen GC hook finalises it on the finaliser task, its aclose path enters an anyio CancelScope shield, and Python flags the cross-task exit. The response has already been delivered with a 200 by then — it is purely log noise, not a functional failure. Same interaction seen in modelcontextprotocol/python-sdk #831, agno #3556, chainlit #2361, langchain-mcp-adapters #254. Install a targeted sys.unraisablehook that swallows this specific tuple — RuntimeError mentioning "cancel scope" or "GeneratorExit" plus an object repr referencing HTTP11ConnectionByteStream — and defers to the default hook for every other unraisable. Idempotent; guarded by a sentinel attribute so repeated imports don't stack filters.	2026-04-21 13:17:20 +04:00
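The targeted `sys.unraisablehook` described in the last bullet of the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name and sentinel attribute name are hypothetical, and the sketch defers to whichever hook was installed before it rather than specifically to `sys.__unraisablehook__`.

```python
import sys

# Hypothetical sentinel name; the log only says a sentinel attribute guards the hook.
_SENTINEL = "_httpcore_asyncgen_filter_installed"


def install_httpcore_unraisable_filter() -> None:
    """Install a narrow unraisable filter; repeated calls do not stack."""
    previous = sys.unraisablehook
    if getattr(previous, _SENTINEL, False):
        return  # already installed

    def hook(unraisable) -> None:
        exc = unraisable.exc_value
        message = str(exc) if exc is not None else ""
        is_noise = (
            isinstance(exc, RuntimeError)
            and ("cancel scope" in message or "GeneratorExit" in message)
            and "HTTP11ConnectionByteStream" in repr(unraisable.object)
        )
        if is_noise:
            return  # swallow only this specific combination
        previous(unraisable)  # every other unraisable goes to the prior hook

    setattr(hook, _SENTINEL, True)
    sys.unraisablehook = hook
```

All three conditions must hold before anything is suppressed, so an unrelated `RuntimeError`, or the same message raised from a different object, still reaches the log.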
Lee Jackson	c20959dbf4	Studio: Improve chat composition, fix scroll behaviour, and refine sidebar UX (#5089 ) * Chatbox, scroll, and menu fixes - Fixed chatbox auto-expand height for multi-line text on the compare page - Fixed chatbox UI to be consistent across compare and new chat - Fixed scrolling being enabled on pages with no content, which also triggered the scroll-to-bottom button - Fixed scroll-to-bottom button to only appear after scrolling up a reasonable amount instead of instantly - Added shutdown studio button to the menu for easier access - Fixed pop-up menu width to match the user button width (cherry picked from commit cd4e390dfa84fe311fae79a781b96cc0ef5970a9) * fix: correct compare scroll viewport and clean up chat composer UI polish * Dark theme refactor and sidebar/chat UI refinements - Complete refactoring of dark theme - Replaced square rounded-corner user profile image with a circular bordered one - Replaced user profile icon with 'U' initial and renamed label from 'Studio' to 'User' - Chat bubbles now have a pointy top-right edge - Sidebar menu tab line color selection is now consistent across all menus - Tab-selection color animation now also applies to recent chats - Removed 'Compare' menu autoselect when a compare chat conversation is selected - Fixed UI consistency in Compare to match New Chat - Removed sidebar animation and tab line, replaced with rounded selection for consistency - Further adjustments to sidebar UI - Further adjustments to compare chat UI * Fixed sidebar collapse/expand for recent chats and recent runs not being clickable * Chatbox, scroll, and menu fixes - Fixed chatbox auto-expand height for multi-line text on the compare page - Fixed chatbox UI to be consistent across compare and new chat - Fixed scrolling being enabled on pages with no content, which also triggered the scroll-to-bottom button - Fixed scroll-to-bottom button to only appear after scrolling up a reasonable amount instead of instantly - Added shutdown 
studio button to the menu for easier access - Fixed pop-up menu width to match the user button width * Sidebar, fonts, and chat UI refinements - Replaced logo PNG with real font text for 'unsloth' and 'BETA' label - Added Hellix font and applied it across menus and UI elements - Lighter scrollbar in the sidebar compared to other areas of the app - Adjusted chat font and chat bubble styling - Adjusted app menu design to stay consistent with the sidebar - Adjusted text style for 'New Chat' and repositioned content/chatbox - Adjusted model selector and top area UI - Fixed footer text from 'LLM's' to 'LLMs' - Fixed active selection border color incorrectly appearing on page refresh and during general navigation - Logo now defaults to 'New Chat' when clicked * Sidebar, model selector, and mobile UI fixes - Further adjustments to sidebar UI and logo - Changed right bar icon - Model selector adjustments - Collapsed sidebar now matches the content area background - Adjusted Hellix font spacing across pages - Fixed sidebar icon overlap on mobile screens * Adjust sidebar icons * Adjust sidebar icons * Fixed compare chat UI and scrolling issues * Fixed inference settings icon behavior and context info positioning - Fixed top right inference settings icon to move into sidepanel during expand/collapse, matching left sidebar behavior - Adjusted context information element positioning * Fix: textarea overflow in system prompt editor * Code block redesign, font, and chat bubble adjustments - Redesigned code block colors and theme - Changed code block font to Fira Code - Fixed scrollbar disappearing when expanding/collapsing tool calls in chats - Adjusted chat bubble background color * Fix chat bubble background color in dark theme * fix: restore textarea auto-sizing and scope prompt editor sizing * fix: add explicit textarea field sizing for prompt editor overflow * fix: generate chat nonce on click instead of render * fix: respect training lock on logo navigation * Refactor 
compare page dual chat scrolling behavior * Revert "Refactor compare page dual chat scrolling behavior" This reverts commit `d056ec09f2`. --------- Co-authored-by: sneakr <hauzin@hotmail.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-21 02:20:45 +04:00
Konstantin Azizov	0a5c61ffcc	fix: prefer mainstream clipboard copy over deprecated one (#5109) Fixes #5097 Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-20 23:18:18 +04:00
Lee Jackson	d3215ce113	Studio: Show LoRA live logs and update GGUF quant options (#5058) * export: update GGUF quant list and ordering * gguf: add Q2_K_L quantize flags for output and embeddings * export: add live console logs for LoRA export flow * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: stream q2_k_l quantize logs and include subprocess error details * fix: route Q2_K_L preset to q2_k ftype with q8_0 output+embeddings --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-20 23:14:49 +04:00
Lee Jackson	9c8a079d97	Studio: Local profile customization in settings and sync sidebar identity (#5088) * studio: add local profile customization in settings * studio: add local profile settings and sync sidebar identity * fix: adjust profile card margin * fix: move helper modules to utils and use single-letter avatar fallback * fix: keep profile icon visible on sidebar collapse * fix: sidebar account trigger labeling and profile reset prefs	2026-04-20 22:28:02 +04:00
Roland Tannous	9954781d30	fix(studio/chat): cancel in-flight run when trashing a thread from sidebar (#5067) Trashing a thread mid-stream used to delete the Dexie rows while the model kept generating, because the sidebar has no access to the @assistant-ui aui context. Expose per-thread cancelRun() through the chat runtime store and call it from deleteChatItem so trash behaves like Stop → Trash. Covers compare pairs by cancelling each paired thread. Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>	2026-04-20 21:06:59 +04:00
Michael Han	b24f3f61b8	Update README.md	2026-04-20 00:37:40 -07:00
Michael Han	f5eec8a6f2	Qwen3.6 and ReadMe revamp.md	2026-04-19 23:16:36 -07:00
Roland Tannous	ac2daf8b7a	Studio: forward standard OpenAI tools / tool_choice to llama-server (#5099 ) * fix(studio): forward OpenAI tools/tool_choice to llama-server (#4999) Studio's /v1/chat/completions silently stripped standard OpenAI `tools` and `tool_choice` fields, so clients using standard function calling (opencode, Claude Code, Cursor, Continue, ...) never got structured tool_calls back. Adds a client-side pass-through path mirroring the existing Anthropic /v1/messages flow: when `tools` is present without Studio's `enable_tools` shorthand, the request is forwarded to llama-server verbatim so the client sees native id, finish_reason ("tool_calls"), delta.tool_calls, and accurate usage tokens. Also wires Anthropic tool_choice forwarding: /v1/messages previously accepted tool_choice on the request model but silently dropped it with a warning. Translate the four Anthropic shapes to OpenAI format and forward them so agentic clients can actually enforce tool use. - ChatCompletionRequest: add tools, tool_choice, stop; extra="allow" - ChatMessage: accept role="tool", optional tool_call_id / tool_calls / name; content is now optional (assistant with only tool_calls) - routes/inference.py: _openai_passthrough_stream / _openai_passthrough_non_streaming helpers, routing branch in openai_chat_completions, vision+tools via content-parts injection - _build_passthrough_payload: tool_choice parameter (default "auto") - anthropic_compat: anthropic_tool_choice_to_openai() translator - tests/test_openai_tool_passthrough.py: Pydantic + translator unit tests - tests/test_studio_api.py: 5 new E2E tests (non-stream, stream, multi-turn, OpenAI SDK, Anthropic tool_choice=any regression) * fix(studio): surface httpx transport errors from OpenAI passthrough When the managed llama-server subprocess crashes mid-request, the async pass-through helpers in routes/inference.py used to return a bare 500 (non-streaming) or an "An internal error occurred" SSE chunk (streaming) because 
_friendly_error only recognized the sync path's "Lost connection to llama-server" substring -- httpx transport failures (ConnectError / ReadError / RemoteProtocolError / ReadTimeout) stringify differently and fell through to the generic case. - _friendly_error: map any httpx.RequestError subclass to the same "Lost connection to the model server" message the sync chat path emits. Placed before the substring heuristics so the streaming path automatically picks it up via its existing except Exception catch. - _openai_passthrough_non_streaming: wrap the httpx.AsyncClient.post in a try/except httpx.RequestError and re-raise as HTTPException 502 with the friendly detail. - tests/test_openai_tool_passthrough.py: new TestFriendlyErrorHttpx class pinning the mapping for ConnectError, ReadError, RemoteProtocolError, ReadTimeout, and confirming non-httpx paths (context-size heuristic, generic fallback) are unchanged. * fix(studio): close aiter_bytes/aiter_lines explicitly in passthroughs The httpcore asyncgen cleanup fix in `5cedd9a5` is incomplete on Python 3.13 + httpcore 1.0.x: it switched to manual client/response lifecycle but still used anonymous `async for raw_line in resp.aiter_lines():` patterns in all three streaming paths. Python's async for does NOT auto-close the iterator on break/return, so the aiter_lines / aiter_bytes async generator remains alive, reachable only from the surrounding coroutine frame. Once `_stream()` returns the frame is GC'd and the orphaned asyncgen is finalized on a LATER GC pass in a DIFFERENT asyncio task, where httpcore's HTTP11ConnectionByteStream.aclose() enters anyio.CancelScope.__exit__ with a mismatched task and prints "Exception ignored in: <async generator>" / "async generator ignored GeneratorExit" / "Attempted to exit cancel scope in a different task" to the server log. 
User observed this on /v1/messages after successful (status 200) requests, with the traceback pointing at HTTP11ConnectionByteStream .__aiter__ / .aclose inside httpcore. Fix: save resp.aiter_lines() / resp.aiter_bytes() as a variable and explicitly `await iter.aclose()` in the finally block BEFORE resp.aclose() / client.aclose(). This closes the asyncgen inside the current task's event loop, so the internal httpcore byte stream is cleaned up before Python's asyncgen GC hook has anything orphaned to finalize. Each aclose is wrapped in try/except Exception so nested anyio cleanup noise can't bubble out. Applied to all three streaming passthrough paths: - _anthropic_passthrough_stream (/v1/messages client-side tool path) - _openai_passthrough_stream (/v1/chat/completions client-side tool path, new in this PR) - openai_completions (/v1/completions bytes proxy from PR #4956) * fix(studio): default ChatCompletionRequest.stream to false per OpenAI spec OpenAI's /v1/chat/completions spec defaults `stream` to false, so clients that omit the field (naive curl, minimal integrations) expect a single JSON response back. Studio was defaulting to true, silently switching those clients into SSE and breaking any parser that didn't also handle streaming. ResponsesRequest and AnthropicMessagesRequest already default to false correctly; only ChatCompletionRequest was wrong. Studio's own frontend always sets `stream` explicitly on every chat-adapter / chat-api / runtime-provider call site, so the flip has no UI impact. SDK users (OpenAI Python/JS SDK, opencode, Claude Code, Cursor, Continue) also always pass `stream` explicitly, so they're unaffected. The only clients feeling the change are raw-curl users who were relying on the wrong default -- those get the correct OpenAI behavior now. Added a regression test pinning the default so it can't silently flip back. 
* fix(studio): reject images in OpenAI tool passthrough for text-only GGUFs The new tool passthrough branch runs before _extract_content_parts, skipping the existing not is_vision guard. Requests combining tools with an image on a text-only tool-capable GGUF were forwarded to llama-server, producing opaque upstream errors instead of the pre-existing clear 400. Restore the guard inline at the dispatch point, checking both legacy image_base64 and inline image_url parts. * fix(studio): require tool_call_id on role=tool chat messages Enforce the OpenAI spec rule that role="tool" messages must carry a tool_call_id. Without it, upstream backends cannot associate a tool result with the assistant's prior tool_calls entry and the request fails in non-obvious ways through the passthrough path. Reject at the request boundary with a 422 instead. * fix(studio): harden OpenAI tool passthrough validation and error surfacing Three related fixes called out by the PR review: 1. Preserve upstream status codes in the streaming passthrough. The httpx request is now dispatched before the StreamingResponse is constructed. Non-200 upstream responses and httpx RequestError transport failures raise HTTPException with the real status instead of being buried inside a 200 SSE error frame, so OpenAI SDK clients see APIError/BadRequestError/... as expected. 2. Require non-empty content on user/system/tool messages. Per the OpenAI spec, content may only be omitted on assistant messages that carry tool_calls; enforce that at the request boundary so malformed messages never reach the passthrough path. 3. Role-constrain tool-call metadata. tool_calls is only valid on role=assistant, tool_call_id and name only on role=tool. Without this, a user/system message with tool_calls would flip the passthrough branch on and be forwarded to llama-server, surfacing as an opaque upstream error. 
* fix(studio): normalize image mode and passthrough JSON verbatim Two Gemini-code-assist review findings on PR #5099: 1. Unconditionally convert decoded images to RGB before PNG encoding. The prior code only handled RGBA, letting CMYK/I/F images crash at img.save(format="PNG") and surface as opaque 400s. Applied to both the passthrough helper and the non-passthrough GGUF path that originally carried this pattern, keeping the two sites in sync. 2. Return the upstream JSON body as raw bytes via Response rather than parse-then-re-serialize with JSONResponse. Matches the passthrough helper's "verbatim" contract and drops a redundant round-trip. --------- Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-18 12:53:23 +04:00
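The "close aiter_lines explicitly" fix in the commit above comes down to keeping a named reference to the async iterator and awaiting its `aclose()` in `finally`. Here is a minimal sketch using only `asyncio`, where `upstream_lines` is a hypothetical stand-in for `resp.aiter_lines()` and `relay` for the streaming helper. It shows the pattern, not the Studio code.

```python
import asyncio
from typing import AsyncIterator

closed_in_task = []  # records which task ran the generator's cleanup


async def upstream_lines() -> AsyncIterator[str]:
    """Stand-in for resp.aiter_lines(); its finally block plays the role of httpcore cleanup."""
    try:
        for i in range(100):
            yield f"data: chunk {i}"
            await asyncio.sleep(0)
    finally:
        closed_in_task.append(asyncio.current_task())


async def relay(limit: int) -> list[str]:
    lines = upstream_lines()  # named reference instead of an anonymous `async for`
    out: list[str] = []
    try:
        async for line in lines:
            out.append(line)
            if len(out) >= limit:
                break  # `async for` does not close the iterator on break
    finally:
        try:
            await lines.aclose()  # cleanup runs here, in the current task
        except Exception:
            pass  # keep nested cleanup noise from escaping
    return out
```

Without the explicit `aclose()`, the suspended generator would be finalised later by the event loop's asyncgen hook, possibly in a different task, which is the situation the commit message describes.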
Manan Shah	7d0d2f256c	Add qwen3.6 script (#5084 ) * unsloth gemma4 support files * some fixes * Fixing cache.empty() calls (#4813) * Fixing cache.empty() calls * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix/gemma4 mlx (#4816) * Fixing cache.empty() calls * fixing for mlx versions * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * removed bidirectional check for 31b (#4839) Co-authored-by: Manan17 <shahmanan170602@gmail.coml> * Add Gemma 4 26B MoE support (MLX) (#4844) * removed bidirectional check for 31b * Change gemma4_text for moe * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix(gemma4): cast RoPE offset to int before mx.arange() (#4901) * fix(gemma4): cast RoPE offset to int before mx.arange() * fix(gemma4): use zero-based arange + offset to avoid CPU-GPU sync * qwen3.6 patches for multi-turn chat * qwen3.6 script * removing unnecessary scripts * displaying errors for not installed packages --------- Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Manan17 <shahmanan170602@gmail.coml> Co-authored-by: Théophile Lafargue <138336683+eauchs@users.noreply.github.com>	2026-04-17 01:21:30 -07:00
Daniel Han	d20b306755	Versioning	2026-04-16 12:06:10 -07:00
Daniel Han	0b57884120	Add Qwen3.6 inference defaults for Studio (#5065 ) * Add Qwen3.6 inference defaults for Studio Add qwen3.6 family entry to inference_defaults.json with the recommended sampling parameters from Qwen's documentation: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0. Without this, Qwen3.6 models fall through to the generic qwen3 pattern which uses different defaults (temperature=0.6, top_p=0.95, no presence_penalty). * Add Qwen3.6-35B-A3B-GGUF to default model lists * Add Qwen3.5/3.6 presence_penalty to thinking toggle and small-model disable logic - Thinking toggle (on-load + button click) now sets presencePenalty: 1.5 for Qwen3.5 and Qwen3.6 models (both thinking-ON and thinking-OFF states) - Small-model thinking-disable check (<9B defaults to no-thinking) extended from Qwen3.5-only to also cover Qwen3.6, in all 3 locations: frontend on-load, frontend refresh, backend llama_cpp.py	2026-04-16 11:42:42 -07:00
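The fall-through described in the commit above (Qwen3.6 picking up the generic qwen3 defaults) can be illustrated with a most-specific-match lookup. The dictionary layout, the substring matching, and the `resolve_defaults` helper are all assumptions for illustration; the log does not show the real structure of `inference_defaults.json`. Only the parameter values are taken from the commit message.

```python
# Hypothetical in-memory shape; values are the ones quoted in the commit message.
INFERENCE_DEFAULTS = {
    "qwen3": {"temperature": 0.6, "top_p": 0.95},
    "qwen3.6": {
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "min_p": 0.0,
        "presence_penalty": 1.5,
        "repetition_penalty": 1.0,
    },
}


def resolve_defaults(model_name: str) -> dict:
    """Return defaults for the longest family pattern contained in the model name."""
    name = model_name.lower()
    matches = [family for family in INFERENCE_DEFAULTS if family in name]
    if not matches:
        return {}
    # Longest match wins, so "qwen3.6" takes precedence over the generic "qwen3".
    return dict(INFERENCE_DEFAULTS[max(matches, key=len)])
```

With a dedicated `qwen3.6` entry present, a name such as `Qwen3.6-35B-A3B-GGUF` resolves to the 3.6 parameters, while `Qwen3-8B-GGUF` still resolves to the generic ones.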
Daniel Han	d56f980452	fix: multi-GPU inference crash for bnb 4-bit/8-bit models (#5068 ) * fix: multi-GPU inference crash for bnb 4-bit/8-bit models When load_in_4bit or load_in_8bit is used with device_map="sequential" and max_memory constraints that place weights across multiple GPUs (or entirely on a non-default GPU like cuda:1), the bitsandbytes loading path in transformers never calls dispatch_model. No AlignDevicesHook is installed, and the first forward/generate call crashes with: RuntimeError: Expected all tensors to be on the same device This adds _attach_bnb_multidevice_hooks() which is called after from_pretrained returns. It infers a device map from actual parameter placements and calls dispatch_model(force_hooks=True) to install the missing hooks. The function is a complete no-op for the common single-GPU cuda:0 case. Call sites: FastBaseModel.from_pretrained (vision.py) and FastLlamaModel.from_pretrained (llama.py). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: align with PR #5053 final review improvements - Add hook call to the bnb quantized loading branch in llama.py (the primary load_in_4bit path), not just the non-fast-inference fallback - Expand bnb detection: also check model.is_loaded_in_4bit, model.is_loaded_in_8bit, model.quantization_method - Pass explicit main_device and skip_keys to dispatch_model - Use logger.info instead of print for the success message - Use kwargs.get("load_in_8bit", False) at llama.py call sites * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-16 11:35:02 -07:00
Lee Jackson	ee86530e55	chore: switch helper and no-cache fallback to Gemma (#5066)	2026-04-16 22:27:30 +04:00
Wasim Yousef Said	bc9ddb3af6	Fix onboarding followups (#5064) * Fix onboarding followups * Rename sidebar studio to train	2026-04-16 10:11:35 -07:00
Wasim Yousef Said	7ef65bd2e5	Chat first onboarding (#5063) * auth: default to chat * settings: relaunch onboarding * onboarding: return to launch page * studio: stop auto guided tour * ui: soften global radius * cleanup: rename onboarding exit prop * fix onboarding redirect safety * Show real Unsloth version in settings * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-16 09:58:10 -07:00
हिमांशु	f4422b0a62	change torchcodec version to 0.10.0 in extra-no-deps (#5043) Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-16 19:50:57 +04:00
Wasim Yousef Said	b01e9af124	feat(studio): replace navbar with collapsible sidebar (#4936 ) * feat(studio): replace navbar navigation with collapsible sidebar Add an app-wide sidebar with hover-expand and pin-to-dock behavior. Navigation items (Studio, Recipes, Export, Chat) move from the center pill navbar to the sidebar. Chat threads and recipes render as collapsible sub-lists. Navbar simplified to logo + update + close. - Extend SidebarProvider with pinned/hovered state model - New AppSidebar with animated active indicator, sloth profile menu, theme toggle, guided tour, back/forward navigation - Chat page refactored to URL-driven view state via search params - Extract reusable hooks for chat thread and recipe sidebar data - Guard startViewTransition for browser compatibility - Wrap chat deletions in Dexie transaction for data integrity * feat(studio): move logo to sidebar and make navbar overlay - Sidebar is now full-height with logo in SidebarHeader - Collapsed sidebar shows sticker.png, expanded shows full logo - Navbar is absolute-positioned overlay (no layout space) - Main content extends to top, aligning with navbar controls * feat(studio): full-height sidebar with recents, edge-to-edge nav buttons - Sidebar outside max-w-7xl, pinned to left edge - Remove sidebar rounding, menu buttons rounded-md - Nav buttons flush to sidebar edges with no left rounding - Replace collapsible recipes/chat with flat nav items - Add Recents section with chat history (1 item when not on chat, full on chat) - New Chat as first nav item with PencilEdit02Icon - Cursor pointer on all sidebar buttons - Navbar temporarily hidden for screenshots * fix(studio): fix chat scroll, action bar hover, collapsible recents - Fix sticky composer by removing `relative` override on viewport footer - Action bar buttons only show on hover (autohide=always) - Remove floating border/shadow from action bar - Add scroll space above composer for last message actions - Back/forward buttons use router 
history (stay in-app) - Recents section collapsible with chevron on chat route - Set html/body/#root height for proper h-full chain * fix(studio): address review feedback, clean up unused code - Unhide navbar (was left hidden from screenshot) - Remove unused imports: SidebarMenuSub, BubbleChatIcon, ColumnInsertIcon - Remove unused vars: recipeItems, activeRecipeId, canCompare, recipesOpen - Include compare query id in active sidebar selection - Use store type for contextUsage instead of inline type - Simplify noop in sidebar.tsx - Remove empty className prop feat(studio): add mobile sidebar, recent runs section, and misc UX fixes * feat(studio): scaffold settings feature module with dialog store * feat(studio): add tri-state theme store for settings * feat(chat): add clear-all-chats and export-chat-history utils * feat(studio): add settings dialog shell with tab rail * feat(studio): add appearance tab with theme and sidebar pin * feat(studio): add settings general tab with hf token, auto-title, reset prefs * feat(studio): add settings chat tab with export and clear * feat(studio): add api keys tab with list and revoke flow * feat(studio): add create-key form and reveal dialog * feat(studio): add usage examples panel to api keys tab * feat(studio): add settings about tab with update and shutdown * feat(studio): add settings dropdown item and cmd-comma shortcut * feat(studio): remove legacy api-keys route and chat-sheet preference rows * fix(studio): settings dialog a11y + polish pass * feat(studio): inline api key reveal card replacing nested dialog * fix(studio): hide revoked keys from settings list * refactor(studio): strip navbar and hoist training unload guard * feat(studio): explicit sidebar toggle, remove hover-open and pin icons * fix(studio): use SidebarRight01Icon for collapsed sidebar open toggle * fix(studio): address code review findings for settings dialog * feat(studio): collapsible navigate group with standalone new-chat and compare * fix(studio): 
chat-only standalone actions, use ColumnInsertIcon for compare * fix(studio): sidebar new-chat/compare state reset and icon-mode collapsible * feat(studio): add compact logo assets for sidebar header * Fixed sidebar design * fix(studio): sidebar delete icon hover contrast and sizing * feat(studio): route-gate sidebar recents (chats off /studio, runs on /studio) * feat(studio): add chat search store * feat(studio): add chat search index hook with snapshot-on-open * feat(studio): add chat search command dialog with global shortcut * feat(studio): wire chat search into sidebar * fix(studio): trim hf token on save, add show/hide toggle, commit on close * revert(studio): restore original sidebar/border colors, brighten sidebar * feat(studio): forward overlayClassName through CommandDialog * fix(studio): wrap search dialog in Command context, redesign as flat 635px card * fix(studio): reserve right padding on recent items so delete icon stops overlapping title * fix(studio): skip hf token unmount-commit during reset-prefs reload * chore(studio): drop unused icon import and unreachable runs navigate branch * fix(studio): chat search index filters archived before limit, batches message query, picks up reasoning text * fix(studio): keep CommandEmpty in tree so empty state renders correctly * fix(studio): cap system prompt and chat template textareas so they scroll instead of growing * fix(studio): attach chat-compare tour anchor to sidebar compare button * fix(studio): persist system theme explicitly so next-themes does not clobber on reload * fix(studio): auto-switch to history tab when selecting a recent run from sidebar * UI overhaul: chatbox, scrollbar, sidebar, and compare view UI Changes: - Redesigned the Compare UI with general cleanup - Redesigned the Chatbox UI - Reduced the width of the user chat bubble for improved readability - Narrowed the user chat box across the content page - Adjusted thinking-box text color to be slightly darker - Removed faded text effect 
from chat messages - Removed faded text effect from the thinking box - Added a small LLM chat safety note at the bottom of the chatbox - Restyled the scrollbar Layout & Behavior: - Reworked the scrollbar to span the full height of the page (no top/bottom padding) and remain persistently visible when content is scrollable, rather than only on hover - Reworked the Configuration sidebar to span full height — removed rounded corners and borders, with the scrollbar adjusted to match the full top-to-bottom layout - Adjusted the top menu and bottom chatbox content areas to work correctly with the new full-page scroll behavior - Made chat content match the chatbox width, with content sliding slightly behind the chatbox when scrolling - Aligned chat text width with the chatbox for visual consistency, including how far the text extends behind the chatbox Fixes: - Fixed the chatbox not auto-expanding when typing multi-line input while bottom-positioned during an active chat (previously only worked before a chat had started) - Fixed positioning and design of the user chat hover menu buttons to match the assistant chat box — now displayed below the chat bubble instead of on the left side * Fix user message layout in thread component * swap code icon * fix compare layout * fix compare pane flex * Sidebar improvements and fixes - Added scrolling support to the sidebar so menus and recent chats no longer get hidden - Recent chats are now always visible in the sidebar, not hidden when in Studio, Recipes, or Export - Recent chat is now deselected when selecting other navigations - Fixed sidebar glitch where browser resize could make the sidebar and expand button disappear completely - Fixed glitch where the open-sidebar hover tooltip appeared above the logo when clicking expand sidebar - Reduced sidebar width on mobile to around 2/3 of the screen (was too wide) - Made the close-sidebar hover tooltip consistent with the rest of the design - Removed sidebar collapse/expand animation - 
Small adjustment to chat width * Fix route scrolling, polling, and theme sync issues * Fix Studio page scrolling --------- Co-authored-by: sneakr <hauzin@hotmail.com>	2026-04-16 08:46:16 -07:00
Daniel Han	05ec0f110b	Studio: Ollama support, recommended folders, Custom Folders UX polish (#5050 ) * Studio: Ollama support, recommended folders, Custom Folders UX polish Backend: - Add _scan_ollama_dir that reads manifests/registry.ollama.ai/library/* and creates .gguf symlinks under <ollama_dir>/.studio_links/ pointing at the content-addressable blobs, so detect_gguf_model and llama-server -m work unchanged for Ollama models - Filter entries under .studio_links from the generic models/hf/lmstudio scanners to avoid duplicate rows and leaked internal paths in the UI - New GET /api/models/recommended-folders endpoint returning LM Studio and Ollama model directories that currently exist on the machine (OLLAMA_MODELS env var + standard paths, ~/.lmstudio/models, legacy LM Studio cache), used by the Custom Folders quick-add chips - detect_gguf_model now uses os.path.abspath instead of Path.resolve so the readable symlink name is preserved as display_name (e.g. qwen2.5-0.5b-Q4_K_M.gguf instead of sha256-abc...) - llama-server failure with a path under .studio_links or .cache/ollama surfaces a friendlier message ("Some Ollama models do not work with llama.cpp. 
Try a different model, or use this model directly through Ollama instead.") instead of the generic validation error Frontend: - ListLabel supports an optional leading icon and collapse toggle; used for Downloaded (download icon), Custom Folders (folder icon), and Recommended (star icon) - Custom Folders header gets folder icon on the left, and +, search, and chevron buttons on the right; chevron uses ml-auto so it aligns with the Downloaded and Recommended chevrons - New recommended folder chips render below the registered scan folders when there are unregistered well-known paths; one click adds them as a scan folder - Custom folder rows that are direct .gguf files (Ollama symlinks) load immediately via onSelect instead of opening the GGUF variant expander (which is for repos containing multiple quants, not single files) - When loading a direct .gguf file path, send max_seq_length = 0 so the backend uses the model's native context instead of the 4096 chat default (qwen2.5:0.5b now loads at 32768 instead of 4096) - New listRecommendedFolders() helper on the chat API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: log silent exceptions and support read-only Ollama dirs Replace silent except blocks in _scan_ollama_dir and the recommended-folders endpoint with narrower exception types plus debug or warning logs, so failures are diagnosable without hiding signal. Add _ollama_links_dir helper that falls back to a per-ollama-dir hashed namespace under Studio's own cache (~/.unsloth/studio/cache/ollama_links) when the Ollama models directory is read-only. Common for system installs at /usr/share/ollama/.ollama/models and /var/lib/ollama/.ollama/models where the Studio process has read but not write access. Previously the scanner returned an empty list in that case and Ollama models would silently not appear. 
The fallback preserves the .gguf suffix on symlink names so detect_gguf_model keeps recognising them. The prior "raw sha256 blob path" fallback would have missed the suffix check and failed to load. * Address review: detect mmproj next to symlink target for vision GGUFs Codex P1 on model_config.py:1012: when detect_gguf_model returns the symlink path (to preserve readable display names), detect_mmproj_file searched the symlink's parent directory instead of the target's. For vision GGUFs surfaced via Ollama's .studio_links/ -- where the weight file is symlinked but any mmproj sidecar lives next to the real blob -- mmproj was no longer detected, so the model was misclassified as text-only and llama-server would start without --mmproj. detect_mmproj_file now adds the resolved target's parent to the scan order when path is a symlink. Direct (non-symlink) .gguf paths are unchanged, so LM Studio and HF cache layouts keep working exactly as before. Verified with a fake layout reproducing the bug plus a regression check on a non-symlink LM Studio model. * Address review: support all Ollama namespaces and vision projector layers - Iterate over all directories under registry.ollama.ai/ instead of hardcoding the "library" namespace. Custom namespaces like "mradermacher/llama3" now get scanned and include the namespace prefix in display names, model IDs, and symlink names to avoid collisions. - Create companion -mmproj.gguf symlinks for Ollama vision models that have an "application/vnd.ollama.image.projector" layer, so detect_mmproj_file can find the projector alongside the model. - Extract symlink creation into _make_symlink helper to reduce duplication between model and projector paths. * Address review: move imports to top level and add scan limit - Move hashlib and json imports to the top of the file (PEP 8). - Remove inline `import json as _json` and `import hashlib` from function bodies, use the top-level imports directly. 
- Add `limit` parameter to `_scan_ollama_dir()` with early exit when the threshold is reached. - Pass `_MAX_MODELS_PER_FOLDER` into the scanner so it stops traversing once enough models are found. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: Windows fallback, all registry hosts, collision safety _make_link (formerly _make_symlink): - Falls back to os.link() hardlink when symlink_to() fails (Windows without Developer Mode), then to shutil.copy2 as last resort - Uses atomic os.replace via tmp file to avoid race window where the .gguf path is missing during rescan Scanner now handles all Ollama registry layouts: - Uses rglob over manifests/ instead of hardcoding registry.ollama.ai - Discovers hf.co/org/repo:tag and any other host, not just library/ - Filenames include a stable sha1 hash of the manifest path to prevent collisions between models that normalize to the same stem Per-model subdirectories under .studio_links/: - Each model's links live in their own hash-keyed subdirectory - detect_mmproj_file only sees the projector for that specific model, not siblings from other Ollama models Friendly Ollama error detection: - Now also matches ollama_links/ (the read-only fallback cache path) and model_identifier starting with "ollama/" Recommended folders: - Added os.access(R_OK \| X_OK) check so unreadable system directories like /var/lib/ollama/.ollama/models are not advertised as chips * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: filter ollama_links from generic scanners The generic scanners (models_dir, hf_cache, lmstudio) already filter out .studio_links to avoid duplicate Ollama entries, but missed the ollama_links fallback cache directory used for read-only Ollama installs. Add it to the filter. 
* Address review: idempotent link creation and path-component filter _make_link: - Skip recreation when a valid link/copy already exists (samefile or matching size check). Prevents blocking the model-list API with multi-GB copies on repeated scans. - Use uuid4 instead of os.getpid() for tmp file names to avoid race conditions from concurrent scans. - Log cleanup errors instead of silently swallowing them. Path filter: - Use os.sep-bounded checks instead of bare substring match to avoid false positives on paths like "my.studio_links.backup/model.gguf". * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: drop copy fallback, targeted glob, robust path filter _make_link: - Drop shutil.copy2 fallback -- copying multi-GB GGUFs inside a sync API request would block the backend. Log a warning and skip the model when both symlink and hardlink fail. Scanner: - Replace rglob("") with targeted glob patterns (// and ///) to avoid traversing unrelated subdirectories in large custom folders. Path filter: - Use Path.parts membership check instead of os.sep substring matching for robustness across platforms. Scan limit: - Skip _scan_ollama_dir when _generic already fills the per-folder cap. * Address review: sha256, top-level uuid import, Path.absolute() - Switch hashlib.sha1 to hashlib.sha256 for path hashing consistency. - Move uuid import to the top of the file instead of inside _make_link. - Replace os.path.abspath with Path.absolute() in detect_gguf_model to match the pathlib style used throughout the codebase. 
* Address review: fix stale comments (sha1, rglob, copy fallback) Update three docstrings/comments that still referenced the old implementation after recent changes: - sha1 comment now says "not a security boundary" (no hash name) - "rglob" -> "targeted glob patterns" - "file copies as a last resort" -> removed (copy fallback was dropped) * Address review: fix stale links, support all manifest depths, scope error _make_link: - Drop size-based idempotency shortcut that kept stale links after ollama pull updates a tag to a same-sized blob. Only samefile() is used now -- if the link doesn't point at the exact same inode, it gets replaced. Scanner: - Revert targeted glob back to rglob so deeper OCI-style repo names (5+ path segments) are not silently skipped. Ollama error: - Only show "Some Ollama models do not work with llama.cpp" when the server output contains GGUF compatibility hints (key not found, unknown architecture, failed to load). Unrelated failures like OOM or missing binaries now show the generic error instead of being misdiagnosed. --------- Co-authored-by: Daniel Han <info@unsloth.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: danielhanchen <michaelhan2050@gmail.com>	2026-04-16 08:24:08 -07:00
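The link-creation behaviour this commit converges on (samefile-only idempotency, symlink first, hardlink second, a uuid4-named temp file swapped in with `os.replace`, and no copy fallback) can be sketched roughly as follows. This is an illustration under those stated assumptions, not the repository's `_make_link`; the name `make_link`, its signature, and the temp-file naming scheme are hypothetical.

```python
import logging
import os
import uuid
from pathlib import Path

log = logging.getLogger(__name__)


def make_link(link: Path, target: Path) -> bool:
    """Point `link` at `target`; return True if the link is usable afterwards."""
    # Idempotency: keep the existing link only if it resolves to the same inode.
    # A size-only comparison would keep stale links after a tag is re-pulled.
    try:
        if link.exists() and os.path.samefile(link, target):
            return True
    except OSError:
        pass  # unreadable or broken link: fall through and recreate it

    # Unique temp name so concurrent scans cannot clobber each other.
    tmp = link.with_name(f".{link.name}.{uuid.uuid4().hex}.tmp")
    try:
        try:
            tmp.symlink_to(target)
        except OSError:
            # e.g. Windows without Developer Mode: fall back to a hardlink.
            os.link(target, tmp)
        # Atomic swap: the .gguf path is never missing during a rescan.
        os.replace(tmp, link)
        return True
    except OSError as exc:
        # No copy fallback: copying a multi-GB blob would block the request.
        log.warning("could not link %s -> %s: %s", link, target, exc)
        try:
            tmp.unlink(missing_ok=True)
        except OSError as cleanup_exc:
            log.debug("temp cleanup failed for %s: %s", tmp, cleanup_exc)
        return False
```

Calling `make_link` twice with the same target leaves the link untouched on the second call, while calling it with a different target replaces the link in a single rename.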
Daniel Han	ff23ce40b4	Fix review findings for chat-template repair (#5049) (#5056) * Fix review findings for PR #49 1. Sandbox fallback Jinja env in _VariantTokenizerProxy.apply_chat_template (use SandboxedEnvironment, matching _derive_assistant_prefix_by_render) 2. Unwrap benign outer-If guards in _template_ends_with_toplevel_for so templates like {% if messages %}{% for ... %}{% endfor %}{% endif %} are still repairable (preserves Qwen3-Guard rejection via else-branch and add_generation_prompt-name checks) 3. Preserve raw name_or_path in _VariantTokenizerProxy._source_path so local-path detection works for dict/list variant tokenizers 4. Context-aware strict-mode messages: omit "will still load" and "Set UNSLOTH_STRICT_CHAT_TEMPLATE=1" when already raising * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-16 08:02:05 -07:00
Daniel Han	b42e3a120d	Remove legacy venv Scripts entry from User PATH on upgrade (#5060) Older installers persisted the venv Scripts directory directly in the User PATH registry. The shim approach from #4961 no longer writes that entry, but on upgrade the old one survived and python.exe / pip.exe from the unsloth venv continued winning resolution in every new shell. Before creating the shim, read the current User PATH, filter out any entry matching $VenvDir\Scripts (using the same symmetric raw+expanded comparison as Add-ToUserPath), and write back if changed. No-op on fresh installs where the legacy entry was never written. Confirmed on a real Windows machine: `where.exe python` was returning the venv interpreter first even after the shim PR merged.	2026-04-16 07:36:59 -07:00
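The filter step described in this commit lives in PowerShell, but the comparison logic can be illustrated in Python. The sketch below is a minimal model of "expand both sides, then compare", assuming `%NAME%`-style variables; the helper names, the case-insensitive comparison, the trailing-backslash handling, and the example paths are all assumptions for illustration, not taken from the installer.

```python
import re


def expand(entry: str, env: dict) -> str:
    """Expand %NAME% references; `env` keys are expected in upper case."""
    return re.sub(
        r"%([^%]+)%",
        lambda m: env.get(m.group(1).upper(), m.group(0)),
        entry,
    )


def normalize(entry: str, env: dict) -> str:
    # Trim whitespace and surrounding double quotes, expand variables,
    # then compare case-insensitively without a trailing backslash.
    cleaned = entry.strip().strip('"')
    return expand(cleaned, env).rstrip("\\").lower()


def remove_legacy_entry(user_path: str, legacy_dir: str, env: dict):
    """Return (new_path, changed); only entries matching legacy_dir are dropped."""
    target = normalize(legacy_dir, env)
    kept = [e for e in user_path.split(";") if normalize(e, env) != target]
    new_path = ";".join(kept)
    return new_path, new_path != user_path


env = {"USERPROFILE": r"C:\Users\me"}
old = r"C:\Windows;%USERPROFILE%\.unsloth\venv\Scripts;C:\Tools"
print(remove_legacy_entry(old, r"C:\Users\me\.unsloth\venv\Scripts", env))
```

Because both the stored entry and the candidate directory pass through the same `normalize`, a raw `%USERPROFILE%\...` entry and its already-expanded form compare equal, and the `changed` flag makes the write-back a no-op on fresh installs.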
Daniel Han	5b8643969e	Revert "Remove legacy venv Scripts entry from User PATH on upgrade" This reverts commit `cae4a74297`.	2026-04-16 14:20:43 +00:00
Daniel Han	cae4a74297	Remove legacy venv Scripts entry from User PATH on upgrade Older installers persisted the venv Scripts directory directly in the User PATH registry. The shim approach (added in this PR) no longer writes that entry, but it also did not remove the old one. On upgrade, the legacy entry survived and python.exe / pip.exe from the unsloth venv continued winning resolution in every new shell, which is exactly the hijack the shim was designed to prevent. Before creating the shim, read the current User PATH, filter out any entry matching $VenvDir\Scripts (using the same symmetric raw+expanded comparison as Add-ToUserPath), and write back if changed. This runs once per install and is a no-op on fresh installs where the legacy entry was never written.	2026-04-16 14:19:04 +00:00
Datta Nimmaturi	6764cb9b90	Restrict flash attn to <=256 head dim. Consolidate attn impl checks (#5051) * Restrict flash attn to <=256 head dim. Consolidate attn impl checks * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Consolidate the changes into single function * safeguard for dict instead of object * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-16 05:52:33 -07:00
Daniel Han	c5be8b1cd2	Chat-template repair: warn-by-default, AST classification, dict support (#5049 ) * Chat-template repair: warn-by-default, AST classification, dict support Follow-up hardening on top of PR #4426 (which fixed the #4150 RuntimeError for ChatML LoRA reloads). Behavior changes: - Warn-by-default instead of RuntimeError. When fix_chat_template cannot repair a broken template, emit a warning and return the original. Set UNSLOTH_STRICT_CHAT_TEMPLATE=1 to restore the pre-warn hard fail. Fixes the UX where a missing `{% if add_generation_prompt %}` block on a saved LoRA (typical after LlamaFactory / Axolotl re-serialize) would block model loading entirely. - Local path vs HF hub distinguished in the warning message. For local paths the message points at the likely downstream tool; for HF IDs it points at the upstream model maintainers. Previously both said "file a bug report to the maintainers of <path>" even when <path> was the user's own saves/ directory. - Dict / list chat_template now handled. Hermes-3 ships with {default, tool_use} and the previous code crashed with AttributeError: 'dict' object has no attribute 'find' when entering _fix_chat_template with a dict. Each variant is now fixed independently; structure is preserved. Internals: - _find_end_position now matches all four Jinja whitespace-control variants ({% %}, {%- %}, {% -%}, {%- -%}) and returns the rightmost endfor/endif so multi-for templates aren't locked onto the first loop. Previously {%- endfor -%} (both-side dash, used by Qwen3-Guard) was silently bypassed. - _has_add_generation_prompt_block uses Jinja AST via jinja2.nodes.If/Name walks instead of substring matching, so templates that hide the block behind comments or dash-style variants are classified correctly. - _template_ends_with_toplevel_for gates the GH#4150 ChatML repair on the AST: only fires when the last structural top-level node is a For (standard ChatML shape), ignoring trailing pure-whitespace output nodes. 
Templates wrapped in an outer If (Qwen3-Guard) are now explicitly skipped at the _fix_chat_template level as well, not just at load_correct_tokenizer's name-based exemption. - _validate_patched_template renders the patched template with and without add_generation_prompt and confirms the patched output responds to the flag by appending (not replacing) content. If validation fails, the patch is discarded and we fall through to the warn path. Verified with an expanded regression suite in tests/: - test_fix_chat_template_pr4426.py: 42/42 template-matrix cells - test_load_correct_tokenizer_pr4426.py: 5/5 tokenizer loads - test_chat_template_followups.py: 10/10 new follow-up tests - test_mistral_pr4426.py: 5 Mistral variants byte-identical - test_qwen_pr4426.py: 14 Qwen variants byte-identical (Qwen1.5, Qwen2, Qwen2.5-Instruct/Coder/Math/VL, Qwen3, Qwen3-Coder, QwQ, Qwen3-Guard-Gen) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Guard _validate_patched_template against read-only chat_template If tokenizer.chat_template is a property or otherwise read-only, the validation helper would crash with AttributeError when trying to temporarily set the patched template. Catch the assignment failure and return False (skip validation), and best-effort restore in the finally block. * Replace regex separator inference with render-diff; broaden repair to non-ChatML templates The previous `_infer_assistant_separator` was a four-tier regex heuristic that only worked on ChatML-shaped templates and forced a hard `<\|im_start\|>` / `<\|im_end\|>` presence gate on Case 2 repair. This meant a Llama-3, Gemma, or Phi-3 template stripped of its generation-prompt block by a downstream tool (LlamaFactory, Axolotl, etc.) would still warn-and-return even though the structural shape is identical to the ChatML case the PR already handles. 
This replaces the regex with `_derive_assistant_prefix_by_render`: render the template with two dialogs that differ only in assistant content, then `os.path.commonprefix` on the tails captures the exact assistant-turn prefix the template emits. The template itself is ground truth, so non-ChatML shapes work as long as the assistant block is a literal the template emits once per message. Three guards keep the derivation safe: A. both assistant renders extend the base render (no reordering); B. the divergence point is exactly the content-insertion site (sentinel follows the common prefix); C. a user-role cross-check: if a render with a user sentinel also emits the same prefix, role has no effect on output and we reject. A render failure on [user, user] (e.g. Gemma's `raise_exception` alternation check) is evidence that role matters; we accept. Sentinels differ at character 0 so `commonprefix` cannot absorb them, and trailing whitespace/comments after the last `{% endfor %}` are stripped before probing (they would appear in base but not after the appended assistant turn and break Guard A). `_fix_chat_template` and `_repair_string_template` now thread an `is_sharegpt` kwarg; `_fix_chat_template` retries once with `is_sharegpt=True` if the first probe returns None (dual-probe fallback for dict/list callers). The ChatML `<\|im_start\|>` / `<\|im_end\|>` hard gate in Case 2 is dropped. `_infer_assistant_separator` is deleted. Verified via: - tests/test_fix_chat_template_pr4426.py: 51/51 cells (new Llama-3, Gemma, Phi-3 broken-template rows all repair FIX-OK) - tests/test_load_correct_tokenizer_pr4426.py: 5/5 - tests/test_chat_template_followups.py: 18/18 (T11-T18 cover non-ChatML repair + probe failure modes) - tests/test_mistral_pr4426.py: 5/5 byte-identical - tests/test_qwen_pr4426.py: 14/14 byte-identical (Qwen3-Guard AST gate still rejects) - tests/hermes3_lora_pr4426.py reload: patched template ends with `<\|im_start\|>assistant\n`, inference returns sensible output. 
- temp/sim/battery.py: 79/79 followup; vs baseline: 0 regressions, 9 improvements. - Spot-check probe on real stripped tokenizers (Hermes-3, Phi-4, Llama-3.2-1B, Gemma-3-1B): all derive the expected prefix. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address reviewer findings: variant routing, positive-gate detection, comment-safe end scan Resolves three reviewer findings on PR #5049 (`fix/chat-template-followups`): Finding #1 [10/10]: dict/list variants now route through `_fix_chat_template_for_tokenizer` via a new `_VariantTokenizerProxy` adapter. Previously the dict/list branches called `_fix_chat_template` directly, silently bypassing the warn/strict (`UNSLOTH_STRICT_CHAT_TEMPLATE`) contract, the `no == yes` diagnostic, broken-existing-block detection, and `_validate_patched_template` guard. The proxy swaps `base.chat_template` to the variant string before each `apply_chat_template` call so tokenizer globals (`bos_token`, custom filters, `raise_exception`) remain available; if the base is read-only it falls back to isolated Jinja rendering. Finding #2 [1/10]: `_has_add_generation_prompt_block` now requires the `If` body to contain at least one `Output` node (a new `_if_body_emits_content` helper walks descendants). This distinguishes a real generation-prompt block from a header guard like `{% if not add_generation_prompt is defined %}{% set ... %}{% endif %}` (body contains only `Assign`) which references the name but emits nothing. Also dropped a now-redundant `"add_generation_prompt" not in scrubbed` guard in `_fix_chat_template` Case 2 so header-guarded templates still get repaired. Finding #4 [1/10]: `_find_end_position` now replaces Jinja comments with equal-length whitespace before scanning for `{% endfor %}` / `{% endif %}` tokens. This prevents a trailing comment containing those tokens from being picked as the real end tag. 
Positions in the padded string map 1:1 to positions in the original template. Tests: - tests/test_chat_template_followups.py: 21/21 (T19 strict-mode dict variant, T20 header-guard repair, T21 comment-endfor trap added; T4/T5 stubs updated with a working apply_chat_template that routes through Jinja). - tests/test_fix_chat_template_pr4426.py: 51/51 cells unchanged. - tests/test_load_correct_tokenizer_pr4426.py: 5/5. - tests/test_mistral_pr4426.py: 5/5 byte-identical. - tests/test_qwen_pr4426.py: 14/14 byte-identical. - temp/sim/battery.py: 79/79 followup; 0 regressions vs baseline. - Phase 3 Hermes-3 broken-LoRA reload: inference still returns `'The answer to the equation 2+2 is 4.'`. - Spot-checks on Hermes-3 / Phi-4 / Llama-3.2-1B / Gemma-3-1B real stripped templates: probe still derives the expected prefix. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Tighten comments in chat-template helpers Pure comment minimization across `_find_end_position`, `_has_add_generation_prompt_block`, `_if_body_emits_content`, `_derive_assistant_prefix_by_render`, `_fix_chat_template` Case 2, and `_VariantTokenizerProxy`. No behavior change; same intent, fewer lines. All 21 follow-up tests and the 51-cell Phase 1 matrix still pass. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Sandbox probe, fix is_sharegpt validator mismatch, reject negated gates Three real bugs from the 10-agent Opus review: 1. Probe now uses `jinja2.sandbox.SandboxedEnvironment` instead of bare `jinja2.Environment`. The probe renders at model-load time (before the user calls `apply_chat_template`), so it was a new eager code-execution surface that the base HF tokenizer loading does not have. SandboxedEnvironment blocks attribute-chain exploits at negligible cost. 2. `_repair_string_template` now tries validation with both `is_sharegpt=False` and `is_sharegpt=True`. 
Previously, when `_fix_chat_template` internally fell back to the other schema via its dual-probe, the outer validation still used the caller's original `is_sharegpt` -- rendering with the wrong message keys and spuriously dropping a valid repair. 3. `_has_add_generation_prompt_block` now skips `If` nodes whose test is a `Not` expression. A negated gate like `{% if not add_generation_prompt %}{{ x }}{% endif %}` fires when agp=False, so its emitting body is not a generation block -- but the old code counted any Name reference regardless of polarity. Cleanup: removed unused `self._label`, added `\r` escape in generation-block literal, switched variant labels to `!r` formatting, removed redundant `import os as _os`. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jinja2.sandbox import and sandbox proxy fallback Two critical findings from the 20-reviewer pass: 1. [20/20] The proxy read-only fallback used bare `jinja2.Environment`, not sandboxed. All 20 reviewers independently reproduced marker-file creation via `cycler.__init__.__globals__['os'].system(...)` during `fix_chat_template()`. Fixed: fallback now uses `from jinja2.sandbox import SandboxedEnvironment`. 2. [14/20] The render-diff probe did `import jinja2` then referenced `jinja2.sandbox.SandboxedEnvironment`. `jinja2.sandbox` is a submodule that is NOT auto-imported by `import jinja2` on Jinja 3.1.6. This caused `AttributeError` (swallowed by `except Exception`), making the entire Case 2 repair path silently return None in a clean process. The 6 reviewers who saw it work had `jinja2.sandbox` pre-imported by an earlier module in their process. Fixed: both the probe and the proxy fallback now use `from jinja2.sandbox import SandboxedEnvironment`. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-16 05:52:33 -07:00
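The render-diff probe described in this commit (two dialogs that differ only in assistant content, `os.path.commonprefix` on the tails, and guards A, B and C) can be sketched as below. To stay self-contained, the sketch takes any `render(messages) -> str` callable in place of a sandboxed Jinja template; `derive_assistant_prefix`, the sentinel strings, and the `chatml` stand-in renderer are hypothetical names, and returning `None` for an empty prefix is an assumption rather than documented behaviour.

```python
import os


def derive_assistant_prefix(render, sentinel_a="AAAA_SENTINEL", sentinel_b="BBBB_SENTINEL"):
    """Return the literal text the template emits before assistant content, or None."""
    base_msgs = [{"role": "user", "content": "hi"}]
    base = render(base_msgs)
    out_a = render(base_msgs + [{"role": "assistant", "content": sentinel_a}])
    out_b = render(base_msgs + [{"role": "assistant", "content": sentinel_b}])

    # Guard A: both assistant renders must extend the base render (no reordering).
    if not (out_a.startswith(base) and out_b.startswith(base)):
        return None
    tail_a, tail_b = out_a[len(base):], out_b[len(base):]

    # The sentinels differ at character 0, so commonprefix cannot absorb them.
    prefix = os.path.commonprefix([tail_a, tail_b])

    # Guard B: the divergence point is exactly the content-insertion site.
    if not (tail_a[len(prefix):].startswith(sentinel_a)
            and tail_b[len(prefix):].startswith(sentinel_b)):
        return None

    # Guard C: if a user turn yields the same prefix, role has no effect: reject.
    try:
        out_user = render(base_msgs + [{"role": "user", "content": sentinel_a}])
    except Exception:
        out_user = None  # a render failure on [user, user] shows that role matters
    if out_user is not None and out_user.startswith(base + prefix + sentinel_a):
        return None

    return prefix or None  # assumption: an empty prefix is treated as "not derivable"


def chatml(messages):
    # Stand-in for a ChatML-shaped template stripped of its generation-prompt block.
    return "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)


print(repr(derive_assistant_prefix(chatml)))  # '<|im_start|>assistant\n'
```

A role-blind renderer such as `lambda msgs: "".join(f"[{m['content']}]" for m in msgs)` passes guards A and B but is rejected by guard C, which is the case the user-role cross-check exists for.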
Daniel Han	6e87bade25	Trim verbose comments in PATH helpers Reduce inline comments from ~160 lines to ~25 across both files. Keep one-line summaries of the "why"; drop multi-paragraph rationale blocks that repeated information already captured in commit messages and PR discussion.	2026-04-16 12:01:01 +00:00
Etherll	ec32ce2e82	fix: use direct registry API for PATH writes instead of SetEnvironmentVariable (#4961 ) * fix: replacing SetEnvironmentVariable with direct registry API * apply reviews * Use CreateSubKey for HKCU\Environment * Store PATH backup under HKCU\Software\Unsloth * Fix $backupKey registry handle leak in PATH backup block Wrap $backupKey operations in try/finally so the handle is closed even if GetValue or SetValue throws. The Add-ToUserPath helper already uses this pattern for its registry key -- the backup block was the only place missing it. * Isolate WM_SETTINGCHANGE broadcast from PATH write error handling Wrap the broadcast dummy-variable calls in their own try/catch so a broadcast failure does not mask a successful registry PATH write. Previously, if SetEnvironmentVariable threw after SetValue already committed the new PATH, Add-ToUserPath would return $false and the caller would skip Refresh-SessionPath. * PATH helper polish: venv precedence, quoted entries, raw/expanded dedup Three small follow-ups surfaced by a 10-reviewer pass against the rebased PR head. None fix a regression vs main; each strictly improves the new helpers. Refresh-SessionPath / Refresh-Environment: - Move $env:Path to the front of the merge so an activated venv keeps precedence over machine/user PATH after a refresh. Pre-PR dropped process-only entries entirely; post-PR kept them but at the back. - Dedup on both raw and expanded forms so %USERPROFILE%\foo and the already-expanded C:\Users\me\foo do not both survive. Add-ToUserPath: - Trim whitespace and surrounding double-quotes from each compared entry so quoted PATH entries like "C:\Program Files\CMake\bin" deduplicate against an unquoted directory of the same path. * Back up User PATH inside Add-ToUserPath, before first mutation Previously only studio/setup.ps1 took a one-time PATH backup, at script top (line ~547). 
install.ps1 (the irm \| iex entry point) had no backup, so users who installed via that path had no recovery surface if anything clobbered their PATH. The PR description's "one-time backup before any modifications" promise only held for the studio installer flow. Move the backup into Add-ToUserPath itself: just before the first actual SetValue mutation, write the pristine raw PATH to HKCU\Software\Unsloth\PathBackup if no backup already exists. This: - Covers both entry points (install.ps1 and studio/setup.ps1). - Captures the TRUE pristine PATH even when install.ps1 runs first and studio/setup.ps1 runs afterwards (the script-top backup in setup.ps1 would otherwise see an already-modified PATH). - Is idempotent: once a backup exists, subsequent calls preserve it. - Skips when nothing would mutate (dedup match) or PATH is empty. The script-top backup in studio/setup.ps1 is kept for defense in depth. * Refresh PATH: venv-aware merge order Reconcile two competing concerns about Refresh-SessionPath / Refresh-Environment surfaced by separate review rounds: - venv at the back -> activated venv loses precedence to system Python - process at the front -> stale shims (old node, old python, etc.) still on $env:Path can beat a freshly installed tool New merge order: 1. Activated venv Scripts dir, only if $env:VIRTUAL_ENV is set 2. Machine PATH freshly read from registry 3. User PATH freshly read from registry 4. Current $env:Path as fallback This way an explicitly-activated venv keeps priority while a tool the script just installed wins over any stale entry that was already on the inherited shell PATH. When no venv is active, fresh registry entries take precedence as expected. * Append to User PATH by default, close $envKey in finally Add-ToUserPath gains a -Position Append\|Prepend parameter defaulting to Append so installing unsloth no longer prepends the bundled venv Scripts directory ahead of the user's existing python / pip on new shells. 
The four current call sites (install.ps1 launcher, studio/setup.ps1 CMake, nvcc, Python user Scripts) all take the Append default because each one that needs in-session precedence already does an inline $env:Path prepend independently. This matches rustup / cargo / nvm / pyenv / uv behavior. Also wrap the script-top $envKey.GetValue in a try/finally so the registry handle is released even if the read throws. Matches the pattern already used for $backupKey five lines below. * Prepend cmake, nvcc, Python Scripts; keep venv Scripts appended The previous commit switched Add-ToUserPath to append by default so that installing unsloth would not silently hijack the user's system python / pip. That was correct for the venv Scripts dir (which contains python.exe and pip.exe alongside unsloth.exe), but wrong for the three studio/setup call sites. Those persist cmake, the driver-compatible nvcc, and the Python user Scripts dir for future shells, and in all three cases an older tool already earlier in the user PATH would keep winning after the install finished. The nvcc case is especially load-bearing: setup selects a driver-compatible CUDA toolkit, then llama.cpp builds against whatever wins PATH resolution, so a stale older nvcc produces broken builds. Pass -Position 'Prepend' explicitly at the three setup.ps1 call sites (cmake at line 754, nvcc bin at line 1025, Python user Scripts at line 1191). None of those directories holds python.exe, so prepending them does not re-introduce the original hijack problem. Leave the install.ps1 venv Scripts call on the default Append with a comment explaining why. * Symmetric dedup, Prepend reorders duplicates, unsloth shim dir Address three separate findings surfaced by review: 1. Dedup asymmetry (Gemini high-priority): the existing dedup expanded registry entries via ExpandEnvironmentVariables but did NOT expand the new directory. Passing "%USERPROFILE%\foo" when "C:\Users\me\foo" was already in PATH produced a duplicate. 
Expand both sides so the check is symmetric. 2. -Position Prepend no-op on existing duplicates: the dedup loop returned $false as soon as it saw a match, regardless of position. That left a late-position duplicate in place instead of moving it to the front, so "prepend the newly selected cmake/nvcc" did not always beat an older copy earlier in PATH. Partition entries into kept and dropped lists, then reinsert a single copy at the requested position. Append still returns $false on any match so user-curated orderings are not reshuffled. Prepend also returns $false when the only copy is already at position 0 so we preserve the user's casing. 3. Stop adding the venv Scripts dir to User PATH entirely. That dir holds python.exe and pip.exe alongside unsloth.exe, so neither Prepend nor Append worked: prepend hijacked the user's system python and pip, append made the freshly-installed unsloth.exe lose to any older unsloth.exe earlier on PATH. Replace the Scripts-dir PATH add with a dedicated shim directory that contains only unsloth.cmd, and prepend that dir. The shim calls the venv's unsloth.exe by absolute path so future pip upgrades inside the venv propagate automatically. * Shim via hardlink, Append user Scripts, drop venv sysconfig fallback Three follow-ups to the `c0ab1ab` shim commit, targeting concerns raised in the second 20-reviewer pass: 1. Shim uses unsloth.exe (hardlink, copy fallback) instead of unsloth.cmd. The batch-file approach had three distinct regressions: - cmd.exe expanded %...% sequences inside user arguments, so prompts like "What does 50% mean?" got mangled before reaching the CLI - Git Bash / MSYS2 / POSIX-style shells on Windows do not resolve bare-name lookups to .cmd files, so `unsloth` stopped working there - Set-Content -Encoding ASCII replaced non-ASCII profile characters with '?', so installs under C:\Users\Jörg\... wrote a broken shim A hardlink (fallback: copy) of unsloth.exe is a native Windows executable with no shell indirection. 
PATHEXT picks .exe before .cmd in cmd.exe and PowerShell, Git Bash honors .exe natively, subprocess callers hit it directly, and a hardlink stays in sync with the venv on pip upgrades because both names point at the same inode. 2. studio/setup.ps1 Python user Scripts dir is added with default Append instead of -Position Prepend. That directory holds every pip-installed user console script (pip, pytest, huggingface-cli, and so on), not just unsloth, so reordering it silently changed resolution order for unrelated tools. The new install.ps1 shim at PATH position 0 already guarantees `unsloth` resolves to the freshly installed copy, so the Python user Scripts entry only needs to be present, not at the front. 3. The sysconfig lookup in studio/setup.ps1 no longer falls back to sysconfig.get_path('scripts') when the nt_user scheme dir does not exist. When setup.ps1 is invoked from an activated venv (a flow the linked issue actually hits) that fallback returns the venv's Scripts directory, which would then be added to the persisted User PATH and re-introduce the python / pip hijack the shim dir is meant to avoid. Stick strictly to the nt_user scheme; skip the block if it does not exist on disk. * Do not crash installer when unsloth.exe shim is locked The shim update sequence at install.ps1:1095 did a bare Remove-Item / New-Item HardLink / Copy-Item. Under the script's $ErrorActionPreference a locked target (most commonly 'unsloth studio' still running while the user re-invokes the installer) turns the Remove-Item failure into a terminating error that aborts the install with no actionable message. The existing shim is perfectly usable in that state, so there is no reason to abort. Wrap the whole remove/link/copy sequence in a try/catch that logs the probable cause (Studio still running), points at the fix (close Studio and re-run), and lets the installer finish with the old launcher still serving the command. 
Also only emit the "added unsloth launcher to PATH" step line when the launcher was actually (re)created AND the PATH entry was newly added -- previously the message fired even when the shim refresh silently failed, which was confusing. * Guard shim PATH entry on existence, use NullString for broadcast delete Two follow-ups surfaced by the latest review pass: 1. Do not add the shim directory to User PATH when the launcher was not actually created. Antivirus blocking unsloth.exe, a disk-full volume, or restrictive filesystem permissions can make both the hardlink and the copy fallback fail on a fresh install. In that case the existing sequence would report "added unsloth launcher to PATH" warnings but still prepend the empty $ShimDir to User PATH -- the user sees an install that claims success but then cannot resolve `unsloth` in a new shell. Gate Add-ToUserPath on Test-Path $ShimExe so the PATH entry is only persisted when the launcher is really there. 2. Pass [NullString]::Value instead of $null to the broadcast-delete call in Add-ToUserPath. On PowerShell 7.5 and later (running on .NET 9), a bare $null going into [Environment]::SetEnvironmentVariable can be coerced to an empty string rather than a true .NET null, which sets the dummy UnslothPathRefresh_XXXXXXXX variable to "" in HKCU\Environment instead of deleting it. The leaked variable is visible in System Properties and accumulates one entry per install run. [NullString]::Value is a PowerShell-specific sentinel that crosses the interop boundary as a real null and works on both PS 5.1 and PS 7.x. See PowerShell/PowerShell#24637 for the underlying issue. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>	2026-04-16 04:49:51 -07:00
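The venv-aware merge order and the raw/expanded dedup described in this commit can be sketched in Python. This is only an illustration under stated assumptions: the real helpers are PowerShell (`Refresh-SessionPath`, `Add-ToUserPath`), while `expand_percent_vars`, `normalise`, and `merge_path` are hypothetical names, and dedup here is simplified to one comparison key built from the expanded form.

```python
import re


def expand_percent_vars(entry, env):
    """Expand Windows-style %NAME% references using the supplied mapping.

    `env` is assumed to use upper-case keys; unknown names are left as-is.
    """
    def repl(match):
        return env.get(match.group(1).upper(), match.group(0))

    return re.sub(r"%([^%]+)%", repl, entry)


def normalise(entry, env):
    """Comparison key: trimmed, unquoted, expanded, case-folded."""
    cleaned = entry.strip().strip('"').strip()
    expanded = expand_percent_vars(cleaned, env)
    return expanded.rstrip("\\/").casefold()


def merge_path(venv_scripts, machine, user, process, env):
    """Merge PATH sources in the order the commit message lists.

    1. activated venv Scripts dir (only if one is given)
    2. machine PATH, 3. user PATH, 4. current process PATH as fallback
    The first occurrence of each directory wins; later duplicates are dropped.
    """
    ordered = []
    if venv_scripts:
        ordered.append(venv_scripts)
    for source in (machine, user, process):
        ordered.extend(source.split(";"))

    seen = set()
    merged = []
    for entry in ordered:
        if not entry.strip():
            continue  # skip empty segments such as a trailing ';'
        key = normalise(entry, env)
        if key in seen:
            continue
        seen.add(key)
        merged.append(entry.strip())
    return ";".join(merged)
```

With this ordering an activated venv stays first, freshly read registry entries beat stale entries inherited by the shell, and `%USERPROFILE%\foo` and `C:\Users\me\foo` collapse to a single entry.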
Imgyu Kim	14ab6fbfae	BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4426 ) Fixes #4150. Pre-PR, `_fix_chat_template` only patched templates where a trailing `{{ ... }}` expression followed the last `{% endfor %}`. ChatML templates (Hermes, Magnum, Phi-4, etc.) that end cleanly at `{% endfor %}` with no generation-prompt block were left unchanged, so the outer `fix_chat_template` raised: ``` RuntimeError: Unsloth: The tokenizer `...` does not have a {% if add_generation_prompt %} for generation purposes. ``` This commonly shows up when a downstream tool (LlamaFactory, Axolotl) re-serializes the tokenizer during LoRA save and strips the generation-prompt block. This PR adds a second branch to `_fix_chat_template` that fires when: - the content after the last `{% endfor %}` is empty modulo Jinja `{# ... #}` comments, - the scrubbed template contains `<\|im_start\|>` and `<\|im_end\|>`, - and the scrubbed template does not already mention `add_generation_prompt`. The assistant-turn separator is inferred from the template itself (preferring an explicit `'<\|im_start\|>assistant<sep>'` literal, then the unique `message['role'] + '<sep>'` from role concatenations, then `<\|im_sep\|>` for Phi-4-mini mixed-separator templates, then `\n`), so Phi-4-style templates are not silently corrupted with the wrong separator. Verified against the existing chat-template corpus: - Hermes-3, Magnum-v2, Phi-4-mini, Phi-4 multi-sep, ChatML with trailing whitespace, ChatML with trailing Jinja comment, dot-access `message.role`, split-literal `'<\|im_start\|>assistant'`: all repaired with the correct assistant prefix. - Already-fixed ChatML templates: idempotent NOP. - Trap templates with `<\|im_start\|>` only inside a Jinja comment: correctly not rewritten. - Llama-3, Gemma-3, Qwen2.5 (non-ChatML): byte-identical. 
- Mistral family (5 models including Mistral-Nemo, Mistral-Small-24B, Mixtral): byte-identical, protected both by the structural guard (no ChatML tokens) and the existing name-based exemption in `load_correct_tokenizer`. - Qwen family (14 models including Qwen2.5, Qwen3, Qwen3-Coder, QwQ, VL, Math, Qwen3-Guard): byte-identical. End-to-end reproduction: Hermes-3 LoRA SFT, save with stripped chat_template, reload. Pre-PR code path raises the RuntimeError above. Post-PR reload loads cleanly, patches the template at load time, and `apply_chat_template(add_generation_prompt=True)` produces the correct `<\|im_start\|>assistant\n` prefix.	2026-04-16 00:21:29 -07:00
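The structural guard for the new ChatML branch can be sketched as follows. This is a simplified illustration, not the code in `_fix_chat_template`: the function names are hypothetical, the assistant separator is passed in rather than inferred from the template, and trailing whitespace is trimmed before the block is appended (an assumption of this sketch).

```python
import re

# Matches {% endfor %} including whitespace-control variants such as {%- endfor -%}.
ENDFOR = re.compile(r"\{%-?\s*endfor\s*-?%\}")
# Matches Jinja comments so tokens inside them are ignored.
COMMENT = re.compile(r"\{#.*?#\}", re.DOTALL)


def needs_chatml_generation_prompt(template):
    """Return True when the three conditions listed in the commit message hold."""
    scrubbed = COMMENT.sub("", template)
    if "add_generation_prompt" in scrubbed:
        return False  # already handled, keep the fix idempotent
    if "<|im_start|>" not in scrubbed or "<|im_end|>" not in scrubbed:
        return False  # not a ChatML template
    matches = list(ENDFOR.finditer(scrubbed))
    if not matches:
        return False
    tail = scrubbed[matches[-1].end():]
    return tail.strip() == ""  # nothing but whitespace/comments after the loop


def add_generation_prompt(template, separator="\\n"):
    """Append a generation-prompt block; `separator` is supplied by the caller."""
    if not needs_chatml_generation_prompt(template):
        return template
    block = (
        "{% if add_generation_prompt %}"
        "{{ '<|im_start|>assistant" + separator + "' }}"
        "{% endif %}"
    )
    return template.rstrip() + block
```

A template that mentions `<|im_start|>` only inside a Jinja comment fails the guard, which mirrors the "trap template" case in the verification list.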
DoubleMathew	a4d4dfe4ac	fix Gemma4 flash attn disable (#5045) * fix pass attn implementation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-15 17:50:48 -05:00
Daniel Han	3869fbe1cc	Bump installer minimum to 2026.4.5 (#5041)	2026-04-15 08:23:41 -07:00
Daniel Han	cdb3e752ec	Update _utils.py	2026-04-15 08:06:43 -07:00
Daniel Han	ba387e2c8f	Update pyproject.toml	2026-04-15 08:06:30 -07:00
Daniel Han	f0d03655e8	Studio: add folder browser modal for Custom Folders (#5035 ) * Studio: add folder browser modal for Custom Folders The Custom Folders row in the model picker currently only accepts a typed path. On a remote-served Studio (Colab, shared workstation) that means the user has to guess or paste the exact server-side absolute path. A native browser folder picker can't solve this: HTML `<input type="file" webkitdirectory>` hides the absolute path for security, and the File System Access API (Chrome/Edge only) returns handles rather than strings, neither of which the server can act on. This PR adds a small in-app directory browser that lists paths on the server and hands the chosen string back to the existing `POST /api/models/scan-folders` flow. ## Backend * New endpoint `GET /api/models/browse-folders`: * `path` query param (expands `~`, accepts relative or absolute; empty defaults to the user's home directory). * `show_hidden` boolean to include dotfiles/dotdirs. * Returns `{current, parent, entries[], suggestions[]}`. `parent` is null at the filesystem root. * Immediate subdirectories only (no recursion); files are never returned. * `entries[].has_models` is a cheap hint: the directory looks like it holds models if it is named `models--` (HF hub cache layout) or one of the first 64 children is a .gguf/.safetensors/config.json/ adapter_config.json or another `models--` subfolder. * Sort order: model-bearing dirs, then plain, then hidden; case- insensitive alphabetical within each bucket. * Suggestions auto-populate from HOME, the HF cache root, and any already-registered scan folders, deduplicated. * Error surface: 404 for missing path, 400 for non-directory, 403 on permission errors. Auth-required like the other models routes. * New Pydantic schemas `BrowseEntry` and `BrowseFoldersResponse` in `studio/backend/models/models.py`. 
## Frontend * New `FolderBrowser` component (`studio/frontend/src/components/assistant-ui/model-selector/folder-browser.tsx`) using the existing `Dialog` primitive. Features: * Clickable breadcrumb with a `..` row for parent navigation. * Quick-pick chips for the server-provided suggestions. * `Show hidden` checkbox. * In-flight fetch cancellation via AbortController so rapid navigation doesn't flash stale results. * Badges model-bearing directories inline. * `chat-api.ts` gains `browseFolders(path?, showHidden?)` and matching types. * `pickers.tsx` adds a folder-magnifier icon next to the existing `Add` button. Opening the browser seeds it with whatever the user has already typed; confirming fills the text input, leaving the existing validation and save flow unchanged. ## What it does NOT change * The existing text-input flow still works; the browser is additive. * No new permissions or escalation; the endpoint reads only directories the server process is already allowed to read. * No model scanning or filesystem mutation happens from the browser itself -- it just returns basenames for render. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: cap folder-browser entries and expose truncated flag Pointing the folder browser at a huge directory (``/usr/lib``, ``/proc``, or a synthetic tree with thousands of subfolders) previously walked the whole listing and stat-probed every child via ``_looks_like_model_dir``. That is both a DoS shape for the server process and a large-payload surprise for the client. Introduce a hard cap of 2000 subdirectory entries and a ``truncated: bool`` field on the response. The frontend renders a small hint below the list when it fires, prompting the user to narrow the path. Below-cap directories are unchanged. 
Verified end-to-end against the live backend with a synthetic tree of 2050 directories: response lands at 2000 entries, ``truncated=true``, listing finishes in sub-second time (versus tens of seconds if we were stat-storming). * Studio: suggest LM Studio / Ollama dirs + 2-level model probe Three improvements to the folder-browser, driven by actually dropping an LM Studio-style install (publisher/model/weights.gguf) into the sandbox and walking the UX: ## 1. Quick-pick chips for other local-LLM tools `well_known_model_dirs()` (new) returns paths commonly used by adjacent tools. Only paths that exist are returned so the UI never shows dead chips. * LM Studio current + legacy roots + user-configured `downloadsFolder` from its `settings.json` (reuses the existing `lmstudio_model_dirs()` helper). * Ollama: `$OLLAMA_MODELS` env override, then `~/.ollama/models`, `/usr/share/ollama/.ollama/models`, and `/var/lib/ollama/.ollama/models` (the systemd-service install path surfaced in the upstream "where is everything?" issue). * Generic user-choice locations: `~/models`, `~/Models`. Dedup is stable across all sources. ## 2. Two-level model-bearing probe LM Studio and Ollama both use `root/publisher/model/weights.gguf`. The previous `has_models` heuristic only probed one level, so the publisher dir (whose immediate children are model dirs, not weight files) was always marked as non-model-bearing. Pulled the direct- signal logic into `_has_direct_model_signal` and added a grandchild probe so the classic layout is now recognised. Still O(PROBE^2) worst-case, still returns immediately for `models--` names (HF cache layout) and for any direct weight file. ## 3. model_files_here hint on response body A leaf model dir (just GGUFs, no subdirs) previously rendered as `(empty directory)` in the modal, confusing users into thinking the folder wasn't scannable. 
Added a `model_files_here` count on the response (capped at 200) and a small hint row in the modal: `N model files in this folder. Click "Use this folder" to scan it.` ## Verification Simulated an LM Studio install by downloading the real 84 MB `unsloth/SmolLM2-135M-Instruct-Q2_K.gguf` into `~/.lmstudio/models/unsloth/SmolLM2-135M-Instruct-GGUF/`. Confirmed end-to-end: Home listing suggests `~/.lmstudio/models` as a chip. * Browsing `~/.lmstudio/models` flags `unsloth` (publisher) as `has_models=true` via the 2-level probe. * Browsing the publisher flags `SmolLM2-135M-Instruct-GGUF` (model dir) as `has_models=true`. * Browsing the model dir returns empty entries but `model_files_here=1`, and the frontend renders a hint telling the user it is a valid target. * Studio: one-click scan-folder add + prominent remove + plain search icon Three small Custom Folders UX fixes after real-use walkthrough: * One-click add from the folder browser. Confirming `Use this folder` now submits the path directly to `POST /api/models/scan-folders` instead of just populating the text input. `handleAddFolder` takes an optional explicit path so the submit lands in the same tick as `setFolderInput`, avoiding a state-flush race. The typed-path + `Add` button flow is unchanged. * Prominent remove X on scan folders. The per-folder delete button was `text-muted-foreground/40` and hidden entirely on desktop until hovered (`md:opacity-0 md:group-hover:opacity-100`). Dropped the hover-only cloak, bumped color to `text-foreground/70`, added a red hover/focus background, and sized the icon up from `size-2.5` to `size-3`. Always visible on every viewport. * Plain search icon for the Browse button. `FolderSearchIcon` replaced with `Search01Icon` so it reads as a simple "find a folder" action alongside the existing `Add01Icon`. 
* Studio: align Custom Folders + and X buttons on the same right edge The Custom Folders header used `px-2.5` with a `p-0.5` icon button, while each folder row used `px-3` with a `p-1` button. That put the X icon 4px further from the right edge than the +. Normalised both rows to `px-2.5` with `p-1` so the two icons share a column. * Studio: empty-state button opens the folder browser directly The first-run empty state for Custom Folders was a text link reading "+ Add a folder to scan for local models" whose click toggled the text input. That's the wrong default: a user hitting the empty state usually doesn't know what absolute path to type, which is exactly what the folder browser is for. * Reword to "Browse for a models folder" with a search-icon affordance so the label matches what the click does. * Click opens the folder browser modal directly. The typed-path + Add button flow is still available via the + icon in the section header, so users who know their path keep that option. * Slightly bump the muted foreground opacity (70 -> hover:foreground) so the button reads as a primary empty-state action rather than a throwaway hint. * Studio: Custom Folders header gets a dedicated search + add button pair The Custom Folders section header had a single toggle button that flipped between + and X. That put the folder-browser entry point behind the separate empty-state link. Cleaner layout: two buttons in the header, search first, then add. * Search icon (left) opens the folder browser modal directly. * Plus icon (right) toggles the text-path input (unchanged). * The first-run empty-state link is removed -- the two header icons cover both flows on every state. Both buttons share the same padding / icon size so they line up with each other and with the per-folder remove X. * Studio: sandbox folder browser + bound caps + UX recoveries PR review fixes for the Custom Folders folder browser. 
Closes the high-severity CodeQL path-traversal alert and addresses the codex / gemini P2 findings. Backend (studio/backend/routes/models.py): * New _build_browse_allowlist + _is_path_inside_allowlist sandbox. browse_folders now refuses any target that doesn't resolve under HOME, HF cache, Studio dirs, registered scan folders, or the well-known third-party model dirs. realpath() is used so symlink traversal cannot escape the sandbox. Also gates the parent crumb so the up-row hides instead of 403'ing. * _BROWSE_ENTRY_CAP now bounds visited iterdir entries, not appended entries. Dirs full of files (or hidden subdirs when show_hidden is False) used to defeat the cap. * _count_model_files gets the same visited-count fix. * PermissionError no longer swallowed silently inside the enumeration / counter loops -- now logged at debug. Frontend (folder-browser.tsx, pickers.tsx, chat-api.ts): * splitBreadcrumb stops mangling literal backslashes inside POSIX filenames; only Windows-style absolute paths trigger separator normalization. The Windows drive crumb value is now C:/ (drive root) instead of C: (drive-relative CWD-on-C). * browseFolders accepts and forwards an AbortSignal so cancelled navigations actually cancel the in-flight backend enumeration. * On initial-path fetch error, FolderBrowser now falls back to HOME instead of leaving the modal as an empty dead end. * When the auto-add path (one-click "Use this folder") fails, the failure now surfaces via toast in addition to the inline paragraph (which is hidden when the typed-input panel is closed). * Studio: rebuild browse target from trusted root for CodeQL clean dataflow CodeQL's py/path-injection rule kept flagging the post-validation filesystem operations because the sandbox check lived inside a helper function (_is_path_inside_allowlist) and CodeQL only does intra-procedural taint tracking by default. The user-derived ``target`` was still flowing into ``target.exists`` / ``target.is_dir`` / ``target.iterdir``. 
The fix: after resolving the user-supplied ``candidate_path``, locate the matching trusted root from the allowlist and rebuild ``target`` by appending each individually-validated segment to that trusted root. Each segment is rejected if it isn't a single safe path component (no separators, no ``..``, no empty/dot). The downstream filesystem ops now operate on a Path constructed entirely from ``allowed_roots`` (trusted) plus those validated segments, so CodeQL's dataflow no longer sees a tainted source. Behavior is unchanged for all valid inputs -- only the construction of ``target`` is restructured. Live + unit tests all pass (58 selected, 7 deselected for Playwright env). * Studio: walk browse paths from trusted roots for CodeQL --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@h100-8-cheapest.us-east5-a.c.unsloth.internal>	2026-04-15 08:04:33 -07:00
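The "rebuild the target from a trusted root" approach can be sketched like this. It is a minimal illustration assuming a plain list of allowed roots; `is_safe_segment` and `resolve_inside_roots` are hypothetical names and do not reproduce the helpers in `studio/backend/routes/models.py`.

```python
import os
from pathlib import Path


def is_safe_segment(segment):
    """A segment must be a single plain path component."""
    return segment not in ("", ".", "..") and "/" not in segment and "\\" not in segment


def resolve_inside_roots(candidate, allowed_roots):
    """Return a path rebuilt from a trusted root, or None if the candidate escapes.

    realpath() is applied to both sides so symlinks and '..' cannot be used
    to step outside the allowlist.
    """
    resolved = Path(os.path.realpath(os.path.expanduser(candidate)))
    for root in allowed_roots:
        trusted = Path(os.path.realpath(root))
        try:
            relative = resolved.relative_to(trusted)
        except ValueError:
            continue  # not under this root, try the next one
        # Rebuild the target from the trusted root plus validated segments,
        # so later filesystem calls never touch the raw user string.
        target = trusted
        for segment in relative.parts:
            if not is_safe_segment(segment):
                return None
            target = target / segment
        return target
    return None
```

A caller would treat `None` as the 403 case and run `exists()`, `is_dir()`, and `iterdir()` only on the returned path.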
Roland Tannous	800ddc95f8	Re-apply #4939 : updated models template mappers (#4950 ) * Reapply "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945) This reverts commit `33503ea248`. * Add missing gemma-4-31B-it bnb-4bit mapper entry and LFM2.5 upstream namespace for PR #4950 - Add unsloth/gemma-4-31B-it-unsloth-bnb-4bit to __INT_TO_FLOAT_MAPPER so the int-to-float resolution works for this model (already listed in TEMPLATE_TO_MODEL_MAPPER but had no mapper entry). - Add LiquidAI/LFM2.5-1.2B-Instruct to lfm-2.5 TEMPLATE_TO_MODEL_MAPPER entry so the canonical upstream namespace is mapped consistently with lfm-2. * Add missing gemma-4-31B-it bnb-4bit Ollama mapping and lfm-2.5 chat template alias - Add unsloth/gemma-4-31B-it-unsloth-bnb-4bit to OLLAMA_TEMPLATE_TO_MODEL_MAPPER so Ollama export works for this model (E2B-it and E4B-it bnb-4bit variants were already present, 31B-it was inconsistently omitted) - Register CHAT_TEMPLATES["lfm-2.5"] as alias of the lfm-2 template to prevent KeyError when Studio resolves LFM2.5 models through MODEL_TO_TEMPLATE_MAPPER * Add missing LFM2 bnb-4bit INT_TO_FLOAT_MAPPER entry unsloth/LFM2-1.2B-unsloth-bnb-4bit is referenced in model_mappings.py but had no mapper.py entry, so model resolution would fail when users load that variant with load_in_4bit=False or when the float name is used with load_in_4bit=True. * Fix review findings for PR #16 1. ollama_template_mappers.py: Restore dropped Gemma-4 base model IDs (E2B, E4B, 31B, 26B-A4B) and add missing google/ upstream IDs to the gemma4 Ollama mapper for consistency with other gemma entries. 2. mapper.py: Remove self-mapping non-bnb-4bit entries from __INT_TO_FLOAT_MAPPER that were polluting FLOAT_TO_INT_MAPPER with lowercase 16-bit names, causing load_in_4bit=True to return bad model names. Add direct MAP_TO_UNSLOTH_16bit entries to preserve the google->unsloth 16-bit redirects. 3. 
mapper.py: Add LFM2.5 MAP_TO_UNSLOTH_16bit redirect so LiquidAI/LFM2.5-1.2B-Instruct resolves to its unsloth mirror. * Add review tests for PR #4950 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove top-level test files These test_*.py files were added at the repo root rather than under tests/. Removing them from this PR; the production mapper changes remain. Add gemma-4-26B-A4B-it mapping Adds unsloth/gemma-4-26B-A4B-it to __INT_TO_FLOAT_MAPPER as a 2-tuple so google/gemma-4-26B-A4B-it routes to unsloth/gemma-4-26B-A4B-it across INT_TO_FLOAT_MAPPER, FLOAT_TO_INT_MAPPER, and MAP_TO_UNSLOTH_16bit. The 26B-A4B (MoE) model has no bnb-4bit variant, so the key uses the plain unsloth name rather than the -unsloth-bnb-4bit suffix. Removes the now-redundant standalone _add_with_lower call for the -it variant; the 16bit mapping is registered via the dict loop. * Add unsloth-bnb-4bit mappings for gemma-4 base (non-it) models Adds E2B, E4B, 31B base unsloth-bnb-4bit entries to __INT_TO_FLOAT_MAPPER. The 26B-A4B (MoE) base has no bnb-4bit variant on HF, so it stays on the standalone _add_with_lower line for the 16bit-only routing. Removes the redundant _add_with_lower lines for E2B, E4B, 31B base since the dict loop now registers the same google->unsloth route through the 2-tuple entries, plus full FLOAT_TO_INT and INT_TO_FLOAT coverage. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-15 07:52:12 -07:00
Avaya Aggarwal	7c5464ad71	feat: Add cactus QAT scheme support (#4679 ) * feat: Add cactus QAT scheme support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test(qat): add tests for cactus QAT scheme and fix missing import * Fix cactus QAT scheme: correct MappingType import, tighten PerGroup filter - Drop the broken `from torchao.dtypes import MappingType` import. `MappingType` lives in `torchao.quantization` (and `torchao.quantization.quant_primitives`); it is not exported from `torchao.dtypes` in any supported torchao release (verified on 0.14, 0.16, 0.17). The previous code raised `ImportError` on every cactus call and was masked as a misleading 'torchao not found' error. - Since `IntxWeightOnlyConfig` already defaults `mapping_type` to `MappingType.SYMMETRIC`, drop the explicit kwarg entirely and remove the import. Behavior is unchanged. - Introduce a named `group_size = 32` constant (matches the int4 / fp8-int4 pattern in the surrounding branches) and add a `% group_size == 0` divisibility guard to the filter. `PerGroup(32)` requires `in_features % 32 == 0` at `quantize_()` time, otherwise torchao raises `ValueError: in_features (N) % group_size (32) must be == 0`. The old `in_features >= 32` filter would admit non-aligned widths (e.g. 33, 48, 65, 127) and crash `_prepare_model_for_qat` for those shapes. * Warn when cactus QAT skips non-divisible Linear layers Multiple reviewers flagged that the divisibility guard added in the previous commit can silently leave Linear layers in full precision when their in_features is not a multiple of 32. For currently supported Unsloth models (Qwen, Llama, Gemma, Mistral, Phi) every Linear width is already a multiple of 32/64/128 so this never triggers, but surfacing the coverage gap is cheap and avoids users assuming 100% QAT coverage when they bring a custom model with unusual shapes. 
Emit a UserWarning listing up to the first 8 skipped layers whenever the cactus filter excludes any Linear due to the modulo guard. This keeps the lenient silent-skip behavior (consistent with int4 / fp8-int4), but stops making it silent. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-15 07:40:03 -07:00
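The divisibility guard and the skip warning can be sketched without torch or torchao. `FakeLinear` is a stand-in for `torch.nn.Linear` and `select_quantizable` is a hypothetical helper; the real filter is passed to torchao and is not reproduced here.

```python
import warnings
from dataclasses import dataclass

GROUP_SIZE = 32  # group size named in the commit message


@dataclass
class FakeLinear:
    """Stand-in for torch.nn.Linear so the sketch runs without torch."""
    in_features: int
    out_features: int


def select_quantizable(named_layers, group_size=GROUP_SIZE):
    """Split layers into those a PerGroup(group_size) scheme can take and the rest."""
    selected, skipped = [], []
    for name, layer in named_layers:
        aligned = layer.in_features >= group_size and layer.in_features % group_size == 0
        (selected if aligned else skipped).append(name)

    if skipped:
        # List at most the first 8 names, as the commit message describes.
        preview = ", ".join(skipped[:8])
        warnings.warn(
            f"{len(skipped)} Linear layer(s) left in full precision because "
            f"in_features is not a multiple of {group_size}: {preview}",
            UserWarning,
        )
    return selected, skipped
```

Widths such as 33 or 48 pass the old `in_features >= 32` check but fail the modulo check, which is the case the guard exists for.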
Avaya Aggarwal	f18e9dddf0	feat: Add support for OLMo-3 model (#4678 ) * feat: Add support for OLMo-3 model in mapping and tests * Update unsloth/models/mapper.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update tests/test_get_model_name.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Fix casing, add Think variants, and align version gate for OLMo-3 PR 4678 Mapper: switch slugs from OLMo-3 to canonical Olmo-3 mixed case, drop the non-existent unsloth/Olmo-3-7B-Instruct-bnb-4bit dead alias, and add the already-published Olmo-3-7B-Think and Olmo-3-32B-Think Unsloth mirrors. Loader: change the olmo3 transformers version gate from Version("4.57.0") to Version("4.57.0.dev0") so nightly/source builds that already contain olmo3 are not blocked, matching the OLMo-2, Gemma 3 and Cohere patterns. * Use canonical Olmo-3 casing and cover Think variants in OLMo-3 tests Mirrors the mapper.py fixes on pr-4678-code: HuggingFace canonical slugs for the OLMo-3 family use mixed-case Olmo-3 (not OLMo-3 like OLMo-2), and Unsloth already hosts Olmo-3-7B-Think and Olmo-3-32B-Think mirrors, so the resolution matrix now covers all three published Olmo-3 families. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-15 07:39:11 -07:00
Daniel Han	c3cd890357	Studio: refresh Downloaded GGUF list and recurse into variant subdirs (#5032) * Studio: refresh Downloaded GGUF list and recurse into variant subdirs Two fixes for the model picker's "Downloaded" section. Frontend (`pickers.tsx`): * `HubModelPicker`'s mount effect short-circuited the cached-gguf and cached-models refetch whenever the module-level cache already had entries (`if (alreadyCached) return;`). After downloading a new repo in the same session, reopening the picker rendered the stale cache and the new repo never appeared in "Downloaded" until a full page reload. The early return is removed so the lists are always refreshed on mount; the module cache still drives the initial render so there is no spinner flash when we already had data. Backend (`utils/models/model_config.py`): * `list_local_gguf_variants` and `_find_local_gguf_by_variant` used a non-recursive `Path.glob("*.gguf")`. Some HF GGUF repos (e.g. `unsloth/gemma-4-26B-A4B-it-GGUF`) place the largest quants under a variant-named subdirectory such as `BF16/...gguf`, which the top-level glob missed. Both helpers now use `rglob` and the variant filename is stored as a path relative to the scan root so the locator can still find the file. The flat-layout case (variants directly in the snapshot root) is unchanged: verified against `unsloth/gemma-4-E2B-it-GGUF` which still returns its UD-Q4_K_XL variant correctly. Studio: emit posix-style relative filenames for local GGUF subdirs `list_local_gguf_variants` was doing `str(f.relative_to(p))`, which on Windows produces backslash-separated paths like `BF16\foo.gguf`. The remote `list_gguf_variants` (HF API path) always returns forward-slash filenames such as `BF16/foo.gguf`, so the two would diverge on Windows. Switch to `.as_posix()` so the local and remote variant filenames stay identical across Linux, macOS, and Windows. Verified by simulating with `PureWindowsPath` in the test suite. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: detect mmproj at snapshot root for nested-variant layouts When _find_local_gguf_by_variant returns a weight file inside a quant-named subdir (e.g. snapshot/BF16/foo.gguf), detect_mmproj_file was scanning only the immediate parent and missing the mmproj file sitting at the snapshot root. The model was then loaded without --mmproj, silently breaking vision support for repos that ship nested variants. detect_mmproj_file now takes an optional search_root and walks up from the weight file to that root, in order, so the mmproj at the snapshot root is picked up. Sibling quant subdirs are not scanned, so an unrelated variant's mmproj does not leak in. Also apply the suggested micro-optimization on relative_to in list_local_gguf_variants -- only build the posix path when storing the first file for a quant. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-15 07:34:42 -07:00
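The recursive scan and posix-style naming described in c3cd890357 can be sketched roughly as follows. This is a minimal illustration, not the Studio code: `list_gguf_relative` is a hypothetical helper, and only `rglob`, `relative_to`, `as_posix`, and `PureWindowsPath` come from the commit message.

```python
from pathlib import Path, PureWindowsPath


def list_gguf_relative(root: Path) -> list[str]:
    """Hypothetical helper: find .gguf files at any depth under root.

    Names are returned relative to root with forward slashes, so they
    look the same on Linux, macOS, and Windows.
    """
    return sorted(f.relative_to(root).as_posix() for f in root.rglob("*.gguf"))


# str() on a Windows-flavoured path uses backslashes; as_posix() does not.
windows_style = PureWindowsPath("BF16") / "foo.gguf"
```

A flat layout (files directly under `root`) yields plain filenames, so callers that only ever saw top-level variants should observe no difference.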
Daniel Han	156f3fc4b0	Gate trl disable_gradient_checkpointing patch warning on UNSLOTH_ENABLE_LOGGING (#5038 ) The "Patched trl.models.utils.disable_gradient_checkpointing with a no-op" warning fires once on every Unsloth import, including from notebooks where the user did not opt into verbose logging. It is a routine integration patch, not an anomaly the user needs to know about. Gate it on UNSLOTH_ENABLE_LOGGING=1 like other diagnostic notices.	2026-04-15 07:33:48 -07:00
jonahsamost	777e1bd0ac	fix (#4887 )	2026-04-15 07:21:03 -07:00
Daniel Han	1a4ca5eca8	Fix grad-accum accepts_loss_kwargs detection for vision wrappers (#5036 ) * Fix grad-accum model_accepts_loss_kwargs detection for vision wrappers Replace the source-string rewrite of Trainer.__init__ with an instance-level accepts_loss_kwargs shadow applied on the loaded model. Covers: 1. Unsloth-compiled forward -> True, so HF Trainer does not double-scale on top of unsloth_fixed_cross_entropy's num_items_in_batch division. 2. Stock forward on a conditional-generation wrapper (Gemma3n, Gemma3 pre-4.57, Qwen-VL family, etc.) where the outer class has no accepts_loss_kwargs but the inner .model declares False -> False. This is the case that reproduces issue #4982 under trust_remote_code or UNSLOTH_COMPILE_DISABLE, where the previous fix's outer-attr check walked past the inner model and fell through to signature inspection. 3. Text LMs without any explicit accepts_loss_kwargs -> leave HF default. The previous .replace()-based patch silently no-ops on transformers 4.48 through 4.52 (variable named model, not unwrapped_model) and is fragile against any upstream reformat. The new helper walks the PEFT / HF wrapper chain, finds the first class that declares accepts_loss_kwargs on its own class dict (type(m).__dict__, not hasattr, to avoid PEFT __getattr__ forwarding), and setattr-shadows that value at every wrapper level so HF Trainer's hasattr(unwrapped_model, ...) check picks it up at whichever level accelerate.unwrap_model returns. Also adds an unconditional post-init clamp of accelerator.gradient_accumulation_steps = 1 to work around the transformers 5.0 through 5.5 GradientAccumulationPlugin regression that makes accelerator.backward divide loss by GA on top of training_step's own /GA division. Fixed upstream in 5.6.0.dev0; no-op on 4.x and 5.6+. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Trim comments * Address review: cover PEFT-after-load and custom compile location Two review findings from 3/20 reviewers: 1. [3 of 20 reviewers] apply_accepts_loss_kwargs_fix was called from the loaders before get_peft_model wraps the base model, so on transformers 4.48-4.52 (which does hasattr on the outer model) the instance shadow on the base model was lost after PEFT wrapping. Fix: also call it from the wrapped Trainer.__init__ so it runs on whatever model the user actually hands to Trainer, which is always the final wrapped form. 2. [1 of 20 reviewers] _forward_is_unsloth_compiled hard-coded the substrings "unsloth_compiled" / "unsloth_cache" in the co_filename check, which misclassifies compiled forwards when UNSLOTH_COMPILE_LOCATION is set to a custom directory. Fix: new _unsloth_compile_cache_leaves helper that reads the env var and matches the basename against path components, honoring both the default and any user override. Verified locally: - PEFT-after-load simulation: HF's hasattr(peft, "accepts_loss_kwargs") now returns True after our init wrapper runs, and value resolves to False on Gemma3n-style inner wrappers. - Custom UNSLOTH_COMPILE_LOCATION simulation: compiled detection returns True for /tmp/my_custom_cache/compiled.py when the env var is set. - End-to-end Gemma-3 270m + LoRA SFT unchanged: loss 4.9626, grad-norm matches prior run, all 4 wrapper levels now carry the shadowed attr. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-15 06:59:36 -07:00
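The wrapper-chain walk from 1a4ca5eca8 can be illustrated with plain Python objects. This is a sketch under stated assumptions: the function names are hypothetical, wrappers are assumed to hold the wrapped object in an instance attribute named `base_model` or `model`, and real `torch.nn.Module` wrappers store submodules differently. The one detail taken from the commit is the lookup through `type(m).__dict__` rather than `hasattr`, so attribute forwarding in a wrapper cannot mask where the flag is actually declared.

```python
def wrapper_chain(model):
    """Return [outermost, ..., innermost], following base_model / model links."""
    chain, seen, current = [], set(), model
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        chain.append(current)
        attrs = vars(current)  # instance attributes only, no __getattr__ forwarding
        current = next((attrs[n] for n in ("base_model", "model") if n in attrs), None)
    return chain


def declared_accepts_loss_kwargs(model):
    """First value declared on a class's own __dict__ in the chain, else None."""
    for m in wrapper_chain(model):
        own = type(m).__dict__
        if "accepts_loss_kwargs" in own:
            return own["accepts_loss_kwargs"]
    return None


def shadow_accepts_loss_kwargs(model):
    """Copy the declared value onto every wrapper level as an instance attribute."""
    value = declared_accepts_loss_kwargs(model)
    if value is not None:
        for m in wrapper_chain(model):
            setattr(m, "accepts_loss_kwargs", value)
    return value


# Toy stand-ins for an inner model, a conditional-generation wrapper, and PEFT.
class Inner:
    accepts_loss_kwargs = False


class Outer:
    def __init__(self, model):
        self.model = model


class ForwardingWrapper:
    def __init__(self, base_model):
        self.base_model = base_model

    def __getattr__(self, name):
        # Forward unknown attributes to the wrapped object.
        return getattr(self.__dict__["base_model"], name)
```

When no class in the chain declares the flag, nothing is set, which corresponds to the "leave HF default" case in the commit message.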
Daniel Han	1ccfd2e0a5	fix(rocm): tighten gfx regex to ignore generic ISA lines (#5033 ) * fix(rocm): tighten gfx regex to ignore generic ISA lines ROCm 6.1+ rocminfo emits generic ISA names such as "amdgcn-amd-amdhsa--gfx11-generic" and "amdgcn-amd-amdhsa--gfx9-4-generic" alongside the real GPU name. The previous `gfx[1-9]` regex used in `_has_rocm_gpu` matched both, so a host with only a generic ISA entry would be reported as having a usable AMD GPU. Tighten the pattern to `gfx[1-9][0-9a-z]{2,3}` so only real gfx ids match. This covers every documented target from GFX6 (gfx600) through GFX12 (gfx1201), including letter-suffixed ids like gfx90a (MI250 / MI250X) and gfx90c. Documented generic ISA names always have 1 or 2 digits before the dash and no longer match. Applied to both `studio/install_python_stack.py` and `studio/install_llama_prebuilt.py` so the two detection paths agree. Co-authored-by: Martin Hoyer <mhoyer@redhat.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Martin Hoyer <mhoyer@redhat.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-15 05:24:41 -07:00
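A quick way to see what the tightened pattern from 1ccfd2e0a5 accepts and rejects is to run it against sample lines. The regex and the ISA strings below are quoted from the commit message; `has_real_gfx_id` is a hypothetical stand-in for the real `_has_rocm_gpu` check, which does more than this.

```python
import re

# Pattern quoted from the commit: a leading digit 1-9, then 2-3 alphanumerics.
GFX_ID = re.compile(r"gfx[1-9][0-9a-z]{2,3}")


def has_real_gfx_id(rocminfo_output: str) -> bool:
    """Hypothetical check: True if any line names a concrete gfx target."""
    return any(GFX_ID.search(line) for line in rocminfo_output.splitlines())


generic_only = "amdgcn-amd-amdhsa--gfx11-generic\namdgcn-amd-amdhsa--gfx9-4-generic"
with_real_gpu = generic_only + "\n  Name:  gfx90a"
```

The generic names fail because a dash follows the first one or two digits, so the `{2,3}` run can never be satisfied.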
Daniel Han	b7a8ff2833	Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027 ) (#5034 ) * Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027) FastLanguageModel.from_pretrained(..., num_labels=N) crashed with "NotImplementedError: normal_kernel_cuda not implemented for 'Byte'" on pre-quantized bnb 4-bit checkpoints (e.g. unsloth/Qwen3-4B-bnb-4bit) when running on transformers 5.x. Two pieces were needed to close this out: 1. unsloth_zoo PR: add "score", "classifier", "qa_outputs" to SKIP_QUANTIZATION_MODULES so replace_with_bnb_linear leaves task heads in the compute dtype. 2. This commit: for pre-quantized checkpoints, transformers reads llm_int8_skip_modules from the quantization_config baked into config.json and ignores the runtime BitsAndBytesConfig we pass via kwargs. Unsloth must merge its skip list into model_config.quantization_config.llm_int8_skip_modules before the from_pretrained call, or the checkpoint's frozen list (e.g. ["lm_head", "multi_modal_projector", "merger", "modality_projection"]) wins and the `score` head gets converted to Linear4bit with uint8 storage, then _init_weights calls normal_ on uint8 and crashes. Also add a defensive post-load cast on the task head to guard against any residual path that ends up with a non-floating head dtype. 
Verified on transformers 4.57.6 and 5.5.0 with: - unsloth/Qwen3-4B-bnb-4bit + num_labels=3 - unsloth/Qwen3-4B (non-bnb repo, load_in_4bit=True) - unsloth/Llama-3.2-1B-Instruct + num_labels=3 - unsloth/ModernBERT-large classifier head (bert_classification notebook) - Regression: causal LM path unchanged, backbone still 4-bit - 3-step SFT on num_labels=3 confirms gradient flow and weight updates on score.weight Fixes unslothai/unsloth#5027 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-15 05:16:33 -07:00
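The skip-list merge described in b7a8ff2833 amounts to an order-preserving union. The sketch below is only that union: `merge_skip_modules` is a hypothetical name, the input lists are the examples quoted in the commit message, and the step that writes the result back into `quantization_config.llm_int8_skip_modules` is left out.

```python
def merge_skip_modules(checkpoint_skip, extra_skip):
    """Hypothetical helper: keep the checkpoint's entries, append missing ones."""
    merged = list(checkpoint_skip or [])  # tolerate a missing (None) list
    for name in extra_skip:
        if name not in merged:
            merged.append(name)
    return merged


# Lists quoted from the commit message.
frozen = ["lm_head", "multi_modal_projector", "merger", "modality_projection"]
task_heads = ["score", "classifier", "qa_outputs"]
merged = merge_skip_modules(frozen, task_heads)
```

Keeping the checkpoint's own entries first means nothing that was already excluded from quantization becomes quantized after the merge.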
David Solanas Sanz	1fcb2502cf	fix: prevent offline freeze by fixing stats retry and forwarding local_files_only (#5016 ) Fixes #2393. - `_utils.py`: `has_internet()` now respects `HF_HUB_OFFLINE` with truthy variant parsing in addition to `TRANSFORMERS_OFFLINE`. - `_utils.py`: replace uncontrolled `except Exception: stats_check()` retry (which had no time limit and could freeze on Kaggle offline mode) with a logged skip. - `loader.py`: forward `local_files_only` from kwargs into all `AutoConfig.from_pretrained` and `PeftConfig.from_pretrained` probes in `FastLanguageModel.from_pretrained` and `FastModel.from_pretrained`, including the PEFT base-model reload paths.	2026-04-15 04:51:31 -07:00
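The offline check from 1fcb2502cf might look something like the sketch below. The two variable names come from the commit message; the set of accepted truthy spellings and the function name `offline_requested` are assumptions, since the commit only says "truthy variant parsing".

```python
import os

# Assumption: which spellings count as "truthy" is not stated in the commit.
_TRUTHY = {"1", "true", "yes", "on"}


def offline_requested(env=None) -> bool:
    """Hypothetical helper: True if either offline variable is set truthy."""
    env = os.environ if env is None else env
    return any(
        str(env.get(key, "")).strip().lower() in _TRUTHY
        for key in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")
    )
```

Passing a plain dict as `env` keeps the check easy to exercise without touching the real process environment.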
Lee Jackson	f9ef639dde	Studio: support GGUF variant selection for non-suffixed repos (#5023 ) * fix: support GGUF variant selection for non-suffixed repos * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: harden GGUF detection across cached models and picker flows * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * chore: use shared GGUF picker helper for search rows * fix: avoid mixed cache duplication and preserve GGUF fallback detection * fix: unify GGUF cache matching and merge picker hints * fix: normalize local GGUF matching across picker and model config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: robust cached-gguf classification + hint-aware click routing - _repo_gguf_size_bytes: treat size_on_disk=None as 0 and dedupe fallback by commit_hash so partial/interrupted downloads don't TypeError out of sum() and wipe the entire cached list. - list_cached_gguf / list_cached_models: narrow per-repo try/except so one malformed repo no longer poisons the whole response. - handleModelClick: route through isKnownGgufRepo instead of the suffix-only isGgufRepo, so non-suffixed GGUF repos still open the variant expander from every call site. - Replace the modelIsGgufById/resultIsGgufById Maps with Sets of known GGUF ids to stop conflating "no hint" with "known not-GGUF". - Make HfModelResult.isGguf required (it is always set in makeMapModel). - Add regression tests for the None size case, mixed-repo inclusion in cached-gguf, and per-repo error isolation. * fix: exclude mmproj from GGUF classification and case-normalize hint lookups - _repo_gguf_size_bytes now filters mmproj vision-adapter files so safetensors+mmproj.gguf repos stay on the cached-models path and non-GGUF rows no longer show zero pickable variants. 
A vision-capable GGUF repo (main weight + mmproj adapter) still classifies as GGUF and reports the main weight size. - modelGgufIds / resultGgufIds now key on lowercased ids and isKnownGgufRepo lowercases its lookup, so store and HF-search ids that differ only by casing still match the same GGUF hint. - New regression tests: mmproj-only repo excluded from cached-gguf, same repo included in cached-models, vision-capable repo still classified as GGUF with correct size. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-15 15:32:01 +04:00
Roland Tannous	13928b5f0e	Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024 ) * Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var When set, UNSLOTH_PYTORCH_MIRROR overrides the default https://download.pytorch.org/whl base URL in all four install scripts (install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py). When unset or empty, the official URL is used. This lets users behind corporate proxies or in regions with poor connectivity to pytorch.org point at a local mirror without patching scripts. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back to the official URL when unset or empty, and preserves the value as-is (including trailing slashes). * Remove stale test assertions for missing install.sh messages * Fix GPU mocking in test_get_torch_index_url.sh Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside get_torch_index_url so the GPU-presence checks work in tests. Add -L flag handling to mock nvidia-smi so it passes the GPU listing check. All 26 tests now pass on CPU-only machines. * Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-15 11:39:11 +04:00
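The final behaviour described in 13928b5f0e (override when set, official URL when unset or empty, trailing slash stripped) can be sketched as below. The variable name and default URL are from the commit message; `pytorch_whl_base` is a hypothetical function, and trimming surrounding whitespace is an added assumption.

```python
import os

OFFICIAL_WHL_BASE = "https://download.pytorch.org/whl"


def pytorch_whl_base(env=None) -> str:
    """Hypothetical helper: resolve the wheel index base URL."""
    env = os.environ if env is None else env
    # Assumption: surrounding whitespace is ignored.
    mirror = (env.get("UNSLOTH_PYTORCH_MIRROR") or "").strip()
    if not mirror:
        return OFFICIAL_WHL_BASE
    # Strip trailing slashes so "<base>/cu128"-style joins do not produce "//".
    return mirror.rstrip("/")
```

The hostnames used to try this out are placeholders; any reachable mirror that follows the same directory layout would be passed the same way.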
Datta Nimmaturi	826c98f3c0	[moe][gemma4] Target MoE for gemma4 (#4913 ) * Target MoE for gemma4 * refactor attention impl determine * Revert "refactor attention impl determine" This reverts commit 888fca08110a9a74278dc1ebc14d0da043bbd11d. * Remove attention policy changes from gemma4 MoE fix	2026-04-14 16:53:07 -05:00
Daniel Han	5aa8c15246	Studio: hard-stop at n_ctx with a 'Context limit reached' toast (#5021 ) * Studio: hard-stop at n_ctx with a dedicated 'Context limit reached' toast llama-server's default behavior when the KV cache fills is to silently drop the oldest non-``n_keep`` tokens and keep generating. The UI has no way to tell the user that earlier turns were evicted -- they just see degraded continuity and a confusing ``5,361 / 4,096`` on the context usage bar. Launch llama-server with ``--no-context-shift`` so it returns a clean error once the request would exceed ``n_ctx``. In the chat adapter, catch the error, identify it as a context-limit error via ``isContextLimitError()``, and surface a dedicated toast that names the exact control to adjust: the ``Context Length`` field in the chat Settings panel. Also add a lightweight tooltip hint on ``ContextUsageBar`` when usage crosses 85%, so users see the "raise Context Length in Settings" suggestion before they hit the hard stop. Tests: * ``test_llama_cpp_no_context_shift.py`` pins the ``--no-context-shift`` flag in the static launch-command template, and pins it inside the unconditional ``cmd = [ ... ]`` block so a future refactor can't hide it behind a branch. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Shorten --no-context-shift comment to 1 line * Match backend _friendly_error rewrite in isContextLimitError Codex review on PR caught that ``backend/routes/inference.py::_friendly_error`` rewrites the raw llama-server text "request (X tokens) exceeds the available context size (Y tokens)" into "Message too long: X tokens exceeds the Y-token context window. ..." on the main streaming GGUF path. The heuristic only looked for "context size" / "exceeds the available context" / "context shift", none of which survive the rewrite, so the new "Context limit reached" toast would never fire for the most common case. 
Add matches for "message too long" and "context window" so both wordings hit. Also addresses Gemini feedback on the launch-flag test: * Use ``inspect.getsource(LlamaCppBackend.load_model)`` instead of reading ``__file__`` directly; scopes the assertions to the function that actually launches llama-server. * Replace the hardcoded ``" ]"`` indent search with a line-at-a-time scan for a line that is just ``]``, so the test survives reformatting. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 10:58:20 -07:00
Daniel Han	5861a7ce15	Studio: split model-load progress label across two rows (#5020 ) * Studio: split model-load progress label across two rows The chat flow and training overlay both compose a progress label like "112.6 of 122.3 GB • 331.0 MB/s • 30s left" and render it next to the percent badge in a single flex row. Once the rate + ETA part shows up, the label outgrows the row width and wraps mid-phrase, orphaning the percent ("19 left %") onto a second ragged line. Fix in model-load-status.tsx: split the label on the first " • " into a primary (size) chunk that stays on row 1 with the percent, and a secondary (rate/ETA) chunk that renders on its own muted row below. Labels without a bullet (e.g. "22.8 GB downloaded") collapse cleanly to one row. The inline-status variant keeps only the primary and surfaces the full label via the tooltip. Also extracts the rate/ETA math out of useTransferStats into a pure ``transfer-stats.ts`` module (appendSample + computeTransferStats) so it can be reasoned about and tested without React. The hook is now a thin wrapper that feeds sample history through the pure functions. Backend: adds two companion test files for load_progress(): * test_llama_cpp_load_progress_matrix.py (21 tests) -- platform matrix (Linux /proc, macOS/Windows absence), VmRSS parsing variants (tab/space/missing/malformed), filesystem edges (HF-cache symlinks, broken symlinks, nonexistent paths, relative paths), shard aggregation (partial multi-shard, two series in same dir, mmproj-* exclusion, single-file), lifecycle races, concurrent sampling (10 threads x 50 iters against real /proc), fraction bounds. * test_llama_cpp_load_progress_live.py (5 tests) -- no-mock live integration: real subprocess allocating 100 MB to match VmRSS, real ready phase, real dead-pid degradation, real shard aggregation, repeated polling. Skipped on non-Linux. Both complement the existing test_llama_cpp_load_progress.py. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Hoist splitProgressLabel out of JSX IIFE (review feedback) --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 10:58:16 -07:00
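The label split from 5861a7ce15 lives in `model-load-status.tsx`; a Python rendering of the same rule (split on the first " • " only) is shown here as a sketch. `split_progress_label` is a hypothetical Python counterpart of the `splitProgressLabel` helper named in the commit, and the sample labels are the ones quoted there.

```python
def split_progress_label(label: str):
    """Split on the first " • " into (primary, secondary-or-None)."""
    primary, separator, secondary = label.partition(" • ")
    return primary, (secondary if separator else None)


with_rate = split_progress_label("112.6 of 122.3 GB • 331.0 MB/s • 30s left")
size_only = split_progress_label("22.8 GB downloaded")
```

Because only the first bullet is treated as a separator, the rate and ETA stay together on the second row.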
Eda Z	5b8dbdc3c2	Fix bitsandbytes ROCm install by using pip instead of uv (#4966 ) * Fix bitsandbytes ROCm install by using pip instead of uv * Also use pip for PyPI fallback path in _install_bnb_rocm The original fix correctly switched the pre-release wheel install from uv to pip, but left the PyPI fallback path on uv. If uv breaks bnb on ROCm, the fallback would hit the same issue. Move pip bootstrap before the branch so both paths use pip consistently. * Harden pip bootstrap: try ensurepip first, warn on failure - Try ensurepip --upgrade before falling back to uv pip install pip. ensurepip works offline and does not need PyPI, making the bootstrap robust when the network or index is unavailable. - If both ensurepip and uv fail, emit a visible warning instead of silently swallowing the error (which previously led to a cryptic "No module named pip" downstream). - Use run_maybe_quiet so --verbose users see bootstrap output. - Update comment to document the actual root cause: uv rejects the wheel because filename version and metadata version disagree. * Add --isolated to pip install calls in _install_bnb_rocm uv pip install ignores pip.conf and PIP_* env vars, but python -m pip reads them. Without --isolated, users with PIP_INDEX_URL pointing to a private mirror that does not carry bitsandbytes would see the PyPI fallback fail where it previously worked under uv. --isolated restores parity with the old uv behavior. * Drop --isolated from PyPI fallback in _install_bnb_rocm --isolated suppresses PIP_INDEX_URL, PIP_EXTRA_INDEX_URL, and pip.conf. This is correct for the pre-release path (hardcoded GitHub URL, no index consulted), but breaks the PyPI fallback for users in corporate or air-gapped environments whose only route to bitsandbytes is a private mirror configured via those mechanisms. Keep --isolated on the direct-URL pre-release install; drop it from the index-dependent fallback. 
* Drop --isolated from pre-release pip install, fix warning wording --isolated suppresses pip.conf cert/proxy/CA settings in addition to index config. For the direct GitHub URL, index config is irrelevant but cert/proxy settings matter in corporate SSL-inspection environments. Without this fix, users with pip.conf-based CA bundles get a TLS error on the pre-release download and silently fall back to the broken PyPI version -- the exact outcome the PR is trying to prevent. Also fix the fallback warning: "unreachable" is too specific since the pre-release install can fail for reasons other than network reachability. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-14 10:23:40 -07:00
pre-commit-ci[bot]	a0b9d14081	[pre-commit.ci] pre-commit autoupdate (#5004 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.15.9 → v0.15.10](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.9...v0.15.10) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 09:49:18 -07:00
Daniel Han	bb14ab144a	Studio: live model-load progress + rate/ETA on download and load (#5017 ) * Studio: live model-load progress + rate/ETA on download and load Two UX fixes for the opaque multi-minute wait between clicking Load and being able to chat, visible most clearly on large MoE GGUFs like MiniMax-M2.7 (131 GB of weights on a 97 GB GPU): 1. Model-load phase is now observable. The existing chat flow transitions the toast to "Starting model..." as soon as the download hits 100%, then shows a spinner with no other feedback until llama-server reports healthy. For a 130 GB model that spinner freezes for five-plus minutes while the kernel pages shards into the page cache. A new `GET /api/inference/load-progress` endpoint samples `/proc/<pid>/status VmRSS` on the llama-server subprocess against the sum of shard file sizes on disk, so the UI can render a real bar plus rate / ETA during that window. 2. Rate and ETA on downloads and loads. Both the chat toast and the training-start overlay used to show a static pair of numbers (for example "15.4 of 140.8 GB"). A rolling 15-second window over the existing byte-series now surfaces "85.3 MB/s, 24m 23s left" beside that pair. The estimator is shared between the download and load phases so the numbers don't reset when the phase flips. Also fixes a pre-existing assignment bug uncovered while wiring this up: `load_model` was storing the caller's `gguf_path` kwarg into `self._gguf_path`, which is `None` on the HF-download code path. The resolved on-disk path (`model_path`) is what llama-server actually mmaps; downstream consumers need that. No existing reader used `_gguf_path`, so this is a correctness fix for the new endpoint. - Backend: `LlamaCppBackend.load_progress()`, `GET /api/inference/load-progress`, `LoadProgressResponse` Pydantic model. - Frontend: `useTransferStats` hook, `formatRate` / `formatEta` helpers, `getLoadProgress` client, rewired chat toast and `DownloadRow` in the training overlay. 
- Tests: `studio/backend/tests/test_llama_cpp_load_progress.py` covers empty states, mmap phase, ready phase, sharded total aggregation, missing gguf_path, and unreadable /proc (7 cases). `tsc -b` and `vite build` on the frontend both clean. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 09:46:22 -07:00
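The rolling-window estimator from bb14ab144a can be sketched in a few lines. This is an illustration only: the 15-second window is from the commit message, but the function names and the choice to compute the rate from the oldest and newest samples in the window are assumptions, and the real logic is TypeScript in the `useTransferStats` hook.

```python
WINDOW_SECONDS = 15.0  # window length quoted in the commit message


def append_sample(samples, t, n_bytes):
    """Return a new history with (t, n_bytes) added and stale samples dropped."""
    kept = [(ts, b) for ts, b in samples if t - ts <= WINDOW_SECONDS]
    kept.append((t, n_bytes))
    return kept


def compute_transfer_stats(samples, total_bytes):
    """Return (bytes_per_second, eta_seconds); either may be None."""
    if len(samples) < 2:
        return None, None
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0 or b1 <= b0:
        return None, None  # no elapsed time or no progress inside the window
    rate = (b1 - b0) / (t1 - t0)
    eta = max(total_bytes - b1, 0) / rate if total_bytes else None
    return rate, eta
```

Keeping both functions free of timers and state makes it straightforward to feed the same history through the download phase and the load phase.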
Roland Tannous	514bb3a20e	studio: pin peft to 0.18.1 to fix export subprocess issues (#5015 ) * studio: pin peft to 0.18.1 to fix export subprocess issues peft 0.19.0 causes export subprocess shutdown failures in Studio. Reverting to 0.18.1 resolves the issue. * studio: move peft pin to extras-no-deps to prevent torch upgrade Installing peft via overrides.txt would resolve its deps and pull in torch>=0.11.0, breaking other pinned packages. Moving the pin to extras-no-deps.txt ensures --no-deps is used during install.	2026-04-14 20:16:30 +04:00
Datta Nimmaturi	4328d0b4f6	Fix num_items_in_batch GA for Gemma4 (#4998 ) * Fix num_items_in_batch GA for Gemma4 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 09:01:10 -07:00
Daniel Han	7252410ccc	studio: stream export worker output into the export dialog (#4897 ) * studio: stream export worker output into the export dialog The Export Model dialog only showed a spinner on the "Exporting..." button while the worker subprocess was doing the actual heavy lifting. For Merged to 16bit and GGUF / Llama.cpp exports this meant several minutes (or more, for large models) of opaque silence, with no way to tell whether save_pretrained_merged, convert_hf_to_gguf.py, or llama-quantize was making progress. This adds a live terminal-style output panel inside the export dialog, rendered just above the Cancel / Start Export buttons and scrollable with auto-follow-tail. It shows stdout and stderr from both the worker process itself and any child process it spawns (GGUF converter, llama-quantize), coloured by stream. Backend - core/export/worker.py: new _setup_log_capture(resp_queue) installed before LogConfig.setup_logging. It saves the original stdout/stderr fds, creates pipes, os.dup2's the write ends onto fds 1 and 2 (so every child process inherits the redirected fds), and spins up two daemon reader threads. Each thread reads bytes from a pipe, echoes them back to the original fd (so the server console keeps working), splits on \n and \r, and forwards each line to the resp queue as {"type":"log","stream":"stdout\|stderr","line":...,"ts":...}. PYTHONUNBUFFERED=1 is set so nested Python converters flush immediately. - core/export/orchestrator.py: - Thread-safe ring buffer (collections.deque, maxlen 4000) with a monotonically increasing seq counter. clear_logs(), get_logs_since(cursor), get_current_log_seq(), is_export_active(). - _wait_response handles rtype == "log" by appending to the buffer and continuing the wait loop. Status messages are also surfaced as a "status" stream so users see high level progress alongside raw subprocess output. 
- load_checkpoint, _run_export, and cleanup_memory now wrap their bodies with the existing self._lock (previously unused), clear the log buffer at the start of each op, and flip _export_active in a try/finally so the SSE endpoint can detect idle. - routes/export.py: - Wrapped every sync orchestrator call (load_checkpoint, cleanup_memory, export_merged_model, export_base_model, export_gguf, export_lora_adapter) in asyncio.to_thread so the FastAPI event loop stays free during long exports. Without this the new SSE endpoint could not be served concurrently with the blocking export POST. - New GET /api/export/logs/stream SSE endpoint. Honors Last-Event-ID and a since query param for reconnect, emits log / heartbeat / complete / error events, uses the id field to carry the log seq so clients can resume cleanly. On first connect without an explicit cursor it starts from the current seq so old lines from a previous run are not replayed. Frontend - features/export/api/export-api.ts: streamExportLogs() helper that authFetches the SSE endpoint and parses id / event / data fields manually (same pattern as streamTrainingProgress in train-api.ts). - features/export/components/export-dialog.tsx: - Local useExportLogs(exporting) hook that opens the SSE stream on exporting transitions to true, accumulates up to 4000 lines in component state, and aborts on cleanup. - New scrollable output panel rendered above DialogFooter, only shown for Merged to 16bit and GGUF / Llama.cpp (LoRA adapter is a fast disk write with nothing to show). Dark terminal styling (bg-black/85, emerald text, rose for stderr, sky for status), max-height 14rem, auto-scrolls to the bottom on new output but stops following if the user scrolls up. A small streaming / idle indicator is shown next to the panel title. - DialogContent widens from sm:max-w-lg to sm:max-w-2xl when the output panel is visible so the logs have room to breathe. 
Verified - Python smoke test (tests/smoke_export_log_capture.py): spawns a real mp.get_context("spawn") process, installs _setup_log_capture, confirms that parent stdout prints, parent stderr prints, AND a child subprocess invoked via subprocess.run (both its stdout and stderr) are all captured in the resp queue. Passes. - Orchestrator log helpers tested in isolation: _append_log, get_logs_since (with and without a cursor), clear_logs not resetting seq so reconnecting clients still progress. Passes. - routes.export imports cleanly in the studio venv and /logs/stream shows up in router.routes. - bun run build: tsc -b plus vite build, no TypeScript errors. No existing export behavior is changed. If the subprocess, the SSE endpoint, or the frontend hook fails, the export itself still runs to completion the same way it did before, with or without logs visible. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * export dialog: trim bootstrap noise, scope logs per screen, show realpath Several follow-ups to the live export log work: 1. Worker bootstrap noise (transformers venv activation, Unsloth banner, "Top GGUF/hub models" lists, vision detection, 2k-step weight load bar) is dropped from the export-dialog stream. A threading.Event gate in worker.py defaults closed and only opens once _handle_export actually starts; until then the reader thread still echoes lines to the saved console fd for debugging but does not push them onto the resp_queue. The orchestrator already spawns a fresh subprocess for every checkpoint load, so the gate is naturally reset between runs. 2. tqdm in non-tty mode defaults to a 10s mininterval, which makes multi-step bars look frozen in the panel. Set TQDM_MININTERVAL=0.5 in the worker env so any tqdm-driven progress emits more often. 3. 
The dialog's useExportLogs hook now also clears its line buffer when exportMethod or open changes, so re-opening the dialog into a different action's screen no longer shows the previous action's saved output. A useElapsedSeconds tick + "Working Xs" badge in the log header gives users a visible sign that long single-step phases (cache copies, GGUF conversion) are still running when no new lines are arriving. 4. ExportBackend.export_{merged,base,gguf,lora} now return (success, message, output_path); the worker forwards output_path on each export__done response, the orchestrator's _run_export passes it to routes/export.py, which surfaces it via ExportOperationResponse.details.output_path. The dialog's Export Complete screen renders the resolved on-disk realpath under "Saved to" so users can find their exported model directly. fix(cli): unpack 3-tuple return from export backend ExportOrchestrator.export_{merged,base,gguf,lora} now return (success, message, output_path) so the studio dialog can show the on-disk realpath. The CLI still unpacked 2 values, so every `unsloth export --format ...` crashed with ValueError before reporting completion. Update the four call sites and surface output_path via a "Saved to:" echo. * fix(studio): anchor export log SSE cursor at run start The export dialog SSE defaulted its cursor to get_current_log_seq() at connect time, so any line emitted between the POST that kicks off the export and the client opening the stream was buffered with seqs 1..k and then skipped (seq <= cursor). Long-running exports looked silent during their first seconds. Snapshot _log_seq into _run_start_seq inside clear_logs() and expose it via get_run_start_seq(). The SSE default cursor now uses that snapshot, so every line emitted since the current run began is reachable regardless of when the client connects. Old runs still can't leak in because their seqs are <= the snapshot. 
* fix(studio): reconnect export log SSE on stream drop useExportLogs launched streamExportLogs once per exporting transition and recorded any drop in .catch(). Long GGUF exports behind a proxy with an idle kill-timeout would silently lose the stream for the rest of the run even though the backend already supports Last-Event-ID resume. The "retry: 3000" directive emitted by the backend is only meaningful to native EventSource; this hook uses a manual fetch + ReadableStream parse so it had no effect. Wrap streamExportLogs in a retry loop that tracks lastSeq from ExportLogEvent.id and passes it as since on reconnect. Backoff is exponential with jitter, capped at 5s, reset on successful open. The loop stops on explicit backend `complete` event or on effect cleanup. * fix(studio): register a second command so Typer keeps `export` as a subcommand The CLI export unpacking tests wrap `unsloth_cli.commands.export.export` in a fresh Typer app with a single registered command. Typer flattens a single-command app into that command, so the test's `runner.invoke(cli_app, ["export", ckpt, out, ...])` treats the leading `"export"` token as an unexpected extra positional argument -- every parametrized case failed with: Got unexpected extra argument (.../out) Register a harmless `noop` second command so Typer preserves subcommand routing and the tests actually exercise the 3-tuple unpack path they were written to guard. Before: 4 failed After: 4 passed --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: studio-install <studio@local.install> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>	2026-04-14 08:55:43 -07:00
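The run-start cursor described in the "anchor export log SSE cursor at run start" fix can be sketched as below. This is a minimal illustration, not the orchestrator's code: the class `ExportLogBuffer` and its internals are hypothetical, and only the names `clear_logs`, `get_logs_since`, `get_current_log_seq`, `get_run_start_seq`, `_log_seq` and `_run_start_seq` come from the commit message.

```python
import threading
from collections import deque
from typing import List, Optional, Tuple


class ExportLogBuffer:
    """Hypothetical stand-in for the orchestrator's log helpers."""

    def __init__(self, max_lines: int = 4000) -> None:
        self._lock = threading.Lock()
        self._lines = deque(maxlen=max_lines)  # holds (seq, text) pairs
        self._log_seq = 0        # monotonically increasing, never reset
        self._run_start_seq = 0  # snapshot of _log_seq taken in clear_logs()

    def append_log(self, text: str) -> int:
        with self._lock:
            self._log_seq += 1
            self._lines.append((self._log_seq, text))
            return self._log_seq

    def clear_logs(self) -> None:
        # Drop old lines but keep seq counting, so reconnecting clients
        # still progress; remember where the new run begins.
        with self._lock:
            self._lines.clear()
            self._run_start_seq = self._log_seq

    def get_current_log_seq(self) -> int:
        with self._lock:
            return self._log_seq

    def get_run_start_seq(self) -> int:
        with self._lock:
            return self._run_start_seq

    def get_logs_since(self, cursor: Optional[int] = None) -> List[Tuple[int, str]]:
        # With no cursor, default to the run-start snapshot rather than
        # the current seq, so early lines are not skipped.
        with self._lock:
            start = self._run_start_seq if cursor is None else cursor
            return [(seq, text) for seq, text in self._lines if seq > start]


buf = ExportLogBuffer()
buf.append_log("line from a previous run")
buf.clear_logs()                      # a new export run starts here
buf.append_log("export started")
buf.append_log("copying cache")

late_cursor = buf.get_current_log_seq()   # old default: cursor at connect time
missed = buf.get_logs_since(late_cursor)  # nothing is returned
replayed = buf.get_logs_since()           # new default: everything since run start
```

A client that connects after the first lines were emitted gets an empty list with the connect-time cursor, and the full run with the run-start cursor.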
Daniel Han	eca592effe	studio: show HF model download progress in training start overlay (#4894 ) * studio: show HF model download progress in training start overlay During the training setup phase, the overlay only displayed a static "Loading model..." line while model weights were being downloaded from Hugging Face. On slow connections this looked like the app had frozen. This adds a small self-contained progress block inside the existing TrainingStartOverlay that polls the existing GET /api/models/download-progress endpoint and renders a Progress bar with bytes downloaded, total bytes, and percent complete. Notes: - Frontend only change. No backend, worker, SSE, or runtime store edits. - Reuses the existing getDownloadProgress client wrapper and the existing /api/models/download-progress endpoint that already scans the HF blob cache for completed and .incomplete files. - selectedModel is read directly from useTrainingConfigStore inside the overlay, so no prop drilling and live-training-view.tsx is unchanged. - Polling runs at 1500 ms and is gated on the HF repo regex (^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$), the same regex the backend uses, so local paths and empty form state never hit the endpoint. - Polling stops once progress reaches 1.0 so the bar can stay at 100 until the overlay hides on the first training step. - Network errors are silently swallowed, matching the chat side flow (the bar simply freezes at the last value). - When downloadedBytes is 0 the block is hidden entirely, so cached models do not flash a progress bar. - When the HF API cannot determine the total size, the block falls back to "X downloaded" with no percent and no bar. Verified with bun run build (tsc -b plus vite build, no TypeScript errors). * training overlay: track dataset download + show on-disk realpath Adds a dedicated "Downloading dataset..." 
section to the training-start overlay alongside the existing model-weights one, so an HF dataset that is downloading mid-startup is no longer mislabeled as model weights or hidden entirely. The new GET /api/datasets/download-progress endpoint mirrors /api/models/download-progress against the datasets-- prefix in HF_HUB_CACHE. Both endpoints now also return cache_path, the resolved on-disk realpath of the snapshot directory (or the cache repo root if no snapshot is materialized yet). The overlay surfaces this under each download row so users can immediately see where the model and dataset landed without digging through server logs. The frontend's existing useModelDownloadProgress hook is generalized to a single useHfDownloadProgress(repoId, fetcher) hook that the model and dataset variants both delegate to, keeping polling, gating, and completion semantics in one place. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: Polish training start overlay download progress UI (#4957) * studio: polish training start overlay download progress visuals * Fix formatCachePath cross-platform support and redundant sizeLabel - Extend formatCachePath regex to also shorten macOS /Users/<user> paths to ~ - Suppress sizeLabel when no byte info is available (cachePath-only state), since the "Preparing" badge already conveys the status * Fix misleading status badge when download total is unknown - Hide badge when totalBytes is 0 but downloadedBytes > 0, since we cannot determine if the download is still in progress or already complete (happens when HF size metadata lookup fails for gated/private repos) - Keep "Preparing" badge for the zero-bytes cachePath-only state - Add Windows native path shortening to formatCachePath (C:\Users\<name>) --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> --------- Co-authored-by: studio-install <studio@local.install> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>	2026-04-14 08:54:01 -07:00
Daniel Han	44082cf88e	Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM (#5014 ) * Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM The chat settings sheet's ctx slider reads `max_context_length` from `/api/inference/status` and renders Exceeds estimated VRAM capacity (N tokens). The model may use system RAM. when the user drags the slider above that value. For models whose weights fit on some GPU subset, `_max_context_length` was already set to the binary-search cap and the warning fired correctly. For models whose weights exceed 90% of every GPU subset's free memory (e.g. MiniMax-M2.7-GGUF at 131 GB on a 97 GB GPU), the ceiling-probe loop never matched a subset, so `max_available_ctx` stayed at the native context (e.g. 196608). The slider ran all the way to native with no indication that any value above the 4096 spec default would trigger `--fit on` and degrade performance. Anchor `max_available_ctx` at `min(4096, native_context_length)` when no subset fits, so the warning fires at the right threshold and the user sees the correct safe-zone / warning-zone split: Before (MiniMax-M2.7 on 97 GB GPU): slider 0 .. 196608, warning threshold = 196608 (never fires) After: slider 0 .. 196608, warning threshold = 4096 (fires correctly) No frontend changes required: `chat-settings-sheet.tsx` already consumes `ggufMaxContextLength` (= status.max_context_length) as the warning threshold and `ggufNativeContextLength` as the slider max. Adds tests/test_llama_cpp_max_context_threshold.py covering weights-exceed-VRAM (single / multi-GPU), a native-ctx below the 4096 fallback case (don't lie about supported ctx), fittable-model regressions (small / multi-GPU / tiny on huge GPU), and the `max_context_length` property's fallback semantics. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 08:53:49 -07:00
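The threshold rule from the ctx-slider commit above can be written as a small function. This is a sketch under assumptions: `warning_threshold` and the `fitted_ctx` parameter are hypothetical names, with `fitted_ctx` standing in for the binary-search cap (or `None` when no GPU subset fits the weights). Only the `min(4096, native_context_length)` fallback is taken from the commit message.

```python
from typing import Optional

FALLBACK_CTX = 4096  # spec default named in the commit message


def warning_threshold(native_ctx: int, fitted_ctx: Optional[int]) -> int:
    """Return the context length above which the VRAM warning should fire."""
    if fitted_ctx is not None:
        # Weights fit on some GPU subset: use the cap found by the fit probe.
        return min(fitted_ctx, native_ctx)
    # No subset fits: anchor at 4096, but never above the native context.
    return min(FALLBACK_CTX, native_ctx)


oversized = warning_threshold(native_ctx=196608, fitted_ctx=None)
small_native = warning_threshold(native_ctx=2048, fitted_ctx=None)
fittable = warning_threshold(native_ctx=32768, fitted_ctx=16384)
```

The slider maximum stays at the native context in every case; only the warning threshold moves.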
Daniel Han	b2f80f210e	Studio: make GGUF disk-space preflight cache-aware (#5012 ) * Studio: make GGUF disk-space preflight cache-aware The pre-download disk check in LlamaCppBackend.load_model compared the repo's total GGUF size against free disk without crediting bytes already present in the Hugging Face cache. Re-loading a large cached model (e.g. MiniMax-M2.7-GGUF at 131 GB) then failed cold with "Not enough disk space to download any variant" whenever free disk was below the full weight footprint, even though nothing actually needed to be downloaded. Subtract bytes already on disk via try_to_load_from_cache before comparing against free space. A partial blob (interrupted download) is not credited, so a second attempt still allocates room to finish the download. The log line now also surfaces how much is already cached. Adds tests/test_llama_cpp_cache_aware_disk_check.py covering the fully-cached, partial-cache-insufficient-disk, partial-cache-enough-disk, cold-cache, incomplete-blob, and zero-size-path-info cases. Sparse tempfiles keep the GB-scale scenarios cheap to simulate. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 08:53:37 -07:00
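The cache-aware preflight above amounts to crediting fully cached files before comparing against free disk. The sketch below is illustrative only: `remaining_download_bytes` and `has_enough_disk` are hypothetical helpers, the `fully_cached` set stands in for whatever the real code derives from `try_to_load_from_cache`, and the exact comparison (any safety margin, for example) is an assumption.

```python
from typing import Dict, Set

GB = 1024 ** 3


def remaining_download_bytes(expected_sizes: Dict[str, int], fully_cached: Set[str]) -> int:
    """Bytes that still need to be fetched.

    Only files that are completely present in the cache are credited;
    a partial blob from an interrupted download counts as not cached.
    """
    return sum(size for name, size in expected_sizes.items() if name not in fully_cached)


def has_enough_disk(expected_sizes: Dict[str, int], fully_cached: Set[str], free_bytes: int) -> bool:
    return remaining_download_bytes(expected_sizes, fully_cached) <= free_bytes


# Hypothetical two-shard GGUF totalling 131 GB.
shards = {"model-00001.gguf": 66 * GB, "model-00002.gguf": 65 * GB}

fully_cached_ok = has_enough_disk(shards, set(shards), free_bytes=10 * GB)
cold_cache_ok = has_enough_disk(shards, set(), free_bytes=10 * GB)
partial_ok = has_enough_disk(shards, {"model-00001.gguf"}, free_bytes=70 * GB)
partial_short = has_enough_disk(shards, {"model-00001.gguf"}, free_bytes=60 * GB)
```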
Daniel Han	767fa8cade	Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM (#5011 ) * Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM The load-time auto-fit in LlamaCppBackend.load_model had two issues for models whose weights do not fit on any GPU subset (the common case for large MoE GGUFs such as MiniMax-M2.7, Qwen3.5-397B-A17B, etc.): 1. Auto mode (max_seq_length=0) left effective_ctx at the model's native context when no subset passed the 90% fit check. The UI slider then landed on e.g. 196608 for MiniMax-M2.7, far above anything usable. Default the auto-pick to 4096 so the UI starts at a sane value; the slider ceiling stays at the native context so the user can still opt in to longer contexts and receive the "might be slower" warning. 2. Explicit ctx was silently shrunk when weights fit but the requested KV overflowed the 90% budget. The shrink loop emitted -c <capped> -ngl -1 without informing the caller, so a user who had opted into a longer context via the UI never actually got it. Drop the shrink loop on the explicit path and emit -c <user_ctx> --fit on instead, letting llama-server flex -ngl (CPU layer offload). Adds tests/test_llama_cpp_context_fit.py covering both paths, the file-size-only fallback when KV metadata is missing, non-regression on fittable auto-pick, and platform-agnostic input shape. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 08:53:25 -07:00
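The two load-time paths in the commit above can be sketched as follows. The function names and parameters are hypothetical. The flags `-c`, `-ngl -1` and `--fit on` are the ones quoted in the commit message, and emitting `-ngl -1` when the requested context fits is an assumption rather than something the message states.

```python
from typing import List, Optional

AUTO_DEFAULT_CTX = 4096


def auto_pick_ctx(native_ctx: int, fitted_ctx: Optional[int]) -> int:
    """Auto mode (max_seq_length == 0): choose the starting context."""
    if fitted_ctx is not None:
        return fitted_ctx            # some GPU subset passed the fit check
    return AUTO_DEFAULT_CTX          # weights exceed VRAM: start at 4096, not native_ctx


def explicit_ctx_flags(user_ctx: int, kv_fits_budget: bool) -> List[str]:
    """Explicit mode: always honor the requested context."""
    if kv_fits_budget:
        # Assumption: full GPU offload when everything fits.
        return ["-c", str(user_ctx), "-ngl", "-1"]
    # Do not shrink the context; let llama-server offload layers to CPU.
    return ["-c", str(user_ctx), "--fit", "on"]


auto_ctx = auto_pick_ctx(native_ctx=196608, fitted_ctx=None)
overflow_flags = explicit_ctx_flags(user_ctx=65536, kv_fits_budget=False)
fitting_flags = explicit_ctx_flags(user_ctx=8192, kv_fits_budget=True)
```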
TF-MTGE	a31c82a640	fix(studio): remove 300s cap on load_checkpoint (inherits 3600s default) (#4922) * fix: increase wait response timeout to 900 sec instead of 300 sec. #4845 * Apply suggestion from @gemini-code-assist[bot] good catch Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-04-14 08:53:14 -07:00
Datta Nimmaturi	da78c6be71	[Studio] Install flash attn at setup time for linux (#4979) * [Studio] Install flash attn at setup time for linux * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup changes Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Test cases * wheel_utils: narrow url_exists exceptions and log at debug level --------- Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>	2026-04-14 16:40:17 +04:00
Datta Nimmaturi	dccc0ebada	[Studio] Show non-exported models in chat UI (#4892) * Show non-exported models in chat UI * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Distinguish between LoRA and full fine-tune saves. Cleanup --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-14 15:03:58 +04:00
Bharath Kumar Adinarayan	a50f61009b	fix(studio): default chart view to full training history (#5007) * fix(studio): default chart view to full training history instead of last 80 steps Fixes #5003 * chore: windowsize as null code comment --------- Co-authored-by: imagineer99 <samleejackson0@gmail.com> Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>	2026-04-14 03:29:27 -07:00
Lee Jackson	bfa17330bd	Studio: Polish API key copy button and harden async clipboard fallback (#5006 ) * fix: polish clipboard style and fix async clipboard path * Use copyToClipboardAsync in CopyButton for Safari fallback CopyButton was calling navigator.clipboard.writeText directly, bypassing the execCommand fallback added in this same PR. Switch to copyToClipboardAsync which tries execCommand first (Safari user-gesture requirement) then falls back to the async clipboard API. * Fix copyToClipboard sync contract regression and improve async path - Restore copyToClipboard() to return only the execCommand result, preserving the boolean contract that 7 existing callers depend on to gate their "Copied!" UI state. The fire-and-forget async fallback was returning true before the promise resolved, causing false success. - Add document.body null guard to copyWithExecCommand for SSR safety. - Reorder copyToClipboardAsync to try the async Clipboard API first, avoiding unnecessary DOM/focus overhead in Radix focus-trapped dialogs where execCommand always fails anyway. * Restore queryCommandSupported guard and fix async catch path - Restore the queryCommandSupported("copy") guard in copyToClipboard() to match the original contract exactly: when execCommand is entirely unsupported, fall through to fire-and-forget async clipboard write. - Fix copyToClipboardAsync catch block: after navigator.clipboard.writeText rejects, the user-gesture frame is gone, so execCommand will also fail. Return false from catch instead of falling through. The execCommand fallback at the bottom only runs when the Clipboard API is absent (still in user-gesture frame). * Restore execCommand fallback in copyToClipboardAsync catch path The catch block was returning false after clipboard API rejection, based on the incorrect premise that the user-gesture frame is lost after an await. Per the HTML spec, transient user activation IS preserved through promise microtask chains. 
The real reason execCommand fails in the Radix dialog is the focus trap intercepting textarea.focus(), not gesture loss. For non-dialog callers, execCommand can still succeed after a clipboard rejection. Inside a Radix modal, execCommand returns false harmlessly (focus trap blocks it). * Harden textarea fallback for mobile and continue to async path on failure --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>	2026-04-14 14:22:14 +04:00
Wasim Yousef Said	97eafd999e	studio: fix api-keys access + refresh (#5005) * studio: fix api-keys access + refresh * studio: guard v1 in spa fallback	2026-04-13 23:48:51 +04:00
AdamPlatin123	d2fc582840	studio: skip training status/metrics polling when idle (#4988) * fix(studio): skip training status/metrics polling when idle Add an early return in the status and metrics setInterval callbacks when the runtime store reports phase === "idle" and hasHydrated is true. Previously these polls fired unconditionally every 3s/5s, generating unnecessary network traffic and console errors when no training was running. * fix(studio): reduce idle polling to 30s instead of stopping entirely Review feedback (PR #4988): completely stopping polling when idle risks permanent UI desync if hydration fails, and misses out-of-band state changes from other clients. Add a 30s background poll that only fires when idle to recover gracefully. * fix: harden idle status polling around hydration and runtime reset --------- Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com> Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com> Co-authored-by: imagineer99 <samleejackson0@gmail.com>	2026-04-13 12:02:12 -07:00
Daniel Han	9a261aec5f	Studio: Expose openai and anthropic compatible external API end points (#4956 ) * Studio: add API key authentication for programmatic access External users want to hit the Studio API (chat completions with tool calling, training, export, etc.) without going through the browser login flow. This adds sk-unsloth- prefixed API keys that work as a drop-in replacement for JWTs in the Authorization: Bearer header. Backend: - New api_keys table in SQLite (storage.py) - create/list/revoke/validate functions with SHA-256 hashed storage - API key detection in _get_current_subject before the JWT path - POST/GET/DELETE /api/auth/api-keys endpoints on the auth router Frontend: - /api-keys page with create form, one-time key reveal, keys table - API Keys link in desktop and mobile navbar - Route registered with requireAuth guard Zero changes to any existing route handler -- every endpoint that uses Depends(get_current_subject) automatically works with API keys. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use actual origin in API key usage examples The examples on /api-keys were hardcoded to localhost:8888 which is wrong for remote users. Use window.location.origin so the examples show the correct URL regardless of where the user is connecting from. * Add `unsloth studio run` CLI command for one-liner model serving Adds a `run` subcommand that starts Studio, loads a model, creates an API key, and prints a ready-to-use curl command -- similar to `ollama run` or `vllm serve`. Usage: unsloth studio run -m unsloth/Qwen3-1.7B-GGUF --gguf-variant UD-Q4_K_XL * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add end-to-end tests for `unsloth studio run` and API key usage Tests the 4 usage examples from the API Keys page: 1. curl basic (non-streaming) chat completions 2. curl streaming (SSE) chat completions 3. OpenAI Python SDK streaming completions 4. 
curl with tools (web_search + python) Also tests --help output, invalid key rejection, and no-key rejection. All 7 tests pass against Qwen3-1.7B-GGUF. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add /v1/completions, /v1/embeddings, /v1/responses endpoints and --parallel support - llama_cpp.py: accept n_parallel param, pass to llama-server --parallel - run.py: plumb llama_parallel_slots through to app.state - inference.py: add /completions and /embeddings as transparent proxies to llama-server, add /responses as application-level endpoint that converts to ChatCompletionRequest; thread n_parallel through load_model - studio.py: set llama_parallel_slots=4 for `unsloth studio run` path * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Make /v1/responses endpoint match OpenAI Responses API format The existing /v1/responses shim returned Chat Completions format, which broke OpenAI SDK clients using openai.responses.create(). This commit replaces the endpoint with a proper implementation that: - Returns `output` array with `output_text` content parts instead of `choices` with `message` - Uses `input_tokens`/`output_tokens` instead of `prompt_tokens`/ `completion_tokens` in usage - Sets `object: "response"` and `id: "resp_..."` - Emits named SSE events for streaming (response.created, response.output_text.delta, response.completed, etc.) - Accepts all OpenAI Responses API fields (tools, store, metadata, previous_response_id) without erroring -- silently ignored - Maps `developer` role to `system` and `input_text`/`input_image` content parts to the internal Chat format Adds Pydantic schemas for request/response models and 23 unit tests covering schema validation, input normalisation, and response format. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: add Anthropic-compatible /v1/messages endpoint (#4981) * Add Anthropic-compatible /v1/messages endpoint with tool support Translate Anthropic Messages API format to/from internal OpenAI format and reuse the existing server-side agentic tool loop. Supports streaming SSE (message_start, content_block_delta, etc.) and non-streaming JSON. Includes offline unit tests and e2e tests in test_studio_run.py. * Add enable_tools, enabled_tools, session_id to /v1/messages endpoint Support the same shorthand as /v1/chat/completions: enable_tools=true with an optional enabled_tools list uses built-in server tools without requiring full Anthropic tool definitions. session_id is passed through for sandbox isolation. max_tokens is now optional. * Strip leaked tool-call XML from Anthropic endpoint content Apply _TOOL_XML_RE to content events in both streaming and non-streaming tool paths, matching the OpenAI endpoint behavior. * Emit custom tool_result SSE event in Anthropic stream Adds a non-standard tool_result event between the tool_use block close and the next text block, so clients can see server-side tool execution results. Anthropic SDKs ignore unknown event types. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Split /v1/messages into server-side and client-side tool paths enable_tools=true runs the existing server-side agentic loop with built-in tools (web_search/python/terminal). A bare tools=[...] field now triggers a client-side pass-through: client-provided tools are forwarded to llama-server and any tool_use output is returned to the caller with stop_reason=tool_use for client execution. This fixes Claude Code (and any Anthropic SDK client) which sends tools=[...] expecting client-side execution but was previously routed through execute_tool() and failing with 'Unknown tool'. 
Adds AnthropicPassthroughEmitter to convert llama-server OpenAI SSE chunks into Anthropic SSE events, plus unit tests covering text blocks, tool_use blocks, mixed, stop reasons, and usage. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix httpcore GeneratorExit in /v1/messages passthrough stream Explicitly aclose aiter_lines() before the surrounding async with blocks unwind, mirroring the prior fix in external_provider.py (`a41160d3`) and cc757b78's RuntimeError suppression. * Wire stop_sequences through /v1/messages; warn on tool_choice Plumb payload.stop_sequences to all three code paths (server-side tool loop, no-tool plain, client-side passthrough) so Anthropic SDK clients setting stop_sequences get the behavior they expect. The llama_cpp backend already accepted `stop` on both generate_chat_ completion and generate_chat_completion_with_tools; the Anthropic handler simply wasn't passing it. tool_choice remains declared on the request model for Anthropic SDK compatibility (the SDK often sets it by default) but is not yet honored. Log a structured warning on each request carrying a non- null tool_choice so the silent drop is visible to operators. * Wire min_p / repetition_penalty / presence_penalty through /v1/messages Align the Anthropic endpoint's sampling surface with /v1/chat/completions. Adds the three fields as x-unsloth extensions on AnthropicMessagesRequest and threads them through all three code paths: server-side tool loop, no-tool plain, and client-side passthrough. The passthrough builder emits "repeat_penalty" (not "repetition_penalty") because that is llama-server's field name; the backend methods already apply the same rename internally. 
* Fix block ordering and prev_text reset in non-streaming tool path _anthropic_tool_non_streaming was building the response by appending all tool_use blocks first, then a single concatenated text block at the end — losing generation order and merging pre-tool and post-tool text into one block. It also never reset prev_text between synthesis turns, so the first N characters of each post-tool turn were dropped (where N = length of the prior turn's final cumulative text). Rewrite to build content_blocks incrementally in generation order, matching the streaming emitter's behavior: deltas within a turn are merged into the trailing text block, tool_use blocks interrupt the text sequence, and prev_text is reset on tool_end so turn N+1 diffs against an empty baseline. Caught by gemini-code-assist[bot] review on #4981. * Make test_studio_run.py e2e tests pytest-compatible Add a hybrid session-scoped studio_server fixture in conftest.py that feeds base_url / api_key into the existing e2e test functions. Three invocation modes are now supported: 1. Script mode (unchanged) — python tests/test_studio_run.py 2. Pytest + external server — point at a running instance via UNSLOTH_E2E_BASE_URL / UNSLOTH_E2E_API_KEY env vars, no per-run GGUF load cost 3. Pytest + fixture-managed server — pytest drives _start_server / _kill_server itself via --unsloth-model / --unsloth-gguf-variant, CI-friendly The existing _start_server / _kill_server helpers and main() stay untouched so the script entry point keeps working exactly as before. Test function signatures are unchanged — the (base_url, api_key) parameters now resolve via the new fixtures when running under pytest. * Rename test_studio_run.py -> test_studio_api.py The file is entirely about HTTP API endpoint testing (OpenAI-compatible /v1/chat/completions, Anthropic-compatible /v1/messages, API key auth, plus a CLI --help sanity check on the command that runs the API). 
None of its tests cover training, export, chat-UI, or internal-Python-API concerns. The old name misleadingly suggested "tests for the unsloth studio run CLI subcommand" — the new name reflects the actual scope. Updates: - git mv the file (rename tracked, history preserved) - Rewrite opening docstring to state the API surface focus and call out what is explicitly out of scope - Update all 4 Usage-block path references to the new filename - LOG_FILE renamed to test_studio_api.log - conftest.py fixture import rewritten from test_studio_run to test_studio_api, plus 7 docstring/comment references updated No functional changes to test logic, signatures, or main(). --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix httpcore asyncgen cleanup in /v1/messages and /v1/completions The earlier fix in `985e92a9` was incomplete: it closed aiter_lines() explicitly but still used `async with httpx.AsyncClient()` / `async with client.stream()` inside the generator. When the generator is orphaned (e.g. client disconnects mid-stream and Starlette drops the StreamingResponse iterator without explicitly calling aclose()), Python's asyncgen finalizer runs the cleanup in a DIFFERENT task than the one that originally entered the httpx context managers. The `async with` exits then trigger httpcore's HTTP11ConnectionByteStream .aclose(), which enters anyio.CancelScope.__exit__ with a mismatched task and raises RuntimeError("Attempted to exit cancel scope in a different task"). That error escapes any user-owned try/except because it happens during GC finalization. Replace `async with` with manual client/response lifecycle in both /v1/messages passthrough and /v1/completions proxy. Close the response and client in a finally block wrapped in `try: ... except Exception: pass`. 
This suppresses RuntimeError (and other Exception subclasses) from the anyio cleanup noise while letting GeneratorExit (a BaseException, not Exception) propagate cleanly so the generator terminates as Python expects. Traceback observed in user report: File ".../httpcore/_async/connection_pool.py", line 404, in __aiter__ yield part RuntimeError: async generator ignored GeneratorExit ... File ".../anyio/_backends/_asyncio.py", line 455, in __exit__ raise RuntimeError( RuntimeError: Attempted to exit cancel scope in a different task * Expand unsloth studio run banner with SDK base URL and more curl examples Add an explicit "OpenAI / Anthropic SDK base URL" line inside the info box so SDK users don't accidentally copy the bare server URL (without /v1) into their OpenAI/Anthropic SDK constructors and hit 404s. Replace the single /v1/chat/completions curl example with three labeled blocks: chat/completions, Anthropic /messages, and OpenAI Responses. The Anthropic example includes max_tokens (Anthropic SDKs require it even though Studio accepts None). All examples derived from a computed sdk_base_url so the /v1 prefix stays in sync if the public path ever changes. * Hash API keys with HMAC-SHA256 + persistent server secret Stores the HMAC secret in a new app_secrets singleton table. Fixes CodeQL py/weak-sensitive-data-hashing alert on storage.py:74-76, 394-395. Refresh tokens stay on plain SHA-256 (unchanged _hash_token) so existing user sessions survive upgrade — API keys are new on this branch so there is no migration. * Use PBKDF2 for API key hashing per CodeQL recommendation HMAC-SHA256 was still flagged by py/weak-sensitive-data-hashing. Switch to hashlib.pbkdf2_hmac, which is in CodeQL's recommended allowlist (Argon2/scrypt/bcrypt/PBKDF2). Persistent server-side salt stays in app_secrets for defense-in-depth. 100k iterations to match auth/hashing.py's password hasher. 
--------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>	2026-04-13 21:08:11 +04:00
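The final API key hashing scheme above (PBKDF2 with a persistent server-side salt and 100k iterations) could look roughly like this. It is a sketch, not the code in storage.py: `hash_api_key` and `verify_api_key` are hypothetical names, SHA-256 as the PBKDF2 digest is an assumption, and the salt is generated in memory here where the commit stores it in the app_secrets table.

```python
import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 100_000  # iteration count given in the commit message


def hash_api_key(api_key: str, server_salt: bytes) -> str:
    """Derive a hex digest for storage; the raw key is never persisted."""
    digest = hashlib.pbkdf2_hmac(
        "sha256",                   # assumed digest; the commit only names PBKDF2
        api_key.encode("utf-8"),
        server_salt,
        PBKDF2_ITERATIONS,
    )
    return digest.hex()


def verify_api_key(candidate: str, stored_hash: str, server_salt: bytes) -> bool:
    """Constant-time comparison of the candidate's hash against the stored one."""
    return hmac.compare_digest(hash_api_key(candidate, server_salt), stored_hash)


server_salt = secrets.token_bytes(32)                 # stand-in for the persisted secret
api_key = "sk-unsloth-" + secrets.token_urlsafe(32)   # prefix taken from the commit message
stored = hash_api_key(api_key, server_salt)

accepted = verify_api_key(api_key, stored, server_salt)
rejected = verify_api_key("sk-unsloth-wrong", stored, server_salt)
```

Because the salt is shared across keys, a lookup can hash the presented key once and match it against the stored column directly.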
Roland Tannous	3bb72a557f	Pin kernels==0.12.1 to avoid huggingface_hub dataclass conflict (#5000)	2026-04-13 20:42:02 +04:00
Lee Jackson	21a7895959	Studio: Prompt manager, message deletion, and chat UI improvements (#4938) * feat(chat): code block styling, delete with Dexie sync, settings sheet polish * style: config save/delete padding fix * fix(studio): centralize dark code-block surface and optimize message sync writes * style: config padding/alignment polish * fix(studio): upsert custom presets without implicit rename-delete * fix settings sheet save state polish * fix settings sheet button widths * fix chat settings presets * fix chat delete sync * fix chat trust remote code flow --------- Co-authored-by: shine1i <wasimysdev@gmail.com>	2026-04-13 16:42:33 +02:00
AdamPlatin123	3b092bcd46	fix(studio): prevent route transition DOM duplication via AnimatePresence (#4987) Add mode="wait" and exit={{ opacity: 0 }} to the root AnimatePresence wrapper so outgoing routes fully unmount before incoming routes render. Without this, rapid navigation between Studio/Export/Recipes/Chat caused pages to stack (2x–3x duplication). Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com> Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>	2026-04-13 01:38:00 -07:00
Manan Shah	80c12ff1a6	Move gemma4 script (#4994) * updating gemma4 script * moving gemma4 script to scripts folder	2026-04-12 23:41:15 -07:00
Manan Shah	db3b3a4d9b	updating gemma4 script (#4992) * updating gemma4 script * show errors	2026-04-12 23:11:32 -07:00
Daniel Han	93a24f6698	Add ROCm test suite for PR #4720 (#4824) 95 Python tests and 23 shell tests covering ROCm detection, torch index URL selection, hardware flags, prebuilt asset selection, and install pathway logic. All tests use mocks -- no AMD hardware required. Companion to #4720 (AMD ROCm/HIP support).	2026-04-11 04:44:13 -07:00
Daniel Han	53af4a1b3e	Fix Gemma-4 GRPO catastrophic KL divergence with TRL 1.0.0+ (#4934 ) * Fix Gemma-4 GRPO catastrophic KL divergence with TRL 1.0.0+ Two compounding bugs caused Gemma-4 GRPO training to diverge with KL ~10^12 at step 1 against TRL 1.0.0+. Both fixes are runtime patches in the existing TRL/model patch flow and are no-ops for models and TRL versions that are not affected. Fix 1 (rl.py): replace trl.models.utils.disable_gradient_checkpointing with a no-op context manager. TRL 1.0.0+ wraps generation in `with torch.no_grad(), disable_gradient_checkpointing(self.model, ...):` purely to suppress a cosmetic PyTorch warning ("None of the inputs have requires_grad=True"). Inside torch.no_grad() the gradient checkpointing state has no functional effect on the forward pass. On context exit, TRL calls model.gradient_checkpointing_enable() which dispatches to HF's generic implementation and overwrites Unsloth's custom `use_gradient_checkpointing="unsloth"` wrapper, corrupting Gemma-4 forward numerics. Replacing the toggle with a no-op preserves Unsloth's custom GC wrapper across generation passes. The patch walks sys.modules dynamically to also rebind the symbol on every trl.* module that already imported it (grpo_trainer, dpo_trainer, rloo_trainer, dppo_trainer, gfpo_trainer, grpo_with_replay_buffer_trainer, and any future trainer module). Fix 2 (vision.py): inject `final_logit_softcapping` from `config.text_config` into the top-level `model.config` for multimodal models. Unsloth's GRPO trainer reads `getattr(model.config, "final_logit_softcapping", 0)` but for Gemma-4 the attribute lives only on the nested `Gemma4TextConfig`, so the lookup silently defaults to 0 instead of 30. Backwards compatibility: - trl 0.22.2: no `disable_gradient_checkpointing` symbol exists, the patch early-returns via `hasattr` guard. - trl 0.27.1: same broken pattern as 1.0.0, the noop replacement is correct. 
- trl 1.0.0+: end-to-end verified on `unsloth/gemma-4-E2B-it` GRPO with TRL 1.0.0 and transformers 5.5.0. Step 1 loss=2.46e-08, kl=2.92e-05 (machine zero) vs broken baseline loss=1.37e+06, kl=1.76e+09. - Llama / non-VLM text models: Fix 2 is a no-op (no `text_config`); Fix 1 is functionally identical (Unsloth's GC wrapper is preserved). - Qwen3-VL and other VLMs without final_logit_softcapping: Fix 2 is a no-op (text_config.final_logit_softcapping is None). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply loop 1 review fixes for PR #4934 - Move Fix 2 from vision.py to rl_replacements.py:858 and :1110 at the actual consumer sites. This avoids mutating model.config (which could leak into save_pretrained output) and covers text-only Gemma-4 paths that do not flow through FastBaseModel.from_pretrained. - Revert the vision.py injection block entirely. - Narrow the bare except blocks in patch_trl_disable_gradient_checkpointing from `except Exception:` to `(AttributeError, ImportError)` and `(AttributeError, TypeError)` to avoid masking unrelated bugs. - Add logger.warning_once when the noop patch is installed, matching patch_trl_openenv and patch_trl_vllm_generation convention. - Remove the dead per-module `_unsloth_noop_patched` sentinel check inside the sys.modules walk. The function-level early return already covers this case. - Move `import sys` and `from contextlib import contextmanager` to the module-level imports instead of inside the function body. - Rewrite the ordering comment in PatchFastRL to accurately describe why patch_trl_disable_gradient_checkpointing must run before patch_trl_rl_trainers. - Fix keyword default spacing to match surrounding rl.py style. End-to-end verified: Gemma-4-E2B GRPO on TRL 1.0.0 + transformers 5.5.0 step 1 loss=2.464e-08 kl=2.921e-05, all 5 steps succeed. 
* Apply loop 2 review fix for PR #4934 Extract the final_logit_softcapping fallback logic into a shared helper `_unsloth_get_final_logit_softcapping(config)` defined in rl_replacements.py and injected into the compiled cache via RL_PRE_ITEMS["grpo_trainer"]. Both call sites (`grpo_trainer__generate_and_score_completions` and `grpo_trainer_compute_loss`) now use the helper instead of inlining the same text_config fallback block twice. Verified: compiled cache file lists the helper at module scope and both consumer sites call it. Gemma-4-E2B GRPO step 1 loss=2.464e-08 kl=2.921e-05 (unchanged), all 5 steps pass. * Apply loop 3 review fix for PR #4934 Extend _unsloth_get_final_logit_softcapping to also fall back to config.get_text_config() for composite configs such as T5GemmaConfig where the text sub-config is not exposed via the text_config attribute but only via the get_text_config() method. Guard against (TypeError, ValueError) raised by ambiguous composite configs, and skip the self-referential case where get_text_config() returns self. This addresses the 6/7 reviewer consensus from the third review loop. Verified: - Helper returns 30.0 for Gemma-4, T5Gemma, and Gemma 1/2 configs. - Helper returns 0 for Llama, Qwen, Mistral, Cohere, Granite, and ambiguous configs raising ValueError. - Gemma-4-E2B GRPO step 1 loss=2.464e-08 kl=2.921e-05 (unchanged). - Llama-3.2-1B GRPO all 5 steps loss=0 kl=0 (no regression). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-10 07:58:15 -07:00
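The fallback chain that commit 53af4a1b3e describes for `final_logit_softcapping` can be sketched roughly as follows. This is an illustrative reconstruction from the commit message only, not the code that ships in rl_replacements.py: the function name `get_final_logit_softcapping`, its `default` parameter, and the `SimpleNamespace` stand-in configs are assumptions made for the example.

```python
from types import SimpleNamespace


def get_final_logit_softcapping(config, default=0):
    """Hypothetical stand-in for the helper described in the commit message.

    Lookup order (assumed from the message):
      1. config.final_logit_softcapping
      2. config.text_config.final_logit_softcapping
      3. config.get_text_config().final_logit_softcapping
    A value of None is treated as "not set".
    """
    value = getattr(config, "final_logit_softcapping", None)
    if value is not None:
        return value

    text_config = getattr(config, "text_config", None)
    if text_config is None:
        getter = getattr(config, "get_text_config", None)
        if callable(getter):
            try:
                candidate = getter()
            except (TypeError, ValueError):
                # Ambiguous composite configs may raise; treat as "no text config".
                candidate = None
            # Skip the self-referential case where the config returns itself.
            if candidate is not config:
                text_config = candidate

    if text_config is not None:
        value = getattr(text_config, "final_logit_softcapping", None)
    return default if value is None else value


# Stand-in for a multimodal config whose value lives only on the nested text config.
nested = SimpleNamespace(text_config=SimpleNamespace(final_logit_softcapping=30.0))
print(get_final_logit_softcapping(nested))  # 30.0

# Stand-in for a text-only config without the attribute.
print(get_final_logit_softcapping(SimpleNamespace()))  # 0
```

Reading the value at the consumer sites, rather than writing it back onto `model.config`, matches the stated motivation of the loop 1 review fix: nothing is mutated, so nothing can leak into `save_pretrained` output.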
Daniel Han	65b4028560	Pin bitsandbytes to continuous-release_main on ROCm (4-bit decode fix) (#4954 ) * Pin bitsandbytes to continuous-release_main on ROCm for 4-bit decode fix bitsandbytes 0.49.2 on PyPI ships with a broken 4-bit GEMV kernel on every ROCm target: - CDNA (gfx90a / gfx942 / gfx950 = MI210 / MI300X / MI350) via a broken blocksize=32/64 warp64 GEMV kernel whose tests were explicitly skipped with ROCM_WARP_SIZE_64 guards because the code was known broken. - RDNA3 / RDNA3.5 (gfx1100-1103 / gfx1150-1152) via a compile-time BNB_WARP_SIZE macro in the host-side dispatch that resolves to 64 when the multi-arch wheel is compiled with CDNA as the primary target, so num_blocks is wrong on RDNA and half the GEMV output is never written. At decode shape (1, 1, hidden) both bugs produce NaN. Training is unaffected because training shapes are (batch, seq_len > 1, hidden) and never touch the GEMV path. The crash during autoregressive inference surfaces as _assert_async_cuda_kernel in torch.multinomial which on HIP becomes a hard HSA_STATUS_ERROR_EXCEPTION instead of a clean Python error. Both bugs are fixed by bitsandbytes commit 713a3b8 ("[ROCm] Enable blocksize 32 4-bit quantization and GEMV kernels on AMD CDNA", PR #1887, merged 2026-03-09) which replaces BNB_WARP_SIZE with a runtime hipDeviceGetAttribute query and ships a working CDNA warp64 kernel. That commit has not shipped to PyPI yet, but continuous-release_main wheels are published on every push to bnb main via GitHub Releases. Point the ROCm install path at the continuous-release_main x86_64 and aarch64 wheels and fall back to PyPI >=0.49.1 when the pre-release is unreachable (offline installs, firewalled hosts, or architectures not covered by the pre-release wheels). Drop the pin once bnb cuts a 0.50+ tag on PyPI. 
Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1): direct bnb GEMV shape test now returns 0.0078 max abs error at seq_len=1 (no NaN) vs NaN on 0.49.2, and full Unsloth + for_inference + 4-bit sampling generation works end-to-end. NVIDIA / CPU / Mac / Windows paths are unaffected -- the helper is gated on the ROCm torch index and platform.machine() respectively. * Drop Studio ROCm 16-bit fallback now that bnb 0.50+ fixes 4-bit decode The 16-bit fallback in studio/backend/core/inference/inference.py was added as a workaround for a bug that this PR already fixes at the install layer: bitsandbytes <= 0.49.2 has a broken 4-bit GEMV kernel on every ROCm target, which NaNs at decode shape (seq_len=1) and crashes autoregressive inference. bnb PR #1887 (commit 713a3b8, in 0.50.0.dev0+, pinned by install.sh / install_python_stack.py in this PR) restores correct 4-bit decode on MI300X and verified working end-to-end with full Unsloth + for_inference + sampling. Revert the dual code path so ROCm and NVIDIA both go through the normal FastLanguageModel.from_pretrained + for_inference flow: - Remove the conditional `from unsloth import` that skipped the import on ROCm. The monkey-patches it was trying to avoid were never the cause of the crash; bnb 4-bit GEMV was. - Remove the `if _hw_module.IS_ROCM:` branch in load_model that loaded with plain transformers + PEFT + bfloat16, and the `_resolve_fp16_base` helper it relied on. - Remove the `get_chat_template is not None` fallback in _load_chat_template_info -- get_chat_template is now always imported. - Refactor the audio/vision ROCm guard to check _hw_module.IS_ROCM directly instead of the removed _IS_ROCM_ENV global. Audio and vision on ROCm still need separate validation (FastVisionModel and the CSM audio codecs were never tested on HIP) so the guard stays for now. 
Add _bnb_rocm_4bit_ok() as a runtime safety net for users who install from this PR before the install.sh bnb pin kicks in, or whose installer fell back to the PyPI pin because the continuous- release wheel was unreachable. When the installed bnb is < 0.50 on ROCm, force load_in_4bit=False and strip any -unsloth-bnb-4bit / -bnb-4bit suffix from the model path so a pre-quantized repo resolves to its FP16 sibling instead of pulling bnb back in via the repo's quantization_config. LoRA adapters whose base is a pre-quantized repo on old bnb will still fail inside Unsloth's loader -- the only real fix there is `unsloth studio update`. Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1): - HAPPY path (bnb 0.50.0.dev0, load_in_4bit=True, pre-quantized repo): loads in 4-bit via the fixed GEMV, generation returns "Paris." for greedy and sampling. - SAFETY-NET path (simulated old bnb, suffix-stripped to the FP16 sibling, load_in_4bit=False): loads in bf16, generation returns "Paris." for greedy and sampling. Net diff is ~45 lines smaller than the pre-revert state because the entire plain-transformers 16-bit branch is gone. * Cache _bnb_rocm_4bit_ok() with functools.cache load_model() can be called many times in a single session but the bnb version and hardware state cannot change at runtime, so memoise the check. First call is ~1.9 ms (dominated by the lazy `import bitsandbytes` inside the try block), subsequent calls drop to sub-microsecond dict lookups. Zero behavioral change. * Shorten verbose bnb/ROCm comments Comment-only cleanup across install.sh, studio/install_python_stack.py, and studio/backend/core/inference/inference.py. No behavioral change. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove _bnb_rocm_4bit_ok safety net from inference.py Studio's ROCm support is brand new (PR #4720, merged today) and every fresh install pulls the bnb continuous-release_main wheel via install.sh / install_python_stack.py in this same PR. There are no existing ROCm Studio installs carrying bnb < 0.50, so the defensive version-check fallback is guarding against a scenario that cannot actually occur. Delete the helper, the functools import, and the safety-net block -- inference.py now calls FastLanguageModel.from_pretrained directly with no ROCm branching. * Drop audio/vision ROCm guard in inference.py — verified unblocked by bnb fix Vision inference was blocked by the same bnb 4-bit GEMV bug that affected text inference (vision models use bnb 4-bit for the LM backbone). With bnb 0.50+ pinned in install.sh / install_python_stack.py, vision works end-to-end on MI300X: Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit loaded in 4-bit via FastVisionModel + for_inference returns a correct answer to a multimodal prompt. Audio (CSM) was never actually blocked by HIP — on this hardware CSM loads and runs its backbone forward pass fine with bnb 0.50, then fails during generate() with a transformers-level kwarg validation mismatch in generation_csm.py (`backbone_last_hidden_state` rejected). That's a pre-existing transformers/CSM integration bug that reproduces identically on NVIDIA, so the ROCm-gated guard was never actually protecting users from anything HIP-specific. Remove the combined audio/vision guard and the now-unused _hw_module import. Also restore the one-word "Can be" in an inline comment that drifted during the earlier comment-shortening pass, so the inference.py delta vs pre-#4720 is exactly the max_seq_length<=0 crash fix and nothing else. 
* Shorten max_seq_length=0 guard comment to one line --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-10 06:25:39 -07:00
Daniel Han	cad8c6ad05	Add AMD ROCm/HIP support across installer and hardware detection (#4720 ) * Add ROCm detection to install.sh and expand shell tests Add AMD ROCm GPU detection to get_torch_index_url() in install.sh. When nvidia-smi is not found, probe for ROCm via amd-smi, /opt/rocm version file, hipconfig, dpkg-query, and rpm. Includes validation guard for malformed _rocm_tag, Debian epoch prefix stripping, ROCm 7.2+ cap to rocm7.1 index, bitsandbytes AMD install, and status messaging. Shell tests expanded to 23 cases. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add ROCm torch reinstall support to install_python_stack.py Add _detect_rocm_version() and _ensure_rocm_torch() to detect when a Linux host has ROCm but the venv received CPU-only torch, and reinstall with the correct ROCm wheels. Covers ROCm 6.0 through 7.1 with a 30-second timeout on the torch GPU probe subprocess. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add ROCm support to llama.cpp prebuilt installer Add has_rocm field to HostInfo, extend detect_host() to probe for ROCm via hipcc/amd-smi/rocm-smi/ROCM_PATH, and route ROCm hosts to upstream prebuilts (Linux ROCm 7.2 prebuilt with source fallback, Windows HIP prebuilt with CPU fallback). Add linux-rocm and windows-hip install kinds to runtime_patterns_for_choice(). Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add IS_ROCM hardware flag and fix AMD error message Add IS_ROCM flag to hardware.py detect_hardware() (set when torch.version.hip is present, DeviceType stays CUDA). Export IS_ROCM from __init__.py. Add "rocm" key to get_package_versions(). Replace "We do not support AMD" error in tokenizer_utils.py with a helpful message pointing to ROCm installation docs. 
Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add comprehensive ROCm support test suite (68 tests) Add tests/studio/install/test_rocm_support.py covering all ROCm code paths across install_llama_prebuilt.py, install_python_stack.py, hardware.py, tokenizer_utils.py, and install.sh. All tests use mocks and run without AMD hardware. Covers: asset selection (11), runtime patterns (5), HostInfo (4), ROCm version detection (9), torch reinstall (9), index mapping (8), hardware flag (8), tokenizer message (2), install.sh structure (10), and live regression (1). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden ROCm support: probe error handling, version cap, validation Address review findings from 8 independent reviewers: - Wrap _ensure_rocm_torch() torch probe in try/except for TimeoutExpired and OSError so a hung or broken torch import does not crash the installer (8/8 reviewers flagged this) - Add torch>=2.4,<2.11.0 version cap to the ROCm reinstall path to prevent installing unsupported torch 2.11.0 from the rocm7.1 index - Use with-statement for file reads in _detect_rocm_version() to avoid resource leaks - Handle ROCM_PATH="" correctly (use `or "/opt/rocm"` instead of default parameter to avoid relative path resolution) - Strengthen shell validation guard from rocm[0-9] to rocm[1-9] to reject rocm0.x tags that would produce nonexistent PyTorch index URLs - Switch shell version cap from blocklist to allowlist (rocm6.\|rocm7.0 \|rocm7.1* pass through, everything else caps to rocm7.1) so future ROCm 10+ does not fall through to a nonexistent index - Add sorted() to _ROCM_TORCH_INDEX lookup for defensive ordering - Fix test_probe_timeout_handled: replace zero-assertion test with proper assertions verifying reinstall proceeds after timeout * Clean up rocm_paths list construction in detect_host() Filter None from the ROCM_PATH env var lookup at list construction time instead of relying on the inline `if 
p` guard in the any() call. * Require actual AMD GPU presence before selecting ROCm paths All 8 reviewers across 2 cycles independently flagged that ROCm detection used toolkit/filesystem hints (hipcc, /opt/rocm, rocm-core) as a proxy for GPU presence, which would misroute CPU-only or NVIDIA hosts that happen to have ROCm tools installed. Now all 3 detection points (install.sh, install_python_stack.py, install_llama_prebuilt.py) probe for an actual AMD GPU before entering the ROCm path: - install.sh: check rocminfo for gfx* GPU names, or amd-smi list for device rows, before version detection - install_python_stack.py: new _has_rocm_gpu() function probes rocminfo and amd-smi list before _ensure_rocm_torch() proceeds - install_llama_prebuilt.py: detect_host() probes rocminfo/amd-smi list instead of just checking tool existence or directory paths Also: - Shell test mock amd-smi now handles "list" subcommand - Python tests updated to mock _has_rocm_gpu where needed - Added test_no_gpu_with_rocm_tools_skips to verify the new guard - Test index lookups now use sorted() to match production code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden hipconfig version parsing and torch probe compatibility - Add parts[1].isdigit() check in hipconfig version parsing to handle versions like "6.3-HIP" where the minor component has non-numeric suffix (strip "-" prefix before int() conversion) - Use getattr() in torch probe subprocess to safely handle old or custom torch builds that may lack torch.version.hip/cuda attributes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Strengthen AMD GPU detection and add NVIDIA precedence guard - Change amd-smi list detection from any-non-empty-output to requiring "gpu" marker in output, matching the shell-side NR>1 check. Prevents false positives from header-only amd-smi list output. 
- Add nvidia-smi check at the top of _ensure_rocm_torch() so mixed AMD+NVIDIA hosts preserve NVIDIA precedence (matching install.sh and install_llama_prebuilt.py behavior). - Apply the same amd-smi marker fix to install_llama_prebuilt.py detect_host() for consistency. * Add Windows-specific ROCm/HIP detection in detect_host() The previous detect_host() ROCm check used rocminfo and amd-smi list which are Linux-only tools. On Windows, has_rocm would always be False, making the Windows HIP prebuilt path at line 1794 unreachable. Now detect_host() uses platform-specific detection: - Linux: rocminfo (check for gfx GPU names) or amd-smi list - Windows: hipinfo.exe, amd-smi, or amdhip64.dll on PATH This allows Windows AMD users to get the HIP prebuilt binary instead of silently falling through to the CPU prebuilt. * Add AMD ROCm gaps: Mamba/SSM source builds, GPU monitoring, Windows messaging, RDNA expansion - worker.py: Add HIP detection to causal-conv1d/mamba-ssm probe, check for hipcc before ROCm source builds, improve status messages and error reporting, add timeout and uv support for the source build fallback - amd.py: New AMD GPU monitoring module via amd-smi metric --json, mirroring nvidia.py structure (utilization, temperature, power, VRAM) - hardware.py: Branch to amd.py when IS_ROCM is True for GPU utilization, visible GPU queries, and physical GPU count - install_python_stack.py: Detect AMD GPUs on Windows and warn that ROCm-enabled PyTorch must be installed manually - kernels/utils.py: Expand is_rdna() to cover RDNA2 (gfx1030-1032), RDNA3 (gfx1102-1103), RDNA3.5 (gfx1150-1152) alongside existing entries - tests: Add 32 new tests covering all changes (95/95 pass) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden ROCm detection, fix VRAM heuristic, and expand RDNA2 coverage - Windows ROCm detection: validate actual GPU presence via hipinfo/amd-smi output markers instead of just checking tool existence 
on PATH - _ensure_rocm_torch: validate nvidia-smi actually reports a GPU before giving NVIDIA precedence (fixes AMD-only hosts with stale NVIDIA tools) - amd.py _parse_numeric: handle dict-shaped metric objects from newer amd-smi versions ({"value": 10, "unit": "W"}) and strip MiB/GiB units - amd.py VRAM heuristic: raise threshold from 100k to 10M to correctly handle MI300X (192 GB = 196608 MB) and other high-VRAM GPUs - amd.py visible GPU: use AMD-reported GPU IDs instead of enumerate index so non-dense sets like CUDA_VISIBLE_DEVICES=1,3 report correctly - install.sh: add ROCm <6.0 minimum version guard (no PyTorch wheels exist for older versions); fix rocm7.1* glob to not match rocm7.10+ - is_rdna: add gfx1033-1036 for RDNA2 mobile GPUs (RX 6600M etc.) - worker.py: increase ROCm source build timeout from 600s to 1800s; fix success log message for ROCm source builds - Tests: update mocks for _has_usable_nvidia_gpu, add RDNA2 target asserts * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add HIP_VISIBLE_DEVICES support, unit-aware VRAM parsing, Windows GPU validation - hardware.py: check HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES on ROCm before falling back to CUDA_VISIBLE_DEVICES, so multi-GPU AMD setups with HIP-specific env vars report the correct visible device set - amd.py: add _parse_memory_mb() that reads "unit" from dict-shaped amd-smi JSON (e.g. 
{"value": 192, "unit": "GiB"}) and converts to MB correctly; fixes MI300X VRAM misreported as 0.19 GB instead of 192 GB - install_python_stack.py: Windows AMD warning now validates actual GPU presence via hipinfo/amd-smi output markers before printing - install_llama_prebuilt.py: restore amdhip64.dll fallback for Windows HIP detection after tool-based checks, so Windows HIP installs without CLI tools on PATH are still detected - hardware.py: fix IS_ROCM comment to accurately describe its role * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix HIP_VISIBLE_DEVICES empty-string handling in GPU visibility spec Use explicit None checks instead of Python `or` operator when reading HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES, so that an empty string ("") is correctly honored as "no visible GPUs" rather than silently falling through to CUDA_VISIBLE_DEVICES on mixed ROCm+CUDA systems. * Fix IS_ROCM test assertion for multi-line formatting * Cap torchvision/torchaudio versions, remove amdhip64.dll fallback, fix visible GPU count - Cap torchvision<0.26.0 and torchaudio<2.11.0 alongside torch<2.11.0 in both install.sh and install_python_stack.py to prevent resolver from selecting incompatible companion packages from ROCm wheel index - Remove amdhip64.dll fallback in Windows ROCm detection (DLL presence without hipinfo/amd-smi is not proof of GPU existence) - Fix get_visible_gpu_count() to use _get_parent_visible_gpu_spec() which respects HIP_VISIBLE_DEVICES/ROCR_VISIBLE_DEVICES on ROCm hosts * Attribute is_rdna() RDNA2/3/3.5/4 expansion to PR #4428 The is_rdna() expansion to cover RDNA2 (gfx1030-1036), RDNA3 (gfx1100-1103), RDNA3.5 (gfx1150-1152), and RDNA4 (gfx1200-1201) architectures is based on the original work from PR #4428. 
Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com> Co-authored-by: billishyahao <bill.he@amd.com> * Support AMD Radeon for studio (#4770) Co-authored-by: Iswarya Alex <iswarya.alex@amd.com> * Remove ROCm test files from main PR Move test_rocm_support.py and shell test additions to a separate PR to keep the main ROCm support PR focused on implementation changes. * Fix installer and hardware detection issues for PR #4720 - Fix empty _tri_arg passed to uv pip install in Radeon path (causes "Empty field is not allowed for PEP508" error) - Fix Radeon fallback: use ROCm index instead of CPU-only when repo.radeon.com is unreachable (TORCH_INDEX_URL already has ROCm) - Use $TORCH_CONSTRAINT in fallback paths instead of hardcoded strings - Fix _pick_radeon_wheel: relax suffix to match manylinux_2_28_x86_64 wheels (AMD Radeon repo does not use bare linux_x86_64 platform tag) - Fix IS_ROCM export: use __getattr__ so callers always see the live value after detect_hardware() runs - Fix apply_gpu_ids: set HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES on ROCm so _get_parent_visible_gpu_spec picks up narrowed GPU set - Fix _parse_memory_mb: distinguish GB (1000 MB) from GiB (1024 MiB) - Add amd-smi version as a fallback in _detect_rocm_version - Fix trailing whitespace and missing newline at EOF in install.sh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix GPU detection false positives and add missing health groups - Fix _has_rocm_gpu() false positive: require "GPU: <number>" data rows from amd-smi list, not just header containing "gpu" - Apply same fix in detect_host() in install_llama_prebuilt.py - Add runtime_payload_health_groups for linux-rocm and windows-hip so partial/corrupt ROCm/HIP prebuilt installs are properly detected - Add bitsandbytes install to Radeon fallback paths (was only in the success path, skipped when repo.radeon.com was unreachable) - Keep DEVICE/CHAT_ONLY as direct imports in __init__.py 
(matching main) and only use __getattr__ for IS_ROCM * Fix _ensure_rocm_torch and Windows AMD warning false positives - _ensure_rocm_torch: only skip when HIP is already present, not for CUDA builds (which are unusable on AMD-only hosts). Fixes the case where a venv has a stale CUDA wheel and the repair step is skipped. - Windows AMD warning: use GPU data row check (same as Linux fix) to avoid false positives from amd-smi list header-only output. * Fix amd-smi GPU detection for GPU[N] output format Older amd-smi versions output "GPU[0] : Card series: ..." instead of "GPU: 0". The regex now matches both "GPU: <digit>" and "GPU[<digit>" formats to detect actual GPU data rows. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden AMD GPU detection against false positives - install.sh: replace weak amd-smi list check (awk 'NR>1 && NF') with strict pattern matching GPU data rows (/^GPU[[:space:]][:\[]/) - All files: reject rocminfo gfx000 (CPU HSA agent) by requiring gfx[1-9] instead of gfx[0-9] in the rocminfo GPU probe - Fixes false positives on hosts with ROCm tools but no AMD GPU Remove duplicate comment from pre-commit merge * Refactor: deduplicate AMD detection, consolidate bitsandbytes, clean up imports - Extract _has_amd_rocm_gpu() shell function to avoid duplicating the rocminfo/amd-smi GPU detection logic in get_torch_index_url and the Radeon auto-detect block - Consolidate bitsandbytes install into a single case block after torch install (was duplicated 4 times across Radeon success/fallback paths) - Move math and re imports to top of amd.py (were inline in functions) - Add _smi_query() helper in hardware.py to centralize IS_ROCM backend selection for get_gpu_utilization and get_visible_gpu_utilization Addresses Gemini code review suggestions. * Fix VRAM parsing for string values and GB/GiB consistency - Extract unit from string-valued VRAM fields (e.g. 
"192 GiB") so _parse_memory_mb correctly applies the unit multiplier instead of treating the value as bare MB - Treat GB and GiB identically (both as binary x1024) since GPU tools including amd-smi use binary units even when labeling them "GB" - Fixes incorrect VRAM reporting on MI300-class cards (was showing ~0.19 GB instead of 192 GB for string-valued outputs) * Add --no-cache to uv for ROCm HIP source builds Avoid stale cache artifacts from partial HIP source builds when uv is used for causal-conv1d/mamba-ssm compilation on ROCm. The pip path already uses --no-cache-dir; this adds the uv equivalent (--no-cache) only when is_hip is True. * Fix critical: initialize _amd_gpu_radeon before case block _amd_gpu_radeon was only set inside the /rocm) case arm, so on NVIDIA/CPU/macOS paths where TORCH_INDEX_URL does not contain "rocm", the variable was unbound. With set -u (nounset) enabled, this crashes the installer for every non-AMD user. Move initialization to before the case block so it is always defined. * Fix Windows AMD: route has_rocm hosts to HIP prebuilt path resolve_release_asset_choice was selecting windows-cpu for all Windows x86_64 hosts including those with has_rocm=True. Windows AMD users should fall through to resolve_upstream_asset_choice which tries the HIP prebuilt first. Add "not host.has_rocm" guard to the published windows-cpu selection. * Harden ROCm detection, Radeon wheel fallback, and HIP visibility Addresses review findings from parallel reviewers on PR #4720: - install.sh: add _has_usable_nvidia_gpu() helper requiring nvidia-smi -L to actually list a GPU before treating the host as NVIDIA. Fixes the stale-nvidia-smi-on-PATH regression where AMD-only hosts fell into the CUDA branch. - install.sh: fix hipconfig awk blocks to propagate a non-zero exit code when the output is not a recognisable version string, so the \|\|-chain continues to dpkg-query / rpm instead of terminating early. - install.sh: fail-closed on Radeon wheel fallback. 
When torch, torchvision or torchaudio is missing from the Radeon repo for the active Python tag, fall back to the standard ROCm index instead of silently mixing Radeon wheels with PyPI defaults. Quote all wheel arguments individually so wheel filenames cannot be word-split or glob-expanded. - install_llama_prebuilt.py: detect_host() now requires nvidia-smi -L to list a GPU before setting has_physical_nvidia. Routes AMD ROCm hosts with a broken leftover nvidia-smi to the ROCm path instead of misclassifying them as NVIDIA. - install_llama_prebuilt.py: scan upstream assets for any rocm-<version> prebuilt instead of hard-coding rocm-7.2, so ROCm 6.x / 7.0 / 7.1 / 7.3+ users pick up a matching upstream prebuilt when one exists. - install_llama_prebuilt.py: validate_server() adds --n-gpu-layers 1 for linux-rocm and windows-hip hosts, so new HIP prebuilts are preflighted on the GPU path instead of passing validation on CPU only. - install_llama_prebuilt.py: restore the published windows-cpu fallback for AMD Windows hosts without a HIP prebuilt so hash-approved bundles are still preferred over the raw upstream CPU asset. - install_python_stack.py: drop the /opt/rocm / hipcc gate in _ensure_rocm_torch() and rely on _has_rocm_gpu(). Runtime-only ROCm installs (package-managed minimal installs, Radeon software) that ship amd-smi / rocminfo without hipcc can now repair a CPU-only venv via "unsloth studio update". Adds an explicit IS_WINDOWS / IS_MACOS guard. - studio/backend/utils/hardware/amd.py: honour HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES in get_primary_gpu_utilization(). A process restricted to GPU 2 now reports metrics for GPU 2 instead of physical GPU 0. Tighten the plain bytes unit detection to an explicit allowlist. - studio/backend/utils/hardware/hardware.py: route get_backend_visible_gpu_info()'s backend_cuda_visible_devices field through a helper that reads HIP_VISIBLE_DEVICES on ROCm. 
Drop the unconditional "(rocm=False)" suffix in apply_gpu_ids() logs. * Fix round 2 regressions: ROCm validate_server and Windows HIP routing Follow-up to `810b833b` addressing review findings on the first round of hardening commits: - install_llama_prebuilt.py validate_server: gate --n-gpu-layers on the resolved install_kind instead of host.has_rocm. AMD Windows hosts without a HIP prebuilt fall back to windows-cpu and must not be validated with GPU layers; thread install_kind through from the caller. - install_llama_prebuilt.py resolve_release_asset_choice: reinstate the "not has_rocm" guard on the published windows-cpu bundle so AMD Windows hosts reach resolve_upstream_asset_choice() where the new HIP prebuilt path lives. Prefer a published windows-hip bundle first when one exists, fall through to upstream HIP + upstream CPU otherwise. - install_llama_prebuilt.py detect_host: also set has_physical_nvidia when the secondary --query-gpu block confirms a working NVIDIA GPU, so older nvidia-smi versions without -L support do not silently skip the Linux diagnostics that key off has_physical_nvidia. - install_llama_prebuilt.py: drop redundant "import re as _re" / "import re as _re_rocm" local aliases in favour of the existing top-level "import re". - install_python_stack.py _ensure_rocm_torch: run the AMD bitsandbytes install unconditionally after the HIP-torch probe so "unsloth studio update" on venvs that already have ROCm torch still gains the AMD bitsandbytes build. - install.sh: add a non-x86_64 early-exit to get_torch_index_url() so aarch64 / arm64 Linux hosts do not hit the ROCm wheel index (PyTorch only publishes ROCm wheels for linux_x86_64). - install.sh: add bitsandbytes install to the migrated-environment branch so upgrades pick it up for ROCm hosts instead of only the fresh-install path. 
- install.sh: in the Radeon wheel path, pass version constraints + --no-index --find-links to uv instead of explicit wheel URLs so a version-compatible torch / torchvision / torchaudio triple is resolved, rather than picking the highest-version wheel for each package independently. - studio/backend/utils/hardware/amd.py _first_visible_amd_gpu_id: fall through to lower-priority visibility env vars when the first entry is malformed (leading comma, all-whitespace first token) instead of silently returning GPU 0. * Fix round 3 findings: x86_64 guard, ROCm version clip, Radeon deps Address issues surfaced by the round 3 reviewers on top of `8636fa63`: - install_python_stack.py _ensure_rocm_torch: add the same `x86_64` guard that install.sh already has. Linux aarch64 / arm64 ROCm hosts must skip the repair path entirely; PyTorch only publishes ROCm wheels for linux_x86_64, and without this guard `unsloth studio update` aborts with a missing-wheel error on non x86_64 hosts. - install_llama_prebuilt.py resolve_upstream_asset_choice: add a best-effort _detect_host_rocm_version() helper (reading /opt/rocm/.info/version, amd-smi version, hipconfig --version) and filter rocm_candidates to entries whose major.minor is <= host version. Falls back to the newest candidate only when no compatible one exists, so a ROCm 6.4 host downloads rocm-6.4 instead of being handed the numerically newest rocm-7.2 bundle (which fails preflight and forces a source build). - install.sh: remove the round 2 --no-index switch from the Radeon wheel branch. --no-index forced uv to ignore PyPI entirely, which broke transitive dependency resolution (filelock, sympy, networkx, jinja2, fsspec, setuptools, typing-extensions, ...) on a fresh venv. Restore the round 1 explicit wheel URL invocation but add a torch / torchvision / torchaudio version-pair sanity check so a mismatched trio (e.g. 
torch 2.9.1 + torchvision 0.23.0 + torchaudio 2.9.0) falls back to the standard ROCm index instead of installing a broken combination. - install_python_stack.py _ensure_rocm_torch: restructure the "tag is None" path so it no longer short-circuits the bitsandbytes install. On a ROCm runtime older than anything in _ROCM_TORCH_INDEX, print the "no wheel" warning but still run the AMD bitsandbytes install. - studio/backend/core/training/worker.py: restore the pre-PR "no timeout" behaviour for non-HIP causal-conv1d / mamba-ssm source builds. The round 2 "timeout = 1800 if is_hip else 300" cap aborts slow non-HIP builds (Linux aarch64, unsupported torch/CUDA combos) after 5 minutes; omit timeout for the non-HIP branch so the cap only applies to ROCm source builds. * Fix round 4 findings: apply_gpu_ids env inheritance, Radeon X.Y, bitsandbytes gate Address remaining issues surfaced by the round 4 reviewers: - studio/backend/utils/hardware/hardware.py apply_gpu_ids: mirror the selection into HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES whenever the caller already had a ROCm visibility env var set, not only when IS_ROCM has already been set by detect_hardware(). Training and inference workers call apply_gpu_ids() before detect_hardware() runs, so the old guard would leave a forked ROCm worker with a stale HIP_VISIBLE_DEVICES mask that no longer matched the narrowed CUDA_VISIBLE_DEVICES selection. - install.sh get_radeon_wheel_url: accept X.Y ROCm versions in addition to X.Y.Z. The `/opt/rocm/.info/version` file and some hipconfig versions report only two components, and the Radeon repository publishes both rocm-rel-X.Y.Z/ and rocm-rel-X.Y/ directories, so treating X.Y as invalid caused Radeon hosts to fall back to the generic ROCm index even when a matching AMD wheel set existed. - install_python_stack.py _ensure_rocm_torch: only install the AMD bitsandbytes build when the venv actually has a ROCm-compatible torch (either already present or just installed by this function). 
Previously the bitsandbytes install ran unconditionally, which could leave an AMD bitsandbytes layered on top of a CPU/CUDA torch on hosts where the ROCm runtime is older than any entry in _ROCM_TORCH_INDEX. Also add --force-reinstall so an existing CPU/CUDA bitsandbytes is replaced by the AMD build during upgrades. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix gemini findings: amd-smi metric envelope validation and dict-wrapped GPU id Two medium-severity defensive fixes from the gemini-code-assist review on the AMD monitoring backend: 1. _extract_gpu_metrics may return a dict where every value is None when amd-smi succeeds (zero exit) but the JSON envelope contains no usable fields (error response, unsupported card). The new _has_real_metrics helper lets get_primary_gpu_utilization surface available:False and lets get_visible_gpu_utilization skip ghost device rows so the UI does not render placeholder cards with empty numbers. 2. Newer amd-smi versions wrap scalar fields as {"value": 0, "unit": "none"}, including the per-GPU id. The previous int(raw_id) call silently fell back to the enumeration index in that case, losing the real GPU id. Routing raw_id through the existing _parse_numeric helper handles bare ints, floats, strings, and the dict shape uniformly, with a debug log on parse failure. * Fix gemini round 2 findings: explicit length guard on ROCm version file parser Both _detect_rocm_version (install_python_stack.py) and _detect_host_rocm_version (install_llama_prebuilt.py) read /opt/rocm/.info/version or $ROCM_PATH/lib/rocm_version, split on "." and unconditionally accessed parts[1]. The surrounding broad `except Exception: pass` already swallowed the resulting IndexError, so a one-component file like "6\n" did fall through to the next detection source -- but the control flow relied on exception handling instead of an explicit check. 
Add `if len(parts) >= 2:` guards in both helpers so the loop falls through on its own without raising. Behaviour is unchanged for the common multi-component case; the previously-silent IndexError path becomes an explicit no-op. * Fix gemini round 3: include has_rocm in validate_server fallback path When validate_server is called without an explicit install_kind (older call sites that have not been updated), the fallback was only enabling --n-gpu-layers for NVIDIA and macOS arm64 hosts. AMD ROCm Linux hosts fell through to the CPU validation path even though the prebuilt being exercised was a HIP binary. Add host.has_rocm to the fallback expression so the GPU offload flag is applied consistently with the install_kind=='linux-rocm' / 'windows-hip' branches above. * Fix gemini round 4: remove risky bytes-vs-MB heuristic in _parse_memory_mb The previous heuristic divided any bare number above 10_000_000 by 1024*1024 on the assumption that large unit-less values were bytes. This misclassified small VRAM allocations: 5 MB of used VRAM reported as 5_242_880 bytes without a unit would be taken at face value and render as 5_242_880 MB (~5 TB) in the monitoring UI. Modern amd-smi always provides explicit units (MiB/GiB dict form), and legacy amd-smi returns bare numbers in MB -- the heuristic never had a real workload to handle. Drop it and default to MB for bare numeric input, keeping the existing unit-aware branches for dict / string inputs unchanged. The unrelated gemini suggestion to "default minor to 0" in the amd-smi version awk parser was intentionally NOT applied: rocm7.0 and rocm7.1 ship different wheel sets, so silently substituting 0 for a missing minor could install the wrong wheels. The existing reject-and-fall-through behaviour is safer. * Fix gemini round 5: POSIX compliance and leading-comma visibility parsing Three medium findings from gemini-code-assist addressed in this commit: 1.
_pick_radeon_wheel used grep -o and sort -V, both GNU extensions that are not in POSIX and break on BSD/BusyBox coreutils. install.sh has a #!/bin/sh shebang so the whole pipeline was rewritten as a single awk script that extracts all href="..." hits on each line, filters to wheels matching the package prefix and python tag, and picks the newest version via zero-padded lexical comparison. No external sort or grep is needed. 2. _first_visible_amd_gpu_id in the AMD monitoring backend treated a leading comma (e.g. HIP_VISIBLE_DEVICES=",1") as "fall through to the next env var", which is surprising given the clear intent to narrow to device 1. Filter empty tokens after the split and return the first real one. An all-commas value ("," / ",,,") still falls through because no real tokens exist; the empty-string and "-1" explicit-zero cases are unchanged. The unrelated amd-smi version awk parser suggestion was not applied (see round 4 commit message for rationale: defaulting a missing minor to 0 could silently install the wrong ROCm wheel set). * Fix 20-reviewer.py findings: base drift, Radeon %2B, dpkg/rpm fallback, bnb, backend label Consolidated fix batch from a 20-parallel reviewer.py run on the current head. Each fix is drawn from a high-consensus finding and addresses a real bug or feature gap, not a stylistic preference. 1. install.sh: bump `unsloth>=2026.4.2` -> `unsloth>=2026.4.4` at five call sites so this branch no longer regresses main's version floor (main bumped to 2026.4.4 in #4876). Without this, merging 4720 would silently downgrade the minimum version pin for fresh installs. 2. install.sh: URL-decode Radeon wheel names before extracting the torch / torchvision / torchaudio version strings. 
Real wheel URLs from repo.radeon.com are percent-encoded ("torch-2.10.0%2Brocm7.2.0...") so the previous `[+-]` terminator in the sed regex never matched, `_torch_ver` stayed empty, `_radeon_versions_match` stayed false, and every Radeon consumer install silently fell back to the generic ROCm index. Now decode %2B -> + first, then extract, then validate. 3. install.sh: the two AMD bitsandbytes install lines were running `uv pip install "bitsandbytes>=0.49.1"` without `--force-reinstall`, so upgrades where the venv already has a CPU/CUDA bitsandbytes satisfying the constraint would keep the stale non-AMD wheel. Add `--force-reinstall --no-cache-dir` to both call sites, matching the pattern already used in install_python_stack.py::_ensure_rocm_torch. 4. install_python_stack.py and install_llama_prebuilt.py: add `dpkg-query -W rocm-core` and `rpm -q rocm-core` fallbacks to the Python-side ROCm version detectors so they match the chain in install.sh::get_torch_index_url. Package-managed ROCm installs (Debian/Ubuntu/RHEL/Fedora distro packages) can expose GPUs via rocminfo/amd-smi but still lack /opt/rocm/.info/version, hipconfig, or amd-smi `version` output -- without these fallbacks, `unsloth studio update` on such hosts returned None and skipped the ROCm torch repair. Also strip the dpkg epoch prefix ("1:6.3.0-1") before parsing so epoch-annotated packages parse correctly. 5. hardware.py: add a `_backend_label(device)` helper that returns "rocm" when IS_ROCM is set and the device is DeviceType.CUDA, and use it for every `"backend": ...` emission in JSON responses served to the Studio frontend. Internally we still represent ROCm hosts as DeviceType.CUDA (ROCm torch reuses the whole torch.cuda.* API surface), but the user-facing API now correctly reports "rocm" on AMD boxes instead of labeling them as "cuda". 
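The %2B decoding described in item 2 can be illustrated with a minimal Python sketch. The real fix lives in install.sh and uses sed; the helper name `extract_torch_version`, the `decode` switch, and the regex below are assumptions made for illustration, not the project's code.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern: a version string terminated by "+" or "-",
# mirroring the `[+-]` terminator mentioned in the commit message.
_VERSION_RE = re.compile(r"^torch-([0-9][0-9.]*)[+-]")


def extract_torch_version(wheel_name, decode=True):
    """Return the torch version from a wheel name, or None if it cannot be parsed.

    With decode=False the percent-encoded "%2B" is left in place, which
    reproduces the failure mode where the terminator never matches.
    """
    name = unquote(wheel_name) if decode else wheel_name
    match = _VERSION_RE.match(name)
    return match.group(1) if match else None


if __name__ == "__main__":
    encoded = "torch-2.10.0%2Brocm7.2.0"
    print(extract_torch_version(encoded))                # decoded first, then parsed
    print(extract_torch_version(encoded, decode=False))  # terminator never matches
```

Decoding before extraction keeps the version regex simple; the alternative of teaching the regex about `%2B` would have to be repeated for every package prefix.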
All 250 simulation scenarios pass (was 233 before this batch: added 17 new regression tests covering the version pin, %2B decoding, bnb force-reinstall flags, dpkg/rpm fallback presence, and the _backend_label helper's four-way truth table). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix gemini round 6 + URL audit: amd.py defensive checks, rocm6.5+ clip to 6.4 Two rounds of fixes in one commit, plus a full URL audit of every PyPI / download.pytorch.org / repo.radeon.com reference the PR introduces. amd.py (4 medium gemini findings on commit `b3627bc2`): 1. _extract_gpu_metrics used `and vram_total_mb` as part of the vram_util gate. The follow-up `vram_total_mb > 0` already handles the division guard, but the truthiness check was redundant and slightly surprising for a 0.0 valid value. Replace with explicit `is not None and > 0` for both vram_util and power_util. 2. get_physical_gpu_count called `data.get("gpu", ...)` without guarding for non-dict envelopes. A scalar / string JSON response from amd-smi would raise AttributeError. Add an isinstance(data, dict) check and return None for unexpected shapes. 3. get_visible_gpu_utilization had the same .get() exposure on the outer envelope. Rewrite the gpu_list extraction as an explicit list/dict/else cascade so a malformed scalar envelope produces gpu_list=[data] and continues without raising. 4. The same function's per-entry loop also called gpu_data.get() on whatever was inside gpu_list. If a scalar ever leaks into the list (directly or via the previous fix's fallback), _extract_gpu_metrics would raise on the first .get() inside the helper. Skip non-dict entries in the loop before extracting metrics. install.sh (URL audit finding, previously flagged by 20-reviewer as #13): 5. 
get_torch_index_url used `rocm6.*` in the rocm tag case statement, which matched rocm6.5 and rocm6.6 and emitted download.pytorch.org/whl/rocm6.5 -- which returns HTTP 403 because PyTorch only publishes rocm 5.7, 6.0-6.4, 7.0-7.2. Enumerate the supported 6.x minors explicitly and add a `rocm6.*` fallback branch that clips to rocm6.4 (the last supported 6.x wheel set). URL audit results (all URLs PR 4720 references): - 14/14 download.pytorch.org/whl/{cpu,cu118,cu124,cu126,cu128,cu130, rocm6.0..6.4,rocm7.0..7.2} return HTTP 200. - 9/9 repo.radeon.com/rocm/manylinux/rocm-rel-{5.7,6.0,6.1,6.2,6.3, 6.4,7.0,7.1,7.2}/ return HTTP 200. - X.Y.Z patch directories exist for 7.0.2, 7.1.1, 7.2.1 but NOT for 6.3.0, 6.4.0, 6.2.1 -- install.sh already handles this via the X.Y.Z -> X.Y fallback sed in the Radeon wheel install block. - Docs links (rocm.docs.amd.com, docs.unsloth.ai AMD guide) and the llama.cpp GitHub releases API endpoint all return 200. Test suite: 255 -> 258. New regression coverage: - U17: get_physical_gpu_count tolerates scalar amd-smi envelope - U18: get_visible_gpu_utilization tolerates scalar envelope - U19a-c: vram_util / power_util return None on zero total, but vram_total_gb still echoes 0.0 (not None) - A_rocm{6.5,6.6,6.9}_clips_to_rocm64: install.sh clips unsupported 6.x minors to rocm6.4 instead of producing a 403 index URL * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix reviewer.py round 2: tokenizer AMD multi-GPU, --no-torch bnb, main.py backend label Three high-confidence findings from a second 20-parallel reviewer.py run on commit `7effb3ae`. Triaged 15 total findings and applied the three that were confirmed as real bugs; the rest were either false positives (e.g. "migrated AMD venv not repaired" -- _ensure_rocm_torch runs downstream via setup.sh regardless), design decisions (e.g.
visibility mask env vars not consulted in installer detection), or edge cases the existing fallback logic already handles. 1. unsloth/tokenizer_utils.py [6/20]: the multi-GPU guard's shell probe runs `nvidia-smi --query-gpu=memory.used`, catches the failure, then only raises if `torch.cuda.is_available()` is False. On ROCm torch, torch.cuda.is_available() returns True (ROCm reuses the torch.cuda.* API), so the guard becomes dead code on AMD hosts and multi-GPU AMD setups slip through even though unsloth does not support them yet. Add a torch.cuda.device_count() > 1 fallback inside the except so AMD multi-visible-device setups are flagged consistently with the original CUDA memory check. 2. install.sh [1/20]: the fresh-install bitsandbytes block for AMD ROCm ran unconditionally when TORCH_INDEX_URL matched `/rocm`, even when SKIP_TORCH=true (from --no-torch or Intel Mac auto-detect). A user running `install.sh --no-torch` on an AMD host would still pull in bitsandbytes despite explicitly asking for GGUF-only mode. Wrap the case block in an outer `[ "$SKIP_TORCH" = false ]` guard. 3. studio/backend/main.py [3/20]: the /api/system endpoint returned `"device_backend": get_device().value`, which is "cuda" on ROCm hosts (because ROCm torch piggybacks on torch.cuda). Other endpoints (hardware.py) already use the _backend_label helper which swaps "cuda" -> "rocm" when IS_ROCM. Route /api/system through the same helper so the Studio UI reports the backend consistently across all endpoints. 4. studio/backend/tests/test_utils.py: update test_backend_matches_device to call _backend_label(get_device()) instead of raw get_device().value so the test matches the new contract and still passes on CUDA hosts. Tests: 258 -> 261. 
New regression coverage: - X08 main.py /api/system uses _backend_label - X09 tokenizer multi-GPU guard has device_count() fallback - X10 fresh-install bnb case block gated on SKIP_TORCH=false * fix: prevent bitsandbytes from overwriting ROCm torch with CUDA wheels During install, bitsandbytes was installed without --no-deps, causing uv to resolve torch from PyPI (CUDA build) and silently overwrite the ROCm wheels that were just installed in the previous step. This happened in three places: - install.sh: bitsandbytes install in both migrated and fresh paths - install_python_stack.py: bitsandbytes install inside _ensure_rocm_torch() Additionally, multiple install steps in install_python_stack.py (extras, overrides, studio deps) can pull in CUDA torch via transitive dependencies. A final _ensure_rocm_torch() call at the end of the install sequence ensures ROCm torch is always in place at runtime. All changes are gated behind ROCm-specific conditions and do not affect NVIDIA, CPU-only, macOS, or Windows install paths. Tested on AMD Instinct MI300X VF with ROCm 7.2.0 -- confirms torch==2.10.0+rocm7.1 with HIP 7.1.25424 after install. * fix: ROCm inference fallback -- skip Unsloth patching and bnb 4-bit on HIP On AMD ROCm (HIP), two issues prevent the normal Unsloth inference path: 1. Unsloth's global monkey-patching of transformers model classes (LlamaRotaryEmbedding, attention modules) triggers _assert_async_cuda_kernel crashes on HIP during generation. Training uses different code paths and works fine. 2. bitsandbytes 4-bit matmul kernels also trigger HIP assertion failures on MI300X (CDNA3 / gfx942), even without Unsloth patching. This commit adds a ROCm-specific inference fallback that: - Skips importing Unsloth at module level (prevents global patching) - Loads models in 16-bit with plain transformers + PEFT instead - Resolves pre-quantized model names (e.g. 
"xxx-bnb-4bit" -> "xxx") since pre-quantized HF repos still trigger bnb codepaths - Guards get_chat_template calls (unavailable without Unsloth import) - Fixes max_seq_length=0 being passed to from_pretrained (GGUF semantics don't apply to transformers path) The NVIDIA path is completely unchanged -- Unsloth import and for_inference() optimization remain active. GGUF inference (via llama-server/HIP) is unaffected since it never imports Python model classes. AMD GPUs typically have large VRAM (e.g. 192GB on MI300X) so 16-bit loading is practical for inference. Tested on AMD Instinct MI300X VF (ROCm 7.2, HIP 7.1.25424): - Simple generation: PASS - Compare mode (base vs finetuned): PASS - GGUF inference + tool calling: PASS (unaffected by this change) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: guard audio/vision inference on ROCm, remove unused import - Add clear RuntimeError for audio/vision model inference on ROCm (these paths use Unsloth's FastModel/FastVisionModel which would crash on HIP; GGUF inference is the supported path on AMD) - Remove unused `import os as _os` from the ROCm changes * fix: amd-smi parsing for newer output format (gpu_data wrapper, mem_usage, temperature) amd-smi on recent ROCm versions (7.x) wraps metric output in a {"gpu_data": [...]} envelope instead of returning a raw list. This caused get_primary_gpu_utilization() and get_visible_gpu_utilization() to fail silently (returning available=False) because the GPU data dict was never unwrapped. Additionally: - VRAM data moved from "vram" to "mem_usage" with "total_vram" / "used_vram" keys. Added fallback key lookup. - Temperature "edge" sensor returns "N/A" on MI300X VF; the previous dict.get() chain returned the "N/A" string instead of falling through to "hotspot". Changed to a loop that checks each key until a parseable value is found. 
Tested on AMD Instinct MI300X VF (ROCm 7.2, amd-smi 24.x): - GPU utilization: 0% (idle), up to 100% during training - Temperature: 40-44C (from hotspot sensor) - VRAM: 0.28/191.69 GB (idle) - Power: 158-211W draw * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Bug fix detecting radeon (#4940) * Bug fix detecting radeon * Expanding GPU target for gfx1100* * Generalize gfx family-prefix filter to cover gfx10/gfx12 as well rocminfo on ROCm 6.1+ emits LLVM generic-family ISA lines alongside the specific GPU (e.g. gfx11-generic next to gfx1100). The outer grep captures the bare family prefix from the generic line, and passing that to -DGPU_TARGETS breaks the HIP build because clang only accepts specific gfxNNN ids. The previous filter only special-cased gfx11. Generalize it so any bare 2-digit family prefix (gfx10, gfx11, gfx12, ...) is dropped whenever a specific sibling target is present in the same list. No real AMD GPU has a 2-digit gfx id, so the filter can only ever drop family prefixes and never a real target. Covers the existing gfx11 cases unchanged, and extends the same fix to gfx10-1-generic / gfx10-3-generic (RDNA1/2) and gfx12-generic (RDNA4), which would otherwise hit the same build failure on newer rocminfo. --------- Co-authored-by: Iswarya Alex <iswarya.alex@amd.com> Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> --------- Co-authored-by: Eda Z <eda.zhou@amd.com> Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: billishyahao <bill.he@amd.com> Co-authored-by: Iswarya Alex <47045679+iswaryaalex@users.noreply.github.com> Co-authored-by: Iswarya Alex <iswarya.alex@amd.com> Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-04-10 01:56:12 -07:00
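The gfx family-prefix filter described in #4940 can be sketched in Python as follows. The actual filter is part of a shell pipeline around rocminfo; the function name `drop_family_prefixes` and the regex are hypothetical and only illustrate the rule that a bare 2-digit family prefix is dropped when a specific sibling target is present.

```python
import re

# Assumption: a "family prefix" is exactly "gfx" followed by two digits
# (gfx10, gfx11, gfx12, ...). Real targets such as gfx1100, gfx942 or
# gfx90a never match this pattern.
_FAMILY_PREFIX = re.compile(r"^gfx\d{2}$")


def drop_family_prefixes(targets):
    """Drop bare family prefixes that have a specific sibling in the same list."""
    specific = [t for t in targets if not _FAMILY_PREFIX.match(t)]
    kept = []
    for target in targets:
        is_family = bool(_FAMILY_PREFIX.match(target))
        if is_family and any(s.startswith(target) for s in specific):
            continue  # e.g. gfx11 next to gfx1100: keep only gfx1100
        if target not in kept:
            kept.append(target)  # preserve order, drop duplicates
    return kept


if __name__ == "__main__":
    print(drop_family_prefixes(["gfx11", "gfx1100"]))
    print(drop_family_prefixes(["gfx10", "gfx1030", "gfx942"]))
```

A prefix with no specific sibling is kept in this sketch, so the filter can only remove entries when a more specific target is available to build for.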
Roland Tannous	33503ea248	Revert "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945 ) This reverts commit `bcf4fd6bd3`.	2026-04-09 23:14:57 -07:00
Roland Tannous	bcf4fd6bd3	updated models template mappers. added lfm2.5vl450m to transformers 5… (#4939 ) * updated models template mappers. added lfm2.5vl450m to transformers 5.3.0 whitelist * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-09 23:36:42 +04:00
Ricardo-M-L	d5525e8bbb	fix: check find() return value before adding offset in try_fix_tokenizer (#4923 ) * fix: check find() return value before adding offset in try_fix_tokenizer The `str.find()` result was checked for -1 only after adding `len(find_text)`, turning the guard into dead code. When the substring is absent, `start` becomes `len(find_text) - 1` (a positive number), so the `if start == -1: continue` never triggers and the subsequent slice extracts garbage from the tokenizer string. Split the find and offset into two steps so the -1 check works correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add defensive guards for token_id None and end find() returning -1 - Skip loop iteration early when token_id is None to avoid constructing a find_text that can never match valid JSON - Guard end = tokenizer_string.find('",', start) against -1 to prevent silent garbage extraction from malformed tokenizer strings * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-09 06:15:46 -07:00
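The find-then-offset ordering bug in #4923 is easy to reproduce in isolation. The sketch below is not the code from try_fix_tokenizer; `extract_after` and its parameters are hypothetical names used to show the two-step pattern the commit describes, including the extra guard on the closing `find()`.

```python
def extract_after(haystack, find_text, terminator='",'):
    """Return the text between find_text and terminator, or None if either is absent."""
    start = haystack.find(find_text)
    if start == -1:
        # Check BEFORE adding the offset; otherwise -1 + len(find_text)
        # becomes a non-negative index and the guard is dead code.
        return None
    start += len(find_text)

    end = haystack.find(terminator, start)
    if end == -1:
        return None  # malformed input: no closing terminator
    return haystack[start:end]


if __name__ == "__main__":
    text = '{"id": 5, "content": "<eos>", "x": 1}'
    print(extract_after(text, '"content": "'))
    print(extract_after(text, '"missing": "'))
    # The buggy ordering: the sum hides the -1 sentinel.
    print("abc".find("zz") + len("zz"))
```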
Lee Jackson	dc16e0c65b	Studio: keep chat input visible and fix compare pane clipping (#4924 ) * fix(chat): sticky composer bar in thread * fix(chat): fix compare pane clipping * fix(chat): tighten scroll-to-bottom placement and compare footer spacing * Fix TypeScript build break and clean up ViewportFooter classes - Remove unused `compact` prop from ThreadScrollToBottom call site (component is FC with no props, passing it caused TS2322) - Extract shared classes (sticky, bottom-0, z-20, bg-transparent) from ternary branches into the unconditional className string - Restore `relative` on normal-mode footer so the inner absolute bg-background strip has a positioning context - Remove redundant md:pb-3 / md:pb-4 (same value as base pb-3 / pb-4) - Remove no-op `sticky bottom-0` from SharedComposer wrapper in both LoraCompareContent and GeneralCompareContent (flex layout with shrink-0 already pins it at the bottom; parent has no scrollable overflow for sticky to bind to) - Fix truncated comment on pointer-events rationale --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-09 06:00:56 -07:00
kiankyars	ad5972492d	Fix raw text paragraph break normalization (#4884 ) * Fix raw text paragraph break normalization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Normalize horizontal whitespace before stripping non-ASCII and collapse leftover doubles Run the [^\S\n]+ horizontal-whitespace collapse before the non-ASCII strip so that Unicode whitespace (\u00A0, \u202F, \u2009, \u3000, \v, \f, etc.) becomes a single ASCII space instead of being deleted outright. The prior ordering silently merged adjacent words on HTML/PDF/OCR-sourced text: "hello\u00a0world" used to produce "helloworld" after this PR; it now produces "hello world". Also drop \t from the allow-list since the horizontal-whitespace collapse already normalizes tabs to a single space, and add a targeted [ ]{2,} pass right after the non-ASCII strip so that a non-whitespace non-ASCII character sitting between two spaces ("word1 (c) word2") does not leave an interior double space. Without this extra pass, clean_text was not idempotent on such inputs: the first call produced "word1  word2" (with a double space) and only the second call collapsed it to "word1 word2". Fuzz testing over 10000 random inputs now satisfies the idempotence invariant in every case. * Add regression tests for Unicode/control whitespace and non-ASCII edge cases Cover: - Unicode horizontal whitespace separators (NBSP, narrow NBSP, thin space, en/em space, ideographic space, vertical tab, form feed) normalizing to a single ASCII space instead of being deleted. - Mixed paragraph + Unicode whitespace realistic input ("Section\u00a01\r\n\r\nBody\ftext\u202Fhere"). - Tab collapsing and space trimming around newlines. - Non-whitespace non-ASCII characters (copyright, accented letters, emoji) sitting between spaces: must not leave an interior double space, and clean_text must be idempotent on these inputs. 
- Non-ASCII characters adjacent to a newline: stripping must not leave stray leading or trailing spaces on the neighbouring line, and must not swallow an adjacent paragraph break. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-09 04:45:43 -07:00
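The ordering described in #4884 (collapse horizontal whitespace first, strip non-ASCII second, then collapse leftover double spaces) can be sketched as below. This is not the project's clean_text; the function name `clean_text_sketch`, the CRLF normalization step, and the exact allow-list are assumptions chosen to match the behaviour the commit message describes.

```python
import re


def clean_text_sketch(text):
    """Illustrative whitespace/non-ASCII normalization; not the real clean_text."""
    # Assumption: line endings are normalized to "\n" up front.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # 1. Collapse any run of non-newline whitespace (NBSP, thin space, tab,
    #    \v, \f, ...) into one ASCII space BEFORE stripping non-ASCII.
    text = re.sub(r"[^\S\n]+", " ", text)
    # 2. Strip everything outside printable ASCII and newline (no \t needed,
    #    tabs were already turned into spaces in step 1).
    text = re.sub(r"[^\x20-\x7E\n]", "", text)
    # 3. A stripped character that sat between two spaces leaves a double
    #    space behind; collapse it so the function is idempotent.
    text = re.sub(r"[ ]{2,}", " ", text)
    # 4. Trim single spaces hugging a newline.
    text = re.sub(r" ?\n ?", "\n", text)
    return text.strip()


if __name__ == "__main__":
    print(repr(clean_text_sketch("hello\u00a0world")))
    once = clean_text_sketch("word1 \u00a9 word2")
    print(repr(once), clean_text_sketch(once) == once)
```

Swapping steps 1 and 2 reproduces the word-merging behaviour the commit fixes, because the NBSP would be deleted before it could be turned into a space.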
cheehook	7aa442289b	Fix Mistral DPO/preference training crash on non-xformers platforms (e.g. Intel XPU) (#4889 ) * Fix Mistral training crash when xformers is unavailable * Fix/adjust Mistral DPO training crash fix for PR #4889 - Clarify comment in MistralForCausalLM_fast_forward: the DPO embed-masking block runs BEFORE attention_mask is nulled out, and it is the consumer that requires a 2D mask. - Add defensive attention_mask.ndim == 2 guard to the LlamaModel_fast_forward DPO embed-masking block so it self-protects if a 4D mask ever reaches it. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-09 04:38:44 -07:00
Daniel Han	da2ef6dce6	Only run ldconfig CUDA-linking recovery when we have permission (#4930 ) * Only run ldconfig CUDA-linking recovery when we have permission When `import unsloth` runs on a non-root environment (shared HPC, locked-down container, CI runner, etc.) the CUDA-linking recovery path shells out to `os.system("ldconfig /usr/lib64-nvidia")`, which fails loudly with "Permission denied". It's especially noisy for users who don't even have bitsandbytes installed - they're doing 16bit or full finetuning and the line immediately above told them "16bit and full finetuning works!". The reason the recovery runs at all in that case is that `bnb.functional.lib.cdequantize_blockwise_fp32` raises AttributeError on `bnb is None`, the bare `except:` swallows it, and the code drops into the recovery unconditionally. Fix: gate the recovery body on `os.geteuid() == 0`. When we don't have permission to run ldconfig, silently skip the recovery. When we do, the recovery runs UNCHANGED - same `os.system()` calls, same reload + retry, same warnings. `libcuda_dirs()` is used by both triton and bitsandbytes, so we still want to run the recovery whenever we have permission, regardless of whether bnb is installed. For non-root users who DO have bitsandbytes installed and broken, emit a single remediation warning telling them how to fix it manually (`sudo ldconfig /usr/lib64-nvidia`). This preserves the diagnostic guidance from the original code without the Permission denied noise. Scope: - Only the `DEVICE_TYPE == "cuda"` branch is touched. - The `hip` (AMD ROCm) and `xpu` (Intel) branches are unchanged. - On a real CUDA box running as root, behavior is byte-identical to main: same os.system() calls, same reload, same retry, same warnings. AST-verified by /tmp/verify_minimal/verify.py. - `hasattr(os, "geteuid")` guards against Windows where `os.geteuid` doesn't exist. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <info@unsloth.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-09 00:07:25 -07:00
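The permission gate from #4930 can be sketched as follows. This is a simplified illustration rather than the code in unsloth's import path: `has_ldconfig_permission`, `recover_cuda_linking`, the injected `run` callable, and the returned status strings are all hypothetical; only the `os.geteuid() == 0` check, the `hasattr` guard for Windows, and the ldconfig command come from the commit message.

```python
import os


def has_ldconfig_permission():
    """True only when running as root on a platform that has os.geteuid."""
    # os.geteuid does not exist on Windows, hence the hasattr guard.
    return hasattr(os, "geteuid") and os.geteuid() == 0


def recover_cuda_linking(bnb_installed, run=os.system, is_root=None):
    """Run the ldconfig recovery only with permission; otherwise skip or advise.

    `run` and `is_root` are injectable so the sketch can be exercised
    without actually invoking ldconfig.
    """
    if is_root is None:
        is_root = has_ldconfig_permission()
    if is_root:
        run("ldconfig /usr/lib64-nvidia")
        return "ran"
    if bnb_installed:
        # Non-root with bitsandbytes present: one remediation hint, no noise.
        return "warn: run `sudo ldconfig /usr/lib64-nvidia` manually"
    return "skipped"


if __name__ == "__main__":
    calls = []
    print(recover_cuda_linking(False, run=calls.append, is_root=False), calls)
    print(recover_cuda_linking(True, run=calls.append, is_root=True), calls)
```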
dependabot[bot]	5fa8683b27	build(deps): bump the bun-frontend group across 1 directory with 16 updates (#4586 ) * build(deps): bump the bun-frontend group across 1 directory with 16 updates Bumps the bun-frontend group with 16 updates in the /studio/frontend directory: \| Package \| From \| To \| \| --- \| --- \| --- \| \| [@dagrejs/dagre](https://github.com/dagrejs/dagre) \| `2.0.4` \| `3.0.0` \| \| [@dagrejs/graphlib](https://github.com/dagrejs/graphlib) \| `3.0.4` \| `4.0.1` \| \| @hugeicons/core-free-icons \| `3.3.0` \| `4.0.0` \| \| [@streamdown/cjk](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-cjk) \| `1.0.2` \| `1.0.3` \| \| [@streamdown/code](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-code) \| `1.0.2` \| `1.1.1` \| \| [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) \| `0.577.0` \| `1.6.0` \| \| [recharts](https://github.com/recharts/recharts) \| `3.7.0` \| `3.8.0` \| \| [shadcn](https://github.com/shadcn-ui/ui/tree/HEAD/packages/shadcn) \| `3.8.5` \| `4.1.0` \| \| [streamdown](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown) \| `2.3.0` \| `2.5.0` \| \| [@biomejs/biome](https://github.com/biomejs/biome/tree/HEAD/packages/@biomejs/biome) \| `1.9.4` \| `2.4.8` \| \| [@eslint/js](https://github.com/eslint/eslint/tree/HEAD/packages/js) \| `9.39.4` \| `10.0.1` \| \| [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) \| `24.12.0` \| `25.5.0` \| \| [eslint](https://github.com/eslint/eslint) \| `9.39.4` \| `10.1.0` \| \| [eslint-plugin-react-refresh](https://github.com/ArnaudBarre/eslint-plugin-react-refresh) \| `0.4.26` \| `0.5.2` \| \| [globals](https://github.com/sindresorhus/globals) \| `16.5.0` \| `17.4.0` \| \| [typescript](https://github.com/microsoft/TypeScript) \| `5.9.3` \| `6.0.2` \| Updates `@dagrejs/dagre` from 2.0.4 to 3.0.0 - [Release notes](https://github.com/dagrejs/dagre/releases) - 
[Changelog](https://github.com/dagrejs/dagre/blob/master/changelog.md) - [Commits](https://github.com/dagrejs/dagre/compare/v2.0.4...v3.0.0) Updates `@dagrejs/graphlib` from 3.0.4 to 4.0.1 - [Release notes](https://github.com/dagrejs/graphlib/releases) - [Changelog](https://github.com/dagrejs/graphlib/blob/master/changelog.md) - [Commits](https://github.com/dagrejs/graphlib/compare/v3.0.4...v4.0.1) Updates `@hugeicons/core-free-icons` from 3.3.0 to 4.0.0 Updates `@streamdown/cjk` from 1.0.2 to 1.0.3 - [Release notes](https://github.com/vercel/streamdown/releases) - [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-cjk/CHANGELOG.md) - [Commits](https://github.com/vercel/streamdown/commits/@streamdown/cjk@1.0.3/packages/streamdown-cjk) Updates `@streamdown/code` from 1.0.2 to 1.1.1 - [Release notes](https://github.com/vercel/streamdown/releases) - [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-code/CHANGELOG.md) - [Commits](https://github.com/vercel/streamdown/commits/@streamdown/code@1.1.1/packages/streamdown-code) Updates `lucide-react` from 0.577.0 to 1.6.0 - [Release notes](https://github.com/lucide-icons/lucide/releases) - [Commits](https://github.com/lucide-icons/lucide/commits/1.6.0/packages/lucide-react) Updates `recharts` from 3.7.0 to 3.8.0 - [Release notes](https://github.com/recharts/recharts/releases) - [Changelog](https://github.com/recharts/recharts/blob/main/CHANGELOG.md) - [Commits](https://github.com/recharts/recharts/compare/v3.7.0...v3.8.0) Updates `shadcn` from 3.8.5 to 4.1.0 - [Release notes](https://github.com/shadcn-ui/ui/releases) - [Changelog](https://github.com/shadcn-ui/ui/blob/main/packages/shadcn/CHANGELOG.md) - [Commits](https://github.com/shadcn-ui/ui/commits/shadcn@4.1.0/packages/shadcn) Updates `streamdown` from 2.3.0 to 2.5.0 - [Release notes](https://github.com/vercel/streamdown/releases) - 
[Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown/CHANGELOG.md) - [Commits](https://github.com/vercel/streamdown/commits/streamdown@2.5.0/packages/streamdown) Updates `@biomejs/biome` from 1.9.4 to 2.4.8 - [Release notes](https://github.com/biomejs/biome/releases) - [Changelog](https://github.com/biomejs/biome/blob/main/packages/@biomejs/biome/CHANGELOG.md) - [Commits](https://github.com/biomejs/biome/commits/@biomejs/biome@2.4.8/packages/@biomejs/biome) Updates `@eslint/js` from 9.39.4 to 10.0.1 - [Release notes](https://github.com/eslint/eslint/releases) - [Commits](https://github.com/eslint/eslint/commits/v10.0.1/packages/js) Updates `@types/node` from 24.12.0 to 25.5.0 - [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node) Updates `eslint` from 9.39.4 to 10.1.0 - [Release notes](https://github.com/eslint/eslint/releases) - [Commits](https://github.com/eslint/eslint/compare/v9.39.4...v10.1.0) Updates `eslint-plugin-react-refresh` from 0.4.26 to 0.5.2 - [Release notes](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/releases) - [Changelog](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/blob/main/CHANGELOG.md) - [Commits](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/compare/v0.4.26...v0.5.2) Updates `globals` from 16.5.0 to 17.4.0 - [Release notes](https://github.com/sindresorhus/globals/releases) - [Commits](https://github.com/sindresorhus/globals/compare/v16.5.0...v17.4.0) Updates `typescript` from 5.9.3 to 6.0.2 - [Release notes](https://github.com/microsoft/TypeScript/releases) - [Commits](https://github.com/microsoft/TypeScript/compare/v5.9.3...v6.0.2) --- updated-dependencies: - dependency-name: "@dagrejs/dagre" dependency-version: 3.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@dagrejs/graphlib" 
dependency-version: 4.0.1 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@hugeicons/core-free-icons" dependency-version: 4.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@streamdown/cjk" dependency-version: 1.0.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: bun-frontend - dependency-name: "@streamdown/code" dependency-version: 1.1.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: lucide-react dependency-version: 1.6.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: recharts dependency-version: 3.8.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: shadcn dependency-version: 4.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: streamdown dependency-version: 2.5.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: "@biomejs/biome" dependency-version: 2.4.8 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@eslint/js" dependency-version: 10.0.1 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@types/node" dependency-version: 25.5.0 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: eslint dependency-version: 10.1.0 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: 
eslint-plugin-react-refresh dependency-version: 0.5.2 dependency-type: direct:development update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: globals dependency-version: 17.4.0 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: typescript dependency-version: 6.0.2 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend ... Signed-off-by: dependabot[bot] <support@github.com> * Revert dagrejs upgrades Keep @dagrejs/dagre at ^2.0.4 and @dagrejs/graphlib at ^3.0.4. * Revert biome, eslint, typescript, and recharts upgrades These upgrades break studio/frontend locally: - @biomejs/biome 2.4.10 fails to parse the existing biome.json (files.ignore and organizeImports keys removed in v2; schema version mismatch). - typescript 6.0.2 emits TS5101 on tsconfig.app.json baseUrl ("Option 'baseUrl' is deprecated and will stop functioning in TypeScript 7.0"), so tsc -b exits 2. - eslint 10.2.0 conflicts with eslint-plugin-react-hooks@7.0.1, which peers on eslint ^9; npm install fails with ERESOLVE. - recharts 3.8.1 widened LegendPayload.dataKey to include a function type, which breaks the React key={item.dataKey} usage in src/components/ui/chart.tsx (TS2322). Hold these at their current pinned versions until the upstream peer deps and config migrations are ready. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-08 04:34:33 -07:00
Wasim Yousef Said	8e977445d4	Let recipes use the model loaded in Chat (#4840 ) * feat: inject local model provider into recipe jobs via JWT * feat: auto-generate JWT for local model providers in recipes * feat: add is_local flag to model provider config types and utils * fix(studio): skip endpoint validation for local providers * feat(studio): add local/external model source toggle to provider dialog * feat(studio): thread localProviderNames through model config dialog chain * feat(studio): show 'Local model (Chat)' label for local model_provider configs * fix: hardcode loopback for local endpoint, clear stale creds on toggle * fix: document TOCTOU/JWT rotation, add deferred import comments, fix is_local serialization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(studio): clear stale local model state on provider toggle and validation * fix(studio): override empty local endpoint in validation and skip model gate for unused providers * fix(studio): resolve loopback port from app.state, clear stale local provider fields, sync model id on toggle Address review feedback on the local-model-provider flow: - Backend (jobs.py): _resolve_local_v1_endpoint now reads the actual bound port from app.state.server_port (set in run.py after binding) instead of parsing it out of request.base_url, which is wrong behind any reverse proxy or non-default port. The two duplicated urlparse blocks are gone. - Backend (jobs.py): defensively pop api_key_env, extra_headers, extra_body from local providers so a previously external provider that flipped to local cannot leak invalid JSON or rogue auth headers into the local /v1 call. Also dedupe the post-loop assignment and tighten the local-name intersection so empty names cannot match. - Backend (jobs.py): hoist datetime and urllib.parse imports to the top import block for consistency with the rest of the file. 
- Backend (run.py): expose the bound port on app.state.server_port after the uvicorn server is constructed. - Frontend (model-provider-dialog.tsx): clear extra_headers and extra_body when toggling to local mode. Hidden inputs would otherwise keep stale JSON blocking validate/run. - Frontend (model-config-dialog.tsx): factor the local-aware provider selection logic into applyProviderChange and call it from both onValueChange and onBlur, so manually typing a provider name and tabbing away keeps the model field consistent. - Frontend (recipe-studio.ts store): handle both directions of the is_local toggle in the cascade. external -> local now backfills model: "local" on already-linked model_configs so they pass validation immediately, mirroring the existing local -> external clear path. - Frontend (validate.ts + build-payload.ts): thread localProviderNames into validateModelConfigProviders and skip the "model is required" check for local-linked configs. Local providers do not need a real model id since the inference endpoint uses the loaded Chat model. * fix(studio): narrow store cascade types, sync model placeholder on graph relink and node removal, harden ephemeral port path Loop 2 review fixes: - recipe-studio.ts: type-narrow next.is_local by also checking next.kind === "model_provider". TS otherwise raised TS2339 because next was typed as the union NodeConfig after the spread. The behavior is unchanged but the code now compiles cleanly. - model-config-dialog.tsx: convert the lastProviderRef / providerInputRef ref-during-render pattern (pre-existing react-hooks/refs lint error) to a useEffect that syncs providerInputRef from config.provider. The combobox blur path still uses applyProviderChange and remains stable. 
- recipe-graph-connection.ts: when a graph drag links a model_provider to a model_config, mirror the dialog applyProviderChange behavior: fill model: "local" if the new provider is local and the model field is blank, clear model when relinking from a local placeholder to an external provider, otherwise leave the model alone. - reference-sync.ts: when a referenced provider node is removed, clear the synthetic model: "local" placeholder along with the provider field, so a future relink to an external provider does not pass validation with a stale value that fails at runtime. - run.py: only publish app.state.server_port when the bound port is a real positive integer; for ephemeral binds (port==0) leave it unset and let request handlers fall back to request.base_url. - jobs.py: _resolve_local_v1_endpoint also falls back when app.state.server_port is non-positive, and uses `is None` instead of the truthy fallback so a literal 0 is handled correctly. * fix(studio): strict is_local check, narrow loaded-model gate to LLM-reachable configs, add scope-server port fallback Loop 3 review fixes: - jobs.py, validate.py: require `is_local is True` instead of truthy check. Malformed payloads such as is_local: "false" or is_local: 1 would otherwise be treated as local and silently rewritten to the loopback endpoint. - jobs.py: _resolve_local_v1_endpoint now tries request.scope["server"] (the actual uvicorn-assigned (host, port) tuple) as a second resolution step before falling back to parsing request.base_url. This covers direct-uvicorn startup paths and ephemeral binds that never publish app.state.server_port. - jobs.py: new _used_llm_model_aliases helper collects the set of model_aliases that an LLM column actually references, and the "Chat model loaded" gate is now only triggered when a local provider is reachable from that set. Orphan model_config nodes on the canvas no longer block unrelated recipe runs. 
* fix(studio): force skip_health_check on local-linked configs, skip JSON parsing for local providers, local-aware inline editor Loop 4 review fixes: - jobs.py: after rewriting local providers, also force skip_health_check: true on any model_config linked to a local provider. The /v1/models endpoint only advertises the real loaded model id, so data_designer's default model-availability health check would otherwise fail against the placeholder "local" id before the first chat completion call. The inference route already ignores the model id in chat completions, so skipping the check is safe. - builders-model.ts: buildModelProvider now short-circuits for local providers and emits only { name, endpoint: "", provider_type, is_local } without running parseJsonObject on the hidden extra_headers/extra_body inputs. Imported or hydrated recipes with stale invalid JSON in those fields no longer block client-side validate/run. - inline-model.tsx: the model_config branch now accepts an optional localProviderNames prop and mirrors the dialog applyProviderChange behavior. Changing provider to/from a local one auto-fills or clears the "local" placeholder consistently with the other edit paths. - recipe-graph-node.tsx: derive localProviderNames from the store via useMemo (stable identity) and pass it through renderNodeBody to <InlineModel>. Hooks order is preserved by declaring them above the early return for markdown_note nodes. - run.py: minor comment tweak - loop 3 already added the scope-server fallback path, note that in the comment. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: danielhanchen <info@unsloth.ai>	2026-04-08 03:48:22 -07:00
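The strict `is_local is True` check and the three-step port resolution described in the commit above can be sketched roughly as follows. This is an illustrative sketch only, not the code in jobs.py: the function names `is_local_provider` and `resolve_local_port` and their plain-argument signatures are hypothetical stand-ins for reading `app.state.server_port`, `request.scope["server"]` and `request.base_url`.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


def is_local_provider(provider: Dict[str, Any]) -> bool:
    """Treat a provider as local only when is_local is the literal True.

    Values such as "false" or 1 are truthy in Python but are rejected here.
    """
    return provider.get("is_local") is True


def resolve_local_port(
    state_port: Optional[int],
    scope_server: Optional[Tuple[str, int]],
    base_url: str,
) -> Optional[int]:
    """Pick the loopback port, preferring the most trustworthy source.

    Hypothetical helper: the three arguments stand in for
    app.state.server_port, request.scope["server"] and request.base_url.
    """
    # 1. Port published at startup; unset or non-positive (ephemeral bind) is skipped.
    if isinstance(state_port, int) and not isinstance(state_port, bool) and state_port > 0:
        return state_port
    # 2. The (host, port) tuple recorded by the ASGI server for this connection.
    if scope_server is not None:
        port = scope_server[1]
        if isinstance(port, int) and port > 0:
            return port
    # 3. Last resort: parse the request URL (may be wrong behind a reverse proxy).
    return urlparse(base_url).port
```

The ordering is the point: the URL is consulted only when neither server-side source has a usable port.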
Daniel Han	c3d2d58046	Update dependabot.yml (#4915 )	2026-04-08 03:39:50 -07:00
dependabot[bot]	0087515d5c	build(deps): bump oxc-parser (#4776 ) Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser). Updates `oxc-parser` from 0.121.0 to 0.123.0 - [Release notes](https://github.com/oxc-project/oxc/releases) - [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md) - [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.123.0/napi/parser) --- updated-dependencies: - dependency-name: oxc-parser dependency-version: 0.123.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: npm-oxc-validator ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-08 03:35:40 -07:00
dependabot[bot]	67e9db4921	build(deps): bump oxc-parser (#4776 ) Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser). Updates `oxc-parser` from 0.121.0 to 0.123.0 - [Release notes](https://github.com/oxc-project/oxc/releases) - [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md) - [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.123.0/napi/parser) --- updated-dependencies: - dependency-name: oxc-parser dependency-version: 0.123.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: npm-oxc-validator ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-08 03:35:33 -07:00
pre-commit-ci[bot]	c2184af079	[pre-commit.ci] pre-commit autoupdate (#4879 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.15.8 → v0.15.9](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.8...v0.15.9) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-07 22:50:48 -07:00
Roland Tannous	f801e59c29	split venv_t5 into tiered 5.3.0/5.5.0 and fix trust_remote_code (#4878 ) * split venv_t5 into venv_t5_530 and venv_t5_550 for tiered transformers 5.x support * fix bfloat16 crash on T4 for FORCE_FLOAT32 models and disable trust_remote_code auto-enable for native t5 models * revert FORCE_FLOAT32 dtype change * restrict trust_remote_code auto-enable to Nemotron models only * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use config.json model_type for tier detection, add unsloth/nvidia namespace guard * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit `fb43d468e2`. * Revert "use config.json model_type for tier detection, add unsloth/nvidia namespace guard" This reverts commit `fc49ae2453`. * add unsloth/nvidia namespace guard to Nemotron trust_remote_code auto-enable * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * reorder tier checks: all substring matches before config.json fetches * extract shared activate_transformers_for_subprocess into transformers_version.py * narrow Nemotron trust_remote_code to nemotron_h/nemotron-3-nano, add to export worker * clean venv_t5 dirs before re-install in setup.sh, clarify version alias comment * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * run venv_t5 migration outside deps fast-path gate in both setup scripts --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-07 20:05:01 +04:00
Daniel Han	1d8160376e	Bump minimum unsloth version to 2026.4.4 in install scripts (#4876 )	2026-04-06 09:46:35 -07:00
Daniel Han	b295daf932	Update _utils.py	2026-04-06 09:39:06 -07:00
Lee Jackson	8c89b84bb6	Studio: Fix empty chat threads on navigation and stabilize new chat flow (#4872 ) * fix(chat): prevent implicit empty thread creation and stabilize new-chat flow * fix(chat): harden compare thread sync and simplify sidebar thread query * fix(chat): harden new-thread state sync and isolate compare active thread updates * fix(chat): stabilize new-thread state sync and prevent compare/session bleed * Fix thread restoration, handleNewThread guard, sidebar filter, and delete flow - Remove __LOCALID_ filter from getInitialSingleChatView: in this Dexie-backed adapter, AUI's __LOCALID_ prefixed IDs ARE the real persistent thread IDs stored by initialize(). Filtering them out breaks thread restoration on navigation. - Simplify handleNewThread to synchronous: the async Dexie message check is redundant (persistence is already deferred to first append) and strands users on legacy empty threads. Use a simple guard that checks the store's activeThreadId to detect unsent drafts. - Add message-count filter to sidebar: filter threads to only show those with at least one message, hiding legacy empty threads. - Add store-based sidebar highlighting fallback: use activeThreadId from the store when view.threadId is not set (nonce-backed chats). - Fix handleDelete to call onNewThread() instead of onSelect(), and clear activeThreadId, so the runtime properly resets after deleting the active thread. * Fix handleDelete nonce path and restore __LOCALID_ filter handleDelete was calling onNewThread() after clearing activeThreadId, but the handleNewThread guard sees !view.threadId && !activeThreadId and returns early, leaving the UI stuck on the deleted thread. Fix by directly calling onSelect with a new nonce instead. Restore __LOCALID_ filter in getInitialSingleChatView to prevent restoring unpersisted AUI local thread IDs on navigation. Without this filter, navigating away from /chat before sending a message would restore a non-existent thread that Dexie cannot fetch. 
--------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-06 09:32:54 -07:00
Daniel Han	4c83e3540e	Update	2026-04-06 09:20:17 -07:00
Daniel Han	723bfb2363	Add unit tests for HfFileSystem glob skip guard (#4854 ) Tests verifying that HfFileSystem().glob() is correctly skipped when is_model or is_peft is False, matching the guard added in PR #4852.	2026-04-06 08:54:36 -07:00
JYYYYYT	aa4c6010e1	fix(studio): custom folder scan fails to find GGUF variants when pointing directly at a model directory (#4860 ) Fix custom folder scanning when pointing directly at a model directory. When a user adds a custom scan folder that points directly at a model directory (e.g. /path/to/gemma-4-e2b-it-gguf/ containing config.json and gemma-4-E2B-it-BF16.gguf), the model list previously showed individual .gguf files as separate entries instead of recognizing the directory as a single model. Clicking any entry showed "No GGUF variants found" because list_local_gguf_variants received a file path and immediately returned empty. Changes: - Add _is_model_directory() helper that detects directories with both config metadata and actual model weight files (excludes mmproj GGUFs and non-weight .bin files like tokenizer.bin) - _scan_models_dir: detect self-model and return single directory entry - _scan_lmstudio_dir: surface model directories directly instead of descending into them as publisher folders; handle both root and child model directories - Add _resolve_gguf_dir() helper for GGUF path resolution that only falls back to parent directory when parent has model metadata - list_local_gguf_variants / _find_local_gguf_by_variant: use resolver so .gguf file paths inside model directories work correctly	2026-04-06 08:31:07 -07:00
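A minimal sketch of the kind of check `_is_model_directory()` is described as performing in the commit above, assuming that "config metadata" means a `config.json` file and that weight files are `.gguf`, `.safetensors` or `.bin`. The name `looks_like_model_directory` and both constant sets are hypothetical; the real helper may recognise other files.

```python
from pathlib import Path

# Assumptions for illustration only; the real helper may use different lists.
CONFIG_NAMES = {"config.json"}
NON_WEIGHT_BIN = {"tokenizer.bin"}


def _is_weight_file(path: Path) -> bool:
    """Return True for files that plausibly hold model weights."""
    name = path.name.lower()
    if name.endswith(".gguf"):
        # mmproj GGUFs are vision projectors, not standalone model weights.
        return "mmproj" not in name
    if name.endswith(".bin"):
        return name not in NON_WEIGHT_BIN
    return name.endswith(".safetensors")


def looks_like_model_directory(path: Path) -> bool:
    """True when the directory holds both config metadata and weight files."""
    if not path.is_dir():
        return False
    files = [p for p in path.iterdir() if p.is_file()]
    has_config = any(p.name in CONFIG_NAMES for p in files)
    has_weights = any(_is_weight_file(p) for p in files)
    return has_config and has_weights
```

Requiring both conditions keeps a folder of loose `.gguf` files, or a folder holding only tokenizer artifacts, from being treated as a single model.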
Roland Tannous	0835f0a61b	fix: skip redundant HfFileSystem().glob() calls in loader.py (#4852 ) * fix: skip redundant HfFileSystem().glob() calls in loader.py Guard the SUPPORTS_LLAMA32 glob blocks with `is_model and is_peft` so the HfFileSystem HTTP call is only made when both configs could actually exist. This prevents indefinite hangs on slow/unreliable networks since the glob result is redundant when either AutoConfig or PeftConfig already failed to load. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove test file from main PR - moved to separate PR Tests for the glob skip guard belong in their own PR to keep the loader change minimal and reviewable. * Harden HfFileSystem glob: fix Windows path splitting, add try/except - Use str.rsplit("/", 1) instead of os.path.split to extract filenames from HfFileSystem paths. HfFileSystem always returns POSIX-style paths, but os.path.split uses the OS separator, so on Windows the entire path was returned as the "filename" and the config name comparison always failed. - Wrap the HfFileSystem().glob() call in try/except to gracefully handle network failures (offline mode, timeouts, unreachable Hub). On failure both_exist stays False, which is the safe default. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove redundant HfFileSystem().glob() call for remote repos When is_model and is_peft are both True, AutoConfig and PeftConfig have already loaded successfully, proving both config.json and adapter_config.json exist. The HfFileSystem network call to re-verify this was redundant and could cause hangs on slow networks. Replace the glob + try/except block with a direct both_exist = True assignment. * Remove unused HfFileSystem import HfFileSystem was only used for the glob() calls that were replaced with direct both_exist = True assignments in the previous commit. 
--------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-06 07:46:39 -07:00
Daniel Han	07b6fcc344	Remove Gemma-4 from FORCE_FLOAT32 (#4875 ) Gemma-4 does not need FORCE_FLOAT32. Testing shows that both float16 and bfloat16 work correctly without the forced float32 override: - Inference: identical outputs for float16 and bfloat16 (greedy decoding) - Training (100 steps, 4-bit LoRA, SFT on FineTome-100k): - float16 final loss: 3.048 - bfloat16 final loss: 3.065 - Losses converge to within 0.02 by step 60 - Grad norms healthy and comparable for both dtypes The FORCE_FLOAT32 path was actually causing training divergence. With it enabled, the compiled float32 run diverged at step ~28 with grad norms collapsing to near zero and loss plateauing at ~12.4. Without it, both dtypes train normally. This enables float16 on Tesla T4 and other GPUs without bfloat16 support.	2026-04-06 07:33:28 -07:00
Daniel Han	ab65b47c73	Add tests for is_vision_model() caching behaviour (#4855 ) * Add tests for is_vision_model() caching behaviour * Fix review feedback: remove dead helper, fix exception test - Remove unused _make_config() helper function (dead code) - Fix test_exception_result_cached to actually exercise the exception path by mocking load_model_config to raise OSError instead of using side_effect=[False] which only tested normal False returns * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use strict mock specs so tests exercise intended detection paths Use MagicMock(spec=[]) for all config mocks so hasattr() only returns True for explicitly set attributes. Without this, MagicMock defaults make all hasattr checks truthy, allowing tests to pass via unintended detection paths (e.g. img_processor instead of vision_config). --------- Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-06 06:41:40 -07:00
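The `MagicMock(spec=[])` point in the commit above can be shown with the standard library alone. In this sketch `detection_attrs` is a hypothetical stand-in for the real detection logic; only the attribute names `vision_config` and `img_processor` come from the commit message.

```python
from unittest.mock import MagicMock


def detection_attrs(config):
    """List which vision-related attributes hasattr() reports on a config."""
    return [name for name in ("vision_config", "img_processor") if hasattr(config, name)]


# A plain MagicMock fabricates any attribute on access, so every hasattr() is True.
loose = MagicMock()

# spec=[] means "no attributes exist" until they are set explicitly.
strict = MagicMock(spec=[])
strict.vision_config = {"hidden_size": 1024}

print(detection_attrs(loose))   # both names reported
print(detection_attrs(strict))  # only the attribute that was set
```

With the strict mock, a test that passes can only have done so through the attribute it set on purpose.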
Roland Tannous	278f462996	[Studio][Optimization]Add vision detection cache to is_vision_model() (#4853 ) * Add vision detection cache to is_vision_model() to avoid redundant subprocess spawns is_vision_model() is called 4-5 times per training run for the same model with zero caching. For transformers 5.x models, each call spawns a full subprocess (~6s each). This adds a module-level _vision_detection_cache dict following the same pattern as the existing _audio_detection_cache used by detect_audio_type(). The function is refactored into a thin cache wrapper around _is_vision_model_uncached(), saving ~12s per training run. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Include hf_token in vision cache key for gated model correctness Cache key is now (model_name, hf_token) instead of just model_name. This prevents stale False results when an unauthenticated probe for a gated model is followed by an authenticated call. * Remove test file from main PR - will be submitted separately * Fix vision cache: normalize model names and skip caching transient failures - Normalize model names in cache key using resolve_cached_repo_id_case() to avoid duplicate entries for different casings of the same HF repo (aligns with case normalization from #4822) - Return None instead of False on transient failures (network errors, subprocess timeouts, HF API issues) so the cache layer can distinguish "definitely not a vision model" from "failed to check" - Only cache definitive True/False results; transient failures are retried on the next call instead of being permanently locked in as False * Refine failure handling: cache deterministic failures, guard normalization - Subprocess non-zero exit, JSON errors, and general exceptions return False (deterministic, cached) instead of None (retryable). Only subprocess.TimeoutExpired returns None since timeouts are transient. 
- Wrap cache key normalization in try/except so resolve_cached_repo_id_case or normalize_path failures fall back to raw model_name instead of crashing callers. * Harden vision detection cache: fix transient failure handling, thread safety, token security - All subprocess failure paths now return None (transient) instead of False, preventing permanent misclassification of VLMs after temporary HF/auth/network errors - Use SHA256 fingerprint for hf_token in cache key instead of raw bearer token - Add threading.Lock with double-checked locking to prevent thundering herd of concurrent subprocess spawns for the same uncached model - Distinguish permanent failures (RepositoryNotFoundError, GatedRepoError, ValueError) from transient ones in _is_vision_model_uncached - Pass resolved/normalized model name to detection (not just cache key) - Log normalization fallback at debug level instead of silent swallow - Thread hf_token through callers in routes/models.py and trainer.py that previously omitted it * Refine lock strategy and token fingerprint - Move detection computation outside the lock to avoid serializing long-running subprocess spawns (60s timeout) and HF API calls across all concurrent model checks. Lock is now only held for cache writes. - Use full SHA256 digest for token fingerprint instead of truncated 16-char prefix to eliminate collision risk. * Fix huggingface_hub import fallback and use atomic cache read - Add fallback import path for RepositoryNotFoundError/GatedRepoError from huggingface_hub.utils (older hub versions) when .errors is not available - Use sentinel-based dict.get() for single atomic cache read instead of two-step in/[] pattern (future-proof for no-GIL runtimes) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-06 06:41:20 -07:00
Leo Borcherding	68965988cf	Fix/studio colab button message: Add fallback message for Colab Studio button when proxy URL fails (#4866 ) * Add fallback message for Colab Studio button when localhost link doesn't work * Make fallback message darker grey for better readability * Make fallback message bold for better visibility --------- Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>	2026-04-05 21:57:45 -07:00
Daniel Han	6100867447	Bump minimum unsloth version to 2026.4.2 in install scripts (#4842 )	2026-04-03 15:14:28 -07:00
Daniel Han	170c4b9b99	Update _utils.py	2026-04-03 15:02:14 -07:00
Daniel Han	4020a70a93	Add tests for cache case resolution (from PR #4822 ) (#4823 ) Tests for resolve_cached_repo_id_case and get_model_config case resolution, separated from the runtime changes in PR #4822.	2026-04-03 13:58:26 -07:00
Daniel Han	4f65cc94bc	Add Gemma 4 model sampling defaults (#4838 ) Add per-model YAML configs and MODEL_NAME_MAPPING entries for all 8 Gemma 4 models (4 instruct + 4 base): - gemma-4-31B-it / gemma-4-31B - gemma-4-26B-A4B-it / gemma-4-26B-A4B - gemma-4-E2B-it / gemma-4-E2B - gemma-4-E4B-it / gemma-4-E4B GGUF variants (only for -it models) resolve via the gemma-4 family entry in inference_defaults.json. Sampling defaults: temperature=1.0, top_p=0.95, top_k=64, min_p=0.0, no repetition or presence penalty. Matches gemma-3n and gemma-3.	2026-04-03 13:57:15 -07:00
Daniel Han	a32b871f0e	studio: add speculative decoding support (ngram-mod, on by default) (#4836 ) * studio: add speculative decoding support (ngram-mod, on by default) Enable n-gram speculative decoding for GGUF models in Unsloth Studio. Uses llama.cpp's ngram-mod mode which gives 10-40% faster generation with zero VRAM cost via a 4MB fixed hash table that auto-resets on low acceptance rates. Backend: - Add speculative_type field to LoadRequest, LoadResponse, and InferenceStatusResponse pydantic models - Add speculative_type parameter to LlamaCppBackend.load_model() with allowlist validation (ngram-simple, ngram-mod) - Pass --spec-type, --spec-ngram-size-n 16, --draft-max 24 flags to llama-server when ngram-mod is active - Default to ngram-mod for non-vision GGUF models server-side - Silently skip speculative decoding for vision models (unsupported in llama.cpp server-context.cpp) Frontend: - Add speculative_type to TS API types - Add speculativeType/loadedSpeculativeType to chat runtime store with default value of "ngram-mod" - Add On/Off toggle in Model settings section (GGUF only, hidden for vision models), included in dirty check for Apply/Reset - Wire speculative_type through model load request and response - Restore speculative type state on page refresh/reconnect * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: remove server-side speculative decoding override The backend was overriding speculative_type=None to "ngram-mod" for non-vision GGUF models, which prevented users from disabling spec decoding via the UI toggle. The frontend store already defaults to "ngram-mod", so the backend fallback was redundant and blocked the explicit "Off" setting. 
* fix: use recommended ngram-mod params from llama.cpp docs Update speculative decoding params to match the recommended values from llama.cpp docs (docs/speculative.md): --spec-ngram-size-n 24 (was 16, docs say small n not recommended) --draft-min 48 (was 0) --draft-max 64 (was 24, docs note MoEs need long drafts) Also fix comment: ngram-mod uses ~16 MB (4M entries * 4 bytes), not 4 MB. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add benchmark table and references to speculative decoding comment Include speedup numbers from llama.cpp PRs #18471 and #19164 as an inline comment so future readers understand the expected gains. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-03 13:56:59 -07:00
Daniel Han	2c73ab7871	fix(studio): harden sandbox security for terminal and python tools (#4827 ) * fix(studio): harden sandbox security for terminal and python tools The existing command blocklist used naive str.split() which is trivially bypassable via quoting, full paths, nested shells, variable expansion, and cross-tool pivoting through Python os.system/subprocess. Fixes #4818. Changes: - Replace str.split() blocklist with shlex.split() + os.path.basename() tokenization and regex scanning at shell command boundaries - Add sanitized subprocess environment (_build_safe_env) that strips credentials (HF_TOKEN, WANDB_API_KEY, GH_TOKEN, AWS_, etc.) and restricts PATH to /usr/local/bin:/usr/bin:/bin - Add PR_SET_NO_NEW_PRIVS via prctl on Linux so sudo/su/pkexec fail at the kernel level regardless of how they are invoked - Add RLIMIT_NPROC (256) and RLIMIT_FSIZE (100MB) to prevent fork bombs and disk filling attacks - Extend AST safety checker to detect os.system(), os.popen(), subprocess.run/Popen/call/check_output, os.exec, os.spawn* calls containing blocked commands or dynamic (non-literal) arguments - Add cross-platform support: cmd.exe on Windows, bash on Unix; CREATE_NO_WINDOW flag on Windows, preexec_fn on Unix - Expand blocklist from 7 to 14 commands: add su, chown, passwd, mount, umount, fdisk, kill, killall, pkill - Apply all layers to both _bash_exec and _python_exec Zero measurable performance overhead -- shlex parsing and a single prctl syscall per subprocess fork. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix review findings: exception_catching dead code, false positives, process substitution - Include exception_catching reasons in _check_code_safety so bare except-in-loop timeout evasion is actually blocked (was computed in _check_signal_escape_patterns but never read by the caller) - Remove base.split() inner loop that caused false positives on quoted text arguments containing blocked words (e.g. echo "kill this process") - Add targeted nested shell detection for bash/sh/zsh -c arguments instead, which catches bash -c 'sudo whoami' without false positives - Add <() process substitution to the regex character class so diff <(rm -rf /path) is also caught - Fix error message to say "unsafe patterns" instead of specifically mentioning signal manipulation when other categories trigger * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review feedback: regex paths, keyword args, list element scanning - Regex now matches blocked commands after optional path prefix at shell boundaries (catches ls; /usr/bin/sudo and similar) - Nested shell detection uses os.path.basename so bash -c "/bin/rm" is caught - AST checker now inspects keyword arguments (not just positional) so subprocess.run(args="sudo ...", shell=True) is detected - List elements in subprocess calls are now checked via _find_blocked_commands for consistency (catches subprocess.run(["bash", "-c", "rm -rf /"])) - Dynamic argument check uses _is_safe_literal that validates list contents are all string literals * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix nested shell scan to only check the script body, not positional args bash -c 'script' arg0 arg1 -- only tokens[i+1] is the script body; subsequent tokens are $0, $1 positional parameters passed to the script and are not executed as shell 
commands. Scanning all remaining tokens caused false positives. * Add subshell parentheses to regex command boundary detection (sudo whoami) was not caught because ( was not in the regex character class for shell command boundaries. Add ( to the set alongside ;, &, \|, backtick, newline. * Address high-priority review findings from 7 parallel reviewers - Track from-imports of dangerous functions (from os import system, from subprocess import run as r, etc.) via shell_exec_aliases dict so bare-name calls are detected by the AST checker - Include the active Python interpreter and virtualenv directories in the sanitized PATH so pip, uv, and Studio packages remain accessible in the sandbox - Add Windows-specific blocked commands (rmdir, takeown, icacls, runas, powershell, pwsh) only on win32 platform - Add os.posix_spawn and os.posix_spawnp to _SHELL_EXEC_FUNCS - Handle tuple literals same as list literals in AST argument inspection (both _extract_strings_from_list and _is_safe_literal) * Fix false positive on check=True kwargs and recursive nested shell scanning - Only inspect command-carrying keyword arguments (args, command, executable, path, file) in the AST checker, not control flags like check=True, text=True, capture_output=True which are booleans and were incorrectly flagged as non-literal dynamic arguments - Replace split() in nested shell detection with recursive call to _find_blocked_commands so that quoted commands (bash -c '"sudo" whoami') and semicolons (bash -c "sudo;ls") within nested shells are properly detected through the full shlex + regex pipeline * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move preexec_fn imports to module level and use find_library for libc Addresses two Gemini review findings: 1. preexec_fn thread safety: _sandbox_preexec previously imported ctypes and resource inside the function body, which runs between fork() and exec() in the child process. 
In a multi-threaded server, this could deadlock if the import machinery locks were held by another thread at fork time. Now all imports and the libc handle are resolved once at module load time, so _sandbox_preexec only calls C-level functions (prctl, setrlimit) with no Python import activity. 2. Hardcoded libc.so.6 path: replaced with ctypes.util.find_library("c") which works on glibc (libc.so.6), musl (libc.musl-.so.1), and other Linux distributions where libc has a different soname. Apply Gemini style suggestions: combined regex, dict.fromkeys, constant hoisting - Combine per-word regex loop into a single re.findall with alternation pattern, avoiding repeated regex compilation and searching - Replace manual dedup loop with dict.fromkeys for PATH entries - Hoist _CMD_KWARGS frozenset out of visit_Call to avoid recreating it on every AST node visit * Add cmd /c nested shell detection for Windows parity The nested shell scan only checked for Unix shells (bash -c, sh -c, etc). Add cmd /c and cmd.exe /c detection so that Windows nested shell invocations are also recursively scanned for blocked commands. The token scan already catches blocked commands at any position, so this is defense-in-depth for consistency across platforms. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Handle combined shell flags (-lc, -xc) and interleaved flags (--login -c) The nested shell scan only matched token == "-c" with the immediately preceding token being a shell name. This missed: - Combined flags: bash -lc 'rm ...' (-lc ends with c, is a valid combined flag meaning -l -c) - Interleaved flags: bash --login -c 'sudo ...' (--login sits between bash and -c) Now matches any short flag ending in 'c' (e.g. -lc, -xc, -ic) and walks backwards past intermediate flags to find the shell binary. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix /bin/bash bypass, remove RLIMIT_NPROC, reduce AST false positives Addresses three high-consensus findings from 20-reviewer pass: 1. /bin/bash -c 'sudo whoami' bypassed nested shell scan because the backwards flag-skip logic treated paths starting with / as flags. Now only skips tokens starting with - as Unix flags; on Windows only skips short /X flags (not /bin/bash style paths). [9/20] 2. RLIMIT_NPROC=256 caused subprocess.run to fail with EAGAIN because Linux enforces NPROC per real UID, not per process tree. Removed RLIMIT_NPROC entirely; RLIMIT_FSIZE and PR_SET_NO_NEW_PRIVS remain as the primary resource and privilege controls. [5/20] 3. AST checker rejected safe dynamic subprocess usage like cmd=["git","status"]; subprocess.run(cmd) as shell_escape_dynamic. Now only flags dynamic args for shell-string functions (os.system, os.popen, subprocess.getoutput, etc.) or when shell=True is explicitly set. List-based subprocess calls with shell=False (the default) do not pass through a shell and are not flagged. [12/20] * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Handle Windows drive letter paths and .exe extensions in command detection Gemini review found that Windows absolute paths (C:\Windows\System32\ shutdown.exe) and executable extensions (.exe, .com, .bat, .cmd) were not handled: - Token scan now strips .exe/.com/.bat/.cmd extensions before checking the blocklist, so sudo.exe matches sudo, shutdown.bat matches shutdown - Regex pattern now includes optional Windows drive letter prefix ([a-zA-Z]:[/\\]) and optional executable extension suffix, so commands after shell metacharacters with full Windows paths are also caught * Handle kwargs dict expansion, non-literal shell=, and except Exception false positive Addresses three findings from second 20-reviewer pass: 1. 
kwargs dict expansion (9/20): subprocess.run(**{"args": "rm ...", "shell": True}) bypassed the AST checker because kwargs were treated as opaque. Now expands literal dict kwargs to inspect their keys, and flags opaque kwargs (variable dicts) as unsafe. 2. Non-literal shell= values (7/20): shell=variable was treated as shell=False (safe). Now any shell= value that is not literally False is treated as potentially True (conservative default). 3. except Exception false positive (1/20): except Exception in a loop was flagged as timeout evasion, but Exception does not catch SystemExit or KeyboardInterrupt which are used for timeout enforcement. Narrowed to only flag except BaseException and except TimeoutError in loops. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-03 13:33:42 -07:00
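Note on the commit above: the core of the tokenization change (shlex.split plus os.path.basename, with a recursive scan of the script body after a shell's -c flag) might look roughly like the sketch below. It is a simplified illustration, not the Studio code: the blocklist is a small subset, `find_blocked_commands_sketch` is a hypothetical name, and the regex boundary scan, combined flags such as -lc, and the Windows handling described in the commit are all omitted.

```python
import os
import shlex

# Illustrative subset only; the real blocklist in the commit is longer.
BLOCKED = frozenset({"sudo", "su", "rm", "kill"})
SHELLS = frozenset({"bash", "sh", "zsh"})


def find_blocked_commands_sketch(command):
    """Return blocked command names found in `command`, in first-seen order."""
    try:
        tokens = shlex.split(command)       # honours quoting, unlike str.split()
    except ValueError:
        tokens = command.split()            # unbalanced quotes: fall back to whitespace
    found = []
    for i, token in enumerate(tokens):
        base = os.path.basename(token)      # /usr/bin/sudo -> sudo
        if base in BLOCKED:
            found.append(base)
        # For `bash -c 'script' arg0 arg1`, only tokens[i + 1] is the script body.
        is_nested_shell = (
            token == "-c"
            and i > 0
            and os.path.basename(tokens[i - 1]) in SHELLS
            and i + 1 < len(tokens)
        )
        if is_nested_shell:
            found.extend(find_blocked_commands_sketch(tokens[i + 1]))
    return list(dict.fromkeys(found))       # de-duplicate, keep order
```

A quoted argument such as `echo "kill this process"` stays a single token, so it is not flagged, while `/usr/bin/sudo whoami` and `bash -c 'sudo whoami'` both are.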
Neodon	c027ec192e	fix(studio): ensure first chat tool call starts in session sandbox (#4810 ) Fixes #4809 On a new Studio chat, the first tool call could start before the frontend initializes the thread ID. That meant the first request could go out without a session_id, so the backend started the tool in the shared sandbox root instead of the chat's session sandbox. Frontend: - Eagerly initialize the thread when switching to a new chat - Resolve the thread ID once at request time and keep it stable through async model-load waits - Disable ActiveThreadSync during new-chat initialization to prevent stale thread IDs from being written back - Add error handling for thread initialization failures - Clear activeThreadId on all compare-mode entry paths to prevent cross-session leakage - Fix exitCompare to restore context usage from the saved view - Coerce falsy thread IDs to undefined for consistent backend/frontend fallback behavior - Use _default as the image sessionId fallback to match the backend Backend: - Use ~/studio_sandbox/_default when a request arrives without a session_id	2026-04-03 11:44:22 -07:00
Lee Jackson	a29b4e23fd	studio: reuse HF cached repo casing to prevent duplicate downloads (#4822 ) * fix(studio): reuse HF cached repo casing to prevent duplicate downloads * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move cache case resolution tests to separate PR Tests for resolve_cached_repo_id_case and get_model_config case resolution belong in their own PR to keep this change focused on the runtime fix. * fix(studio): debug-log HF_HUB_CACHE fallback in path_utils * Fix stale memoization in resolve_cached_repo_id_case - Check exact-case path before memo to ensure a newly-appeared exact match always wins over a previously memoized variant - Validate memoized entries still exist on disk before returning them to prevent stale results when cache dirs are deleted/recreated * Minor cleanups for cache case resolution - Use .is_dir() instead of .exists() for exact-case cache check (cache entries are always directories) - Remove redundant fallback in _detect_audio_from_tokenizer since get_cache_path already handles case resolution and returns None when the model is not cached --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-03 05:48:24 -07:00
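Note on the commit above: the idea of reusing the casing of an already-cached repo can be sketched as below. This is an assumption-laden illustration, not the function from the commit: `resolve_repo_id_case_sketch` is a hypothetical name, memoization is left out, and the only detail taken as given is the Hugging Face hub cache convention of `models--{org}--{name}` directories.

```python
from pathlib import Path


def resolve_repo_id_case_sketch(repo_id, cache_dir):
    """Return repo_id with the casing of a matching cached directory, if one exists."""
    cache = Path(cache_dir)
    if not cache.is_dir():
        return repo_id
    wanted = "models--" + repo_id.replace("/", "--")
    names = [p.name for p in cache.iterdir() if p.is_dir()]
    if wanted in names:                     # an exact-case match always wins
        return repo_id
    for name in names:
        if name.lower() == wanted.lower():  # same repo, different casing
            rest = name[len("models--"):]
            return "/".join(rest.split("--", 1))
    return repo_id                          # not cached: leave the name untouched
```

Checking the exact-case name first mirrors the ordering the commit describes, where a newly appeared exact match takes priority over a previously seen variant.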
Wasim Yousef Said	50dede11cc	Allow non-LLM recipes to run and move Data tab first in executions (#4805 ) * feat: allow non-LLM recipes to run without provider block * feat: reorder execution tabs and add generation-aware data tab empty state * fix: add accessibility attrs to data tab spinner and use literal ellipsis * fix(studio): use shared spinner, stub provider, and hide unused LLM metrics Backend: inject stub model provider for sampler-only recipes so DataDesigner init does not reject empty provider lists. Frontend: use shared Spinner component, hide LLM columns metric and model usage card when recipe has no LLM columns. * Fix tab reset and terminal auto-scroll regressions for PR #4805 Reset detailTab to "data" when switching between executions so the Data tab default is applied consistently, not only on first mount. Also add detailTab to the terminal scroll effect deps so auto-scroll-to-bottom fires when the user opens the Overview tab after landing on Data. * Guard terminal scroll reset to only fire on Overview tab The previous scroll effect ran on every tab switch, which could reset the user's manual scroll position if they scrolled up in the terminal and briefly switched tabs. Now the scroll-to-bottom and sticky-bottom reset only fires when navigating to the Overview tab. * Use None for stub provider api_key instead of literal string The stub ModelProvider that satisfies the DataDesigner registry for non-LLM recipes should not carry a fake credential string. Using None avoids sending an Authorization header if the provider is ever inadvertently invoked. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-03 05:37:26 -07:00
Wasim Yousef Said	5b7c0615f3	feat(studio): differentiate web search and URL fetch in chat tool UI (#4802 ) Differentiate web_search query searches from URL fetches in the Studio chat UI. Backend (llama_cpp.py): - Emit "Reading: hostname" for URL fetches and "Searching: query" for query searches in SSE status events - Only show hostname for valid http/https URLs; schemeless/non-http URLs get "Reading page..." generic fallback - Strip www. prefix for consistency with the frontend Frontend (tool-ui-web-search.tsx): - Tool card shows "Read hostname" / "Reading hostname..." for URL fetches - Shows "Searched query" / "Searching for query..." for query searches - Uses new URL() with protocol check; falls back to "Read page" / "Reading page..." for non-http URLs	2026-04-03 05:03:27 -07:00
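Note on the commit above: the backend status-text rule it describes (hostname for valid http/https URLs, a generic fallback otherwise, query text for searches) could be sketched like this. The function name `status_text_sketch` and its `query`/`url` parameters are hypothetical; only the output strings come from the commit message.

```python
from urllib.parse import urlparse


def status_text_sketch(query=None, url=None):
    """Build the SSE status string for a web_search tool call (illustrative only)."""
    if url is not None:
        parsed = urlparse(url.strip())
        if parsed.scheme in ("http", "https") and parsed.hostname:
            host = parsed.hostname
            if host.startswith("www."):
                host = host[len("www."):]   # strip www. to match the frontend
            return f"Reading: {host}"
        return "Reading page..."            # schemeless or non-http URL
    return f"Searching: {query}"
```

For example, `status_text_sketch(url="https://www.example.com/docs")` yields `Reading: example.com`, while a schemeless `example.com/docs` falls back to `Reading page...`.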
Daniel Han	8981e6c804	Update test_pr4562_bugfixes.py for simplified install policy (#4817 ) - Add TestFetchJsonRetries for JSON retry logic and max_pages - Update TestSourceCodePatterns for simplified --simple-policy flow - Add tests for installed prebuilt release reporting - Add test for CUDA toolkit version-sorted nvcc discovery - Remove assertions for removed --resolve-install-tag / --resolve-source-build paths	2026-04-03 04:06:14 -07:00
DoubleMathew	ac562bac66	Fix/llama.cppbuilding (#4804 ) * Simplify llama.cpp install logic * print release tag * Retry failed json decode * don't pull all ggml releases * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove test file changes from main PR Test changes for test_pr4562_bugfixes.py will be submitted in a separate PR to keep this PR focused on the install path simplification. * Fix setup.sh executable bit and direct tag lookup for pinned releases - Restore setup.sh file mode to 100755 (was accidentally changed to 100644) - Add direct GitHub API tag lookup in iter_release_payloads_by_time for non-latest requested tags (e.g. b7879) instead of relying on paginated release scans that may miss older releases beyond the 5-page limit - Update stale DEFAULT_PUBLISHED_REPO comment to match new value * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix force-compile default ref and remove dead code in setup.ps1 - Change FORCE_COMPILE_DEFAULT_REF from "main" to "master" in all three files (install_llama_prebuilt.py, setup.sh, setup.ps1) since ggml-org/llama.cpp uses "master" as its default branch, not "main". Using "main" would cause git clone --branch to fail when UNSLOTH_LLAMA_FORCE_COMPILE=1 with UNSLOTH_LLAMA_TAG=latest. - Remove dead if ($SkipPrebuiltInstall) block inside the else branch of setup.ps1 that could never be reached (the outer elseif already handles $SkipPrebuiltInstall=true). - Maintain setup.sh executable bit (100755). * Improve iter_release_payloads_by_time error handling for direct tag lookup When a pinned release tag is not found (HTTP 404), fall through to the paginated release scan instead of silently returning empty results. Non-404 errors (network failures, rate limits) are propagated to the caller so users get actionable error messages. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-03 00:34:20 -07:00
Michael Han	c1685b9459	Gemma 4 update.md	2026-04-02 22:54:03 -07:00
Manan Shah	a7e6964117	Fix/gemma4 install script (#4815 ) * transformers 5.5.0 has now been released * fallback for python < 3.10 :	2026-04-02 22:03:35 -07:00
Roland Tannous	6644a771b4	fix: patch PEFT for Gemma4ClippableLinear in loader checkpoint path (fixes export) (#4807 ) * fix: patch PEFT for Gemma4ClippableLinear in loader checkpoint path The same Gemma4ClippableLinear monkey-patch that exists in vision.py for training is needed in loader.py for loading existing checkpoints (used by export and inference). Gemma4ClippableLinear wraps nn.Linear but does not subclass it, so PEFT's LoRA injection fails with "Target module not supported". The patch redirects PEFT to target the inner .linear child instead. Applied only to the vision model PeftModel.from_pretrained path. Temporary fix until PEFT adds native support (peft#3129). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: wrap ClippableLinear patch in try/finally to always restore Ensures _create_and_replace is restored even if PeftModel.from_pretrained raises, preventing leaked global state across subsequent model loads. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-03 04:03:54 +04:00
Roland Tannous	f91ef8f9b0	fix(studio): lazy-import transformers in model_config to fix 5.x version switch (#4806 ) * fix(studio): lazy-import AutoConfig in model_config.py to fix transformers 5.x version switch Move `from transformers import AutoConfig` from module level to inside load_model_config() where it is actually used. model_config.py is transitively imported at module load time via: core/inference/__init__ → llama_cpp → utils.models → model_config In inference subprocesses (mp.spawn), this chain runs before _activate_transformers_version() can prepend .venv_t5/ to sys.path. The eager import caches transformers 4.57.6 in sys.modules, and the subsequent sys.path change has no effect — Python always checks sys.modules before sys.path. Making the import lazy ensures transformers is not loaded until after version activation, so the subprocess picks up the correct version. * fix(studio): also lazy-import extract_model_size_b in llama_cpp.py Belt-and-suspenders: make the import that originally triggered the chain lazy as well, so future module-level AutoConfig additions in utils.models cannot reintroduce the problem. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-03 02:56:01 +04:00
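Note on the commit above: the reason an eager import defeats a later sys.path change can be reproduced with a throwaway module. The sketch below is not Studio code; `demo_lib_t5switch` is a hypothetical stand-in for transformers, the two temporary directories stand in for the main venv and `.venv_t5/`, and deleting the entry from sys.modules simulates a fresh subprocess in which nothing was imported eagerly.

```python
import importlib
import os
import sys
import tempfile


def make_module(root, version):
    """Write a tiny stand-in module exposing a VERSION constant."""
    os.makedirs(root, exist_ok=True)
    with open(os.path.join(root, "demo_lib_t5switch.py"), "w") as fh:
        fh.write(f"VERSION = {version!r}\n")


base = tempfile.mkdtemp()
old_dir = os.path.join(base, "main_venv")
new_dir = os.path.join(base, "venv_t5")
make_module(old_dir, "4.57.6")
make_module(new_dir, "5.5.0")

# Eager import: the old version is now cached in sys.modules.
sys.path.insert(0, old_dir)
import demo_lib_t5switch
eager_version = demo_lib_t5switch.VERSION

# "Activating" the new version afterwards has no effect on a cached module.
sys.path.insert(0, new_dir)
import demo_lib_t5switch as again
still_cached = again.VERSION

# Lazy import: if the module is not yet cached when the path changes, the new path wins.
del sys.modules["demo_lib_t5switch"]        # simulates a process with no eager import
importlib.invalidate_caches()
import demo_lib_t5switch as lazy
lazy_version = lazy.VERSION
```

Here `still_cached` remains the old version because the import system consults sys.modules before sys.path; only the import that runs after activation with no cached entry sees the new directory.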
Daniel Han	e553a8ad0b	fix(studio): suppress fatal error when prebuilt manifest is missing (#4799 ) When DEFAULT_PUBLISHED_REPO is ggml-org/llama.cpp, the prebuilt resolver raises PrebuiltFallback because ggml-org releases do not include a llama-prebuilt-manifest.json asset. This was caught by the generic Exception handler and printed as "fatal helper error" to stderr, which triggers NativeCommandError on PowerShell. Catch PrebuiltFallback separately in the top-level __main__ handler and exit with EXIT_FALLBACK (code 2) instead of EXIT_ERROR (code 1). The message is still logged but without the "fatal helper error" prefix. The shell scripts already handle non-zero exits and fall back to source builds. Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-04-02 12:18:11 -07:00
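Note on the commit above: the exit-code split it describes can be sketched as below. The names `PrebuiltFallback`, `EXIT_FALLBACK` (2), and `EXIT_ERROR` (1) come from the commit message; `run_helper_sketch`, the `log` parameter, and the choice of where each message is written are assumptions made for illustration.

```python
import sys

EXIT_OK = 0
EXIT_ERROR = 1
EXIT_FALLBACK = 2


class PrebuiltFallback(Exception):
    """Raised when no usable prebuilt is available and a source build should follow."""


def run_helper_sketch(main, log=print):
    """Run `main` and map its outcome to a process exit code."""
    try:
        main()
    except PrebuiltFallback as exc:
        log(str(exc))                                   # logged without the fatal prefix
        return EXIT_FALLBACK
    except Exception as exc:                            # anything else is a real failure
        print(f"fatal helper error: {exc}", file=sys.stderr)
        return EXIT_ERROR
    return EXIT_OK
```

Catching the specific exception before the generic handler is what lets the calling shell script tell an expected fallback apart from a genuine error.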
Daniel Han	8ffd5826f2	Gemma-4	2026-04-02 11:59:37 -07:00
Daniel Han	934478ae31	fix(studio): revert llama.cpp default tag to latest (#4797 ) * fix(studio): revert llama.cpp default tag to latest The latest ggml-org/llama.cpp release (b8637) now includes Gemma 4 support. Revert the temporary "b8637" pin from #4796 to "latest" so the prebuilt resolver always picks the newest release automatically without needing manual tag bumps. * docs: add comment explaining latest vs master for llama.cpp tag Document in all three files why "latest" is preferred over "master" and when "master" should be used as a temporary override. --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-04-02 11:52:37 -07:00
Daniel Han	401621618b	fix(studio): don't set trust_remote_code for Gemma 4 training (#4795 ) Gemma 4 is a native transformers 5.5 model and does not need trust_remote_code=True. The auto-enable logic (added for NemotronH) was catching all transformers 5.x models, including Gemma 4. When trust_remote_code=True, unsloth_compile_transformers() returns early without running the compiler. This disables the fused cross entropy patch, causing logged training loss to be inflated by the gradient_accumulation_steps factor. Exclude models matching "gemma-4" or "gemma4" from the auto-enable so the compiler runs and applies fused cross entropy correctly.	2026-04-02 11:44:26 -07:00
Daniel Han	8d1712b4ea	fix(studio): pin llama.cpp to b8637 release (Gemma 4 support) (#4796 ) ggml-org/llama.cpp b8637 includes Gemma 4 support (ggml-org/llama.cpp#21309). Revert the temporary "master" default back to a pinned release tag. This eliminates the HTTP 422 errors from the prebuilt resolver (which could not find a release matching "master"), avoids unnecessary source builds, and restores prebuilt binary downloads on all platforms. Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-04-02 11:43:53 -07:00
DoubleMathew	7ae9b7f45f	fix windows llama.cpp compile from source issue (#4793 ) * fix windows llama.cpp compile from source issue * undo local repo usage * fix llama.cpp install * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix windows * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: route resolve-source-build call through Invoke-LlamaHelper The --resolve-source-build call at the source-build resolution path was still calling install_llama_prebuilt.py directly instead of going through Invoke-LlamaHelper. On PS7+ with ErrorActionPreference=Stop, stderr from the 422 response (when tag is "master") would trigger a terminating NativeCommandError and crash setup. * fix: suppress stderr error records from Invoke-LlamaHelper ErrorActionPreference=Continue prevents termination but PowerShell still displays stderr lines as visible ErrorRecord objects. Capture all output via 2>&1 and split stdout from stderr manually so that stderr lines never appear on the console. When StderrPath is given the stderr content is written to that file for diagnostics. * fix: always rebuild llama.cpp on Windows when tag is master When the requested llama.cpp tag is "master" (a moving target), skip the "already built" early exit so the build path runs and syncs to the latest commit. Without this, existing llama-server binaries from an older build (e.g. b8635 which lacks Gemma 4 support) are reused and model loading fails. Pinned tags (e.g. b8635) still skip the rebuild when the binary already exists, since the tag is immutable. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-04-02 11:43:46 -07:00
Daniel Han	7023e2a4ff	fix(studio): prioritize curated defaults over HF download ranking in Recommended (#4792 ) The model list merge order was `top_gguf + top_hub + static_models`, which meant the HF download-ranked models always came first. New models like Gemma 4 have low download counts and were not in the HF top-40, so they got buried after 80 other models despite being at the top of the curated static defaults in defaults.py. Flip the merge to `static_models + top_gguf + top_hub` so editorial picks (new model launches, promoted models) always appear first in the Recommended section, with HF popularity backfilling after. Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-04-02 10:46:53 -07:00
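Note on the commit above: the effect of the merge-order flip can be seen in a small sketch. The three list names follow the commit message; `merge_recommended_sketch` is a hypothetical name, and the order-preserving de-duplication is an assumption added so a model present in several lists keeps its earliest position.

```python
def merge_recommended_sketch(static_models, top_gguf, top_hub):
    """Curated defaults first, then HF download-ranked models as backfill."""
    merged = static_models + top_gguf + top_hub
    return list(dict.fromkeys(merged))      # assumed de-dup; first occurrence wins


recommended = merge_recommended_sketch(
    ["gemma-4-E4B-it"],                     # curated pick with a low download count
    ["popular-gguf-a", "popular-gguf-b"],
    ["popular-hub-a", "gemma-4-E4B-it"],
)
```

With the previous order (`top_gguf + top_hub + static_models`), the curated entry would have landed after every download-ranked model.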
Roland Tannous	0446d46689	fixed name (#4791 )	2026-04-02 21:04:42 +04:00
Daniel Han	1ce83c40aa	fix(studio): build llama.cpp from master instead of latest release tag (#4790 ) The latest ggml-org/llama.cpp release (b8635) does not include Gemma 4 support (ggml-org/llama.cpp#21309 merged after the release was cut). This causes `llama-server` to fail with "unknown model architecture: gemma4" when loading Gemma 4 GGUFs. Temporarily default _DEFAULT_LLAMA_TAG to "master" so all new installs build from the llama.cpp master branch which includes Gemma 4 support. Once a new upstream release is cut with Gemma 4, this can be reverted back to "latest". Changes: - setup.sh: add _DEFAULT_LLAMA_TAG="master" maintainer default - setup.ps1: add $DefaultLlamaTag="master" maintainer default - install_llama_prebuilt.py: change DEFAULT_LLAMA_TAG fallback to "master" Users can still override via UNSLOTH_LLAMA_TAG env var.	2026-04-02 09:45:56 -07:00
Daniel Han	2af53bf9a6	Pin transformers and huggingface-hub in main Studio venv (#4788 ) Revert the >= loosening from `f9c4b08` back to exact pins. Using transformers>=4.57.6 allows pip to install 5.x into the main Studio venv, which breaks huggingface_hub imports (is_offline_mode removed in newer hub versions). The main venv must stay on transformers==4.57.6 and huggingface-hub==0.36.2. The 5.x version lives only in .venv_t5/ and is dynamically switched via sys.path at runtime.	2026-04-02 09:21:30 -07:00
Daniel Han	a241c58d84	Use transformers v5.5-release branch and pin to 5.5.0 (#4786 ) The v5.5-release branch now exists on huggingface/transformers. Use transformers==5.5.0 for all install paths and git+transformers.git@v5.5-release for the MLX installer. Also bumps huggingface_hub from 1.7.1 to 1.8.0 in setup.sh and setup.ps1 to stay consistent.	2026-04-02 09:10:02 -07:00
Daniel Han	a353557249	Force llama.cpp to always use mainline ggml-org (#4785 ) Hardcode the release repo to ggml-org/llama.cpp and remove the UNSLOTH_LLAMA_RELEASE_REPO and UNSLOTH_LLAMA_SOURCE env var overrides so that all users always build/download from mainline llama.cpp.	2026-04-02 09:03:00 -07:00
Daniel Han	f1c3b9caa9	Pin Gemma-4 transformers requirement to 5.5.0 stable (#4784 ) Gemma-4 support landed in transformers main (huggingface/transformers#45192). Update the version pin from 5.5.0.dev0 to 5.5.0 across loader, Studio version switcher, and the MLX installer. Also fix install_gemma4_mlx.sh which referenced a non-existent v5.5-release branch -- pin it to the correct commit (91b1ab1) instead.	2026-04-02 08:59:21 -07:00
Daniel Han	4f9986ecb9	fix(studio): improve tool-calling re-prompt for small models (#4783 ) Small GGUF models (<9B) frequently generate full code or lengthy explanations instead of calling tools, bypassing the existing plan-without-action re-prompt mechanism. Four issues: 1. _REPROMPT_MAX_CHARS=500 was too low -- models that output full HTML/code responses (often 1000+ chars) never triggered the re-prompt at all, since it only fires on short responses. 2. _MAX_REPROMPTS=1 gave the model only one chance to comply. Small models often need 2-3 nudges before switching from text generation to tool calling. 3. The re-prompt text ("Please use the available tools...") was too polite for small models to follow reliably. 4. Tool-calling detection missed chat templates using Jinja whitespace-trimming syntax ({%- if tools -%}) since only ({%- if tools %}) and ({% if tools %}) were checked. Changes: - Raise _REPROMPT_MAX_CHARS from 500 to 2000 so longer responses (code blocks, multi-paragraph plans) still trigger re-prompts - Raise _MAX_REPROMPTS from 1 to 3 for more retry budget - Use direct, imperative re-prompt language that small models follow more reliably ("STOP. You MUST call a tool NOW.") - Strengthen the system prompt tool nudge to explicitly forbid outputting code blocks (redirect to the python tool instead) - Add Jinja whitespace-trimmed variants to the tool_markers list so all template styles are detected correctly	2026-04-02 08:59:02 -07:00
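Note on the commit above: the marker check and the re-prompt budget can be sketched together as below. The two constants and their new values come from the commit message; the function names, the exact marker tuple (in particular the `{% if tools -%}` variant), and the `<=` comparison are assumptions for illustration.

```python
_REPROMPT_MAX_CHARS = 2000      # raised from 500 in the commit
_MAX_REPROMPTS = 3              # raised from 1 in the commit

# Assumed marker set: the two original forms plus whitespace-trimmed variants.
TOOL_MARKERS = (
    "{%- if tools %}",
    "{% if tools %}",
    "{%- if tools -%}",
    "{% if tools -%}",
)


def template_supports_tools(chat_template):
    """True if the Jinja chat template contains any known tools conditional."""
    return any(marker in chat_template for marker in TOOL_MARKERS)


def should_reprompt(response_text, made_tool_call, reprompts_used):
    """Re-prompt only when no tool was called, budget remains, and the text is short enough."""
    if made_tool_call or reprompts_used >= _MAX_REPROMPTS:
        return False
    return len(response_text) <= _REPROMPT_MAX_CHARS
```

Under the old threshold of 500, a 1500-character code dump would have skipped the re-prompt; with 2000 it still qualifies.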
Daniel Han	f9c4b08726	UI Changes (#4782 ) * UI Changes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unrelated test file --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-02 08:05:55 -07:00
Roland Tannous	3b613eb1e8	ui improvement (#4781 ) * ui * ui * ui	2026-04-02 07:57:47 -07:00
Daniel Han	c8d311a053	feat(studio): display images from Python tool execution in chat UI (#4778 ) * feat(studio): display images from Python tool execution in chat UI When the model calls the Python tool to create a matplotlib plot or other image file, the image now displays inline in the chat output instead of being invisible to the user. Backend: - Detect new image files (png/jpg/gif/webp/bmp) after Python subprocess completes by diffing os.listdir before/after execution - Append __IMAGES__ sentinel to tool result for frontend consumption - Strip sentinel before injecting result into LLM context (role: tool) so the model never sees file paths - Add GET /sandbox/{session_id}/{filename} endpoint with JWT auth (header or query param), path traversal protection, extension allowlist, realpath containment check, and nosniff header Frontend: - Parse __IMAGES__ sentinel in tool_end SSE events, create structured result with text/images/sessionId - Render <img> tags in Python tool UI pointing at the sandbox endpoint Also fixes a bug where SyntaxError in user code was misreported as "unsafe code detected" instead of showing the actual Python traceback. The _check_code_safety function now lets SyntaxError pass through to the subprocess for a proper error message. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(studio): improve SVG detection and strip XML preamble Handle <?xml ...?> declarations before <svg> tags in code fences, strip XML declaration from SVGs before data URI rendering, and update the sloth suggestion prompt to request showing code. * fix(studio): persist parentId so retries survive reload The append() handler was destructuring only { message } from ExportedMessageRepositoryItem and discarding parentId. When loading a saved thread, load() used ExportedMessageRepository.fromArray() which chains all messages sequentially, flattening retry branches into a linear list. 
Now append() writes parentId to the MessageRecord, and load() reconstructs the tree when parentIds are present. Old threads without parentId fall back to the existing fromArray() behavior. * fix(studio): address review findings for image display and retry persistence Image detection: - Use mtime comparison instead of filename-only diff so overwritten files (e.g. plt.savefig("chart.png") called twice) are detected Sentinel parsing: - Use rsplit/lastIndexOf instead of split/indexOf so user code that prints __IMAGES__: does not collide with the backend sentinel Mixed legacy/new threads: - For old messages without a stored parentId, infer sequential parent from the previous message instead of null, preventing multiple roots Sandbox endpoint: - Change Cache-Control from "public, max-age=3600" to "private, no-store" since these are authenticated responses --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-02 05:08:16 -07:00
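Note on #4778: a minimal sketch of the two review fixes described above (mtime-based image detection and last-occurrence sentinel parsing). The function names and the comma-separated payload after `__IMAGES__:` are assumptions made for illustration, not the Studio implementation.

```python
import os

# Extensions listed in the commit message.
IMAGE_EXTS = {".png", ".jpg", ".gif", ".webp", ".bmp"}
SENTINEL = "__IMAGES__:"  # payload format after the colon is assumed


def snapshot_images(workdir: str) -> dict:
    """Map each image filename in workdir to its modification time in nanoseconds."""
    snap = {}
    for name in os.listdir(workdir):
        if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
            snap[name] = os.stat(os.path.join(workdir, name)).st_mtime_ns
    return snap


def changed_images(before: dict, after: dict) -> list:
    """Return files that are new or whose mtime changed (covers overwritten files)."""
    return sorted(name for name, mtime in after.items() if before.get(name) != mtime)


def split_sentinel(result: str):
    """Split on the LAST sentinel so user output that prints the marker is kept as text."""
    text, sep, tail = result.rpartition("\n" + SENTINEL)
    if not sep:
        return result, []
    return text, [name for name in tail.split(",") if name]
```

Taking `snapshot_images` before and after the subprocess runs and passing both to `changed_images` picks up a second `plt.savefig("chart.png")` that a filename-only diff would miss.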
Lee Jackson	5a5f1a4f34	studio: fix chat font changes leaking outside chat page (#4775 ) * fix(frontend): scope sans font overrides to chat thread only * fix(frontend): use font-sans fallback for heading stack and simplify chat font rules --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-02 05:04:23 -07:00
DoubleMathew	1ce8a8e7cd	Feat/custom llama prebuilt (#4771 ) * update logic to incorporate custom prebuilt installs * bug fixes * update for review comments * fix tags * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate test changes from main PR Move test file changes out of this PR to keep the diff focused on the install_llama_prebuilt.py and setup script changes. Test updates will be submitted in a follow-up PR. * Fix branch ref normalization and harden JSON parsing - Add checkout_friendly_ref() to strip refs/heads/ prefix from branch refs before emitting them in SourceBuildPlan. git clone --branch does not accept fully qualified refs like refs/heads/main. - Apply normalization in source_build_plan_for_release() and the direct-ref fallback in resolve_source_build_plan(). - Allow validated_checksums_for_bundle() to accept releases that carry only an exact-commit source archive without the legacy upstream-tag source tarball. - Add 2>/dev/null \|\| true guards to all inline python -c JSON parsing in setup.sh so a malformed payload does not abort the script under set -e. * Fix Windows CUDA asset ordering and tag ref normalization - Reorder windows_cuda_upstream_asset_names to prefer the main binary archive (llama-{tag}-bin-win-cuda-) over the cudart sidecar archive (cudart-llama-bin-win-cuda-). The cudart ZIP only contains CUDA runtime DLLs, not llama-server or llama-quantize binaries. - Extend checkout_friendly_ref to also strip refs/tags/ prefix for tag refs, matching the refs/heads/ handling for branch refs. * Simplify JSON parsing consistency in setup.sh Use json.load(sys.stdin) consistently for all inline JSON parsing in setup.sh, instead of the more complex json.loads(raw) pattern on the install-tag resolution path. The 2>/dev/null \|\| true guard already handles empty/malformed input gracefully. 
* Fix source build plan fallback for commit ref kind in PR #4771 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <daniel@unsloth.ai> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-02 04:52:26 -07:00
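Note on #4771: the ref normalization described above could look something like this. The function name comes from the commit message; the body is a sketch written from the description, not copied from install_llama_prebuilt.py.

```python
def checkout_friendly_ref(ref: str) -> str:
    """Strip a leading refs/heads/ or refs/tags/ so the ref works with `git clone --branch`."""
    for prefix in ("refs/heads/", "refs/tags/"):
        if ref.startswith(prefix):
            return ref[len(prefix):]
    # Short branch names, tags, and commit SHAs pass through unchanged.
    return ref
```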
Daniel Han	b20efc370a	Add regression tests for custom llama prebuilt installer (#4772 ) Expand test coverage for install_llama_prebuilt.py: - Add tests for source build plan resolution with custom repos - Add tests for branch/commit/PR ref matching and normalization - Add tests for manifest checksum validation - Add tests for Windows CUDA upstream asset name patterns - Update capsys checks to capture stderr after log() redirect	2026-04-02 04:45:09 -07:00
Michael Han	e2fd946fe1	Add files via upload	2026-04-02 03:00:10 -07:00
Michael Han	31d6aeb197	Unsloth new logo	2026-04-02 02:58:21 -07:00
Daniel Han	e4d1499230	fix(studio): prevent small models from stalling on tool-calling tasks (#4769 ) * fix(studio): prevent small models from stalling on tool-calling tasks Small GGUF models (< 9B params) in "Think, Search, Code" mode would often describe what they planned to do ("Let me create this dashboard") and then stop generating without ever calling a tool. Three changes: 1. Simplify web_tips for small models: remove the "fetch its full content by calling web_search with the url parameter" guidance for models < 9B. This multi-step instruction causes small models to plan elaborate search-then-fetch-then-code sequences they cannot reliably execute. 2. Add "always call tools directly" imperative to the system prompt nudge so models act immediately instead of narrating their intentions. 3. Add plan-without-action re-prompt in the agentic loop: when the model emits planning text (matching patterns like "let me", "I'll", etc.) without calling any tool, inject a nudge asking it to call the tool and continue the loop. Capped at 2 re-prompts per request. Benchmarked with Qwen3.5-4B-GGUF (N=5 trials per variant): - Baseline: 40% of requests had any tool call - Combined fix: 100% of requests had at least one tool call * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-02 02:11:07 -07:00
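Note on #4769: a rough sketch of the plan-without-action check described above. The regex, the function name, and the argument names are assumptions; the commit only says that patterns like "let me" and "I'll" are matched and that re-prompts are capped at 2 per request.

```python
import re

# Hypothetical planning-phrase pattern; the real list is not given in the commit.
_PLAN_PATTERN = re.compile(r"\b(?:let me|i'll|i will|i'm going to)\b", re.IGNORECASE)


def needs_reprompt(text: str, tool_calls: list, reprompts_used: int, max_reprompts: int = 2) -> bool:
    """Return True when the model narrated a plan but called no tool and budget remains."""
    if tool_calls:
        return False  # the model acted, nothing to nudge
    if reprompts_used >= max_reprompts:
        return False  # cap reached, let the response through
    return bool(_PLAN_PATTERN.search(text))
```

When this returns True, the agentic loop would append a nudge message and run another generation pass instead of returning the planning text to the user.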
Daniel Han	dc0729aadf	Add regression test for shell injection fix in GGML conversion (#4773 ) AST-based test ensures subprocess.Popen calls in GGML conversion functions use argv lists instead of shell=True. Companion to PR #4768.	2026-04-02 00:10:47 -07:00
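Note on #4773: an AST-based check of the kind described above can be sketched as below. This is an illustration using the standard `ast` module, and `find_shell_true_calls` is a hypothetical helper, not the test shipped in the repository.

```python
import ast


def find_shell_true_calls(source: str) -> list:
    """Return the line numbers of every call in source that passes shell=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            # Only a literal True is flagged; dynamic values would need extra handling.
            if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                hits.append(node.lineno)
    return sorted(hits)
```

A regression test can read the module source, call this helper, and assert that the result is empty for the conversion functions.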
mateeaaaaaaa	752cef3299	fix(security): shell injection in GGML export conversion (#4768 ) * Fix shell injection in GGML conversion paths * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove test file from security fix PR Move test_save_shell_injection.py to a separate PR to keep this PR focused on the security fix itself. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-02 00:10:43 -07:00
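Note on #4768: the general technique behind this kind of fix is to pass an argv list so no shell ever parses the arguments. The sketch below illustrates that with `subprocess.run` and a child Python process; it is not the GGML conversion code, and `echo_first_arg` is a hypothetical helper.

```python
import subprocess
import sys


def echo_first_arg(value: str) -> str:
    """Pass value as one argv entry to a child Python process and return what it received."""
    # With an argv list and no shell=True, metacharacters such as ';' are plain data.
    argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", value]
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout.rstrip("\r\n")
```

A hostile model path such as `model; echo pwned` reaches the child as a single argument instead of being split into two shell commands.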
AdamPlatin123	ba8081fc96	fix(chat): correct loading text for cached models during inference (#4764 ) Distinguish between actual network downloads and GPU memory loading for cached LoRA adapters in Studio chat. - Add isCachedLora detection for local LoRA adapter paths using comprehensive cross-platform regex (Unix, Windows, UNC, WSL, tilde) - Thread isCachedLora through loadInfo to chat-page inline status for proper 3-way distinction (cached / local LoRA / downloading) - Skip download progress polling for cached LoRA models (no useless /download-progress API calls) - Fix initial toast state to use isCachedLoad consistently instead of only checking isDownloaded - Fix cancelLoading toast to not mention background downloads for cached/local loads - Keep download-specific text ("Downloading model..." / "Download complete") inside the download-only polling block	2026-04-01 20:24:48 -07:00
Lee Jackson	ca4ea8b9fb	studio: align composer/code, unify fonts, and remove tool collapse jitter (#4763 ) - Add min-w-0 guards to thread/message/markdown containers to prevent content overflow past the composer width - Unify chat typography from Hellix/Space Grotesk to the sans stack, keeping monospace for code blocks and inline code - Restructure desktop navbar right-side controls with shrink-0 wrappers for consistent spacing across HoverCard roots - Soften tool-call label styling (font-medium + text-foreground/85 instead of bold) - Add responsive code block sizing via @container queries - Add horizontal scrolling for wide code blocks within the thread column - Scope list-item code block alignment CSS to .aui-thread-root - Preserve useScrollLock in tool-fallback and tool-group collapsibles - Fall back to bg-background on ViewportFooter when hideComposer is true - Widen inline code monospace selector to cover th, blockquote, and heading elements - Remove unused @fontsource-variable/space-grotesk import	2026-04-01 19:57:10 -07:00
DoubleMathew	71b934ef9d	Fix custom llama.cpp source builds and macos metal source builds (#4762 ) * Fix script unbound variable error * remove stale test script, add llama.cpp metal source builds, update tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Metal precedence, test sync, and add behavioral tests - Move macOS arm64 Metal check before CUDA/ROCm in GPU backend decision chain so Metal is not bypassed when nvcc is in PATH - Remove RPATH flags from CPU fallback CMAKE_ARGS (only needed for Metal library linking) - Update test_llama_pr_force_and_source.py to match _CLONE_ARGS rename from _CLONE_BRANCH_ARGS in setup.sh - Add confirm_install_tree guard test for existing_install_matches_choice - Add TestMacOSMetalBuildLogic bash subprocess tests verifying Metal flag selection, nvcc precedence, and CPU fallback behavior * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Metal CPU fallback to also cover cmake build failures and update tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. _GPU_BACKEND_FRAGMENT synced -- removed dead CPU_FALLBACK_CMAKE_ARGS= init (6/8) 2. RPATH assertion replaced -- new test_macos_arm64_cpu_fallback_args_exclude_rpath checks the actual runtime CPU_FALLBACK_CMAKE_ARGS output for @loader_path and -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON (6/8) 3. _TRY_METAL_CPU_FALLBACK=false reset after both configure-failure and build-failure fallback branches in setup.sh (4/8) 4. macOS test now removes libmtmd.0.dylib instead of the platform-agnostic convert_hf_to_gguf.py (3/8) 5. Empty-string tag test added -- test_empty_tag_omits_branch_flag for resolved_tag= (2/8) 6. 
RPATH checks on cmake call logs -- both fallback tests now assert @loader_path and -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON are absent from CPU fallback cmake calls, plus baseline flag preservation (multiple) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * tests clean up * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-01 14:06:39 -05:00
Daniel Han	39fe23ded8	Tests for architecture-aware KV cache estimation (#4760 ) * test: add 66 tests for architecture-aware KV cache estimation Covers all 5 estimation paths (MLA, Hybrid Mamba, Sliding Window, Standard GQA, Legacy), GGUF parser for 8 new metadata fields, _can_estimate_kv gate conditions, quantization scaling, edge cases, path priority ordering, and lifecycle (init/unload/reparse). Zero external dependencies beyond pytest. No GPU or network required. Cross-platform (Linux, macOS, Windows, WSL). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-01 06:13:37 -07:00
Daniel Han	653eb3819a	fix(studio): allow context length slider to reach model's native limit (#4746 ) * fix(studio): allow context length slider to reach model's native limit The context length slider was hard-capped to the VRAM-estimated maximum, preventing users from requesting higher context even though the backend already handles it safely (multi-GPU selection, --fit fallback). Expose the model's native context length from GGUF metadata as a separate API field and use it as the slider ceiling instead. Add an amber warning when the selected context exceeds the estimated VRAM capacity. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Raise VRAM budget to 90% and add native_context_length tests Increase the GPU memory utilization threshold from 70% to 90% across _select_gpus and _fit_context_to_vram, allowing longer context lengths before VRAM capping kicks in. Add 33 tests for the native_context_length feature covering the backend property, context value separation invariants, Pydantic models, route completeness, edge cases, and cross-platform binary I/O. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-01 06:12:52 -07:00
Daniel Han	d22b2a18f9	fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748 ) * fix: add tokenizers to no-torch runtime deps and add TORCH_CONSTRAINT for arm64 macOS py313+ Two installer fixes: 1. Add `tokenizers` to `no-torch-runtime.txt` before `transformers`. Without it, `from transformers import AutoConfig` crashes on startup because `--no-deps` skips transitive dependencies. 2. Add `TORCH_CONSTRAINT` variable to `install.sh`. On arm64 macOS with Python 3.13+, tighten the torch requirement to `>=2.6` since torch <2.6 has no cp313 arm64 wheels. The variable replaces the previously hard-coded constraint in the uv pip install line. Includes 66 tests (42 pytest + 24 bash) covering: - Structural checks on install.sh, install.ps1, no-torch-runtime.txt - Shell snippet tests with mocked python for 13 platform/version combos - Mock uv integration verifying correct constraint string - E2E venv tests on Python 3.12 and 3.13 confirming AutoConfig works - Negative control proving AutoConfig fails without tokenizers - Full no-torch sandbox regression guards (safetensors, huggingface_hub) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix incomplete no-torch manifest and align E2E tests with real --no-deps path - Add missing transitive deps to no-torch-runtime.txt that are required under --no-deps: regex, typing_extensions, filelock, httpx, httpcore, certifi, idna, anyio, sniffio, h11. Without these, `from transformers import AutoConfig` still fails after install.sh --no-torch. - Change all E2E tests to use --no-deps (matching what install.sh does) instead of normal dep resolution. Previous tests passed even with an incomplete manifest because uv backfilled transitive deps. - Rewrite negative control to derive from the real no-torch-runtime.txt with tokenizers stripped, proving the specific fix matters. - Replace GNU-only sed -i with heredoc in shell test for macOS compat. 
- Remove unused os/sys imports from Python test file. - Quote SKIP_TORCH and mock uv paths in bash -c strings. * Assert install succeeds before checking import results in E2E tests Address review feedback: test_torch_not_importable and test_tokenizers_directly_importable in Group 3 now assert that uv pip install returns 0 before checking import behavior. This prevents false positives when the install itself fails silently. * Assert install succeeds in negative control and tighten error check - Add missing install-success assertion in test_negative_control_no_tokenizers to prevent false positives from network/install failures. - Tighten error message check to look for "tokenizers" in stderr or ModuleNotFoundError, rather than the generic "No module" substring which could match unrelated import failures. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-01 06:12:17 -07:00
Daniel Han	76cb48be0b	fix: studio web search SSL failures and empty page content (#4754 ) - Fix SSL handshake failures (SSLV3_ALERT_HANDSHAKE_FAILURE, CERTIFICATE_VERIFY_FAILED) when fetching HTTPS pages by introducing _PinnedHTTPSConnection that separates TCP connect (to pinned IP) from TLS handshake (with real hostname for SNI/cert verification) - Fix SSRF DNS-rebinding vulnerability: previous impl swapped conn.host before connect(), causing fresh DNS resolution; new subclass keeps TCP pinned to validated IP - Fix SPA/JS-rendered doc sites returning empty content by rotating real browser User-Agents (Chrome/Firefox/Safari) - Strip nav/footer from HTML-to-Markdown output so article content is not buried under navigation chrome - Increase raw fetch cap from 64KB to 512KB so SSR article content is reached on GitBook/Docusaurus/Next.js pages - Fix IPv6 address bracketing in URL netloc construction - Hoist SSL context, handler classes, and stdlib imports to module level (created once, not per-call) - Use consistent UA across redirect hops to avoid breaking session-aware bot detection	2026-04-01 06:12:02 -07:00
Daniel Han	f84c2d03d3	Add installer test coverage for prebuilt llama.cpp changes (#4756 ) Split out from #4741 to keep the main PR focused on installer logic. - New test_install_llama_prebuilt_logic.py: tests for resolve logic, fallback behavior, env_int, busy/lock handling - New test_validate_llama_prebuilt.py: validator tests for staged release_tag/upstream_tag handling - New test_llama_pr_force_and_source.py: tests for PR_FORCE and LLAMA_SOURCE maintainer defaults - Updated test_selection_logic.py: expanded selection/fallback coverage - Updated test_pr4562_bugfixes.py: updated bugfix tests for new logic - Updated smoke_test_llama_prebuilt.py: minor update	2026-04-01 06:06:29 -07:00
DoubleMathew	428efc7d95	Resolve latest usable published llama.cpp release instead of fixed pinned tag (#4741 ) Replaces the fixed prebuilt llama.cpp tag with dynamic published-release resolution, adds bounded fallback across older published releases, and introduces maintainer-editable defaults for PR/source overrides. Changes: - Resolve latest from the latest usable published release in unslothai/llama.cpp - Use the selected release upstream_tag as the authoritative llama.cpp version - Prefer Unsloth-published platform assets when available - Fall back to same-tag upstream ggml-org/llama.cpp assets where allowed - Keep Linux CUDA anchored to Unsloth-published CUDA bundles only - Add bounded fallback across older Unsloth published releases - Add separate busy/in-use install handling (exit code 3) - Skip reinstall when the installed bundle already matches the selected candidate - Add maintainer-editable _DEFAULT_LLAMA_PR_FORCE and _DEFAULT_LLAMA_SOURCE - Harden env parsing so malformed installer env vars do not crash import-time fallback logic - Honor UNSLOTH_LLAMA_RELEASE_TAG in all resolve steps - Always sync git remote URL in existing-checkout path	2026-04-01 06:06:17 -07:00
Daniel Han	5d7d882ce6	Fix save_pretrained_merged for full-finetuned models (#4755 ) * Fix save_pretrained_merged for full-finetuned models save_pretrained_merged and push_to_hub_merged silently do nothing when the model is not a PeftModel (i.e. full finetuning without LoRA). merge_and_overwrite_lora returns None immediately for non-PeftModel, and unsloth_generic_save does not check the return value. Add a non-PeftModel branch in unsloth_generic_save that falls back to model.save_pretrained / model.push_to_hub. When save_method contains "16bit", cast weights to bfloat16 (or float16) via a state_dict copy to honor the user's intent without mutating the live model. The existing PeftModel (LoRA) code path is unchanged. * Forward create_pr and revision to tokenizer.push_to_hub The tokenizer push_to_hub call was missing create_pr and revision, which could cause the tokenizer to push to the wrong branch or bypass PR creation when the model push uses them. * Honor merged_16bit dtype contract for full-finetuned models Cast state_dict to bfloat16/float16 when save_method contains "16bit" to match the documented behavior of save_pretrained_merged. Also pass state_dict and save kwargs consistently to both save_pretrained and push_to_hub paths. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review feedback for PR #4755 - Simplify PeftModel isinstance check (PeftModelForCausalLM inherits from PeftModel) - Add is_main_process guard for distributed training - Forward variant to save_pretrained - Set tokenizer padding_side to "left" before saving (matches other save paths) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-01 06:05:37 -07:00
Daniel Han	77e1a9edc9	feat(studio): architecture-aware KV cache VRAM estimation (#4757 ) * feat(studio): architecture-aware KV cache VRAM estimation Replace the single legacy formula (2 * n_kv_heads * head_dim * n_layers * n_ctx * bpe) with 5-path estimation that reads 8 additional GGUF metadata fields: 1. MLA (DeepSeek-V2/V3, GLM-4.7, GLM-5, Kimi-K2.5) -- K-only cache using compressed KV latent + RoPE; no separate V allocation 2. Hybrid Mamba (Qwen3.5-27B, Qwen3.5-35B-A3B) -- only attention layers (1 in N) carry KV; Mamba layers have none 3. Sliding Window (Gemma-3, gpt-oss) -- SWA layers cache min(ctx, window) tokens instead of the full context 4. Standard GQA -- uses explicit key_length/value_length from GGUF instead of embed // n_heads (which is wrong for many models) 5. Legacy fallback -- identical to old formula for old GGUFs New GGUF fields parsed: attention.key_length, attention.value_length, attention.sliding_window, full_attention_interval, attention.kv_lora_rank, attention.key_length_mla, ssm.inner_size, ssm.state_size. Validated against 9 real GGUF files (72/72 field checks pass). The legacy formula was off by +682% for Gemma-3 and -81% for DeepSeek-V3.1. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix MLA fallback and SWA global/local ratio heuristic Two fixes based on review findings: 1. MLA fallback now uses key_length_mla from GGUF metadata instead of hardcoded rope_dim=64. Falls back to 64 only when key_length_mla is absent. This ensures correct estimates for MLA variants that use rope dimensions other than 64. 2. SWA global/local layer ratio changed from 50/50 to 1/4 (25% global, 75% SWA). Most sliding window architectures have predominantly local layers (Gemma-3 uses ~17% global, gpt-oss uses ~50%). The 1/4 heuristic is closer to the common case and still a large improvement over the legacy formula which ignores SWA entirely. 
* Tighten _can_estimate_kv gate and treat sliding_window=0 as disabled Two additional fixes from review round 1 (5/8 and 4/8 reviewer consensus): 1. _can_estimate_kv now requires BOTH key_length AND value_length for the explicit-dims path. Previously key_length alone was enough, which could cause silent fallthrough to the legacy formula with fabricated defaults (n_kv=1, head_dim=128) when value_length was absent from the GGUF. 2. SWA path now requires sliding_window > 0. Some GGUFs use 0 as a disabled sentinel. Without this guard, min(ctx, 0) would zero out all SWA layer contributions, severely underestimating KV cache. * Fix MLA n_kv safety and use ceiling division for hybrid path Addresses Gemini Code Assist review findings: 1. MLA path now uses n_kv_mla = n_kv_heads or 1 (not n_heads). This prevents a 128x overestimate for DeepSeek-V3 if head_count_kv is absent from the GGUF (n_heads=128 would have been used instead). 2. Hybrid path now uses ceiling division for attention layer count. This prevents undercounting by 1 when n_layers is not perfectly divisible by full_attention_interval. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-01 06:04:12 -07:00
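Note on #4757: a simplified sketch of three of the estimation paths described above. The legacy formula is taken from the commit message; the function names, the default of 2 bytes per element, and the exact rounding of the 1/4 global-layer split are assumptions.

```python
def kv_bytes_legacy(n_kv_heads, head_dim, n_layers, n_ctx, bpe=2):
    """Legacy formula from the commit: 2 * n_kv_heads * head_dim * n_layers * n_ctx * bpe."""
    return 2 * n_kv_heads * head_dim * n_layers * n_ctx * bpe


def kv_bytes_gqa(n_kv_heads, key_length, value_length, n_layers, n_ctx, bpe=2):
    """Standard GQA path using explicit K and V head sizes from GGUF metadata."""
    return n_kv_heads * (key_length + value_length) * n_layers * n_ctx * bpe


def attention_layer_count(n_layers, full_attention_interval):
    """Hybrid path: ceiling division so a partial final group still counts one layer."""
    return -(-n_layers // full_attention_interval)


def kv_bytes_swa(n_kv_heads, key_length, value_length, n_layers, n_ctx, sliding_window, bpe=2):
    """Sliding-window path with an assumed 1/4 global, 3/4 local layer split."""
    if sliding_window <= 0:
        # 0 is treated as "disabled", so fall back to the full-context estimate.
        return kv_bytes_gqa(n_kv_heads, key_length, value_length, n_layers, n_ctx, bpe)
    n_global = -(-n_layers // 4)  # rounding up here is an assumption
    n_local = n_layers - n_global
    per_token = n_kv_heads * (key_length + value_length) * bpe
    return per_token * (n_global * n_ctx + n_local * min(n_ctx, sliding_window))
```

For 8 KV heads, head_dim 128, 32 layers, 4096 context, and 2 bytes per element, the legacy formula gives 536870912 bytes (512 MiB).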
Daniel Han	3f3757b143	Fix forward compatibility with transformers 5.x (#4752 ) * Fix forward compatibility with transformers 5.x Tested on transformers 4.57.6, 5.3.0, and 5.4.0. All changes are no-ops on transformers 4.x. 1. Skip exec-based config patching for transformers >= 5.0 Config classes in v5 use @strict, @auto_docstring, and interval() which break exec(inspect.getsource(...)). Those configs already use rope_parameters (the v5 replacement for rope_scaling). 2. Slice position_ids to last token in fast_forward_inference Transformers 5.x generate() accumulates position_ids as [batch, full_seq_len] across decode steps instead of [batch, 1]. cos[position_ids] then produces the wrong shape for rotary embeddings. Fixed in llama, qwen3, falcon_h1, gemma2, cohere, granite. No-op on 4.x since position_ids is already [batch, 1]. 3. Handle @strict config kwargs for sequence classification num_labels, max_position_embeddings, id2label etc. are set on the config object and passed via config= instead of as kwargs. AutoModelForSequenceClassification routing added to FastModel loader. 4. Exclude modernbert from flex_attention ModernBERT with flex_attention hits CUDA illegal memory access in create_block_mask. Falls back to eager attention safely. 5. Propagate token_type_ids and mm_token_type_ids through GRPO VLM path Gemma3 Vision requires token_type_ids during training. Qwen3VL requires mm_token_type_ids for M-RoPE. Extract from inputs in compute_loss, pass to grpo_accumulated_loss, and extend mm_token_type_ids for completion tokens in _generate_and_score_completions. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add try/except safety net around config exec for pre-release transformers versions * Pop config-level kwargs in seqclass path and use except Exception --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-01 06:04:03 -07:00
Roland Tannous	41df4ec437	feat(studio): strip org prefix in model search to surface unsloth variants (#4749 ) When searching for a specific publisher model (e.g. `openai/gpt-oss-20b`), the unsloth search used the full `openai/gpt-oss-20b` string with `author=unsloth`, which returned zero results because no unsloth model contains the publisher prefix in its name. Users never discovered unsloth variants. This PR strips the org prefix for publisher-qualified queries so unsloth variants surface, then pins the original publisher model after a small batch of unsloth results. Plain queries (no slash) and unsloth-prefixed queries are unchanged. - Strict regex (`/^([^/\s]+)\/([^/\s]+)$/`) only triggers on valid `owner/repo` identifiers; incomplete typeahead, multi-slash, and URL-like inputs are rejected - Queries for `unsloth/...` models (case-insensitive) keep the full 20-result prefetch and secondary sort - Pinned model lookup fires in parallel with the unsloth prefetch - Canonical-name dedup prevents duplicates when HF normalizes casing - Publisher detection extracted into a single `useMemo` block	2026-04-01 04:37:28 -07:00
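Note on #4749: the Studio frontend is not written in Python, but the prefix-stripping rule described above can be sketched in Python for illustration. The pattern mirrors the regex quoted in the commit; `unsloth_search_term` is a hypothetical name.

```python
import re

# Same shape as the commit's /^([^/\s]+)\/([^/\s]+)$/, applied with fullmatch.
_OWNER_REPO = re.compile(r"([^/\s]+)/([^/\s]+)")


def unsloth_search_term(query: str) -> str:
    """Return the term to search under author=unsloth for a given user query."""
    match = _OWNER_REPO.fullmatch(query)
    if match is None:
        return query  # plain, incomplete, multi-slash, or URL-like input is unchanged
    owner, repo = match.groups()
    if owner.lower() == "unsloth":
        return query  # unsloth-prefixed queries keep the full string
    return repo
```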
Leo Borcherding	63ad6dbd6d	Fix OOM model styling in Studio model selectors (#4738 ) Replace strikethrough + opacity-50 OOM styling with gray text and red pill badge across all Studio model selectors (chat, training, onboarding). - Use gray-500/gray-400 for OOM model names (better contrast than strikethrough) - Red pill badge for OOM indicator with light/dark mode support - Scope GGUF gray override to quant name only so downloaded/recommended labels keep colors - Add !important on TIGHT/OOM badges to resist ComboboxItem hover overrides	2026-04-01 02:06:49 -07:00
Daniel Han	6c0826a9e4	Fix Windows local GGUF model loading crash (#4730 ) * Fix Windows "Non-relative patterns are unsupported" when loading local GGUF models When a user loads a GGUF model from a local Windows path (e.g. C:\Users\danie\.lmstudio\models\unsloth\functiongemma-270m-it-GGUF), the model identifier contains backslashes and a drive letter. Both load_model_defaults() and _has_specific_yaml() constructed a YAML filename from the full absolute path and passed it to Path.rglob(), which rejects non-relative patterns on Windows. Fixed by detecting Windows-style paths (drive letters, UNC paths, backslashes) in addition to Unix-style paths, and using only the directory basename for the YAML filename lookup when the identifier is a local filesystem path. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor: reuse is_local_path helper, fix case-sensitive suffix lookup - Replace inline local-path detection in model_config.py and inference_config.py with the existing is_local_path() from utils.paths, which already handles Unix, Windows drive-letter, UNC, and backslash paths - Fix case-sensitive suffix lookup in load_model_defaults(): the _REVERSE_MODEL_MAPPING is lowercase-keyed, so suffix comparisons must use .lower() to match paths like /path/to/Spark-TTS-0.5B/LLM * Fix WSL path parsing and _has_specific_yaml suffix lookup - Use normalize_path() before Path() operations so backslash Windows paths (e.g. C:\Users\...\model) are correctly split on POSIX/WSL hosts where pathlib treats backslashes as literal characters - Add suffix-based (2-component and 1-component) lookup to _has_specific_yaml() so it matches the same resolution rules as load_model_defaults(), fixing wrong inference params for local suffix-mapped models like Spark-TTS-0.5B/LLM --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-01 01:38:09 -07:00
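Note on #4730: a minimal sketch of the basename approach described above. The repository reuses its own `is_local_path()` and `normalize_path()` helpers; the two functions below are hypothetical stand-ins written from the commit description.

```python
import re

_DRIVE_LETTER = re.compile(r"^[A-Za-z]:[\\/]")


def looks_like_local_path(identifier: str) -> bool:
    """Heuristic: Unix-style, tilde, relative, UNC, backslash, or drive-letter paths."""
    return (
        identifier.startswith(("/", "~", "./", "../", "\\\\"))
        or "\\" in identifier
        or bool(_DRIVE_LETTER.match(identifier))
    )


def yaml_lookup_name(identifier: str) -> str:
    """Return the name to use for the YAML lookup: basename for local paths, else unchanged."""
    if not looks_like_local_path(identifier):
        return identifier
    # Normalize backslashes first so the split also works on POSIX/WSL hosts.
    normalized = identifier.replace("\\", "/").rstrip("/")
    return normalized.rsplit("/", 1)[-1]
```

Using only the final path component keeps drive letters and separators out of the pattern later passed to `Path.rglob()`.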
Datta Nimmaturi	256c6e4884	Refactor flex attn to prefer flash if possible (#4734 ) Replaces prefer_flex_attn_if_supported (which only returned flex_attention or None) with determine_attention_implementation, a centralized hierarchy: FA2 > Flex > SDPA > Eager. Changes: - New determine_attention_implementation function in _utils.py with clear priority chain - _set_attn_impl helper to stamp config consistently - _FLEX_EXCLUDED_MODELS / _FLEX_EXCLUDED_PREFIXES for model-specific exclusions - Gemma3N explicit eager override in vision.py (timm vision towers) - Preserved sdpa fallback for unmapped/remote-code vision configs - Config re-stamped to eager when supports_sdpa guard fires Co-authored-by: Datta Nimmaturi <Datta0@users.noreply.github.com>	2026-04-01 00:30:21 -07:00
Wasim Yousef Said	d63cc57e1e	fix: clear tool status badge immediately after tool execution (#4733 ) * fix: clear tool status badge immediately after tool execution The tool status timer badge (Searching 1s, 2s...) persisted after tool calls finished because the status clear event was only sent at the start of the next generation iteration, not after tool execution completed. Backend: yield status clear after all tools finish in the agentic loop iteration, before continue starts the next generation pass. Frontend: debounce badge visibility by 300ms so sub-second tool calls don't flash the badge. * Fix debounce regression for consecutive tool calls Only apply the 300ms show-delay when transitioning from idle to tool-active. When switching between consecutive tools in the same turn (e.g. web_search -> python), keep the badge visible immediately so it does not flicker or disappear during multi-tool runs. * Delay wasActiveRef reset to bridge inter-iteration tool gaps The backend emits a status-clear event between tool iterations, which was resetting wasActiveRef immediately and causing the next tool to be re-debounced (300ms hidden gap between consecutive tools in the same turn). Now the ref reset is delayed by 500ms so a follow-up tool within the same agentic turn shows the badge immediately, while a genuinely new turn still gets the debounce. * Use thread lifecycle to track tool-run boundaries Replace the 500ms wall-clock timeout with the actual thread.isRunning state to determine when wasActiveRef should reset. 
This properly handles all cases: - Consecutive tools within the same run stay visible without flicker - The badge hides only when the thread run actually ends - New turns always get a fresh 300ms debounce on the first tool - No heuristic timeout that can misfire on slow or fast inference * Consolidate wasActiveRef reset into single effect Removes the separate isThreadRunning effect to avoid a race where the ref resets before the tool-status effect reads it (when isThreadRunning flips to false before setToolStatus(null) from the adapter's finally block). Now wasActiveRef resets only when both toolStatus is null AND the thread run has ended, eliminating any flicker on the last tool of a run. * Simplify debounce: use visible state instead of ref tracking Drop wasActiveRef entirely and use the visible state as the debounce gate. When the badge is not yet on screen, debounce for 300ms before showing. When already visible from a prior tool, keep showing immediately. This correctly handles all cases: - All fast tools (<300ms) are suppressed, not just the first - Consecutive tools after the badge is shown stay visible - Badge persists across inter-iteration clears while thread runs - New turns get a fresh debounce after visible resets --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-01 00:28:38 -07:00
Wasim Yousef Said	4fb9778988	feat: move folder management into model selector dropdown (#4731 ) * refactor: move folder management from sidebar into model selector * Fix folder management: restore LoRA picker sync, error handling, caching - Restore onFoldersChange callback to keep LoRA adapter picker in sync when scan folders are added/removed (fixes regression from sidebar move) - Thread onFoldersChange through ModelSelector -> HubModelPicker prop chain - Add module-level _scanFoldersCache to prevent folder list flash on re-open - Surface error toast on folder removal failure instead of silently ignoring - Guard handleAddFolder against concurrent double-submit via folderLoading - Clear folderInput on Escape key dismiss to prevent stale input on re-open - Add refreshLocalModelsList and refreshScanFolders to useEffect dep array * Fix compare-mode folder sync, Escape key propagation, cancel toggle state - Wire onFoldersChange through CompareContent/GeneralCompareContent so compare-mode selectors also refresh local models after folder changes - Add e.stopPropagation() on Escape key in folder input to prevent Radix Popover from closing the entire model selector dropdown - Add e.preventDefault() on Enter key to prevent form submission - Clear folderInput and folderError when cancel toggle hides the input, matching the Escape key behavior for consistency * Fix folder mutation state ordering and touch accessibility - Use optimistic updates for add/remove so the folder list reflects changes immediately instead of waiting on a second listScanFolders round-trip that could silently fail. - Move refreshScanFolders out of the finally block in handleRemoveFolder so it runs after the cache update, not after onFoldersChange. - Make the remove button visible on touch/mobile devices and reachable via keyboard focus (opacity-100 on small screens, focus-visible). - Add aria-label to the remove button for screen readers. 
* Deduplicate optimistic folder add to match backend behavior The backend returns the existing ScanFolderInfo row when adding a path that is already registered. The optimistic update was blindly appending the returned row, producing duplicate entries and React key warnings. Now checks by id before appending. * Add aria-label to folder toggle button and strengthen dedup check - Add aria-label to the +/cancel icon button for screen readers. - Extend optimistic dedup check to also compare by path, not just id, to handle edge cases where the cache is stale. --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-31 23:15:50 -07:00
Lee Jackson	2cac3e8e4d	studio: Polish Windows installer/setup logs (#4736 ) * style(windows): clean installer/setup log output and remove seeded credential banner * Keep startup credential hint without exposing plaintext password Print the username and .bootstrap_password file path on first-run admin creation instead of the raw password. Headless / Docker / SSH operators still get a startup-time hint for initial sign-in, and the plaintext credential no longer appears in terminal output or logs. --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-03-31 23:12:42 -07:00
Daniel Han	6984e118eb	Bump installer minimum version pin to 2026.3.18 (#4729 ) Matches the latest PyPI release.	2026-03-31 07:00:51 -07:00
Daniel Han	cfeb8c3245	Versioning	2026-03-31 06:51:34 -07:00
Wasim Yousef Said	1e8875584d	feat: custom scan folders for GGUF model discovery (#4723 ) * feat: add scan_folders table and CRUD functions to studio_db * feat: add scan folders API endpoints and integrate into model scan * feat: add scan folders API client and update source types * feat: add custom source to model filters and selector * feat: add Model Folders section to chat settings sidebar * style: fix biome formatting in ModelFoldersSection * fix: address review findings for custom scan folders empty string bypass, concurrent delete crash guard, Windows case normalization, response_model on endpoints, logging, deduplicated filter/map, module level cache for custom folder models, consistent source labels, handleRemove error surfacing, per folder scan cap * fix: show custom folders section regardless of chatOnly mode * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactor: extract shared refreshLocalModelsList in pickers * Harden custom scan folder validation and scanning - Validate path exists, is a directory, and is readable before persisting - Apply per-folder model cap during traversal instead of after (avoids scanning millions of inodes in large directories) - Wrap per-folder scan in try/except so one unreadable folder does not break the entire /api/models/local endpoint for all callers - Normalize case on Windows before storing so C:\Models and c:\models dedup correctly - Extend macOS denylist to cover /private/etc and /private/tmp (realpath resolves /etc -> /private/etc, bypassing the original denylist) - Add /boot and /run to Linux denylist * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Improve scan robustness and preserve Windows path casing - Preserve original Windows path casing in DB instead of lowercasing (normcase used only for dedup comparison, not storage) - Catch PermissionError per child directory so one unreadable subdirectory 
does not skip the entire custom folder scan - Wrap list_scan_folders() DB call in try/except so a DB issue does not break the entire /api/models/local endpoint * fix: scan custom folders for both flat and HF cache layouts * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Windows case-insensitive path dedup with COLLATE NOCASE Use COLLATE NOCASE on the scan_folders.path column so that the UNIQUE constraint correctly deduplicates C:\Models and c:\models on Windows without lowercasing the stored path. Also use COLLATE NOCASE in the pre-insert lookup query on Windows to catch existing rows with different casing. * Restore early-exit limit in _scan_models_dir for custom folders Keep the limit parameter so _scan_models_dir stops iterating once enough models are found, avoiding unbounded traversal of large directories. The post-traversal slice is still applied after combining with _scan_hf_cache results. * feat: scan custom folders with LM Studio layout too * Fix custom folder models being hidden by dedup Custom folder entries were appended after HF cache and models_dir entries. The dedup loop kept the first occurrence of each model id, so custom models with the same id as an existing HF cache entry were silently dropped -- they never appeared in the "Custom Folders" UI section. Use a separate dedup key for custom-source entries so they always survive deduplication. This way a model can appear under both "Downloaded" (from HF cache) and "Custom Folders" (from the user-registered directory) at the same time. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden LM Studio scan and fix COLLATE NOCASE on Linux - Add per-child and per-publisher OSError handling in _scan_lmstudio_dir so one unreadable subdirectory does not discard the entire custom folder's results - Only apply COLLATE NOCASE on the scan_folders schema on Windows where paths are case-insensitive; keep default BINARY collation on Linux and macOS where /Models and /models are distinct directories * Use COLLATE NOCASE in post-IntegrityError fallback SELECT on Windows The fallback SELECT after an IntegrityError race now uses the same case-insensitive collation as the pre-insert check, so a concurrent writer that stored the path with different casing does not cause a false "Folder was concurrently removed" error. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-31 06:40:31 -07:00
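The case-insensitive dedup described in the scan-folders commit above can be sketched with the standard `sqlite3` module. This is a minimal illustration and not the Studio schema: the table layout and the helper names `make_db` and `add_folder` are hypothetical; only the `scan_folders.path` column and the use of `COLLATE NOCASE` come from the commit message.

```python
import sqlite3


def make_db(case_insensitive: bool) -> sqlite3.Connection:
    """Create an in-memory table; NOCASE is applied only when requested
    (the commit applies it on Windows and keeps BINARY elsewhere)."""
    collation = " COLLATE NOCASE" if case_insensitive else ""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scan_folders ("
        "id INTEGER PRIMARY KEY, "
        f"path TEXT NOT NULL{collation} UNIQUE)"
    )
    return conn


def add_folder(conn: sqlite3.Connection, path: str) -> bool:
    """Insert a folder; return False if the UNIQUE constraint rejects it."""
    try:
        conn.execute("INSERT INTO scan_folders (path) VALUES (?)", (path,))
        return True
    except sqlite3.IntegrityError:
        return False


windows_like = make_db(case_insensitive=True)
add_folder(windows_like, "C:\\Models")   # stored with its original casing
add_folder(windows_like, "c:\\models")   # rejected as a duplicate

posix_like = make_db(case_insensitive=False)
add_folder(posix_like, "/Models")
add_folder(posix_like, "/models")        # distinct directory, accepted
```

The stored path keeps the casing the user typed, which is the reason the commit moves the case folding into the collation instead of lowercasing before the insert.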
Daniel Han	9a8b622306	Studio: simplify tool-call dedup and replace html2text with builtin converter (#4722 ) * Simplify tool-call dedup: drop hashlib, inline helpers The duplicate tool-call detector only compares calls within a single request from the same JSON parser, so dict key order is guaranteed identical for identical calls (Python 3.7+ insertion-ordered dicts). - Replace hashlib.md5(json.dumps(...)) with name + str(args) - Inline _tool_call_key, _is_duplicate_call, _record_tool_call since each was a one-liner used once - Remove unused hashlib import * Remove tool_calling_benchmark_results.md from repo * Replace html2text with builtin HTML-to-Markdown converter Drop the external html2text (GPL-3.0) dependency and its regex fallback. Add _html_to_md.py (~190 lines, stdlib only) using html.parser.HTMLParser that handles headings, links, bold/italic, lists, tables, blockquotes, code blocks, and entity decoding. Strips script/style/head tags entirely. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use json.dumps(sort_keys=True) for tool-call dedup key str(dict) is sensitive to insertion order, so semantically identical calls with different key ordering would bypass duplicate detection. Switch to json.dumps with sort_keys=True for a canonical representation. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert dedup key to str(arguments) json.dumps(sort_keys=True) is unnecessary here -- the arguments dict always comes from the same JSON parser within a single request, so key insertion order is deterministic (Python 3.7+). str() is faster and sufficient for consecutive-call dedup. 
* Address review comments on _html_to_md.py - Remove "hr" from _BLOCK_TAGS so the dedicated hr handler is reachable - Prefix all newlines with ">" inside blockquotes (multi-line support) - Emit full ![alt](url) for images instead of alt text only - Replace newlines with spaces inside table cells - Track header cells per-row (_row_has_th) instead of last-cell-only - Strip trailing tabs in addition to spaces in cleanup regex * Fix blockquote rendering, truncated-HTML buffer flush, and dedup key canonicalization _html_to_md.py: - Rewrite blockquote handling with stack-based buffer approach so nested blockquotes, pre blocks inside blockquotes, and multi-paragraph quotes all render correctly with proper "> " prefix on every line. - Add flush_pending() to recover content from truncated HTML where closing tags are missing (common when _fetch_page_text caps the download size). Flushes open <a>, <td>, <pre>, and blockquote buffers. - Skip <img> tags to match prior html2text ignore_images=True behavior and avoid data-URI amplification consuming the output budget. - Collapse all whitespace (including newlines) in non-pre content per standard HTML whitespace rules: \s+ -> single space. - Escape pipe characters in table cell content to prevent column breakage. - Emit separator row after the first row for tables without <th> headers. - Guard against IndexError on _ol_counter for orphan <li> elements. - Normalize CRLF line endings before parsing. llama_cpp.py: - Restore canonical dedup key with json.dumps(sort_keys=True) so that semantically identical tool calls with different JSON key order are correctly detected as duplicates. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix table optional end tags, inline code whitespace, and link text normalization _html_to_md.py: - Extract _finish_cell() and _finish_row() helpers to handle HTML tables that omit optional </td>, </th>, or </tr> end tags. 
This is valid HTML and common on real web pages -- previously the parser would silently drop earlier cells and entire rows. - Call _finish_cell()/_finish_row() from handle_starttag for <tr>/<td>/<th>, handle_endtag for </tr>/<td>/<th>/<table>, and flush_pending() so all three paths (normal close, implicit close, truncated HTML) use the same row-finalization logic including header separator emission. - Add _in_inline_code flag so handle_data() preserves literal whitespace inside <code> spans instead of collapsing it. Source like <code>pip install unsloth</code> now correctly renders as `pip install unsloth` rather than `pip install unsloth`. - Extract _finish_link() helper that normalizes accumulated link text with \s+ -> single space before building the Markdown link. Prevents block- level content inside <a> tags (e.g. <a><div>one</div><div>two</div></a>) from producing multiline [one\n\ntwo](href) link labels. - Empty blockquotes now produce no output instead of a stray ">". - Remove unused _bq_depth field (all routing uses _bq_stack). - Flush open cells and rows in handle_endtag("table") for robustness. * Support <ol start=N>, <dl>/<dt>/<dd>, and preserve code block whitespace _html_to_md.py: - Honor <ol start="N"> attribute so ordered lists preserve their original numbering instead of always restarting from 1. Important for docs/tutorials that continue numbering across sections. - Add dl, dt, dd to _BLOCK_TAGS so definition lists (common on MDN, Python docs, Django docs) produce separated text instead of concatenated blobs. - Rewrite _cleanup() to be fence-aware: content inside fenced code blocks is now preserved verbatim (intentional blank lines in <pre> content are no longer collapsed). Outside code blocks, blank runs are limited to one and trailing whitespace is stripped. - Fix _prefix_blockquote() to strip trailing whitespace before collapsing blank lines, preventing the "\n\n \n\n" pattern from sneaking through. 
* Suppress whitespace-only text nodes between table structural elements Indented HTML tables (nearly all real-world pages) produce whitespace text nodes between <table>, <tr>, </tr> etc. that land in the output as leading spaces before table rows, breaking Markdown table alignment. Skip whitespace-only text nodes when inside a table but not inside a cell, so indentation from source HTML does not leak into the output. * Revert dedup key to str(arguments) with explanatory comment json.dumps(sort_keys=True) is unnecessary overhead here: arguments always comes from json.loads on model output within a single request, so dict insertion order is deterministic in Python 3.7+. A repeated call from the model produces the same JSON, which parses to the same dict repr. str() avoids re-serialization on every tool call. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-31 06:15:18 -07:00
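The back-and-forth over the dedup key in the commit above comes down to whether dict key order should affect the key. A small sketch of the two variants, assuming hypothetical helper names `key_str` and `key_canonical` (the actual code in llama_cpp.py is not shown in this log):

```python
import json


def key_str(name: str, arguments: dict) -> str:
    # Cheap variant: relies on identical insertion order for identical calls.
    return name + str(arguments)


def key_canonical(name: str, arguments: dict) -> str:
    # Canonical variant: key order no longer matters, at the cost of
    # re-serializing the arguments on every tool call.
    return name + json.dumps(arguments, sort_keys=True)


# Two semantically identical calls whose JSON keys arrive in different order.
a = json.loads('{"query": "unsloth", "limit": 5}')
b = json.loads('{"limit": 5, "query": "unsloth"}')

same_with_str = key_str("web_search", a) == key_str("web_search", b)
same_with_canonical = key_canonical("web_search", a) == key_canonical("web_search", b)
```

Here `same_with_str` is false and `same_with_canonical` is true. The final revert to `str()` rests on the stated assumption that a model repeating a call emits the same JSON text, so both parses yield the same key order.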
Lee Jackson	9451bb1bac	fix(export): preserve selected/manual model on enter and blur (#4726 )	2026-03-31 17:05:55 +04:00
Daniel Han	e159b93b97	studio: improve GGUF tool calling accuracy and reliability (#4700 ) * studio: improve GGUF tool calling accuracy and reliability - Add URL fetching to web_search tool so models can read full page content instead of only getting search snippets. Uses html2text for clean markdown conversion with regex fallback. - Inject current date and behavioral guidance (URL fetch workflow, no repeated queries, use code for data processing) into the tool-use system prompt. - Append error recovery nudge to tool results that indicate failure, helping small models avoid looping on the same broken call. - Strip leaked <tool_call> XML from assistant messages in conversation history and from the outgoing SSE stream. - Raise default max tool iterations from 10 to 25 across backend, model schema, and frontend defaults. - Increase _MAX_PAGE_CHARS from 4k to 16k so fetched pages contain enough content for the model to extract useful information. - Add "IMPORTANT: These are only short snippets" hint to search results so models know to fetch full pages when needed. Tested with Qwen3.5-4B-GGUF (UD-Q4_K_XL), 10 runs before/after: - XML leaks in responses: 10/10 -> 0/10 - URL fetch usage: 0 -> 4/10 runs - Runs producing actual correct answers: 0/10 -> 2/10 - Average tool calls per query: 5.5 -> 3.8 (more efficient) - Average response time: 12.3s -> 9.8s * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add tool calling benchmark results across model sizes and quants Tested 16 configurations (4 models x 2 quants x 2 KV cache types) with 10 runs each on NVIDIA B200. Best config: 27B UD-Q4_K_XL + bf16 KV -- 6/10 runs found all 4 correct songs, 0 XML leaks, 131s average response time. * Add duplicate tool-call detection and final-answer synthesis When the model repeats the exact same tool call (same name + arguments) twice in a row, skip execution and return a redirect message telling it to try a different approach. 
This prevents the 8x-repeated-query loops observed on 27B and 35B models. When the tool iteration cap (25) is reached, inject a "provide your final answer now" message before the final streaming pass. This lets the model synthesize a useful answer from everything it gathered instead of being silently cut off. Tested on Qwen3.5-27B UD-Q4_K_XL (10 runs): - Repeated query runs: 4/10 -> 2/10 - Cap hits: 1/10 -> 0/10 - All 4/4 accuracy: 5/10 -> 7/10 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix CodeQL alert: handle whitespace in script/style closing tags The regex fallback for HTML stripping did not match closing tags with whitespace before the angle bracket (e.g. </script >). Use \s* before > in both script and style patterns. * Address reviewer findings: SSRF, timeout crash, XML regex, dedup - SSRF: resolve hostname via getaddrinfo and reject private, loopback, link-local, multicast, and reserved addresses before fetching - Timeout: handle timeout=None (unlimited mode) in URL fetch path by defaulting to 60s instead of crashing on min(None, 60) - Download cap: read at most max_chars*4+1 bytes instead of the full response body before truncating - XML regex: match both <tool_call> and <function=...> markup in the history/stream cleanup (inference.py) - CodeQL: use [^>]* in closing script/style tags to handle any whitespace or attributes before > - Dedup: track whether each tool call failed so retries after transient errors are allowed; only block consecutive identical calls that both succeeded - Final-answer synthesis: guard on max_tool_iterations > 0 so callers who disable tools do not get a false "used all calls" turn * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix redirect SSRF, SSE streaming regression, dedup off-by-one - SSRF redirect bypass: disable auto-redirect in urllib, manually follow up to 5 hops with host validation at each step. 
Prevents public URLs from redirecting to loopback/private targets. - SSE streaming: track prev_text on the raw cumulative and strip XML from the delta only, so completed tool_call tags do not cause the cumulative to shrink and drop trailing real text. - Dedup off-by-one: check the immediately previous call (window=1) instead of requiring 2 matching history entries, so the second identical successful call is blocked rather than the third. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix redirect HTTPError handling and tighten error prefixes - Redirect fix: urllib raises HTTPError (not a normal response) when the redirect handler returns None. Catch HTTPError for 3xx codes and extract the Location header from the exception object. - Error prefixes: remove overly broad "No " prefix that matched "No results found." (a valid empty-search outcome, not an error). Replace with specific prefixes like "Blocked:", "No query provided", "Failed to resolve". This ensures empty search results are correctly classified as non-errors for duplicate-call tracking. * Fix SSE cross-chunk XML leaks, cleanup review findings - SSE streaming: sanitize the full cumulative text before diffing against the previous sanitized snapshot, so XML tags that span chunk boundaries are stripped correctly. The previous delta-based approach leaked split tags. - DRAINING fallback: use _strip_tool_markup() helper instead of a manual regex that only handled <tool_call> but not <function=...>. - Move hashlib import, _TOOL_XML_RE compile, and datetime import to module level per style guide. - Remove unused _hit_tool_cap variable. * Fix DNS rebinding, charset detection, HTTPError handling, dedup double-record - DNS rebinding: resolve hostname once via getaddrinfo, pin the returned IP, rewrite the URL to connect to the pinned IP with a Host header. Each redirect hop re-resolves and re-validates. 
Closes the TOCTOU window between validation and connection. - Charset: use resp.headers.get_content_charset() instead of hardcoding utf-8, so pages with other encodings decode correctly. - HTTPError: return descriptive "HTTP {code} {reason}" instead of re-raising into a generic "Search failed" message. - Dedup: remove redundant _record_tool_call in the duplicate branch; the single call at the end of the loop handles all cases. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-31 03:06:44 -07:00
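The SSRF address check described in the commit above (resolve via getaddrinfo, then reject private, loopback, link-local, multicast, and reserved addresses) can be sketched with the standard library. The function names `is_blocked_ip` and `resolve_public_ips` are hypothetical, and this sketch omits the IP pinning and per-hop redirect validation that the commit also mentions.

```python
import ipaddress
import socket


def is_blocked_ip(ip_text: str) -> bool:
    """True for addresses a URL fetcher should refuse to connect to."""
    ip = ipaddress.ip_address(ip_text)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
    )


def resolve_public_ips(hostname: str, port: int = 443) -> list[str]:
    """Resolve a hostname once and fail if any result is non-public.

    Performs a real DNS lookup, so it needs network access for names
    that are not IP literals.
    """
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    ips = sorted({info[4][0] for info in infos})
    for ip in ips:
        if is_blocked_ip(ip):
            raise ValueError(f"Blocked: {hostname} resolves to {ip}")
    return ips


# Pure checks that need no network access.
checks = {ip: is_blocked_ip(ip) for ip in ("127.0.0.1", "10.1.2.3", "169.254.169.254", "8.8.8.8")}
```

A caller would connect to one of the returned addresses directly instead of resolving the name a second time; resolving again reopens the gap between validation and connection that the DNS-rebinding fix closes.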
Lee Jackson	815619d972	feat: add update instructions card with OS toggle and mobile expand flow (#4721 ) Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-03-31 14:05:05 +04:00
Roland Tannous	cc5e4fbf17	fix: auto-retry stalled HF downloads with HF_HUB_DISABLE_XET=1 (#4712 ) * fix: auto-retry stalled HF downloads with HF_HUB_DISABLE_XET=1 The heartbeat thread now monitors the HF Hub cache directory for file-size growth. If no bytes are written for 3 minutes, it sends a "stall" message to the orchestrator, which kills the subprocess and retries with HF_HUB_DISABLE_XET=1 (falling back from Xet to standard HTTPS). If the retry also stalls, it errors out with a clear message. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: include transport type (xet/https) in heartbeat and stall log messages Makes it clear in backend logs whether the download is using xet or https transport, and which transport stalled — helpful for debugging. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: monitor HF Hub .tmp dir to avoid false stall detections huggingface_hub downloads into .tmp/ before atomically moving to blobs/. Without monitoring .tmp, a large shard actively downloading for several minutes would show zero blob growth and trigger a false stall. * fix: scope HF cache size check to specific model being loaded Instead of scanning every models--/blobs directory (O(N) with cached models), only check the specific model's blobs dir plus the global .tmp dir. Much faster on systems with many cached models. Fix false stall detection on cached/local models and cleanup issues - Only fire stall if download activity was observed (cache size changed at least once). Previously, any model load taking >180s would trigger a false stall, even for already-cached or local models where no download is happening. - Return -1 from _get_hf_cache_size on exception to distinguish "unable to measure" from "genuinely zero bytes". Skip stall logic when measurement fails. 
- Add _shutdown_subprocess before raising on terminal stall path to prevent leaking a stuck subprocess. - Detect pre-existing HF_HUB_DISABLE_XET=1 in the parent environment to avoid a redundant retry cycle when Xet is already disabled. - Remove global .tmp directory scanning (not used by modern huggingface_hub; in-progress downloads use .incomplete files in blobs/ which are already captured by iterdir). - Add f.is_file() guard in cache size calculation. - Replace em dashes with ASCII dashes for Windows terminal compat. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden stall detection edge cases - Guard -1 to valid value transition: when initial _get_hf_cache_size returns -1 (error) and later recovers to a real value, do not count that as download activity. Only set saw_download_activity when the previous measurement was also valid (>= 0). - Move os import to top-level in orchestrator.py instead of inline import os as _os. - Fix misleading comment about post-download protection. * Use .incomplete files to detect active downloads for stall detection Replace the saw_download_activity heuristic with direct .incomplete file detection. huggingface_hub creates .incomplete files in blobs/ during active downloads and removes them on completion. This gives a reliable signal for whether a download is actually in progress. Benefits: - Cached models: no .incomplete files -> no stall fired even after 180s - Post-download init (quantization, GPU loading): .incomplete files gone so stall timer resets, long init phases are not killed - Pre-download hangs (XET handshake stall): .incomplete files are created at download start, so zero-byte stalls are now detected - No more false positives from -1 to valid measurement transitions The _get_hf_download_state function now returns (total_bytes, has_incomplete) tuple or None on error, replacing _get_hf_cache_size. 
Add debug logging to download state exception handler Log the exception at debug level when _get_hf_download_state fails, instead of silently returning None. Helps with troubleshooting cache measurement issues. * Watch both adapter and base model repos for LoRA stall detection When loading a LoRA adapter, the actual download bottleneck is often the base model, not the adapter itself. Update the heartbeat to watch both mc.identifier and mc.base_model cache directories so stall detection works for LoRA loads where the base model stalls on Xet. Also update _get_hf_download_state to accept multiple model names and skip names without "/" (local paths) since those do not have HF cache directories. * Fix model name filtering for official HF models without org prefix Models like gpt2 and bert-base-uncased do not contain a slash but are still valid HF Hub models with cache directories. Replace the "/" check with a proper local-path detection that checks for path separators and path-like prefixes instead. Also fix the base_model watch list to not require "/" in the base model name, so official models used as LoRA bases are also monitored. * Fix local path detection that broke all org/model names on Linux The os.path.sep check matched "/" in HF model IDs like "org/model" on Linux, causing the stall detector to skip ALL standard HF models. Replace with a check that only skips names starting with "/" (absolute paths), "." (relative paths), "~" (home-relative), or containing "\" (Windows paths). HF model IDs like "org/model" or "gpt2" pass through correctly on all platforms. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-31 03:00:46 -07:00
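The final local-path rule from the commit above (skip names starting with "/", ".", or "~", or containing a backslash) is small enough to show directly. The names `looks_like_local_path` and `repos_to_watch` are hypothetical stand-ins for whatever the heartbeat code actually uses.

```python
from typing import Optional


def looks_like_local_path(name: str) -> bool:
    """Apply the rule from the commit message: absolute, relative,
    home-relative, or Windows-style names are treated as local paths."""
    return name.startswith(("/", ".", "~")) or "\\" in name


def repos_to_watch(*names: Optional[str]) -> list[str]:
    """Keep only names that can have a Hugging Face cache directory."""
    return [n for n in names if n and not looks_like_local_path(n)]


# A LoRA load watches both the adapter and its base model; local paths
# and missing base models are dropped.
watched = repos_to_watch("org/adapter", "gpt2", None, "/models/local-base")
```

Checking for `os.path.sep` instead would match the "/" inside "org/model" on Linux, which is the regression the last fix in that commit removes.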
Daniel Han	e164c930ff	fix(studio): correct default weight_decay and learning rate (#4695 ) * fix(studio): change default weight_decay from 0.01 to 0.001 The default weight decay across Studio was 0.01 but should be 0.001. Updated the default in all backend fallbacks, the Pydantic model, the frontend config, and every YAML preset/model-default config. * fix(studio): auto-set learning rate based on training method Default LR should be 2e-4 for LoRA/QLoRA and 2e-5 for full fine-tuning. Frontend: track whether the user has manually edited the LR field via a _learningRateManuallySet flag (same pattern as trainOnCompletions). When switching training method and the user has not touched the LR, auto-set it to the appropriate default. Reset the flag on model load. Backend: change trainer.py start_training default from 5e-5 to 2e-4, update default.yaml fallback from 5e-5 to 2e-4, and fix full_finetune.yaml from 0.0002 (2e-4) to 2e-5. * refactor(studio): centralize weight_decay and learning rate defaults Create studio/backend/core/training/constants.py as the single source of truth for DEFAULT_WEIGHT_DECAY (0.001), DEFAULT_LEARNING_RATE (2e-4), DEFAULT_LEARNING_RATE_FULL (2e-5), and DEFAULT_LEARNING_RATE_STR ("2e-4"). All backend modules (trainer.py, training.py, worker.py, models/training.py) now import from constants.py instead of hardcoding values. On the frontend, add LR_DEFAULT_LORA and LR_DEFAULT_FULL to config/training.ts and use them in the store instead of magic numbers. A comment cross-references the backend constants file. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix model-specific LR override, persist migration, and flag resets - Preserve model-specific learning rates from YAML configs when the async autoSelectTrainingMethod callback fires (fixes Qwen2.5-1.5B getting 2e-4 instead of its configured 1e-5, etc.) 
- Bump zustand persist version to 9 with migration so existing users with weightDecay=0.01 get updated to 0.001 - Clear _learningRateManuallySet in reset() and applyConfigPatch() for consistency with trainOnCompletions flag behavior - Add DEFAULT_LEARNING_RATE_FULL_STR to constants.py * Refine applyConfigPatch to only clear LR flag when patch includes LR Only reset _learningRateManuallySet when the applied config patch actually provides a learningRate value. This prevents unrelated config patches from silently disarming the manual-edit guard, which would cause a subsequent setTrainingMethod call to overwrite the user's custom LR. * Preserve model-specific LR when switching between qlora and lora Only auto-switch the learning rate when the training category changes (adapter <-> full fine-tuning). Switching between qlora and lora keeps the current LR since both methods share the same learning rate range. This preserves curated per-model defaults (e.g. 1e-5 for Qwen2.5-1.5B-Instruct) when the user toggles between adapter methods. * Remove constants.py, use YAML configs as the source of truth The YAML config files (model-specific + default.yaml) are the intended config layer for training defaults. The Python backend fallbacks now use inline values that match the YAML configs, rather than importing from a separate constants module. This keeps the config architecture simple: YAML files are the single source of truth, and the inline Python fallbacks are just safety nets that mirror them. * fix(studio): preserve model-specific LR when switching training method Stash YAML-provided learning rate and use it to restore the correct value when switching between adapter and full fine-tune modes. 
- qlora <-> lora no longer overwrites the model's LR - full -> adapter restores the YAML LR instead of a hardcoded constant - selecting a model while on full fine-tune uses LR_DEFAULT_FULL instead of applying the YAML adapter LR --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>	2026-03-31 13:50:25 +04:00
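The learning-rate switching rules that #4695 converges on can be summarised in a short Python sketch. The real logic lives in the TypeScript frontend store; the function names and the "full" method string here are assumptions made for illustration, while the two default values and the rules themselves are taken from the commit message.

```python
LR_DEFAULT_LORA = 2e-4  # adapter (LoRA/QLoRA) default from the commit message
LR_DEFAULT_FULL = 2e-5  # full fine-tuning default from the commit message


def category(method: str) -> str:
    # "full" is an assumed method name; every other method counts as an adapter.
    return "full" if method == "full" else "adapter"


def next_learning_rate(current_lr, old_method, new_method, manually_set, yaml_lr=None):
    """Return the learning rate to show after a training-method switch."""
    # A manual edit always wins, and qlora <-> lora never touches the value.
    if manually_set or category(old_method) == category(new_method):
        return current_lr
    if category(new_method) == "full":
        return LR_DEFAULT_FULL
    # full -> adapter: restore the stashed YAML value when the model has one.
    return yaml_lr if yaml_lr is not None else LR_DEFAULT_LORA
```

With a model whose YAML config sets 1e-5, toggling qlora to lora keeps 1e-5, switching to full gives 2e-5, and switching back restores 1e-5 rather than 2e-4.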
Wasim Yousef Said	28aaf849bf	fix: throttle and cache HuggingFace modelInfo API calls (#4696 ) * fix: throttle and cache HuggingFace modelInfo API calls The frontend was firing 40 to 60 parallel modelInfo requests on app startup with zero caching or deduplication, causing HF rate limits. Adds a caching layer (hf-cache.ts) with TTL cache, inflight request dedup, and a concurrency limiter. Also debounces the HF token input so typing a token no longer re-fires all model searches per keystroke. * fix: only fetch VRAM info for visible models in chat selector * Fix cache key isolation and VRAM badge stability for PR #4696 - Cache key now includes a token fingerprint (last 8 chars) instead of a boolean, so switching HF tokens gives separate cache entries instead of serving stale data from the previous token. - Extract token via credentials?.accessToken to match the @huggingface/hub API surface. - Extend CachedResult type with safetensors/tags fields so downstream consumers no longer need unsafe `as` casts. - Merge VRAM param map with previous state on scroll instead of replacing it, preventing a brief flash of missing VRAM badges when new models become visible. * Fix VRAM badges missing for search-filtered recommended models When a user types a search query, filteredRecommendedIds can include models beyond the currently visible page. These models had no VRAM data because useRecommendedModelVram only received visibleRecommendedIds. Now we pass the union of visibleRecommendedIds and filteredRecommendedIds to the VRAM hook, so recommended models surfaced by search also show their VRAM badges. The hf-cache layer ensures no duplicate network calls. 
* Apply biome formatting to hf-cache.ts and use-recommended-model-vram.ts Auto-formatted with biome check --write to match project lint rules: - Block statements for single-line if/for bodies - Import sorting (type imports first) - Consistent line wrapping * Fix extractToken to handle both current and deprecated HF auth forms The @huggingface/hub CredentialsParams type is a union: - { accessToken: "hf_..." } (current preferred form) - { credentials: { accessToken: "..." } } (deprecated form) Previously only checked params.credentials?.accessToken (deprecated path). Now checks both forms so the cache key is correct regardless of which calling convention is used. * Simplify extractToken, map merge, and set construction - extractToken: remove type assertions, use direct property access with truthiness checks for cleaner union type handling - VRAM map merge: use Map spread constructor instead of manual for loop - idsForVram: use Set spread construction for more concise dedup * Add rationale comment for MAX_CONCURRENT=3 in hf-cache.ts * Skip GGUF repos in VRAM fetch and pre-populate cache from listModels Two changes to reduce redundant HF API calls: 1. Filter GGUF repos from idsForVram before passing to useRecommendedModelVram. GGUF repos have no safetensors metadata and the render layer already shows a static "GGUF" badge -- fetching modelInfo for them is a no-op that wastes a semaphore slot and a network round-trip. 2. Add primeCacheFromListing() to hf-cache.ts and call it from listModels yield sites in mergedModelIterator and priorityThenListingIterator. listModels returns the same type (ModelEntry & Pick<ApiModelInfo, T>) as modelInfo with the same additionalFields, so the data is interchangeable. Priming only writes if the key is not already fresh, so it never overwrites a recent modelInfo response. 
This means models discovered via listModels are already in cache when useRecommendedModelVram later calls cachedModelInfo for them, eliminating duplicate network requests. * Fix cache key mismatch: prime both token and anonymous slots The VRAM hook calls cachedModelInfo without credentials (anonymous key), but listModels results were primed only under the authenticated key. For authenticated users the priming was a no-op -- cache miss every time. Fix: prime both the token-specific slot and the anonymous slot when an access token is present. Public model metadata (safetensors, tags) is identical regardless of auth so this is safe. Also add a defensive guard in primeCacheFromListing for empty name. * Auto-prime anonymous cache slot from authenticated modelInfo fetches When cachedModelInfo is called with a token, the result was only stored under the token-specific key (e.g. model::abc12345). The VRAM hook calls cachedModelInfo without credentials and reads the anonymous slot (model::anon), causing a cache miss and duplicate fetch for every priority model. Now cachedModelInfo also writes to the anonymous slot on success when a token is present. Public model metadata (safetensors, tags) is identical regardless of auth, so this is safe and eliminates ~10 duplicate API calls on first page load. * Guard anonymous cache priming against gated/private models Only prime the anonymous cache slot for non-gated, non-private models. Previously, authenticated modelInfo responses and listing results were unconditionally copied into the anonymous slot, which could briefly expose gated/private model metadata after clearing the HF token. Now checks result.gated and result.private before writing the anon slot. Public unsloth/ models (the common case) still benefit from the optimization; gated models like meta-llama/* require a fresh fetch per auth context. 
* Extract primeFromListing helper to deduplicate cache priming logic The cache priming pattern (prime token slot + conditionally prime anon slot for non-gated models) was duplicated in three places. Extracted into a single primeFromListing() function for maintainability. * Export CachedResult type, add isStale helper, simplify primeFromListing - Export CachedResult so consumers can use it directly instead of the indirect Parameters<typeof ...> pattern. - Extract isStale(key) helper to deduplicate the cache freshness check that was repeated in primeCacheFromListing, cachedModelInfo, and the anonymous-slot priming logic. - Simplify primeFromListing to use CachedResult directly for both the data parameter and the gated/private guard, eliminating the double cast. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-31 02:21:17 -07:00
Datta Nimmaturi	3b5a49776b	[studio] multi gpu: revert to balanced for inference. (#4698 ) * Revert to balanced for inference * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused for_inference parameter from get_device_map Since inference and training both use "balanced" now, the for_inference flag is dead code. Remove it from the function signature, the call site in inference.py, and simplify the tests accordingly. * Remove redundant TestDeviceMapForInference test class TestGpuAutoSelection already covers the same multi-gpu and single-gpu device_map assertions. The TestDeviceMapForInference class was left over from when for_inference had distinct behavior. * Remove redundant test_get_device_map_multi_gpu_uses_balanced Its assertions ([0,1] -> balanced, [0] -> sequential) are already covered by test_get_device_map_uses_explicit_gpu_selection. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-31 01:24:41 -07:00
Daniel Han	fe6609a624	fix(studio): open tour ReadMore links in new tab (#4694 ) * fix(studio): open tour ReadMore links in new tab The quick tour "Read more" links navigate away from Studio instead of opening in a separate tab. Add target="_blank" and rel="noopener noreferrer" to the ReadMore component so external doc links open in a new browser tab. * fix(studio): only open external ReadMore links in new tab Apply target="_blank" conditionally based on whether the href starts with "http", so internal links still navigate in the same tab. * Tighten external-link detection in ReadMore component Use regex /^https?:\/\// instead of startsWith("http") so the check requires the full protocol prefix and does not match non-URL strings that happen to begin with "http". * Hoist regex to module scope for ReadMore Move EXTERNAL_URL_RE to top-level constant to satisfy the biome useTopLevelRegex lint rule and avoid re-creating the RegExp on every render. --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-03-30 23:41:14 -07:00
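The external-link check from #4694 is a one-line regex in the React component. A Python equivalent, as a minimal sketch with a hypothetical `link_attrs` helper, shows why the anchored pattern is stricter than a "starts with http" test:

```python
import re

# Compiled once at module scope, mirroring the hoisted EXTERNAL_URL_RE constant.
EXTERNAL_URL_RE = re.compile(r"^https?://")


def link_attrs(href: str) -> dict:
    """Return the extra anchor attributes for `href` (empty for internal links)."""
    if EXTERNAL_URL_RE.match(href):
        return {"target": "_blank", "rel": "noopener noreferrer"}
    return {}
```

A string such as "httpfoo" would pass `startswith("http")` but fails the regex, so it is treated as an internal link.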
Lee Jackson	308bb948d1	studio: prevent false multimodal warning during model loading (#4704 ) * studio: gate multimodal incompatibility warning on settled model capabilities * Also disable Start button during isCheckingVision fallback When getModelConfig fails and the fallback checkVisionModel is still in-flight, isLoadingModelDefaults clears before isCheckingVision does. Without also gating on isCheckingVision the Start button briefly re-enables with stale capability flags. Add isCheckingVision to the disabled condition and show "Loading model..." text while either flag is active. * Show correct error message for audio dataset incompatibility The incompatibility warning always said "switch to a vision model" even when the actual issue was an audio dataset on a non-audio model. Now shows an audio-specific message when the mismatch is audio. * Extract isLoadingModel constant for clarity Pull the combined model-loading condition into a single constant reused by the settled check, the disabled prop, and the button label. --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-03-30 23:11:20 -07:00
pre-commit-ci[bot]	66f250a614	[pre-commit.ci] pre-commit autoupdate (#4705 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.15.7 → v0.15.8](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.7...v0.15.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-30 21:58:16 -07:00
Roland Tannous	d6d3f59984	fix: replace hard timeout with inactivity timeout for model loading (#4707 ) The 180s wall-clock timeout would kill model loads on slow connections even when the download was actively progressing. Now the worker sends heartbeat status messages every 30s during loading, and the orchestrator resets its 300s deadline on each one, so it only times out when the subprocess goes truly silent.	2026-03-31 07:35:04 +04:00
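The difference between a wall-clock timeout and the inactivity timeout in #4707 can be sketched as below. This is an illustration only: the `wait_for_load` name, the use of a `queue.Queue` as the message channel, and the "loaded" message string are assumptions, while the 300s default comes from the commit message.

```python
import queue
import time


def wait_for_load(messages, inactivity_timeout=300.0, clock=time.monotonic):
    """Block until the worker reports "loaded"; fail only after a silent window."""
    deadline = clock() + inactivity_timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("worker sent no message within the inactivity window")
        try:
            msg = messages.get(timeout=remaining)
        except queue.Empty:
            continue  # the loop re-checks the deadline and raises
        if msg == "loaded":
            return
        # Any other message counts as a heartbeat and pushes the deadline out.
        deadline = clock() + inactivity_timeout
```

Because the deadline moves on every message, a load that takes an hour but reports every 30s succeeds, while a worker that goes quiet is still caught after one full window.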
Roland Tannous	7f353acfd4	fix: skip download progress polling for exported GGUF models (#4709 ) * fix: skip download progress polling for exported GGUF models * fix: revert isLocalGgufDir change; exported GGUFs are file paths, not dirs * fix: set isDownloaded true for all adapters in LoraModelPicker	2026-03-31 07:21:23 +04:00
Etherll	34272a796f	Fix/bun windows bin detection (#4703 ) * fix(studio): detect bun .exe shims in Windows binary check * Update setup.sh * add .bunx checking	2026-03-30 21:58:33 +04:00
Daniel Han	6d83ad9a28	fix(studio): avoid UnicodeEncodeError on Windows cp1252 consoles (#4699 ) * fix(studio): replace unicode emoji in print() to avoid cp1252 crash on Windows On Windows the default console encoding is cp1252, which cannot encode unicode emoji like U+2705 or U+26A0. Bare print() calls with these characters cause a UnicodeEncodeError at runtime. - run.py: replace emoji with ASCII status prefixes [OK] and [WARNING] - format_conversion.py: remove duplicate print() that mirrors the logger.info() call on the next line, and drop the emoji from the log message since loggers handle encoding separately * fix(studio): apply same emoji/print cleanup to parallel VLM conversion path The parallel URL-based conversion logic has the same duplicate print() with emoji that was fixed in the sequential path. Remove the bare print() and drop the emoji from the logger.info() call. * Treat install_python_stack.py failure as fatal in setup.ps1 On Linux/Mac, setup.sh runs under set -euo pipefail so a non-zero exit from install_python_stack.py aborts the installer. On Windows, setup.ps1 had no exit code check -- if the Python script crashed (e.g. from the cp1252 UnicodeEncodeError), the installer silently continued past the dependency loop and reported success. Studio would then fail at launch with ModuleNotFoundError for structlog, fastapi, and other deps that were never installed. Capture $LASTEXITCODE and exit 1 if the dependency installer fails, matching the error handling pattern already used for PyTorch install. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 06:40:47 -07:00
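The failure mode behind #4699 is easy to reproduce without a Windows console, because it comes from the codec itself. The sketch below is illustrative: the helper names and the emoji-to-prefix mapping are assumptions (the commit message only names the two code points and the two ASCII prefixes).

```python
# Assumed mapping: the commit names U+2705, U+26A0, "[OK]" and "[WARNING]".
STATUS_PREFIXES = {"\u2705": "[OK]", "\u26a0": "[WARNING]"}


def can_encode(text: str, encoding: str = "cp1252") -> bool:
    """Return True when `text` can be written to a stream using `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def ascii_status(text: str) -> str:
    """Swap status emoji for ASCII prefixes so print() is safe on cp1252."""
    # Drop the emoji variation selector that often trails U+26A0.
    text = text.replace("\ufe0f", "")
    for emoji, prefix in STATUS_PREFIXES.items():
        text = text.replace(emoji, prefix)
    return text
```

For example, `can_encode("\u2705 Server started")` is false under cp1252, while the output of `ascii_status` encodes cleanly.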
Daniel Han	a0bca759f3	Fix editable install scanning 6,500+ node_modules dirs (#4697 ) * fix: scope packages.find to prevent node_modules namespace scanning The packages.find section had no include filter, so setuptools' find_namespace_packages discovered all directories as potential Python packages -- including the 6,557 directories inside studio/frontend/node_modules/ after the frontend build step. This caused the editable install overlay step to run 20,000+ glob operations across 6,619 "packages", which on fast NVMe takes ~5s but on slower disks can take 7+ minutes. Adding an explicit include filter scopes discovery to only the packages we actually ship (unsloth, unsloth_cli, studio, studio.backend), dropping from 6,619 to 58 discovered packages and the editable build time from 5.4s to 1.2s. Also removes the broken kernels/moe exclude (used "/" instead of "." notation so it never matched) and adds a node_modules exclude as a safety net. * fix: use precise node_modules exclude patterns Use ".node_modules" and ".node_modules." instead of ".node_modules*" to avoid accidentally excluding valid packages that might contain "node_modules" as a substring in their name.	2026-03-30 02:40:29 -07:00
Datta Nimmaturi	9311df2b29	[Studio] multi gpu finetuning/inference via "balanced_low0/sequential" device_map (#4602 ) * [WIP] balanced device map for studio * gpus as a request parameter * API for multi GPU stuff * return multi gpu util in new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use balanced_low0 instead of balanced * Use balanced_low0 instead of balanced * Fix device_map typo, UUID parsing crash, set() filter bug, and broken tests - balanced_low0 -> balanced_low_0 (transformers/accelerate rejects the old string) - get_parent_visible_gpu_ids() now handles UUID/MIG CUDA_VISIBLE_DEVICES gracefully instead of crashing on int() parse - _get_backend_visible_gpu_info() set() or None bug: empty set is falsy so CUDA_VISIBLE_DEVICES=-1 would disable filtering and report all GPUs - test_gpu_selection.py: add missing get_visible_gpu_utilization import and add required job_id arg to start_training() calls * Smart GPU determinism using estimates * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * disallow gpu selection for gguf for now * cleanup * Slightly larger baseline * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Treat empty list as auto * Verbose logging/debug * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Cleanup and revert unnecessary deletions * Cleanup excessive logs and guard against disk/cpu offload * auth for visibility API. cleanup redundant imports. Adjust QLoRA estimate * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support for non cuda gpus * Fix multi-GPU auto-selection memory accounting The multi_gpu_factor was applied uniformly to all GPUs including the first one, which unfairly penalizes single-GPU capacity when transitioning to multi-GPU. 
This created a discontinuity where a model that barely fits 1 GPU would suddenly require 2 GPUs because the first GPU's free memory was discounted by 20%. Now the first GPU keeps its full free memory, and only additional GPUs have an overhead factor (0.85) applied to account for inter-GPU communication and sharding overhead. This gives more accurate auto-selection and avoids unnecessary multi-GPU for models that comfortably fit on one device. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add sandbox tests for multi-GPU selection logic 24 tests covering model size estimation, memory requirements, automatic GPU selection, device map generation, GPU ID validation, and multi-GPU overhead accounting. All tests use mocks so they run without GPUs on Linux, macOS, and Windows. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix reviewer findings: 4bit inference estimate, fallback, GGUF gpu_ids, retry 1. 4-bit inference now uses reduced memory estimate (model_size/3 + buffer) instead of the FP16 1.3x multiplier. This prevents over-sharding quantized models across unnecessary GPUs. 2. When model size estimation fails, auto_select_gpu_ids now falls back to all visible GPUs instead of returning None (which could default to single-GPU loading for an unknown-size model). 3. GGUF inference route now treats gpu_ids=[] as auto-selection (same as None) instead of rejecting it as an unsupported explicit request. 4. Training retry path for "could not get source code" now preserves the gpu_ids parameter so the retry lands on the same GPUs. 5. Updated sandbox tests to cover the new 4-bit inference estimate branch. * Remove accidentally added unsloth-zoo submodule * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix UUID/MIG visibility and update test expectations 1. 
nvidia.py: When CUDA_VISIBLE_DEVICES uses UUID/MIG tokens, the visibility APIs now return "unresolved" with empty device lists instead of exposing all physical GPUs. This prevents the UI from showing GPUs that the backend process cannot actually use. 2. test_gpu_selection.py: Updated test expectations to match the new multi-GPU overhead accounting (first GPU at full capacity, 0.85x for additional GPUs) and 4-bit inference memory estimation formula. All 60 tests now pass. * Add CPU/disk offload guard to audio inference path The audio model loading branch returned before the common get_offloaded_device_map_entries() check, so audio models loaded with a multi-GPU device_map that spilled layers to CPU/disk would be accepted instead of rejected. Now audio loads also verify no modules are offloaded. * Improve VRAM requirement estimates * Replace balanced_low_0 with balanced * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refine calculations for slightly easier nums * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adjust estimates * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use nums instead of obj to avoid serialisation error * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden nvidia-smi parsing and fix fallback GPU list 1. nvidia.py: Wrap int() casts for GPU index and memory in try/except so MIG slices, N/A values, or unexpected nvidia-smi output skip the unparseable row instead of aborting the entire GPU list. 2. nvidia.py: Handle GPU names containing commas by using the last field as memory instead of a fixed positional index. 3. hardware.py: fallback_all now uses gpu_candidates (GPUs with verified VRAM data) instead of raw devices list, which could include GPUs with null VRAM that were excluded from the ranking. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * consolidate raise_if_offload * Improve MoE support. Guard against nvidia-smi failures * Improve MoE support. Guard against nvidia-smi failures * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix shared-expert LoRA undercount, torch VRAM fallback, and apply_gpu_ids edge case 1. vram_estimation.py: compute_lora_params now includes shared experts (n_shared_experts) alongside routed experts when computing MoE LoRA adapter parameters. Previously only n_experts were counted, causing the estimator to undercount adapter, optimizer, and gradient memory for DeepSeek/GLM-style models with shared experts. 2. hardware.py: _torch_get_per_device_info now uses mem_get_info (which reports system-wide VRAM usage) instead of memory_allocated (which only reports this process's PyTorch allocations). This prevents auto-selection from treating a GPU as mostly free when another process is consuming VRAM. Falls back to memory_allocated when mem_get_info is unavailable. 3. hardware.py: apply_gpu_ids([]) now returns early instead of setting CUDA_VISIBLE_DEVICES="" which would disable CUDA entirely. Empty list inherits the parent visibility, same as None. 4. hardware.py: Upgraded fallback_all GPU selection log from debug to warning so operators are notified when the model likely will not fit in available VRAM. * Guard nvidia-smi subprocess calls against OSError and TimeoutExpired get_visible_gpu_utilization and get_backend_visible_gpu_info now catch OSError (nvidia-smi not found) and TimeoutExpired internally instead of relying on callers to wrap every invocation. Returns the standard available=False sentinel on failure so the torch-based fallback in hardware.py can take over. 
* Guard get_primary_gpu_utilization and reset GPU caches between tests 1. nvidia.py: get_primary_gpu_utilization now catches OSError and TimeoutExpired internally, matching the pattern already used in get_visible_gpu_utilization and get_backend_visible_gpu_info. All three nvidia-smi callers are now self-contained. 2. test_gpu_selection.py: Added _GpuCacheResetMixin that resets the module-level _physical_gpu_count and _visible_gpu_count caches in tearDown. Applied to all test classes that exercise GPU selection, device map, or visibility functions. This prevents stale cache values from leaking between tests and causing flaky results on machines with real GPUs. * Fix nvidia-smi fallback regression and physical GPU count validation 1. hardware.py: get_gpu_utilization, get_visible_gpu_utilization, and get_backend_visible_gpu_info now check result.get("available") before returning the nvidia-smi result. When nvidia-smi is unavailable or returns no data (e.g., containers without nvidia-smi, UUID/MIG masks), the functions fall through to the torch-based fallback instead of returning an empty result. This fixes a regression where the internal exception handling in nvidia.py prevented the caller's except block from triggering the fallback. 2. hardware.py: resolve_requested_gpu_ids now separates negative-ID validation from physical upper-bound validation. The physical count check is only enforced when it is plausibly a true physical count (i.e., higher than the largest parent-visible ID), since torch.cuda.device_count() under CUDA_VISIBLE_DEVICES returns the visible count, not the physical total. The parent-visible-set check remains authoritative in all cases. This prevents valid physical IDs like [2, 3] from being rejected as "out of range" when nvidia-smi is unavailable and CUDA_VISIBLE_DEVICES="2,3" makes torch report only 2 devices. 
* Fix UUID/MIG torch fallback to enumerate devices by ordinal When CUDA_VISIBLE_DEVICES uses UUID or MIG identifiers, get_parent_visible_gpu_ids() returns [] because the tokens are non-numeric. The torch fallback in get_visible_gpu_utilization() and get_backend_visible_gpu_info() previously passed that empty list to _torch_get_per_device_info(), getting nothing back. Now both functions detect the empty-list case and fall back to enumerating torch-visible ordinals (0..device_count-1) with index_kind="relative". This means the UI and auto-selection still see real device data in Kubernetes, MIG, and Slurm-style UUID environments where nvidia-smi output cannot be mapped to physical indices. Updated test_uuid_parent_visibility to verify the new torch fallback path returns available=True with relative ordinals. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add type hint for gpu_ids parameter in InferenceOrchestrator.load_model --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-30 02:33:15 -07:00
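The multi-GPU memory accounting fix in #4602 (first GPU at full free memory, 0.85x for each additional GPU) can be sketched as below. This is a simplified model, not the code in hardware.py: the function names, the "most free memory first" ranking, and the dict input are assumptions; the 0.85 factor and the fall-back-to-all behaviour come from the commit message.

```python
ADDITIONAL_GPU_FACTOR = 0.85  # overhead factor named in the commit message


def usable_capacity(free_gb):
    """Sum free memory: first GPU at full value, the rest discounted."""
    if not free_gb:
        return 0.0
    return free_gb[0] + ADDITIONAL_GPU_FACTOR * sum(free_gb[1:])


def auto_select(free_by_gpu, required_gb):
    """Pick the fewest GPUs (most free memory first) that fit `required_gb`.

    Falls back to every candidate GPU when nothing fits.
    """
    ranked = sorted(free_by_gpu.items(), key=lambda kv: kv[1], reverse=True)
    for count in range(1, len(ranked) + 1):
        chosen = ranked[:count]
        if usable_capacity([free for _, free in chosen]) >= required_gb:
            return sorted(gpu_id for gpu_id, _ in chosen)
    return sorted(free_by_gpu)
```

With two GPUs that each have 24 GB free, a 23 GB requirement stays on one device. Under the earlier uniform discount of 20% the first GPU would have counted as 19.2 GB, forcing a second GPU for the same model.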
Michael Han	fbfcbc69f2	Update README.md	2026-03-30 01:34:36 -07:00
Michael Han	d2b8ed8def	Update install.md	2026-03-30 01:33:33 -07:00
Lee Jackson	2f0a5baa87	fix(studio): preserve GGUF context max after apply and refresh (#4691 ) Fixes #4670 Separates the GGUF context slider ceiling from the currently active context length so lowering context via Chat Settings no longer locks the slider max to the reduced value. - Backend: adds `max_context_length` to GGUF load/status responses, computed from the largest VRAM/KV-fit cap across all usable GPU subsets - Frontend: stores `ggufMaxContextLength` and uses it for Context Length slider/input bounds; hydrates from both `/api/inference/load` and `/api/inference/status` - Defaults UI ceiling to native context for CPU-only and fallback paths - Seeds `effective_ctx` and `max_available_ctx` before GPU probing to prevent `UnboundLocalError` on probe failure - Property fallback uses native `_context_length`, not effective `context_length`	2026-03-30 01:33:16 -07:00
Lee Jackson	5557e1fd27	studio: unify Windows installer/setup logging style, verbosity controls, and startup messaging (#4651 ) * refactor(studio): unify setup terminal output style and add verbose setup mode * studio(windows): align setup.ps1 banner/steps with setup.sh (ANSI, verbose) * studio(setup): revert nvcc path reordering to match main * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio(setup): restore fail-fast llama.cpp setup flow * studio(banner): use IPv6 loopback URL when binding :: or ::1 * Fix IPv6 URL bracketing, try_quiet stderr, _step label clamp - Bracket IPv6 display_host in external_url to produce clickable URLs - Redirect try_quiet failure log to stderr instead of stdout - Clamp _step label to column width to prevent negative padding * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add sandbox integration tests for PR #4494 UX fixes Simulation harness (tests/simulate_pr4494.py) creates an isolated uv venv, copies the real source files into it, and runs subprocess tests for all three fixes with visual before/after demos and edge cases. Standalone bash test (tests/test_try_quiet.sh) validates try_quiet stderr redirect across 8 scenarios including broken-version contrast. 39 integration tests total (14 IPv6 + 15 try_quiet + 10 _step), all existing 75 unit tests still pass. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Truncate step() labels in setup.sh to match PS1 and Python The %-15s printf format pads short labels but does not truncate long ones. Change to %-15.15s so labels wider than 15 chars are clipped, matching the PowerShell .Substring(0,15) and Python label[:15] logic. * Remove sandbox integration tests from PR These test files are not part of the styling fix and should not ship with this PR. 
* Show error output on failure instead of suppressing it - install_python_stack.py: restore _red for patch_package_file warnings (was downgraded to _dim) - setup.ps1: capture winget output and show on failure for CUDA, Node, Python, and OpenSSL installs (was piped to Out-Null) - setup.ps1: always show git pull failure warning, not just in verbose mode * Show winget error output for Git and CMake installs on failure Same capture-and-print-on-failure pattern already used for Node, Python, CUDA, and OpenSSL winget installs. * fix: preserve stderr for _run_quiet error messages in setup.sh The step() helper writes to stdout, but _run_quiet's error header was originally sent to stderr (>&2). Without the redirect, callers that separate stdout/stderr would miss the failure headline while still seeing the log body on stderr. Add >&2 to both step calls inside _run_quiet to match main's behavior. * feat: add --verbose flag to setup and update commands Wire UNSLOTH_VERBOSE=1 through _run_setup_script() so that 'unsloth studio update --verbose' (and the deprecated 'setup') passes the flag to setup.sh / setup.ps1 / install_python_stack.py. 
* fix(studio): honor verbose logging and keep llama.cpp failures non-blocking * fix(studio): switch installer to 'studio update' and normalize Windows setup logs * chore(studio): refine localhost tip and remove skip-base setup nois * fix(studio): align Windows setup logs with Linux style and improve startup tips * fix(studio): align Windows setup logs with Linux style * refactor(windows-installer): align install/setup logs with Linux style and silence auto-launch output * refactor(windows): align installer/setup output with Linux style and reduce default verbosity * refactor(windows): match install.ps1 output style/colors to setup and quiet default logs * fix(studio-banner): update personal-computer localhost tip * fix(setup.sh): restore verbose llama.cpp build output while keeping default quiet mode * fix(install.sh): align installer logging with setup style and restore POSIX-safe color output * fix(install.sh): preserve installer reliability and launch visibility Export verbose mode for child setup processes, harden install command handling under set -e, and keep first-run studio launch non-silent so users can always see URL and port fallback output. * fix(windows installer): keep exit semantics and degrade status accurate Use quiet command redirection that preserves native exit codes, keep startup output visible on first launch, and report limited install status when llama.cpp is unavailable. * fix(setup.sh): improve log clarity and enforce GGUF degraded signaling Restore clean default setup output, add verbose-only diagnostics, fail fast on Colab dependency install errors, and return non-zero when GGUF prerequisites or llama.cpp artifacts are unavailable. * fix(installer): harden bash preflight and PowerShell GPU checks Fail fast when bash is unavailable before invoking setup.sh, and replace remaining nvidia-smi pipeline checks with stream redirection patterns that preserve reliable native exit-code handling. 
* fix(windows): keep verbose output visible while preserving exit codes Ensure PowerShell wrapper helpers in install/update stream native command output to host without returning it as function output, so npm logs no longer corrupt exit-code checks in verbose mode. * fix(windows): avoid sticky UNSLOTH_VERBOSE and gate studio update verbosity * Fix degraded llama.cpp exit code, PS verbose stderr, banner URLs, npm verbose - setup.sh: Do not exit non-zero when llama.cpp is unavailable; the footer already reports the limitation, and install.sh runs under set -e so a non-zero exit aborts the entire install including PATH/shortcuts/launch. - setup.ps1: Remove $? check in Invoke-SetupCommand verbose path; PS 5.1 sets $? = $false when native commands write to stderr even with exit 0. Merge stderr into stdout with 2>&1 and rely solely on $LASTEXITCODE. - startup_banner.py: Show the actual bound address when Studio is bound to a non-loopback interface instead of always showing 127.0.0.1/localhost. - setup.sh: Use run_quiet_no_exit instead of run_quiet_no_exit_always for npm install steps so --verbose correctly surfaces npm output. * Fix install.ps1 verbose stderr, propagate UNSLOTH_VERBOSE, fix git clone verbose - install.ps1: Apply same Invoke-InstallCommand fix as setup.ps1 -- merge stderr into stdout with 2>&1 and drop the $? check that misclassifies successful native commands on PS 5.1. - install.ps1 + setup.ps1: Export UNSLOTH_VERBOSE=1 to the process env when --verbose is passed so child processes like install_python_stack.py also run in verbose mode. - setup.sh: Use run_quiet_no_exit for git clone llama.cpp so --verbose correctly surfaces clone diagnostics during source-build fallback. * Surface prebuilt llama.cpp output in verbose mode, remove dead code, fix banner - setup.sh: Use tee in verbose mode for prebuilt llama.cpp installer so users can see download/validation progress while still capturing the log for structured error reporting on failure. 
- setup.ps1: Same fix for Windows -- use Tee-Object in verbose mode. - setup.sh: Remove run_quiet_no_exit_always() which has no remaining callers. - startup_banner.py: Avoid printing the same URL twice when Studio is bound to a specific non-loopback address that matches the display host. * Fix run_install_cmd exit code after failed if-statement The previous pattern 'if "$@"; then return 0; fi; _rc=$?' always captured $? = 0 because $? reflects the if-statement result, not the command's exit code. Switch to '"$@" && return 0; _rc=$?' which preserves the actual command exit code on failure. Applies to both verbose and quiet branches. * Fix _run_quiet exit code, double uv install, missing --local flag - setup.sh: Fix _run_quiet verbose path that always captured exit code 0 due to $? resetting after if-then-fi with no else. Switch to the same '"$@" && return 0; exit_code=$?' pattern used in install.sh. - setup.sh: Consolidate the two uv install branches (verbose + quiet) into a single attempt with conditional output. Previously, when verbose mode was on and the install failed, a second silent attempt was made. - install.ps1: Pass --local flag to 'unsloth studio update' when $StudioLocalInstall is true. Without this, studio.py's update() command overwrites STUDIO_LOCAL_INSTALL to "0", which could cause issues if setup.ps1 or install_python_stack.py later checks that variable. * Revert SKIP_STUDIO_BASE change for --no-torch, restore install banners - Revert SKIP_STUDIO_BASE from 0 to 1 for --no-torch. install.sh already installs unsloth+unsloth-zoo and no-torch-runtime.txt before calling setup.sh, so letting install_python_stack.py redo it was redundant and slowed down --no-torch installs for no benefit. - Restore the "Unsloth Studio installed!" success banner and "starting Unsloth Studio..." launch message so users get clear install completion feedback before the server starts. 
* Make llama.cpp build failure a hard error with proper cleanup - setup.sh: Restore exit 1 when _LLAMA_CPP_DEGRADED is true. GGUF inference requires a working llama.cpp build, so this should be a hard failure, not a silent degradation. - install.sh: Catch setup.sh's non-zero exit with '\|\| _SETUP_EXIT=$?' instead of letting set -e abort immediately. This ensures PATH setup, symlinks, and shortcuts still get created so the user can fix the build deps and retry with 'unsloth studio update'. After post-install steps, propagate the failure with a clear error message. * Revert install.ps1 to 'studio setup' to preserve SKIP_STUDIO_BASE 'studio update' pops SKIP_STUDIO_BASE from the environment, which defeats the fast-path version check added in PR #4667. When called from install.ps1 (which already installed packages), SKIP_STUDIO_BASE=1 must survive into setup.ps1 so it skips the redundant PyPI check and package reinstallation. 'studio setup' does not modify env vars. * Remove deprecation message from 'studio setup' command install.ps1 uses 'studio setup' (not 'studio update') to preserve SKIP_STUDIO_BASE. The deprecation message was confusing during first install since the user never typed the command. * Fix stale env vars, scope degraded exit, generic error message for PR #4651 - install.ps1: Always set STUDIO_LOCAL_INSTALL and clear STUDIO_LOCAL_REPO when not using --local, to prevent stale values from a previous --local run in the same PowerShell session. Fix log messages to say 'setup' not 'update' since we call 'studio setup'. - setup.sh: Only exit non-zero for degraded llama.cpp when called from the installer (SKIP_STUDIO_BASE=1). Direct 'unsloth studio update' keeps degraded installs successful since Studio is still usable for non-GGUF workflows and the footer already reports the limitation. 
- install.sh: Make the setup failure error message generic instead of GGUF-specific, so unrelated failures (npm, Python deps) do not show misleading cmake/git recovery advice. * Show captured output on failure in quiet mode for PR #4651 Both Invoke-InstallCommand (install.ps1) and Invoke-SetupCommand (setup.ps1) now capture command output in quiet mode and display it in red when the command fails. This matches the behavior of run_install_cmd in install.sh where failure output is surfaced even in quiet mode, making cross-platform error debugging consistent. * Match degraded llama.cpp exit on Windows, fix --local recovery hint for PR #4651 - setup.ps1: Exit non-zero for degraded llama.cpp when called from install.ps1 (SKIP_STUDIO_BASE=1), matching setup.sh behavior. Direct 'unsloth studio update' keeps degraded installs successful. - install.sh: Show 'unsloth studio update --local' in the recovery message when the install was run with --local, so users retry with the correct flag instead of losing local checkout context. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-30 00:53:23 -07:00
Roland Tannous	5bbfabb151	fix: [Studio] setup.ps1 update-flow for windows (#4667 ) * fix: add PyPI version check to setup.ps1 for fast update path Port the update-flow logic from setup.sh to setup.ps1 so that `unsloth studio update` on Windows skips Python dependency reinstall when the installed version already matches PyPI latest. * fix: clear SKIP_STUDIO_BASE in update command install.ps1 sets SKIP_STUDIO_BASE=1 which persists in the PowerShell session. If the user runs `unsloth studio update` in the same terminal, the env var causes the version check to be skipped. Clear it explicitly in the update command. * fix: harden version check and clear stale env vars in update flow - Normalize $InstalledVer with Out-String + Trim() to avoid array/whitespace comparison issues in PowerShell 5.1 (python output can be captured as string[] instead of scalar string) - Move Fast-Install --upgrade pip inside if (-not $SkipPythonDeps) so the fast path avoids unnecessary network round-trips - Clear STUDIO_LOCAL_REPO when --local is not passed to prevent a previous --local session from leaking into a plain update --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-29 21:14:36 -07:00
Roland Tannous	a6c1f893fc	Fix blank page on Windows due to broken .js MIME type (#4674 ) * Fix blank page on Windows due to broken .js MIME type in registry * Update studio/backend/main.py adding defensive suggestion by gemini where we make the mimetypes specific to windows platforms Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-28 22:26:49 +04:00
Lee Jackson	5d2dca801c	studio: add HF/local model selection UI for GGUF export (#4365 ) * feat(studio): add HF/local model selection UI for GGUF export * fix(studio):fix selector ring clipping * fix(studio): export page trust_remote_code control and label styling * fix(studio): accept hf_token in load_checkpoint orchestrator method The route was passing hf_token to load_checkpoint() but the method didn't accept it, causing a TypeError on every /api/export/load-checkpoint request. * fix(studio): clear HF model selection when input is edited Previously selectedSourceModel was only cleared when the input became empty, so editing to a different repo ID after selecting a model would silently keep the old selection. --------- Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>	2026-03-28 22:18:25 +04:00
Daniel Han	362ad3606b	Update _utils.py	2026-03-27 08:42:00 -07:00
Daniel Han	82d14b44d3	fix: preserve Windows drive-letter paths on native Windows (#4665 ) normalize_path() unconditionally converted Windows paths like C:\Users\... to WSL format /mnt/c/Users/..., which breaks path resolution on native Windows. This caused LM Studio GGUF models to fail detection (detect_gguf_model returned None for the invalid path), falling through to the Unsloth import path which requires a GPU. Now only performs the /mnt/ mapping when actually running under WSL. On native Windows, drive letters are preserved and backslashes are normalized to forward slashes.	2026-03-27 08:19:41 -07:00
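The path handling described in this commit can be illustrated with a minimal Python sketch. It is not the repository's actual normalize_path(); the running_under_wsl parameter is a hypothetical stand-in for whatever WSL detection the real code performs.

```python
import re

# Matches a Windows drive-letter path such as C:\Users\me or D:/models.
_DRIVE_RE = re.compile(r"^([A-Za-z]):[\\/](.*)$")


def normalize_path(path: str, running_under_wsl: bool) -> str:
    """Illustrative only: map drive letters to /mnt/<drive>/ under WSL,
    otherwise keep the drive letter and normalize backslashes."""
    match = _DRIVE_RE.match(path)
    if match is None:
        # Not a drive-letter path: leave it untouched.
        return path
    drive = match.group(1)
    rest = match.group(2).replace("\\", "/")
    if running_under_wsl:
        # Only WSL exposes Windows drives under /mnt/<lowercase letter>/.
        return f"/mnt/{drive.lower()}/{rest}"
    # Native Windows: preserve the drive letter, use forward slashes.
    return f"{drive}:/{rest}"
```

Passing the WSL flag explicitly keeps the sketch testable on any platform; a real implementation would more likely detect the environment itself.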
Daniel Han	9477e7c43f	Bump minimum unsloth version to 2026.3.16 in install scripts (#4663 ) Update install.sh and install.ps1 to require unsloth>=2026.3.16, matching the latest PyPI release.	2026-03-27 07:47:08 -07:00
Daniel Han	df3b18c579	Update _utils.py	2026-03-27 07:24:39 -07:00
Daniel Han	844a816ed0	Update pyproject.toml	2026-03-27 07:14:03 -07:00
Roland Tannous	562e54fc6e	Fix HF cache default and show LM Studio models in chat/inference (#4653 ) * fix: default HF cache to standard platform path instead of legacy Unsloth cache * feat: show LM Studio and local models in chat Fine-tuned tab * feat: show LM Studio models in Hub models tab * fix: fetch local models after auth refresh completes * Revert "fix: fetch local models after auth refresh completes" This reverts commit `cfd61f0ac7`. * fix: increase llama-server health check timeout to 600s for large models * feat: expandable GGUF variant picker for LM Studio local models * fix: show GGUF variant label for locally loaded LM Studio models * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: show publisher name in LM Studio model labels * fix: set model_id for loose GGUF files in LM Studio publisher dirs * fix: show publisher prefix in Fine-tuned tab LM Studio models * fix: only use model_id for lmstudio source models * fix: only show LM Studio models in Hub tab on Mac/chat-only mode * fix: respect XDG_CACHE_HOME, handle Windows paths in isLocalPath, refresh LM Studio on remount - _setup_cache_env now reads XDG_CACHE_HOME (falls back to ~/.cache) instead of hard-coding ~/.cache/huggingface. This follows the standard HF cache resolution chain and respects distro/container overrides. - isLocalPath in GgufVariantExpander uses a regex that covers Windows drive letters (C:\, D:/), UNC paths (\\server\share), relative paths (./, ../), and tilde (~/) -- not just startsWith("/"). - HubModelPicker.useEffect now calls listLocalModels() before the alreadyCached early-return gate so LM Studio models are always refreshed on remount. Also seeds useState from _lmStudioCache for instant display on re-open. 
* fix: add comment explaining isLocalPath regex for Windows/cross-platform paths * fix: prioritize unsloth publisher in LM Studio model list * fix: scope unsloth-first sort to LM Studio models on all platforms * fix: add missing _lmStudioCache module-level declaration * fix: prioritize unsloth publisher before timestamp sort in LM Studio group --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-27 06:59:27 -07:00
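The isLocalPath check described in this commit lives in the frontend; the following is a hedged Python rendering of the same idea, not the project's regex. The pattern and the function name is_local_path are assumptions chosen to cover the path shapes the commit message lists.

```python
import re

# Assumed pattern covering: POSIX absolute (/), tilde (~/), relative (./ or ../),
# Windows drive letters (C:\ or D:/), and UNC paths (\\server\share).
_LOCAL_PATH_RE = re.compile(r"^(?:/|~/|\.{1,2}[\\/]|[A-Za-z]:[\\/]|\\\\)")


def is_local_path(value: str) -> bool:
    """Return True when the value looks like a filesystem path rather than
    a Hugging Face repo ID such as 'publisher/model'."""
    return bool(_LOCAL_PATH_RE.match(value))
```

A repo ID like unsloth/Qwen3-0.6B starts with none of these prefixes, so it is treated as remote.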
Wasim Yousef Said	73969a1e4f	fix: disable OCR in pymupdf4llm PDF extraction (#4659 )	2026-03-27 06:53:33 -07:00
Daniel Han	c4e34c88c8	Fall back to parsing model name when HF API has no param count (#4656 ) Some models like unsloth/Qwen3-0.6B have no safetensors metadata on Hugging Face, so the training model selector showed no parameter size badge. The chat model picker already had extractParamLabel() as a fallback that parses sizes like "0.6B" from the model name. Add the same fallback to the training model selector and the onboarding model selection step. Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-03-27 05:57:49 -07:00
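The name-parsing fallback can be sketched as follows. The real extractParamLabel() is frontend code whose exact rules are not shown in this log, so the regex below is an assumption that only aims to pick out size tokens such as "0.6B" or "3B".

```python
import re
from typing import Optional

# Assumed rule: a number (optionally decimal) followed by B or M, not glued to
# surrounding letters, digits, or a preceding dot (so "3.2" in a version is skipped).
_PARAM_RE = re.compile(r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)([BbMm])(?![A-Za-z0-9])")


def extract_param_label(model_name: str) -> Optional[str]:
    """Return a size label like '0.6B' parsed from the model name, or None."""
    match = _PARAM_RE.search(model_name)
    if match is None:
        return None
    return f"{match.group(1)}{match.group(2).upper()}"
```

The lookbehind keeps the "3" in "Qwen3" and the "2" in "Llama-3.2" from being mistaken for a parameter count.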
Wasim Yousef Said	4ab7fb1f7b	fix: replace navbar shutdown text button with icon-only button (#4655 )	2026-03-27 05:44:59 -07:00
Daniel Han	e36f72c685	Detect always-on reasoning models and show Think button as locked-on (#4654 ) * Detect always-on reasoning models and show Think button as locked-on Models with hardcoded <think>/</think> tags or reasoning_content in their chat template (e.g. distilled reasoning models) always produce thinking output regardless of any toggle. Previously these models were not detected as reasoning-capable at all, so the Think button was grayed out even though the model was actively reasoning. Backend: - Detect <think>/</think> and reasoning_content in GGUF chat templates as a fallback when enable_thinking is not present - Add reasoning_always_on flag to LoadResponse and InferenceStatusResponse - Pass the flag through all GGUF load and status response paths Frontend: - Add reasoningAlwaysOn to the chat runtime store and API types - When reasoning_always_on is true, show the Think button as lit (active) but not clickable, with a tooltip explaining the model always uses thinking - Force reasoningEnabled=true when the model always reasons * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use pointer-events-none instead of disabled for always-on Think button The HTML disabled attribute was not fully blocking clicks on the Think button for always-on reasoning models. Switch to pointer-events-none CSS class which prevents all mouse interaction at the CSS level. * Use a static span instead of disabled button for always-on Think Replace the button element with a plain span when reasoning is always on. This makes it physically impossible to toggle since there is no clickable element at all, avoiding any CSS or disabled-attribute edge cases. * Simplify always-on Think button to stay lit and remain toggleable Keep the Think button as a normal toggleable button but ensure it shows as lit when reasoning_always_on is true. The model always reasons regardless of the toggle state so there is no need to block interaction. 
--------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-27 05:42:26 -07:00
Daniel Han	eacaf6827c	fix: no-torch install deps without pulling torch transitively (#4650 ) Use --no-deps for ALL packages (unsloth, unsloth-zoo, and runtime deps) since the current PyPI metadata for unsloth still declares torch as a hard dependency. Runtime deps (typer, pydantic, safetensors, transformers, etc.) are installed from no-torch-runtime.txt with --no-deps to prevent transitive torch resolution from accelerate, peft, trl, and sentence-transformers. no-torch-runtime.txt now includes unsloth's own direct deps (typer, pydantic, pyyaml, nest-asyncio) since --no-deps skips those too. install.sh installs no-torch-runtime.txt directly (via helper function _find_no_torch_runtime). install.ps1 does the same via Find-NoTorchRuntimeFile. SKIP_STUDIO_BASE stays at 1 to avoid setup.sh fast-path issues. install_python_stack.py NO_TORCH branch does the same for unsloth studio update, using package_name instead of hardcoded "unsloth".	2026-03-27 05:19:26 -07:00
Daniel Han	a7c43bc46d	Fix inference failing for transformers 5.x models (trust_remote_code) (#4652 ) * Fix inference failing for transformers 5.x models (trust_remote_code) The training worker in core/training/worker.py auto-enables trust_remote_code for unsloth/* models that need transformers 5.x (e.g. NVIDIA-Nemotron-3-Nano-4B). The inference worker did not have the same logic, so loading these models for chat would fail with "No config file found" while training worked fine. Add the same auto-detection to the inference worker so trust_remote_code is set automatically when needed. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-27 04:51:30 -07:00
Wasim Yousef Said	887b8cb1c2	fix: add auth + UX improvements to shutdown button (#4642 ) * Studio shutdown button * fix: add auth to shutdown endpoint and improve UX - Add JWT auth (Depends(get_current_subject)) to POST /api/shutdown - Use authFetch instead of bare fetch in shutdown dialog - Only show beforeunload prompt when training is running - Remove Ctrl+W/Cmd+W interception (browsers don't allow it) - Store shutdown task on app.state to prevent GC --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-27 04:36:08 -07:00
Daniel Han	1fb9fe3304	Fix orphan server cleanup killing user's own llama-server (#4622 ) * fix: only kill studio-managed llama-server processes, not user's own servers _kill_orphaned_servers() checked for "unsloth" anywhere in the process cmdline, which matched the user's own llama-server when serving models from unsloth/ HF repos (the model path in -m contains "unsloth"). This caused the user's server to get SIGKILLed on Studio startup, destroying their prompt cache and forcing full model re-loads. Narrow the check to only match processes whose binary path lives under ~/.unsloth/llama.cpp/ (the Studio install directory). * Address review: cover env var paths, move Path.home() inside try block - Also check LLAMA_SERVER_PATH and UNSLOTH_LLAMA_CPP_PATH so orphans from custom install locations are still cleaned up. - Move studio_dirs construction inside the try/except so a Path.home() failure (containers without HOME) does not crash the constructor. * Address reviewer feedback: proper path ancestry, /proc/pid/exe, legacy paths Changes based on 10-reviewer consensus: - Use Path.is_relative_to() instead of substring matching to prevent false positives on sibling paths like ~/.unsloth/llama.cpp-backup/. - Use /proc/<pid>/exe (symlink to real binary) instead of parsing the first cmdline token, which breaks on paths with spaces. Falls back to cmdline parsing on non-Linux or when /proc is unavailable. - Add legacy in-tree install paths (project_root/llama.cpp/ and project_root/bin/) so orphans from older setup.sh are still cleaned. - Treat LLAMA_SERVER_PATH as an exact binary match rather than widening it to its parent directory, which could match unrelated servers in shared locations like /usr/local/bin/. - Keep everything inside the try/except so Path.home() failures in containers do not crash the constructor. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: add Linux platform guard and log cleanup errors - Guard pgrep fallback with sys.platform check so it does not crash on Windows/macOS when psutil is unavailable. - Replace silent except-pass with logger.warning for observability. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-27 04:33:04 -07:00
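The difference between substring matching and proper path ancestry, as discussed in this commit, can be shown with a short sketch. The helper name is_studio_managed and the example paths are hypothetical; PurePosixPath is used so the example behaves the same on every platform (is_relative_to requires Python 3.9+).

```python
from pathlib import PurePosixPath
from typing import Iterable


def is_studio_managed(exe_path: PurePosixPath, install_roots: Iterable[PurePosixPath]) -> bool:
    """True only when the binary lives under one of the install roots.
    is_relative_to compares whole path components, not raw substrings."""
    return any(exe_path.is_relative_to(root) for root in install_roots)


root = PurePosixPath("/home/u/.unsloth/llama.cpp")
inside = PurePosixPath("/home/u/.unsloth/llama.cpp/build/bin/llama-server")
sibling = PurePosixPath("/home/u/.unsloth/llama.cpp-backup/bin/llama-server")

# A naive substring test would wrongly accept the sibling directory.
naive_sibling_match = str(root) in str(sibling)
```

Here naive_sibling_match is True while is_studio_managed(sibling, [root]) is False, which is the false positive the commit avoids.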
Daniel Han	b1c3a1e857	fix: replace [huggingfacenotorch] with no-torch-runtime.txt requirements (#4649 ) The [huggingfacenotorch] extras only exist in pyproject.toml but are NOT published on PyPI, so uv pip install "unsloth[huggingfacenotorch]" fails on fresh installs from the registry. Fix: add studio/backend/requirements/no-torch-runtime.txt with the runtime deps (safetensors, transformers, datasets, accelerate, etc.) that mirror [huggingfacenotorch] from pyproject.toml. In no-torch mode: 1. install.sh/ps1 install unsloth + unsloth-zoo with --no-deps 2. SKIP_STUDIO_BASE=0 so install_python_stack.py's NO_TORCH branch runs 3. install_python_stack.py installs no-torch-runtime.txt	2026-03-27 03:58:51 -07:00
Daniel Han	9d68621614	Streaming tool detection: guard late tool_calls, filter incomplete fragments (#4648 ) * Guard against late tool_calls after visible content, filter incomplete fragments 1. If visible content was already emitted (_last_emitted is non-empty) when delta.tool_calls arrives, ignore the tool_calls instead of reclassifying the turn as a tool call. llama-server never interleaves content and tool_calls (they are mutually exclusive), but this guard is defensive for other OpenAI-compatible backends. 2. Filter out incomplete structured tool_calls fragments before execution. Entries with empty function.name (from truncation by max_tokens, disconnect, or interruption) are skipped instead of being passed to execute_tool(). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-27 03:40:14 -07:00
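The second guard in this commit, skipping fragments with an empty function.name, might look roughly like the sketch below. The dictionary shape follows the OpenAI-style tool_calls format; the helper name filter_complete_tool_calls is hypothetical and the real code may differ.

```python
from typing import Any, Dict, List


def filter_complete_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop fragments whose function name is missing or empty, for example
    after truncation by max_tokens or a client disconnect."""
    complete = []
    for call in tool_calls:
        function = call.get("function") or {}
        name = function.get("name") or ""
        if name.strip():
            complete.append(call)
    return complete
```

Only the surviving entries would then be handed to the tool executor.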
Wasim Yousef Said	5c7c3883cb	feat: update app icons to rounded logo (#4640 ) Replace favicon.png, unsloth-gem.png, and unsloth.ico with rounded.png. Update install.sh to source rounded.png for Linux/macOS shortcuts.	2026-03-27 03:18:20 -07:00
Daniel Han	79d9bf0c9a	Fix GGUF GPU fit check to account for KV cache VRAM (#4623 ) * fix: account for KV cache in GGUF GPU fit check and auto-cap context length The GPU fit check only compared GGUF file size against free VRAM, ignoring KV cache memory. Models with large native context lengths (e.g. Qwen3.5-9B at 262k) would pass the fit check since the GGUF is only 5.6 GB, but the KV cache at 262k context needs ~40 GB at f16. This caused llama-server to silently fall back to CPU inference. Changes: - Parse block_count, head_count_kv, head_count, and embedding_length from GGUF metadata alongside context_length - Add KV cache VRAM estimation based on architecture params and the selected cache quantization type (f16, q8_0, q4_0, etc.) - Auto-reduce context length to the maximum that fits in available GPU VRAM when the native context would exceed it - Include estimated KV cache size in the _select_gpus total so the fit decision reflects actual runtime memory, not just file size For the reported scenario (Qwen3.5-9B on RTX 3090 with 22415 MiB free), context is auto-reduced from 262144 to ~63k with f16 KV cache, keeping the model fully on GPU. With q4_0 KV cache quantization the context can reach ~226k. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: resolve 6 bugs in KV cache VRAM estimation and add test harness - Fix q8_0 BPE constant: 1.125 -> 34/32 (1.0625) to match llama.cpp block size - Fix _fit_context_to_vram returning min_ctx when weights exceed budget (should return requested_ctx unchanged, let --fit handle it) - Fix binary search inflating below-2048 requests (lo=min_ctx=2048 > hi) - Fix n_ctx=0 regressing to 4096 when metadata unavailable (preserve sentinel) - Fix multi-GPU auto-cap using single-GPU budget instead of aggregate - Fix _context_length being overwritten with capped effective value Add tests/test_gguf_kv_vram.py: 43 cross-platform pytest tests covering pure logic, integration (monkeypatched load_model), and real GGUF parsing. Runs in an isolated uv venv with only pytest -- no GPU/torch/structlog needed. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: complete _effective_context_length lifecycle - Initialize _effective_context_length in __init__ (prevents AttributeError) - Reset _effective_context_length in unload_model (prevents stale values) - Update context_length property to return effective (capped) value for the UI/API, falling back to native _context_length if not set * fix: multi-GPU selection tries smallest subset first The previous approach summed all GPUs' memory to cap context, then selected GPUs afterward. This was overly optimistic for heterogeneous setups (e.g., 48 GiB + 4 GiB): the context was inflated by the tiny GPU's contribution, then both GPUs were dragged in. Now we try GPU subsets from smallest (1 GPU) to largest, capping context for each. We pick the smallest subset where the model+KV fits. This prefers single-GPU when possible (simpler, no tensor split overhead) and avoids pulling in GPUs that barely help. Add tests: test_multi_gpu_prefers_fewer_gpus, test_multi_gpu_heterogeneous. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: prefer fewer GPUs over higher context in GPU selection Multi-GPU inference is slower due to tensor-split overhead, so we should prefer fewer GPUs with reduced context over more GPUs with full context. Now the loop stops at the first GPU subset where the model fits, rather than continuing to find subsets that allow higher context. Only if the model can't fit on N GPUs do we try N+1. This preserves the original behavior: use multi-GPU only when the model doesn't fit on a single GPU. * fix: make _kill_orphaned_servers cross-platform via psutil Replace pgrep + os.kill(SIGKILL) with psutil.process_iter() and proc.kill(), which work on Linux, macOS, and Windows. Build an allowlist of install roots matching _find_llama_server_binary so only studio-managed servers are killed. * fix: skip KV estimation loop when effective context is unknown When n_ctx=0 and GGUF metadata lacks context_length, effective_ctx stays 0. _estimate_kv_cache_bytes(0) returns 0, so a GPU could be selected with no KV headroom. Guard the loop with effective_ctx > 0 to fall back to file-size-only GPU selection in this case. * chore: temporarily remove test harness (will add back separately) * refactor: deduplicate UINT32/UINT64 handling in GGUF parser Replace duplicated if/elif chains for vtype 4 and 10 with a single block using setattr. No behavioral change. * fix: honor explicit n_ctx by using multi-GPU before capping When the user explicitly sets n_ctx, try to fit the full requested context using _select_gpus (which adds GPUs as needed). Only cap context if it doesn't fit on any GPU combination. When n_ctx=0 (auto/native context), keep the existing behavior: prefer fewer GPUs with reduced context, since multi-GPU is slower and the user didn't ask for a specific context length. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: context_length property returns native value for frontend slider The frontend uses context_length as the slider max. Returning the capped effective value prevented users from requesting higher context on reload (e.g., after switching to q4_0 KV cache). Revert to returning the native GGUF metadata value -- the backend auto-caps at load time regardless. * revert: context_length returns effective (capped) value The UI slider should show what the server is actually running at, not the theoretical maximum. Revert to returning the effective context length. * fix: raise minimum context floor from 2048 to 4096 --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-27 03:14:42 -07:00
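The KV cache estimate and context auto-cap described in this commit can be approximated with the sketch below. It is not the repository's _estimate_kv_cache_bytes or _fit_context_to_vram: the function signatures are assumptions, head_dim is taken as embedding_length / head_count, and the q4_0 figure of 18/32 bytes per element is the standard llama.cpp block layout rather than something stated in this log.

```python
# Bytes per cached element; q8_0 = 34/32 as stated in the commit,
# f16 and q4_0 values are assumptions based on llama.cpp block sizes.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def estimate_kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, cache_type="f16"):
    """K and V tensors for every layer, KV head, and context position."""
    if n_ctx <= 0:
        return 0
    per_token = 2 * n_layers * n_kv_heads * head_dim * BYTES_PER_ELEMENT[cache_type]
    return int(n_ctx * per_token)


def fit_context_to_vram(requested_ctx, weights_bytes, budget_bytes,
                        n_layers, n_kv_heads, head_dim,
                        cache_type="f16", min_ctx=4096):
    """Largest context <= requested_ctx whose weights + KV cache fit the budget."""
    def fits(ctx):
        kv = estimate_kv_cache_bytes(ctx, n_layers, n_kv_heads, head_dim, cache_type)
        return weights_bytes + kv <= budget_bytes

    if weights_bytes >= budget_bytes:
        return requested_ctx  # weights alone do not fit; leave the request unchanged
    if fits(requested_ctx) or requested_ctx <= min_ctx:
        return requested_ctx  # never inflate a request that is below the floor
    lo, hi = min_ctx, requested_ctx
    while lo < hi:  # binary search for the largest context that fits
        mid = (lo + hi + 1) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

With 32 layers, 8 KV heads, and a head dimension of 128, each token costs 128 KiB at f16, so a 4 GiB model with a 6 GiB budget would be capped to 16384 tokens under these assumptions.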
Daniel Han	e318da21a7	Fix ~1.2s TTFT penalty when tools are enabled in Studio (#4639 ) * Fix ~1.2s TTFT penalty when tools are enabled in Studio When users enable web search, Python execution, or terminal tools, every message gets a ~1.2s delay before any text appears -- even when the model does not call any tool. This happens because generate_chat_completion_with_tools() does a non-streaming detection pass (stream: False) first, waits for the complete response, then checks for tool calls. For the ~90% of messages that don't trigger a tool call, this blocking wait is entirely wasted. Root cause: the detection pass payload uses stream: False, forcing llama-server to generate the entire response before returning any tokens. Fix: replace the non-streaming detection pass with a streaming pass (stream: True) and a speculative buffer state machine that detects tool signals in the first 1-2 SSE chunks: - BUFFERING: accumulate content tokens, check first chars for tool signal prefixes (<tool_call>, <function=) - STREAMING: no tool detected, yield tokens to caller immediately - DRAINING: tool signal found, silently accumulate rest of stream Three detection paths: 1. Structured delta.tool_calls -- detected instantly, transition to DRAINING, accumulate fragments, assemble at stream end. 2. XML tool markup in content -- buffer holds up to 32 chars checking for <tool_call> or <function= prefix, then transitions to DRAINING. 3. No tool signal -- first non-whitespace, non-XML char triggers immediate transition to STREAMING (fast path, ~90% of requests). Safety net: after any stream ends in STREAMING state, check accumulated content for XML tool signals. Handles rare "content before tool call" edge case. 
Additional supporting changes: - Add headers parameter to _stream_with_retry for auth forwarding - Share _strip_tool_markup and regex patterns between the detection pass and the final streaming pass (removes duplication) - Remove the iteration==0 non-streaming content shortcut (no longer needed since all iterations stream directly) - Keep the final streaming pass as fallback for max_tool_iterations exhaustion Benchmarked on Qwen3.5-4B Q4_K_XL: - No tools: TTFT ~112ms (unchanged) - Tools enabled, no call: TTFT ~112ms (was ~1207ms) - Decode TPS: 226 (unchanged in all cases) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add unit tests for streaming tool detection state machine 16 tests covering every tool call parsing path: - Plain text (no tool call) streaming - Structured delta.tool_calls detection and fragment assembly - XML <tool_call>JSON</tool_call> detection via buffer - XML <function=name> tag detection via buffer - Whitespace before tool XML - Safety net (content then tool XML) - Parallel multi-tool calls - Reasoning token bypass (thinking models) - Reasoning then tool call - Empty response handling - Buffer prefix timeout (HTML not mistaken for tool) - Non-XML first char instant streaming - False positive rejection (<tool_tip> vs <tool_call>) - Arguments split across multiple chunks - auto_heal_tool_calls=False respects the flag - Metrics accumulation across tool iterations * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix reasoning-only BUFFERING, pre-tool content emission, and code duplication Addresses review feedback on the streaming tool detection: 1. Reasoning tokens are no longer yielded during BUFFERING/DRAINING states. 
The consumer in routes/inference.py tracks prev_text across tool iterations without resetting it, so yielding reasoning during a detection pass that resolves to a tool call would corrupt the delta computation for subsequent iterations. Reasoning is now silently accumulated during detection (matching the old non-streaming behavior) and flushed together with content when the buffer resolves to STREAMING. 2. Handle reasoning-only responses in the BUFFERING resolver. When a thinking model emits only reasoning_content with no content tokens, the stream ends while still in BUFFERING state. The resolver now detects this case and yields reasoning as plain text (without <think> wrapper), matching the final streaming pass behavior for models like Qwen3 in always-think mode. 3. Replace duplicated re.sub calls for stripping tool markup with the existing _strip_tool_markup(content_text, final=True) helper, removing ~40 lines of redundant regex code. 4. Update tests: adjust reasoning test expectations to match the new behavior (reasoning batched with content, not streamed individually during BUFFERING). Add test_reasoning_only_no_content for the reasoning-only edge case. 17/17 tests pass. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address remaining reviewer findings: late tool_call IDs and XML speculation 1. Late-arriving tool_calls.id: when a provider sends the real ID on a later delta chunk (after the initial one with index and function name), the accumulator now updates the ID instead of keeping the synthetic "call_{idx}" placeholder. (P2, 2/10 reviewers) 2. XML speculation respects auto_heal_tool_calls: when auto_heal is explicitly disabled, _TOOL_XML_SIGNALS is empty so the BUFFERING state never speculatively holds content for XML prefix detection. Content starting with literal "<tool_call>" or "<function=" text flows straight through without delay. 
(P2, 1/10 reviewers) Skipped: finish_reason="tool_calls" without delta.tool_calls fallback (P1, 1/10 reviewers). llama-server always sends delta.tool_calls fragments in streaming mode. A non-streaming fallback for this edge case would add complexity for a scenario that does not occur in practice with the supported backend. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Check request.is_disconnected() every 20 tokens instead of every token The disconnect check is an async round-trip that adds overhead on every loop iteration. Since the cancel watcher in llama_cpp.py already handles connection teardown (closes the streaming response on cancel), this route-layer check is a secondary safety net that does not need to run on every single token. Check every 20 tokens across all 4 streaming paths: - gguf_tool_stream (tool-enabled GGUF) - gguf_stream_chunks (standard GGUF) - audio_input_generate (audio/whisper input) - generic backend stream (non-GGUF fallback) * Fix safety net, DRAINING metadata, and test import path 1. Safety net no longer retroactively executes tools after visible content was already emitted to the user. Once _last_emitted is non-empty, the stream is committed to normal content mode. Retroactive tool execution after visible output would violate the streaming contract and corrupt the route-layer cumulative delta tracker (prev_text). The tool XML is still stripped by _strip_tool_markup so the user sees clean content. 2. DRAINING false-positive path now merges accumulated metrics from prior tool iterations instead of dropping them. Uses the same merge formula as the STREAMING path. 3. Test import path fixed to use repo root instead of hardcoded sibling directory. Works in clean checkouts and CI. 4. Renamed test_content_then_tool_xml_safety_net to test_content_then_tool_xml_no_retroactive_execution to reflect the corrected behavior. 17/17 tests pass. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Redact --api-key value from llama-server startup log When UNSLOTH_DIRECT_STREAM=1, the generated bearer token was logged verbatim in the startup command. Replace the secret with <redacted> before logging. * Remove test file temporarily * Revert disconnect throttle, reset prev_text on tool_start, restore XML safety net Addresses all P1 findings from reviewer round 3 (10 reviewers): 1. Revert disconnect check to every iteration (was every 20th). All 10 reviewers flagged this as a correctness regression for short streams and sparse tool event loops. The cancel watcher in llama_cpp.py is the primary mechanism but the route-layer check must remain per-iteration for completeness. [10/10] 2. Reset prev_text on tool_start in gguf_tool_stream. When a tool cycle begins after visible content was already streamed, the route-layer cumulative delta tracker (prev_text) must be reset so the post-tool synthesis response is not truncated or dropped. [9/10] 3. Remove the _last_emitted gate from the XML safety net. The gate was added to prevent retroactive tool execution after visible content, but with prev_text now reset on tool_start (#2), the root cause is fixed and the safety net can correctly handle content-then-tool-XML responses (matching pre-PR behavior). [8/10] * Use None instead of {} for empty auth headers in TTS methods * Include accumulated metrics in STREAMING metadata check * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-27 03:13:38 -07:00
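The BUFFERING / STREAMING / DRAINING logic described in e318da21a7 can be sketched roughly as follows. This is a minimal illustration, not the Studio implementation: `classify`, `run`, `TOOL_SIGNALS`, and `MAX_BUFFER` are hypothetical names, and only the XML-prefix path is modeled (structured `delta.tool_calls`, reasoning tokens, and the post-stream safety net are left out).

```python
from enum import Enum, auto
from typing import Iterable, List, Tuple

# Prefixes and buffer cap taken from the commit message; names are hypothetical.
TOOL_SIGNALS = ("<tool_call>", "<function=")
MAX_BUFFER = 32


class State(Enum):
    BUFFERING = auto()  # still deciding whether a tool call is starting
    STREAMING = auto()  # no tool signal: pass chunks straight through
    DRAINING = auto()   # tool signal found: accumulate silently


def classify(buffer: str) -> State:
    """Decide the next state from the content buffered so far."""
    stripped = buffer.lstrip()
    if any(stripped.startswith(sig) for sig in TOOL_SIGNALS):
        return State.DRAINING
    # Keep buffering only while the text could still grow into a signal.
    could_match = any(sig.startswith(stripped) for sig in TOOL_SIGNALS)
    if could_match and len(buffer) < MAX_BUFFER:
        return State.BUFFERING
    return State.STREAMING


def run(chunks: Iterable[str]) -> Tuple[State, List[str], str]:
    """Return (final state, chunks emitted to the caller, drained tool text)."""
    state, buffer, emitted, drained = State.BUFFERING, "", [], ""
    for chunk in chunks:
        if state is State.STREAMING:
            emitted.append(chunk)
        elif state is State.DRAINING:
            drained += chunk
        else:
            buffer += chunk
            state = classify(buffer)
            if state is State.STREAMING:
                emitted.append(buffer)  # flush everything held back so far
            elif state is State.DRAINING:
                drained = buffer
    if state is State.BUFFERING and buffer:
        # Stream ended before a decision was reached: treat as plain text.
        emitted.append(buffer)
        state = State.STREAMING
    return state, emitted, drained
```

In this sketch ordinary text leaves BUFFERING on its first non-matching character, which corresponds to the fast path the commit describes, while a lookalike such as `<tool_tip>` is released as soon as it diverges from every signal prefix.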
Lee Jackson	0233fe7f9c	studio: setup log styling (#4494 ) * refactor(studio): unify setup terminal output style and add verbose setup mode * studio(windows): align setup.ps1 banner/steps with setup.sh (ANSI, verbose) * studio(setup): revert nvcc path reordering to match main * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio(setup): restore fail-fast llama.cpp setup flow * studio(banner): use IPv6 loopback URL when binding :: or ::1 * Fix IPv6 URL bracketing, try_quiet stderr, _step label clamp - Bracket IPv6 display_host in external_url to produce clickable URLs - Redirect try_quiet failure log to stderr instead of stdout - Clamp _step label to column width to prevent negative padding * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add sandbox integration tests for PR #4494 UX fixes Simulation harness (tests/simulate_pr4494.py) creates an isolated uv venv, copies the real source files into it, and runs subprocess tests for all three fixes with visual before/after demos and edge cases. Standalone bash test (tests/test_try_quiet.sh) validates try_quiet stderr redirect across 8 scenarios including broken-version contrast. 39 integration tests total (14 IPv6 + 15 try_quiet + 10 _step), all existing 75 unit tests still pass. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Truncate step() labels in setup.sh to match PS1 and Python The %-15s printf format pads short labels but does not truncate long ones. Change to %-15.15s so labels wider than 15 chars are clipped, matching the PowerShell .Substring(0,15) and Python label[:15] logic. * Remove sandbox integration tests from PR These test files are not part of the styling fix and should not ship with this PR. 
* Show error output on failure instead of suppressing it - install_python_stack.py: restore _red for patch_package_file warnings (was downgraded to _dim) - setup.ps1: capture winget output and show on failure for CUDA, Node, Python, and OpenSSL installs (was piped to Out-Null) - setup.ps1: always show git pull failure warning, not just in verbose mode * Show winget error output for Git and CMake installs on failure Same capture-and-print-on-failure pattern already used for Node, Python, CUDA, and OpenSSL winget installs. * fix: preserve stderr for _run_quiet error messages in setup.sh The step() helper writes to stdout, but _run_quiet's error header was originally sent to stderr (>&2). Without the redirect, callers that separate stdout/stderr would miss the failure headline while still seeing the log body on stderr. Add >&2 to both step calls inside _run_quiet to match main's behavior. * feat: add --verbose flag to setup and update commands Wire UNSLOTH_VERBOSE=1 through _run_setup_script() so that 'unsloth studio update --verbose' (and the deprecated 'setup') passes the flag to setup.sh / setup.ps1 / install_python_stack.py. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-27 03:12:48 -07:00
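Two of the fixes in 0233fe7f9c (bracketing IPv6 hosts in the displayed URL and clamping step labels to a fixed column width) are easy to illustrate. The sketch below is an assumption-laden example: `display_url` and `clamp_label` are hypothetical helpers, and the `http://` scheme is assumed rather than taken from the source.

```python
import ipaddress


def display_url(host: str, port: int) -> str:
    """Build a clickable URL, wrapping IPv6 literals in brackets."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # a hostname such as "localhost", not an IP literal
    shown = f"[{host}]" if is_v6 else host
    return f"http://{shown}:{port}"


def clamp_label(label: str, width: int = 15) -> str:
    """Pad short labels and truncate long ones, like printf's %-15.15s."""
    return f"{label[:width]:<{width}}"
```

Without the brackets, a URL like `http://::1:8888` is ambiguous because the port separator cannot be told apart from the colons in the address.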
Daniel Han	3a5e3bbd6d	Make Studio shortcuts launch in a visible terminal (#4638 ) * Make Studio shortcuts launch in a visible terminal Studio shortcuts (Desktop/Start Menu) previously launched the server as a hidden background process. Closing the browser tab did not stop the server, leaving users with no obvious way to shut it down. This change makes shortcuts open a visible terminal window so users can see server output and close the terminal to stop Studio. Launcher changes (install.sh): - Add TTY detection in the launcher's main section. When a TTY is present (foreground mode), the launcher spawns a background browser-opener and then exec's the studio process directly. This means closing the terminal sends SIGHUP to studio, stopping it cleanly. When no TTY is present (background mode, e.g. macOS .app or headless), the existing _spawn_terminal behavior is preserved. - Add _open_browser_when_ready helper that polls health on the specific launch port and opens the browser once ready. - Add WSL fallback in _open_browser: uses powershell.exe Start-Process or cmd.exe /c start instead of unreliable xdg-open under WSL. Linux .desktop shortcut: - Change Terminal=false to Terminal=true so the desktop environment opens the user's default terminal emulator for the launcher. WSL support: - Remove the early-return that skipped WSL entirely. WSL now gets the launcher script and studio.conf written. - Add WSL shortcut creation: generates Windows Desktop and Start Menu .lnk files via a temp PowerShell script. Targets wt.exe (Windows Terminal) with automatic fallback to wsl.exe. Uses WSL_DISTRO_NAME for multi-distro setups. Windows launcher (install.ps1): - Add Find-FreeLaunchPort function that mirrors the Unix _find_launch_port logic, scanning Get-NetTCPConnection for busy ports and returning the first free port in the configured range. - Replace the hardcoded $basePort with the dynamic port result, with a MessageBox error dialog if no free port is found. 
* Fix review findings: lock race, WSL quoting, Windows port fallback Foreground lock race (10/10 reviewers): The foreground mode released the single-instance lock before exec, allowing a second launcher to acquire the lock and race for the same port during startup. Move lock release into the background subshell so it only happens after the health check passes. WSL shortcut quoting (10/10 reviewers): WSL_DISTRO_NAME values with spaces (e.g. "Ubuntu Preview", "Fedora Remix for WSL") were not quoted, causing the distro name to be split across multiple arguments. Add double-quoting around the distro name and launcher path in the generated shortcut arguments. Windows port fallback (3/10 reviewers): Find-FreeLaunchPort silently assumed no ports were listening when Get-NetTCPConnection was unavailable, which could return 8888 even when busy. Add a Test-PortBusy fallback that probes ports with TcpListener when Get-NetTCPConnection fails. Also scope the Get-NetTCPConnection query to only the port range we care about. * Skip powershell.exe shortcut creation if wslpath fails If wslpath -w fails (returns empty), do not attempt to pass a Linux-style path to powershell.exe -- it would always fail. Only run powershell.exe when we have a valid Windows path for the temp PS1 script. 
* Remove dead code and fix background health poll target - Remove unused _open_browser_when_ready function - Background mode now polls only the specific _launch_port instead of scanning all ports via _find_healthy_port, matching foreground behavior - Add launcher test harness (22 unit + 19 integration tests) * Fix port probe scope, lock ownership, and T4 test coverage - Test-PortBusy: bind on Any instead of Loopback to match Studio's 0.0.0.0 bind scope (prevents false-free in fallback path) - _release_lock: verify PID ownership before removing lock dir (prevents a timed-out subshell from deleting another launcher's lock) - T4 test: fail first curl call so the test actually exercises the lock-contention wait path instead of short-circuiting via fast path * Temporarily remove launcher test scripts Tests will be re-added in a follow-up PR to keep this diff focused on the launcher changes.	2026-03-27 03:12:26 -07:00
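The bind-probe fallback from 3a5e3bbd6d (Test-PortBusy in PowerShell) translates to a few lines in any language. The Python sketch below is only an analogue of that idea: `port_busy` and `find_free_port` are hypothetical names, and the `base` / `max_offset` parameters stand in for whatever range the launcher is actually configured with.

```python
import socket
from typing import Optional


def port_busy(port: int, host: str = "0.0.0.0") -> bool:
    """Probe a port by trying to bind it on the wildcard address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True  # bind failed: something already holds the port
    return False


def find_free_port(base: int, max_offset: int) -> Optional[int]:
    """Return the first free port in [base, base + max_offset], else None."""
    for port in range(base, base + max_offset + 1):
        if not port_busy(port):
            return port
    return None
```

Probing on the wildcard address rather than loopback mirrors the reasoning in the commit: a server that binds 0.0.0.0 can collide with a listener on any interface, so a loopback-only probe may report a port as free when it is not.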
Daniel Han	6b5da2ea0f	Fix missing num_items_in_batch in unsloth_prediction_step (#4616 ) * Fix missing num_items_in_batch in unsloth_prediction_step unsloth_prediction_step calls compute_loss without num_items_in_batch during evaluation. This causes _unsloth_pre_compute_loss to see num_items_in_batch=None, which triggers a spurious warning for every model when gradient_accumulation_steps > 1: "Unsloth: Not an error, but {model} does not accept num_items_in_batch. Using gradient accumulation will be very slightly less accurate." The standard transformers prediction_step computes num_items_in_batch via _get_num_items_in_batch before passing it to compute_loss. This patch does the same in unsloth_prediction_step. Tested on Llama-3.2-1B-Instruct and Olmo-3-7B-Instruct with gradient_accumulation_steps=3 and eval_steps=3. Warning is gone and eval loss is computed correctly for both. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Guard _get_num_items_in_batch for older transformers versions _get_num_items_in_batch was added in transformers 4.46. Wrap the call in try/except so older versions fall back to num_items_in_batch=None, which preserves the original behavior of not passing it. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-27 03:06:59 -07:00
Michael Han	0ffac92cf4	Update Install instructions.md	2026-03-27 03:04:07 -07:00
Michael Han	19298a0b41	Update Uninstall instructions.md	2026-03-27 02:56:34 -07:00
Daniel Han	5c9a22b816	Fix Gemma3N audio training stride assertion with non-reentrant checkpointing (#4629 ) * Fix Gemma3N audio training stride assertion with non-reentrant checkpointing Gemma3N audio conformer processes variable-length audio tensors that cause stride mismatches in AOT autograd compiled backward when non-reentrant gradient checkpointing is used. The error manifests as: AssertionError: expected size 2==2, stride 1928==1936 at dim=0 This happens because the audio conformer's conv/norm layers produce tensors whose strides vary with audio clip duration, but AOT autograd traces the backward graph assuming fixed strides from the first batch. The notebook sets gradient_checkpointing_kwargs={"use_reentrant": False} and TRL 0.27.0+ also forces this. Both override Unsloth's own use_reentrant=True set during prepare_model_for_training. Fix: intercept gradient_checkpointing_enable on Gemma3N models to always force use_reentrant=True, regardless of what the notebook or TRL passes. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-27 02:53:21 -07:00
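The interception described in 5c9a22b816 amounts to wrapping `gradient_checkpointing_enable` so that `use_reentrant` is always overridden. A minimal sketch, assuming a model object that exposes that method with a `gradient_checkpointing_kwargs` argument; `force_reentrant_checkpointing` is a hypothetical helper name, not the function Unsloth uses.

```python
import functools


def force_reentrant_checkpointing(model):
    """Wrap gradient_checkpointing_enable so use_reentrant is always True."""
    original = model.gradient_checkpointing_enable

    @functools.wraps(original)
    def patched(gradient_checkpointing_kwargs=None, **kwargs):
        # Copy the caller's kwargs, then override whatever was requested.
        merged = dict(gradient_checkpointing_kwargs or {})
        merged["use_reentrant"] = True
        return original(gradient_checkpointing_kwargs=merged, **kwargs)

    model.gradient_checkpointing_enable = patched
    return model


class FakeModel:
    """Stand-in for a real model; it only records what it was called with."""

    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
        self.kwargs = gradient_checkpointing_kwargs
```

With this wrapper in place, a notebook or trainer that passes `{"use_reentrant": False}` still ends up enabling reentrant checkpointing, which is the behavior the commit relies on to avoid the stride assertion.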
Daniel Han	3c9f0ed149	fix: use unsloth[huggingfacenotorch] instead of --no-deps in no-torch mode (#4647 ) The previous --no-deps approach skipped ALL dependencies, not just torch. This left safetensors, transformers, datasets, accelerate, etc. missing, causing PackageNotFoundError at runtime. Fix: in no-torch mode, install unsloth[huggingfacenotorch] (which pulls all runtime deps except torch), then install unsloth-zoo with --no-deps (since zoo's published metadata still declares torch as a hard dep). This gives a working no-torch environment with all non-torch packages. Applied to all three installer files: install.sh, install.ps1, and studio/install_python_stack.py.	2026-03-27 02:38:11 -07:00
Daniel Han	2ffc8d2cea	tests: add no-torch / Intel Mac test suite (#4646 ) * tests: add no-torch / Intel Mac test suite Add comprehensive test coverage for the no-torch / --no-torch installer and Studio backend changes introduced in #4624. Shell tests (tests/sh/test_mac_intel_compat.sh): - version_ge edge cases (9 tests) - Architecture detection + Python version resolution (4 tests) - get_torch_index_url on Darwin (2 tests) - UNSLOTH_NO_TORCH propagation via SKIP_TORCH (5 tests) - E2E uv venv creation at Python 3.12 (3 tests) - E2E torch skip with mock uv shim (4 tests) - UNSLOTH_NO_TORCH env propagation (4 tests) - --python override flag parsing + resolution (11 tests) - --no-torch flag parsing (4 tests) - SKIP_TORCH unification (3 tests) - CPU hint printing (2 tests) Python tests (tests/python/test_no_torch_filtering.py): - _filter_requirements unit tests with synthetic + real requirements files - NO_TORCH / IS_MACOS constant parsing - Subprocess mock of install_python_stack() across platform configs - install.sh --no-torch flag structural + subprocess tests Python tests (tests/python/test_studio_import_no_torch.py): - AST checks for data_collators.py, chat_templates.py, format_conversion.py - Parametrized venv tests (Python 3.12 + 3.13) for no-torch exec - Dataclass instantiation without torch - format_conversion convert functions without torch - Negative controls (import torch fails, torchao fails) Python tests (tests/python/test_e2e_no_torch_sandbox.py): - Before/after import chain tests - Edge cases (broken torch, fake torch, lazy import) - Hardware detection without torch - install.sh logic tests (flag parsing, version resolution) - install_python_stack filtering tests - Live server startup tests (opt-in via @server marker) * fix: address review comments on test suite - Fix always-true assertion in test_studio_import_no_torch.py (or True) - Make IS_MACOS test platform-aware instead of hardcoding Linux - Restore torchvision + torchaudio in server test cleanup 
(not just torch) - Include server stderr in skip message for easier debugging * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-27 02:33:45 -07:00
Daniel Han	e9ac785346	fix: install.sh Mac Intel compatibility + Studio no-torch support (#4624 ) * fix: install.sh Mac Intel compatibility + Studio no-torch support (#4621) On Intel Macs (x86_64), PyTorch has no wheels for torch >= 2.3, so the installer crashes. Even when torch is absent, Studio crashes on startup because two files have bare top-level torch imports. Studio's GGUF inference (llama.cpp) does not need PyTorch. Training and HF-inference already isolate torch to subprocesses. Only 2 files in the server startup chain had top-level torch imports preventing startup. Changes: - install.sh: detect architecture, default to Python 3.12 on Intel Mac, skip torch install, add Python 3.13.8 guard for arm64, pass UNSLOTH_NO_TORCH env var to setup.sh - data_collators.py: remove unused `import torch` (no torch.* refs) - chat_templates.py: lazy-import IterableDataset into function bodies - install_python_stack.py: add IS_MACOS/NO_TORCH constants, skip torch-dependent packages, skip overrides.txt, skip triton on macOS No existing working flow changes. Linux/WSL and macOS arm64 behavior is identical. 
* tests: add test suite for Mac Intel compat + no-torch mode Shell tests (test_mac_intel_compat.sh): - version_ge edge cases (9 tests) - Architecture detection for Darwin x86_64/arm64, Linux x86_64/aarch64 - get_torch_index_url returns cpu on simulated Darwin - UNSLOTH_NO_TORCH propagation to both setup.sh branches Python unit tests (test_no_torch_filtering.py): - _filter_requirements with NO_TORCH_SKIP_PACKAGES - NO_TORCH env var parsing (true/1/TRUE/false/0/unset) - IS_MACOS constant check - Overrides skip and triton macOS skip guards Python import tests (test_studio_import_no_torch.py): - data_collators.py loads in isolated no-torch venv - chat_templates.py has no top-level torch imports - Negative control confirms import torch fails without torch * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * tests: add E2E sandbox tests for Mac Intel no-torch mode Replace static/synthetic test stubs with real sandbox tests: - Shell: E2E uv venv creation at Python 3.12, mock uv shim to verify torch install is skipped when MAC_INTEL=true, dynamic env propagation test for UNSLOTH_NO_TORCH in both local and non-local install paths - Python filtering: test real extras.txt and extras-no-deps.txt with NO_TORCH_SKIP_PACKAGES, subprocess mock of install_python_stack() for 5 platform configs (NO_TORCH+macOS, Windows+NO_TORCH, normal Linux, Windows-only, macOS-only), VCS URL and env marker edge cases - Python imports: parametrized Python 3.12+3.13 venv fixture, dataclass instantiation for all 3 collator classes, chat_templates.py exec with stubs, negative controls proving import torch and torchao install fail in no-torch venvs 91 total tests, all passing. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: address reviewer findings for Intel Mac no-torch mode P1 fixes: - Auto-infer NO_TORCH in install_python_stack.py via platform.machine() so `unsloth studio update` preserves GGUF-only mode without needing the UNSLOTH_NO_TORCH env var (6/10 reviewers) - Add openai-whisper and transformers-cfg to NO_TORCH_SKIP_PACKAGES since both have unconditional torch dependencies (4/10 reviewers) - Skip unsloth-zoo on Intel Mac --local installs (depends on torch) in both migrated and fresh install paths (1/10) - Recreate stale 3.13 venvs as 3.12 on Intel Mac re-runs (1/10) - Detect Apple Silicon under Rosetta via sysctl hw.optional.arm64 and warn user to use native arm64 terminal (1/10) P2 fixes: - Wire new test files into tests/run_all.sh (4/10 reviewers) - Add update-path tests (skip_base=False) for Intel Mac - Add _infer_no_torch tests for platform auto-detection P3 fixes: - Fix macOS progress bar total (triton step skipped but was counted) - Fix temp file leak when Windows + NO_TORCH filters stack All tests pass: 30 shell, 66 Python (96 total). * feat: add --python override flag to install.sh Lets users force a specific Python version, e.g. ./install.sh --python 3.12. Addresses M2 Mac users whose systems resolve to a problematic 3.13.x patch. When --python is set, the Intel Mac stale-venv guard and 3.13.8 auto-downgrade are skipped so the user's choice is respected. 
* tests: add comprehensive E2E sandbox tests for no-torch mode Add test_e2e_no_torch_sandbox.py with 7 test groups (43 tests total) covering the full no-torch import chain, edge cases, and install logic: - Group 1: BEFORE vs AFTER import chain comparison (proves the bug existed and the fix works by synthetically prepending top-level torch imports) - Group 2: Dataclass instantiation without torch - Group 3: Edge cases with broken/fake torch modules on sys.path - Group 4: Hardware detection fallback to CPU without torch - Group 5: install.sh flag parsing, version resolution, arch detection - Group 6: install_python_stack.py NO_TORCH filtering - Group 7: Live server startup without torch (marked @server, skipped when studio venv is unavailable) All 43 tests pass on both Python 3.12 and 3.13 isolated venvs. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: add --no-torch flag to install.sh/ps1, fix lazy import bug in dataset formatting - Fix chat_templates.py: narrow torch IterableDataset import into inner try/except ImportError so dataset.map() works without torch installed - Fix format_conversion.py: same lazy import fix for convert_chatml_to_alpaca and convert_alpaca_to_chatml - Add --no-torch flag to install.sh with unified SKIP_TORCH variable (driven by --no-torch flag OR MAC_INTEL auto-detection) - Add --no-torch flag to install.ps1 with $SkipTorch variable - Print CPU hint when no GPU detected and --no-torch not set - Replace MAC_INTEL guards with SKIP_TORCH in torch install sections - Update shell tests (40 pass) and Python tests (90 pass) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: address reviewer findings for --no-torch installer paths - Fix migrated-env branch in install.sh and install.ps1: check SKIP_TORCH first, then branch on STUDIO_LOCAL_INSTALL. 
Previously SKIP_TORCH+non-local fell into else and installed unsloth-zoo (which depends on torch), defeating --no-torch mode. - Fix $env:UNSLOTH_NO_TORCH leak in install.ps1: always set to "true" or "false" instead of only setting on the true branch. Prevents stale no-torch state from leaking across runs in the same PS session. - Fix install_python_stack.py update path: add NO_TORCH guard around base.txt install so unsloth studio update does not reinstall unsloth-zoo (which depends on torch) in no-torch mode. * fix: install unsloth + unsloth-zoo with --no-deps in no-torch mode Instead of skipping unsloth-zoo entirely (which breaks unsloth's dependency on it), install both packages with --no-deps so they are present but torch is not pulled in transitively. Applied consistently across all no-torch paths: migrated-env, fresh-local, fresh-non-local in install.sh, install.ps1, and install_python_stack.py. * chore: temporarily remove test files (will be added in a follow-up) * refactor: deduplicate SKIP_TORCH conditional branches in installers Collapse if/else blocks that differ only by --no-deps into a single branch with a conditional flag variable. Applied to migrated-env and fresh-local paths in install.sh, install.ps1, and install_python_stack.py. * fix: apply --no-deps to fresh non-local --no-torch install path The non-local else branch was missing $_no_deps_arg/$noDepsArg, so uv pip install unsloth would resolve torch from PyPI metadata (the published unsloth package still declares torch as a hard dep). Now --no-deps is applied consistently to all SKIP_TORCH code paths. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-27 02:09:21 -07:00
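The requirements filtering that e9ac785346 adds for no-torch mode can be pictured as dropping any requirement line whose package name is in a skip set. The sketch below is illustrative only: the real `_filter_requirements` and `NO_TORCH_SKIP_PACKAGES` may differ, the skip set shown is a guess built from packages named in the commit message, and `requirement_name` is a hypothetical helper.

```python
import re
from typing import Iterable, List, Optional, Set

# Illustrative contents only; the actual skip list lives in install_python_stack.py.
NO_TORCH_SKIP = {"torch", "torchvision", "torchaudio", "openai-whisper", "transformers-cfg"}


def requirement_name(line: str) -> Optional[str]:
    """Extract a normalized package name from a requirements line, if any."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None  # blank line, comment, or pip option such as -r / --index-url
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0).lower().replace("_", "-") if match else None


def filter_requirements(lines: Iterable[str], skip: Set[str]) -> List[str]:
    """Keep every line except requirements whose package is in `skip`."""
    kept = []
    for line in lines:
        name = requirement_name(line)
        if name is not None and name in skip:
            continue
        kept.append(line)
    return kept
```

Matching on the exact normalized name, rather than a substring, keeps packages such as `transformers` from being dropped just because `transformers-cfg` is in the skip set.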
Daniel Han	d57a4d993d	studio: fix chat CPU spike (#4632) The inline querier's identity changed on every render, forcing useLiveQuery to resubscribe continuously and causing CPU spikes. Store the querier in a ref and resubscribe only when explicit deps change.	2026-03-27 06:20:26 +00:00
Daniel Han	e62085a3d6	Fix repetition_penalty default causing 24% TPS drop in GGUF inference (#4634 ) The ChatCompletionRequest Pydantic model defaulted repetition_penalty to 1.1 when clients omitted the field. This silently forced llama-server to perform per-token repetition scanning, dropping streaming throughput from ~225 TPS to ~172 TPS (a 24% penalty). The Studio frontend always sends repetition_penalty=1.0 explicitly, so UI users were unaffected. But any API client hitting /v1/chat/completions without setting the field (curl, third-party integrations, Open WebUI, etc.) would get the slow path. Benchmarked on Qwen3.5-4B Q4_K_XL, GPU 0: - repeat_penalty=1.0: 225.2 TPS - repeat_penalty=1.1: 172.7 TPS (24% slower) - LM Studio (which applies rp internally): 170.8 TPS This aligns the Pydantic default with the frontend default (1.0), generate_chat_completion's function signature default (1.0), and llama-server's own default (1.0).	2026-03-26 20:20:53 -07:00
Roland Tannous	e79a178200	Allow install_python_stack to run on Colab (#4633 ) * Allow install_python_stack to run on Colab The _COLAB_NO_VENV flag was setting _SKIP_PYTHON_DEPS=true, which skipped both the PyPI version check (needs $VENV_DIR/bin/python) and install_python_stack (uses sys.executable, works without a venv). Introduce a separate _SKIP_VERSION_CHECK flag for the version check, so install_python_stack still runs on Colab. The _SKIP_PYTHON_DEPS flag remains available for the "versions match" fast path. * Remove colab.py workarounds that broke transformers/hf-hub compatibility PR #4601 added _pip_install_backend_deps(), _bootstrap_studio_venv(), and _is_colab() to colab.py as workarounds for install_python_stack being skipped on Colab. These workarounds: - Stripped version constraints from studio.txt and installed into system Python - Upgraded huggingface-hub to >=1.0, breaking Colab's pre-installed transformers which requires huggingface-hub<1.0 With install_python_stack now running on Colab (previous commit), these workarounds are unnecessary — all deps are properly installed by setup.sh. Restore colab.py to its original PR #4237 structure: just get_colab_url(), show_link(), and start(). * Remove --local flag from setup.sh in Colab notebook The --local flag is not needed for the standard Colab flow since install_python_stack now runs on Colab and installs deps from PyPI.	2026-03-27 00:29:27 +04:00
Wasim Yousef Said	71781272dd	fix: add python-json-logger dependency to data-designer-deps (#4627)	2026-03-26 09:50:51 -07:00
Radouane Elhajali	a6fe743ebe	studio: humanize ETA display for long training runs (#4608 ) * studio: humanize ETA display for long training runs When training takes hours or days, the ETA displayed raw minutes (e.g. '560m 50s'). This changes the format to: - Under 1 hour: Xm Ys (unchanged) - 1-24 hours: Xh Ym Zs - Over 24 hours: Xd Xh Xm * Fix formatDuration edge cases and consolidate duplicate for PR #4608 - Guard NaN/Infinity inputs with Number.isFinite() (matches formatNumber in same file) - Add sub-minute branch so 30s displays as "30s" instead of "0m 30s" - Accept undefined in type signature to match formatNumber pattern - Remove duplicate formatDuration from history-card-grid.tsx and import the shared one --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-26 06:55:54 -07:00
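The duration tiers from a6fe743ebe are simple to express in code. The original `formatDuration` is TypeScript; the version below is a Python sketch of the same tiers, where the `"--"` placeholder for missing or non-finite input and the negative-value guard are assumptions rather than behavior confirmed by the commit.

```python
import math
from typing import Optional


def format_duration(seconds: Optional[float]) -> str:
    """Humanize a duration in seconds using the tiers from the commit."""
    if seconds is None or not math.isfinite(seconds) or seconds < 0:
        return "--"  # placeholder value is an assumption
    total = int(seconds)
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    if days:
        return f"{days}d {hours}h {minutes}m"
    if hours:
        return f"{hours}h {minutes}m {secs}s"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"
```

For the example in the commit message, 560 minutes and 50 seconds comes out as `9h 20m 50s` instead of `560m 50s`.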
Michael Han	937da02f6c	Update Unsloth_Studio_Colab.ipynb	2026-03-26 05:45:30 -07:00
Etherll	b3a3435ac3	fix: Windows installer fails on _yaml.pyd Access Denied (os error 5) (#4617 ) * fix: avoid _yaml.pyd lock on Windows during dependency overrides * fix: move pytorch_tokenizers and kernels to no-deps install to avoid Windows _yaml.pyd loc	2026-03-26 05:15:19 -07:00
Lee Jackson	352455610b	studio: align Dataset/Parameters/Training cards, fix expandable height, animate LoRA settings (#4614 ) * fix(studio): align config cards, dynamic height for expanders, LoRA collapsible * Fix clipping regressions in training, dataset, and params section cards - training-section: Add hasMessage conditional so the card expands (min-h) when startError, vision/audio incompatibility, or config validation messages are present instead of always using fixed height - dataset-section: Expand card when a local dataset is selected via upload (datasetSource === "upload" && selectedLocalDataset), not only when the Advanced panel is open - params-section: Guard loraOpen behind isLora so switching to full fine-tune collapses the card instead of staying expanded from stale React useState * Fix dataset card clipping for direct file uploads Use uploadedFile instead of selectedLocalDataset in the card height condition. selectedLocalDataset is derived from localDatasets.find() which only resolves for Data Recipe entries, not direct file uploads (.jsonl, .csv, .parquet, .arrow). The card already renders the Eval Dataset panel based on uploadedFile (line 750), so the height gate should match. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-26 04:05:30 -07:00
Wasim Yousef Said	07abcb46de	fix: normalize search matching for recommended models and LoRA picker (#4615 ) Recommended models matching the query were filtered from HF results but the Recommended section was hidden during search, causing them to vanish entirely. - Show filtered recommended models during search by introducing `filteredRecommendedIds` - Switch `recommendedSet` to use filtered IDs when searching so dedup against HF results is correct - Hide empty "Hugging Face" label when recommended matches cover the query - Add `normalizeForSearch` helper to strip separators (spaces, hyphens, underscores, dots) so queries like "llama 3" match "Llama-3.2-1B" and "qwen 2.5" matches "Qwen2.5-7B" in both the recommended model filter and the LoRA adapter filter	2026-03-26 03:40:11 -07:00
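A minimal Python sketch of the separator-stripping idea behind `normalizeForSearch` (the real helper is TypeScript). Lowercasing and substring matching are assumptions inferred from the examples in the commit message, not confirmed details of the implementation.

```python
import re

# Separators named in the commit message: spaces, hyphens, underscores, dots.
_SEPARATORS = re.compile(r"[ \-_.]+")


def normalize_for_search(text: str) -> str:
    """Drop separators and lowercase (lowercasing is an assumption)."""
    return _SEPARATORS.sub("", text).lower()


def matches(query: str, candidate: str) -> bool:
    """Hypothetical matcher: normalized query is a substring of the candidate."""
    return normalize_for_search(query) in normalize_for_search(candidate)
```

Under this sketch `"llama 3"` normalizes to `llama3` and `"Llama-3.2-1B"` to `llama321b`, so the query matches.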
Roland Tannous	6b3eb504b2	Fix Colab setup skipping llama.cpp installation (#4618 ) * Fix Colab setup skipping llama.cpp installation The early exit 0 in the Colab no-venv path prevented setup.sh from ever reaching the llama.cpp install section. Remove the early exit and instead guard only the venv-dependent Python deps section, so execution continues through to the llama.cpp prebuilt/source install. * Simplify _SKIP_PYTHON_DEPS initialization * Add --local flag to setup.sh in Colab notebook	2026-03-26 13:55:46 +04:00
Abhinav	74ddef1402	fix: skip flex_attention for models with non-zero attention_dropout (#4605 )	2026-03-26 01:12:23 -07:00
Michael Han	d4e9b708bb	Update Install instructions.md	2026-03-25 19:55:30 -07:00
Michael Han	d3049db427	Update install instructions.md	2026-03-25 19:04:10 -07:00
Roland Tannous	88a6dfc5cd	Revert "Update README.md" This reverts commit `c30e1d2029`.	2026-03-25 19:54:12 +00:00
Roland Tannous	c30e1d2029	Update README.md remove newline from Windows command	2026-03-25 23:26:37 +04:00
Daniel Han	9fa67809e6	Update README.md	2026-03-25 09:43:55 -07:00
Roland Tannous	c23c3a17e9	Update README.md (#4604 ) Update install instructions for studio	2026-03-25 09:42:32 -07:00
Daniel Han	55db24fc31	Update _utils.py	2026-03-25 09:40:17 -07:00
Daniel Han	baabfa0a6e	Fix Colab huggingface-hub conflict, ensurepip fallback, bump to 2026.3.14 (#4603 ) * Fix Colab huggingface-hub conflict, ensurepip fallback, bump to 2026.3.14 - colab.py / setup.sh: relax == pins to >= when installing studio.txt on Colab so huggingface-hub does not clobber Colab's bundled version (breaks transformers is_offline_mode import) - install_python_stack.py: when uv is unavailable and pip is missing (uv-created venvs), bootstrap via ensurepip before attempting upgrade - Bump version to 2026.3.14 - Bump installer min version pins to 2026.3.14 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-25 09:38:02 -07:00
Daniel Han	9cb698c774	Update _utils.py	2026-03-25 09:04:23 -07:00
Daniel Han	23eb7fc0a7	Fix Colab Studio launch and setup.ps1 box alignment (#4601 ) * Fix Colab Studio launch and setup.ps1 box alignment - colab.py: when the Studio venv is missing on Colab, pip-install backend dependencies (structlog, fastapi, etc.) from studio.txt into the current Python instead of failing with ModuleNotFoundError - setup.sh: on Colab without a venv, install backend deps into system Python and skip venv-dependent sections (Python stack update, llama.cpp build) that would otherwise fail - setup.ps1: use PadRight(47) for the done-line so "Setup Complete!" and "Update Complete!" both align with the box border * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-25 09:00:08 -07:00
Daniel Han	b713a5085a	Bump installer min version to 2026.3.12 (#4600 )	2026-03-25 08:40:53 -07:00
Daniel Han	55d24d7c49	feat(studio): editable context length with Apply/Reset for GGUF settings (#4592 ) * feat(studio): editable context length with Apply/Reset for GGUF model settings Previously the Context Length field was read-only and the backend hardcoded `-c 0`, ignoring custom values entirely. KV Cache Dtype also triggered an immediate model reload with no way to cancel. Backend: - llama_cpp.py: pass the actual n_ctx value to `-c` instead of always 0 - models/inference.py: relax max_seq_length to 0..1048576 (0 = model default) so GGUF models with large context windows are supported Frontend: - chat-runtime-store: add customContextLength and loadedKvCacheDtype state fields for dirty tracking - chat-settings-sheet: make Context Length an editable number input, stop KV Cache Dtype from auto-reloading, show Apply/Reset buttons when either setting has been changed - use-chat-model-runtime: send customContextLength as max_seq_length in the load request, reset after successful load * fix: preserve maxSeqLength for non-GGUF models in load request customContextLength ?? 0 sent max_seq_length=0 for non-GGUF models, breaking the finetuning/inference path that needs the slider value. Now uses a three-way branch: - customContextLength set: use it (user edited GGUF context) - GGUF without custom: 0 (model's native context) - Non-GGUF: maxSeqLength from the sampling slider * fix: keep max_seq_length default at 4096 for non-GGUF callers Only relax the bounds (ge=0 for GGUF's "model default" mode, le=1048576 for large context windows). The default stays at 4096 so API callers that omit max_seq_length still get a sane value for non-GGUF models. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(studio): rename trust remote code toggle and hide when no model selected - Rename "Trust remote code" to "Enable custom code" - Shorten subtitle to "Only enable if sure" - Hide the toggle when no model is loaded (already hidden for GGUFs) * fix: restore ge=128 for max_seq_length validation Keep the minimum at 128 so the API rejects nonsensical values. GGUF path now sends the model's native context length (from ggufContextLength) instead of 0 when the user has not customized it. The upper bound stays at 1048576 for large-context GGUF models. * feat(studio): replace Context Length input with slider Use a ParamSlider (512 to model's native context, step 512) instead of a small number input. Shows "Max" when at the model's native context length. Consistent with the other slider controls in the settings panel. * feat(studio): add editable number input alongside Context Length slider The slider and number input stay synced -- dragging the slider updates the number, typing a number moves the slider. The input also accepts values beyond the slider range for power users who need custom context lengths larger than the model default. * fix(studio): widen context length input and use 1024 step for slider Make the number input wider (100px) so large values like 262144 are fully visible. Change slider step from 512 to 1024 and min from 512 to 1024. * fix(studio): context length number input increments by 1024 * fix(studio): cap context length input at model's native max Adds max attribute and clamps typed/incremented values so the context length cannot exceed the GGUF model's reported context window. * fix(studio): point "What's new" link to changelog page Changed from /blog to /docs/new/changelog. 
* fix(studio): preserve custom context length after Apply, remove stale subtitle - After a reload with a custom context length, keep the user's value in the UI instead of snapping back to the model's native max. ggufContextLength always reports the model's native metadata value regardless of what -c was passed, so we need to preserve customContextLength when it differs from native. - Remove "Reload to apply." from KV Cache Dtype subtitle since the Apply/Reset buttons now handle this. * feat(studio): auto-enable Search and Code tools when model supports them Previously toolsEnabled and codeToolsEnabled stayed false after loading a model even if it reported supports_tools=true. Now both toggles are automatically enabled when the loaded model supports tool calling, matching the existing behavior for reasoning. * fix(studio): auto-enable tools in autoLoadSmallestModel path The suggestion cards trigger autoLoadSmallestModel which bypasses selectModel entirely. It was hardcoding toolsEnabled: false and codeToolsEnabled: false even when the model supports tool calling. Now both are set from the load response, matching the selectModel behavior. Also sets kvCacheDtype/loadedKvCacheDtype for dirty tracking consistency. * fix(studio): re-read tool flags after auto-loading model The runtime state was captured once at the start of the chat adapter's run(), before autoLoadSmallestModel() executes. After auto-load enables tools in the store, the request was still built with the stale snapshot that had toolsEnabled=false. Now re-reads the store after auto-load so the first message includes tools. * fix(studio): re-read entire runtime state after auto-load, not just tools The runtime snapshot (including params.checkpoint, model id, and all tool/reasoning flags) was captured once before auto-load. After autoLoadSmallestModel sets the checkpoint and enables tools, the request was still built with stale params (empty checkpoint, tools disabled). 
Now re-reads the full store state after auto-load so the first message has the correct model, tools, and reasoning flags. * feat(studio): add Hugging Face token field in Preferences Adds a password input under Configuration > Preferences for users to enter their HF token. The token is persisted in localStorage and passed to all model validate/load/download calls, replacing the previously hardcoded null. This enables downloading gated and private models. * fix(studio): use model native context for GGUF auto-load, show friendly errors The auto-load paths and selectModel for GGUF were sending max_seq_length=4096 which now actually limits the context window (since we fixed the backend to respect n_ctx). Changed to send 0 for GGUF, which means "use model's native context size". Also replaced generic "An internal error occurred" messages with user-friendly descriptions for known errors like context size exceeded and lost connections. LoadRequest validation changed to ge=0 to allow the GGUF "model default" signal. The frontend slider still enforces min=128 for non-GGUF models. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(studio): filter out FP8 models from model search results Hide models matching -FP8- or FP8-Dynamic from both the recommended list and HF search results. These models are not yet supported in the inference UI. --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-25 08:32:38 -07:00
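The three-way branch described in the "preserve maxSeqLength for non-GGUF models" fix above can be sketched like this. It is a Python illustration of frontend logic, with hypothetical names; note that later bullets in the same commit revise which value the GGUF path sends, so this reflects only that one fix as worded.

```python
from typing import Optional


def resolve_max_seq_length(
    is_gguf: bool,
    custom_context_length: Optional[int],
    max_seq_length: int,
) -> int:
    """Pick the max_seq_length value to send in the load request."""
    if custom_context_length is not None:
        # The user edited the GGUF context length: honor it.
        return custom_context_length
    if is_gguf:
        # 0 signals "use the model's native context size".
        return 0
    # Non-GGUF models keep the value from the sampling slider.
    return max_seq_length
```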
Daniel Han	6d6008a1ef	Add PID file tracking and `unsloth studio stop` command (#4598 ) * Add PID file tracking and `unsloth studio stop` command On macOS the .app shortcut launches Studio via osascript into a Terminal window, then the launcher script exits. The server process runs outside of the launcher's context with no PID file, so there is no straightforward way to find or stop it. This adds: - PID file at ~/.unsloth/studio/studio.pid, written after the server starts and removed on graceful shutdown or via atexit - `unsloth studio stop` command that reads the PID file and sends SIGTERM (or taskkill on Windows) to shut down the server The PID file is only removed if it still contains the current process ID, avoiding races when a new server instance replaces a crashed one. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move atexit PID cleanup into run_server() The atexit registration was only in the __main__ block, so it did not cover the `unsloth studio` CLI path that calls run_server() directly via studio_default(). Moving it into run_server() ensures the PID file is cleaned up on unexpected exit regardless of entry point. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-25 08:27:27 -07:00
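The "only remove the PID file if it still contains the current process ID" rule above could look roughly like the sketch below. The function names are hypothetical and the path is passed in by the caller; this is not the Studio implementation.

```python
import os
from pathlib import Path


def write_pid_file(pid_path: Path) -> None:
    """Record the current process ID once the server has started."""
    pid_path.parent.mkdir(parents=True, exist_ok=True)
    pid_path.write_text(str(os.getpid()))


def remove_pid_file(pid_path: Path) -> bool:
    """Remove the PID file only if it still belongs to this process.

    Returns True when the file was removed. A file holding another PID is
    left alone, so a new server instance that replaced a crashed one keeps
    its own PID file.
    """
    try:
        recorded = int(pid_path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False
    if recorded != os.getpid():
        return False
    pid_path.unlink(missing_ok=True)
    return True
```

A cleanup hook such as `atexit.register(remove_pid_file, pid_path)` would cover the unexpected-exit case mentioned in the commit.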
Daniel Han	561f0f39be	Fix install.ps1 --local: pass script args to Install-UnslothStudio The function was called with no arguments, so $args inside the function was always empty. Script-level args (--local, --package) were never forwarded. Use @args splatting to pass them through.	2026-03-25 15:14:51 +00:00
Daniel Han	289c7dd7bb	Add --local and --package flags to install.ps1 Windows install.ps1 had no way to install from a local repo checkout, unlike install.sh which supports ./install.sh --local. This adds: - --local: install from the local repo via editable install (-e . --no-deps) after installing deps from PyPI, mirroring install.sh behavior - --package: install a different package name for testing The --local flag: 1. Validates pyproject.toml exists at the script's directory 2. Installs torch + unsloth deps normally 3. Overlays the local checkout with uv pip install -e <repo> --no-deps 4. Passes STUDIO_LOCAL_INSTALL and STUDIO_LOCAL_REPO to setup.ps1	2026-03-25 15:12:56 +00:00
Daniel Han	2683c2ab58	Add unsloth to User PATH on Windows after install (#4597 ) After installation, `unsloth studio` only works if the user activates the Studio venv first or uses the full absolute path. The Desktop/Start Menu shortcuts work fine, but typing `unsloth studio` in a fresh terminal does not. This adds the venv Scripts dir to the persistent User PATH env var (if not already present) so `unsloth studio` works from any new terminal window. The current session is also updated via the existing Refresh-SessionPath helper.	2026-03-25 08:00:44 -07:00
Roland Tannous	48a7884584	feat: multi-source model discovery (HF default, legacy cache, LM Studio) (#4591 ) * feat: multi-source model discovery (HF default, legacy cache, LM Studio) * Fix multi-source model discovery bugs - Fix lmstudio_model_dirs: add ~/.lmstudio/models as default path, remove dead sys.platform branch, add dedup via seen set - Fix _setup_cache_env: preserve legacy HF cache env vars when the legacy hub directory exists and is non-empty - Fix _scan_lmstudio_dir: use absolute path for id field so is_local_path() returns True - Remove LM Studio dirs from allowed_roots (scanned unconditionally) - Replace bare except passes with logger.warning in legacy cache blocks - Fix delete_cached_model to search both default and legacy HF caches - Make lmstudio_dirs non-optional in TS interface (matches Python schema) - Exclude lmstudio source from trainable model filter - Remove unused import sys * Scan HF default cache alongside legacy and active caches When _setup_cache_env overrides HF_HUB_CACHE to the legacy Unsloth path, the standard HF default cache (~/.cache/huggingface/hub) was never scanned, hiding models downloaded before Unsloth Studio was installed. Add hf_default_cache_dir() and _all_hf_cache_scans() helper that deduplicates and scans all three HF cache locations (active, legacy, default). Used in list_local_models, list_cached_gguf, list_cached_models, and delete_cached_model. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-25 07:48:04 -07:00
Daniel Han	ebe22c1e9e	Update _utils.py	2026-03-25 07:30:40 -07:00
Daniel Han	366fb048d4	fix(studio): add bun cache validation to Windows setup.ps1 (#4596 ) Port the bun cache corruption fix from setup.sh to setup.ps1. bun's package cache can become corrupt, storing only package metadata without actual content. This causes bun install to exit 0 but leave binaries like tsc missing from node_modules/.bin/. Changes: - After bun install, verify tsc and vite exist in node_modules\.bin\ - Check for both bare names and .cmd wrappers (Windows creates both) - If missing, clear the bun cache and retry once - Only fall back to npm if the retry also fails	2026-03-25 07:27:08 -07:00
Daniel Han	3efea63e2f	fix(studio): source-build fallback prefers Unsloth's tested tag over upstream latest (#4593 ) * fix(studio): source-build fallback prefers Unsloth's tested tag over upstream latest When the prebuilt install fails and falls back to source build, --resolve-llama-tag now queries the Unsloth release repo (unslothai/llama.cpp) first to get the latest tested/approved tag (e.g. b8508), instead of going straight to ggml-org/llama.cpp which may return a newer untested tag (e.g. b8514). This ensures the source-build fallback compiles the same version that the prebuilt path would have installed, rather than a potentially incompatible bleeding-edge release. Resolution order for "latest": 1. Unsloth release repo (tested/approved) 2. ggml-org upstream (bleeding-edge) 3. Raw requested tag string (last resort) Changes: - resolve_requested_llama_tag() accepts optional published_repo param with docstring explaining the resolution order - CLI --resolve-llama-tag passes --published-repo through - setup.sh and setup.ps1 pass --published-repo to --resolve-llama-tag with inline comments explaining the preference * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-25 07:25:47 -07:00
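The resolution order for "latest" listed above can be sketched as a simple fallback chain. The function name, its signature, and the injected `fetch_latest` callable are hypothetical; the real `resolve_requested_llama_tag()` may be structured differently.

```python
from typing import Callable, Optional, Sequence

# Order taken from the commit message: tested Unsloth repo first, then upstream.
REPOS = ("unslothai/llama.cpp", "ggml-org/llama.cpp")


def resolve_llama_tag(
    requested: str,
    repos: Sequence[str],
    fetch_latest: Callable[[str], Optional[str]],
) -> str:
    """Resolve "latest" to a concrete tag, trying each repo in order."""
    if requested != "latest":
        return requested  # concrete tags pass through unchanged
    for repo in repos:
        try:
            tag = fetch_latest(repo)
        except Exception:
            tag = None  # treat any lookup failure as "try the next repo"
        if tag:
            return tag
    return requested  # last resort: the raw requested tag string
```

Injecting the lookup keeps the ordering logic testable without network access.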
Daniel Han	bc9cf31478	Pin torch>=2.4,<2.11.0 in Studio installers (#4595 ) torch 2.11.0 has a torch.compile/dynamo bug that causes a StopIteration crash in dict_keys_getitem when compiling MoE router functions (e.g. GptOssTopKRouter_forward). Pin to <2.11.0 until the upstream fix lands. Applies to both install.sh (Linux/macOS) and install.ps1 (Windows) fresh install paths.	2026-03-25 07:20:55 -07:00
Daniel Han	2e4569e06a	fix(studio): clear bun cache on failure and retry before falling back to npm (#4594 ) bun's package cache can become corrupt, storing only package metadata (package.json, README) without actual content (bin/, lib/). When this happens, bun install exits 0 and reports packages as installed, but binaries like tsc are missing from node_modules/.bin/. For example, a corrupt typescript cache entry is 64KB (metadata only) vs 23MB when correctly downloaded. Changes: - After bun install, verify tsc and vite exist in node_modules/.bin/ - If missing, clear the bun cache with bun pm cache rm and retry once - Only fall back to npm if the retry also fails - Revert bun installation to npm install -g bun (the binary is fine, the cache was the problem)	2026-03-25 07:05:02 -07:00
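The validate, clear cache, retry, then fall back flow described above can be sketched in Python as follows (the actual logic lives in `setup.sh`). The `run` callable is a hypothetical stand-in for a subprocess runner that returns an exit code, and `npm install` as the fallback command is an assumption.

```python
from pathlib import Path
from typing import Callable, Sequence

REQUIRED_BINS = ("tsc", "vite")


def bins_present(project_dir: Path) -> bool:
    """Check that the critical binaries exist in node_modules/.bin/."""
    bin_dir = project_dir / "node_modules" / ".bin"
    return all((bin_dir / name).exists() for name in REQUIRED_BINS)


def install_frontend_deps(
    project_dir: Path, run: Callable[[Sequence[str]], int]
) -> str:
    """Return which tool ended up installing the dependencies."""
    for attempt in range(2):
        # Exit code 0 is not enough: a corrupt cache can "succeed" silently.
        if run(["bun", "install"]) == 0 and bins_present(project_dir):
            return "bun"
        if attempt == 0:
            run(["bun", "pm", "cache", "rm"])  # clear the cache, retry once
    run(["npm", "install"])  # assumed fallback command
    return "npm"
```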
Daniel Han	457c42964f	fix(studio): validate bun install and retry from official source on failure (#4589 ) bun install (specifically the npm "bun" shim v1.3.x installed via npm install -g bun) can exit 0 while silently failing to install packages. This causes the frontend build to fail with "tsc: not found" or missing type declarations, since the fallback to npm only triggers on a non-zero exit code. Changes: 1. Initial bun install now tries the official bun.sh installer first (which gives a real bun runtime), falling back to npm install -g bun only if that fails. 2. After bun install reports success, verify that critical binaries (tsc, vite) actually exist in node_modules/.bin/. If they are missing, reinstall bun from the official source and retry once before falling back to npm. 3. Extract the bun install + validation logic into _try_bun_install() to avoid duplicating the check/cleanup across both attempts.	2026-03-25 06:38:32 -07:00
Roland Tannous	1f498a73e6	Revert "feat: multi-source model discovery (HF default, legacy cache, LM Studio)" This reverts commit `d56b115bb4`.	2026-03-25 13:35:03 +00:00
Roland Tannous	d56b115bb4	feat: multi-source model discovery (HF default, legacy cache, LM Studio)	2026-03-25 13:24:46 +00:00
Daniel Han	ae2b1b97ba	fix(studio): add pip-installed nvidia CUDA libs to LD_LIBRARY_PATH for llama-server (#4590 ) The prebuilt llama.cpp binary (cuda13-newer) links against libcudart.so.13 and libcublas.so.13. When torch is installed via pip, these libraries live in the venv's site-packages under nvidia/cu13/lib/, not in /usr/local/cuda/. The existing LD_LIBRARY_PATH logic only searched /usr/local/cuda* paths (which have CUDA 12.x), so the CUDA backend failed to load silently and llama-server fell back to CPU -- even with -ngl -1. This adds a glob scan of the venv's nvidia package directories (cu*, cudnn, nvjitlink) to LD_LIBRARY_PATH before launching llama-server, matching where pip puts the CUDA runtime. Tested on Colab with RTX PRO 6000 Blackwell (CUDA 13.0, pip torch). Before: 3 MiB GPU, 0% util, CPU inference. After: 13317 MiB GPU, 77% util, full GPU inference. Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-03-25 06:24:40 -07:00
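A rough sketch of the glob scan described above, assuming a `site_packages` path supplied by the caller. The function names are hypothetical, and placing the pip-installed directories ahead of the existing `LD_LIBRARY_PATH` entries is an assumption about ordering.

```python
from pathlib import Path
from typing import Dict, List

# Directory patterns named in the commit message.
NVIDIA_PATTERNS = ("cu*", "cudnn", "nvjitlink")


def nvidia_lib_dirs(site_packages: Path) -> List[str]:
    """Collect nvidia/<pkg>/lib directories installed by pip, deduplicated."""
    found: List[str] = []
    for pattern in NVIDIA_PATTERNS:
        for lib_dir in sorted((site_packages / "nvidia").glob(f"{pattern}/lib")):
            path = str(lib_dir)
            if lib_dir.is_dir() and path not in found:
                found.append(path)
    return found


def llama_server_env(base_env: Dict[str, str], site_packages: Path) -> Dict[str, str]:
    """Return a copy of base_env with the CUDA lib dirs on LD_LIBRARY_PATH."""
    env = dict(base_env)
    existing = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    merged = nvidia_lib_dirs(site_packages)
    merged += [p for p in existing if p not in merged]
    if merged:
        # Joining a filtered list avoids empty entries and trailing colons.
        env["LD_LIBRARY_PATH"] = ":".join(merged)
    return env
```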
Daniel Han	d87c21aebf	fix(studio): add -ngl -1 when model fits on GPU to enable GPU offloading (#4588 ) When _select_gpus determines that a GGUF model fits on the selected GPU(s), the code sets CUDA_VISIBLE_DEVICES but never passes -ngl (number of GPU layers) to llama-server. Without -ngl or --fit, llama-server defaults to 0 GPU layers and runs entirely on CPU. This adds -ngl -1 (offload all layers) in the elif branch where gpu_indices is set and use_fit is False, so models that fit in VRAM actually use the GPU for inference. Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-03-25 06:14:33 -07:00
DoubleMathew	f4d8a246bf	Use prebuilt llama.cpp for unsloth studio setup (#4562 ) * Use prebuilt llama.cpp for unsloth studio setup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix 3 issues that cause unnecessary fallback to source build 1. Make filelock import optional -- environments without filelock (e.g. minimal installs) crashed at import time instead of gracefully skipping the lock. 2. Use already-verified converter script from the hydrated source tree instead of re-downloading from raw.githubusercontent.com with no checksum. Adds symlink with copy fallback for the legacy filename. 3. Initialize $SkipPrebuiltInstall in setup.ps1 before first use to prevent potential uninitialized variable errors. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Keep network fallback in ensure_converter_scripts Prefer the local verified copy from the hydrated source tree, but retain the original network download as a fallback if the file is missing. Create the legacy hyphenated filename as a symlink with a copy fallback instead of writing a second full copy. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix 4 bugs in source-build fallback and binary_env paths - setup.ps1: Replace git pull + checkout FETCH_HEAD with fetch + checkout -B to avoid detached HEAD state that breaks re-runs. Use pinned tag in both fetch and clone paths. - setup.sh: Move rm -rf after cmake/git prerequisite checks so a missing tool no longer deletes the existing install. Add --branch tag to clone. - install_llama_prebuilt.py: Add binary_path.parent to Linux LD_LIBRARY_PATH in binary_env() so bundled .so files in build/bin are found even without RPATH, matching the existing Windows PATH logic. - Add test for binary_env LD_LIBRARY_PATH on Linux. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Handle unresolved "latest" tag in source-build fallback clone When tag resolution fails and the requested tag is "latest", both setup scripts now omit --branch from git clone so the default branch is cloned instead of failing on a nonexistent "latest" branch/tag. Similarly, the PS1 fetch path fetches the default ref when the tag is "latest". * Resolve actual latest ggml-org tag instead of using literal "latest" When both Python tag resolution attempts fail and the requested tag is "latest", query the GitHub API for the actual latest release tag from ggml-org/llama.cpp (e.g. b8508) instead of passing the literal string "latest" to git clone --branch, which would fail since no such branch/tag exists. setup.sh uses curl + python json parsing; setup.ps1 uses Invoke-RestMethod. Both fall back to the raw requested tag if the API call also fails. * Try Unsloth release repo before ggml-org when resolving latest tag When falling back to the GitHub API to resolve "latest", query the Unsloth release repo (unslothai/llama.cpp) first since it has the prebuilt binaries pinned to tested tags. Only fall back to ggml-org/llama.cpp if the Unsloth repo query fails. 
* Add comprehensive sandbox tests for PR #4562 bug fixes 35 tests covering all fixes across platforms: - binary_env cross-platform (Linux LD_LIBRARY_PATH, Windows PATH, macOS DYLD_LIBRARY_PATH) with edge cases (dedup, ordering, existing paths) - resolve_requested_llama_tag (concrete, latest, None, empty) - setup.sh logic via subprocess: prereq check ordering (cmake/git missing preserves install), pinned tag in clone, fetch+checkout -B pattern, fetch failure warns instead of aborting - "latest" tag resolution fallback chain (Unsloth API -> ggml-org -> raw) with mock curl: success, failure, malformed JSON, empty body, empty tag_name, env overrides - Source code pattern verification for both .sh and .ps1 files All 138 tests pass in isolated uv venv. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add binary_path.parent to macOS DYLD_LIBRARY_PATH in binary_env macOS prebuilt .dylib files are overlaid into build/bin (same as Linux), but binary_env only added install_dir to DYLD_LIBRARY_PATH. Add binary_path.parent so the loader can find sibling dylibs even without embedded loader paths. Mirrors the existing fix for Linux LD_LIBRARY_PATH and the Windows PATH pattern. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Guard --branch when resolved tag is "latest"; fix broken test assertion When all API fallbacks fail and the tag stays as literal "latest", omit --branch from git clone (clones default branch instead of failing). Both setup.sh and setup.ps1 now check for "latest" before passing --branch to git clone/fetch. Also fix test_setup_ps1_clone_uses_branch_tag which used Python tuple syntax (assert "x", "y" in z) that always passes. Changed to assert "x" in z and "y" in z. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix macOS DYLD trailing colon, install_lock no-op, and debug log - binary_env macOS: use dedupe_existing_dirs instead of raw string concatenation. Eliminates trailing colon in DYLD_LIBRARY_PATH (which causes dyld to search CWD for libraries) and deduplicates when binary_path.parent == install_dir. Now consistent with the Linux and Windows branches. - install_lock: when filelock is not installed, use os.O_CREAT|O_EXCL as a fallback exclusive file lock with timeout, instead of yielding with no locking. Prevents concurrent installs from corrupting each other's staging directories. - setup.ps1: remove [DEBUG] log line that printed to every user on every Windows setup run. * Add stale-lock detection and atomic clone-then-swap install_lock fallback (no filelock): write PID to lock file and check if the holder process is still alive on contention. Dead PIDs (ProcessLookupError) and unreadable lock files trigger immediate cleanup. Live processes owned by other users (PermissionError) are correctly recognized as alive -- the lock is not removed. setup.sh/setup.ps1 source-build: clone into a temporary directory first, then swap into place only on success. If git clone fails, the existing install is preserved instead of being deleted by the premature rm -rf. * Remove redundant upstream_tag != release_tag check load_approved_release_checksums compared checksums.upstream_tag against the Unsloth release_tag, which are different namespaces (upstream ggml-org tag vs Unsloth published tag). This only worked because both happened to be "b8508" by convention. Would break if Unsloth ever uses a different release naming scheme. The existing check at parse_approved_release_checksums (line 950) already validates the release_tag field correctly. 
* Fix lock TOCTOU race and build-in-temp-dir swap install_lock fallback: add os.fsync(fd) after writing PID to ensure the PID is visible to racing processes before they check. Treat empty lock files (PID not yet written) as "wait and retry" instead of stale, closing the window where two processes could both see an empty file, both unlink it, and both acquire the lock. setup.sh/setup.ps1 source-build: clone AND build in a temp directory (LLAMA_CPP_DIR.build.$$). Only swap into the final LLAMA_CPP_DIR after the build succeeds. If clone or cmake or build fails, the temp dir is cleaned up and the existing working install is preserved. Previously, rm -rf ran after clone but before build, destroying the existing install even if the build later failed. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-25 05:42:43 -07:00
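The `install_lock` fallback described in this commit (exclusive create, PID written and fsynced, stale-holder detection, empty file treated as "wait") can be sketched as below. This is an illustration under stated assumptions, not the project's code: the helper names, timeout, and poll interval are hypothetical, and the `os.kill(pid, 0)` liveness probe is POSIX-only.

```python
import contextlib
import os
import time
from pathlib import Path


def _holder_is_stale(lock_path: Path) -> bool:
    """True only when the lock file names a process that no longer exists."""
    try:
        text = lock_path.read_text().strip()
    except FileNotFoundError:
        return False  # lock vanished; the caller simply retries
    if not text:
        return False  # PID not written yet: wait and retry, do not unlink
    try:
        pid = int(text)
    except ValueError:
        return True   # unparsable lock file: treat as stale
    if pid <= 0:
        return True
    try:
        os.kill(pid, 0)  # signal 0 probes for existence (POSIX only)
    except ProcessLookupError:
        return True      # holder is dead
    except PermissionError:
        return False     # alive, but owned by another user
    return False


@contextlib.contextmanager
def install_lock(lock_path: Path, timeout: float = 10.0, poll: float = 0.05):
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if _holder_is_stale(lock_path):
                with contextlib.suppress(FileNotFoundError):
                    lock_path.unlink()
                continue
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
            continue
        try:
            os.write(fd, str(os.getpid()).encode())
            os.fsync(fd)  # make the PID visible to racing processes
        finally:
            os.close(fd)
        break
    try:
        yield
    finally:
        with contextlib.suppress(FileNotFoundError):
            lock_path.unlink()
```

Usage would be `with install_lock(Path("install.lock")): ...` around the staging and swap steps.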
Lee Jackson	cc1be75621	studio: stabilize reasoning panel scroll behavior and prevent composer overlap (#4587 ) * fix(studio): reasoning panel scroll and thread footer overlap * refactor(studio): dedupe reasoning scroll lock teardown	2026-03-25 05:32:31 -07:00
Roland Tannous	19e9c60a8e	Consolidate dual venvs and separate install from update (#4530 ) * refactor: consolidate dual venvs into single ~/.unsloth/studio/unsloth_studio * refactor: separate install.sh (first-time) from setup.sh (smart update with PyPI version check) * fix: install.sh calls setup.sh directly, keep both setup and update CLI commands * fix: use importlib.resources.files() directly without _path attribute * fix: bootstrap uv before pip upgrade to handle uv venvs without pip * fix: frontend 404 when launched via CLI, add global symlink to ~/.local/bin * feat: add --local flag to install.sh and unsloth studio update for branch testing * fix: resolve repo root from script location for --local installs * feat: add --package flag to install.sh for testing with custom package names * feat: add --package flag to unsloth studio update * fix: always nuke venv in install.sh for clean installs * revert: remove Windows changes, will handle in separate PR * fix: error when --package is passed without an argument * revert: restore Windows scripts to current main * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: always explicitly set STUDIO_LOCAL_INSTALL and STUDIO_PACKAGE_NAME env vars * fix: pass explicit STUDIO_LOCAL_REPO env var for --local installs * fix: align banner box for Setup vs Update labels * deprecate: hide 'unsloth studio setup' command, point users to update/install.sh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: check stdout not stdin for auto-launch detection (curl pipe fix) * fix: update install URL to unsloth.ai/install.sh * fix: update install.sh usage comments to unsloth.ai/install.sh * fix: use --upgrade-package for base deps to preserve existing torch/CUDA installs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: --local install now also installs unsloth-zoo 
via base.txt before editable overlay * fix: don't skip base packages for --local installs (editable needs unsloth-zoo) * refactor: move --local full dep install to install.sh, keep SKIP_STUDIO_BASE for all paths * feat: add migration support for old .venv and CWD-based installs in setup.sh * Revert "feat: add migration support for old .venv and CWD-based installs in setup.sh" This reverts commit `301291d002`. * feat: migrate old .venv layout in install.sh instead of always nuking * feat: validate old .venv with torch CUDA test before migration, recovery message on launch failure * fix: try CUDA then fall back to CPU for migration validation * fix: upgrade unsloth/unsloth-zoo with --reinstall-package on migration to preserve torch * remove: delete unused unsloth ui command (use unsloth studio instead) * Fix Windows venv path mismatch between install.ps1, setup.ps1, and studio.py install.ps1 was creating the venv CWD-relative ($VenvName = "unsloth_studio"), setup.ps1 was using an absolute path to ".unsloth\studio\.venv", and studio.py looks for ".unsloth\studio\unsloth_studio". All three paths were different, so the Windows installer would never produce a working Studio setup. 
install.ps1: - Use absolute $StudioHome + $VenvDir matching the Linux install.sh layout - Add 3-way migration: old .venv at STUDIO_HOME, CWD-relative ~/unsloth_studio from the previous install.ps1, or fresh creation with torch validation - For migrated envs, upgrade unsloth while preserving existing torch/CUDA wheels - Set SKIP_STUDIO_BASE=1 before calling setup.ps1 (matches install.sh behavior) - Fix launch instructions to use the absolute venv path setup.ps1: - Change $VenvDir from ".unsloth\studio\.venv" to ".unsloth\studio\unsloth_studio" - Add SKIP_STUDIO_BASE guard: error out if venv is missing when called from install.ps1 (which should have already created it) - Differentiate "Setup" vs "Update" in banners based on SKIP_STUDIO_BASE * setup.ps1: unconditionally error if venv missing, matching setup.sh setup.sh always errors out if the venv does not exist (line 224-228), telling the user to run install.sh first. setup.ps1 was conditionally creating a bare venv with python -m venv when SKIP_STUDIO_BASE was not set, which would produce an empty venv with no torch or unsloth. Now setup.ps1 matches setup.sh: always error, always point to install.ps1. * Fix --torch-backend=auto CPU solver dead-end on Linux, macOS, and Windows On CPU-only machines, `uv pip install unsloth --torch-backend=auto` falls back to unsloth==2024.8 because the CPU solver cannot satisfy newer unsloth's dependencies. install.ps1 already solved this with a two-step approach; this applies the same fix to install.sh and install_python_stack.py. install.sh: add get_torch_index_url() that detects GPU via nvidia-smi and maps CUDA versions to PyTorch index URLs (matching install.ps1's Get-TorchIndexUrl). Fresh installs now install torch first via explicit --index-url, then install unsloth with --upgrade-package to preserve the pre-installed torch. All 5 --torch-backend=auto removed from primary paths. 
install.ps1: add fallback else-branch when TorchIndexUrl is empty, using --torch-backend=auto as last resort (matching install.sh). install_python_stack.py: remove unconditional --torch-backend=auto from _build_uv_cmd. Torch is pre-installed by install.sh/setup.ps1 by the time this runs. Callers that need it can set UV_TORCH_BACKEND. Both install.sh and install.ps1 now share the same three-branch logic: migrated env (upgrade-package only), normal (torch-first + index-url), and fallback (--torch-backend=auto if URL detection fails). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use --reinstall-package for migrated envs on both Linux and Windows For migrated environments (moved from legacy venv location), --reinstall-package is better than --upgrade-package because it forces a clean reinstall even if the same version is already installed. This ensures proper .dist-info and .pyc state in the new venv location. --upgrade-package remains correct for the fresh install path where torch is already installed and we just want to add unsloth without re-resolving torch. 
* Address review findings: portability, parity, and stale comments - Replace grep -oP (GNU Perl regex) with POSIX sed in get_torch_index_url() so the script works on BSD grep (macOS is already guarded by the Darwin early-return, but Alpine/BusyBox would silently get the wrong CUDA tag) - Add LC_ALL=C before nvidia-smi invocation to prevent locale-dependent output parsing issues - Add warning on stderr when nvidia-smi output is unparseable, matching install.ps1's [WARN] message - Add explicit unsloth-zoo positional arg to install.ps1 migrated path, matching install.sh (--reinstall-package alone won't install it if it was never present in the migrated env) - Fix stale comment in install_python_stack.py line 392 that still claimed --torch-backend=auto is added by _build_uv_cmd - Add sed to test tools directory (function now uses sed instead of grep) * Add --index-url to migrated env path to prevent CPU torch resolution The migrated path runs uv pip install with --reinstall-package for unsloth/unsloth-zoo. While uv should keep existing torch as satisfied, the resolver could still re-resolve torch as a transitive dependency. Without --index-url pointing at the correct CUDA wheel index, the resolver would fall back to plain PyPI and potentially pull CPU-only torch. Adding --index-url $TORCH_INDEX_URL ensures CUDA wheels are available if the resolver needs them. Applied to both install.sh and install.ps1. * Revert --index-url on migrated env path The original install.ps1 on main already handles the migrated path without --index-url and it works correctly. --reinstall-package only forces reinstall of the named packages while uv keeps existing torch as satisfied. No need for the extra flag. * Fix unsloth studio update --local not installing local checkout studio.py sets STUDIO_LOCAL_REPO when --local is passed, but install_python_stack.py never read it. The update path always installed from PyPI regardless of the --local flag. 
Add a local_repo branch that first updates deps from base.txt (with --upgrade-package to preserve torch), then overlays the local checkout as an editable install with --no-deps. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-25 05:24:21 -07:00
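Note: the three-branch install logic described in this commit (migrated env, normal torch-first, fallback) can be sketched roughly as below. This is an illustration only, not the code in install.sh or install.ps1; the function name `build_install_commands` and the exact argument lists are assumptions, and the index URL in the usage note is just an example value.

```python
from typing import List, Optional


def build_install_commands(migrated: bool, torch_index_url: Optional[str]) -> List[List[str]]:
    """Return uv command lines for one of the three install branches (hypothetical helper)."""
    if migrated:
        # Migrated env: force a clean reinstall of unsloth and unsloth-zoo only.
        # Both are also passed positionally so unsloth-zoo is installed even if
        # it was never present in the migrated env.
        return [[
            "uv", "pip", "install",
            "--reinstall-package", "unsloth",
            "--reinstall-package", "unsloth-zoo",
            "unsloth", "unsloth-zoo",
        ]]
    if torch_index_url:
        # Normal path: install torch first from the explicit index, then add
        # unsloth with --upgrade-package so the pre-installed torch is kept.
        return [
            ["uv", "pip", "install", "torch", "--index-url", torch_index_url],
            ["uv", "pip", "install", "--upgrade-package", "unsloth", "unsloth"],
        ]
    # Fallback: index URL detection failed, so let uv choose the torch backend.
    return [["uv", "pip", "install", "unsloth", "--torch-backend=auto"]]
```

For example, `build_install_commands(False, "https://download.pytorch.org/whl/cu124")` yields two commands (torch first, then unsloth), while `build_install_commands(False, None)` yields the single `--torch-backend=auto` fallback.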
Daniel Han	3446e0c489	Add ROCm (AMD GPU) support to studio setup (#4585 ) * Add support for ROCm in studio setup * Fix ROCm detection bugs: ROCM_PATH resolution, CUDA guard, compiler selection - Set GPU_BACKEND="cuda" when nvcc is found (CUDA path was unreachable) - Guard ROCm detection with `if [ -z "$GPU_BACKEND" ]` so CUDA takes priority on mixed-toolchain hosts - Rename ROCM_PATH to ROCM_HIPCC for the hipcc binary; resolve the actual ROCm root via readlink -f and hipconfig -R into ROCM_ROOT - Export both ROCM_PATH and HIP_PATH as the resolved root directory - Use HIPCXX via hipconfig -l instead of legacy CMAKE_C_COMPILER=hipcc - Switch grep -oP to grep -oE for portability across Linux distros - Use GPU_TARGETS (upstream cmake variable) instead of AMDGPU_TARGETS - Remove stale hardcoded fallback targets; let cmake auto-detect instead * Fix gfx regex to match gfx90a (MI210/MI250/MI250X) The grep and bash regex used {3,4} digits after 'gfx', which silently excluded gfx90a (2 digits + letter 'a') -- the architecture for AMD Instinct MI210, MI250, and MI250X data-center GPUs. Change to {2,4} so all real gfx targets from gfx90a through gfx1200 are matched. --------- Co-authored-by: edamamez <eda.zhou@amd.com>	2026-03-25 04:50:23 -07:00
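Note: the gfx regex fix in this commit can be illustrated with a small sketch. The two patterns below are assumptions that only mirror the `{3,4}` vs `{2,4}` quantifier change described above; the real expression in the setup script is not reproduced here.

```python
import re

# Hypothetical patterns: only the digit quantifier differs.
OLD_GFX = re.compile(r"gfx[0-9]{3,4}[a-z]?")
NEW_GFX = re.compile(r"gfx[0-9]{2,4}[a-z]?")


def find_gfx_targets(text, pattern=NEW_GFX):
    """Return unique gfx targets in order of first appearance."""
    seen = []
    for match in pattern.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


# Made-up sample text containing three target names.
sample = "Name: gfx90a\nName: gfx1100\nName: gfx90a\nName: gfx1200"
old_hits = find_gfx_targets(sample, OLD_GFX)  # gfx90a is silently dropped
new_hits = find_gfx_targets(sample, NEW_GFX)  # gfx90a is matched
```

With the old quantifier, `gfx90a` has only two digits after `gfx` and never matches; with `{2,4}` it is picked up alongside the four-digit targets.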
cz-03	7eb48512bc	feat(tokenizer): add get_tokenizer_info() diagnostic helper (#4436 ) * feat(tokenizer): add get_tokenizer_info() diagnostic helper Adds get_tokenizer_info(tokenizer) to tokenizer_utils.py returning a concise dict of key tokenizer properties class name, is_fast, vocab size, added token count, model_max_length, padding side, special tokens (bos, eos, pad, unk), chat template presence, and total special token count. All fields use getattr(..., None) fallbacks so the function never raises on unusual or partially initialized tokenizers. Exported via __all__ alongside the existing public helpers. Useful for logging, debugging, and surfacing tokenizer state in the Unsloth Studio UI. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix docstring, remove artifact, restore valuable comments in tokenizer_utils.py - Fix get_tokenizer_info() docstring example: correct tokenizer_class to PreTrainedTokenizerFast, vocab_size to 128000, swap added_tokens_count (256) and special_tokens_count (3) to match actual Llama-3.2-1B-Instruct output - Remove accidentally committed "# ... (rest of file unchanged)" diff artifact - Restore fix_sentencepiece_gguf() docstring with llama.cpp upstream link - Restore 10 comments containing upstream URLs, model-specific workarounds, and non-obvious context (issue #292, sentencepiece#121, Starling hack, Kaggle /tmp limit, Deepseek slow tokenizer, twitter/danielhanchen references) * Revert "Fix docstring, remove artifact, restore valuable comments in tokenizer_utils.py" This reverts commit `4e525b734b`. * Revert all deletions, keep only get_tokenizer_info() addition Restore tokenizer_utils.py to main and add only the new get_tokenizer_info() function and its __all__ entry. All comment removals, dead code cleanup, and formatting changes from the original PR are reverted. 
--------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-25 04:29:01 -07:00
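Note: the "never raises" behavior described for get_tokenizer_info() comes from the getattr(..., None) fallback pattern. The sketch below shows that pattern only; it is not the function added in tokenizer_utils.py, and the name `tokenizer_info_sketch` and the dict keys are assumptions.

```python
from types import SimpleNamespace


def tokenizer_info_sketch(tokenizer):
    """Collect basic tokenizer properties; missing attributes become None."""
    return {
        "tokenizer_class": type(tokenizer).__name__,
        "is_fast": getattr(tokenizer, "is_fast", None),
        "vocab_size": getattr(tokenizer, "vocab_size", None),
        "model_max_length": getattr(tokenizer, "model_max_length", None),
        "padding_side": getattr(tokenizer, "padding_side", None),
        "bos_token": getattr(tokenizer, "bos_token", None),
        "eos_token": getattr(tokenizer, "eos_token", None),
        "pad_token": getattr(tokenizer, "pad_token", None),
        "unk_token": getattr(tokenizer, "unk_token", None),
        "has_chat_template": getattr(tokenizer, "chat_template", None) is not None,
    }


# A partially initialized stand-in object: most attributes are absent.
partial = SimpleNamespace(vocab_size=10, eos_token="EOS")
info = tokenizer_info_sketch(partial)
```

Because every lookup has a default, a stand-in object with only two attributes still produces a complete dict, with `None` for everything it lacks.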
Etherll	d69d60ff19	perf(studio): upgrade to Vite 8 + auto-install bun for faster frontend builds (#4522 ) * perf(studio): upgrade to Vite 8 + auto-install bun for 3x faster frontend builds * fix(studio): make bun-to-npm fallback actually reachable setup.sh used run_quiet() for the bun install attempt, but run_quiet calls exit on failure. This killed the script before the npm fallback could run, making the "falling back to npm" branch dead code. Replace the run_quiet call with a direct bun invocation that captures output to a temp file (same pattern, but returns instead of exiting). Also clean up partial node_modules left by a failed bun install before falling back to npm, in both setup.sh and build.sh. Without this, npm inherits a corrupted node_modules tree from the failed bun run. * fix(studio): restore commonjsOptions for dagre CJS interop The previous commit removed build.commonjsOptions, assuming Vite 8's Rolldown handles CJS natively. While optimizeDeps.include covers the dev server (pre-bundling), it does NOT apply to production builds. The resolve.alias still points @dagrejs/dagre to its .cjs.js entry, so without commonjsOptions the production bundle fails to resolve the CJS default export. This causes "TypeError: e is not a function" on /chat after build (while dev mode works fine). Restore the original commonjsOptions block to fix production builds. 
* fix(studio): use motion/react instead of legacy framer-motion import * fix(studio): address PR review findings for Vite 8 + bun upgrade Fixes: - Remove bun.lock from repo and add to .gitignore (npm is source of truth) - Use & bun install > $null pattern in setup.ps1 for reliable $LASTEXITCODE - Add Remove-Item node_modules before npm fallback in setup.ps1 - Print bun install failure log in setup.sh before discarding - Add Refresh-Environment after npm install -g bun in setup.ps1 - Tighten Node version check to ^20.19.0 || >=22.12.0 (Vite 8 requirement) - Add engines field to package.json - Use string comparison for _install_ok in build.sh - Remove explicit framer-motion ^11.18.2 from package.json (motion pulls framer-motion ^12.38.0 as its own dependency; the old pin caused a version conflict) Fix Colab Node bypass and bun.lock stale-build trigger Gate the Colab Node shortcut on NODE_OK=true so Colab environments with a Node version too old for Vite 8 fall through to the nvm install path instead of silently proceeding. Exclude bun.lock from the stale-build probe in both setup.sh and setup.ps1 so it does not force unnecessary frontend rebuilds on every run. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Shine1i <wasimysdev@gmail.com>	2026-03-25 04:27:41 -07:00
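Note: the tightened Node range in this commit (`^20.19.0 || >=22.12.0`) means "20.19.0 or later within major 20, or anything from 22.12.0 up". A minimal sketch of that check follows; the function name `node_version_ok` is hypothetical and the setup scripts implement the check in shell/PowerShell, not Python.

```python
def node_version_ok(version: str) -> bool:
    """Check a Node version string against ^20.19.0 || >=22.12.0."""
    # Accept "v20.19.0" or "20.19.0"; pad missing components with zeros.
    parts = (version.lstrip("v").split(".") + ["0", "0"])[:3]
    major, minor, patch = (int(p) for p in parts)
    if major == 20:
        # Caret range: same major, at least 20.19.0.
        return (minor, patch) >= (19, 0)
    # Second alternative: 22.12.0 or anything newer (major 21 falls out here).
    return (major, minor, patch) >= (22, 12, 0)
```

Note that Node 21.x and 22.0 through 22.11 are rejected, which is why an environment with an older Node has to fall through to the nvm install path.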
Daniel Han	be2cd7087a	Add macOS and Linux desktop shortcuts to install.sh (#4568 ) * Add macOS and Linux desktop shortcuts to install.sh Adds create_studio_shortcuts() function that creates platform-native shortcuts after `unsloth studio setup` completes, mirroring the Windows shortcut behavior from PR #4558. Linux: .desktop file in ~/.local/share/applications/ and ~/Desktop/ macOS: .app bundle in ~/Applications/ with Info.plist, exec stub, and optional .icns icon built from unsloth-gem.png via sips+iconutil Both platforms share a Bash launcher script at ~/.local/share/unsloth/launch-studio.sh that provides: - Health check with service fingerprint verification - Port scanning (8888-8908) via ss/lsof - PID-file single-instance guard (no flock dependency) - Terminal spawning (macOS: Terminal.app; Linux: gnome-terminal etc.) - Browser open after health poll with 60s timeout WSL is skipped (no native desktop environment). * Fix 6 issues found by 10 parallel reviewers 1. [10/10] Health check now supports wget as fallback to curl via _http_get() helper, matching the installer's own download() pattern. Previously wget-only systems would time out on every launch. 2. [9/10] Exe path substitution now escapes sed metacharacters (&, \, |) and shell single-quotes before injection, preventing launcher corruption for paths like /opt/R&D/bin/unsloth. 3. [4/10] Linux .desktop Exec= field now quotes the launcher path, fixing launches from home directories containing spaces. 4. [3/10] macOS AppleScript command now escapes backslashes and double-quotes before interpolation into do script "...", fixing Terminal.app launch failures. 5. [3/10] Single-instance guard now uses atomic mkdir instead of racy check-then-write PID file, preventing duplicate concurrent launches on rapid double-click. 6. [1/10] Launcher now scans for a free port via _find_launch_port() instead of always hardcoding -p 8888, so Studio starts correctly when another service already occupies port 8888. 
Also fixed: `open` command on Linux (openvt) no longer incorrectly triggers the macOS browser-open path -- now gated on uname=Darwin. * Fix mktemp guard and exe path escaping from PR review comments Two real issues identified from automated review comments: 1. Guard mktemp -d failure in macOS icns generation. If mktemp -d returned empty, dirname would resolve to / and rm -rf would attempt to delete the root directory. Now checks that the temp dir was actually created before proceeding. 2. Replace sed-based exe path substitution with a conf file approach. The previous sed escaping broke paths containing apostrophes (e.g. /home/O'Connor/) because the '\'' escape introduced backslashes that were then double-escaped by the metacharacter pass. Now writes UNSLOTH_EXE to a separate studio.conf file that the launcher sources at runtime, eliminating all sed metacharacter and shell quoting interaction issues. This also addresses the sed -i.bak portability concern (now moot since sed is no longer used on the launcher file). * Fix unbound variable crash and per-user lock in launcher - Use ${UNSLOTH_EXE:-} so set -u does not crash before the friendly error message when studio.conf is missing or empty. - Append $(id -u) to the fallback lock path so each user gets their own lock directory when XDG_RUNTIME_DIR is unset. * Mark desktop shortcut as trusted for GNOME/Nautilus On modern GNOME desktops, chmod +x alone is not sufficient to make a .desktop file launchable by double-click on ~/Desktop. Nautilus requires the metadata::trusted attribute to be set via gio, otherwise it shows a warning dialog instead of launching the application.	2026-03-25 03:37:37 -07:00
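Note: the switch from a check-then-write PID file to an atomic mkdir guard (item 5 above) relies on directory creation either succeeding for exactly one caller or failing. The launcher does this in Bash; the sketch below shows the same idea in Python, with hypothetical function names and a temp-dir lock path chosen only for illustration.

```python
import os
import tempfile


def acquire_lock(lock_dir: str) -> bool:
    """Try to take the lock; True only for the caller that created the directory."""
    try:
        os.mkdir(lock_dir)  # atomic: raises if the directory already exists
    except FileExistsError:
        return False
    # Record the owner PID inside the lock dir for diagnostics.
    with open(os.path.join(lock_dir, "pid"), "w") as fh:
        fh.write(str(os.getpid()))
    return True


def release_lock(lock_dir: str) -> None:
    """Remove the PID file (if present) and the lock directory."""
    pid_file = os.path.join(lock_dir, "pid")
    if os.path.exists(pid_file):
        os.remove(pid_file)
    os.rmdir(lock_dir)


# Illustrative lock path inside a fresh temp directory.
demo_lock = os.path.join(tempfile.mkdtemp(), "studio.lock")
```

There is no window between "check" and "write" here: two rapid launches race on a single `mkdir`, and only one of them can win.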
Daniel Han	6872c6e850	Remove advanced CodeQL workflow in favor of default setup (#4584 ) The repo has both the CodeQL "default setup" (configured in repo settings) and this advanced workflow file enabled. GitHub does not allow both simultaneously, causing all PR CI runs to fail with: "CodeQL analyses from advanced configurations cannot be processed when the default setup is enabled" Since the default setup already covers the same languages (Python, JavaScript/TypeScript) with the same build-mode (none), remove the redundant advanced workflow file.	2026-03-25 03:34:21 -07:00
dependabot[bot]	38405cc18c	build(deps): bump oxc-parser (#4571 ) Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser). Updates `oxc-parser` from 0.116.0 to 0.121.0 - [Release notes](https://github.com/oxc-project/oxc/releases) - [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md) - [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.121.0/napi/parser) --- updated-dependencies: - dependency-name: oxc-parser dependency-version: 0.121.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: npm-oxc-validator ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-25 02:44:38 -07:00
dependabot[bot]	f294161e26	build(deps): bump the actions group with 2 updates (#4570 ) Bumps the actions group with 2 updates: [actions/checkout](https://github.com/actions/checkout) and [github/codeql-action](https://github.com/github/codeql-action). Updates `actions/checkout` from 4 to 6 - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v4...v6) Updates `github/codeql-action` from 3 to 4 - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/v3...v4) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major dependency-group: actions - dependency-name: github/codeql-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major dependency-group: actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-25 02:44:22 -07:00
Pete Kloehn	efedbe9740	Feature/add dependabot and codeql security checks (#4479 ) * Add CodeQL analysis workflow configuration * Add Dependabot configuration for package updates Configure Dependabot to check for updates in various ecosystems weekly. * Fix dependabot.yml: bun ecosystem, missing dir, grouping for PR #4479 1. studio/frontend uses bun.lock not package-lock.json, so change npm to bun 2. Add missing studio/backend/requirements/ pip entry (consumed by studio/setup.sh) 3. Add groups with patterns ["*"] to all pip/bun/npm entries to batch updates and avoid 30+ individual Dependabot PRs on the first run Consolidate pip blocks to fix overlapping directory violation GitHub Dependabot forbids multiple same-ecosystem entries with overlapping directories on the same branch. The root "/" directory overlapped the 3 nested pip dirs. Merge all 4 pip blocks into one using the `directories:` (plural) key. Also remove redundant open-pull-requests-limit from the bun block since grouping with patterns: ["*"] already limits PR count. --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-03-25 02:41:33 -07:00
Datta Nimmaturi	04359be333	[Studio] Try installing causal-conv1d from prebuilt wheels if available (#4547 ) * Try installing causal-conv1d from prebuilt wheels if available * Prefer installing mamba-ssm from wheel to speed up things * undo python stack install changes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "undo python stack install changes" This reverts commit `d943551092`. * add comments * Fix wheel installer: model detection, platform tags, torch pin, error handling - Add nemotron-h (hyphen) and granite-4.0-h / granitemoehybrid to model detection for both causal-conv1d and mamba-ssm. These hybrid Mamba models were silently skipped since nemotron_h (underscore) never matches real HF model IDs like nvidia/Nemotron-H-8B-Base, and granite was missing entirely despite being a supported model in model_config.py and loader.py. - Fix _causal_conv1d_platform_tag to detect linux_aarch64 via platform.machine() instead of hardcoding linux_x86_64. Both upstream releases publish aarch64 wheels. Drop win_amd64 since neither repo publishes Windows wheels (avoids a wasted HTTP probe on every run). - Pin torch to >=2.6.0,<2.11.0 instead of <=2.10.0 to add a version floor and document the wheel coverage range with upstream release links. - Strip non-numeric suffixes from torch minor version so nightly builds like 2.7a0 correctly resolve to wheel tag torch2.7 instead of torch2.7a0. - Use stderr=_sp.PIPE instead of stderr=_sp.STDOUT in the env probe so torch import warnings do not corrupt the JSON output. - Add timeout=30 to the env probe subprocess to prevent indefinite hangs. - Catch Exception (not just ImportError) on the existing-install check so ABI-broken installs with OSError/RuntimeError are retried rather than silently accepted. - Guard uv invocation with shutil.which("uv") to prevent FileNotFoundError crash when uv is not on PATH. 
Wrap the top-level ensure calls in try/except so failures do not kill the training worker. - Hoist _SSM_MODEL_SUBSTRINGS to module level. - Remove redundant --torch-backend=auto flag from direct wheel URL install. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add LFM2 to causal-conv1d detection; stop training on install failure - Add "lfm2" to _model_wants_causal_conv1d so Studio picks up the fast kernel path for Liquid Foundation Model 2. - Replace silent logger.warning on SSM dependency install failure with an error event that tells the user to choose another model and stops the training job immediately. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Catch subprocess timeout in torch probe; narrow import guard to ImportError - _probe_causal_conv1d_env: wrap subprocess.run in try/except for TimeoutExpired so a slow torch import returns None (falls back to PyPI) instead of killing the training job. - _install_package_wheel_first: narrow except Exception to except ImportError on the __import__ check so unexpected errors from a broken module still propagate. * Remove unconditional torch pin from install_python_stack The torch>=2.6.0,<2.11.0 pin was added to ensure prebuilt causal-conv1d / mamba-ssm wheels exist, but it runs at install time for all users regardless of model choice. This can downgrade or unnecessarily upgrade torch. The worker already handles wheel compatibility at training time by probing the environment and falling back to PyPI, so the install-time pin is not needed. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-25 02:22:26 -07:00
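Note: the "strip non-numeric suffixes from torch minor version" fix can be pictured as keeping only the leading major.minor digits of the version string. The helper below is a sketch under that assumption; `torch_wheel_tag` is a hypothetical name and the worker's real parsing may differ.

```python
import re
from typing import Optional


def torch_wheel_tag(torch_version: str) -> Optional[str]:
    """Map a torch version string to a 'torchMAJOR.MINOR' wheel tag."""
    # Only the leading "<digits>.<digits>" is kept, so suffixes such as
    # "a0", ".0", or "+cu128" never leak into the tag.
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        return None  # unparseable version: caller can fall back to PyPI
    return f"torch{match.group(1)}.{match.group(2)}"
```

A nightly string like `2.7a0` therefore resolves to `torch2.7` rather than `torch2.7a0`, and an unparseable string returns `None` so the caller can take the PyPI fallback.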
Wasim Yousef Said	926e74509d	feat(chat): cleaner tool UI, inline LaTeX, clickable links (#4561 ) * feat(chat): ghost-style tool containers Remove borders and card styling from tool call UI. ToolFallback uses minimal padding with indented content. ToolGroup defaults to ghost variant with subtle background for multi-tool grouping. * feat(chat): compact web search source pills Switch sources from vertical full-width badges to horizontal wrapping pills with smaller icons. * feat(chat): left-accent code and terminal tool UI Replace bordered card layout with a left border accent for Python and Terminal tool output. Add timer cleanup on unmount for the copy button in both components. * feat(chat): inline latex and clickable links Enable single-dollar $...$ math rendering via createMathPlugin. Add styled link component with target=_blank for external links. * fix(chat): inline generating indicator, static tailwind classes, misc fixes Move generating indicator from viewport footer into assistant message using AnimatedShinyText shimmer. Only shows when message content is empty, hides once tool calls or text appear. Use static size class map in SourceIcon for Tailwind v4 compat. Use unique keys for web search sources. Remove px-3 from ghost tool group variant. * fix(chat): only show generating indicator while message is running Hide the shimmer when message is cancelled or errored with no content, preventing stale loading UI on empty completed messages. * fix: escape currency dollar signs in LaTeX math rendering and fix TS build error - Add preprocessLaTeX() in lib/latex.ts to escape currency patterns ($5, $1,000, $5.99, $100K) before they reach the math parser, preventing false positives when singleDollarTextMath is enabled. Code blocks and already-escaped dollars are left untouched. - Use preprocessLaTeX via useMemo in markdown-text.tsx so Streamdown receives clean input. - Fix TS18048 in thread.tsx: message.status?.type (optional chaining) since status can be undefined. 
--------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-25 02:06:03 -07:00
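Note: the currency-escaping step described for preprocessLaTeX() (a TypeScript helper in lib/latex.ts) can be approximated in a few lines. The Python sketch below is a simplified stand-in, not a port: it escapes any unescaped `$` that is directly followed by a digit, and it deliberately ignores the code-block handling mentioned in the commit. Inline math that starts with a digit would also be escaped by this simplified rule.

```python
import re

# An unescaped dollar sign immediately followed by a digit, e.g. $5 or $1,000.
_CURRENCY_DOLLAR = re.compile(r"(?<!\\)\$(?=\d)")


def escape_currency_dollars(text: str) -> str:
    """Backslash-escape currency-style dollar signs so a math parser skips them."""
    return _CURRENCY_DOLLAR.sub(lambda m: "\\$", text)
```

Math such as `$x^2$` is left alone because the character after `$` is not a digit, and dollars that are already escaped are not escaped twice.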
Daniel Han	3998f67680	Bump Data Designer to 0.5.4 (removes litellm dependency) (#4569 ) * Bump Data Designer to 0.5.4 (removes litellm dependency) NVIDIA Data Designer v0.5.4 removes litellm entirely and replaces it with native OpenAI and Anthropic adapters. This follows the litellm supply chain incident where versions 1.82.7 and 1.82.8 were compromised with a credential stealer. Release notes: https://github.com/NVIDIA-NeMo/DataDesigner/releases/tag/v0.5.4 Changes: - Bump data-designer, data-designer-config, data-designer-engine to 0.5.4 - Sync data-designer-deps.txt with 0.5.4 engine requirements: - Added: chardet, fsspec, mcp - Removed: python-json-logger, pymupdf, pymupdf4llm, mammoth (these remain in the unstructured-seed plugin which still needs them) - duckdb constraint relaxed from <1.5 to <2 (upstream fixed record_batch) - Bump plugin lower bound to >=0.5.4 * Keep pymupdf, pymupdf4llm, mammoth in data-designer-deps The unstructured-seed plugin is installed with --no-deps, so its pyproject.toml dependencies are not auto-resolved. These three packages are needed by the seed route (studio/backend/routes/ data_recipe/seed.py) and must remain in the explicit deps list.	2026-03-25 02:01:43 -07:00
Avaya Aggarwal	45d0a343b5	feat: Implement Q-GaLore optimizer and custom embedding learning rate… (#4511 ) * feat: Implement Q-GaLore optimizer and custom embedding learning rate in the Unsloth trainer. * feat: Implement QGaLoreAdamW8bit optimizer with 8-bit states, GaLore low-rank gradient projection, and optional INT8 weight quantization, along with supporting projector and tests. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: Introduce Q-GaLore AdamW optimizer with low-rank quantized gradient projection and integrate into the trainer, along with dedicated tests. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: Implement Q-GaLore AdamW optimizer with gradient projection and quantization, including trainer integration and corresponding tests. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix 3 bugs in Q-GaLore optimizer and add weight_quant forward hooks 1. Fix use-after-delete crash: move `del p._saved_data` after the weight decay block so decoupled weight decay can reference the current weights correctly (p.data). 2. Fix substring matching in make_q_galore_param_groups: split parameter names on "." and check exact component matches to prevent false positives (e.g. "not_q_proj" matching "q_proj"). 3. Implement forward pre-hooks for weight_quant: after the optimizer quantizes weights to INT8, replace p.data with a 1-element placeholder to free float memory. A register_forward_pre_hook dequantizes back to float before each forward pass. The trainer calls install_weight_quant_hooks() when weight_quant is enabled. 4. Update test_weight_decay_uses_saved_data to match the fixed code path (decoupled decay uses p.data, expected value 2.7). 
Add test_weight_quant_hook_restores_float to verify the INT8-to-float hook round-trip. All 24/24 Q-GaLore tests pass. Benchmarked on Llama-3.2-1B-Instruct FFT: Q-GaLore saves 32% VRAM (10.63 -> 7.24 GB) with better loss convergence (1.3 vs 2.0 at step 100). No regressions in 31-notebook sweep across Llama, Qwen, Mistral, Phi, Gemma, vision, and GRPO. * Default weight_quant to False in QGaloreConfig Benchmarks show weight_quant=True adds ~1 GB on Llama-3.2-1B due to INT8 copy/scale overhead exceeding savings from the placeholder trick. Users can still opt in explicitly. The optimizer logic is unchanged. * Optimize Q-GaLore projector and optimizer step performance Projector (q_galore_projector.py): - Use torch.svd_lowrank with oversampling p=10 (Halko et al. 2009) instead of full SVD for large matrices. Falls back to full SVD when min(m,n) <= 2rank. SVD steps are 6-8x faster on Llama-3.2-1B (22s -> 3s for first step). - Cache the dequantized ortho matrix between project() and project_back() to avoid redundant dequantization when quant=True. - Replace F.cosine_similarity with torch.dot for 1-D unit vectors in the adaptive schedule. Remove unused torch.nn.functional import. - Use collections.deque(maxlen=queue_size) instead of list with manual pop(0). Optimizer (q_galore_adamw.py): - Remove redundant .clone() on dequantized weights (line 151) and on float data before re-quantization (line 211). _dequantize already returns a fresh tensor and _quantize/_quantize_stochastic only reads its input. - Consolidate per-group torch.cuda.synchronize() into a single call after all param groups complete. - Use torch.empty instead of torch.zeros for the scalar placeholder tensor that is never read. Verified: 24/24 unit tests pass. Llama-3.2-1B 61-step training produces losses within 0.24% relative diff (correlation >0.9999) of the original. 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-25 01:03:10 -07:00
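Note: bug fix 2 in this commit (exact component matching instead of substring matching in make_q_galore_param_groups) is easy to show in isolation. The helper name `matches_target` and the parameter names below are illustrative assumptions, not the optimizer's actual code.

```python
def matches_target(param_name, target_modules):
    """True if any target equals a whole dot-separated component of the name."""
    components = param_name.split(".")
    return any(target in components for target in target_modules)


good = "model.layers.0.self_attn.q_proj.weight"
tricky = "model.layers.0.not_q_proj.weight"

# Naive substring test: wrongly accepts the second name.
substring_hits = [name for name in (good, tricky) if "q_proj" in name]
# Component test: only the real q_proj parameter is selected.
component_hits = [name for name in (good, tricky) if matches_target(name, ["q_proj"])]
```

Here `target in components` is a list membership test, so `q_proj` has to equal a full path segment; `not_q_proj` no longer counts as a match.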
Krishna Chaitanya	11606c5025	fix: remove auto wandb.finish() after train() to allow post-training evaluate() (#4564 ) * fix: remove auto wandb.finish() after train() to allow post-training evaluate() The prepare_for_training_mode wrapper unconditionally called wandb.finish() after trainer.train() completed. This terminated the active W&B run, causing trainer.evaluate() to fail with "You must call wandb.init() before wandb.log()". Users who need multiple training runs in one session can call wandb.finish() manually between runs to avoid data overwriting. Fixes #3954 * fix: defer wandb.finish() to next train() call instead of removing it Instead of calling wandb.finish() at the end of train() (which breaks evaluate/log) or removing it entirely (which causes data overwriting on multiple train() calls), defer it to the start of the next train() call. This way: - train() + evaluate() works (run stays open after train) - train() + train() gets separate W&B runs (previous run finished first) - train() + evaluate() + train() also works correctly Also resets HF's WandbCallback._initialized flag so it re-calls wandb.init() for the new run. Fixes #3954 --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-25 01:00:12 -07:00
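Note: the deferred-finish behavior described in this commit (finish the previous run at the start of the next train() call, not at the end of the current one) can be sketched with plain callables. The class `DeferredFinish` is hypothetical; in the real wrapper the finish step is wandb.finish() plus resetting the HF callback flag, both of which are replaced by a stub here.

```python
class DeferredFinish:
    """Wrap a train function so the previous run is finished lazily."""

    def __init__(self, train_fn, finish_fn):
        self._train_fn = train_fn
        self._finish_fn = finish_fn
        self._has_open_run = False

    def train(self, *args, **kwargs):
        # Close the run left open by the previous train() call, if any.
        if self._has_open_run:
            self._finish_fn()
        self._has_open_run = True
        # The run stays open after this returns, so evaluate() can still log.
        return self._train_fn(*args, **kwargs)


# Stub callables that just record what happened, in order.
events = []
trainer = DeferredFinish(lambda: events.append("train"), lambda: events.append("finish"))
trainer.train()
first_call_events = list(events)
trainer.train()
```

After the first call nothing has been finished, so logging during evaluation still has an active run; the second call finishes the old run before starting a new one.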
Wasim Yousef Said	208862218d	feat(studio): training history persistence and past runs viewer (#4501 ) * feat(db): add SQLite storage layer for training history * feat(api): add training history endpoints and response models * feat(training): integrate DB persistence into training event loop * feat(ui): add training history views and card grid * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(studio): address review issues in training history persistence - Strip hf_token/wandb_token from config before SQLite storage - Add UUID suffix to job_id for collision resistance - Use isfinite() for 0.0 metric handling throughout - Respect _should_stop in error event finalization - Run schema DDL once per process, not per connection - Close connection on schema init failure - Guard cleanup_orphaned_runs at startup - Cap _metric_buffer at 500 entries - Make FLUSH_THRESHOLD a class constant - Map 'running' to 'training' phase in historical view - Derive LR/GradNorm from history arrays in historical view - Fix nested button with div[role=button] in history cards - Guard String(value) against null/undefined in config popover - Clear selectedHistoryRunId on auto tab switch * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(studio): address round-2 review findings across training backend and frontend Backend (training.py): - Move state mutation after proc.start() so a failed spawn does not wedge the backend with is_training=True - Create DB run row eagerly after proc.start() so runs appear in history during model loading, not after first metric event - Rewrite _flush_metrics_to_db() with snapshot-before-insert pattern to preserve metrics arriving during the write and retain buffer on failure - Guard eval_loss with float() coercion and math.isfinite(), matching the existing grad_norm guard - Increase pump thread join timeout from 3s to 8s to cover SQLite's default 5s 
lock timeout Frontend (studio-page.tsx): - Fix history navigation: check isTrainingRunning instead of showTrainingView in onSelectRun so completed runs are not misrouted - Replace activeTab state + auto-switch useEffect with derived tab to eliminate react-hooks/set-state-in-effect lint violation Frontend (historical-training-view.tsx): - Add explicit "running" branch to message ternary so running runs no longer fall through to "Training errored" - Derive loading from detail/error state and move cleanup to effect return to eliminate react-hooks/set-state-in-effect lint violation Frontend (progress-section.tsx): - Derive stopRequested from isTrainingRunning && stopRequestedLocal to eliminate react-hooks/set-state-in-effect lint violation and remove unused useEffect import * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(studio): resolve 3 remaining bugs from round-2 review 1. Stuck on Current Run tab [12/20]: Only force "current-run" tab when isTrainingRunning is true, not when stale completed-run data exists. After training ends, users can freely navigate to Configure. 2. Incomplete metric sanitization [7/20]: Apply float() coercion and isfinite() guards to loss and learning_rate, matching the existing pattern used by grad_norm and eval_loss. Prevents TypeError from string values and NaN leaks into history arrays. 3. Stop button state leak across runs [10/20]: Add key={runtime.jobId} to ProgressSection so React remounts it when a new run starts, resetting stopRequestedLocal state. * fix(studio): deduplicate loss/lr sanitization in training event handler Reuse _safe_loss/_safe_lr from the progress update block instead of re-sanitizing the same raw event values for metric history. 
* fix(studio): restore loss > 0 guard to prevent eval steps injecting 0.0 into metric histories Round-2/3 fixes relaxed the history append guard from `loss > 0` to `loss is not None`, which let eval-only log events (where loss defaults to 0.0) append fake zeros into loss_history and lr_history. Restore the `loss > 0` check to match the worker's own has_train_loss gate. The float() coercion and isfinite() sanitization from round-3 remain intact. * fix(studio): resolve training history bugs — nullable loss/lr, tab nav, sparkline * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-25 00:58:55 -07:00
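The metric sanitization in commit 208862218d (float() coercion, isfinite() guards, and the restored `loss > 0` check) could look roughly like the sketch below. The function names and the event dictionary shape are assumptions made for illustration; the Studio backend's own helpers may differ.

```python
import math


def safe_metric(value):
    """Return value as a finite float, or None if it cannot be used."""
    try:
        number = float(value)  # accepts numeric strings such as "0.5"
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def append_train_metrics(loss_history, lr_history, event):
    """Append to the histories only for genuine training-loss events."""
    loss = safe_metric(event.get("loss"))
    lr = safe_metric(event.get("learning_rate"))
    # Eval-only events report loss as 0.0, so require a strictly positive loss.
    if loss is not None and loss > 0:
        loss_history.append(loss)
        lr_history.append(lr)  # lr may be None when it was missing or invalid
```

String values are coerced, NaN and infinity are dropped, and eval-only events with a default loss of 0.0 never reach the history arrays.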
Daniel Han	3108750bb0	Remove duplicate frontend assets from wheel to reduce package size (#4567 ) The wheel currently ships frontend/public/, frontend/src/, and frontend/.lock alongside frontend/dist/. These are build-time inputs that Vite already copies into dist/ during the build step: - public/ is copied verbatim into dist/ by vite build (28.6 MB duplicate) - src/ is TSX source compiled into dist/assets/.js (2.1 MB, not used at runtime) - *.lock files are package manager lockfiles (0.9 MB, not used at runtime) The backend only serves from frontend/dist/ (see main.py setup_frontend and run.py frontend_path). Nothing references public/ or src/ at runtime. This drops the wheel from ~62.7 MB to ~31 MB.	2026-03-24 23:48:49 -07:00
Lee Jackson	557743f027	studio: windows desktop shortcut launcher (#4558 ) * feat(windows): add Studio desktop/Start shortcuts with health-check launcher * chore(windows): bundle sloth.ico and set shortcut icons when valid * chore(windows):add images/sloth.ico * fix(windows): guard PSScriptRoot for Studio shortcut icon in iex installs * fix(install): high-DPI sloth.ico and relocate to studio/frontend/publi * chore(studio): update sloth.ico for clearer desktop and shell icons * chore(studio): use unsloth.ico for Studio shortcut icon * feat(windows): improve Studio shortcut launcher (fast health + browser UX) * fix(windows): stable unsloth.ico URL and Unicode-safe Studio launcher scripts * fix(windows): escape $ in exe path and write launcher UTF-8 with BOM * fix(windows): skip shortcuts when Desktop or APPDATA paths are missing * fix(install): log shortcut/icon/port failures and warn early on missing paths * fix(install): guard missing LOCALAPPDATA before shortcut paths * fix(install): harden New-StudioShortcuts and improve success messaging * fix(install): include port 8908 in studio health check * fix(install): fix launch-studio.ps1 quoting * Fix launcher edge cases and normalize indentation in install.ps1 - Handle silent timeout: show a message when Studio is still starting but did not become healthy within the timeout, instead of exiting with no feedback - Add -NoProfile to the visible PowerShell terminal launch so the user profile cannot hang or error before Studio runs - Add a named mutex (Local\UnslothStudioLauncher) to prevent double-click from spawning duplicate terminals; second instance polls for health and opens the browser when ready - Normalize indentation inside New-StudioShortcuts outer try block from mixed 8/12-space to consistent 12-space * Simplify Get-CandidatePorts port dedup with Sort-Object -Unique Replace the foreach/-notcontains loop with a single pipeline: $ports = (@($basePort) + $listening) \| Sort-Object -Unique * Harden health probe and 
handle abandoned mutex in launcher - Test-StudioHealth now checks resp.service == 'Unsloth UI Backend' to avoid fingerprinting collisions with other local services on the same port range. - Wrap the mutex WaitOne(0) call in a try/catch for AbandonedMutexException so the launcher recovers gracefully when a previous instance was killed while holding the mutex. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-24 23:41:02 -07:00
Krishna Chaitanya	9b989ee898	fix: prevent UnicodeEncodeError on Windows CP1252 consoles in studio setup (#4563 ) * fix: prevent UnicodeEncodeError on Windows CP1252 consoles in studio setup On Windows, `unsloth studio setup` crashes with a UnicodeEncodeError when install_python_stack.py tries to print Unicode status glyphs (✅, ❌, ⚠️) to a console that uses a legacy code page like CP1252. Add a _safe_print() helper that catches UnicodeEncodeError and gracefully degrades emoji to ASCII equivalents ([OK], [FAIL], [!]). Replace all print() calls that emit Unicode glyphs with _safe_print(). Fixes #4509 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Replace Unicode dashes with ASCII in install_python_stack.py Box-drawing (U+2500) and em dash (U+2014) chars in section dividers and comments are themselves not representable on CP1252 -- replace with plain ASCII dashes for consistency with the fix. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-24 22:04:09 -07:00
TR-3B	8c94b461fb	Add GRPO resume vLLM cleanup guard (#4411 ) * Add GRPO resume vLLM cleanup guard * Guard GRPO resume sleep on vLLM sleep mode * Harden GRPO resume vLLM cleanup guard - Wrap llm.sleep(1) in try/except so a failed sleep does not block training resume (best-effort cleanup) - Also check kwargs["model_path"] which transformers.Trainer.train() still accepts and normalizes to resume_from_checkpoint internally --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-24 21:37:45 -07:00
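The guard described in commit 8c94b461fb can be sketched as a best-effort helper. The function name, its parameters, and the way sleep mode is passed in are assumptions for illustration; only the two checked keyword names and the `llm.sleep(1)` call come from the commit message.

```python
def maybe_sleep_vllm_before_resume(llm, sleep_mode_enabled, train_kwargs):
    """Best-effort cleanup before resuming; returns True if sleep() succeeded."""
    # Trainer.train() accepts resume_from_checkpoint and the older model_path.
    resuming = bool(
        train_kwargs.get("resume_from_checkpoint") or train_kwargs.get("model_path")
    )
    if not (resuming and sleep_mode_enabled and llm is not None):
        return False
    try:
        llm.sleep(1)
    except Exception:
        # A failed sleep must not block the training resume.
        return False
    return True
```

The helper never raises, so a caller can invoke it unconditionally at the top of a resume path.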
Wasim Yousef Said	085f9529b6	Regroup chat settings sidebar into focused sections (#4551 ) * feat(chat): regroup settings sidebar into Model, Sampling, Tools, and Preferences sections Split the monolithic Settings collapsible into focused sections with icons. Model section shows context length and KV cache dtype for GGUF models, trust remote code for non GGUF. Tools section groups auto heal, max tool calls, and tool call timeout. Preferences section holds auto title toggle. * feat(chat): persist collapsible section open/closed state in localStorage Remember which sections the user expanded or collapsed across sidebar toggles, mobile sheet reopens, and browser sessions. * fix(chat): harden collapsible state persistence and restore defaultOpen - Validate localStorage values are booleans before using them, preventing corrupted entries like string "false" from being treated as truthy - Use Object.hasOwn() instead of `in` operator to avoid prototype chain matches on keys like "constructor" or "toString" - Restore defaultOpen={true} on Model and Preferences sections so they are expanded on first visit, matching the old Settings section behavior - Fix misleading Context Length description to reflect it is read-only - Downgrade console.error to console.warn for non-critical localStorage parse failures * fix(chat): remove redundant disabled styles on Context Length input The Input component already applies opacity-50 and cursor-not-allowed via its disabled: variants. Specifying them unconditionally in the className is redundant. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-24 19:39:27 -07:00
Daniel Han	acc881452f	fix: pin unsloth>=2026.3.11 in install.sh and install.ps1 (#4556 ) Ensures both install scripts always pull a version that has the litellm removal fix. Without the pin, stale uv/pip caches could resolve the older 2026.3.10 which still had litellm in data-designer-deps.txt, causing setup to fail at step 8/11 while PyPI has litellm quarantined.	2026-03-24 07:44:07 -07:00
Daniel Han	76a2f17470	fix(studio): remove litellm dep (quarantined on PyPI) (#4553 ) litellm has been quarantined on PyPI due to a supply chain attack in version 1.82.8 (malicious credential-stealing .pth file). No versions are currently installable, which blocks `unsloth studio setup` at step 8/11 (data-designer deps). Remove litellm from the single-env data-designer requirements so setup completes. litellm can be re-added once PyPI lifts the quarantine. Ref: https://github.com/BerriAI/litellm/issues/24512	2026-03-24 07:10:26 -07:00
Daniel Han	fac6f7887e	Versioning	2026-03-24 06:50:36 -07:00
Daniel Han	95d2748278	fix: give @0xKushwaha git history credit for completion_only_loss fix (#4552 ) * Revert "fix: handle prompt/completion datasets in slow-path BOS detection (#4548)" This reverts commit `fca83182af`. * fix: support completion_only_loss=True with prompt/completion dataset columns When completion_only_loss=True, TRL rejects formatting_func but Unsloth's patched _prepare_dataset/_prepare_non_packed_dataloader assumed either formatting_func or dataset_text_field was always set, causing a catch-22. Now handles prompt/completion columns as a third case for BOS token detection, with a safe None fallback for all other cases. (cherry picked from commit `978f78c6f1`) * fix: handle prompt/completion datasets in slow-path BOS detection The slow-path check_text blocks in rl_replacements.py and tokenizer_utils.py crash when a prompt/completion dataset is used because they unconditionally access dataset[0][dataset_text_field] even when the dataset does not have a text field. This fixes both files to: - Default dataset_text_field to None instead of raising when undefined - Detect prompt/completion columns and concatenate them for BOS check - Guard with isinstance(str) on both prompt and completion to handle conversational format (list of dicts) by setting test_text to None - Add test_text is not None guard on has_bos_token_already to prevent AttributeError on NoneType.startswith() This is the slow-path complement to unslothai/unsloth-zoo#560 which fixes the fast-path in sft_prepare_dataset. Closes #4486 (cherry picked from commit `b6ce5786d0`) * fix: preserve chat_template BOS check when test_text is None The has_bos_token_already guard wrapped both test_text.startswith() and bos_token in chat_template with test_text is not None, which disabled the chat_template BOS detection for conversational datasets where test_text is set to None. 
Split the guard so test_text is not None only applies to the startswith() call, while bos_token in chat_template is always checked. (cherry picked from commit `40bd8b8917`) --------- Co-authored-by: Ayush Kushwaha <148432773+ayushkushwaha240@users.noreply.github.com>	2026-03-24 06:38:57 -07:00
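The split guard from commit 95d2748278 can be sketched as below. Both function names and signatures are illustrative, and `bos_token` is assumed to be a non-empty string; the logic follows the commit text: prompt/completion columns are concatenated only when both are strings, and the chat-template check runs even when `test_text` is None.

```python
def pick_test_text(example, dataset_text_field=None):
    """Choose the text used for BOS detection, or None if none is available."""
    if dataset_text_field is not None and dataset_text_field in example:
        return example[dataset_text_field]
    prompt = example.get("prompt")
    completion = example.get("completion")
    # Conversational datasets store lists of dicts here, so require strings.
    if isinstance(prompt, str) and isinstance(completion, str):
        return prompt + completion
    return None


def has_bos_token_already(test_text, bos_token, chat_template):
    """True if the BOS token is already present in the text or the template."""
    # Only the startswith() call depends on test_text being available.
    starts_with_bos = test_text is not None and test_text.startswith(bos_token)
    # The chat-template check is always performed.
    in_template = chat_template is not None and bos_token in chat_template
    return starts_with_bos or in_template
```

Keeping the two conditions separate means a conversational dataset, where `test_text` is None, still benefits from the chat-template check.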
Daniel Han	fca83182af	fix: handle prompt/completion datasets in slow-path BOS detection (#4548 ) * fix: handle prompt/completion datasets in slow-path BOS detection The slow-path check_text blocks in rl_replacements.py and tokenizer_utils.py crash when a prompt/completion dataset is used because they unconditionally access dataset[0][dataset_text_field] even when the dataset does not have a text field. This fixes both files to: - Default dataset_text_field to None instead of raising when undefined - Detect prompt/completion columns and concatenate them for BOS check - Guard with isinstance(str) on both prompt and completion to handle conversational format (list of dicts) by setting test_text to None - Add test_text is not None guard on has_bos_token_already to prevent AttributeError on NoneType.startswith() This is the slow-path complement to unslothai/unsloth-zoo#560 which fixes the fast-path in sft_prepare_dataset. Closes #4486 * fix: preserve chat_template BOS check when test_text is None The has_bos_token_already guard wrapped both test_text.startswith() and bos_token in chat_template with test_text is not None, which disabled the chat_template BOS detection for conversational datasets where test_text is set to None. Split the guard so test_text is not None only applies to the startswith() call, while bos_token in chat_template is always checked.	2026-03-24 05:27:59 -07:00
Michael Han	a41dbb6ab2	Add r/unsloth Reddit.md	2026-03-24 04:13:38 -07:00
Michael Han	381f509695	Adding Qwen3.5 RL.md	2026-03-24 04:06:23 -07:00
Wasim Yousef Said	c8057d911b	fix: system prompt ignored in unsloth inference (#4528 ) * fix: system prompt was dropped in unsloth text and vision inference * refactor: simplify system prompt message construction * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: use multimodal typed content parts for vision system message and add fallback The system message content must use typed content parts ([{"type": "text", "text": ...}]) instead of a plain string to match the multimodal processor contract (consistent with the audio path). Plain strings cause some processors (e.g. LLaVA) to silently drop the system prompt. Also wraps processor.apply_chat_template in try/except so models that reject the system role gracefully fall back to no system message with a warning log. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: capture and log original exception in vision system prompt fallback --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-24 04:01:33 -07:00
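The message shape and fallback described in commit c8057d911b might look like the sketch below. The helper names are hypothetical, `apply_chat_template` is passed in as a plain callable so the example does not depend on any particular processor, and the `{"type": "image"}` part is an assumed placeholder for the image input.

```python
import logging

logger = logging.getLogger(__name__)


def build_vision_messages(system_prompt, user_text):
    """Build messages using typed content parts rather than plain strings."""
    messages = []
    if system_prompt:
        messages.append(
            {"role": "system", "content": [{"type": "text", "text": system_prompt}]}
        )
    messages.append(
        {
            "role": "user",
            "content": [{"type": "image"}, {"type": "text", "text": user_text}],
        }
    )
    return messages


def apply_template_with_fallback(apply_chat_template, messages):
    """Retry without the system message if the template rejects it."""
    try:
        return apply_chat_template(messages)
    except Exception as exc:
        # Log the original exception so the cause of the fallback is visible.
        logger.warning("Chat template rejected the system message: %s", exc)
        without_system = [m for m in messages if m["role"] != "system"]
        return apply_chat_template(without_system)
```

A model whose template accepts the system role takes the first path; one that rejects it still produces a prompt, just without the system message.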
Wasim Yousef Said	3dc212e218	fix: always show chat tool icons (#4525 ) * fix: always show chat tool icons, gray out when model doesn't support them Tool icons (Think, Search, Code) were hidden unless a model was loaded and supported those features. Now they're always visible so users can see and pre-select them. If a loaded model doesn't support a feature, the button gets grayed out and disabled instead of being removed. * refactor: centralize Qwen thinking params in store * fix: disable tool buttons when no model is loaded Change disabled condition from `modelLoaded && !supportsX` to `!modelLoaded \|\| !supportsX` so buttons are grayed out both when no model is loaded and when the loaded model lacks the capability. * Fix Qwen3 param clobbering and restore SuggestionItem capability guards - Revert setReasoningEnabled() in the store to a pure boolean setter. Moving the Qwen3 param logic into it caused reconnect/load/refresh paths (which also call setReasoningEnabled) to silently overwrite user-customized or server-provided temperature/topP/topK/minP. - Restore applyQwenThinkingParams() as a standalone function called only from explicit user toggle click handlers in thread.tsx and shared-composer.tsx, matching the pre-PR behavior. - Re-add supportsReasoning/supportsTools guards in the SuggestionItem click handler so that clicking a suggestion card only activates tool toggles the loaded model actually supports. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-24 03:26:56 -07:00
Daniel Han	77b21333fb	fix(studio): restore scroll lock on reasoning panel collapse (#4545 ) PR #4543 removed useScrollLock from ReasoningRoot, causing the thread viewport to jump when a user collapses a reasoning panel. Restore the hook to freeze scrollTop during the 200ms collapse animation, matching the pattern used by tool-fallback.tsx and tool-group.tsx.	2026-03-24 02:27:06 -07:00
Wasim Yousef Said	1129ea44bc	fix(studio): show Windows-specific reset-password command on login error (#4529 )	2026-03-23 23:04:00 -07:00
Daniel Han	5916bcb2e3	Fix Studio port conflict detection for loopback addresses (#4532 ) * Fix port conflict detection when loopback address is held by another process * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use getaddrinfo for IPv6 host support, restore emojis in terminal output * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Guard against conn.pid being None in _get_pid_on_port psutil.net_connections() can return entries with pid=None when the current user lacks privileges to see the owning process (common on macOS without root, Windows without admin, and some Linux configs). psutil.Process(None) does not raise -- it silently returns the current process, which would make the warning incorrectly blame Unsloth Studio itself for blocking the port. Skip entries with pid=None so the caller falls back to the generic "port is already in use" message instead. * Update studio/backend/run.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-23 22:34:47 -07:00
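The `pid=None` guard from commit 5916bcb2e3 can be shown without psutil by using a simplified connection record. `Conn` and `get_pid_on_port` are hypothetical names for this sketch; the real function works on psutil's connection entries.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a psutil connection entry.
Conn = namedtuple("Conn", ["port", "pid"])


def get_pid_on_port(connections, port):
    """Return the owning PID for a port, or None if it cannot be determined."""
    for conn in connections:
        if conn.port != port:
            continue
        if conn.pid is None:
            # The owner is hidden without privileges; do not guess.
            continue
        return conn.pid
    return None
```

Returning None lets the caller fall back to the generic "port is already in use" message instead of blaming the wrong process.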
Lee Jackson	45e4a0473a	studio: stop scroll hijack during generation and fix thinking panel layout shift (#4543 ) * fix(chat): stabilize thinking panel and thread scroll during generation * fix: match ChatGPT scroll and thinking panel behavior - Remove autoScroll={false} from thread viewport to restore default follow-scroll during streaming (pauses when user scrolls up, resumes at bottom) - Rewrite reasoning panel state: auto-opens on stream start, user can close during streaming, auto-collapses when reasoning ends, user can re-expand after collapse --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-23 22:33:46 -07:00
Lee Jackson	01d7dce3f4	studio: persist system prompt and preset settings across navigation (#4538 ) * fix(studio): harden system prompt persistence and storage fallback * Exclude checkpoint from localStorage persistence for PR #4538 checkpoint is backend-owned state -- refresh() already syncs it from getInferenceStatus() on every page load. Persisting it to localStorage causes a stale model ID to survive across backend restarts, which prevents auto-load from triggering when no model is actually loaded. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-23 22:21:04 -07:00
金黄色葡萄球君君	2b330e2f24	fix: store embedding_learning_rate on self in UnslothTrainingArguments (#4531 ) Fixes #4492 The embedding_learning_rate parameter was assigned to a local variable instead of self.embedding_learning_rate, causing UnslothTrainer.create_optimizer() to always get None via getattr and silently fall back to a single param group. Bug: embedding_learning_rate = embedding_learning_rate (no-op) Fix: self.embedding_learning_rate = embedding_learning_rate	2026-03-23 21:08:29 -07:00
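The bug fixed in commit 2b330e2f24 is easy to reproduce in isolation. The two classes below are minimal stand-ins rather than the real `UnslothTrainingArguments`; they show why a `getattr` lookup with a None default hides the problem.

```python
class BuggyArgs:
    def __init__(self, embedding_learning_rate=None):
        # Assigns the parameter to itself: nothing is stored on the instance.
        embedding_learning_rate = embedding_learning_rate


class FixedArgs:
    def __init__(self, embedding_learning_rate=None):
        # Stores the value on the instance so later lookups can find it.
        self.embedding_learning_rate = embedding_learning_rate


def lookup_embedding_lr(args):
    """Mimic a getattr-with-default lookup like the one the commit mentions."""
    return getattr(args, "embedding_learning_rate", None)
```

With `BuggyArgs(5e-6)` the lookup returns None, so an optimizer builder relying on it would silently use a single parameter group; with `FixedArgs(5e-6)` the configured rate is returned.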
pre-commit-ci[bot]	a5be6904a6	[pre-commit.ci] pre-commit autoupdate (#4542 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.15.6 → v0.15.7](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.6...v0.15.7) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-23 14:55:27 -07:00
Datta Nimmaturi	cd65584f19	Update issue template	2026-03-23 10:10:15 +05:30
Daniel Han	1ecb55faa2	Update _utils.py	2026-03-22 08:23:40 -07:00
Daniel Han	797ddd201e	Fix Studio silently exiting on Windows without error output (#4527 ) * Fix Studio silently exiting on Windows without error output On Windows, `unsloth studio` launches a child process via subprocess.Popen to run the server in the studio venv. If the child crashes (e.g. due to a missing package), the parent just calls typer.Exit(rc) with no message -- the user sees "Launching Unsloth Studio... Please wait..." and then the prompt returns with zero feedback. Root cause: `data_designer_unstructured_seed` is imported at the top level in seed.py. If this package is not installed in the studio venv, the entire import chain (seed.py -> routes/__init__.py -> main.py -> run_server()) crashes with ModuleNotFoundError. Since run.py has no try/except around run_server() and studio.py does not report nonzero exit codes, the failure is completely silent. Changes: - run.py: wrap run_server() in try/except, print clear error with traceback to stderr. Also reconfigure stderr encoding on Windows so tracebacks with non-ASCII paths do not cause secondary failures. - studio.py: print an error message when the child process exits with a nonzero code on Windows, so the user knows something went wrong. - seed.py: make data_designer_unstructured_seed import optional with a try/except fallback. The server starts normally and only returns HTTP 500 if the unstructured seed endpoints are actually called. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Skip Anaconda/Miniconda Python when creating Studio venv on Windows Conda-bundled CPython ships modified DLL search paths that prevent torch from loading c10.dll on Windows. The Studio server fails silently at startup because the venv was created with conda's Python. Standalone CPython (python.org, winget, uv) does not have this issue. 
Both install.ps1 and setup.ps1 now skip any Python binary whose path contains conda, miniconda, anaconda, miniforge, or mambaforge when selecting the interpreter for the studio venv. If only conda Python is available, the scripts print an error with instructions to install standalone CPython. * Fix multi-file preview crash and improve setup.ps1 Python discovery Addresses review findings [10/10] and [8/10]: 1. seed.py: _read_preview_rows_from_multi_files() had a hard import of build_multi_file_preview_rows inside the function body, bypassing the optional-plugin guard. Moved it into the top-level try/except block and added a None guard matching the other functions. 2. setup.ps1: Python discovery now probes py.exe (Python Launcher) first, uses Get-Command -All to look past conda entries that shadow standalone CPython further down PATH, skips WindowsApps stubs, and resolves the actual executable path so venv creation does not re-resolve back to a conda interpreter. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Check sys.base_prefix to catch venvs created from conda Python A venv created from conda Python (e.g. C:\Users\danie\.venv) has a path that does not contain "conda", but sys.base_prefix still points to the conda install (e.g. C:\Users\danie\miniconda3). The previous path-only check missed this case entirely. Both install.ps1 and setup.ps1 now use a Test-IsConda helper that checks both the executable path AND sys.base_prefix against the conda/miniconda/anaconda/miniforge/mambaforge pattern. This catches: - Direct conda Python executables - Venvs created from conda Python (base_prefix reveals the origin) * Fix install.ps1 passing version string to uv venv instead of resolved path Find-CompatiblePython returned a bare version string (e.g. "3.13") which was passed to `uv venv --python 3.13`. 
uv performs its own interpreter discovery and can resolve that version string back to a conda Python, defeating the entire conda-skip logic. Now Find-CompatiblePython returns a hashtable with both .Version (for display) and .Path (the resolved absolute executable path). The venv is created with `uv venv --python <absolute-path>`, ensuring uv uses the exact interpreter we validated. * Quote resolved Python path in uv venv call for paths with spaces --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-22 08:23:03 -07:00
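The two-part conda check from commit 797ddd201e is implemented in PowerShell (`Test-IsConda`); the Python sketch below only mirrors the idea. The function name and regular expression are assumptions, built from the distribution names listed in the commit message.

```python
import re

# Distribution names listed in the commit message.
_CONDA_PATTERN = re.compile(
    r"conda|miniconda|anaconda|miniforge|mambaforge", re.IGNORECASE
)


def is_conda_python(executable, base_prefix):
    """True if the executable path or its base prefix points at a conda install."""
    # A venv created from conda has a clean-looking executable path, so the
    # base prefix must be checked as well.
    return any(
        _CONDA_PATTERN.search(path or "") for path in (executable, base_prefix)
    )
```

A substring match like this can misfire on unrelated paths that happen to contain "conda" (a user name, for example), which is a trade-off of the path-pattern approach.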
Daniel Han	866cb33ce0	Update _utils.py	2026-03-22 06:14:35 -07:00
NuoFang	4cedeba8c2	fix(studio): prevent ModuleNotFoundError in dataset.map() on Windows (#4473 ) * fix(studio): prevent ModuleNotFoundError in dataset.map() on Windows On Windows, dataset.map() uses "spawn", which requires workers to import compiled modules from disk. Previously, clear_unsloth_compiled_cache() deleted the entire directory, causing workers to crash when looking for UnslothSFTTrainer.py. Changes: 1. Added `preserve_patterns` to cache cleanup to keep `UnslothTrainer.py` on Windows while clearing model-specific files. 2. Added the cache directory to PYTHONPATH for spawn workers. Linux/macOS behavior is unchanged. [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix spawn-platform coverage, CWD path mismatch, and race condition for PR #4473 - Extend platform guard from win32-only to include macOS (also uses spawn since Python 3.8, same ModuleNotFoundError would occur) - Replace fragile CWD-based PYTHONPATH registration with centralized register_compiled_cache_on_path() that uses the same __file__-relative _CACHE_DIRS already used by cache_cleanup -- fixes path mismatch when studio is launched from a directory other than the repo root - Move PYTHONPATH registration to the top of _train_worker(), before any dataset.map() call (previously it ran late in config assembly, after dataset formatting which also calls dataset.map()) - Update inference.py model-unload to preserve trainer files on spawn platforms, preventing a race where unloading a model via inference tab would delete UnslothSFTTrainer.py while training workers are importing it * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix cache-dir precedence reversal in register_compiled_cache_on_path() Iterating _CACHE_DIRS in forward order while calling insert(0) each time reverses the declared priority: later entries shadow earlier ones. 
When multiple compiled-cache directories exist, spawned workers could import a stale trainer from the wrong cache. Fix: iterate in reverse so that the highest-priority entry (first in _CACHE_DIRS) is inserted last and ends up at position 0 in sys.path and PYTHONPATH. * fix: harden worker-count helpers against cpu_count=None and desired<=0 - safe_num_proc: guard os.cpu_count() with `or 1`, clamp multi-GPU path with max(1, min(4, desired)), clamp return with max(1, desired) - safe_thread_num_proc: same os.cpu_count() guard and return clamp - Add regression tests (31 L1 unit + 10 sandbox edge-case tests) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove regression tests from PR --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-22 06:11:24 -07:00
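The precedence fix in commit 4cedeba8c2 comes down to list-insertion order. The sketch below uses a plain list in place of `sys.path`, and `register_paths` is a hypothetical name; it is not the body of `register_compiled_cache_on_path()`.

```python
def register_paths(cache_dirs, search_path):
    """Prepend cache_dirs to search_path while keeping their declared priority."""
    # Iterate in reverse so the first (highest-priority) entry is inserted
    # last and therefore ends up at position 0.
    for directory in reversed(cache_dirs):
        if directory in search_path:
            search_path.remove(directory)  # avoid duplicate entries
        search_path.insert(0, directory)
    return search_path
```

Iterating forward with `insert(0, ...)` would leave the last entry of `cache_dirs` at the front, which is the reversal the commit corrects.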
Daniel Han	62e3de181f	Update weather dashboard suggestion to request HTML code output (#4523 ) The previous prompt "Show me a live weather dashboard, no API key needed" was too vague. The new wording explicitly asks for HTML code, which produces more useful and consistent responses.	2026-03-22 06:09:48 -07:00
Leo Borcherding	71c77d4e96	fix(install.ps1): fix non-NVIDIA package resolution — split torch+unsloth install (#4515 ) * fix(install.ps1): split torch+unsloth install to fix non-NVIDIA package resolution --torch-backend=auto on a non-NVIDIA Windows machine causes uv to resolve unsloth==2024.8 (pre-CLI, no unsloth.exe). Fix: detect GPU robustly (PATH + hardcoded fallback paths, mirrors setup.ps1), install torch first with an explicit --index-url (CUDA variant for NVIDIA, CPU for everyone else), then install unsloth separately without --torch-backend so the solver always picks a modern release that ships the Studio CLI. Closes the remaining gap flagged in #4478. * fix(install.ps1): align warning with setup.ps1, add --upgrade, handle CUDA 11.x - Match the no-GPU warning message to studio/setup.ps1 wording (chat-only GGUF mode, driver download link) - Add CUDA 11.x floor check in Get-TorchIndexUrl so old drivers fall back to CPU wheels instead of silently getting cu124 - Log a warning when nvidia-smi output cannot be parsed - Add --upgrade to both uv pip install calls so re-runs pick up newer package versions * revert --upgrade from uv pip install calls uv pip install already resolves to the latest satisfying version; --upgrade is unnecessary and could force unwanted re-installs. * fix: replace frozen cu124 fallbacks with cu126, guard CUDA 11.x cu124 wheels are frozen at torch 2.6.0 -- falling back to them pins users to an outdated PyTorch. Three issues fixed in both install.ps1 and setup.ps1: 1. CUDA 12.0-12.5 now maps to cu126 (was cu124). 2. CUDA 11.x and older now falls back to cpu (was cu124, which would silently install incompatible GPU wheels). 3. Parse-failure and no-nvidia-smi fallbacks updated to cu126/cpu. Adds tests/test_cuda_wheel_mapping.py covering the mapping logic, nvidia-smi parsing, PS1 file sync, PyTorch index URL validation, and sandbox torch installs. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove test file from PR branch Test file kept locally, not needed in the PR. * fix: map CUDA 11.x to cu118 instead of cpu PyTorch still publishes cu118 wheels (up to torch 2.7.1), so CUDA 11.x users get GPU-accelerated torch rather than being forced to CPU-only. Only CUDA 10.x and older fall back to cpu. * fix: revert CUDA 12.0-12.5 to cu124, handle cpu tag in setup.ps1 CUDA 12.0-12.5 drivers only support up to their reported CUDA version, so cu126 wheels (built with CUDA 12.6) fail to load. Revert the catch- all for 12.0-12.5 back to cu124. Also fix setup.ps1 caller: when Get-PytorchCudaTag returns "cpu" (e.g. CUDA 10.x driver), the installer now correctly skips Triton and prints "CPU-only" instead of "CUDA support (cpu)". * fix: add --upgrade to unsloth install for stale venv repair On reruns against an existing venv, uv pip install unsloth makes no changes if unsloth==2024.8 is already installed (it satisfies the constraint). Adding --upgrade only to the unsloth install ensures stale installs get repaired without forcing a multi-GB torch re-download. * fix: use --upgrade-package to avoid clobbering torch CUDA wheels `--upgrade unsloth` re-resolves torch from default PyPI, stripping the +cuXXX suffix installed in step 1. `--upgrade-package unsloth unsloth` upgrades only unsloth (and pulls missing deps like transformers, trl) while preserving the pinned torch from the CUDA-specific index. * docs: explain why split-install and --upgrade-package are needed Expand the inline comment block to document both design decisions: 1. Why torch is installed separately (solver fallback to 2024.8) 2. 
Why --upgrade-package is used instead of --upgrade (preserves CUDA wheels) --------- Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-22 05:41:58 -07:00
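The CUDA-to-wheel mapping that this commit series converges on can be sketched roughly as follows. This is a minimal Python illustration, not the PowerShell in install.ps1 or setup.ps1; the function names are hypothetical, and the branch for CUDA 12.6 and newer is an assumption, since the log above only states the 10.x, 11.x and 12.0-12.5 cases.

```python
import re


def parse_cuda_version(nvidia_smi_output):
    """Return (major, minor) from nvidia-smi header text, or None if not found."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_wheel_tag(version):
    """Map a driver-reported CUDA version to a PyTorch wheel tag."""
    if version is None:
        # No NVIDIA GPU detected: CPU wheels.
        return "cpu"
    major, minor = version
    if major <= 10:
        return "cpu"      # CUDA 10.x and older fall back to CPU
    if major == 11:
        return "cu118"    # cu118 wheels are still published
    if (major, minor) <= (12, 5):
        return "cu124"    # newer cu126 wheels would not load on these drivers
    return "cu126"        # ASSUMPTION: not stated in the log for 12.6+


def torch_index_url(tag):
    """Build the explicit --index-url used for the separate torch install."""
    return f"https://download.pytorch.org/whl/{tag}"
```

Keeping the mapping in one pure function makes it easy to test without a GPU, which is presumably what the mapping tests mentioned in the commit were for.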
Daniel Han	100b8857f2	Fix Studio crash on Anaconda/conda-forge Python (#4484 ) * Fix Studio crash on Anaconda Python due to platform._sys_version() parse failure Anaconda and conda-forge modify sys.version to include distributor metadata between pipe characters, e.g.: 3.12.4 \| packaged by Anaconda, Inc. \| (main, ...) [MSC v.1929 ...] Python's platform._sys_version() has a hardcoded regex that cannot parse this format, raising ValueError. CPython closed this as "not planned" (cpython#102396) since Anaconda modified the binary. This breaks the import chain: run.py -> structlog -> rich -> attrs, which calls platform.python_implementation() at module scope. Fix: before any library imports, strip the pipe segments, parse the cleaned version string via the standard parser, and cache the result under the original sys.version key so all subsequent platform calls hit the cache. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add defensive fallback for unpaired pipe edge cases in version patch Address Gemini review suggestion: if the paired-pipe regex leaves residual pipes (hypothetical single-pipe distributor metadata), fall back to extracting the version number and the parenthesized build info directly. Wrap the entire patch in try/except so unexpected version string formats degrade gracefully instead of crashing the patch itself. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor into shared _platform_compat module, cover colab.py entrypoint Address reviewer feedback: 1. Extract the Anaconda/conda-forge sys.version fix into a shared _platform_compat.py module that wraps platform._sys_version() with a retry-on-ValueError fallback. This is more robust than cache-seeding because it handles all future platform._sys_version() calls, not just the first one. 2. 
Import the fix from both run.py and colab.py entrypoints, so Studio no longer crashes on Anaconda Python regardless of the launch path. 3. The wrapper is idempotent (guarded by a flag) and handles edge cases: paired pipes (Anaconda, conda-forge), unpaired pipes (hypothetical), and standard CPython strings (no-op since ValueError is never raised). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Replace monkey-patch with cache-prime, fix colab.py duplicate sys.path, cover main.py - Rewrite _platform_compat.py: replace function-wrapping monkey-patch with one-shot cache seed (_seed_sys_version_cache). Parses cleaned sys.version once and seeds platform._sys_version_cache so the stdlib parser never sees the problematic Anaconda/conda-forge pipe-delimited string. No function replacement, no idempotency flag, no reload edge cases. - colab.py: remove duplicate backend_path sys.path insertion after _bootstrap_studio_venv(). The early insertion (before _platform_compat import) already covers it. This also fixes backend/ ending up behind venv site-packages in sys.path ordering. - run.py: move PYTHONWARNINGS=ignore before _platform_compat import to preserve original intent of suppressing warnings early. - main.py: add sys.path + _platform_compat import before route imports, covering the direct `uvicorn main:app` launch path. - Add test_platform_compat.py with 7 tests covering Anaconda, conda-forge, and standard CPython version strings, plus the loggers import chain. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove test_platform_compat.py from PR * Handle Format B conda-forge version strings with duplicate paren groups Some conda-forge builds produce sys.version with the build info both before and after the pipe label (e.g. "3.9.7 (default, ...) \| packaged by conda-forge \| (default, ...) \n[GCC 7.5.0]"). After stripping the pipe segment, two consecutive (...) 
groups remain, which still fails platform._sys_version(). Add a second regex pass to drop the duplicate paren group. * Guard _sys_version call with try/except to avoid making things worse If the cleaned version string is still unparseable by the stdlib regex (e.g. nested parens, exotic multi-pipe formats), silently give up instead of letting ValueError propagate at import time -- which would be a worse crash than the original deferred one. --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-22 05:36:55 -07:00
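A minimal sketch of the cache-seeding approach described in this commit, assuming the two regex passes work roughly as the messages say. The function names and exact patterns here are illustrative, not the contents of _platform_compat.py, and the sketch does not cover the unpaired-pipe fallback. `platform._sys_version` and `platform._sys_version_cache` are private CPython internals, so they are looked up defensively.

```python
import platform
import re
import sys


def clean_sys_version(version):
    """Strip distributor metadata that the stdlib version parser rejects."""
    # Pass 1: drop paired-pipe segments such as "| packaged by conda-forge |".
    cleaned = re.sub(r"\s*\|[^|]*\|\s*", " ", version)
    # Pass 2: if two "(...)" groups now sit back to back, keep only the first.
    cleaned = re.sub(r"(\([^()]*\))\s*\([^()]*\)", r"\1", cleaned, count=1)
    return cleaned


def seed_sys_version_cache():
    """Parse the cleaned string once and cache it under the original key."""
    cache = getattr(platform, "_sys_version_cache", None)
    parser = getattr(platform, "_sys_version", None)
    if cache is None or parser is None or "|" not in sys.version:
        return False  # nothing to do on standard CPython builds
    try:
        cache[sys.version] = parser(clean_sys_version(sys.version))
    except Exception:
        # Give up silently rather than crash at import time.
        return False
    return True
```

A launcher would call `seed_sys_version_cache()` before importing anything that calls `platform.python_implementation()` at module scope.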
Andrew Barnes	2c5d3c48ec	fix: subprocess crash during map operation on Windows (#4507 ) * fix: handle Windows subprocess crash during dataset.map() Windows uses spawn (not fork) for multiprocessing. Spawned workers cannot resolve Unsloth's dynamically compiled cache modules from unsloth_compiled_cache/, causing ModuleNotFoundError and RuntimeError during dataset.map() tokenization. Add two platform-guarded patches for sys.platform == "win32": 1. Force HF_DATASETS_MULTITHREADING_MAX_WORKERS=1 and set spawn method 2. Monkey-patch Dataset.map() to force num_proc=None Fixes #4490 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review: extend spawn fix to macOS, add multiprocess fallback - Change platform checks from sys.platform == "win32" to sys.platform != "linux" so macOS (also spawn-based) is covered - Wrap multiprocess import in try/except falling back to stdlib multiprocessing when the multiprocess package isn't installed - Rename _win32_safe_map to _spawn_safe_map to reflect broader scope Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: replace global Dataset.map monkey-patch with targeted num_proc routing The previous approach had issues: Patch 1 set HF_DATASETS_MULTITHREADING_MAX_WORKERS and forced set_start_method (dead code on platforms already using spawn), and Patch 2 globally monkey-patched Dataset.map() (too broad, missed Dataset.filter()). Replace with a two-layer fix: 1. Studio layer: Add dataset_map_num_proc() that returns None on spawn platforms (Windows, macOS). Unlike num_proc=1 which still creates Pool(1) and spawns a worker, num_proc=None runs Dataset.map()/filter() truly in-process. Update all dataset.map() callsites to use it. 
ThreadPoolExecutor callers (format_conversion.py) keep using safe_num_proc() since threads are unaffected. 2. Root-cause layer: Propagate UNSLOTH_COMPILE_LOCATION via PYTHONPATH on spawn platforms so spawned workers can import compiled modules. Mirrors the .venv_t5 pattern in worker.py. Does not import unsloth_zoo.compiler (heavy torch/triton imports). Completely skipped on Linux. Also extend safe_num_proc() to return 1 on macOS (was only guarding Windows), and narrow the transformers 5.x dataloader guard from != "linux" to explicit ("win32", "darwin"). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: add safe_thread_num_proc() for ThreadPoolExecutor callsites safe_num_proc() correctly caps to 1 on macOS/Windows for process-based multiprocessing, but format_conversion.py reuses it for ThreadPoolExecutor workers. Threads share address space and are unaffected by spawn, so capping to 1 makes image URL downloads sequential -- a real regression. Add safe_thread_num_proc() that skips the platform guard but keeps the cpu_count heuristic, and switch both ThreadPoolExecutor callsites in format_conversion.py to use it. * fix: remove double-wrap in dataset_num_proc + fix num_proc=1 in datasets route - trainer.py:3009: Replace safe_num_proc(max(1, os.cpu_count() // 4)) with max(1, (os.cpu_count() or 1) // 4) to avoid double-wrapping inside dataset_map_num_proc which already calls safe_num_proc - trainer.py:15-20: Clarify comment on PYTHONPATH propagation - datasets.py:445: Change num_proc=1 to num_proc=None for 10-row preview slice (avoids unnecessary multiprocessing overhead) * fix: guard os.cpu_count() against None in worker-count helpers os.cpu_count() can return None on some platforms. Use (os.cpu_count() or 1) to prevent TypeError in safe_num_proc() and safe_thread_num_proc(). 
--------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-22 05:21:09 -07:00
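The three worker-count helpers named in this commit could look roughly like the sketch below. The helper names come from the commit messages; the signatures, the explicit `platform` parameter (added here for testability) and the "clamp to CPU count" heuristic are assumptions, not the Studio implementation.

```python
import os
import sys

# Platforms where multiprocessing uses spawn instead of fork.
SPAWN_PLATFORMS = ("win32", "darwin")


def safe_num_proc(requested, platform=sys.platform):
    """Worker count for process pools: capped to 1 on spawn platforms."""
    if platform in SPAWN_PLATFORMS:
        return 1
    # os.cpu_count() can return None, hence the "or 1".
    return max(1, min(requested, os.cpu_count() or 1))


def safe_thread_num_proc(requested):
    """Worker count for thread pools: no platform guard, threads share memory."""
    return max(1, min(requested, os.cpu_count() or 1))


def dataset_map_num_proc(requested, platform=sys.platform):
    """num_proc for Dataset.map()/filter(): None runs fully in-process."""
    if platform in SPAWN_PLATFORMS:
        return None
    return safe_num_proc(requested, platform)
```

Returning `None` rather than `1` is the point of the third helper: per the commit message, `num_proc=1` still creates a one-worker pool, while `None` avoids spawning a worker at all.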
DoubleMathew	4c1a6cb962	gate on min uv version and shortcut python candidate search if known (#4489 ) * gate on min uv version and shortcut python candidate search if known * fix sort -V cross compat issue, run_quiet early exit on llamacpp, autolaunch * update launch message * Fix PR comments * auto launch and find open port * remove dev install * Fix review findings: major-version guard, non-fatal port fallback, tty comment, restore local * Remove autolaunch, clean up dead state and debug noise - Remove find_open_port, TTY-gated autolaunch, and </dev/tty redirection from install.sh; just print launch instructions - Remove unused BEST_MAJOR variable from studio/setup.sh - Remove stray "finished finding best python" debug echo - Fix stale comment "below 3.12" to "below 3.11" * Reject prerelease uv at exact minimum version boundary * Remove 2>/dev/null from version_ge numeric comparisons Let non-numeric version parts surface errors on stderr instead of being silently swallowed. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-22 05:20:25 -07:00
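The minimum-version gate with prerelease rejection at the exact boundary can be sketched as below. This is a Python illustration of the idea only; the real check is shell code in the install scripts, and `parse_version`, the treatment of any trailing suffix as a prerelease, and the version numbers in the usage note are all assumptions.

```python
import re


def parse_version(text):
    """Split '0.7.14rc1' into ((0, 7, 14), 'rc1'); raise on non-numeric input."""
    match = re.match(r"(\d+(?:\.\d+)*)(.*)", text.strip())
    if match is None:
        # Surface bad input instead of silently swallowing it.
        raise ValueError(f"not a version: {text!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return numbers, match.group(2)


def version_ge(found, minimum):
    """True if `found` satisfies `minimum`, comparing numerically, not lexically."""
    found_numbers, found_suffix = parse_version(found)
    minimum_numbers, _ = parse_version(minimum)
    width = max(len(found_numbers), len(minimum_numbers))
    found_padded = found_numbers + (0,) * (width - len(found_numbers))
    minimum_padded = minimum_numbers + (0,) * (width - len(minimum_numbers))
    if found_padded == minimum_padded:
        # At the exact boundary, a prerelease suffix means "older than minimum".
        return found_suffix == ""
    return found_padded > minimum_padded
```

For example, with a hypothetical minimum of `0.7.14`, `version_ge("0.7.14rc1", "0.7.14")` is rejected while `version_ge("0.8.0", "0.7.14")` passes.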
Daniel Han	64f9f389a0	Created using Colab	2026-03-22 04:57:26 -07:00
Velsa	981f477e31	fix: reconfigure stdout to UTF-8 on Windows to prevent UnicodeEncodeError on startup (#4493 ) * fix: reconfigure stdout UTF-8 on Windows to prevent UnicodeEncodeError from emoji * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: default frontend_path when None to fix blank page when venv is pre-activated * Restore Windows UTF-8 stdout fix dropped in earlier commit The cp1252 console encoding on Windows cannot render emoji characters used in startup messages (e.g. print("✅ Frontend loaded ...")). This causes UnicodeEncodeError and crashes the server before it starts. Place sys.stdout.reconfigure(encoding="utf-8", errors="replace") at the top of run_server(), unconditionally before any print() or structlog call, so all emoji output is covered -- including the frontend status messages and silent=True paths that the original placement missed. Guarded by sys.platform == "win32" and hasattr check, so it is a no-op on Linux/macOS and safe in non-standard stdout environments (Jupyter, piped IO). * fix: preserve run_server(None) as headless, fix CLI frontend kwarg Remove the frontend_path=None fallback in run_server() that changed None from "headless/API-only" to "mount bundled frontend", breaking backwards compatibility for embedders. The blank-page bug was actually caused by the CLI wrappers always passing frontend_path=frontend (even when frontend=None), which overrode run_server()'s default. Fix studio.py and ui.py to only pass frontend_path when the user explicitly sets --frontend. * fix: use timeout loop for shutdown event in ui command Match studio_default()'s shutdown loop that uses a 1-second timeout on Event.wait(). Without a timeout, the bare wait() blocks at the C level on Linux, preventing Python from delivering SIGINT (Ctrl+C). 
--------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-22 04:49:59 -07:00
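Two of the fixes in this commit are small enough to sketch directly. The `sys.stdout.reconfigure(...)` call and its guards are quoted from the commit message; the wrapper function names and the `platform` parameter are hypothetical, added so the behaviour can be exercised on any OS.

```python
import sys
import threading


def ensure_utf8_stdout(platform=sys.platform):
    """Switch stdout to UTF-8 on Windows so emoji in log lines cannot crash startup."""
    if platform == "win32" and hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")
        return True
    return False  # no-op on Linux/macOS or on stdout objects without reconfigure


def wait_for_shutdown(event, poll_seconds=1.0):
    """Block until `event` is set, waking up regularly so Ctrl+C is delivered."""
    while not event.wait(timeout=poll_seconds):
        pass
```

The timeout in `wait_for_shutdown` is what matters: a bare `event.wait()` is the case the commit describes as blocking SIGINT delivery on Linux.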
Leo Borcherding	96edad9c95	PR: Fix/cuda minimum check and abort (#4517 ) * fix: add CUDA minimum version check and abort for llama.cpp (>= 12.4) - setup.ps1/setup.sh: abort with clear error if CUDA toolkit < 12.4 (llama.cpp requirement); link to cuda-toolkit-archive for upgrade - setup.ps1: promote CUDA VS integration copy failure from WARN to ERROR + exit 1; remove manual-copy hack instructions per Roland — correct fix is re-installing CUDA/MSBuild, not a manual workaround Fixes: https://github.com/unslothai/unsloth/issues/4437 Reported by: Sebastien * fix: wipe stale studio venv when torch CUDA tag changes When the NVIDIA driver is updated, the required PyTorch CUDA tag changes (e.g. cu124 -> cu130) but setup.ps1 was silently reusing the existing .venv, leaving the old torch wheel in place and breaking the UI for everyone on the next setup run. Before creating/reusing the venv, inspect the installed torch version string. If its CUDA tag does not match what the current driver requires, wipe the venv so we always get a clean, correct install. 
* Fix CUDA version check: portability, non-fatal fallback, stale venv detection - setup.sh: Replace grep -oP with POSIX sed for macOS compatibility - setup.sh: Replace exit 1 with NVCC_PATH="" to fall back to CPU-only build - setup.sh: Move version check before -DGGML_CUDA=ON append - setup.sh: Add else branch warning when nvcc version is unparseable - setup.ps1: Replace exit 1 with $NvccPath=$null for non-fatal CUDA fallback - setup.ps1: Add driver vs toolkit guidance in version warning - setup.ps1: Guard CUDA env/VS integration setup with if ($NvccPath) - setup.ps1: VS integration catch: downgrade to WARN, restore source/dest paths - setup.ps1: Stale venv: detect CPU torch and untagged wheels, not just +cuNNN - setup.ps1: Stale venv: rebuild on failed torch import - setup.ps1: Stale venv: wrap Remove-Item in try/catch for locked files * Remove incorrect CUDA >= 12.4 check, keep only stale venv detection llama.cpp has no hard minimum CUDA version -- it builds with CUDA as old as 11.2 and degrades features gracefully via #if CUDART_VERSION guards. The 12.4 figure was the default Docker/CI baseline, not a build requirement. 
Reverted: - CUDA version check in setup.sh (entirely removed) - CUDA version check in setup.ps1 (entirely removed) - VS integration catch block cosmetic changes (restored to main) - if ($NvccPath) guard around CUDA env setup (not needed without version check) Kept: - Stale venv detection in setup.ps1: detects torch CUDA tag mismatch (cu124 vs cu130, cpu vs cuXXX, broken torch import) and rebuilds venv * Fix stale venv detection: incomplete venvs, timeout, fatal delete failure - Add 30s timeout for torch import probe via ProcessStartInfo/WaitForExit - Use Test-Path -PathType Container to reject files masquerading as venv dir - Trigger rebuild when python.exe is missing (incomplete venv) - Make Remove-Item failure fatal ([ERROR] + exit 1) instead of warn-and-continue - Move $expectedTorchTag computation inside -not $shouldRebuild guard --------- Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-22 04:46:36 -07:00
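The stale-venv decision that survives in this commit can be illustrated as a comparison between the installed torch build tag and the tag the current driver calls for. This is a Python sketch of the logic only (the real check lives in setup.ps1); both function names are hypothetical, and `None` stands in for a torch import that failed or timed out.

```python
import re


def torch_build_tag(torch_version):
    """'2.6.0+cu124' -> 'cu124', '2.6.0+cpu' -> 'cpu', untagged '2.6.0' -> None."""
    match = re.search(r"\+(cu\d+|cpu)$", torch_version.strip())
    return match.group(1) if match else None


def venv_needs_rebuild(installed_torch_version, expected_tag):
    """Decide whether the existing venv should be wiped and recreated."""
    if installed_torch_version is None:
        # torch failed to import (or the probe timed out): treat as broken.
        return True
    # Mismatched CUDA tags, CPU wheels and untagged wheels all count as stale.
    return torch_build_tag(installed_torch_version) != expected_tag
```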
Daniel Han	bcf28466c2	fix: exclude .ipynb from ruff pre-commit hook (#4521) The ruff pre-commit hook runs on all file types by default, including .ipynb notebooks. Colab notebooks are authored in Colab's editor and can contain IPython magics (%cd, !git) that ruff cannot parse. This causes pre-commit.ci to fail on unrelated PRs when a notebook on main has syntax ruff does not understand. Add `exclude: '\.ipynb$'` to the ruff hook so notebooks are skipped.	2026-03-22 03:25:58 -07:00
Daniel Han	17dc83dc34	Created using Colab	2026-03-22 01:56:35 -07:00
Michael Han	d50e605f08	Update README.md	2026-03-21 14:55:18 -07:00
Sridhar Nandigam	a20c824711	FIX: Broken link to NVIDIA DataDesigner in README (#4500)	2026-03-21 14:41:09 -07:00
Wasim Yousef Said	50cccfd55e	feat(chat): server-side timings, context display & source hover cards (#4467 ) * feat(chat): add server-side timings and context display for GGUF Extract timings/usage metadata from llama-server SSE stream and forward through the full stack. Replace client-side estimates with accurate server-reported metrics (prompt eval, tok/s, token counts, cache hits). Add context window usage bar to chat top nav. * feat(chat): source badges with hover cards and 2-row collapse - Add hover cards to source badges showing favicon, title, URL and snippet description on hover - Limit source badges to 2 rows with +X more expand/collapse - Parse snippet from web search results for hover card descriptions - Replace individual Source rendering with grouped SourcesGroup component * fix(chat): add null guards for server timings edge cases * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(chat): reset contextUsage on thread switch, remove unused context-display * fix(chat): stop double-counting completion tokens in tool-calling path * fix(chat): skip metadata events in llm_assist consumers * fix(chat): hide context usage bar in compare mode * fix(chat): harden timings pipeline and context usage persistence Accumulate prompt_ms, predicted_ms, and predicted_n from intermediate tool-detection passes so the final metadata reflects total server work. Persist contextUsage in message metadata (Dexie) and restore on thread load. Add type guard in gguf_stream_chunks for unexpected dict events. Clear contextUsage when entering compare mode. * feat(chat): make GGUF stream metadata OpenAI-compatible * fix(chat): address PR review feedback * feat(chat): address PR review feedback * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-20 23:42:01 -07:00
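The accumulation of server timings across tool-detection passes might be sketched like this. The field names `prompt_ms`, `predicted_ms` and `predicted_n` are taken from the commit message; the function names, the dict-based shape and the tokens-per-second formula are assumptions for illustration.

```python
def accumulate_timings(passes):
    """Sum per-pass timings so the final metadata reflects total server work."""
    total = {"prompt_ms": 0.0, "predicted_ms": 0.0, "predicted_n": 0}
    for timings in passes:
        if not timings:
            continue  # null guard: a pass may report no timings at all
        total["prompt_ms"] += timings.get("prompt_ms") or 0.0
        total["predicted_ms"] += timings.get("predicted_ms") or 0.0
        total["predicted_n"] += timings.get("predicted_n") or 0
    return total


def tokens_per_second(total):
    """Generation speed over all passes, or None when nothing was generated."""
    if total["predicted_ms"] <= 0:
        return None
    return total["predicted_n"] / (total["predicted_ms"] / 1000.0)
```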
Wasim Yousef Said	dd283b0605	feat(studio): multi-file unstructured seed upload with better backend extraction (#4468 ) * fix(recipe-studio): prevent fitView from zooming to wrong location on recipe load * feat: add pymupdf/python-docx deps and unstructured uploads storage root * feat: add POST /seed/upload-unstructured-file endpoint * feat: add multi-file chunking with source_file column * feat: update frontend types and API layer for multi-file upload * feat: round-robin preview rows across source files Ensures every uploaded file is represented in the preview table by cycling through sources instead of just taking the first N rows. * fix: disable OCR, fix auto-load timing, fix persistence on reload - Disable pymupdf4llm OCR with write_images=False, show_progress=False - Replace onAllUploaded callback with useEffect that detects uploading→done transition (avoids stale closure reading empty file IDs) - Fix importer to preserve file IDs from saved recipes instead of clearing (clearing only happens at share time via sanitizeSeedForShare) * fix: harden unstructured upload with input validation and state fixes Validate block_id/file_id with alphanumeric regex to prevent path traversal, use exact stem match for file deletion, add error handling for metadata writes and empty files, fix React stale closures and object mutations in upload loop, and correct validation logic for unstructured seed resolved_paths. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: address PR review - legacy path import, share sanitizer, sync effect Promote legacy source.path into resolved_paths for old unstructured recipes, clear source.paths in share sanitizer to prevent leaking local filesystem paths, and gate file sync effect to dialog open transition so users can actually delete all uploaded files. 
* fix: CSV column fix (BOM + whitespace + unnamed index re-save) for #4470 * fix: harden unstructured upload flow and polish dialog UX * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-20 13:22:42 -07:00
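Round-robin preview selection, as described in the "round-robin preview rows across source files" step, can be sketched as follows. The `source_file` column name comes from the commit message; the function name and the list-of-dicts row format are assumptions.

```python
from collections import deque


def round_robin_preview(rows, limit):
    """Pick up to `limit` rows, cycling through source files in first-seen order."""
    queues = {}
    for row in rows:
        queues.setdefault(row["source_file"], deque()).append(row)

    preview = []
    while queues and len(preview) < limit:
        for source in list(queues):
            queue = queues[source]
            preview.append(queue.popleft())
            if not queue:
                del queues[source]  # this file is exhausted
            if len(preview) == limit:
                break
    return preview
```

Compared with taking the first N rows, this guarantees every uploaded file appears in the preview as long as the limit is at least the number of files.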
Michael Han	f113f3511d	Update Install method.md	2026-03-20 05:17:05 -07:00
Daniel Han	ef0491e0fe	Fix Windows installer Python detection and winget error handling (#4483 ) * Fix Windows installer Python detection and winget error handling The PowerShell installer crashes on some Windows machines due to two issues: 1. Windows Store App Execution Aliases: Get-Command finds the stub at WindowsApps\python.exe, then python --version writes to stderr. With $ErrorActionPreference = "Stop" on PowerShell 5.1, stderr from native commands becomes a terminating error, killing the script before it tries to install Python. 2. winget "already installed" exit code: winget returns -1978335189 (APPINSTALLER_CLI_ERROR_UPDATE_NOT_APPLICABLE) when the package is already at the latest version. The script treated any non-zero exit as failure. The fallback Get-Command check could also find the Store stub or fail if Python was partially uninstalled. Changes: - Add Find-CompatiblePython helper that tries the py launcher first, then python3/python via Get-Command -All, explicitly skipping any WindowsApps stubs. All invocations wrapped in try-catch so stderr never triggers ErrorActionPreference. - Replace exit-code-based winget error handling with outcome-based: re-detect Python after install, retry with --force if not found, show actionable manual install instructions on final failure. - Deduplicate PATH entries in Refresh-SessionPath to prevent unbounded growth from repeated machine+user path prepending. * Address reviewer feedback: wrap winget calls, remove blanket WindowsApps filter Three fixes based on code review: 1. Wrap all winget install calls in $ErrorActionPreference = "Continue" blocks so that winget stderr (progress bars, warnings) does not become a terminating error on PowerShell 5.1. This matches the pattern already used in studio/setup.ps1 line 983. 2. Remove the blanket \WindowsApps\ path filter that rejected all WindowsApps executables including valid Microsoft Store Python installs. 
Instead, rely on the existing try-catch + version regex probing to determine if a candidate is functional. Non-functional entries (App Execution Alias stubs) fail the try-catch and are skipped naturally. 3. Use $pyLauncher.Source (resolved path) instead of bare py name, add -CommandType Application to avoid matching aliases/functions, and derive winget package ID from $PythonVersion variable instead of hardcoding Python.Python.3.13. * Add back WindowsApps filter for python3/python fallback path The App Execution Alias stubs in WindowsApps can open the Microsoft Store as a side effect when invoked, even though the try-catch handles the error. Since the py launcher (tried first) already detects legitimate Store Python -- Store packages include py since Python 3.11 -- filtering WindowsApps in the python3/python fallback is safe and avoids the Store popup. --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-03-20 02:01:23 -07:00
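The PATH deduplication mentioned for Refresh-SessionPath can be illustrated with an order-preserving filter. This is a Python sketch rather than the PowerShell in install.ps1; the function name is hypothetical, and comparing entries case-insensitively with trailing slashes ignored is an assumption based on how Windows treats paths.

```python
def dedupe_path(path_value, separator=";"):
    """Remove repeated PATH entries while keeping the first occurrence's position."""
    seen = set()
    kept = []
    for entry in path_value.split(separator):
        entry = entry.strip()
        # Normalise only for comparison; the original spelling is what gets kept.
        key = entry.rstrip("\\/").lower()
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(entry)
    return separator.join(kept)
```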
Leo Borcherding	239ca98643	fix: detect AMD/no-NVIDIA GPU early in Windows installer and guard unsloth.exe existence (#4478 ) * fix(install.ps1): detect AMD/no-NVIDIA GPU early and guard unsloth.exe existence When a user has an AMD GPU (no nvidia-smi), uv's --torch-backend=auto resolves to CPU torch, which constrains the solver to unsloth==2024.8. That ancient release has no unsloth.exe CLI entry point, so the subsequent & \ studio setup call throws a confusing PowerShell 'module could not be loaded' CommandNotFoundException instead of a clear error. Two fixes: - Detect nvidia-smi early; if no NVIDIA GPU is found, print a clear error explaining AMD/Intel GPUs are unsupported and exit before wasting time installing the wrong package version. - Guard Test-Path \ before invoking it, so any future case where the CLI entry point is missing produces a readable error instead of a cryptic PowerShell exception. Fixes: unsloth_studio\Scripts\unsloth.exe CommandNotFoundException on AMD GPU systems (Windows). * fix(install.ps1): correct GPU support message - AMD is Linux-only via ROCm * Slim down to just the unsloth.exe existence guard Remove the early NVIDIA GPU detection gate -- Studio supports Windows and Mac without a GPU (finetuning is simply disabled). The GPU gate was blocking legitimate non-NVIDIA users from installing. Keep only the Test-Path guard on unsloth.exe before invoking it. This turns the confusing PowerShell CommandNotFoundException into a clear error message pointing at the likely cause (older unsloth version resolved by the package solver that does not include the Studio CLI). * Fix quickstart link in unsloth.exe guard message --------- Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-20 01:48:45 -07:00
Roland Tannous	ebe45981dd	`feat: support GGUF export for non-PEFT models + fix venv_t5 switching for local checkpoints` (#4455 ) * feat: support full model GGUF export, disable incompatible methods in UI * fix: resolve base model from config.json for venv_t5 export switching * feat: detect BNB-quantized models and disable all export methods for quantized non-PEFT checkpoints * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: relocate Ollama Modelfile alongside GGUFs during non-PEFT export cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-20 12:13:18 +04:00
Manan Shah	be901ecdea	Adding launch command to install scripts (#4477) * Adding launch command to install scripts * Making launch only for interactive env	2026-03-20 10:45:33 +04:00
Michael Han	07aabf45c0	Update Install instructions.md	2026-03-19 21:51:10 -07:00
Daniel Han	d0e5a1d61e	Fix macOS install.sh: stdin consumption and Python discovery (#4472) * Fix macOS install.sh: stdin consumption and Python discovery Two issues when running `curl | sh` on macOS: 1. Commands like `brew install` consume bytes from the piped stdin, causing the shell to lose its place in the script. The remaining source code gets printed as text instead of being executed, so users have to run the installer twice. Fixed by redirecting stdin from /dev/null for brew, apt-get, xcode-select, and the uv installer subprocess. 2. setup.sh searches for Python 3.11-3.13 on the system PATH via `compgen -c`. On macOS systems that only have Python 3.9 and/or 3.14, this fails with "No Python version between 3.11 and 3.13 found" even though uv already installed Python 3.13 into the venv. Fixed by adding the venv's bin/ to PATH before invoking `unsloth studio setup`. * Guard PATH export against empty VENV_ABS_BIN If cd into the venv bin/ fails, VENV_ABS_BIN would be empty and PATH would start with ":", causing the current directory to be searched for executables. Wrap the export in a non-empty check.	2026-03-19 11:52:32 -07:00
Michael Han	29270a3726	Data recipes now works for Mac and CPU.md	2026-03-19 07:26:28 -07:00
Daniel Han	3faa9af148	Update _utils.py	2026-03-19 02:31:45 -07:00
Daniel Han	709a611356	Update README.md	2026-03-19 02:28:53 -07:00
Daniel Han	074a07981e	Merge branch 'main' of https://github.com/unslothai/unsloth	2026-03-19 02:26:46 -07:00
Daniel Han	2b8bfa5b19	Update README.md	2026-03-19 02:26:18 -07:00
Datta Nimmaturi	729a0cb0ae	[studio] full finetuning studio (#4461 ) * full finetuning studio * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update studio/backend/core/training/trainer.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-19 02:18:46 -07:00
Manan Shah	6f129a214b	Fix Install commands for Windows + 1 line installs (#4447 ) * One liner setup for unsloth studio * Fix install scripts: system deps, activation bugs, curl/wget support - install.sh: detect platform (macOS/Linux/WSL) and check for missing system dependencies (cmake, git, build-essential, libcurl4-openssl-dev). Prompt user once for permission to install all missing packages via brew (macOS) or sudo apt-get (Linux/WSL). Add wget fallback via download() helper since curl is not always present on minimal Linux installs. Fix nested curl\|sh stdin stealing by downloading uv installer to a tempfile first. Replace venv activation (no-op in a pipe subshell) with explicit --python flag for uv pip install and direct venv binary invocation. Add idempotency guard for venv creation. Redirect stdin on unsloth studio setup to prevent pipe consumption. On macOS, check for Xcode Command Line Tools and trigger install if missing. - install.ps1: wrap script body in Install-UnslothStudio function so that errors use return instead of exit (exit kills the terminal when run via irm\|iex). Remove activate.ps1 invocation entirely -- use explicit --python path for uv pip install and & $UnslothExe for studio setup. This avoids both the child-scope activation bug (& vs dot-source) and the execution policy error on default Windows systems. Add winget availability check with clear error message. Fix PATH refresh to append registry paths instead of replacing the session PATH. Add uv installer fallback via astral.sh PowerShell script if winget install does not put uv on PATH. Broaden Python version check to accept 3.11-3.13. Add idempotency guard for venv creation. - README.md: add wget one-liner alternative for systems without curl. 
* Fix Tailwind CSS v4 .gitignore bug on Windows (#4444) - Add .gitignore hiding workaround to setup.ps1 (matching existing setup.sh logic) so venv .gitignore files containing "" don't prevent Tailwind's oxide scanner from finding .tsx source files - Add CSS size validation to setup.sh, setup.ps1, and build.sh to catch truncated Tailwind builds early - Remove stray force-rebuild overrides that made the "skip build if current" cache check dead code in both setup scripts - Add rm -rf dist to build.sh to force clean rebuilds for wheel packaging Change default port 8000 to 8888, fix installer bugs, improve UX - Change default Studio port from 8000 to 8888 across all entry points (run.py, studio.py, ui.py, colab.py, vite.config.ts, setup scripts) - Update launch banner: "Launching with studio venv..." to "Launching Unsloth Studio... Please wait..." - Add "Open your web browser" banner and rename labels (Local -> Local Access, External -> Worldwide Web Address) - Fix venv idempotency: check for bin/python instead of just directory existence, clean up partial venvs on retry - Fix build.sh CSS validation: handle empty CSS case that silently bypassed the check with "integer expression expected" - Fix install.sh sudo handling: try apt-get without sudo first (works when root), then escalate with per-package tracking and user prompt - Fix install.ps1: check exit code from studio setup, fail on error - Add pciutils to WSL GGUF build dependencies - Apply same smart apt-get escalation pattern to studio/setup.sh * Use detected Python version for venv, abort on non-apt Linux - install.ps1: detect existing Python 3.11/3.12/3.13 and use that version for venv creation instead of always forcing 3.13 - install.sh: exit with error on non-apt Linux distros when required packages cannot be auto-installed, instead of silently continuing * Make sudo permission prompt more prominent with warning banner * Add Accept [Y/n] sudo prompt to studio/setup.sh for consistency * Fix native command exit 
code handling and sudo decline flow install.ps1: Add $LASTEXITCODE checks after winget (Python), uv venv, and uv pip install calls. $ErrorActionPreference only catches PowerShell cmdlet errors, not native executable failures. The Python check also handles winget returning non-zero for "already installed". setup.sh: Skip llama-server build when user declines sudo or sudo is unavailable. Previously the script continued to section 8 which would fail with confusing errors (e.g. "gcc: command not found") since build-essential was never installed. * Move rm -rf llama.cpp inside build branch to preserve existing install When _SKIP_GGUF_BUILD is set (user declined sudo or sudo unavailable), the previous rm -rf would destroy an already-working llama-server before the skip check ran. Move it inside the else branch so existing builds are preserved when the rebuild is skipped. --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-19 02:09:09 -07:00
Wasim Yousef Said	6c2bfebb20	fix(studio): mobile navbar layout and chat settings sheet (#4458 ) * fix(studio): mobile navbar layout and chat settings sheet * fix(studio): portal select dropdowns inside sheet modal subtree	2026-03-19 02:04:53 -07:00
Manan Shah	72b768e0be	Fixing Qwen3.5 bug and adding Outetts dependencies (#4459 ) * Fixing Qwen3.5 bug and adding Outetts dependencies * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-19 01:52:07 -07:00
Manan Shah	e793c378db	turning data recipes on for mac (#4454 )	2026-03-19 11:11:27 +04:00
Michael Han	e6a42d0073	Update Install instructions.md	2026-03-18 20:13:48 -07:00
Michael Han	6f9d8ad4c3	Add BETA in README.md	2026-03-18 17:15:10 -07:00
Daniel Han	8b4a0f2191	Update README.md	2026-03-18 11:13:12 -07:00
Daniel Han	e0a9e772d1	Update README.md	2026-03-18 09:57:44 -07:00
Datta Nimmaturi	d4c8c0cb84	Make instructions mac friendly (#4432 )	2026-03-18 09:48:02 -07:00
Daniel Han	28407a1742	Update _utils.py	2026-03-18 09:10:36 -07:00
Daniel Han	8582ce3e9c	Fix studio chat crash on Mac: vendor check_signal_escape_patterns (#4431 ) * Fix studio crash on Mac: vendor check_signal_escape_patterns from unsloth_zoo Vendor the `check_signal_escape_patterns` function from `unsloth_zoo.rl_environments` directly into `tools.py`. The function is pure Python (only uses stdlib `ast`) and has zero GPU dependencies, but importing it from unsloth_zoo triggers `unsloth_zoo.__init__` which calls `get_device_type()` at module scope -- raising NotImplementedError on Apple Silicon Macs. By vendoring the code, the safety checks still run on all platforms (Mac, Linux, Windows) without needing unsloth_zoo at all. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-18 09:10:13 -07:00
Daniel Han	e38212281a	Fix TypeScript build errors in studio frontend (#4429 ) - tool-ui-python.tsx: use explicit tuple type instead of `as const` to match the mutable `[BundledTheme, BundledTheme]` expected by Streamdown - chat-adapter.ts: add missing `argsText` field required by ToolCallMessagePart and fix `args` type to use ReadonlyJSONObject	2026-03-18 08:44:38 -07:00
Michael Han	7d270825fb	Update README.md	2026-03-18 08:30:53 -07:00
Daniel Han	596fae1de2	Update _utils.py	2026-03-18 08:29:09 -07:00
Daniel Han	9c95148045	Fix tool call parsing, add tool outputs panel and UI improvements (#4416 ) * Add elapsed timer to tool status pill in Studio Show a count-up seconds timer (0s, 1s, 2s, ...) next to the tool status text in the composer area. Helps users gauge how long a tool call (web search, code execution) has been running. Timer resets when a new tool starts and disappears when all tools finish. * Fix tool call parsing, add tool outputs panel and reasoning copy button Backend: - Rewrite tool call XML parser to use balanced-brace JSON extraction instead of greedy regex, fixing truncation on nested braces in code/JSON arguments - Handle optional closing tags (</tool_call>, </function>, </parameter>) that models frequently omit - Support bare <function=...> tags without <tool_call> wrapper - Strip tool call markup from streamed content so raw XML never leaks into the chat UI - Use a persistent ~/studio_sandbox/ working directory for tool execution so files persist across calls within a session - Emit tool_start/tool_end SSE events so the frontend can display tool inputs and outputs Frontend: - Add collapsible "Tool Outputs" panel below assistant messages showing each tool call's input and output with copy buttons - Add copy button to reasoning blocks - Add elapsed timer to tool status pill - Update project URLs in pyproject.toml (http -> https, add docs link) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add interactive HTML preview with fullscreen toggle for code blocks HTML code fences now render an interactive sandboxed iframe preview below the syntax-highlighted code, similar to how SVG fences show an image preview. The iframe uses sandbox="allow-scripts" to allow JavaScript execution while blocking access to the parent page. Includes a fullscreen toggle (enlarge/minimize button) that expands the preview into a viewport overlay, dismissible via button, Escape key, or backdrop click. 
A streaming placeholder prevents partial HTML from rendering mid-stream. * Add tool call settings: auto-heal toggle, max iterations, timeout Add three user-configurable tool call settings to the Studio Settings panel: - Auto Heal Tool Calls: toggle to control fallback XML parsing of malformed tool calls from model output (default: on) - Max Tool Calls Per Message: slider 0-40 + Max to cap tool call iterations per message (default: 10) - Max Tool Call Duration: slider 1-30 minutes + Max to set per-tool-call execution timeout (default: 5 minutes) All settings persist to localStorage and flow through the full stack: frontend store -> API request -> Pydantic model -> route -> llama_cpp -> tools. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tool call timeout: respect no-limit and apply to web search - Use a sentinel to distinguish timeout=None (no limit) from the default (300s). Previously None was silently replaced with _EXEC_TIMEOUT. - Pass the configured timeout to DDGS() for web searches so the setting applies uniformly to all tool types. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add input validation bounds and per-thread sandbox isolation - Add ge=0 constraint to max_tool_calls_per_message (rejects negative values) - Add ge=1 constraint to tool_call_timeout (minimum 1 second) - Thread session_id from frontend through backend to tool execution - Scope sandbox directories per conversation: ~/studio_sandbox/{thread_id}/ - Backwards compatible: API callers without session_id use ~/studio_sandbox/ * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix non-monotonic streaming and Python temp script path - Split tool markup stripping into closed-only (mid-stream) and full (final flush) to prevent cumulative text from shrinking mid-stream - Enforce monotonicity: only emit when cleaned text grows, so the proxy's delta logic (cumulative[len(prev_text):]) never breaks - Place Python temp scripts in the sandbox workdir instead of /tmp so sys.path[0] points to the sandbox and cross-call imports work * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Sanitize session_id to prevent path traversal in sandbox Strip path separators and parent-dir references from session_id before using it as a directory name. Verify the resolved path stays under ~/studio_sandbox/ as a second guard. * feat(chat): proper assistant-ui tool call UIs with sources Replace custom metadata-based ToolOutputsGroup with native assistant-ui tool-call content parts. Backend SSE tool_start/tool_end events now emit proper { type: "tool-call" } parts from the adapter, enabling per-tool UIs registered via tools.by_name in MessagePrimitive.Parts. 
- Web search: Globe icon, Source badges with favicons, auto-collapse when LLM starts responding - Python: Code icon, syntax-highlighted code via Streamdown/shiki, output block with copy - Terminal: Terminal icon, command in trigger, output with copy - ToolGroup wraps consecutive tool calls (skips for single calls) - Sources component renders URL badges at end of message - Flattened code block CSS (single border, no nested boxes) * fix(inference): respect empty enabled_tools allowlist `if payload.enabled_tools:` is falsy for [], falling through to ALL_TOOLS. Use `is not None` so an explicit empty list disables all tools as intended. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Shine1i <wasimysdev@gmail.com>	2026-03-18 08:28:02 -07:00
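The parser rewrite in 9c95148045 replaces a greedy regex with balanced-brace JSON extraction. The log does not show the Studio code, so the following is only a minimal sketch of that general technique; the function name `extract_balanced_json` and its return shape are hypothetical. It counts brace depth while skipping braces that appear inside JSON strings, which is what keeps nested braces in code arguments from truncating the match.

```python
import json


def extract_balanced_json(text, start=0):
    """Return (obj, end_index) for the first balanced {...} at or after start.

    Hypothetical helper for illustration only. Returns None when no complete,
    parseable object is found.
    """
    begin = text.find("{", start)
    if begin == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(begin, len(text)):
        ch = text[i]
        if in_string:
            # Inside a JSON string: braces do not count toward depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(text[begin : i + 1]), i + 1
                except json.JSONDecodeError:
                    return None
    return None  # Braces never balanced (truncated output).


raw = '<tool_call>{"name": "python", "arguments": {"code": "print({1: {2: 3}})"}}'
result = extract_balanced_json(raw)
```

Because the scan stops at the matching closing brace, a missing `</tool_call>` tag after the object does not affect extraction in this sketch.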
Daniel Han	11b5e7abf3	Update README.md	2026-03-18 08:15:07 -07:00
Daniel Han	d45abae5b3	Update README.md	2026-03-18 08:12:20 -07:00
Daniel Han	7ddb660b0c	revert: always rebuild frontend, override caching with _NEED_FRONTEND_BUILD=true (#4427 ) * revert: remove frontend build caching from setup scripts The mtime-based caching introduced in #4404/#4413 can incorrectly skip frontend builds -- e.g. after git pull when filesystem timestamps are not preserved, or after our Tailwind v4 discovery that the site-packages .gitignore must be hidden before vite build (which the cached path doesn't handle). Always rebuild the frontend on setup. The build takes ~15s and is safer than risking a stale dist/. * revert: disable frontend build caching, keep code commented out Caching disabled by always setting _NEED_FRONTEND_BUILD=true. The mtime-based logic is preserved in comments for future re-enabling. Reasons for disabling: - Git does not preserve file timestamps, so cached dist/ can appear newer than freshly checked-out source after a pull - Tailwind v4 requires hiding site-packages/.gitignore before vite build; the cache path bypasses this, producing broken CSS * revert: always rebuild frontend, remove mtime caching * revert: always rebuild frontend, override caching with _NEED_FRONTEND_BUILD=true	2026-03-18 07:37:53 -07:00
Daniel Han	2a7646c4ca	Update README.md	2026-03-18 07:27:04 -07:00
Daniel Han	e9fa12acd3	Update pyproject.toml	2026-03-18 07:26:40 -07:00
Daniel Han	1ab020115e	Update pyproject.toml	2026-03-18 07:17:20 -07:00
Daniel Han	6bf81e4a48	Update README.md	2026-03-18 06:59:37 -07:00
Daniel Han	38217bcdcc	Update README.md	2026-03-18 06:58:42 -07:00
Daniel Han	9c89d7b22b	Update README.md	2026-03-18 06:52:27 -07:00
Daniel Han	7517e0fb2f	Update README.md	2026-03-18 06:33:54 -07:00
Daniel Han	52f9c30513	fix: exclude nemotron_h from flex_attention (#4424 ) * fix: exclude nemotron_h from flex_attention NemotronHForCausalLM does not support flex_attention and raises: NotImplementedError: NemotronHForCausalLM does not support an attention implementation through torch's flex_attention. Add nemotron_h to the exclusion list alongside gpt_oss and mllama so Unsloth falls back to the default attention implementation. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-18 06:11:11 -07:00
Wasim Yousef Said	51c08ded9b	fix(studio): deduplicate context length validation and sync input with store (#4423 )	2026-03-18 06:06:54 -07:00
Daniel Han	95bfc50b35	Fix inference stall during prefill (retry storm) (#4409 ) * Fix inference stall during prefill by removing retry storm The _stream_with_retry method used a 0.5s read timeout and retried by sending a brand new POST request each time. During prompt prefill (which can take 5-30+ seconds for long contexts or reasoning models), this caused 10-60 duplicate requests that forced llama-server to restart processing from scratch each time, resulting in 10-20s stalls visible as "Generating" with no progress in the UI. Fix: send the request ONCE with a 120s read timeout for the initial response headers. Cancel support during the prefill wait is handled by a background thread that monitors cancel_event (checked every 0.3s) and closes the response to unblock the httpx read immediately. This preserves the ability to stop/cancel/refresh during generation. The existing 0.5s timeout on the httpx.Client is still used by _iter_text_cancellable for per-token cancel checking during streaming (after prefill), which is unaffected by this change. * Fix race in cancel watcher when response is not yet created When cancel_event fires before client.stream() returns (response is still None), the watcher would hit return and exit without closing anything. The main thread stays blocked for up to 120s. Fix: after cancel is requested, keep polling _response_ref every 0.1s until the response object appears (then close it) or _cancel_closed is set (main thread finished on its own). * Minor cleanup: remove redundant None check, add debug logging in cancel watcher Address Gemini review: cancel_event is guaranteed non-None when the watcher thread runs, and logging the close exception aids debugging. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Retry r.close() on failure instead of giving up If r.close() raises, stay in the polling loop and retry rather than returning and leaving the main thread blocked for up to 120s. 
* fix: keep short read timeout during token streaming The prefill_timeout (read=120s) was passed to client.stream(), which applied to ALL reads -- not just the initial response headers. This meant _iter_text_cancellable's ReadTimeout-based cancel checking was broken during token streaming: the Stop button could take up to 120s to respond instead of 0.5s. Fix: keep the client's short read timeout (0.5s) for the stream call. During prefill, catch ReadTimeout in a loop and re-check cancel_event instead of re-sending the POST (which was the original retry storm). Once the first bytes arrive, yield the response with a PrependStream wrapper so iter_text() sees the buffered first chunk. This preserves both: - Fast cancel during prefill (via cancel watcher + ReadTimeout loop) - Fast cancel during streaming (via _iter_text_cancellable's 0.5s ReadTimeout, which now fires correctly again) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: swap to short-timeout stream after prefill completes Address two review issues: 1. _PrependStream did not inherit from httpx.SyncByteStream, so Response.iter_raw() would raise RuntimeError. Replaced with a _ShortTimeoutStream that inherits SyncByteStream properly. 2. client.stream() entry itself raises ReadTimeout during slow prefill (before headers arrive). The previous fix tried to catch this at the body-read level but missed the connection-level timeout. New approach: keep the 120s read timeout for client.stream() so the connection survives long prefills. Once headers arrive, replace the response stream with _ShortTimeoutStream -- a wrapper that uses a background reader thread and a Queue with a short get() timeout to re-raise ReadTimeout at the original 0.5s interval. This way _iter_text_cancellable's cancel-checking remains responsive during token streaming while prefill gets the long timeout it needs. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: move _ShortTimeoutStream before LlamaCppBackend class The class was placed inside LlamaCppBackend's body, splitting the class in two and making _codec_mgr and other attributes unreachable. Move it to module level before LlamaCppBackend. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: remove _ShortTimeoutStream, use watcher for all cancel _ShortTimeoutStream had two critical issues: 1. Raising ReadTimeout from a generator kills it -- Python finalizes generators after an uncaught exception, so the next next() call hits StopIteration and streaming ends mid-response. 2. The unbounded Queue in the background reader loses backpressure, causing memory spikes with slow clients. Simpler approach: use the 120s read timeout for the entire stream and rely on the cancel watcher thread for all cancellation (both prefill and streaming). The watcher closes the response on cancel_event, which unblocks any blocking httpx read within ~0.3s. This eliminates the need for short timeout tricks entirely. Cancel latency: - Prefill: ~0.3s (watcher polls cancel_event every 0.3s) - Streaming: ~0.3s (same watcher mechanism) - Both faster than the old 0.5s ReadTimeout approach * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docs: clarify cancel limitations in _stream_with_retry The docstrings claimed ~0.3s cancel in all cases, but httpx cannot interrupt a blocked read before the response object exists. 
Update the docstrings to accurately describe the behavior: - Cancel during prefill (header wait) is deferred until headers arrive - Cancel during streaming works via response.close() from the watcher - _iter_text_cancellable docstring updated to reflect the watcher-based cancel mechanism instead of the old ReadTimeout polling --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-18 05:10:32 -07:00
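The cancel watcher described in 95bfc50b35 can be pictured roughly as below. This is a sketch under stated assumptions, not the Studio implementation: `FakeResponse`, `start_cancel_watcher`, the `response_ref` dict, and the poll interval are all hypothetical stand-ins, and no real httpx objects are involved. It shows the two behaviours the commit message names: after a cancel, keep polling until the response object exists, and retry `close()` if it raises.

```python
import threading
import time


class FakeResponse:
    """Hypothetical stand-in for a streaming HTTP response."""

    def __init__(self):
        self.closed = threading.Event()

    def close(self):
        self.closed.set()


def start_cancel_watcher(cancel_event, response_ref, done_event, poll=0.05):
    """Start a daemon thread that closes the response once cancel is requested.

    response_ref is a dict whose "response" key is filled in by the main
    thread when the stream opens; done_event signals that the main thread
    finished on its own and the watcher should exit.
    """

    def watch():
        # Phase 1: wait for a cancel request, exiting early if the stream ended.
        while not cancel_event.wait(poll):
            if done_event.is_set():
                return
        # Phase 2: cancel requested. The response may not exist yet, so keep
        # polling until it appears, and retry if close() fails.
        while not done_event.is_set():
            resp = response_ref.get("response")
            if resp is not None:
                try:
                    resp.close()
                    return
                except Exception:
                    pass  # Stay in the loop and try again.
            time.sleep(poll)

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread
```

As the final docstring change in the commit notes, a watcher of this kind can only act once a response object exists, so a cancel issued while headers are still pending is deferred rather than immediate.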
Michael Han	0922a2bb17	Update README.md	2026-03-18 04:21:18 -07:00
Daniel Han	1f12ba16df	Combine studio setup fixes: frontend caching, venv isolation, Windows CPU support (#4413 ) * Allow Windows setup to complete without NVIDIA GPU setup.ps1 previously hard-exited if nvidia-smi was not found, blocking setup entirely on CPU-only or non-NVIDIA machines. The backend already supports CPU and MLX (Apple Silicon) in chat-only GGUF mode, and the Linux/Mac setup.sh handles missing GPUs gracefully. Changes: - Convert the GPU check from a hard exit to a warning - Guard CUDA toolkit installation behind $HasNvidiaSmi - Install CPU-only PyTorch when no GPU is detected - Build llama.cpp without CUDA flags when no GPU is present - Update doc comment to reflect CPU support * Cache frontend build across setup runs Skip the frontend npm install + build if frontend/dist already exists. Previously setup.ps1 nuked node_modules and package-lock.json on every run, and both scripts always rebuilt even when dist/ was already present. On a git clone editable install, the first setup run still builds the frontend as before. Subsequent runs skip it, saving several minutes. To force a rebuild, delete frontend/dist and re-run setup. * Show pip progress for PyTorch download on Windows The torch CUDA wheel is ~2.8 GB and the CPU wheel is ~300 MB. With | Out-Null suppressing all output, the install appeared completely frozen with no feedback. Remove | Out-Null for the torch install lines so pip's download progress bar is visible. Add a size hint so users know the download is expected to take a while. Also moves the Triton success message inside the GPU branch so it only prints when Triton was actually installed. * Guard CUDA env re-sanitization behind GPU check in llama.cpp build The CUDA_PATH re-sanitization block (lines 1020-1033) references $CudaToolkitRoot which is only set when $HasNvidiaSmi is true and the CUDA Toolkit section runs. 
On CPU-only machines, $CudaToolkitRoot is null, causing Split-Path to throw: Split-Path : Cannot bind argument to parameter 'Path' because it is null. Wrap the entire block in `if ($HasNvidiaSmi -and $CudaToolkitRoot)`. * Rebuild frontend when source files are newer than dist/ Instead of only checking if dist/ exists, compare source file timestamps against the dist/ directory. If any file in frontend/src/ is newer than dist/, trigger a rebuild. This handles the case where a developer pulls new frontend changes and re-runs setup -- stale assets get rebuilt automatically. * Fix cmake not found on Windows after winget install Two issues fixed: 1. After winget installs cmake, Refresh-Environment may not pick up the new PATH entry (MSI PATH changes sometimes need a new shell). Added a fallback that probes cmake's default install locations (Program Files, LocalAppData) and adds the directory to PATH explicitly if found. 2. If cmake is still unavailable when the llama.cpp build starts (e.g. winget failed silently or PATH was not updated), the build now skips gracefully with a [SKIP] warning instead of crashing with "cmake : The term 'cmake' is not recognized". * Fix frontend rebuild detection and decouple oxc-validator install Address review feedback: - Check entire frontend/ directory for changes, not just src/. The build also depends on package.json, vite.config.ts, tailwind.config.ts, public/, and other config files. A change to any of these now triggers a rebuild. - Move oxc-validator npm install outside the frontend build gate in setup.sh so it always runs on setup, matching setup.ps1 which already had it outside the gate. * Show cmake errors on failure and retry CUDA VS integration with elevation Two fixes for issue #4405 (Windows setup fails at cmake configure): 1. cmake configure: capture output and display it on failure instead of piping to Out-Null. When the error mentions "No CUDA toolset found", print a hint about the CUDA VS integration files. 2. 
CUDA VS integration copy: when the direct Copy-Item fails (needs admin access to write to Program Files), retry with Start-Process -Verb RunAs to prompt for elevation. This is the root cause of the "No CUDA toolset found" cmake failure -- the .targets files that let MSBuild compile .cu files are missing from the VS BuildCustomizations directory. * Address reviewer feedback: cmake PATH persistence, stale cache, torch error check 1. Persist cmake PATH to user registry so Refresh-Environment cannot drop it later in the same setup run. Previously the process-only PATH addition at phase 1 could vanish when Refresh-Environment rebuilt PATH from registry during phase 2/3 installs. 2. Clean stale CMake cache before configure. If a previous run built with CUDA and the user reruns without a GPU (or vice versa), the cached GGML_CUDA value would persist. Now the build dir is removed before configure. 3. Explicitly set -DGGML_CUDA=OFF for CPU-only builds instead of just omitting CUDA flags. This prevents cmake from auto-detecting a partial CUDA installation. 4. Fix CUDA cmake flag indentation -- was misaligned from the original PR, now consistently indented inside the if/else block. 5. Fail hard if pip install torch returns a non-zero exit code instead of silently continuing with a broken environment. * Remove extra CUDA cmake flags to align Windows with Linux build Drop GGML_CUDA_FA_ALL_QUANTS, GGML_CUDA_F16, GGML_CUDA_GRAPHS, GGML_CUDA_FORCE_CUBLAS, and GGML_CUDA_PEER_MAX_BATCH_SIZE flags. The Linux build in setup.sh only sets GGML_CUDA=ON and lets llama.cpp use its defaults for everything else. Keep Windows consistent. * Address reviewer round 2: GPU probe fallback, Triton check, stale binary rebuild 1. GPU detection: fallback to default nvidia-smi install locations (Program Files\NVIDIA Corporation\NVSMI, System32) when nvidia-smi is not on PATH. Prevents silent CPU-only provisioning on machines that have a GPU but a broken PATH. 2. 
Triton: check $LASTEXITCODE after pip install and print [WARN] on failure instead of unconditional [OK]. 3. Stale llama-server: check CMakeCache.txt for GGML_CUDA setting and rebuild if the existing binary does not match the current GPU mode (e.g. CUDA binary on a now-CPU-only rerun, or vice versa). * Fix frontend rebuild detection and npm dependency issues Addresses reviewer feedback on the frontend caching logic: 1. setup.sh: Fix broken find command that caused exit under pipefail. The piped `find | xargs find -newer` had paths after the expression which GNU find rejects. Replaced with a simpler `find -maxdepth 1 -type f -newer dist/` that checks ALL top-level files (catches index.html, bun.lock, etc. that the extension allowlist missed). 2. setup.sh: Guard oxc-validator npm install behind `command -v npm` check. When the frontend build is skipped (dist/ is cached), Node bootstrap is also skipped, so npm may not be available. 3. setup.ps1: Replace Get-ChildItem -Include with explicit path probing for src/ and public/. PowerShell's -Include without a trailing wildcard silently returns nothing, so src/public changes were never detected. Also check ALL top-level files instead of just .json/.ts/.js/.mjs extensions. 
* Fix studio setup: venv isolation, centralized .venv_t5, uv targeting - All platforms (including Colab) now create ~/.unsloth/studio/.venv with --without-pip fallback for broken ensurepip environments - Add --python sys.executable to uv pip install in install_python_stack.py so uv targets the correct venv instead of system Python - Centralize .venv_t5 bootstrap in transformers_version.py with proper validation (checks required packages exist, not just non-empty dir) - Replace ~150 lines of duplicated install code across 3 worker files with calls to the shared _ensure_venv_t5_exists() helper - Use uv-if-present with pip fallback; do not install uv at runtime - Add site.addsitedir() shim in colab.py so notebook cells can import studio packages from the venv without system-Python double-install - Update .venv_t5 packages: huggingface_hub 1.3.0->1.7.1, add hf_xet - Bump transformers pin 4.57.1->4.57.6 in requirements + constraints - Add Fast-Install helper to setup.ps1 with uv+pip fallback - Keep Colab-specific completion banner in setup.sh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix nvidia-smi PATH persistence and cmake requirement for CPU-only 1. Store nvidia-smi as an absolute path ($NvidiaSmiExe) on first detection. All later calls (Get-CudaComputeCapability, Get-PytorchCudaTag, CUDA toolkit detection) use this absolute path instead of relying on PATH. This survives Refresh-Environment which rebuilds PATH from the registry and drops process-only additions. 2. Make cmake fatal for CPU-only installs. CPU-only machines depend entirely on llama-server for GGUF chat mode, so reporting "Setup Complete!" without it is misleading. GPU machines can still skip the llama-server build since they have other inference paths. * Fix broken frontend freshness detection in setup scripts - setup.sh: Replace broken `find \| xargs find -newer` pipeline with single `find ... -newer` call. 
The old pipeline produced "paths must precede expression" errors (silently suppressed by 2>/dev/null), causing top-level config changes to never trigger a rebuild. - setup.sh: Add `command -v npm` guard to oxc-validator block so it does not fail when Node was not installed (build-skip path). - setup.ps1: Replace `Get-ChildItem -Include` (unreliable without -Recurse on PS 5.1) with explicit directory paths for src/ and public/ scanning. - Both: Add .html to tracked file patterns so index.html (Vite entry point) changes trigger a rebuild. - Both: Use -print -quit instead of piping to head -1 for efficiency. Fix bugs found during review of PRs #4404, #4400, #4399 - setup.sh: Add || true guard to find command that checks frontend/src and frontend/public dirs, preventing script abort under set -euo pipefail when either directory is missing - colab.py: Use sys.path.insert(0, ...) instead of site.addsitedir() so Studio venv packages take priority over system copies. Add warning when venv is missing instead of silently failing. - transformers_version.py: _venv_t5_is_valid() now checks installed package versions via .dist-info metadata, not just directory presence. Prevents false positives from stale or wrong-version packages. - transformers_version.py: _install_to_venv_t5() now passes --upgrade so pip replaces existing stale packages in the target directory. - setup.ps1: CPU-only PyTorch install uses --index-url for cpu wheel and all install commands use Fast-Install (uv with pip fallback). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix _venv_t5_is_valid dist-info loop exiting after first directory Remove premature break that caused the loop over .dist-info directories to exit after the first match even if it had no METADATA file. Now continues iterating until a valid METADATA is found or all dirs are exhausted. 
* Capture error output on failure instead of discarding with Out-Null setup.ps1: 6 locations changed from `| Out-Null` to `| Out-String` with output shown on failure -- PyTorch GPU/CPU install, Triton install, venv_t5 package loop, cmake llama-server and llama-quantize builds. transformers_version.py: clean stale .venv_t5 directory before reinstall when validation detects missing or version-mismatched packages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix ModuleNotFoundError when CLI imports studio.backend.core The backend uses bare "from utils." imports everywhere, relying on backend/ being on sys.path. Workers and routes add it at startup, but the CLI imports studio.backend.core as a package -- backend/ was never added. Add sys.path setup at the top of core/__init__.py so lazy imports resolve correctly regardless of entry point. Fixes: unsloth inference unsloth/Qwen3-8B "who are you" crashing with "No module named 'utils'" Fix frontend freshness check to detect all top-level file changes The extension allowlist (.json, .ts, .js, .mjs, .html) missed files like bun.lock, so lockfile-only dependency changes could skip the frontend rebuild. Check all top-level files instead. Add tiktoken to .venv_t5 for Qwen-family tokenizers Qwen models use tiktoken-based tokenizers which fail when routed through the transformers 5.x overlay without tiktoken installed. Add it to the setup scripts (with deps for Windows) and runtime fallback list. Integrates PR #4418. * Fix tiktoken crash in _venv_t5_is_valid and stray brace in setup.ps1 _venv_t5_is_valid() crashed with ValueError on unpinned packages like "tiktoken" (no ==version). Handle by splitting safely and skipping version check for unpinned packages (existence check only). Also remove stray closing brace in setup.ps1 tiktoken install block. 
--------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-18 03:52:25 -07:00
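The last fix in 1f12ba16df concerns requirement strings that carry no `==version` pin. The log does not show the actual `_venv_t5_is_valid()` body, so the sketch below only illustrates one safe way to split such strings; `split_requirement`, `is_satisfied`, and the `installed` mapping (including the tiktoken version number) are hypothetical. Using `str.partition` avoids the `ValueError` that tuple-unpacking `spec.split("==")` raises on an unpinned name.

```python
def split_requirement(spec):
    """Split 'name==version' into (name, version).

    version is None when the spec is unpinned, e.g. 'tiktoken'.
    Hypothetical helper for illustration only.
    """
    name, sep, version = spec.partition("==")
    return name.strip(), (version.strip() if sep else None)


def is_satisfied(spec, installed):
    """Check a spec against a {name: version} mapping of installed packages.

    Unpinned specs get an existence check only; pinned specs must match exactly.
    """
    name, wanted = split_requirement(spec)
    found = installed.get(name)
    if found is None:
        return False
    return wanted is None or found == wanted


# Illustrative data, not real installation state.
installed = {"tiktoken": "0.9.0", "huggingface_hub": "1.7.1"}
checks = {
    spec: is_satisfied(spec, installed)
    for spec in ["tiktoken", "huggingface_hub==1.7.1", "huggingface_hub==1.3.0"]
}
```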
Wasim Yousef Said	7b07ad0fa3	fix(studio): UI fixes for chat and studio routes (#4419 ) Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-03-18 03:47:13 -07:00
Daniel Han	65acefd2a6	feat(studio): infinite scroll for recommended models list (#4414 ) * feat(studio): infinite scroll for recommended models list The model selector showed a hard cap of 4 GGUFs + 4 safetensors in the Recommended section. Users who wanted to browse more had to search manually on Hugging Face. Backend: increase the default model pool from 8+8 to 40+40 (the HF fetch already pulls 80, so no extra network cost). Frontend: replace the static 4+4 cap with on-demand lazy loading. A page counter tracks how many groups of 4 to show per category. An IntersectionObserver on a sentinel div at the bottom of the list increments the page when the user scrolls down. Models are interleaved in groups of 4 GGUFs then 4 hub models per page for a balanced view. Key implementation details: - Callback ref for the sentinel so the observer attaches reliably on first popover open (useRef would miss the initial mount) - Observer disconnects after each fire and re-attaches via useEffect with a 100ms layout delay to prevent runaway page loading - VRAM info fetched incrementally via useRecommendedModelVram on the visible slice only - recommendedSet uses visible IDs so HF search dedup stays correct * refactor: address review feedback on recommended infinite scroll - Simplify visibleRecommendedIds: use findIndex to locate the GGUF/hub split point instead of re-filtering the entire array each time. recommendedIds is already sorted GGUF-first, so a single slice is enough. - Fix VRAM refetch churn: pass the full recommendedIds (stable across page increments) to useRecommendedModelVram instead of the growing visibleRecommendedIds slice. The hook derives its stableKey from the sorted+joined input, so passing the same pool on every page avoids redundant HF modelInfo requests.	2026-03-18 03:17:01 -07:00
Michael Han	67d3519cab	Update README.md	2026-03-17 23:04:54 -07:00
Daniel Han	767c31b0e0	Update README.md	2026-03-17 22:53:11 -07:00
Daniel Han	24753290ba	Update README.md	2026-03-17 22:50:55 -07:00
Lee Jackson	9232126734	fix(studio): use explicit Cancel for model load toast (#4377)	2026-03-17 22:39:51 -07:00
Daniel Han	f3f52e2d84	Use blobless clone in README install instructions (#4403) Reduces clone size from ~50MB to ~5MB by skipping blobs that are no longer in the current tree but still in git history.	2026-03-17 22:07:21 -07:00
Daniel Han	3a28446a54	Trim ~255 MB of unused packages from Studio setup (#4395 ) * Comment out large unused packages from Studio setup requirements Audited all packages installed by `unsloth studio setup` against actual imports in unsloth, unsloth_zoo, and studio/backend. The following have zero imports anywhere and are the largest offenders by disk size: - gradio (148 MB) in studio.txt -- Studio uses React + FastAPI, not Gradio - executorch (41.5 MB) in extras-no-deps.txt -- no imports found - scikit-learn (31.8 MB) in extras.txt -- no imports found - MeCab (19.9 MB) in extras.txt -- Japanese tokenizer, no imports found - coremltools (10.2 MB) in extras.txt -- Apple CoreML, no imports found - uroman (4.0 MB) in extras.txt -- romanization tool, no imports found Total savings: ~255 MB (~32% of the 805 MB installed by setup). Each line is commented out with the package size annotated so they can be re-enabled easily if needed in the future. * Restore scikit-learn -- needed by sentence_transformers sentence_transformers is installed with --no-deps in extras-no-deps.txt, so its sklearn dependency is not auto-resolved. Multiple modules in sentence_transformers import sklearn at the top level (evaluation, util/similarity), so removing scikit-learn would break embedding jobs.	2026-03-17 21:32:38 -07:00
Coenraad Loubser	ca87669937	Unused return value causes build failures (#4385) * Unused return value causes build failures * Update toast messages to include model loading status	2026-03-17 20:57:27 -07:00
DoubleMathew	fd72376a7e	Fix/studio full finetuning (#4391) * Wire Studio full finetuning into training loaders * Preserve load_model positional compatibility	2026-03-17 20:47:26 -07:00
Daniel Han	0c8d407793	Rename cli/ to unsloth_cli/ to fix namespace collision with stringzilla (#4393 ) * Rename cli/ to unsloth_cli/ to fix namespace collision with stringzilla stringzilla installs a namespace package at cli/ (cli/split.py, cli/wc.py) in site-packages without an __init__.py. When unsloth is installed as an editable package (pip install -e .), the entry point script does `from cli import app` which finds stringzilla's namespace cli/ first and fails with `ImportError: cannot import name 'app' from 'cli'`. Non-editable installs happened to work because unsloth's cli/__init__.py overwrites the namespace directory, but this is fragile and breaks if stringzilla is installed after unsloth. Renaming to unsloth_cli/ avoids the collision entirely and fixes both editable and non-editable install paths. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update stale cli/ references in comments and license files --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-17 20:40:21 -07:00
Michael Han	75da2e00c2	Update install instructions.md	2026-03-17 20:04:04 -07:00
Michael Han	8bca62aa78	Dual License clarification.md	2026-03-17 18:48:00 -07:00
Michael Han	e138a3d48b	Update install instructions.md	2026-03-17 16:08:40 -07:00
Wasim Yousef Said	03736a82ba	Relax frontend unused local check (#4388)	2026-03-17 16:04:11 -07:00
Michael Han	523ebf1e2f	Update Unsloth_Studio_Colab.ipynb	2026-03-17 15:42:38 -07:00
Manan Shah	93ab09d195	[Feature] compare for 2 diff models (#4356 ) * compare for 2 diff models * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolving gemini comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(studio): refine model-load toast stop action and compare selector sizing (#4369) Co-authored-by: imagineer99 <samleejackson0@gmail.com> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: imagineer99 <samleejackson0@gmail.com>	2026-03-17 22:58:34 +04:00
Michael Han	2a8d6b2b82	Update Unsloth_Studio_Colab.ipynb	2026-03-17 11:31:09 -07:00
Michael Han	01dfbab5a8	Update Unsloth_Studio_Colab.ipynb	2026-03-17 11:27:04 -07:00
Michael Han	a46cb120bb	Update Unsloth_Studio_Colab.ipynb	2026-03-17 11:24:54 -07:00
Michael Han	051b6f27f9	Update Unsloth_Studio_Colab.ipynb	2026-03-17 11:16:20 -07:00
Michael Han	a943705b4c	Update Unsloth_Studio_Colab.ipynb	2026-03-17 11:14:45 -07:00
Michael Han	685a0348e1	Update Unsloth_Studio_Colab.ipynb	2026-03-17 10:00:57 -07:00
Michael Han	881e057964	Unsloth Studio update.md	2026-03-17 08:42:03 -07:00
Daniel Han	880b59a301	Update README.md	2026-03-17 08:03:32 -07:00
Michael Han	deb76dfa1d	Update README.md	2026-03-17 07:57:46 -07:00
Daniel Han	1fffd0e17a	Merge branch 'main' of https://github.com/unslothai/unsloth	2026-03-17 07:54:41 -07:00
Daniel Han	ebfaa18094	Update pyproject.toml	2026-03-17 07:54:32 -07:00
Michael Han	c60636695c	Unsloth Studio.md	2026-03-17 07:53:50 -07:00
Daniel Han	0acd1c7eec	studio: improve onboarding UX, tooltips, and training defaults (#4355 ) * studio: improve onboarding UX, tooltips, and training defaults - Change splash text to "Train and run LLMs locally" - Add "Chat Only" card with BubbleChatIcon to skip directly to chat - Add Skip/Skip to Chat buttons in sidebar and footer - Back button on step 1 returns to splash screen instead of being disabled - Change "Watch video guide" to "Get started with our guide" with new URL - Update intro text to mention all model types + chat - Make all tooltips clickable (in addition to hover) via React context - Strip surrounding quotes from pasted HF tokens - Rename "Eval Split" to "Evaluation Split" - Add SparklesIcon to "Auto Detect" format option - Change step 4 heading to "Choose your training parameters" - Default max_steps to 60 - Learning rate displayed in scientific notation with +/- stepper - Context length options capped by model's max_position_embeddings (via AutoConfig) - Fix "QLORA"/"LORA" to "QLoRA"/"LoRA" in summary step - Backend: add max_position_embeddings to model config endpoint * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * compare for 2 diff models * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolving gemini comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: disable thinking for Qwen3.5 <9B and always for AI Assist - Change Qwen3.5 thinking threshold from <=2B to <9B (0.8B, 2B, 4B all disable thinking by default; 9B+ enables it) - Always pass enable_thinking=False in AI Assist helper calls (_run_with_helper and _generate_with_backend) regardless of chat thinking settings * studio: address PR review comments - Extract _get_max_position_embeddings helper to DRY config extraction - Fix "Skip to Chat" to navigate to /chat on step 1 (was /studio) * fix: comment out debug 
print statements * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: skip Shiki highlighting for incomplete SVG code fences While streaming SVG content, the syntax highlighter (Shiki) re-parses the entire growing SVG on every token, blocking the main thread and freezing the code area until the fence closes. Show a plain-text preview for incomplete SVG fences instead, similar to how Mermaid diagrams show a placeholder while streaming. * studio: fix default top_k from 50/40 to 20 for chat inference Per Qwen3.5 docs (unsloth.ai/docs/models/qwen3.5), top_k should be 20 for both thinking and non-thinking modes. The model-specific config in inference_defaults.json already had top_k=20 for Qwen3.5, but the generic fallback defaults were wrong: - Frontend DEFAULT_INFERENCE_PARAMS.topK: 50 -> 20 - Backend generate_chat_completion top_k: 40 -> 20 - Backend generate_chat_completion_with_tools top_k: 40 -> 20 - Frontend title generation top_k: 40 -> 20 * studio: set universal inference defaults for unknown models Default params for any model without specific config: temperature=0.6, top_p=0.95, top_k=20, min_p=0.01, presence_penalty=0.0, repetition_penalty=1.0 Models with entries in inference_defaults.json (Qwen3.5, Gemma-3, Llama, etc.) override these with their recommended values. Updated in: frontend DEFAULT_INFERENCE_PARAMS, backend Pydantic request models, and backend generate_chat_completion defaults. * studio: only trust_remote_code for unsloth/ models in AutoConfig Only set trust_remote_code=True when the model name starts with "unsloth/". All other models default to False for safety. * studio: move Generating spinner above the composer The "Generating" spinner was below the send message bar, causing the bar to jump up and down. Move it above the composer in both the regular thread view and the welcome/empty view. 
* studio: adjust toast close button position away from edge Move the X close button on toasts (like "Starting model...") from top-1.5 to top-3 and add right-3, giving more breathing room from the top-right corner. * studio: make Think button smaller with tighter icon-text gap Reduce gap from 1.5 to 0.5, padding from px-2.5/py-1 to px-2/py-0.5, and icon from size-3.5 to size-3. * studio: multiple onboarding and chat UX improvements - Move Generating spinner above composer (fixes jumping send bar) - Make Think button smaller with tighter icon-text gap - Chat card now inside grid (same size as Audio/Embeddings cards) - Rename "Chat Only" to "Chat" - Chat card requires Continue to proceed (no auto-advance) - Continue on Chat selection skips onboarding and goes to /chat - Tooltip (i) click on Chat card doesn't trigger navigation - Step 1 footer Back button goes back to splash (label is "Back") - Splash "Skip Onboarding" renamed to "Skip to Chat", navigates to /chat - Toast close button moved away from edge * studio: align Skip to Chat button, add Skip to footer - Sidebar "Skip to Chat" now uses primary (green) Button style with arrow icon, full width, aligned like step items. Shows on all steps. - Footer: added "Skip" outline button next to Continue that goes directly to /studio with progress saved (markOnboardingDone) * studio: change default max steps from 30 to 60 in toggle hook The DEFAULT_MAX_STEPS in use-max-steps-epochs-toggle.ts was still 30, used as fallback when toggling from epochs back to max steps. * studio: extend context length options to 262K CONTEXT_LENGTHS now includes 65536, 131072, 262144 in addition to the existing 512-32768 range. The onboarding step filters these by the model's max_position_embeddings (e.g. Nemotron-3-Nano-4B has 262144), showing powers of 2 up to the model's maximum. 
* studio: auto-select LoRA vs QLoRA based on model size and GPU memory After selecting a model in onboarding, detect the total model weight file size from HF Hub (safetensors/bin files). Then estimate memory needed: model_size_gb * 1.5 * context_scale, where context_scale is: - <=8192 tokens: 1.0x - >8192 tokens: 1.7x - >=16384 tokens: 2.0x - >=32768 tokens: 4.0x If the estimate fits in free GPU VRAM, default to LoRA (16-bit). Otherwise default to QLoRA (4-bit). Backend changes: - Add model_size_bytes to ModelDetails (models.py) - Add _get_model_size_bytes() using HfApi.repo_info (routes/models.py) - Add vram_free_gb to get_gpu_summary (hardware.py) Frontend changes: - Add autoSelectTrainingMethod() in training-config-store.ts - Called after model defaults are loaded - Add model_size_bytes to ModelConfigResponse type - Add vramFreeGb to HardwareInfo hook * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: rename "Importing ML libraries..." to "Importing Unsloth..." * studio: show model/dataset in training status, fix LoRA/QLoRA casing - Training status now shows 'Training "model_name"' and 'Dataset = ...' instead of generic "Starting training..." 
- Fix Studio progress section to show QLoRA/LoRA instead of QLORA/LORA * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: rename 'Skip to Chat' to 'Skip Onboarding' on splash screen * studio: add presence_penalty support for chat inference Add presence_penalty as a parameter across the full stack: - Backend: llama_cpp.py generate_chat_completion/with_tools, Pydantic models (inference.py), routes/inference.py pass-through - Frontend: InferenceParams type, DEFAULT_INFERENCE_PARAMS (0.0), chat-adapter.ts payload, chat-settings-sheet.tsx slider (0-2), model defaults loading from inference_defaults.json - Set Qwen3.5 default presence_penalty to 1.5 per official docs - Default for unknown models is 0.0 (off) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: fix Chat card deselecting Text and aligning with other cards * studio: fix presence_penalty not loading from inference defaults The inference_config.py load_inference_config() was not including presence_penalty in the returned config dict, so the Qwen3.5 default of 1.5 from inference_defaults.json never reached the frontend. Added it to the config builder. * studio: add delete button for cached models in model selector Add trash icon on each downloaded model row (GGUF and safetensors) with confirmation dialog. Backend DELETE /api/models/delete-cached endpoint uses huggingface_hub scan_cache_dir + delete_revisions to cleanly remove cached repos, refusing if the model is currently loaded. * studio: restore inference defaults, reasoning, and tools on page refresh On page refresh with a model already loaded, the frontend was not re-applying model-specific inference defaults (presence_penalty, temperature, etc.) or restoring reasoning/tools support flags. Backend: Add inference config, supports_reasoning, supports_tools, and context_length to InferenceStatusResponse. 
Frontend: In the refresh callback, when an active model is detected, apply mergeRecommendedInference and restore reasoning/tools flags with proper Qwen3.5 size-based defaults. * studio: fix delete dialog closing before async completes Prevent AlertDialogAction's default close behavior with e.preventDefault() so the dialog stays open during deletion. Also block onOpenChange dismiss while deleting is in progress. * fix: add Dict and Any imports to inference models * studio: fix Qwen3.5 reasoning threshold in frontend load path The frontend loadModel handler had the old threshold (<=2) for disabling reasoning on small Qwen3.5 models. Changed to <9 to match the backend. This was causing 4B to not properly disable thinking by default when auto-loaded. * studio: move GGUF delete to per-variant level For GGUF repos, the trash icon now appears on each downloaded variant row inside the quantization expander instead of on the repo-level row. Backend accepts optional variant param to delete specific GGUF files (blob + symlink) rather than the entire repo cache. * studio: restore ggufContextLength on page refresh The Max Tokens slider was capped at 32768 on page refresh because ggufContextLength was not restored from the status response. Now set it from statusRes.context_length on reconnect. * fix: remove <think> from Qwen3.5 response template marker The train-on-responses-only feature uses template markers to find where the assistant response starts. The Qwen3.5 response marker included '<think>\n' which is only present when thinking mode is enabled. With thinking disabled (default for <9B), the marker never matched, causing 100% of samples to be dropped. Changed response marker from '<\|im_start\|>assistant\n<think>\n' to '<\|im_start\|>assistant\n' which works regardless of thinking mode. 
* studio: fix sloth ASCII art alignment in training overlay * fix: correct sloth ASCII art alignment to match Unsloth banner * studio: add Python and terminal tool calling to chat Register python and terminal tools alongside web search. Python executor validates imports (stdlib only) via unsloth_zoo rl_environments, runs code in a subprocess sandbox with 5-min timeout and cancel support. Terminal executor blocks dangerous commands (rm, sudo, etc.) and runs in a temp directory. Update llama_cpp tool loop to show tool-specific status messages and pass cancel_event through to executors. Rename composer toggle from "Search" to "Tools" and show TerminalIcon for execution status pills. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: fix Nemotron/transformers 5.x support, onboarding navigation, port binding Backend: - Dynamic transformers 5.x detection via tokenizer_config.json fetch (checks for TokenizersBackend class, cached per-model) - Bump transformers 5.x version from 5.2.0 to 5.3.0 across all workers, setup scripts (setup.sh, setup.ps1) - Auto-enable trust_remote_code for unsloth/* models needing transformers 5.x (workaround for NemotronH config parsing bug in transformers) - Auto-install mamba-ssm/causal-conv1d for SSM models (NemotronH, Falcon-H1) with --no-build-isolation --no-deps to avoid torch version conflicts - Add SO_REUSEADDR to port check in run.py (fixes Colab proxy stale connection falsely reporting port as in-use) Frontend: - Fix "Skip to Chat" navigation: use window.location.href instead of React Router navigate() to bypass useEffect redirect race - Fix "Skip Onboarding" on splash: navigates to /studio (not /chat) - Fix onboarding guard: only check isOnboardingDone() on initial mount - Fix Chat card on step 1: add sr-only spacer for consistent alignment - Fix Chat+Text both selected: clear RadioGroup value when Chat is selected * [pre-commit.ci] auto fixes from pre-commit.com hooks for 
more information, see https://pre-commit.ci * studio: split tools toggle into Search and Code buttons Replace the single "Tools" toggle with two independent toggles: - "Search" (globe icon) enables web search only - "Code" (terminal icon) enables Python and terminal execution Add enabled_tools list field to the inference payload so the backend only registers the tools the user has toggled on. Both toggles appear in the main composer and the compare composer. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: fix tool calling import validation and error logging Replace unsloth_zoo-dependent import checker with a standalone ast-based validator using sys.stdlib_module_names. This properly blocks non-stdlib imports (numpy, requests, etc.) and returns a clear error message to the model so it can rewrite using only stdlib. Add full traceback to tool streaming error logs for debugging. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: parse gpt-oss harmony channels for clean safetensors chat output gpt-oss models emit multi-channel output via harmony protocol tokens (<\|channel\|>analysis<\|message\|>... and <\|channel\|>final<\|message\|>...). TextIteratorStreamer with skip_special_tokens=True strips the special tokens but leaves channel names concatenated with content, producing garbled output like "analysisWe need to...assistantfinalHello!". Add HarmonyTextStreamer that decodes with skip_special_tokens=False, parses harmony markup via regex, and emits <think>analysis</think> for the analysis channel and plain text for the final channel -- reusing the existing frontend reasoning UI. Also expose supports_reasoning=True for non-GGUF gpt-oss models in the /status endpoint so the frontend enables the Think toggle. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: use unsloth_zoo for Python sandbox validation Set UNSLOTH_IS_PRESENT=1 and import check_python_modules and check_signal_escape_patterns directly from unsloth_zoo instead of a standalone fallback. This gives us the full Unsloth validation including stdlib-only import checks and signal/timeout escape pattern detection. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: allow all imports in Python tool sandbox Remove stdlib-only import restriction. Keep signal escape pattern detection via unsloth_zoo for safety. * studio: fix ReadTimeout on tool streaming final pass The 0.5s read timeout used for cancel-checking during streaming also fires when waiting for the first response from llama-server (e.g. reasoning model thinking for 15+ seconds). Add _stream_with_retry() context manager that retries on ReadTimeout while checking cancel_event, so the model has unlimited time to think before producing the first token. Applied to both the regular streaming path and the tool-calling final pass. * fix: rewrite HarmonyTextStreamer with stateful incremental parsing The delta-on-transformed approach had two critical bugs: 1. Before the full <\|channel\|>X<\|message\|> pattern was complete, the strip-tokens fallback emitted "analysis" as plain text. Then when the regex matched, _transform returned a completely different format (<think>...</think>) and the delta was computed against the wrong base string, producing fragments like "think>", "nk>", ">". 2. Even with full matches, the closing </think> tag shifted position as content grew, so text[prev_len:] produced garbled deltas. 
Replace with stateful incremental parsing that: - Buffers until a complete channel+message pair is seen - Emits <think> once when analysis channel first appears - Streams analysis content deltas (computed on channel content directly) - Emits </think> once when final channel first appears - Streams final content deltas - Closes open think tags in end() Also skip the generic all_special_tokens stripping in _clean_generated_text for gpt-oss since HarmonyTextStreamer already produces clean output and the generic stripping was mangling <think> tags. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: strip all <\|...\|> tokens in gpt-oss cleanup, not just harmony subset The gpt-oss tokenizer has added tokens like <\|return\|> (id=200002) that are not part of the harmony channel protocol but can leak into output. The previous regex only stripped channel\|message\|start\|end tokens. Broaden the _clean_generated_text regex for gpt-oss to <\\|[a-z_]+\\|> which catches all pipe-delimited tokens (return, constrain, reserved, etc.) without matching <think>/<\/think> tags. Verified: gpt-oss all_special_tokens are only <\|return\|>, <\|reserved_200017\|>, <\|startoftext\|> -- none overlap with <think>. The harmony tokens (channel, message, start, end) are added_tokens but not in all_special_tokens. * fix: hide config-only model repos from cached models list Repos that only have metadata/config files cached (no .safetensors or .bin weight files) were showing up in the Downloaded list with tiny sizes like "1.8 KB" or "24 KB". These are just leftover config snapshots from architecture checks, not usable models. Filter the cached-models endpoint to only include repos that contain actual model weight files (.safetensors or .bin). * studio: fix toast description text contrast in dark mode Add explicit !text-muted-foreground to toast description classNames so secondary text (e.g. 
"Releases VRAM and resets inference state.") is readable in dark mode. * studio: fix Chat card icon alignment with size-4 spacer Replace sr-only span (takes no space) with a size-4 shrink-0 div matching the RadioGroupItem dimensions in other cards, so the Chat icon aligns vertically with Text/Audio/Vision/Embeddings icons. --------- Co-authored-by: workspace <user@workspace.local> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Manan17 <shahmanan170602@gmail.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>	2026-03-17 07:46:07 -07:00
Daniel Han	29f7fddac6	Studio UI	2026-03-17 07:44:54 -07:00
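The LoRA vs QLoRA auto-selection rule from commit 0acd1c7eec above (estimate = model_size_gb * 1.5 * context_scale, compared against free GPU VRAM) can be written out as a small Python sketch. Several details are assumptions rather than facts from the commit: the function names are hypothetical, 1 GB is taken as 1024**3 bytes, "fits" is read as "less than or equal to", and the overlapping thresholds in the message are resolved by checking the largest bound first.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: the commit does not say GB vs GiB


def context_scale(context_length: int) -> float:
    """Multiplier from the commit message, largest threshold checked first."""
    if context_length >= 32768:
        return 4.0
    if context_length >= 16384:
        return 2.0
    if context_length > 8192:
        return 1.7
    return 1.0


def estimate_memory_gb(model_size_bytes: int, context_length: int) -> float:
    """Estimated training memory: model_size_gb * 1.5 * context_scale."""
    model_size_gb = model_size_bytes / BYTES_PER_GB
    return model_size_gb * 1.5 * context_scale(context_length)


def auto_select_training_method(
    model_size_bytes: int, context_length: int, vram_free_gb: float
) -> str:
    """Default to LoRA (16-bit) if the estimate fits, else QLoRA (4-bit)."""
    estimate = estimate_memory_gb(model_size_bytes, context_length)
    return "LoRA" if estimate <= vram_free_gb else "QLoRA"


if __name__ == "__main__":
    eight_gb = 8 * BYTES_PER_GB
    print(auto_select_training_method(eight_gb, 2048, vram_free_gb=24.0))
    print(auto_select_training_method(eight_gb, 32768, vram_free_gb=24.0))
```

With these assumptions an 8 GB checkpoint at a 2048-token context is estimated at 12 GB, while the same checkpoint at 32768 tokens is estimated at 48 GB.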
Michael Han	f3b6e0e486	Add files via upload	2026-03-17 06:42:25 -07:00
Roland Tannous	c6bd55ec61	fix(llm_assist): disable thinking mode for helper model JSON output (#4358 ) * fix(llm_assist): disable thinking mode for helper model JSON output Pass enable_thinking=False to generate_chat_completion() in both _run_with_helper() and _generate_with_backend() so the Qwen3.5-4B helper model produces clean JSON instead of wrapping responses in <think> tags. * fix(llm_assist): log per-request enable_thinking=False override Add info-level log lines so the user can see that each helper/advisor request overrides the server-level thinking default to False. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-17 15:58:08 +04:00
Roland Tannous	a0aba96ebd	fix: comment out debug print statements (#4357)	2026-03-17 15:43:27 +04:00
Daniel Han	37fe04f7bf	studio: add SVG preview, fix streaming bug and model selector state (#4354 ) - Add SVG preview rendering below code blocks using safe data URI in <img> tag. Includes sanitization to block script/event handlers. - Fix GGUF streaming crash: cache response.iter_text() iterator instead of creating a new one on every loop iteration. - Fix model selector showing "Select model..." after auto-load by re-reading store state after setCheckpoint before setParams. - Remove unused warmupToastShown variable (TS6133 build error). - Change default suggestion to "Draw an SVG of a cute sloth".	2026-03-17 02:34:05 -07:00
Datta Nimmaturi	33dc47da72	Fix spacing in setup.sh echo statements	2026-03-17 14:53:55 +05:30
Roland Tannous	df01a139a0	fix: remove unused warmupToastShown variable to fix TS6133 build error (#4353)	2026-03-17 02:03:15 -07:00
Daniel Han	fe05b700dc	studio: fix slow cancellation of GGUF generation (#4352 ) The streaming loop used response.iter_text() with timeout=None, which blocks until the next chunk arrives from llama-server. On large models like Qwen3.5-27B where each token takes seconds, pressing Stop in the UI would not take effect until the next token was produced. Fix by using a 0.5s read timeout and a new _iter_text_cancellable() helper that checks cancel_event between timeout windows and explicitly closes the response when cancelled. Applied to both the regular chat completion and tool-calling streaming paths.	2026-03-17 01:47:21 -07:00
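The cancellation pattern in commit fe05b700dc (a short read timeout, with a cancel check between timeout windows) can be illustrated without an HTTP client. The sketch below is not the Studio's `_iter_text_cancellable()` helper: it uses a `queue.Queue` as a stand-in for the blocking response stream, and the name `iter_cancellable` and the `None` end-of-stream sentinel are assumptions made for the example.

```python
import queue
import threading
from typing import Iterator, Optional

READ_TIMEOUT_S = 0.5  # matches the 0.5s read timeout named in the commit


def iter_cancellable(
    chunks: "queue.Queue[Optional[str]]",
    cancel_event: threading.Event,
    timeout: float = READ_TIMEOUT_S,
) -> Iterator[str]:
    """Yield chunks until the producer finishes or cancel_event is set.

    A blocking read with no timeout would only notice cancellation when
    the next chunk arrives. Waking up every `timeout` seconds bounds the
    delay between pressing Stop and the loop actually exiting.
    """
    while not cancel_event.is_set():
        try:
            chunk = chunks.get(timeout=timeout)
        except queue.Empty:
            continue  # timeout window elapsed; re-check cancel_event
        if chunk is None:  # sentinel (assumption): producer is done
            return
        yield chunk


if __name__ == "__main__":
    q: "queue.Queue[Optional[str]]" = queue.Queue()
    cancel = threading.Event()
    for token in ("Hello", ", ", "world", None):
        q.put(token)
    print("".join(iter_cancellable(q, cancel, timeout=0.05)))
```

In the real streaming path the equivalent of the `queue.Empty` branch would be the client's read-timeout exception, and the response would be closed explicitly on cancellation, as the commit message describes.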
Daniel Han	b437c9a36d	studio: update Creative/Precise presets, show "Off" for disabled samplers (#4350 ) Creative: temperature=1.5, min_p=0.1, top_p=Off (1.0), top_k=Off (0) Precise: temperature=0.1, top_p=0.95, top_k=80, min_p=0.01 Also show "Off" in the slider label for top_p=1.0, top_k=0, and repetition_penalty=1.0 since those values disable their respective samplers. Changed top_k slider min from -1 to 0.	2026-03-17 01:32:18 -07:00
Daniel Han	ee6f057cc2	studio: show "Off" for repetition penalty = 1 (#4349)	2026-03-17 01:28:33 -07:00
Daniel Han	c00a993a68	studio: fix stale GGUF metadata, update helper model, auth improvements (#4346 ) * studio: switch helper model to Qwen3.5-4B-GGUF Replace Qwen3-4B-Instruct-2507-GGUF with Qwen3.5-4B-GGUF as the default helper model for LLM-assisted dataset detection. Same UD-Q4_K_XL variant. * studio: fix stale GGUF metadata when switching models (#4347) Reset _supports_reasoning, _supports_tools, _context_length, and _chat_template at the start of _read_gguf_metadata() to prevent stale settings from a previous model leaking into the next load. Co-authored-by: Daniel Han <daniel@unsloth.ai> * studio: change login error to "Incorrect password", add reset-password CLI - Login error now says "Incorrect password" instead of the generic "Incorrect username or password" since Studio only has one account. - Add `unsloth studio reset-password` command that deletes the auth database so a fresh admin account with a new random password is created on the next server start. * studio: include reset command in login error message * studio: change password setup subtitle wording	2026-03-17 01:22:08 -07:00
Daniel Han	eeffa4c065	studio: web search, KV cache dtype, training progress, inference fixes ## Summary - Add web search tool calling for GGUF models (Search toggle, DuckDuckGo via ddgs) - Add KV cache dtype dropdown (f16/bf16/q8_0/q5_1/q4_1) in Chat Settings - Fix Qwen3/3.5 inference defaults per official docs (thinking on/off params) - Enable reasoning by default for Qwen3.5 4B and 9B - Replace "Generating" toast with inline spinner - Fix stop button via asyncio.to_thread (event loop no longer blocked) - Fix CUDA 12 compat lib paths for llama-server on CUDA 13 systems - Fix auto-load model name not appearing in selector - Training progress messages + dataset_num_proc fix Integrated PRs: - #4327 (imagineer99): BETA badge alignment (already in tree) - #4340 (Manan Shah): prioritize training models in model selection - #4344 (Roland Tannous): setup.sh macOS python version compatibility - #4345 (Manan Shah): revamp model+dataset checking logic	2026-03-17 00:30:01 -07:00
pluesclues	f5e1f52b48	Add check to disable xformers on newer GPUs (#4342) Disable xformers for GPUs with compute capability >= 12 to ensure compatibility with newer hardware.	2026-03-16 22:42:38 -07:00
Michael Han	a804325171	Update Unsloth_Studio_Colab.ipynb	2026-03-16 22:30:12 -07:00
Michael Han	674ce29131	Update Unsloth_Studio_Colab.ipynb	2026-03-16 22:28:58 -07:00
Michael Han	f0afafd4ba	Update Unsloth_Studio_Colab.ipynb	2026-03-16 22:16:39 -07:00
Michael Han	227759df61	Update Unsloth_Studio_Colab.ipynb	2026-03-16 22:15:07 -07:00
Datta Nimmaturi	bbf6414caf	Fix formatting of launch command in setup.ps1	2026-03-17 10:19:16 +05:30
Leo Borcherding	df98569f12	studio: improve Colab notebook, redesign ready popup, and clean up install output (#4339 ) * Removing .precommit config * edited colab comments * studio: update Unsloth_Studio_Colab.ipynb * studio: update Unsloth_Studio_Colab.ipynb * studio: add Colab T4 GPU metadata to force T4 instance * style: update colab popup to black/white theme with gem icon and play button * feat: center landscape image in colab notebook * style: shrink popup to fit content, truncate URL display * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: center landscape image in colab notebook * feat: use GitHub raw URL for studio landscape image in notebook * chore: update colab notebook * feat: add studio landscape colab display image and update notebook * feat: update notebook with studio landscape image * style: remove colors, add progress bar, add VERBOSE flag to install output * docs: add comments explaining VERBOSE flag and progress bar * chore: update colab notebook * fix: define VERBOSE, _STEP, _TOTAL at module level to fix NameError --------- Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-16 21:39:25 -07:00
Daniel Han	dc2879a048	Fix xformers Blackwell guard: broader coverage and root cause docs (#4338 ) * Remove outdated xformers Blackwell version guard The guard at _utils.py:976-989 blocked xformers 0.0.32.post2 on Blackwell/RTX 50x/Jetson GPUs (SM 10.0/11.0/12.0) due to a FA3 dispatch bug that caused CUDA errors (issue #1329). This is no longer needed because: 1. xformers fixed the FA3 dispatch in 0.0.33.post2 by capping it at SM <= 9.0, so FA3 is never attempted on Blackwell. The FA2 backend works correctly via PTX forward compatibility. 2. The only blocked version (0.0.32.post2) was built for torch 2.8.0 and cannot load on torch 2.9+ due to ABI mismatch, so the guard never actually triggers for any current user. 3. The existing _register_extensions() check plus the except Exception fallback already handle broken xformers installs gracefully by falling back to SDPA. Verified on NVIDIA RTX PRO 6000 Blackwell (SM 12.0) with both pre-built wheels (0.0.33.post2) and source builds -- all attention tests pass with exact numerical match vs SDPA. * Update xformers Blackwell guard with root cause and broader coverage Changes to the xformers version guard for Blackwell/RTX 50x/Jetson GPUs: 1. Broaden version check from `in (0.0.32.post2,)` to `<= 0.0.32.post2` to cover all versions with the broken FA3 dispatch, not just one. 2. Add `DEVICE_TYPE == "cuda"` guard to avoid calling `get_device_capability()` on non-CUDA devices (XPU, etc.). 3. Document the root cause: xformers <= 0.0.32.post2 used `capability >= (9, 0)` in the FA3 dispatch, which matched Blackwell SM 12.0 and attempted sm_90a Hopper kernels on it. Fixed upstream in 0.0.33 with `<= (9, 0)`. 4. Update error message to include the installed version, mention the fix (upgrade to >= 0.0.33), and keep the build-from-source fallback. The raise is caught by `except Exception` which shows the message when UNSLOTH_ENABLE_LOGGING is set and falls back to SDPA. 
Verified on NVIDIA RTX PRO 6000 Blackwell (SM 12.0): - xformers 0.0.33.post2 pre-built wheel: works (FA2 via PTX) - xformers source build: works (FA2 native) - Both have exact numerical match vs SDPA --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-03-16 21:30:03 -07:00
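Note on the root cause above: the two dispatch conditions quoted in the commit message can be compared with plain tuple ordering. This is a minimal sketch, not xformers code. The function names are hypothetical, and only the two comparisons (`>= (9, 0)` and `<= (9, 0)`) come from the message.

```python
# Hypothetical helpers; only the two comparisons are taken from the commit message.
from typing import Tuple

Capability = Tuple[int, int]

HOPPER: Capability = (9, 0)      # SM 9.0
BLACKWELL: Capability = (12, 0)  # SM 12.0, as on the RTX PRO 6000 Blackwell


def fa3_selected_old(capability: Capability) -> bool:
    """Condition described for xformers <= 0.0.32.post2."""
    return capability >= (9, 0)


def fa3_selected_fixed(capability: Capability) -> bool:
    """Condition described for the upstream fix in 0.0.33."""
    return capability <= (9, 0)


# Tuples compare lexicographically, so (12, 0) >= (9, 0) holds and the
# old check routes Blackwell to the Hopper-only FA3 path.
print(fa3_selected_old(BLACKWELL), fa3_selected_fixed(BLACKWELL))
```

Under these assumptions the old condition accepts both Hopper and Blackwell, while the fixed one accepts Hopper only.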
Daniel Han	6912a15a42	fix: add Qwen3.5 version gate in loader dispatch (#4335 ) * fix: add Qwen3.5 version gate in loader dispatch (#4188) Qwen3.5 (model_type qwen3_5) only exists in transformers >= 5.0.0. Without this gate, loading a Qwen3.5 model on transformers 4.x gives an unhelpful generic error. This adds a clear version check before the qwen3 dispatch to prevent substring misrouting and give a useful error message pointing users to upgrade. No dedicated FastQwen3_5Model is needed -- the compiler already applies fused CE automatically via apply_fused_lm_head for both Qwen3_5ForCausalLM and Qwen3_5ForConditionalGeneration. The generic FastModel fallback path handles everything. FORCE_FLOAT32 already has qwen3_5 on main. Tested on transformers 5.3.0: Qwen3.5-0.8B 4bit, 1.38 GB peak memory. Backwards compatible: import unsloth works on transformers 4.57.6. * fix: update FORCE_FLOAT32 comment for qwen3_5 The (1+w) RMSNorm pattern does not overflow float16 since Qwen3_5RMSNorm computes in float32 internally. The actual reason FORCE_FLOAT32 is needed is that Qwen3.5 GDN layers produce NaN grad norms during float16 training. Updated the comment to reflect the real reason. * fix: move qwen3_5 version check before dispatch chain The elif block intercepted qwen3_5 on transformers >= 5.0.0 without setting dispatch_model, causing UnboundLocalError at line 715. Move the version check before the if/elif dispatch chain so on transformers >= 5.0.0 the model_type falls through to the generic FastModel path as intended. * fix: qwen3_5 requires transformers >= 5.2.0, not 5.0.0 Checked all 5.x releases: - 5.0.0: no qwen3_5 module - 5.1.0: no qwen3_5 module - 5.2.0: qwen3_5 available * fix: move qwen3_5 version check into AutoConfig error handler The previous version check at the dispatch chain was unreachable -- AutoConfig.from_pretrained fails first with a generic "does not recognize this architecture" error on transformers < 5.2.0, so execution never reached the check. 
Move the qwen3_5-specific error message into the AutoConfig exception handler where "architecture" errors are caught. This intercepts the error before the generic message and gives users a clear upgrade path. Also remove the now-redundant check before the dispatch chain. Both FastLanguageModel and FastModel paths are covered. Tested: transformers 4.57.6 shows the Qwen3.5-specific error, transformers 5.3.0 loads and trains normally. --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-03-16 20:37:42 -07:00
Leo Borcherding	262271a20d	Fix/colab comment edits (#4317 ) * Removing .precommit config * edited colab comments * studio: update Unsloth_Studio_Colab.ipynb * studio: update Unsloth_Studio_Colab.ipynb * studio: add Colab T4 GPU metadata to force T4 instance * style: update colab popup to black/white theme with gem icon and play button * feat: center landscape image in colab notebook * style: shrink popup to fit content, truncate URL display * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: center landscape image in colab notebook * feat: use GitHub raw URL for studio landscape image in notebook * chore: update colab notebook --------- Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-16 16:15:46 -07:00
pre-commit-ci[bot]	1c3f201943	[pre-commit.ci] pre-commit autoupdate (#4332 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.15.5 → v0.15.6](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.5...v0.15.6) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-16 14:41:49 -07:00
Roland Tannous	46f9be3dd1	fix: Resolve CUDA toolkit mismatch on multi-CUDA Windows systems (#4324 ) * fix: prefer existing CUDA_PATH toolkit to avoid version mismatch on multi-CUDA systems * fix: validate GPU arch support before accepting CUDA toolkit (sm_120 + CUDA 12.4 fallback) * debug: add temporary CUDA compatibility check print * fix: auto-copy CUDA VS integration files when missing (No CUDA toolset found) * fix: return false when nvcc --list-gpu-arch unavailable (reject old toolkit, scan for newer) * fix: re-sanitize CUDA env vars before cmake build (survives Refresh-Environment) * fix: use --list-gpu-code (sm_) instead of --list-gpu-arch (compute_) for arch probing	2026-03-16 18:16:16 +04:00
Daniel Han	44dcf30b9b	studio: per-model inference defaults, GGUF slider fix, reasoning toggle (#4325 ) * studio: extract param count from model name as fallback When HuggingFace API doesn't return totalParams for a model, extract the param count from the model name (e.g. "Qwen3-0.6B" -> "0.6B", "Llama-3.2-1B-Instruct" -> "1B"). Applied to both the recommended list and HF search results. * studio: read GGUF context_length via fast header parser, set max tokens - Fast GGUF metadata reader (~30-55ms) parses only KV header, skips tensor data and large arrays (tokenizer vocab etc) - Extracts context_length and chat_template from GGUF metadata - Returns context_length in LoadResponse for frontend to use - Frontend sets maxTokens to actual context_length for GGUFs (e.g. 262144 for Qwen3.5-9B, 131072 for Qwen2.5-7B) - Max Tokens slider shows "Max" and is locked for GGUFs - Auto-load path also uses actual context_length from load response - Toast auto-dismiss (5s) and close button for auto-load toast * studio: GGUF TTS audio support (from PR #4318) Add GGUF TTS audio generation via llama-server. When a GGUF model loads, the backend probes its vocabulary to detect audio codecs (SNAC/BiCodec/DAC/CSM/Whisper). If detected, the codec is pre-loaded and the model is reported as audio to the frontend. During chat, TTS models route to the audio generation path which sends a per-codec prompt to llama-server's /completion endpoint, extracts generated tokens/text, and decodes to WAV using AudioCodecManager. Also strips base64 audio data from prior assistant messages to prevent context overflow. 
Co-authored-by: Manan Shah <mananshah511@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove package-lock.json from tracking * studio: per-model inference defaults, GGUF max tokens fix, reasoning toggle - Add inference_defaults.json with per-model-family sampling parameters for ~50 families (Qwen3.5, Qwen3, Gemma-3, Llama-3, DeepSeek, etc.). Values sourced from unslothai/docs and Ollama params blobs. - Family-based lookup in inference_config.py: extracts model family from identifier, matches against patterns (longest match first), merges with priority: model-specific YAML > family JSON > default.yaml. - Fix GGUF Max Tokens slider locked at "Max": store ggufContextLength separately from maxTokens so the slider is adjustable (step=64). - Fix Ministral YAML: top_p was literal string "default", now 0.95. - Add reasoning toggle for thinking models (Qwen3.5, Qwen3, DeepSeek-R1, DeepSeek-V3.1, etc.): detect enable_thinking support from GGUF chat template metadata, pass --jinja to llama-server, send chat_template_kwargs per-request. Frontend shows "Reasoning is ON/OFF" pill button next to attachment button in composer. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: remove default system prompt injection Backend was injecting "You are a helpful AI assistant." when no system prompt was provided. Neither unslothai/docs nor Ollama specify a default system prompt for most models. Now defaults to empty string, letting the model's own chat template handle system behavior. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: use lightbulb icons and "Think" label for reasoning toggle Lightbulb on when thinking enabled, lightbulb-off when disabled. Label is just "Think" in both states; grayed out styling when off. 
* studio: fix HTML file upload breaking chat Replace SimpleTextAttachmentAdapter with custom TextAttachmentAdapter (excludes text/html) and HtmlAttachmentAdapter that strips tags via DOMParser, removing scripts/styles and extracting readable text content instead of dumping raw HTML markup into the conversation. * studio: show chat template in Configuration panel Display the model's Jinja2 chat template in a new "Chat Template" section under Settings (now open by default). For GGUFs, reads from GGUF metadata; for safetensors, reads from tokenizer.chat_template. Template is editable with a "Restore default chat template" button that appears when modified. Section only shows when a model with a chat template is loaded. * studio: editable chat template with Apply & Reload Chat template section now functional: - Editing the template shows "Apply & Reload" (reloads model with custom template) and "Revert changes" buttons - For GGUFs: writes template to temp .jinja file, passes --chat-template-file to llama-server on reload - For non-GGUF: passes chat_template_override in load request - Settings section now open by default - selectModel supports forceReload to reload same model * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * studio: fix DeepSeek reasoning detection and auto-load metadata - Set _model_identifier before _read_gguf_metadata so DeepSeek "thinking" template detection works (was always None before) - Populate ggufContextLength, supportsReasoning, reasoningEnabled, defaultChatTemplate in autoLoadSmallestModel GGUF path * studio: add spacing before BETA badge in navbar Add gap-1.5 on the logo Link container to space the BETA label from the wordmark. 
Co-authored-by: Imagineer99 <Imagineer99@users.noreply.github.com> * studio: vertically center BETA badge with logo --------- Co-authored-by: Manan Shah <mananshah511@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Imagineer99 <Imagineer99@users.noreply.github.com>	2026-03-16 06:37:55 -07:00
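Note on the param-count fallback in the commit above: one way to pull a size such as "0.6B" out of a model name is a regular expression anchored on a non-alphanumeric boundary. This is a sketch under assumptions. The function name and the exact pattern are hypothetical and may differ from what Studio ships; the two example names come from the commit message.

```python
import re
from typing import Optional

# A number followed by "B", not glued to a preceding letter, digit, or dot,
# so the "3" in "Qwen3" and the ".2" in "3.2" are not picked up.
_PARAM_RE = re.compile(
    r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)B(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def param_count_from_name(model_name: str) -> Optional[str]:
    """Return a label such as '0.6B', or None when the name has no size."""
    match = _PARAM_RE.search(model_name)
    return f"{match.group(1)}B" if match else None


for name in ("Qwen3-0.6B", "Llama-3.2-1B-Instruct", "gpt-oss"):
    print(name, "->", param_count_from_name(name))
```

The lookbehind is the design choice that matters here: without it, version numbers embedded in the name would compete with the actual parameter count.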
Roland Tannous	6d12a6b13b	Improve AI Assist: Update default model, model output parsing, logging, and dataset mapping UX (#4323 ) * Strip <think> blocks from LLM assist model output * Add debug logging for raw LLM assist output * Quiet llama-server logs, use structlog in llm_assist * Fix think-tag stripping when response is inside tags * Remove debug logging of raw model output * Clarify GGUF download logs: show cache hit vs actual download * Clarify heuristic-detected mapping in UI text * Default helper model to Qwen3-4B-Instruct-2507 UD-Q4_K_XL * Remove package-lock.json from tracking, add to .gitignore * Auto-open mapping dialog on Start Training for custom_heuristic format * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use last think block when extracting inner content (review feedback) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-16 16:04:35 +04:00
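Note on the think-tag handling in the commit above: the bullets describe stripping `<think>` blocks and, when the whole response sits inside tags, using the last block's inner content. A minimal sketch of that behavior follows. The function name and regex are assumptions, not the code in `llm_assist`.

```python
import re

_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def strip_think(raw: str) -> str:
    """Drop <think>...</think> blocks from model output.

    If nothing is left afterwards, the answer was emitted inside the tags,
    so fall back to the inner text of the last think block.
    """
    visible = _THINK_BLOCK.sub("", raw).strip()
    if visible:
        return visible
    blocks = _THINK_BLOCK.findall(raw)
    return blocks[-1].strip() if blocks else ""


print(strip_think("<think>plan the mapping</think>column_a -> prompt"))
print(strip_think("<think>draft</think><think>column_b -> response</think>"))
```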
Daniel Han	11449208f4	Fix VLM GRPO matmul shape mismatch in _get_per_token_logps_and_entropies (#4301 ) * Fix VLM GRPO matmul shape mismatch in _get_per_token_logps_and_entropies VLM models (e.g. Qwen2.5-VL) can return logits [BT, vocab_size] instead of hidden states [BT, hidden_dim] from their forward pass. When this happens, chunked_hidden_states_selective_log_softmax tries to compute logits @ lm_head.t() which fails with a shape mismatch. Add a shape guard in the VLM branch of _get_per_token_logps_and_entropies: check output.shape[-1] against lm_head.shape[1] (hidden_dim). When hidden states are returned, the existing path is taken. When logits are returned, scaling/softcapping/temperature are applied manually and chunked_selective_log_softmax is used instead. Also add chunked_selective_log_softmax to the import from unsloth_zoo. The text-only branch (pixel_values is None) is unchanged. Companion PR to unslothai/unsloth-zoo for grpo_accumulated_loss. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove redundant scaling in logits fallback path When COMPILE_DISABLE=1 and the model returns logits directly, scaling and softcapping are already applied by the model forward. Only temperature (a GRPO training parameter) needs to be applied. * Pass temperature to chunked_selective_log_softmax instead of manual cast Use the new temperature parameter in chunked_selective_log_softmax (added in companion zoo PR) to avoid casting the entire logits tensor to float32 before the function call. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-16 03:54:16 -07:00
Daniel Han	356538d760	Apply use_reentrant removal to all TRL trainer configs, not just GRPO The existing fix that removes use_reentrant=False from gradient_checkpointing_kwargs was gated behind RLConfig_name == "GRPOConfig", so only GRPOConfig was protected. SFTConfig, DPOConfig, KTOConfig, CPOConfig, ORPOConfig etc. were all still affected. Remove the GRPOConfig guard so the fix applies to all compiled trainer configs when TRL >= 0.27.0. This is defense-in-depth alongside the unsloth_zoo fix that forces use_reentrant=True in unsloth_checkpoint() itself.	2026-03-16 03:51:35 -07:00
Daniel Han	ec9a0906eb	studio: GGUF unlimited context, auto-load, settings UX, recommended list - GGUF: use -c 0 for model's native context size (no 4096 cap) - GGUF: hide Max Seq Length slider (irrelevant), set Max Tokens to Max - Non-GGUF: default Max Tokens to 4096 - Max Tokens slider shows "Max" label when at ceiling for GGUFs - Run non-GGUF load_model in asyncio.to_thread for progress polling - Auto-load smallest downloaded model when chatting without selection - Wait for in-progress model load before inference (modelLoading store flag) - Recommended list: 4 GGUFs + 4 hub models after case-insensitive dedup - Model selector waits for cached data before rendering - Toast close button repositioned, Sampling section open by default - Add logging to _get_repo_size_cached exception handler	2026-03-16 02:46:56 -07:00
pre-commit-ci[bot]	9945843fa9	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-16 02:46:56 -07:00
Daniel Han	991a2bfc35	studio: GGUF unlimited context, auto-load, wait-for-load, UX fixes - Use -c 0 for llama-server (model's native context size, no 4096 cap) - Run non-GGUF backend.load_model in asyncio.to_thread for progress polling - Auto-load smallest downloaded model when user chats without selecting one - Wait for in-progress model load before inference (no "No model loaded" error) - Add modelLoading flag to zustand store for cross-component coordination - Dynamic top models: send 8 GGUFs + 8 hub models, frontend caps 4+4 after dedup - Case-insensitive dedup: downloaded models correctly hide from recommended list - Prevent duplicate toasts: guard against double selectModel calls - Model selector waits for cached data before rendering (no empty flash) - Toast close button positioned at top-right with proper spacing - Sampling section expanded by default in chat settings - Global toast close button styling fix	2026-03-16 02:46:56 -07:00
Daniel Han	20c6d9a26a	Set repetition_penalty default to 1.0 (disabled) everywhere Change all repetition_penalty defaults from 1.1 (or 1.05/1.2 in presets) to 1.0 across the entire backend and frontend. Most models handle repetition well on their own and a non-1.0 penalty can degrade output quality, especially for code, structured output, and creative tasks. Files changed: - Backend: inference.py, llama_cpp.py, orchestrator.py, worker.py, models/inference.py (Field defaults) - Frontend: chat-settings-sheet.tsx (Creative/Precise presets), runtime-provider.tsx (auto-title generation)	2026-03-16 02:46:56 -07:00
Daniel Han	b985471637	Increase default max tokens to 8192, disable repetition penalty - maxTokens: 2048 -> 8192. The old 2048 limit caused generation to stop mid-output for longer responses (e.g. reasoning/thinking models that produce long chain-of-thought before the answer). - repetitionPenalty: 1.1 -> 1.0 (disabled). Most models handle repetition well on their own. A penalty of 1.1 can hurt quality for creative tasks like code generation and ASCII art. - Change welcome message from "Run LLMs or test your fine-tune" to "Chat with your model".	2026-03-16 02:46:56 -07:00
Daniel Han	8ffd86012f	Change "Stop loading" to outlined "Stop" button	2026-03-16 02:46:56 -07:00
Daniel Han	9cbeecc16a	Incorporate PR #4304 toast UX improvements Merge the toast UX refactor from PR #4304 (by @Shine1i): - Toast duration 5s default with close button (X) for manual dismiss - Inline progress bar component (ModelLoadInlineStatus) shown in the header after toast is dismissed - Model switch warning only for image compatibility (not generic) - activeThreadId tracked in store via ActiveThreadSync - Loading state cleanup via resetLoadingUi helper - Toast uses Infinity duration during loading with onDismiss handler Re-applied non-GGUF download progress additions on top: - getDownloadProgress for all models (not just GGUF) - hasShownProgress flag, loadingModelRef race condition checks - First poll at 500ms, bytes-only fallback when expected size unknown	2026-03-16 02:46:56 -07:00
Daniel Han	042598d9f1	Suppress model-switch warning on empty chat threads Don't show "Model changed for this chat" toast when the thread has no messages. On a fresh page load with a stale thread from a previous session, this warning is confusing. The warning is only useful mid-conversation to alert about image compatibility with the new model. When messages.length === 0, silently update the thread's modelId and proceed with loading.	2026-03-16 02:46:56 -07:00
Daniel Han	f4d54a8de7	Fix vision detection subprocess using undefined logger The _VISION_CHECK_SCRIPT subprocess used logger.info() but logger was never defined in the subprocess context. This caused a NameError on every vision check, making all transformers 5.x models (Qwen3.5, GLM, etc.) fall back to text-only mode even when they support vision. Replace logger.info() with print() since the parent process reads the subprocess stdout via result.stdout.	2026-03-16 02:46:56 -07:00
Daniel Han	3a5d751f19	Add logging to download-progress exception handler	2026-03-16 02:46:56 -07:00
Daniel Han	2642f6d21d	Add sloth emoji to section labels, friendlier network error - Add sloth emoji prefix to "Downloaded" and "Recommended" section labels in the Hub model picker so they are visually distinct. - Replace browser network errors ("NetworkError when attempting to fetch resource" / "Failed to fetch") with a clearer message: "Studio isn't running -- please relaunch it."	2026-03-16 02:46:56 -07:00
pre-commit-ci[bot]	a45babc620	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-16 02:46:56 -07:00
Daniel Han	d417407087	Convert images to PNG before sending to llama-server llama-server uses stb_image internally which does not support WebP, TIFF, AVIF, and other formats that browsers accept for upload. Uploading a WebP image to a vision GGUF model caused a 400 error: "Failed to load image or audio file" / "failed to decode image bytes". Convert all uploaded images to PNG via PIL before base64-encoding and forwarding to llama-server. This handles WebP, TIFF, BMP, GIF, AVIF, and any other format PIL supports. RGBA images are converted to RGB first since PNG with alpha can cause issues in some vision pipelines.	2026-03-16 02:46:56 -07:00
pre-commit-ci[bot]	c842e019d8	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-16 02:46:56 -07:00
Daniel Han	39854f4429	Auto-download mmproj for vision-capable GGUF models GGUF repos with mmproj files (e.g. Qwen3.5-0.8B-GGUF) are already detected as vision-capable by list_gguf_variants(), and is_vision is set correctly in ModelConfig. However, the HF download path only downloaded the main GGUF file without the mmproj projection file, so llama-server started without --mmproj and rejected image uploads with "text-only model" errors. Add _download_mmproj() to LlamaCppBackend that: - Lists repo files for mmproj*.gguf matches - Prefers mmproj-F16.gguf (best quality), falls back to any mmproj - Downloads via hf_hub_download (uses the same HF cache) In load_model(), when is_vision=True and no explicit mmproj_path was provided (HF mode), auto-download the mmproj after the main GGUF. The downloaded path is passed to llama-server via --mmproj.	2026-03-16 02:46:56 -07:00
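Note on the mmproj selection rule above: the preference order (match `mmproj*.gguf`, prefer `mmproj-F16.gguf`, otherwise take any match) can be expressed as a pure function over a repo file listing. This sketch leaves out the actual download step. `pick_mmproj` is a hypothetical name and is not the `_download_mmproj()` method itself.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional


def pick_mmproj(repo_files: Iterable[str]) -> Optional[str]:
    """Choose a projection file from a repo listing, or None if there is none."""
    candidates = [
        path for path in repo_files
        if fnmatchcase(path.rsplit("/", 1)[-1], "mmproj*.gguf")
    ]
    # Prefer the F16 projection, as the commit message describes.
    for path in candidates:
        if path.rsplit("/", 1)[-1] == "mmproj-F16.gguf":
            return path
    # Otherwise fall back to the first mmproj file found.
    return candidates[0] if candidates else None


files = ["model-Q4_K_M.gguf", "mmproj-BF16.gguf", "mmproj-F16.gguf"]
print(pick_mmproj(files))
```

The returned path would then be what gets fetched and passed to llama-server via `--mmproj`.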
Daniel Han	f20c7ca54d	Friendlier unsupported model errors, show estimated download size 1. Backend: When a model fails with "No config file found" or similar unsupported-model errors, wrap the message with "This model is not supported yet. Try a different model." instead of showing the raw Unsloth exception. 2. Frontend: Compute estimated download size from the HF search API's safetensors.parameters dtype breakdown (BF16=2B/param, I32=4B/param, F32=4B/param, etc.) and show it in the model picker instead of just the param count. For example, Kimi-K2.5 now shows "~554 GB" instead of "171B" (which was misleading since 171B params != 171GB download).	2026-03-16 02:46:56 -07:00
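Note on the size estimate above: the arithmetic is a weighted sum of parameter counts by dtype width. The sketch below is written in Python for illustration, although the commit implements this in the frontend. The BF16, I32, and F32 widths come from the commit message. The other table entries and the 2-byte default for unknown dtypes are assumptions, and both function names are hypothetical.

```python
from typing import Dict

# Bytes per parameter. BF16/I32/F32 are from the commit message;
# F16, I8, and U8 are added here as assumptions about the full table.
BYTES_PER_PARAM: Dict[str, int] = {
    "BF16": 2, "F16": 2, "F32": 4, "I32": 4, "I8": 1, "U8": 1,
}


def estimate_download_bytes(params_by_dtype: Dict[str, int]) -> int:
    """Sum parameter counts weighted by dtype width (unknown dtypes: 2 bytes, assumed)."""
    return sum(
        count * BYTES_PER_PARAM.get(dtype, 2)
        for dtype, count in params_by_dtype.items()
    )


def format_size(num_bytes: int) -> str:
    """Render a rough size label in decimal gigabytes."""
    return f"~{num_bytes / 1e9:.0f} GB"


breakdown = {"BF16": 1_000_000_000, "F32": 500_000_000}
print(format_size(estimate_download_bytes(breakdown)))
```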
Daniel Han	1471c63b96	Fix download progress bugs: false completion, stale UI, dedup Three fixes on top of the download progress feature: 1. Backend: Replace broken "no .incomplete = done" completion check with a 95% byte threshold. HF downloads files sequentially, so between files there are briefly no .incomplete files even though the download is far from done (e.g. Kimi-K2.5 reported "done" after downloading 22KB of config files out of 595GB). 2. Frontend: Track hasShownProgress flag. Only show "Download complete. Loading into memory..." if we actually displayed download progress before. For already-cached models where the first poll returns progress=1.0, this avoids the misleading "Download complete" message. 3. Frontend: Deduplicate recommended vs downloaded -- filter out models already in the "Downloaded" section. Cache the fetched lists at module level so re-mounting the popover does not flash an empty "Downloaded" section.	2026-03-16 02:46:56 -07:00
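Note on the 95% byte threshold above: completion is judged from bytes on disk relative to the expected total rather than from the absence of `.incomplete` files. A minimal sketch, assuming both byte counts are already known. The function name and return shape are hypothetical.

```python
from typing import Tuple


def download_state(
    downloaded_bytes: int,
    expected_bytes: int,
    threshold: float = 0.95,
) -> Tuple[float, bool]:
    """Return (progress fraction, done flag) using a byte threshold."""
    if expected_bytes <= 0:
        # Expected size unknown: report no progress rather than guess.
        return 0.0, False
    fraction = min(downloaded_bytes / expected_bytes, 1.0)
    return fraction, fraction >= threshold


# 22 KB of config files out of 595 GB is nowhere near done.
print(download_state(22_000, 595_000_000_000))
```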
pre-commit-ci[bot]	e03a809994	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-16 02:46:56 -07:00
Daniel Han	b84f167d5a	Add download progress bar for non-GGUF models in Chat Previously only GGUF models showed download progress in Chat. Non-GGUF models (safetensors, bnb quantized, etc.) showed a static message with no progress indication. This adds progress tracking for all model types and fixes several related issues. Backend: - Add /api/models/download-progress endpoint that checks the HF cache blobs directory for completed and .incomplete files. Uses model_info() (cached per repo) to determine expected total size for percentage. - Add /api/models/cached-models endpoint that lists non-GGUF model repos from the HF cache via scan_cache_dir(). - Fix progress stuck at 0.99: when no .incomplete files remain, report 1.0 immediately (blob deduplication can make byte totals mismatch). Frontend: - Remove the ggufVariant gate so download progress polling works for all non-cached models, not just GGUFs. - Use GGUF-specific endpoint when variant + expectedBytes available, otherwise use the general download-progress endpoint. - Fix toast stuck after load: check loadingModelRef.current before and after the async poll to prevent overwriting the success toast. - First poll at 500ms instead of waiting for the 2s interval. - Show downloaded non-GGUF models in the Hub model picker "Downloaded" section alongside GGUFs.	2026-03-16 02:46:56 -07:00
Roland Tannous	08b5879101	fix: Ctrl+C not terminating backend on Linux (#4316 ) * fix: Ctrl+C not breaking out of backend on Linux threading.Event.wait() without a timeout blocks at the C level on Linux, preventing Python from delivering SIGINT. Use a 1-second timeout loop so the interpreter can process pending signals. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-16 11:58:09 +04:00
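Note on the Ctrl+C fix above: the commit replaces an untimed `threading.Event.wait()` with a loop that waits in 1-second slices, so the main thread returns to the interpreter regularly. A minimal sketch of that pattern follows. `wait_for_shutdown` is a hypothetical name, not the function in the Studio backend.

```python
import threading


def wait_for_shutdown(stop_event: threading.Event, poll_seconds: float = 1.0) -> None:
    """Block until stop_event is set, waking up every poll_seconds.

    Event.wait(timeout=...) returns False on timeout and True once the
    event is set, so the loop exits as soon as shutdown is requested.
    """
    while not stop_event.wait(timeout=poll_seconds):
        pass


if __name__ == "__main__":
    stop = threading.Event()
    threading.Timer(0.2, stop.set).start()  # stand-in for a real shutdown trigger
    wait_for_shutdown(stop, poll_seconds=0.05)
    print("shutdown requested")
```

In a real server, a `KeyboardInterrupt` raised inside the loop would propagate to the caller, which can then run its cleanup.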
Manan Shah	164b5a5b06	[Feature] studio: user can upload eval dataset (#4307 ) * user can upload eval dataset, removed bugs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolving merge conflicts * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolving gpt comments --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-03-16 11:15:50 +04:00
Daniel Han	6c2a593522	Fix setup.sh crash on Mac with empty gitignore array The `set -u` (nounset) flag in setup.sh causes `${_HIDDEN_GITIGNORES[@]}` to fail with "unbound variable" when no parent .gitignore with `*` is found (common on Mac where the install is not inside a Python venv). Use the `${arr[@]+"${arr[@]}"}` idiom to safely expand empty arrays under nounset mode.	2026-03-15 22:33:04 -07:00
Daniel Han	a8f02c9f3f	Fix studio frontend build producing empty Tailwind CSS Two issues caused the studio frontend to render without any styling when installed via `pip install` (non-editable): 1. `pyproject.toml` package-data only included `frontend/dist/*/`. The `include-package-data = true` setting relies on `git ls-files`, which fails in isolated builds (pip/uv copy source to a temp dir without `.git`). This meant `frontend/src/`, `package.json`, `vite.config.ts`, and other build files were missing from the installed package. Tailwind had no source files to scan. 2. Python venvs auto-create a `.gitignore` with a bare `*` pattern. Tailwind v4's oxide scanner walks parent directories and respects `.gitignore` -- so even when source files are present, the venv's `*` pattern causes the scanner to skip all `.tsx` files. The result is a 34KB CSS skeleton with zero utility classes instead of the expected 265KB. Additionally, Vite adds `crossorigin` to script/link tags by default. This forces CORS mode on font subresource loads, which Firefox HTTPS-Only Mode does not exempt -- causing all @font-face downloads to fail silently when Studio is served over HTTP. Changes: - pyproject.toml: Expand package-data to include frontend source, config files, setup scripts, and backend requirements using glob patterns (no node_modules) - studio/setup.sh: Temporarily hide parent .gitignore files containing a bare `*` during `npm run build`, with trap-based restoration - studio/backend/main.py: Strip `crossorigin` attributes from HTML at serve time so fonts load correctly on any protocol	2026-03-15 22:00:00 -07:00
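Note on the serve-time `crossorigin` stripping above: the attribute can be removed from built HTML with a regular expression. This is a sketch under assumptions. The function name and pattern are hypothetical and may not match what `studio/backend/main.py` does. A regex like this also rewrites matching text outside tags, which is acceptable for a generated `index.html` but not for arbitrary HTML.

```python
import re

# Matches ` crossorigin`, ` crossorigin="anonymous"`, or ` crossorigin=use-credentials`
# when followed by whitespace, "/", or ">".
_CROSSORIGIN = re.compile(
    r"""\s+crossorigin(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?(?=[\s/>])""",
    re.IGNORECASE,
)


def strip_crossorigin(html: str) -> str:
    """Remove crossorigin attributes from an HTML string."""
    return _CROSSORIGIN.sub("", html)


page = '<script type="module" crossorigin src="/assets/index.js"></script>'
print(strip_crossorigin(page))
```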
Lee Jackson	15e7d0dd5c	fix: preserve save_steps when toggling to epochs mode (#4308 )	2026-03-16 08:43:49 +04:00
Lee Jackson	7b1ea88739	studio: simplify auth UX to password-only login (#4305 ) * feat(studio): switch to password-only login and simplify first-time setup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: align change-password button state with validation rules --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-03-16 03:02:58 +04:00
Roland Tannous	0818f78617	Graceful shutdown on Windows (signal handlers for Ctrl+C) (#4306 ) * fix: graceful shutdown on Windows (signal handlers for Ctrl+C) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-16 03:01:15 +04:00
Leo Borcherding	ce8e530e6d	Fix/colab plugin editable install (#4281 ) * fix: update Colab notebook to use public unsloth repo and correct paths * Update studio/Unsloth_Studio_Colab.ipynb For efficiency, especially in environments like Colab, it's better to perform a shallow clone of the repository. This fetches only the latest commit from the specified branch, which is significantly faster and uses less disk space than cloning the entire project history. Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update Unsloth_Studio_Colab.ipynb * studio: add standard Unsloth header, news, section headings, and footer to Colab notebook * studio: refine Colab notebook section headings and cell cleanup --------- Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>	2026-03-16 01:34:37 +04:00
Lee Jackson	1e3aa4ff92	studio: add max steps and epochs toggle switch (#4296 ) * feat: add Epochs toggle for Max Steps * refactor: dedupe max-steps/epochs toggle logic and fix input bug * fix(studio): max-steps input validation and prevSaveSteps seed in epochs mode --------- Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-03-16 01:33:51 +04:00
Manan Shah	b2dce8e3a8	chat only with gguf for mac devices (#4300 ) * chat only with gguf for mac devices * resolving gpt comments * add change-password for chat only * hide lora adaptors dropdown * solving gpt comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * addressing the comment * fixing auth flow --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-15 23:20:48 +04:00
pre-commit-ci[bot]	050240b27a	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	11612f6dc9	studio: fix GGUF download UX -- progress bar, cancel, sorting, auto-scroll - Run GGUF load_model in asyncio.to_thread so the event loop stays free for progress polling during download (was blocking all requests). - Extract download phase out of the lock in LlamaCppBackend.load_model so unload_model/cancel can take effect immediately during download. - Fix "downloaded" badge for split GGUFs: check total cached bytes across all shards vs expected size, not just first shard existence. - Respect CUDA_VISIBLE_DEVICES in /api/system GPU reporting so the frontend GGUF fit estimation uses actual available VRAM. - Sort tight variants (need CPU offload) smallest-first instead of largest-first -- closer to GPU budget = faster inference. - Fix cancel: use refs instead of React state for abort controller and toast ID so both cancel buttons (text + toast) work reliably. Make cancel synchronous (fire-and-forget unload) for instant UI response. Check abortCtrl.signal.aborted after loadModel returns to prevent ghost model state. Skip rollback and suppress errors on cancel. - Dynamic top 4 GGUF models fetched from HF API sorted by downloads, prepended to the default recommended list. - Remove turnAnchor="top" for auto-scroll to bottom during generation. - Set default toast duration to 10s (was infinite for loading toasts). - Deduplicate cached GGUF repos using scan_cache_dir API (fixes Qwen/X-GGUF vs qwen/x-gguf duplicates from lowercased HF cache). - Pre-compile repo_id validation regex to silence CodeQL ReDoS warning. - Change welcome text and default suggestion text.	2026-03-15 05:24:06 -07:00
Daniel Han	bb57236e29	studio: revert -- always respect CUDA_VISIBLE_DEVICES in GPU memory query	2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]	851cb2af68	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	5603ced75f	studio: ignore CUDA_VISIBLE_DEVICES in GPU memory query for llama-server _get_gpu_free_memory was filtering by CUDA_VISIBLE_DEVICES, so with CUDA_VISIBLE_DEVICES='0' set by the training env, llama-server only saw 1 GPU and used --fit for CPU offloading instead of spreading across all 8 GPUs. Since llama-server manages its own GPU allocation (the _select_gpus method picks GPUs and sets CUDA_VISIBLE_DEVICES for the subprocess), the query must see ALL physical GPUs to make the right decision.	2026-03-15 05:24:06 -07:00
Daniel Han	1dfba866be	studio: fix download progress -- track per-variant, include incomplete blobs 1. Progress endpoint now takes a variant parameter and only counts .gguf files matching that variant (not all files in the repo cache, which would include previously downloaded variants) 2. Tracks .incomplete files in HF blobs dir for in-progress single-shard downloads, capping at 99% until the file is fully committed 3. Fixed loading text: "Loading model..." for cached, "Downloading model..." for new downloads, with appropriate descriptions 4. Wording: "Downloading and loading model. Large models can take a while." instead of "This may include downloading."	2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]	b1dda44745	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	475ba417dc	studio: context-aware loading text + download progress bar 1. Loading text: shows "Loading model..." for cached models, "Downloading model..." for new downloads. Toast description adapts accordingly. 2. Download progress: polls /api/models/gguf-download-progress every 2s during downloads, updating the toast with percentage and GB downloaded. Progress is estimated by checking the HF cache folder size against the expected total bytes. 3. Passes isDownloaded and expectedBytes through the full chain from variant click to selectModel for accurate UI state.	2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]	061de08f86	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	1e9d19126b	studio: fix P1 issues from PR review comments 1. n_gpu_layers kwarg: accept (and ignore) in load_model signature so callers like llm_assist.py don't get TypeError 2. mmproj exclusion: filter out mmproj files in _find_smallest_fitting_variant so fallback doesn't pick a tiny vision projection as the "model" 3. Shard preservation after fallback: re-discover shards for the fallback variant instead of resetting to empty list, so split GGUFs download all shards 4. Orphan cleanup safety: only kill llama-server processes whose cmdline contains ".unsloth/", avoiding termination of unrelated llama-server instances on the same machine 5. Path expression sanitization: validate repo_id format before using it in cache directory lookups	2026-03-15 05:24:06 -07:00
Daniel Han	cf45ff7232	studio: fix downloaded check -- compare basename not full path The variant filename includes a subfolder prefix (e.g. UD-Q4_K_XL/Kimi-K2.5-UD-Q4_K_XL-00001-of-00013.gguf) but rglob returns just the filename. Use Path.name for the comparison.	2026-03-15 05:24:06 -07:00
Daniel Han	92670a90dd	studio: fix case-insensitive HF cache lookup for downloaded GGUF variants HF cache dirs use the exact case from the repo_id at download time (e.g. models--unsloth--kimi-k2.5-gguf) which may differ from the canonical HF repo_id (unsloth/Kimi-K2.5-GGUF). Use case-insensitive matching to find the cache directory.	2026-03-15 05:24:06 -07:00
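The case-insensitive lookup described in the commit above can be sketched as follows. This is a minimal illustration, not the studio's code: `find_cache_dir` is a hypothetical helper name, and it assumes the usual Hugging Face hub layout in which a model repo is stored under a folder named `models--<org>--<name>`.

```python
from pathlib import Path
from typing import Optional


def find_cache_dir(hub_cache: Path, repo_id: str) -> Optional[Path]:
    """Return the cache folder for repo_id, ignoring letter case.

    Hypothetical helper: assumes folders are named models--<org>--<name>.
    """
    wanted = ("models--" + repo_id.replace("/", "--")).lower()
    if not hub_cache.is_dir():
        return None
    for entry in hub_cache.iterdir():
        # Compare lowercased names so Kimi-K2.5-GGUF matches kimi-k2.5-gguf.
        if entry.is_dir() and entry.name.lower() == wanted:
            return entry
    return None
```

Calling `find_cache_dir(cache, "unsloth/Kimi-K2.5-GGUF")` would then find a folder that was created from a lowercased repo_id.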
Daniel Han	7b65073311	studio: show 'downloaded' badge instead of 'recommended' when variant is cached	2026-03-15 05:24:06 -07:00
Daniel Han	bcb382def9	studio: sort downloaded GGUF variants before recommended Downloaded variants now take priority over the recommended badge in sort order. Within the same tier (downloaded+fits, etc.), recommended still sorts first. Order: downloaded -> recommended -> fits -> tight -> OOM	2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]	64ab7554b1	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	4d35699c65	studio: show downloaded status in GGUF variant list, sort downloaded first - Backend: /gguf-variants now checks HF cache for each variant's file and returns a downloaded flag per variant - Frontend: downloaded variants sort before non-downloaded (after recommended), and show a green "downloaded" badge - Sort order: recommended -> downloaded+fits -> downloaded+tight -> fits -> tight -> OOM	2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]	904ac86f4a	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	897d8b426a	studio: interruptible GGUF downloads, cached models endpoint, Downloaded section 1. Interruptible downloads: load_model now checks a cancel event between shard downloads. unload_model sets the event so cancel stops the download at the next shard boundary. 2. /api/models/cached-gguf endpoint: scans the HF cache for already-downloaded GGUF repos with their total size and cache path. 3. "Downloaded" section in Hub model picker: shows cached GGUF repos at the top (before Recommended) so users can quickly re-load previously downloaded models without re-downloading.	2026-03-15 05:24:06 -07:00
Daniel Han	226ece0c9e	studio: fix cancel to actually kill llama-server during loading The unload endpoint checked is_loaded (requires healthy=True), but during initial loading the server is not yet healthy. Cancel had no effect because the unload route fell through to the Unsloth backend. Fix: add is_active property (process exists, loading or loaded) and check it in the unload route so cancel kills llama-server even during the download/loading phase. Also: toast cancel button now properly triggers the backend unload.	2026-03-15 05:24:06 -07:00
Daniel Han	a0fdf03340	studio: add Cancel button to model loading toast popup Replace toast.promise with a manual toast.loading that includes a Cancel action button. Users can now cancel model downloads/loads from the toast notification itself, not just from the header bar spinner.	2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]	1c4efa6c3d	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	c59f028150	studio: kill orphaned llama-server processes on startup When the studio process is killed (SIGTERM/SIGKILL), atexit handlers may not run in the subprocess orchestrator, leaving llama-server processes orphaned and holding GPU memory. This caused OOM errors when trying to load a new model after a studio restart. On init, LlamaCppBackend now runs pgrep to find and SIGKILL any stale llama-server processes before starting fresh.	2026-03-15 05:24:06 -07:00
Daniel Han	7b19cb418e	studio: sort TIGHT (CPU offload) GGUF variants after GPU-only fits Sort order is now: recommended -> fits (largest first) -> tight/CPU offload (largest first) -> OOM (smallest first). Previously tight variants were mixed with fits variants.	2026-03-15 05:24:06 -07:00
Daniel Han	5bb783850a	studio: GGUF OOM accounts for CPU offload via --fit (GPU + system RAM) Updated GGUF fit classification to match llama-server's --fit behavior: - fits: model <= 70% of total GPU memory (all GPUs) - tight: model > 70% GPU but <= 70% GPU + 70% available system RAM (llama-server uses --fit to offload layers to CPU) - OOM: model exceeds both GPU and system RAM budgets useGpuInfo now also returns systemRamAvailableGb from /api/system so the frontend can compute the combined GPU+RAM budget.	2026-03-15 05:24:06 -07:00
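The fits / tight / OOM rule in the commit above can be written as a small function. The actual check lives in the frontend; this Python sketch only restates the thresholds from the commit message, and the name `classify_fit` and its parameters are assumptions made for illustration.

```python
def classify_fit(model_gb: float, total_gpu_gb: float,
                 available_ram_gb: float, fraction: float = 0.70) -> str:
    """Classify a GGUF variant by the budgets described in the commit.

    Hypothetical helper: 'fits' means the model is within 70% of total
    GPU memory, 'tight' means it needs CPU offload but is within
    70% GPU + 70% available system RAM, anything larger is 'oom'.
    """
    gpu_budget = fraction * total_gpu_gb
    combined_budget = gpu_budget + fraction * available_ram_gb
    if model_gb <= gpu_budget:
        return "fits"
    if model_gb <= combined_budget:
        return "tight"
    return "oom"
```

With 80 GB of GPU memory and 64 GB of available RAM, the GPU budget is 56 GB and the combined budget is 100.8 GB, so a 70 GB variant lands in the tight tier.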
pre-commit-ci[bot]	1625565da2	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	c9c485a7b0	studio: use nvidia-smi for all GPUs + 70% VRAM threshold for GGUF OOM Two fixes for accurate GGUF OOM detection: 1. /api/system now uses nvidia-smi to enumerate all physical GPUs instead of torch.cuda which only sees CUDA_VISIBLE_DEVICES. This matches llama-server which can use all GPUs regardless of the env var. Falls back to torch-based detection if nvidia-smi unavailable. 2. Frontend GGUF OOM check now uses 70% of total GPU memory as the budget, matching the PR's _select_gpus logic (30% reserved for KV cache and compute buffers). Previously used checkVramFit's 100% threshold which was too generous.	2026-03-15 05:24:06 -07:00
Daniel Han	f5f631e5d1	studio: add cancel button for model loading/downloading Adds a Cancel button next to the "Downloading model..." spinner so users can abort long downloads. Clicking it aborts the in-flight load, calls unloadModel to kill any running llama-server process, and clears the loading state.	2026-03-15 05:24:06 -07:00
Daniel Han	4600131fea	studio: sort OOM GGUF variants smallest-to-largest OOM variants are more useful sorted ascending by size since smaller ones are more likely to run with --fit. Non-OOM variants remain largest-first (best quality).	2026-03-15 05:24:06 -07:00
Daniel Han	ea45370ab8	studio: use total multi-GPU VRAM for OOM checks, recommend smallest when all OOM Two fixes for GGUF variant dropdown: 1. useGpuInfo now sums memory across all GPU devices instead of only reading devices[0]. This matches llama-server's multi-GPU allocation where models can be split across GPUs. 2. When the backend-recommended variant (e.g. UD-Q4_K_XL) exceeds total GPU VRAM, the frontend picks the largest variant that fits instead. If all variants are OOM, it recommends the smallest one (most likely to work with --fit).	2026-03-15 05:24:06 -07:00
Daniel Han	10c4db04d8	studio: fix React hooks order -- move useMemo before early returns The useMemo for sortedVariants was placed after the loading/error early returns, which violated React's rules of hooks (hooks must be called in the same order every render). Move it before the conditional returns. Fixes: Minified React error #310	2026-03-15 05:24:06 -07:00
Daniel Han	3c1b8d7ab7	studio: sort GGUF dropdown client-side -- recommended first, OOM last, rest by size descending Move the sort logic from the backend to the frontend GgufVariantExpander component where GPU VRAM info is available. The backend now does a simple size-descending sort. The frontend pins the recommended variant at the top, pushes OOM variants to the bottom, and sorts the rest by file size descending (largest/best quality first).	2026-03-15 05:24:06 -07:00
Daniel Han	dd2d979b40	studio: sort GGUF quants largest-first so best quality that fits is at the top	2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]	1f861e185b	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	2f5347cb4d	studio: sort GGUF quant variants -- recommended first, then UD by size, then standard by size The variants list was returned in HuggingFace file listing order (alphabetical), making the dropdown confusing (e.g. BF16 before Q4_0). Now sorted as: 1. Recommended variant (from _pick_best_gguf) pinned at top 2. Other UD (Unsloth Dynamic) variants sorted by disk size ascending 3. Non-UD variants sorted by disk size ascending	2026-03-15 05:24:06 -07:00
Daniel Han	928868f07d	studio: auto-find free port if requested port is in use If the requested port (default 8000) is already in use, auto- increment and try the next port, up to 20 attempts. Prints a message like "Port 8000 is in use, using port 8001 instead". Previously, if port 8000 was busy, uvicorn would fail with "[Errno 98] address already in use" and the studio would not start. Now it gracefully finds the next free port. Uses socket.bind() to check availability before starting uvicorn. Cross-platform (Linux, macOS, Windows).	2026-03-15 05:24:06 -07:00
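A minimal sketch of the port probing described in the commit above, assuming a plain `socket.bind()` test per candidate port. The function name `find_free_port` and the loopback default host are illustrative choices, not taken from the studio source.

```python
import socket


def find_free_port(start: int = 8000, host: str = "127.0.0.1",
                   max_attempts: int = 20) -> int:
    """Return the first bindable port in [start, start + max_attempts)."""
    # Stay inside the valid TCP port range.
    for port in range(start, min(start + max_attempts, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port is busy (or not permitted): try the next one.
                continue
            return port
    raise RuntimeError(
        f"no free port found in {max_attempts} attempts starting at {start}"
    )
```

The probe socket is closed before the port is returned, so another process could still claim the port before the server binds it; a caller would need to handle that failure as well.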
Daniel Han	ab6fdccfb5	studio: reorder GGUF preference -- UD-Q4_K_XL first, all UD above standard Reorder _GGUF_QUANT_PREFERENCE so all UD (Unsloth Dynamic) variants come before standard quants. UD-Q4_K_XL is the default (best size/quality tradeoff), followed by other UD quants in decreasing preference order. For repos without UD variants (e.g., bartowski), falls through to standard quants starting with Q4_K_M. Verified with: - unsloth/Qwen3.5-35B-A3B-GGUF -> UD-Q4_K_XL - bartowski/Qwen_Qwen3.5-35B-A3B-GGUF -> Q4_K_M - unsloth/DeepSeek-V3.2-GGUF -> UD-Q4_K_XL (9 shards) - unsloth/Llama-3.2-1B-Instruct-GGUF -> UD-Q4_K_XL	2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]	1dba26012c	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	8ccb461570	studio: group GGUF shards by variant in size-based fallback The smallest-fitting-variant fallback now groups split GGUF shards by their variant prefix and sums all shard sizes per variant. For example, DeepSeek-V3.2 UD-Q4_K_XL has 9 shards totaling 379.8 GB. The previous code treated each shard as a separate "variant" and would have incorrectly selected a single 50 GB shard as fitting, ignoring the other 8 shards needed. Tested with unsloth/DeepSeek-V3.2-GGUF (237 GGUF files, 27 variants from 150 GB to 1.25 TB). Correctly groups and sorts all variants by total size.	2026-03-15 05:24:06 -07:00
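The shard grouping in the commit above can be illustrated like this. It is a sketch under stated assumptions: split files are assumed to end in `-NNNNN-of-NNNNN.gguf` (as in the shard names quoted elsewhere in this log), and `group_shards` and `smallest_fitting` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, Optional

# Assumed shard suffix, e.g. "-00001-of-00013" right before ".gguf".
_SHARD_SUFFIX = re.compile(r"-\d{5}-of-\d{5}(?=\.gguf$)")


def group_shards(files: Dict[str, int]) -> Dict[str, int]:
    """Sum file sizes per variant, treating all shards as one variant."""
    totals: Dict[str, int] = defaultdict(int)
    for path, size in files.items():
        totals[_SHARD_SUFFIX.sub("", path)] += size
    return dict(totals)


def smallest_fitting(files: Dict[str, int], free_bytes: int) -> Optional[str]:
    """Return the smallest variant whose total size fits, or None."""
    fitting = [(size, name) for name, size in group_shards(files).items()
               if size <= free_bytes]
    return min(fitting)[1] if fitting else None
```

Because sizes are summed before comparison, a single shard that happens to fit is never mistaken for a complete model.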
pre-commit-ci[bot]	d5a18e5a00	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	93ec05ced2	studio: default to UD-Q4_K_XL for GGUFs, fall back to smallest Two changes for GGUF variant selection: 1. Default variant preference now starts with UD-Q4_K_XL (Unsloth Dynamic quantization) which provides better quality per bit than standard Q4_K_M. Also added UD-Q2_K_XL, UD-IQ2_M, UD-IQ1_M, UD-IQ1_S as small fallback options. 2. If the selected variant doesn't fit on disk, automatically fall back to the smallest GGUF variant in the repo that does fit. Queries all GGUF file sizes via get_paths_info() and picks the smallest one under the free disk space limit. If nothing fits, raises a clear error. This means users with limited disk space won't get a download error -- they'll get a smaller quantization instead.	2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]	12f3f4361d	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	38d700ecb0	studio: check disk space before downloading GGUF models Query file sizes from HuggingFace via get_paths_info() before downloading, and compare against free disk space on the cache partition. Raises a clear error if there is not enough space, instead of failing mid-download. Uses get_paths_info() instead of repo_info() because xet-stored repos return size=None from repo_info().siblings, but get_paths_info() returns the actual file sizes. If the size check fails for any reason (network error, API change), it logs a warning and continues with the download anyway.	2026-03-15 05:24:06 -07:00
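The local half of the check above (comparing a known download size against free space on the cache partition) might look like the sketch below. The required size is passed in as a plain integer; fetching it from the Hub is left out, and `check_disk_space` is a hypothetical name.

```python
import shutil


def check_disk_space(required_bytes: int, cache_dir: str) -> int:
    """Raise OSError if cache_dir's partition lacks room; return free bytes."""
    free = shutil.disk_usage(cache_dir).free
    if required_bytes > free:
        raise OSError(
            f"need {required_bytes / 1e9:.1f} GB but only "
            f"{free / 1e9:.1f} GB free in {cache_dir}"
        )
    return free
```

Following the commit's description, a caller would wrap the size lookup in a try/except and continue with the download if the lookup itself fails.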
pre-commit-ci[bot]	f4fbbcaec8	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	f4f69f16a6	studio: centralize cache directory for all downloads Set HF_HOME, HF_HUB_CACHE, HF_XET_CACHE, UV_CACHE_DIR, and VLLM_CACHE_ROOT to a unified location under ~/.unsloth/studio/cache/ on startup. This keeps all model downloads, datasets, and caches in one place instead of scattered across ~/.cache/huggingface, ~/.cache/uv, etc. Layout: ~/.unsloth/studio/cache/ huggingface/ (HF_HOME) hub/ (HF_HUB_CACHE -- model/dataset downloads) xet/ (HF_XET_CACHE -- xet blob store) uv/ (UV_CACHE_DIR -- uv package cache) vllm/ (VLLM_CACHE_ROOT -- vllm compiled kernels) Only sets variables that are not already in the environment, so user overrides (e.g. HF_HOME=/data/models) are respected. Cross-platform: uses Path.home() which resolves correctly on Linux (~), macOS (~), and Windows (C:\Users\<user>).	2026-03-15 05:24:06 -07:00
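The "only set what is not already set" behaviour above maps naturally onto `setdefault`. In this sketch the function name `centralize_caches` is hypothetical, and placing `hub/` and `xet/` under `huggingface/` is an assumption, since the layout in the commit message has lost its indentation.

```python
import os
from pathlib import Path


def centralize_caches(root=None, environ=os.environ):
    """Point cache variables at one root without overriding user values."""
    root = Path(root) if root else Path.home() / ".unsloth" / "studio" / "cache"
    defaults = {
        "HF_HOME": root / "huggingface",
        "HF_HUB_CACHE": root / "huggingface" / "hub",  # assumed nesting
        "HF_XET_CACHE": root / "huggingface" / "xet",  # assumed nesting
        "UV_CACHE_DIR": root / "uv",
        "VLLM_CACHE_ROOT": root / "vllm",
    }
    for name, path in defaults.items():
        # setdefault leaves any value the user already exported untouched.
        environ.setdefault(name, str(path))
    return environ
```

Passing a plain dict as `environ` makes the behaviour easy to check without touching the real process environment.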
Daniel Han	f1293fe7d8	studio: respect existing CUDA_VISIBLE_DEVICES in GPU selection If CUDA_VISIBLE_DEVICES is already set in the environment (e.g., by the user or a wrapper script), only consider those GPUs when selecting devices for llama-server. nvidia-smi reports all physical GPUs regardless of CUDA_VISIBLE_DEVICES, so we filter its output to match the allowed set. Without this, the GPU selector could pick a GPU outside the user's allowed set, overriding their restriction.	2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]	e885d7308e	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	12183e0656	studio: smart GPU allocation for GGUF inference Automatically select the best GPU(s) for a GGUF model based on file size and available VRAM, instead of relying on hardcoded -ngl -1 or letting llama-server guess. Logic: 1. Measure total GGUF file size (including split shards) 2. Query free memory per GPU via nvidia-smi 3. If the model fits in 70% of the most-free GPU's memory, pin to that single GPU (CUDA_VISIBLE_DEVICES=X, no --fit) 4. If it needs multiple GPUs, pick the N most-free GPUs (CUDA_VISIBLE_DEVICES=X,Y, no --fit) 5. If it's too large for all GPUs combined, omit CUDA_VISIBLE_DEVICES and use --fit on to let llama-server handle partial offloading The 70% threshold accounts for KV cache and compute buffers that sit on top of the model weights. Removed the -ngl parameter (was hardcoded to -1). llama-server's default of "auto" handles layer offloading correctly, especially with --fit on for oversized models. Tested on 8x B200: - 1B model (0.75 GB): picks 1 GPU, no --fit - 27B model (17 GB): picks 1 GPU, no --fit - 405B model (230 GB): picks 2 GPUs, no --fit - 2TB model: all GPUs, --fit on	2026-03-15 05:24:06 -07:00
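The selection logic in steps 3 to 5 above can be sketched as a greedy pick over GPUs ranked by free memory. This is an illustration only: `select_gpus` here is a standalone hypothetical function (not the studio's `_select_gpus` method), and free memory is assumed to arrive as `(gpu_index, free_bytes)` pairs already parsed from nvidia-smi.

```python
from typing import List, Optional, Tuple


def select_gpus(model_bytes: float, free_by_gpu: List[Tuple[int, float]],
                fraction: float = 0.70) -> Tuple[Optional[List[int]], bool]:
    """Return (gpu indices or None, use_fit).

    Adds the most-free GPUs one at a time until 70% of their combined
    free memory covers the model. If even all GPUs are not enough,
    returns (None, True): leave CUDA_VISIBLE_DEVICES unset and use --fit.
    """
    ranked = sorted(free_by_gpu, key=lambda gpu: gpu[1], reverse=True)
    chosen: List[int] = []
    budget = 0.0
    for index, free in ranked:
        chosen.append(index)
        budget += fraction * free
        if model_bytes <= budget:
            return sorted(chosen), False
    return None, True
```

The returned indices would be joined with commas to form the CUDA_VISIBLE_DEVICES value for the llama-server subprocess.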
pre-commit-ci[bot]	7202f81985	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-15 05:24:06 -07:00
Daniel Han	80d84a5b5f	studio: optimize llama-server flags for single-user studio Refactor command building (deduplicate HF/local paths) and add flags for better performance: - --parallel 1: studio is single-user, so only 1 inference slot is needed. The previous auto-detect picked 4 slots, wasting VRAM on 3 unused KV caches. - --flash-attn on: force flash attention for faster inference. Default is "auto" which may not always enable it. - --fit on: auto-adjust parameters to fit in available device memory. Already the default but now explicit. Also cleaned up the duplicated command building for HF vs local mode into a single block.	2026-03-15 05:24:06 -07:00
Daniel Han	887e7a31c4	studio: don't cap max_tokens for GGUF inference Remove the hard max_tokens=2048 default and le=4096 cap for GGUF chat completions. When max_tokens is not set (None), the field is omitted from the llama-server payload entirely, letting the model generate until it produces an EOS token or hits the context limit. This is critical for thinking/reasoning models (Qwen3.5, DeepSeek-R1, etc.) where the thinking phase alone can consume 1000+ tokens before the actual answer. With the previous 2048 default, simple questions like "What is 2+2?" used all tokens on thinking and produced empty visible responses. Changes: - llama_cpp.py: max_tokens default None, only include in payload when explicitly set - models/inference.py: default None, remove le=4096 cap - routes/inference.py: pass max_tokens directly, no "or 2048" fallback llama-server handles omitted max_tokens gracefully (generates until EOS or context limit). The context size (-c flag, default 4096) acts as the hard upper bound.	2026-03-15 05:24:06 -07:00
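The payload change above amounts to leaving the key out rather than sending a default. A minimal sketch, assuming an OpenAI-style chat payload; `build_payload` is a hypothetical name and the real request contains more fields.

```python
from typing import Any, Dict, List, Optional


def build_payload(messages: List[Dict[str, Any]],
                  max_tokens: Optional[int] = None,
                  stream: bool = True) -> Dict[str, Any]:
    """Build a chat-completions payload, omitting max_tokens when unset."""
    payload: Dict[str, Any] = {"messages": messages, "stream": stream}
    if max_tokens is not None:
        # Only forward a limit the caller asked for explicitly.
        payload["max_tokens"] = max_tokens
    return payload
```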
Daniel Han	961720c1b1	studio: handle reasoning_content in GGUF streaming llama-server sends thinking/reasoning tokens as "reasoning_content" in the SSE delta (separate from "content"). The studio was only reading delta.content, so all reasoning tokens from models like Qwen3.5, Qwen3-Thinking, DeepSeek-R1, etc. were silently dropped. This caused "replies with nothing" for thinking models: the model would spend its entire token budget on reasoning, produce zero content tokens, and the user would see an empty response. Fix: read reasoning_content from the delta and wrap it in <think>...</think> tags. The frontend already has full support for these tags (parse-assistant-content.ts splits them into reasoning parts, reasoning.tsx renders a collapsible "Thinking..." indicator). Verified with Qwen3.5-27B-GGUF (UD-Q4_K_XL): - Before: "What is 2+2?" -> empty response (all tokens in reasoning) - After: shows collapsible thinking + answer "4"	2026-03-15 05:24:06 -07:00
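The wrapping described above can be sketched as a generator over already-parsed SSE deltas. The delta shape (dicts with optional `reasoning_content` and `content` keys) follows the commit message; the function name `wrap_reasoning` and the choice to close the tag when the first content token arrives are assumptions.

```python
from typing import Dict, Iterable, Iterator, Optional


def wrap_reasoning(deltas: Iterable[Dict[str, Optional[str]]]) -> Iterator[str]:
    """Yield text chunks with reasoning tokens wrapped in <think> tags."""
    in_think = False
    for delta in deltas:
        reasoning = delta.get("reasoning_content")
        content = delta.get("content")
        if reasoning:
            if not in_think:
                yield "<think>"
                in_think = True
            yield reasoning
        if content:
            if in_think:
                # First visible token ends the reasoning block.
                yield "</think>"
                in_think = False
            yield content
    if in_think:
        # Stream ended during reasoning: still close the tag.
        yield "</think>"
```

Joining the chunks for a stream of two reasoning deltas followed by one content delta gives a single `<think>...</think>` block followed by the answer.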
Roland Tannous	477e68675b	Fix: Compare Mode Deadlock, Cancel Event Poisoning & IPC Optimization (#4303 ) * fix: resolve compare mode deadlock, cancel_event poisoning, and add dispatcher-based IPC optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert to 2048 tokens * refactor: extract dispatcher timeout values into named constants * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: guard dispatcher shutdown against active compare mailboxes --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-15 16:11:44 +04:00
Wasim Yousef Said	e280b0bebc	miscellaneous studio (#4293) * miscellaneous studio * chore: upload dataset misc * chore: redundancy studio cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: address the pr comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: address comments about recipes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-15 14:42:11 +04:00
Roland Tannous	f44857b2df	PR: Windows Setup Improvements (#4299 ) * quiet llama.cpp build, smarter CUDA install via winget, accept Python 3.11-3.13 * studio: hide Python traceback when setup script exits with error * setup.ps1: auto-add Python Scripts dir to PATH so 'unsloth' command works in new terminals * setup.ps1: fix GPU check to run nvidia-smi instead of just checking command existence * setup.ps1: fix PATH check to use exact entry comparison instead of substring match * setup.ps1: validate Python probe exit code before persisting Scripts PATH	2026-03-14 23:59:49 +04:00
Wasim Yousef Said	629199e3a6	fix: remove old comments (#4292 ) * fix: quotation marks * diceware passphrase generation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-14 16:50:13 +04:00
pre-commit-ci[bot]	b20b3b80df	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-14 00:54:09 -07:00
Daniel Han	4b6f5c76c1	studio: probe-based --system detection for uv Replace _in_virtualenv() heuristic with a runtime probe. At bootstrap time, try a dry-run uv install without --system. If that fails (exit code 2, "No virtual environment found"), retry with --system to confirm it works. This handles all environments correctly: venvs, Colab (system Python), local machines, containers.	2026-03-14 00:54:09 -07:00
Daniel Han	9b7eaf8f0c	studio: make uv optional + fix --system for Colab Three fixes based on review: 1. Make uv truly optional: _bootstrap_uv() now only checks if uv is already on PATH. It no longer tries to pip install uv. If uv is not present, pip is used with zero changes to behavior. 2. Add --system flag for Colab: on Colab there is no venv (packages install into system Python). uv requires --system in this case, otherwise it errors with "No virtual environment found". Added _in_virtualenv() check that detects VIRTUAL_ENV, sys.real_prefix, or sys.base_prefix != sys.prefix. 3. Fix label printed twice on uv fallback: when uv fails and falls back to pip, the label now says "(pip)" to distinguish from the initial uv attempt, instead of printing the same label twice. Tested: - venv path: no --system flag, uv installs correctly - no-venv path (Colab sim): --system flag added automatically - full unsloth studio setup + training run (Llama-3.2-1B, 10 steps)	2026-03-14 00:54:09 -07:00
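The virtualenv heuristic from fix 2 above, and how it could feed the `--system` flag, can be sketched as below. The three checks are the ones listed in the commit message; `uv_install_command` is a hypothetical helper. Note that the later commit 4b6f5c76c1 replaced this heuristic with a runtime probe.

```python
import os
import sys
from typing import List, Optional, Sequence


def in_virtualenv() -> bool:
    """Heuristic from the commit: VIRTUAL_ENV, real_prefix, or prefix mismatch."""
    return bool(
        os.environ.get("VIRTUAL_ENV")
        or hasattr(sys, "real_prefix")
        or sys.base_prefix != sys.prefix
    )


def uv_install_command(packages: Sequence[str],
                       system: Optional[bool] = None) -> List[str]:
    """Build a 'uv pip install' command, adding --system outside a venv."""
    if system is None:
        system = not in_virtualenv()
    cmd = ["uv", "pip", "install"]
    if system:
        cmd.append("--system")
    return cmd + list(packages)
```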
Daniel Han	a7a66a66b9	studio: address review feedback install_python_stack.py: - Print uv error output on failure for debuggability - Refactor pip_install() to use early return after uv success, removing duplicated pip command path setup.sh: - Guard nvidia-smi command substitution with || true so it does not abort the script under set -euo pipefail when nvidia-smi fails (e.g., containerized environments, driver quirks) - Read all GPU compute capabilities and deduplicate, so mixed-GPU hosts get kernels built for all present architectures instead of only the first GPU	2026-03-14 00:54:09 -07:00
Daniel Han	6dda8c4c23	studio: revert combined targets, keep separate builds Restore separate cmake --build calls for llama-server and llama-quantize on both setup.sh and setup.ps1. The combined approach made llama-quantize failure fatal, but it was originally best-effort (|| true on Linux, [WARN] on Windows). The timing savings from combining was only ~2.7s, not worth the semantic change. The Ninja + arch detection speedups are preserved (55s vs 1m 37s).	2026-03-14 00:54:09 -07:00
Daniel Han	e4a5da8d96	studio: combine llama.cpp build targets in setup.ps1 Build llama-server and llama-quantize in a single cmake --build invocation on Windows, matching the same optimization done in setup.sh. This allows MSBuild to better parallelize the two targets. The Visual Studio generator is kept as-is (not switching to Ninja on Windows since VS generator is the standard approach and interacts with MSBuild).	2026-03-14 00:54:09 -07:00
Daniel Han	f8dc7c9a5c	studio: speed up llama.cpp build with Ninja + arch detection Four improvements to the llama.cpp build step in setup.sh: 1. Detect GPU compute capability via nvidia-smi and limit CMAKE_CUDA_ARCHITECTURES to the current GPU. Without this, cmake builds for all default CUDA architectures which is very slow. 2. Use Ninja build generator when available. Ninja has better parallelism than Make for CUDA compilation. 3. Build both llama-server and llama-quantize targets in a single cmake --build invocation for better parallelism. 4. Add --threads=0 to CMAKE_CUDA_FLAGS for multi-threaded nvcc compilation. Measured on 192-core machine with B200 (sm_100): Make (all archs): very slow (minutes for each arch) Make (single arch): 1m 37s Ninja (single arch): 55s Speedup: ~1.7x Combined with the uv change, total setup goes from ~4m 35s to ~1m 40s.	2026-03-14 00:54:09 -07:00
pre-commit-ci[bot]	174d61e0f5	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-14 00:54:09 -07:00
Daniel Han	a537ece7eb	studio: use uv for Python package installs (8x faster) Replace pip with uv in install_python_stack.py to speed up the Python dependency installation phase of `unsloth studio setup`. - Add _bootstrap_uv() that checks for uv on PATH, and if not found, installs it via pip. Falls back to pip if uv is unavailable. - Translate pip flags to uv equivalents (--no-cache-dir dropped since uv caching is fast, --force-reinstall becomes --reinstall). - Add --torch-backend=auto so uv auto-detects CUDA version for PyTorch ecosystem packages. - Per-install fallback: if any uv install step fails, it retries that step with pip before exiting. Measured on clean venv setup: Python packages (pip): 2m 28s Python packages (uv): 18s Speedup: ~8x Total setup time goes from ~4m 35s to ~2m 30s (llama.cpp build is now the bottleneck at 1m 40s).	2026-03-14 00:54:09 -07:00
Daniel Han	2bb72a2244	Revert "add support for mixtral" This reverts commit `c8f712b614`.	2026-03-13 22:39:15 -07:00
tohrnii	943e8f6d84	add support for mixtral (cherry picked from commit `a55b740062`)	2026-03-13 22:39:15 -07:00
Daniel Han	936c18424e	Revert "patch vlm trainer to resize images" This reverts commit `481b22fdf4`.	2026-03-13 22:39:07 -07:00
oliveirabruno01	aa8d91b241	patch vlm trainer to resize images (cherry picked from commit `14c282c4ec`)	2026-03-13 22:39:07 -07:00
Daniel Han	b8eee7a8ba	Revert "Initial changes: Refactor Attention" This reverts commit `a2af843271`.	2026-03-13 22:38:57 -07:00
Shikhar Mishra	7502195443	Initial changes: Refactor Attention (cherry picked from commit `5a7237abfd`)	2026-03-13 22:38:57 -07:00
Daniel Han	49132ced50	Revert "feat: Add Mixtral model support" This reverts commit `99c302d873`.	2026-03-13 22:38:49 -07:00
Shikhar Mishra	659281c508	feat: Add Mixtral model support (cherry picked from commit `2258875885`)	2026-03-13 22:38:49 -07:00
Daniel Han	30a18786bf	Revert "Improve documentation on how to export model from Colab" This reverts commit `703c235a7d`.	2026-03-13 22:38:41 -07:00
Vishwanath Martur	022a5d566a	Improve documentation on how to export model from Colab Related to #1615 Add documentation and function for exporting models from Colab to local machines. * README.md: Add a new section titled "Exporting Models from Colab to Local Machine" under "✨ Finetune for Free" with detailed steps for exporting models from Colab to local machines. * CONTRIBUTING.md: Add a note about the new documentation section for exporting models from Colab. * unsloth/save.py: Add a new function `export_model_to_local` to handle exporting models from Colab to local machines. (cherry picked from commit `0361bd658f`)	2026-03-13 22:38:41 -07:00
Daniel Han	c5fa314937	Revert "adding tools to be able to profile model fwds to see what to turn into kernels" This reverts commit `d32b00ecd8`.	2026-03-13 22:38:31 -07:00
cm2435	12898b5bef	adding tools to be able to profile model fwds to see what to turn into kernels (cherry picked from commit `6db5b126b6`)	2026-03-13 22:38:31 -07:00
LeoBorcherding	3ab282fd40	fix: install data-designer plugin non-editable for Colab compatibility Editable installs (-e) work via a .pth file that is only processed at Python startup. In Colab the kernel is already running when setup.sh installs the plugin, so the .pth file never gets picked up and data_designer_unstructured_seed is not importable. Remove -e so pip copies the package files directly into site-packages, which the live kernel can find immediately. Local venv installs are unaffected since the venv is always created fresh before install.	2026-03-13 13:44:08 -07:00
pre-commit-ci[bot]	6baa181890	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-03-13 13:38:19 -07:00
Daniel Han	eb7637013e	Update CODEOWNERS	2026-03-13 13:38:19 -07:00
Roland Tannous	b95242a80f	fix: only skip frontend build for PyPI prebuilt (site-packages + dist check)	2026-03-13 20:26:10 +00:00
Roland Tannous	bf54225f86	fix: site-packages + dist check for frontend build, fix ruff blank lines	2026-03-13 20:13:34 +00:00
Roland Tannous	0e0325127d	Revert "site-packages + dist check" This reverts commit `82063d8edb`.	2026-03-13 20:09:41 +00:00
Roland Tannous	82063d8edb	site-packages + dist check	2026-03-13 20:04:15 +00:00
Roland Tannous	8ce2b64df7	allow install from source	2026-03-13 20:04:15 +00:00
Daniel Han	1f99dee027	fix(seed): disable remote code execution in seed inspect dataset loads (#4275) * fix(seed): disable remote code execution for seed inspect loads * fix(test): use __file__-relative path in seed test The test used a CWD-relative path (`studio/backend/routes/...`) which only resolved when pytest was invoked from the repo root. Use `Path(__file__).resolve()` so the test passes regardless of CWD. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Test <test@test.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-13 19:37:43 +04:00
Daniel Han	88c7b08faa	fix: prevent ai-assist model config RCE via untrusted Hugging Face repos (#4274) * fix: disable remote code loading for ai-assist model hint lookup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-13 19:29:11 +04:00
Roland Tannous	e539965740	fix error for chat template	2026-03-13 15:18:04 +00:00
Roland Tannous	8108f1bf11	Fix nvm/npmrc prefix conflict in setup.sh	2026-03-13 08:59:51 +00:00
Daniel Han	3e8f085474	Limit rocm711-torch291 to Linux	2026-03-13 01:40:56 -07:00
sstamenk	a54a913431	Add more ROCm/PyTorch combinations (cherry picked from commit `d02aa7f9c3`)	2026-03-13 01:40:56 -07:00
sstamenk	c752c8107a	Add more ROCm/PyTorch versions (cherry picked from commit `ed6877fadd`)	2026-03-13 01:40:56 -07:00
Daniel Han	51bf500f57	Remove Blackwell flex attention disable workaround from studio (#4273) The studio was disabling flex attention entirely on Blackwell+ GPUs (sm_120 and above) by setting UNSLOTH_ENABLE_FLEX_ATTENTION=0 at startup. This was a workaround for the flex_attention backward kernel exceeding shared memory limits on these GPUs. The root cause is now fixed in unsloth-zoo (PR #542) which patches the backward kernel config selection to generate safe fallback configs that fit within the GPU's shared memory limit. With that fix, flex attention works correctly on Blackwell GPUs and provides a ~1.3x speedup over the SDPA fallback.	2026-03-13 01:35:17 -07:00
Daniel Han	37b8d5e440	remove duplicate import (#4271) (cherry picked from commit `d1f4fb5d6a`) Co-authored-by: electron271 <66094410+electron271@users.noreply.github.com>	2026-03-13 00:26:38 -07:00
Daniel Han	d6e40df8fa	Fix llm_int8_skip_modules for VLM dynamic quants on transformers 5.x (#4249) Fix `llm_int8_skip_modules` not being respected for VLMs with dynamic quantization on transformers 5.x. Dynamic quant checkpoints (e.g. `gemma-3-4b-it-unsloth-bnb-4bit`) encode skip paths as `language_model.model.layers.`, but the live module tree on 5.x surfaces them as `model.language_model.layers.`. This prefix mismatch causes `should_convert_module` to miss the skip list, so 22 modules meant to stay in 16-bit get wrapped in `Linear4bit` without a `quant_state`, producing "Skipping ... no quant_state found" warnings. Patches `should_convert_module` to expand both the module name and the skip patterns into all equivalent alias forms before matching. Guarded by `hasattr` so it is a no-op on transformers 4.x where the bug does not exist. Closes #4208	2026-03-13 00:17:00 -07:00
Daniel Han	1ca441a3f3	[Feature] VLMs support for GRPO (#4265) * Updated rl and rl_replacements * Revert "Updated rl and rl_replacements" This reverts commit 077fd5996daa73c9c58c9f213657f33f47f5d73b. --------- Co-authored-by: Sinoué GAD <85933501+GAD-cell@users.noreply.github.com>	2026-03-12 16:09:02 -07:00
Daniel Han	74c1497f2f	[Feature] Support Sequence Classification (#4264) * initial commit for sequence classification implementation * Revert "initial commit for sequence classification implementation" This reverts commit 0f3200cdf2dfb8446e5d69dcbe40d6f70bc520e7. --------- Co-authored-by: Rabin Tiwari <84705625+rabintiwari45@users.noreply.github.com>	2026-03-12 16:08:49 -07:00
Daniel Han	96ff5c5f61	Update CODEOWNERS for studio and cli (#4266) * Update CODEOWNERS for studio and cli * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-12 15:16:38 -07:00
Daniel Han	c26aa1a1e8	Restore non-studio files from main after history recovery	2026-03-12 21:48:45 +00:00
Daniel Han	6f0bca70f8	Merge remote-tracking branch 'studio/feature/merge-build-main' into history-recovery-candidate	2026-03-12 21:48:30 +00:00
Daniel Han	17ae3d3cba	Revert "Studio (#4237)" This reverts commit `f08aef1804`.	2026-03-12 21:48:23 +00:00
Roland Tannous	2b04c0da40	add build.sh	2026-03-12 20:52:42 +00:00
Roland Tannous	47654cb91c	Final cleanup	2026-03-12 18:28:04 +00:00
Roland Tannous	3613166b6d	Delete studio/frontend/README.md	2026-03-12 22:20:34 +04:00
Roland Tannous	8002913b7c	Delete studio/frontend/AGENTS.md	2026-03-12 22:20:23 +04:00
Roland Tannous	a2baf80511	Update license headers	2026-03-12 17:23:10 +00:00
Roland Tannous	7de1c18c14	Update llm_assist.py	2026-03-12 21:06:04 +04:00
Roland Tannous	063cdc6072	Update llm_assist.py	2026-03-12 20:32:04 +04:00
Roland Tannous	5798a34606	Update run.py	2026-03-12 19:28:48 +04:00
Roland Tannous	9bb64fbd96	Update run.py	2026-03-12 18:53:19 +04:00
Roland Tannous	10bee32f3d	Update run.py	2026-03-12 18:30:00 +04:00
Roland Tannous	20aeb2ef19	Update studio.py	2026-03-12 18:13:56 +04:00
Roland Tannous	7881fc253f	Update install_python_stack.py	2026-03-12 18:06:37 +04:00
Roland Tannous	542a25977a	Update run.py	2026-03-12 18:01:14 +04:00
Roland Tannous	f598e69f38	Update studio.py	2026-03-12 17:30:10 +04:00
Roland Tannous	874711912d	Update studio.py	2026-03-12 17:28:08 +04:00
Roland Tannous	788b120114	Update setup.ps1	2026-03-12 17:11:20 +04:00
Roland Tannous	3cf27589a6	Remove AGENTS.md from frontend folder	2026-03-12 12:00:42 +00:00
Roland Tannous	a98164af50	Remove README.md from frontend folder	2026-03-12 11:59:56 +00:00
Daniel Han	36785caf80	Cache packed sequence metadata to reduce D2H syncs across layers (#4243) * packing optimization with cache to reduce D2H copy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cache per device to avoid race condition for multi-gpu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add cache freeing up func --------- Co-authored-by: ruixiangw <ruixiangw@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ruixiang <wangruixiang07@outlook.com>	2026-03-12 03:37:49 -07:00
Daniel Han	f08aef1804	Studio (#4237) * Rebuild Studio branch on top of main * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix security and code quality issues for Studio PR #4237 - Validate models_dir query param against allowed directory roots to prevent path traversal in /api/models/local endpoint - Replace string startswith() with Path.is_relative_to() for frontend path traversal check in serve_frontend - Sanitize SSE error messages to not leak exception details to clients (4 locations in inference.py) - Bind port-discovery socket to 127.0.0.1 instead of all interfaces in llama_cpp backend - Import datasets_root and resolve_output_dir in embedding training function to fix NameError and use managed output directory - Remove stale .gitignore entries for package-lock.json and test directories so tests can be tracked in version control - Add venv-reexecution logic to ui CLI command matching the studio command behavior * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move models_dir path validation before try/except block The HTTPException(403) was inside the try/except Exception handler, so it would be caught and re-raised as a 500. Moving the validation before the try block ensures the 403 is returned directly and also makes the control flow clearer for static analysis (path is validated before any filesystem operations). * Use os.path.realpath + startswith for models_dir validation CodeQL py/path-injection does not recognize Path.is_relative_to() as a sanitizer. Switched to os.path.realpath + str.startswith which is a recognized sanitizer pattern in CodeQL's taint analysis. The startswith check uses root_str + os.sep to prevent prefix collisions (e.g. /app/models_evil matching /app/models). * Never pass user input to Path constructor in models_dir validation CodeQL traces taint through Path(resolved) even after a startswith barrier guard. Fix: the user-supplied models_dir is only used as a string for comparison against allowed roots. The Path object passed to _scan_models_dir comes from the trusted allowed_roots list, not from user input. This fully breaks the taint chain. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-12 03:36:19 -07:00
Roland Tannous	400b6ecede	Update setup.ps1	2026-03-12 02:44:25 +04:00
Roland Tannous	220a7bb1ed	Update setup.sh	2026-03-12 02:42:43 +04:00
Roland Tannous	11e74b2dc5	resolved conflicts	2026-03-11 20:58:25 +00:00
Roland Tannous	1087216cb5	Merge branch 'fix/pre-merge-cleanup' into feature/merge-build-final	2026-03-11 20:56:49 +00:00
Roland Tannous	6f77c63229	refactor: remove project_root passing, use self-resolved paths and ~/.unsloth/studio - Workers now compute backend_path and venv_t5 locally via Path(__file__) - Moved .venv_t5 to ~/.unsloth/studio/.venv_t5 - Added ensure_studio_directories() call on server startup - Expanded CLI studio command into sub-app with setup subcommand	2026-03-11 20:32:18 +00:00
Manan17	fbccac8cee	shifting setup & co inside studio	2026-03-11 20:19:52 +00:00
Shine1i	bbb4cd0f0b	feat(studio): add auth-specific paths and integrate auth database location	2026-03-11 20:19:52 +00:00
Shine1i	7012b8396f	fix(studio): update temporary directory path to use system temp dir	2026-03-11 20:19:52 +00:00
Shine1i	904e440513	feat(studio): studio storage roots path utilities	2026-03-11 20:19:52 +00:00
Roland Tannous	d6e4a0644f	resolved format_conversion conflict	2026-03-11 19:53:53 +00:00
Roland Tannous	6926a8b091	fix: prefer tabular files over archives in Tier 1 dataset preview Tier 1 check-format was picking images.zip over testmini.parquet, causing wrong columns (image/label) and broken VLM mapping. Also log first VLM conversion failure instead of swallowing silently.	2026-03-11 19:13:11 +00:00
Roland Tannous	a63196c93e	updated on completion response markers for qwen3.5	2026-03-11 19:00:29 +00:00
Roland Tannous	e455b307be	add ffmpeg system support for linux and windows	2026-03-11 18:50:11 +00:00
Roland Tannous	adf919f47a	Remove test_llama_cpp.ps1	2026-03-11 18:18:19 +00:00
Roland Tannous	a6aa9a0efa	Remove test_llama_cpp.ps1 from tracking	2026-03-11 18:10:25 +00:00
Roland Tannous	859bfe23c4	Merge pull request #375 from unslothai/feature/llm-assist-detection Feature/llm assist detection	2026-03-11 22:02:33 +04:00
Roland Tannous	b274e9e0c6	chore: merge nightly & update dataset preview dialog mapping text	2026-03-11 17:00:14 +00:00
Roland Tannous	0e3ac91e2a	feat: target AI Assist mapping prompts for audio & embedding models	2026-03-11 16:55:43 +00:00
Roland Tannous	9dac1bedf9	Merge remote-tracking branch 'origin/nightly' into feature/llm-assist-detection	2026-03-11 16:23:09 +00:00
Roland Tannous	6d6a62821e	Merge pull request #374 from unslothai/fix/model-caching-issues Fix: Normalize HuggingFace model identifiers to lowercase	2026-03-11 18:40:26 +04:00
Roland Tannous	7862e70211	fix: lowercase remote Hugging Face model IDs in ModelConfig and routes to prevent caching mismatches with Unsloth	2026-03-11 14:20:25 +00:00
Roland Tannous	fb211f3254	Merge pull request #372 from unslothai/fix/input-focus-clipping Input focus outline clipped by container	2026-03-11 18:12:04 +04:00
Roland Tannous	774c9b17fd	Merge pull request #373 from unslothai/feature/structlog-logging-system feat: integrate structlog, configure workers for prod logging, and mi…	2026-03-11 16:52:49 +04:00
Roland Tannous	817f2e8dcc	feat: integrate structlog, configure workers for prod logging, and migrate print statements	2026-03-11 12:33:16 +00:00
imagineer99	014695b38a	fix: scope overflow-visible to studio collapsibles	2026-03-11 11:26:43 +00:00
imagineer99	984f4a4acb	fix: input focus outline clipping	2026-03-11 11:11:57 +00:00
Roland Tannous	ee063c5910	Merge pull request #367 from unslothai/fix/yaml-syntax Modified to fix the yaml syntax for unsloth_Qwen3-14B-Base-unsloth-bnb-4bit	2026-03-11 13:39:48 +04:00
Roland Tannous	16eb39b53d	Merge pull request #369 from unslothai/fix/model-mappping-syntax fixed string concatenation in model mapping	2026-03-11 12:39:13 +04:00
Samit	379bbbdbdd	fixed string concatenation in model mapping	2026-03-11 00:07:26 -07:00
Samit	822050bf57	modified to fix the yaml syntax	2026-03-10 23:58:51 -07:00
Manan Shah	2aa9322167	Merge pull request #365 from unslothai/fix/gguf-gemma-with-text fixing gguf export for gemma with text	2026-03-10 17:59:22 -07:00
Manan17	5ca623a166	fixing gguf export for gemma with text	2026-03-11 00:58:22 +00:00
Wasim Yousef Said	739838bd48	Merge pull request #364 from unslothai/feature/chat-seq-slider chat seq slider	2026-03-11 01:56:48 +01:00
Shine1i	4a8a96b1af	chat seq slider	2026-03-11 01:41:25 +01:00
Manan Shah	cce274717b	Merge pull request #357 from unslothai/feat/embedding-models feat: add embedding model training support	2026-03-10 14:59:20 -07:00
Manan17	983c20bbb2	local model's embedding nature check	2026-03-10 21:58:45 +00:00
Manan17	294a3d3e47	fix: reset isEmbeddingModel in error fallback paths to prevent stale state	2026-03-10 21:33:13 +00:00
Roland Tannous	5b042165e6	Merge pull request #363 from unslothai/feature/enable-all-modalities Removed audio and embedding from coming soon	2026-03-11 01:32:43 +04:00
Manan17	bc5a72dd8c	fix: local directory dataset loading	2026-03-10 21:29:51 +00:00
imagineer99	8d6f88577f	chore: removed audio and embedding from coming soon	2026-03-10 21:29:18 +00:00
Wasim Yousef Said	98e3396fbe	Merge pull request #356 from unslothai/fix/summary-step-spacing-and-colors Redesign summary step with consistent card layout, spacing and icons	2026-03-10 22:26:10 +01:00
Wasim Yousef Said	27311e3986	Merge pull request #362 from unslothai/feature/setup-no-llama-nuke fix(setup): stop deleting llama.cpp in setup	2026-03-10 22:25:12 +01:00
Manan Shah	f696ef81e8	Merge branch 'nightly' into feat/embedding-models	2026-03-10 14:16:05 -07:00
Roland Tannous	08d9c84f1f	Merge pull request #359 from unslothai/fix/stream-manual-slice-dataset fix: stream HF dataset when manual slice is specified	2026-03-11 01:13:51 +04:00
Manan17	9523e5c1f9	fixing embedding model search	2026-03-10 21:12:24 +00:00
Shine1i	2895518f0c	fix(setup): stop nuking llama.cpp in setup	2026-03-10 22:03:01 +01:00
Wasim Yousef Said	29e56e8649	Merge pull request #361 from unslothai/fix/tooltip-z-index Increase tooltip z-index to appear above dropdowns	2026-03-10 22:01:30 +01:00
imagineer99	d572c43814	fix: increase tooltip z-index to appear above dropdowns	2026-03-10 20:57:12 +00:00
Manan17	3b0b002b34	fixing logging for each step	2026-03-10 20:32:40 +00:00
Roland Tannous	21ef22a9ff	fix: skip streaming when dataset_slice_start > dataset_slice_end Prevents training on the wrong row range when start exceeds end by falling back to full download where existing clamping handles it.	2026-03-10 20:21:34 +00:00
imagineer99	5dcbf86d09	fix: reject negative manual dataset slices Prevent negative Train Split Start/End values in the dataset advanced UI and sanitize payload mapping so negative slice values are never sent to the backend. Made-with: Cursor	2026-03-10 20:13:46 +00:00
Roland Tannous	226f251589	fix: guard against negative dataset_slice_end before streaming Fall back to full download when dataset_slice_end is negative, avoiding an empty stream.take(0) that would produce a broken dataset.	2026-03-10 20:12:42 +00:00
Roland Tannous	b91cdda2b9	Merge pull request #354 from unslothai/fix/audio-train-completions fix: uncheck train_on_completions for audio models	2026-03-11 00:05:51 +04:00
Roland Tannous	949f2ac87e	Merge pull request #358 from unslothai/fix/sharded-gguf fix: download all GGUF shards for split models	2026-03-11 00:05:01 +04:00
Roland Tannous	970a029108	fix: stream HF dataset when manual slice is specified Instead of downloading the full dataset and then slicing, use streaming mode to only fetch the rows needed (up to slice_end + 1) when a manual dataset slice is configured.	2026-03-10 19:50:53 +00:00
Roland Tannous	c986174c56	fix: preserve zero-valued dataset slice boundaries in embedding worker Use explicit None checks instead of falsy `or` for slice_start and slice_end so that a valid slice_end=0 is not replaced with the full dataset length.	2026-03-10 19:33:10 +00:00
Roland Tannous	b84202e8db	fix: restrict shard siblings to exact basename and total count startswith(prefix) could match unrelated split variants whose names extend the selected file's prefix (e.g. model-Q8_0-v2-00001-of-...). Now builds an exact regex from the chosen file's base prefix and shard total so only true siblings are downloaded.	2026-03-10 19:28:26 +00:00
Shine1i	18a60b930a	chore/fix(studio): add placeholder dropdowns for dataset subset and splits in disabled state	2026-03-10 20:27:11 +01:00
Roland Tannous	b8678a3ed6	fix: pass hf_token for gated embedding models and key cache by token - Forward hf_token to FastSentenceTransformer.from_pretrained() so private/gated embedding repos authenticate correctly - Key _embedding_detection_cache by (model_name, hf_token) tuple so unauthenticated lookups don't shadow subsequent authenticated ones	2026-03-10 19:20:12 +00:00
Roland Tannous	d635846b8d	fix: use exact variant matching and shard-prefix discovery for split GGUFs Substring matching (e.g. "Q8_0" in filename) could match superset variants like "IQ8_0", causing wrong quantizations to be downloaded. Now uses word-boundary regex for variant matching and discovers split shards by shared filename prefix rather than treating all variant matches as shards.	2026-03-10 19:13:03 +00:00
Roland Tannous	d6ae910edc	fix: propagate is_embedding into worker subprocess config start_training() cherry-picks kwargs into a config dict but was missing is_embedding, so config.get("is_embedding", False) in worker.py always returned False and embedding training never ran.	2026-03-10 19:05:47 +00:00
Roland Tannous	defa761fb2	fix: download all GGUF shards for split models (e.g. 7B Q8_0) LlamaCppBackend.load_model() only downloaded the first matching GGUF file. For split models (e.g. 7B Q8_0 with 3 shards), llama-server needs all shards present. Now collects and downloads all matching files.	2026-03-10 19:04:10 +00:00
Roland Tannous	846cc2cf2a	fix: always force-uncheck trainOnCompletions for pure audio models in dataset check Separate pure-audio from audio-VLM logic in runDatasetCheck so pure audio models are always forced to trainOnCompletions=false regardless of dataset type, while audio VLMs (gemma3n) only uncheck when the dataset is audio.	2026-03-10 19:02:49 +00:00
Roland Tannous	d9f2d08267	fix: reset isAudioModel on model config fetch failure Clear stale isAudioModel in the fallback path when getModelConfig fails, preventing a previously-selected audio model's flag from leaking into the next model selection.	2026-03-10 19:00:56 +00:00
Roland Tannous	5a086353ab	feat: add embedding model training support Add end-to-end embedding/sentence-transformer training pipeline using FastSentenceTransformer, SentenceTransformerTrainer, and MultipleNegativesRankingLoss with BatchSamplers.NO_DUPLICATES. Backend: - Add is_embedding_model() detection via HF tags + pipeline_tag - Add /check-embedding/ API route and EmbeddingCheckResponse - Extend derive_model_type() to return "embeddings" - Add _run_embedding_training() in worker.py with progress callbacks, stop handling, LoRA (task_type=FEATURE_EXTRACTION), and model saving - Add is_embedding field to TrainingStartRequest and ModelDetails - Add YAML configs for 5 models: all-MiniLM-L6-v2, bge-m3, embeddinggemma-300m, gte-modernbert-base, Qwen3-Embedding-0.6B Frontend: - Wire isEmbeddingModel flag through store, API types, and mappers - Force packing=false, train_on_completions=false, warmup_ratio=0.03 - Hide packing and train_on_completions checkboxes for embedding models - Auto-set modelType to "embeddings" from backend model_type response	2026-03-10 18:10:09 +00:00
Roland Tannous	5b8f5bc554	fix: improve advisor prompts for more reliable column role assignment - Pass 1: clearer definition of "conversational" vs non-conversational, constrained dataset_type to specific enum values - Pass 2: much more explicit worked examples with step-by-step reasoning, added "skip" role for metadata columns, stronger reminder at end that all-user is wrong - Pass 3: returns raw text instead of JSON for cleaner system prompts, removed system message to give model more freedom	2026-03-10 18:01:20 +00:00
Roland Tannous	1430bbc604	fix: uncheck train_on_completions for audio models Pure audio models (orpheus, sparktts, whisper, sesame-csm) now always have trainOnCompletions auto-unchecked when selected. Gemma3n (audio_vlm) only unchecks when the dataset is audio. - Add is_audio to frontend ModelConfigResponse (backend already returns it) - Add isAudioModel state to training config store - Auto-set trainOnCompletions=false for pure audio models on model load - Auto-set trainOnCompletions=false for audio VLMs when dataset is audio - Respect manual user override via existing _trainOnCompletionsManuallySet flag	2026-03-10 17:39:35 +00:00
imagineer99	c895cc56a4	fix: redesign summary step with consistent card layout, icons, and compact spacing	2026-03-10 17:38:10 +00:00
Roland Tannous	cb389fb756	Merge pull request #353 from unslothai/feat/dataset-shortlist-and-model-type Curated dataset shortlists and model type plumbing	2026-03-10 21:31:16 +04:00
Roland Tannous	2fc50ff0cf	refactor: advisor maps columns to roles instead of generating templates The advisor now only assigns columns to user/assistant roles and generates a system prompt. Templates (user_template, assistant_template) are removed entirely — the LLM was frequently putting all columns in user or copying actual data values into templates. Column values are now used directly as message content, grouped and concatenated by role. This is simpler, more robust, and prevents the class of bugs where the advisor generates bad template content.	2026-03-10 17:17:27 +00:00
Roland Tannous	21cff233e5	feat: add model_type field to backend /config and /list responses Derive a single model_type string ("text" \| "vision" \| "audio" \| "embeddings") from existing is_vision and audio_type detection, so the frontend doesn't have to infer modality from scattered boolean flags.	2026-03-10 16:54:19 +00:00
Roland Tannous	a30153e1bb	fix: improve Pass 2 prompt to correctly split INPUT/OUTPUT columns The LLM was putting all columns in user_template (e.g. summarization dataset had both document AND summary as user input). Fixed by: - Reframed system message: explicitly states user=INPUT, assistant=OUTPUT - Added 4 concrete correct examples (summarization, NLI, translation, QA) showing exactly how to split columns - Added "NEVER put the output/target column in the user template" rule - Added sanity check: if assistant_template has no column placeholders, reject the result and fall back to simple classification	2026-03-10 16:47:50 +00:00
Roland Tannous	5db251b31c	fix: include label mapping in Pass 3 system prompt generation Pass 3 now sees the label mapping from Pass 2 (e.g. "0 = does not follow, 1 = follows, 2 = entailed") so the generated system prompt can explain what each label value means. Also bumped to 2-4 sentences to give room for the label descriptions.	2026-03-10 16:21:54 +00:00
Roland Tannous	49a4089dfa	feat: Beta badge, generated System column, fix table scroll - Add "Beta" badge next to AI Assist button text - When advisor generates a system prompt, show it as a "System (generated)" column prepended to the data table so user can see it alongside data - Fix table being squished to near-zero height when advisor notification banner is present: add min-h-[250px] to table wrapper, change body from overflow-hidden to overflow-auto	2026-03-10 16:14:16 +00:00
Roland Tannous	78489e41c4	refactor: 3-pass advisor — dedicated system prompt generation Pass 1: Classify dataset type (unchanged) Pass 2: Generate user/assistant templates + label mapping + column roles (system_prompt removed from this pass to keep it focused) Pass 3: Generate system prompt (only for non-conversational datasets) - Dedicated pass with focused prompt that sees the templates from Pass 2 - Skipped entirely for conversational datasets - Produces specific, task-relevant system prompts	2026-03-10 16:07:30 +00:00
Roland Tannous	76cc5b19cb	fix: show generated templates in UI, make system prompt optional - System prompt is now optional — LLM only generates one when the task is ambiguous from the data alone (persona, domain, format constraints) - Sanitize system_prompt extraction (handle literal "null" string) - Show system prompt, user template, and assistant template in the advisor notification banner so user can see exactly what was generated - Templates displayed in monospace with labeled sections	2026-03-10 16:01:57 +00:00
Roland Tannous	48a5e49313	fix: remove Pass 3 self-scoring, trust Pass 2 output directly The LLM was bad at scoring its own conversion quality — rejecting good Pass 2 output (score 5/10 for a perfectly usable conversion). Instead: - Remove Pass 3 entirely (saves ~0.4s and one inference call) - Trust Pass 2 output and return it to the user - Build notification from Pass 1 classification info instead - User can always adjust mapping via dropdowns if they disagree	2026-03-10 15:56:48 +00:00
Roland Tannous	ed849b7d0d	fix: advisor quality gate, better prompts, always show AI Assist button - Reject advisor result when Pass 3 scores < 6 or is_acceptable=false, falls back to simple column classification instead of using bad output - Improved Pass 2 prompt: explicit rules for label_mapping completeness, {column_name} vs {column_name_name} for mapped labels, column_roles must match which template uses them - Build suggested_mapping from ALL template-referenced columns (not just first match per role) — fixes hypothesis being dropped from SNLI mapping - Guard against LLM returning literal string "null" for revised_system_prompt - Always show AI Assist button when available, even when mapping looks complete	2026-03-10 15:51:14 +00:00
Roland Tannous	ab58121cd8	fix: harden template mapping for complex column types and curly braces - Handle dict columns (e.g. squad answers) by extracting text instead of raw repr() - Handle list columns by joining or extracting single value - Catch ValueError in .format() calls (stray { } in column data) - Add missing json import to dataset_utils.py	2026-03-10 15:43:35 +00:00
Roland Tannous	202780c32c	feat: Dataset Conversion Advisor — multi-pass LLM for non-conversational datasets Non-conversational HF datasets (e.g. stanfordnlp/snli) were naively mapped column→role, producing poor training results. The AI Assist button now runs a 3-pass advisor using Qwen 7B that: 1. Fetches the HF dataset card/README to understand the dataset purpose 2. Classifies the dataset type and determines if conversion is needed 3. Generates a system prompt, user/assistant templates with {column} placeholders, and label mappings (e.g. 0→entailment) 4. Validates the conversion quality (score ≥7/10 required) Architecture: advisor metadata flows as __-prefixed keys in custom_format_mapping (e.g. __system_prompt, __user_template, __assistant_template, __label_mapping). The existing _apply_user_mapping() detects these keys and routes to template-based conversation construction. No __ keys = existing simple mode (backwards compatible). Backend: upgraded llm_assist.py (7B default, multi-pass advisor, HF card fetching), extended API models, added _apply_template_mapping() to dataset_utils.py. Frontend: extended store with advisor state fields, wired AI Assist to store templates/system prompt, inject __ metadata in training request, show advisor notification banner in mapping card.	2026-03-10 15:39:56 +00:00
Roland Tannous	c2dd0f4cf1	fix: download all GGUF shards for split models (e.g. 7B Q8_0) LlamaCppBackend.load_model() and precache_helper_gguf() only downloaded the first matching GGUF file. For split models (e.g. 7B Q8_0 with 3 shards), llama-server needs all shards present. Now collects and downloads all matching files.	2026-03-10 15:08:20 +00:00
Roland Tannous	7f1fd28acd	debug: decode first sample after train_on_completions masking	2026-03-10 14:08:14 +00:00
imagineer99	3de197ac31	rename: tts model type to audio for broader category support	2026-03-10 13:28:49 +00:00
Roland Tannous	49b29fb1fd	debug: fix dataset access - result is a dict, use dataset['dataset']	2026-03-10 13:19:31 +00:00
imagineer99	968f11f60a	feat: infer tts model type from backend is_audio flag	2026-03-10 12:57:40 +00:00
Roland Tannous	21cd9f9d02	debug: improve sample preview with type info and traceback	2026-03-10 12:56:24 +00:00
Roland Tannous	a36c073770	debug: switch to print() for subprocess visibility	2026-03-10 12:49:01 +00:00
Roland Tannous	97612af993	debug: add temporary log statements for dataset preview and VLM instruction	2026-03-10 12:35:55 +00:00
imagineer99	8cba556bea	feat: curated dataset shortlists and model type plumbing	2026-03-10 12:00:09 +00:00
Roland Tannous	5d471d7e4a	feat: add AI Assist button for user-triggered column classification Move LLM-assisted column mapping from silent /check-format automation to an explicit "AI Assist" button in the dataset mapping dialog. This makes the feature transparent and user-controlled. - Remove llm_classify_columns() from check_dataset_format() (heuristic-only) - Remove auto-save suggested_mapping from use-training-actions.ts - Add POST /api/datasets/ai-assist-mapping endpoint (receives preview samples from frontend, no dataset re-loading needed) - Add AiAssistMappingRequest/Response models - Add aiAssistMapping() frontend API function - Add Sparkles AI Assist button to DatasetMappingCard with loading state - Wire up handleAiAssist handler in dataset-preview-dialog.tsx	2026-03-10 11:09:01 +00:00
Roland Tannous	6ae931ca46	Merge pull request #343 from unslothai/fix/cli-changes Fix/cli changes	2026-03-10 14:38:35 +04:00
Roland Tannous	a26a5cc6be	Merge pull request #352 from unslothai/fix/cancel-training Fix/cancel training	2026-03-10 14:38:30 +04:00
Roland Tannous	0ec340d3e1	fix: LLM-assisted mapping flows from /check-format to training - Frontend auto-saves suggested_mapping into datasetManualMapping when check-format returns requires_manual_mapping=false, so the mapping flows to training via custom_format_mapping (no redundant AI calls) - Backend returns meaningful warning when column detection fails (LLM-generated or static fallback) for both text and VLM datasets - /check-format endpoint merges check_dataset_format warnings with existing URL-based image detection warnings	2026-03-10 09:58:58 +00:00
Roland Tannous	f7ca361c5c	feat: add LLM-assisted dataset detection using ephemeral GGUF helper Uses Qwen2.5-3B-Instruct Q8_0 via LlamaCppBackend to complement heuristic-based dataset detection when heuristics are uncertain. - New llm_assist.py: VLM instruction generation, column classification, and user-friendly warning generation for dataset issues - Pre-cache helper GGUF on FastAPI startup (background thread) - Reorder training pipeline: dataset processing runs BEFORE model load to avoid VRAM contention (detect → dataset → model → train) - Add pre_detect_and_load_tokenizer() for lightweight detection - LLM warnings on VLM conversion failures (broken URLs, missing images) - LLM column classification fallback when heuristics return unknown - Graceful degradation: all paths unchanged when helper unavailable	2026-03-10 09:20:45 +00:00
Manan17	fd7ca8bda8	distinguish cancel and stop for force terminate	2026-03-10 02:35:32 +00:00
pre-commit-ci[bot]	bced78373f	[pre-commit.ci] pre-commit autoupdate (#4192) updates: - [github.com/astral-sh/ruff-pre-commit: v0.15.4 → v0.15.5](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.4...v0.15.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-09 19:29:08 -07:00
Manan17	9be55f0c1b	fixing cancel training	2026-03-10 02:20:56 +00:00
Roland Tannous	daa50d0756	Revert "Merge pull request #347 from unslothai/feature/studio-storage-roots" This reverts commit `6b43e33ff1`, reversing changes made to `9edadaf21f`.	2026-03-10 01:52:47 +00:00
Manan17	8c493bbd20	CLI fix for backend changes	2026-03-10 01:51:00 +00:00
Roland Tannous	6b43e33ff1	Merge pull request #347 from unslothai/feature/studio-storage-roots update studio storage roots	2026-03-10 05:49:42 +04:00
Roland Tannous	9edadaf21f	Merge pull request #350 from unslothai/fix/vision-datasets-fix Fix VLM dataset detection and conversion	2026-03-10 05:48:27 +04:00
Roland Tannous	8488c2b1df	fix: fall back to auto-detection when user VLM mapping fails Instead of erroring out when custom_format_mapping fails conversion, clear it and let auto-detection try. Handles stale cached mappings.	2026-03-10 01:42:25 +00:00
Roland Tannous	dd6c38cc7b	fix: probe image column candidates when multiple exist When multiple image columns are found, probes them (HEAD for URLs, os.path.exists for paths) and picks the first that works. Skips probing when top candidate is PIL/dict (score >= 75).	2026-03-10 01:38:33 +00:00
Roland Tannous	81adc47b6e	fix: prefer URL image columns over bare filenames, add value-based fallback find_image_column now scores candidates by resolvability (PIL > dict > URL > path) and has a Pass 2 value-based fallback for columns not matching image keywords. Fixes phiyodr/coco2017 picking file_name (unresolvable) over coco_url (resolvable).	2026-03-10 01:36:19 +00:00
Roland Tannous	d6803de35a	fix: detect list-of-strings text columns and pick random element for VLM conversion Handles datasets like phiyodr/coco2017 where captions is a list of strings.	2026-03-10 01:32:19 +00:00
Roland Tannous	0b8325ab96	feat: add ShareGPT+image VLM format support and improve image column detection - Detect and convert ShareGPT/ChatML conversations with <image> placeholders - Add file_name/filename as image column keywords - Detect image paths and URLs by value (string ending in .jpg/.png/etc)	2026-03-10 01:27:36 +00:00
Roland Tannous	56d02a3b57	fix: use word-boundary matching for image/audio column detection Substring matching caused false positives like 'pic' in 'topic', leading to non-deterministic image column selection.	2026-03-10 00:38:02 +00:00
Manan17	32569fc8a8	shifting setup & co inside studio	2026-03-09 23:48:31 +00:00
Shine1i	109db14817	feat(studio): add auth-specific paths and integrate auth database location	2026-03-09 23:48:31 +00:00
Shine1i	958bdef43e	fix(studio): update temporary directory path to use system temp dir	2026-03-09 23:48:31 +00:00
Shine1i	5301514775	feat(studio): studio storage roots path utilities	2026-03-09 23:48:31 +00:00
Roland Tannous	32bbccc573	fix: resolve bare-filename images via HF repo lookup Datasets like VQAonline store image filenames (e.g. "img.png") without the directory prefix. Build a basename→repo_path lookup using list_repo_files, then resolve each file via hf_hub_download.	2026-03-09 23:37:00 +00:00
Roland Tannous	c272c4f844	fix: prefer tabular files over archives in Tier 1 dataset preview Tier 1 check-format was picking images.zip over testmini.parquet, causing wrong columns (image/label) and broken VLM mapping. Also log first VLM conversion failure instead of swallowing silently.	2026-03-09 22:00:20 +00:00
Roland Tannous	65413c95fb	Merge pull request #349 from unslothai/license/agpl3-studio Add AGPL-3.0 SPDX headers to all source files	2026-03-10 00:30:29 +04:00
Roland Tannous	d882678fe4	Add AGPL-3.0 SPDX headers to all source files	2026-03-09 20:17:45 +00:00
Roland Tannous	198ca7efce	Merge pull request #348 from unslothai/license/agpl3-studio Main license file for studio codebase	2026-03-09 23:38:08 +04:00
Roland Tannous	ac2906f357	Add AGPL-3.0 license to studio folder	2026-03-09 19:36:25 +00:00
Wasim Yousef Said	a4bc6330a0	Merge pull request #344 from unslothai/style/ui-feedback Refine UI spacing, icons, and border radius per feedback	2026-03-09 19:24:12 +01:00
Wasim Yousef Said	971ef40d85	Merge pull request #345 from unslothai/feature/fixes-client feat(studio): fix chat code block actions and some training view changes	2026-03-09 19:22:19 +01:00
Shine1i	1fe8995f1c	feat(recipe-studio): add support for managing tools by provider in tool profiles	2026-03-09 19:19:14 +01:00
Shine1i	2ccb75f2b7	Merge remote-tracking branch 'origin/nightly' into feature/fixes-client	2026-03-09 19:07:42 +01:00
Roland Tannous	b6811bc5c4	Merge pull request #342 from unslothai/local-dataset dataset upload	2026-03-09 21:22:23 +04:00
Roland Tannous	022bafaf92	store uploaded datasets under assets/datasets/uploads instead of ~/.cache	2026-03-09 17:06:36 +00:00
Roland Tannous	ae89101e81	Revert "narrow stale selection guard to only skip clearing for uploaded files" This reverts commit `fbcd111a70`.	2026-03-09 16:51:30 +00:00
Roland Tannous	ffeefd15d1	Merge pull request #346 from unslothai/fix/eval-loss-worker-filtering fix: eval loss broken after subprocess isolation refactor	2026-03-09 20:43:19 +04:00
Roland Tannous	41351e1566	fix: split dataset 80/20 when eval split matches train split	2026-03-09 16:36:44 +00:00
Shine1i	542d9126cc	chore(data-recipe): bump data-designer to 0.5.2 and pin duckdb<1.5	2026-03-09 17:27:02 +01:00
Roland Tannous	6eba6fff43	fix: disable Start Training when eval_steps set without eval split	2026-03-09 16:20:00 +00:00
Shine1i	1f37b76b19	feat(recipe-studio): remove MCP tools-related dialogs and refactor tool profile management logic	2026-03-09 17:04:15 +01:00
Datta Nimmaturi	cff1e554fc	[trl] Trl v0.28 (and above) rl fixes (#4156) * Refactor loss computation to include completion_mask * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes for trl 0.28 and above Remove sync/reload weights calls, remove vllm.LLM instantiation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor loss computation to include completion_mask * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes for trl 0.28 and above Remove sync/reload weights calls, remove vllm.LLM instantiation * patch rpc in openenv for newer trl * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pluesclues <136766175+pluesclues@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-09 12:01:49 -04:00
Roland Tannous	2a11e79b8b	fix: restore eval_enabled early signal for subprocess training	2026-03-09 15:35:49 +00:00
Shine1i	c67f4ba29f	feat(recipe-studio): improve UI responsiveness and fix JSON preview handling	2026-03-09 16:04:46 +01:00
Roland Tannous	c3185d5d98	fix: allow eval-only progress events through worker callback filter	2026-03-09 14:39:49 +00:00
Shine1i	1ae8fb402b	feat(studio): centralize chart styling and formatting	2026-03-09 15:34:15 +01:00
Roland Tannous	fbcd111a70	narrow stale selection guard to only skip clearing for uploaded files	2026-03-09 14:01:02 +00:00
Roland Tannous	1d06e2f54c	switch dataset upload from base64 JSON to multipart/form-data with streamed writes	2026-03-09 13:55:45 +00:00
Roland Tannous	56412f2362	include all candidate files when scanning a directory, not just the first	2026-03-09 13:52:45 +00:00
Shine1i	d66bc2760b	feat(studio): rework chart settings with a new preferences store and revamped settings UI	2026-03-09 14:47:20 +01:00
Roland Tannous	c998227fec	add client-side file size validation before upload	2026-03-09 13:38:00 +00:00
Roland Tannous	4c5ded4c52	normalize uploaded filename extension to lowercase for consistent downstream checks	2026-03-09 13:35:55 +00:00
imagineer99	8fa021e839	style: reduce border radius on onboarding summary cards	2026-03-09 13:33:00 +00:00
Roland Tannous	19b2a0a0da	Merge pull request #340 from unslothai/fix/auth-audio Added auth to audio generate endpoint	2026-03-09 17:25:49 +04:00
Roland Tannous	87c8d7b3da	Merge pull request #338 from unslothai/fix/trust-code Exposed trust_remote_code through the UI	2026-03-09 17:19:56 +04:00
Roland Tannous	91dd7fc762	merge nightly, resolve conflict in use-chat-model-runtime	2026-03-09 13:19:17 +00:00
Roland Tannous	c719f1ba54	training: restore YAML fallback for trust_remote_code (no UI toggle)	2026-03-09 13:10:24 +00:00
imagineer99	d1c544b705	style: remove playground sidebar right border	2026-03-09 13:08:59 +00:00
Roland Tannous	7989cd4567	respect trust_remote_code toggle, return helpful error when required	2026-03-09 13:06:55 +00:00
Shine1i	95f9e0ba41	feat(studio): add support for code block actions including copy and download options in markdown blocks	2026-03-09 13:07:54 +01:00
Roland Tannous	4858204c62	backend: resolve trust_remote_code from YAML when not set by frontend	2026-03-09 11:58:23 +00:00
imagineer99	e188a7b067	style: improve navbar spacing and icon rendering	2026-03-09 11:37:19 +00:00
Roland Tannous	1ddd138da8	add trust_remote_code to BackendTrainingDefaults type	2026-03-09 10:51:28 +00:00
Roland Tannous	a1105d8ef3	wire trust_remote_code from YAML configs to frontend toggles	2026-03-09 10:15:15 +00:00
Roland Tannous	5e36ae2629	add trust_remote_code defaults to all model configs	2026-03-09 09:54:52 +00:00
Roland Tannous	e1b6798a4e	Merge pull request #329 from unslothai/fix/add-title modified the title	2026-03-09 13:10:17 +04:00
Manan17	a08b73e385	remove file size limit	2026-03-09 07:04:02 +00:00
Manan17	d132730f6b	CLI fix for backend changes	2026-03-09 07:00:06 +00:00
Manan17	a49638c504	dataset upload	2026-03-09 05:50:18 +00:00
Wasim Yousef Said	91e81227bd	Merge pull request #273 from unslothai/feature/data-reciper-enchansments UX + layout polish & WIP data-reciper client & backend finalization p2	2026-03-09 02:57:32 +01:00
Shine1i	f41a552c29	feat(recipe-studio): enforce run name validation for full runs and refine validation UI	2026-03-09 02:53:59 +01:00
Shine1i	3b1663b1e9	feat(recipe-studio, datasets): improve dataset handling and update metadata logic	2026-03-09 02:47:32 +01:00
Shine1i	84f005fb25	feat(recipe-studio, studio): dataset logic, refine run settings, and improve validation UI	2026-03-09 02:06:12 +01:00
samit	2db36c0b30	added auth to audio generate endpoint	2026-03-08 17:46:54 -07:00
Roland Tannous	254f10e37a	Merge pull request #328 from unslothai/fix/chat-unloading-model fixed model unload before load without validation	2026-03-09 04:40:05 +04:00
Shine1i	e00d7c6745	feat(studio): refine dataset selection logic with Hugging Face and local dataset support	2026-03-09 01:16:45 +01:00
samit	662cb1c440	Adding trust_remote_code to the orchestrator and worker	2026-03-08 16:44:41 -07:00
Shine1i	a2dde15367	merge nightly	2026-03-09 00:32:33 +01:00
samit	86e94b5844	exposed trust_remote_code through the UI	2026-03-08 16:28:56 -07:00
Roland Tannous	2c879bf4a4	Merge pull request #320 from unslothai/fix/stale-dataset-split-on-switch Fix/stale dataset split on switch	2026-03-09 00:14:09 +04:00
Roland Tannous	1bf0d39f3d	Merge pull request #323 from unslothai/fix/vision-dataset-search-filter fixing update model type	2026-03-09 00:13:50 +04:00
Roland Tannous	d98d4da6c8	Merge pull request #223 from unslothai/feature/support-for-audio-models Adding support for audio llms	2026-03-09 00:09:59 +04:00
Roland Tannous	7b76fccb9b	fix: loosen executorch pin for python 3.13 compat	2026-03-08 19:56:09 +00:00
Roland Tannous	a1778d6655	fix: replace is_dataset_multimodal with is_dataset_image/is_dataset_audio in training orchestrator	2026-03-08 19:40:00 +00:00
Roland Tannous	d10012d9fb	silence pip check output	2026-03-08 19:22:31 +00:00
Roland Tannous	0f11415c22	make pip check non-fatal for known third-party conflicts	2026-03-08 19:21:28 +00:00
Manan17	80b704d7b7	Audio_VLM bug fix	2026-03-08 19:14:07 +00:00
Roland Tannous	5a52a0131a	add extras-no-deps install step for audio model support	2026-03-08 18:49:55 +00:00
imagineer99	cc0aa43d5d	fix: replace favicon with branded sloth icon	2026-03-08 18:46:42 +00:00
Roland Tannous	7ee81dd7df	feat: route audio inference (TTS, ASR, Whisper) through orchestrator/worker subprocess	2026-03-08 18:25:27 +00:00
Roland Tannous	f4393ed3e5	fix: pin streamdown package versions to avoid type mismatch	2026-03-08 16:29:28 +00:00
Wasim Yousef Said	efaae9dc4c	Merge pull request #308 from unslothai/fix/browser-autofill-hf-token Prevent browser credential autofill in HF token fields	2026-03-08 16:51:11 +01:00
Wasim Yousef Said	2001abfb26	Merge pull request #331 from unslothai/fix/slider-fill-alignment Align slider fill bar with thumb across value range	2026-03-08 16:50:38 +01:00
Roland Tannous	1e39e7d05f	fix: handle structured audio part type in chat adapter	2026-03-08 14:18:16 +00:00
Daniel Han	1fe8f9061b	Bug fixed version	2026-03-08 06:39:19 -07:00
DoubleMathew	c3b7614bd5	Fix gpt temporary patch for grpo to happen after compile (#4180) * Fix gpt temporary patch for grpo to happen after compile * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-08 06:14:38 -07:00
pluesclues	f7baf6cc02	Completion mask fix (#4140) * Refactor loss computation to include completion_mask * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-08 06:05:47 -07:00
Roland Tannous	1435dbaf59	merge nightly into audio branch (mock test)	2026-03-08 10:23:44 +00:00
imagineer99	075bfe961b	fix: track live slider values for uncontrolled mode and scope fill to horizontal	2026-03-08 10:06:17 +00:00
Manan17	ef714f010e	adding export support	2026-03-08 04:18:20 +00:00
Roland Tannous	ff56a5f785	Merge pull request #318 from unslothai/fix/recharts-dimension-warning Fix Recharts -1 dimension warning on chart mount	2026-03-08 03:34:58 +04:00
Roland Tannous	aab35f2ed3	Merge pull request #324 from unslothai/feature/subprocess-isolation-version-switching Subprocess isolation for training, inference, and export with automatic transformers version switching	2026-03-08 03:34:12 +04:00
Roland Tannous	a7c34b42be	fix: clear stale model state on failed inference subprocess reload	2026-03-07 23:32:53 +00:00
Roland Tannous	4f766bbe25	fix: reset checkpoint metadata on failed export checkpoint reload	2026-03-07 23:29:34 +00:00
Roland Tannous	1dab9f57b6	fix: validate pip exit codes for .venv_t5 installs in setup.ps1	2026-03-07 23:25:13 +00:00
Roland Tannous	adf0fa6f81	add .venv_t5/ to .gitignore	2026-03-07 23:21:48 +00:00
Manan17	6487f81113	check for gated repo	2026-03-07 21:32:50 +00:00
Manan17	905f989521	fixing sesame model	2026-03-07 19:06:00 +00:00
Roland Tannous	8454e6dd2b	fix: scope dataloader_num_workers=0 to Windows + transformers 5.x only	2026-03-07 17:55:59 +00:00
Roland Tannous	ef9184c731	fix: prevent training hang on Windows by adding triton-windows support	2026-03-07 17:53:36 +00:00
Roland Tannous	e25705a211	fix: propagate PYTHONPATH to child subprocesses, revert tokenizer patching	2026-03-07 11:28:24 +00:00
Roland Tannous	9330588015	fix: patch TokenizersBackend in export output after save_pretrained	2026-03-07 10:57:51 +00:00
Roland Tannous	76c78afb8f	fix: patch TokenizersBackend by model name - Qwen3.5→Qwen2Tokenizer, GLM→PreTrainedTokenizer	2026-03-07 10:29:59 +00:00
Roland Tannous	d60cd2843f	fix: patch Qwen3.5 broken tokenizer_class TokenizersBackend across all backends	2026-03-07 09:43:25 +00:00
Strahinja Stamenkovic	1db4a013a1	Conditionally enable 4bit on CDNA for bitsandbytes>=v0.49.2 (#4161)	2026-03-07 01:33:40 -08:00
Roland Tannous	29fa91be07	fix: bump transformers to 5.2.0 and pin huggingface_hub in setup.ps1	2026-03-07 09:12:12 +00:00
Roland Tannous	bd60562145	fix: bump transformers 5.x pin from 5.1.0 to 5.2.0 for Qwen3.5 support	2026-03-07 09:10:09 +00:00
Roland Tannous	0b3397cc3a	fix: fail fast if runtime pip install of transformers 5.x fails	2026-03-07 08:40:25 +00:00
Roland Tannous	f3aeceeb24	fix: join prior pump thread before starting new training job	2026-03-07 08:37:03 +00:00
Roland Tannous	f7a3092cbd	fix: correct project root depth in model_config.py vision check	2026-03-07 08:15:29 +00:00
Roland Tannous	728420b290	fix: drain stale events from resp_queue after generation cancel	2026-03-07 08:12:16 +00:00
samit	3eef1dcfe3	modified the title	2026-03-06 22:33:30 -08:00
Samit	5f902af456	fixed model unload before load	2026-03-06 22:01:27 -08:00
Roland Tannous	25b51fad3b	fix: wait for training shutdown before export load, clear stop flag on reset 1. Export route: stop_training() only signals the subprocess — wait up to 30s for it to actually exit before loading the export checkpoint, avoiding a GPU memory race. 2. Training reset: clear _should_stop so /api/train/status returns phase=idle instead of staying stuck on phase=stopped after a user-triggered stop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 04:16:10 +00:00
Roland Tannous	f45f1f74c0	fix: add /v1 proxy entry to vite dev server config Without this, /v1/chat/completions requests in local dev are served by Vite instead of being proxied to the FastAPI backend. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 04:09:28 +00:00
Roland Tannous	609e3168a1	fix: serialize generation with _gen_lock to prevent concurrent queue readers Two overlapping /chat/completions requests could both read from the shared resp_queue, consuming and dropping each other's token events. Replace the request_id filtering (which silently dropped non-matching messages) with a threading.Lock that serializes generation — correct for single-GPU inference. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 04:06:51 +00:00
Roland Tannous	9470957bb9	fix: log final GGUF file locations after relocation	2026-03-06 18:04:27 +00:00
Roland Tannous	5a828ebd43	fix: increase export timeout to 1 hour for large model GGUF conversion	2026-03-06 17:59:42 +00:00
Roland Tannous	4b7ad23b3a	feat: broaden Qwen3.5 matching to cover entire family	2026-03-06 16:48:28 +00:00
Roland Tannous	ed1e63c814	feat: add Qwen3.5-35B-A3B and Qwen3-Next to transformers 5.x model list	2026-03-06 10:54:48 +00:00
Manan17	be517cc958	derive effective model type from isVisionModel for dataset search filtering	2026-03-06 08:47:56 +00:00
Shine1i	5e5feb5c00	feat(recipe-studio, validators): tweak OXC validator with lint suppression support and improve error normalization logic	2026-03-06 09:40:53 +01:00
Roland Tannous	d910759121	feat: add OpenAI-compatible /v1/chat/completions endpoint	2026-03-06 07:48:09 +00:00
Manan17	cb3f3f4d0c	fixing update model type	2026-03-06 07:44:57 +00:00
imagineer99	15efb0f235	fix: harden chart container sizing with legacy event rechecks	2026-03-06 07:16:36 +00:00
Roland Tannous	c3bc19494f	fix: pin huggingface_hub==1.3.0 in .venv_t5 (satisfies transformers 5.x)	2026-03-06 06:19:28 +00:00
imagineer99	005e8ac671	fix: preserve chart sizing updates without ResizeObserver	2026-03-06 06:06:50 +00:00
Roland Tannous	c5f4503b9e	fix: unload competing subprocesses before load across all routes	2026-03-06 06:05:31 +00:00
Roland Tannous	6b32af0bdc	feat: subprocess-based export, pin huggingface_hub==0.36.0	2026-03-06 06:03:09 +00:00
imagineer99	ad2acbc07a	fix: align slider fill bar with thumb across value range	2026-03-06 05:21:44 +00:00
Roland Tannous	b5cfd0952c	fix: use subprocess with transformers 5.x for vision detection Models like GLM-4.7-Flash have architectures (glm4_moe_lite) that AutoConfig in the main process (transformers 4.57.x) can't recognize. Instead of a raw config.json workaround, run the AutoConfig check in a subprocess with .venv_t5/ activated — same pattern as training and inference workers. This is more robust and consistent.	2026-03-06 04:51:23 +00:00
Roland Tannous	e5c7a18f72	fix: handle unrecognized model architectures in vision detection AutoConfig.from_pretrained() fails for models needing transformers 5.x (e.g. glm4_moe_lite) when running with 4.57.x. Add a raw config.json fallback that bypasses AutoConfig's architecture registry — fetches config.json directly from local path or HuggingFace Hub and checks for vision indicators without needing the architecture to be registered.	2026-03-06 04:46:51 +00:00
Roland Tannous	1167be2798	refactor: consolidate version switching to .venv_t5, remove .venv_overlay All version switching now uses .venv_t5/ (pre-installed by setup.sh). The old .venv_overlay/ with runtime pip installs is removed. ensure_transformers_version() (used only by export) now does a lightweight sys.path swap instead of pip installing at runtime.	2026-03-06 04:37:06 +00:00
Manan17	821ba4936f	Fixing dataset split issues	2026-03-06 01:13:55 +00:00
Shine1i	93063c3212	feat(recipe-studio, validators): extend OXC validator with code shape support and integrate into recipe studio	2026-03-06 02:04:05 +01:00
Shine1i	f118216898	feat(recipe-studio): add inference_timeout configuration and validation logic	2026-03-06 00:51:28 +01:00
Shine1i	cf8cb9109b	feat(recipe-studio): add support for inference_extra_body configuration with collapsible UI and enhanced validation logic	2026-03-05 23:33:48 +01:00
Roland Tannous	cbe2896705	fix: unload inference model before training to free GPU memory When starting training, shut down the inference subprocess first so the training subprocess has full GPU memory available.	2026-03-05 22:28:11 +00:00
imagineer99	5d907b0449	fix: guard recharts ResponsiveContainer behind measured container dimensions	2026-03-05 21:54:27 +00:00
Roland Tannous	31334cece1	fix: indentation error in orchestrator load_model	2026-03-05 19:43:30 +00:00
Shine1i	3cafc0506e	feat(data-recipes, validators): extend OXC validator with linting mode support and integrate new modes into recipe studio	2026-03-05 20:19:31 +01:00
Roland Tannous	5bd6fac80e	fix: always spawn fresh subprocess per model load Reusing a subprocess after unsloth patches torch internals causes inspect.getsource() failures when loading a different model type. Each load now gets a clean Python interpreter.	2026-03-05 19:15:37 +00:00
Roland Tannous	7fc563731a	fix: use mp.Event for instant cross-process generation cancel Replaces cmd_queue-based cancel polling with a shared mp.Event. Fixes two issues: - Loading a new model while generating no longer hangs (cancel is instant) - Subprocess shuts down cleanly after explicit stop generation	2026-03-05 18:54:17 +00:00
Shine1i	552eb06bed	feat(data-recipes, validators): add OXC validator runtime and integration with recipe studio	2026-03-05 19:48:26 +01:00
Roland Tannous	4eabc74f34	feat: subprocess-based inference for transformers version switching Inference now runs in a persistent subprocess, solving the same transformers version-switching problem that was fixed for training. The subprocess stays alive between requests (model in GPU memory) and is only restarted when switching transformers versions. New files: - core/inference/worker.py: subprocess entry point with command loop - core/inference/orchestrator.py: parent-side proxy with same API Modified: - core/inference/__init__.py: exports orchestrator as default backend - routes/inference.py: removed in-process ensure_transformers_version()	2026-03-05 17:47:57 +00:00
Roland Tannous	1e04149ddf	fix: handle None job_id before first training run	2026-03-05 16:59:37 +00:00
Roland Tannous	842c05e75a	fix: lazy imports in core/__init__ to prevent subprocess importing ML libs early	2026-03-05 16:56:45 +00:00
Roland Tannous	9696bd557a	fix: exclude bitsandbytes from module purge to prevent duplicate operator registration	2026-03-05 16:40:20 +00:00
Roland Tannous	e3a1811c79	fix: remove in-process version switching from models routes	2026-03-05 16:22:32 +00:00
Roland Tannous	878f8f3924	fix: remove UnslothTrainer/get_trainer from core __init__ exports	2026-03-05 15:57:07 +00:00
Roland Tannous	f8bd4303f7	feat: subprocess-based training for transformers version switching	2026-03-05 15:40:32 +00:00
Shine1i	b277308b7e	merge: nightly into feature/data-reciper-enchansments	2026-03-05 14:51:08 +01:00
Shine1i	9a5cea201a	feat(recipe-studio): runtime edge handling with template refs and reversed edge support	2026-03-05 14:46:48 +01:00
Shine1i	889b3f78a8	refactor(studio): replace `inputValue` with `searchQuery` for improved clarity, add input reason tracking, and streamline dataset filtering logic	2026-03-05 14:06:17 +01:00
Shine1i	cc74d01df7	feat(recipe-studio): improve tab switch fit logic with animation and delay support	2026-03-05 13:29:38 +01:00
Shine1i	a66b1678e8	feat(recipe-studio): normalize and slugify `run_name`, update job naming logic	2026-03-05 12:25:51 +01:00
Shine1i	e30fc87187	refactor(studio): add local data-recipe dataset selection + training wiring	2026-03-05 12:25:51 +01:00
Shine1i	85d92281f3	feat(data-recipes, recipe-studio): refactor and enhance recipe templates with updated model configurations, structure changes, and added validation logic	2026-03-05 12:14:01 +01:00
Shine1i	bffda3a479	feat(recipe-studio): persist advanced collapsible states across components and sessions	2026-03-05 11:56:40 +01:00
Shine1i	9062755e8f	feat(data-recipes, recipe-studio): recipe changes, image context selector	2026-03-05 11:46:42 +01:00
Shine1i	337cb4de8d	feat(data-recipes): update recipe templates	2026-03-05 11:29:20 +01:00
Manan17	79cc850a50	remove tracked OuteTTS embedded repo reference	2026-03-05 08:44:23 +00:00
Manan17	9909111982	resolved merge conflicts	2026-03-05 07:59:43 +00:00
Manan17	c723f8d4da	fix SNAC training crash on variable-length sequences with DataCollatorForSeq2Seq	2026-03-05 07:04:53 +00:00
Roland Tannous	81b4928e99	Merge nightly into feature/transformers-v5-support	2026-03-05 06:49:44 +00:00
Roland Tannous	4e9c248fa8	Merge pull request #314 from unslothai/fix/vlm-dataset-conversion-error-handling-local Fix VLM training abort on URL-based dataset conversion failure	2026-03-05 10:10:58 +04:00
Roland Tannous	c171573a8f	fix: check for http(s) prefix instead of bare string type for URL detection	2026-03-05 06:10:10 +00:00
Roland Tannous	657cdaa151	fix: remove benchmark scripts from git tracking These are standalone benchmark scripts that were force-added despite being gitignored. They have no test functions and run network calls at module level, which breaks pytest collection in CI.	2026-03-05 06:06:47 +00:00
imagineer99	69299c168c	feat(data-recipes): add OCR learning recipe template	2026-03-05 00:58:29 +00:00
Roland Tannous	8218cb651e	Merge pull request #313 from unslothai/feature/index-range-dataset-slicing Fix: clear dataset slice state on file upload	2026-03-05 03:48:10 +04:00
Roland Tannous	352fe023a5	fix: clear dataset slice state when switching to uploaded file Prevents stale slice values from silently truncating uploaded datasets.	2026-03-04 23:42:23 +00:00
Roland Tannous	9ca45826d4	feat: parallel URL image probe with time estimate and progress reporting - Add 200-sample parallel probe using ThreadPoolExecutor + safe_num_proc to estimate download speed and failure rate before full conversion - Abort with clear error if >=30% of probe images fail to download - Show estimated download time in the training overlay modal - Parallel batch conversion for URL-based datasets (vs sequential for local) - Add warning field to /check-format response for URL-based image datasets - Display URL warning in dataset preview dialog (amber banner) - Thread progress_callback from trainer through format_and_template_dataset to convert_to_vlm_format for real-time status updates	2026-03-04 23:40:38 +00:00
Roland Tannous	195c1a3ce3	test: add parallel download benchmark with ThreadPoolExecutor	2026-03-04 23:29:43 +00:00
Roland Tannous	f59eaad212	feat: add tqdm progress bar to VLM conversion and download benchmark test	2026-03-04 23:29:43 +00:00
Roland Tannous	50885a7aa3	fix: add early probe to fail fast on datasets with too many broken image URLs	2026-03-04 23:29:43 +00:00
Roland Tannous	fdc23f4a43	fix: use fsspec for URL image downloads with per-sample error handling	2026-03-04 23:29:43 +00:00
Roland Tannous	8039eebcd5	test: add URL image loading comparison script	2026-03-04 23:29:43 +00:00
Roland Tannous	2b704221f7	fix: abort training pipeline on dataset conversion failure	2026-03-04 23:29:43 +00:00
Roland Tannous	929c3e9e1e	fix: cast URL image columns to HF Image() type in VLM conversion	2026-03-04 23:29:43 +00:00
Roland Tannous	f55153249d	Merge pull request #312 from unslothai/feature/index-range-dataset-slicing Add index range dataset slicing to Studio training page	2026-03-05 03:25:21 +04:00
Roland Tannous	40f2dc517f	fix: remove unnecessary tooltip copy from train split start	2026-03-04 23:24:09 +00:00
Roland Tannous	8199e0d2c0	refactor: move train split slice controls back to Advanced section Place Train Split Start / End inputs inside the Advanced collapsible with descriptive tooltips clarifying they slice the training split. Revert the selectors component to its original eval-split-only layout.	2026-03-04 23:24:09 +00:00
Roland Tannous	5f0559926c	refactor: move index range fields next to eval split in 3-col grid Place Slice Start and Slice End inputs alongside the Eval Split selector in a single row (grid-cols-3) so the dataset card stays compact. Remove the duplicate controls from the Advanced section.	2026-03-04 23:24:09 +00:00
Roland Tannous	a80188848d	feat: add index range dataset slicing to studio training page Add Start/End index inputs under Advanced in the dataset card, allowing users to slice a dataset by row range before training. Wired end-to-end: frontend store, API payload, backend Pydantic model, and trainer dataset loading (inclusive on both ends).	2026-03-04 23:24:09 +00:00
Roland Tannous	505487f66a	Merge pull request #311 from unslothai/revert-310-feature/index-range-dataset-slicing Revert "Add index range dataset slicing to Studio training page"	2026-03-05 03:23:31 +04:00
Roland Tannous	91783c0fb2	Revert "Add index range dataset slicing to Studio training page"	2026-03-05 03:21:07 +04:00
Roland Tannous	9f9d480e63	Merge pull request #310 from unslothai/feature/index-range-dataset-slicing Add index range dataset slicing to Studio training page	2026-03-05 03:20:31 +04:00
Roland Tannous	42ee6fe443	fix: remove unnecessary tooltip copy from train split start	2026-03-04 23:15:49 +00:00
Roland Tannous	7f8c0867d5	refactor: move train split slice controls back to Advanced section Place Train Split Start / End inputs inside the Advanced collapsible with descriptive tooltips clarifying they slice the training split. Revert the selectors component to its original eval-split-only layout.	2026-03-04 23:07:36 +00:00
Roland Tannous	07bbe7bae5	refactor: move index range fields next to eval split in 3-col grid Place Slice Start and Slice End inputs alongside the Eval Split selector in a single row (grid-cols-3) so the dataset card stays compact. Remove the duplicate controls from the Advanced section.	2026-03-04 22:35:17 +00:00
Roland Tannous	11ebea6a4b	feat: add index range dataset slicing to studio training page Add Start/End index inputs under Advanced in the dataset card, allowing users to slice a dataset by row range before training. Wired end-to-end: frontend store, API payload, backend Pydantic model, and trainer dataset loading (inclusive on both ends).	2026-03-04 21:48:40 +00:00
Roland Tannous	880633e42b	test: add parallel download benchmark with ThreadPoolExecutor	2026-03-04 14:30:11 +00:00
Roland Tannous	e4ec16296e	feat: add tqdm progress bar to VLM conversion and download benchmark test	2026-03-04 13:30:27 +00:00
Manan Shah	950e405c89	Delete studio/TESTING.md	2026-03-04 03:47:19 -07:00
Manan17	a5825f8d44	dynamic detection of audio models and fixing autoencoder issues	2026-03-04 10:44:44 +00:00
imagineer99	6c7d61d70e	fix: prevent browser credential autofill in HF token fields	2026-03-04 08:49:17 +00:00
Roland Tannous	5ee9479e37	fix: add early probe to fail fast on datasets with too many broken image URLs	2026-03-04 08:05:40 +00:00
Roland Tannous	722744cf04	fix: use fsspec for URL image downloads with per-sample error handling	2026-03-04 07:50:55 +00:00
Roland Tannous	6ba669c8eb	test: add URL image loading comparison script	2026-03-04 07:39:35 +00:00
Roland Tannous	645d7d357a	fix: abort training pipeline on dataset conversion failure	2026-03-04 06:42:48 +00:00
Roland Tannous	34fb9ec973	fix: cast URL image columns to HF Image() type in VLM conversion	2026-03-04 06:42:37 +00:00
Roland Tannous	2575b9e37d	Merge pull request #305 from unslothai/fix/dropdown-layout-shift Prevent select dropdowns from shifting layout when opened	2026-03-04 10:19:27 +04:00
Roland Tannous	12a4350a61	Merge pull request #306 from unslothai/fix/hf-dataset-error-message Sanitize dataset script errors and persist training start error	2026-03-04 10:14:26 +04:00
Roland Tannous	43bf599b33	Remove overly broad .py check from dataset error normalization	2026-03-04 06:13:47 +00:00
Roland Tannous	2d7d3cd27e	Merge pull request #287 from unslothai/fix/duplicate-def-inference Deleted duplicate definitions for load_for_eval, load_adapter, and load_model_simple in core Inference	2026-03-04 10:06:04 +04:00
Roland Tannous	46550ecf24	Merge pull request #289 from unslothai/fix/datasets-auth Added auth to dataset endpoints	2026-03-04 08:21:42 +04:00
Shine1i	29299d73b8	merge: nightly into feature/data-reciper-enchansments resolve setup.sh conflict by keeping nightly installer flow and preserving local data-designer plugin install via install_python_stack.py	2026-03-03 22:21:04 +01:00
Shine1i	2473043fe1	feat(recipe-studio): enhance edge synchronization logic with layout direction support	2026-03-03 22:17:50 +01:00
Shine1i	6997919c65	feat(recipe-studio): add support for naming full runs, enhance empty states, and refine UI components	2026-03-03 21:56:37 +01:00
Shine1i	df553fc955	feat(recipe-studio): add authentication to API requests and backend routes	2026-03-03 21:32:39 +01:00
imagineer99	a0f4566173	fix: sanitize dataset script errors and persist training start error	2026-03-03 20:15:23 +00:00
Shine1i	1334b24bea	refactor(recipe-studio): update UI components with consistent styling and improved hierarchy	2026-03-03 21:02:57 +01:00
Roland Tannous	50b88bfb34	Updated README	2026-03-03 18:42:35 +00:00
imagineer99	84bb57f208	fix: prevent select scroll-lock margin from shifting layout	2026-03-03 18:37:46 +00:00
Roland Tannous	4ddf59f781	Merge pull request #296 from unslothai/feature/windows-native-support PR: Windows Native Support + llama.cpp Build Migration	2026-03-03 22:23:35 +04:00
Roland Tannous	7bc235bed2	Merge branch 'nightly' into feature/windows-native-support	2026-03-03 22:23:18 +04:00
Roland Tannous	58b00db5cb	chore: add cross-platform Python installer with updated unsloth patch URLs	2026-03-03 17:31:59 +00:00
Roland Tannous	a4d2853fbc	fix: align llama-server binary discovery with upstream unsloth-zoo paths	2026-03-03 17:03:01 +00:00
Daniel Han	892caf5eb7	Update _utils.py	2026-03-03 08:29:33 -08:00
Daniel Han	a665c9b57d	Also patch accelerate's is_wandb_available for trl callbacks path (#4148) trl/trainer/callbacks.py imports is_wandb_available from accelerate.utils, not from transformers. The original fix in #4147 only patched the transformers version, so `from trl import GRPOTrainer` still crashed via the callbacks.py -> accelerate -> wandb path. Must patch both the source module (accelerate.utils.imports) AND the re-export namespace (accelerate.utils) since Python's `from accelerate.utils import X` reads from the latter, which holds its own cached reference.	2026-03-03 08:28:55 -08:00
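The re-export behaviour that commit a665c9b57d relies on can be reproduced with standard-library modules only. This is a sketch under stated assumptions: the module names `fakepkg` and `fakepkg_imports` and the function `is_tool_available` are hypothetical stand-ins, not the real accelerate modules.

```python
import sys
import types

# Stand-in "source" module that defines the function (hypothetical name).
source = types.ModuleType("fakepkg_imports")
source.is_tool_available = lambda: True

# Stand-in package namespace that re-exports it, as an __init__.py doing
# `from .imports import is_tool_available` would.
namespace = types.ModuleType("fakepkg")
namespace.is_tool_available = source.is_tool_available

sys.modules["fakepkg_imports"] = source
sys.modules["fakepkg"] = namespace


def always_false() -> bool:
    return False


# Patching only the source module leaves the re-exported reference untouched.
source.is_tool_available = always_false
from fakepkg import is_tool_available as after_source_patch
only_source_patched = after_source_patch()

# Patching the re-export namespace as well fixes the `from fakepkg import ...` path.
namespace.is_tool_available = always_false
from fakepkg import is_tool_available as after_both_patches
both_patched = after_both_patches()
```

Here `only_source_patched` is still `True` because the package namespace kept its own reference to the original function, while `both_patched` is `False`.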
Daniel Han	f4da8c3819	Update _utils.py	2026-03-03 07:14:15 -08:00
Daniel Han	cb13a1fe04	Fix broken wandb import crashing unsloth startup (#4147 ) * Fix broken wandb import crashing unsloth startup When wandb is installed but broken (e.g., wandb < 0.19.11 with protobuf >= 6.0), the import chain unsloth -> trl -> transformers -> is_wandb_available() -> import wandb crashes with: ImportError: cannot import name 'Imports' from 'wandb.proto.wandb_telemetry_pb2' This happens because transformers' is_wandb_available() has no try/except around `import wandb`. The error propagates up and kills `from unsloth import FastLanguageModel` even though wandb is optional. Add disable_broken_wandb() following the same pattern as disable_torchcodec_if_broken(). It proactively tries importing wandb during early init, and if the import fails, patches is_wandb_available() to return False and sets WANDB_DISABLED=true. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-03 07:08:12 -08:00
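A rough sketch of the "probe the optional import early, disable it if broken" pattern described in commit cb13a1fe04. The helper name `disable_if_broken` and its signature are hypothetical; the real `disable_broken_wandb()` also patches `is_wandb_available()`, which this sketch does not attempt. Only the `WANDB_DISABLED=true` flag is taken from the commit message.

```python
import importlib
import os
from typing import MutableMapping


def disable_if_broken(
    module_name: str,
    env_flag: str,
    env: MutableMapping[str, str] = os.environ,
) -> bool:
    """Try importing an optional module; on any import-time failure set env_flag.

    Returns True when the module imported cleanly (hypothetical helper).
    """
    try:
        importlib.import_module(module_name)
        return True
    except Exception:
        # A broken optional dependency should not take the whole package down.
        env[env_flag] = "true"
        return False


# Use a private dict instead of os.environ so the demo has no side effects.
demo_env: dict = {}
json_ok = disable_if_broken("json", "JSON_DISABLED", demo_env)
missing_ok = disable_if_broken("no_such_module_for_this_sketch", "WANDB_DISABLED", demo_env)
```

Catching `Exception` rather than only `ImportError` is a design assumption here, chosen because a half-installed package can fail in other ways at import time.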
Datta Nimmaturi	f840119fa4	Fixup mapper issues and resolve properly (#4124) * Fixup mapper issues and resolve properly * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-03 06:57:25 -08:00
Daniel Han	e238fd14aa	Update __init__.py	2026-03-03 06:55:08 -08:00
Daniel Han	9b4a216b57	Update	2026-03-03 06:53:58 -08:00
Mustafa Eyceoz	6762a380e3	Fix multi-node distributed training with single GPU per node (#4143)	2026-03-03 20:15:41 +05:30
Roland Tannous	b1b9262198	fix: update GGUF save paths to use ~/.unsloth/llama.cpp with Windows support (#4138 ) * fix: update GGUF save paths to use ~/.unsloth/llama.cpp with Windows support * fix: quote LLAMA_CPP_DEFAULT_DIR in fallback shell commands to handle paths with spaces * refactor: deduplicate platform-specific build instructions in quantization error message * chore: remove accidentally committed PR description file * Fix import safety and f-string bugs in save.py - H4: Add defensive try/except for LLAMA_CPP_DEFAULT_DIR and IS_WINDOWS imports with fallback defaults, so save.py works even if zoo PR #526 is not merged yet - H5: Fix Kaggle error path using plain "Error: {e}" instead of f"Error: {e}", so the actual exception is shown to users * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-03 06:34:09 -08:00
Lei Zhenyuan	6d42e0a7c8	add intel support for torch210 within pyproject.toml (#4144) * add intel support for torch210 * fix for typo	2026-03-03 06:33:45 -08:00
Datta Nimmaturi	b7ec64c96f	[Fix] lm_head lora save (#4106 ) * Fix lm_head lora save * Fix _need_to_train_embeddings guard for lm_head LoRA targets When lm_head is already in final_modules as a LoRA target, the _need_to_train_embeddings block should not also add it to modules_to_save. This prevents dual-wrapping (LoRA + modules_to_save on the same module) which causes assertion failures downstream. Check if embed_tokens/lm_head are already being trained as LoRA targets before adding them to modules_to_save. Also prevents duplicate entries with elif guards. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-03 06:30:13 -08:00
金黄色葡萄球君君	1ebf994da1	fix(ROCm): restrict is_rdna() to ROCm-officially-supported RDNA GPUs (#4136 ) Current arch.startswith("gfx1") incorrectly matches: - RDNA1 (gfx10xx) and RDNA2 (gfx103x): not ROCm supported - gfx1102 (RX 7600), gfx1103 (Phoenix APU): not in ROCm support matrix - gfx1150/1151/1152 (RDNA3.5 APUs): not in ROCm support matrix Replace with explicit whitelist aligned to the ROCm Linux support matrix: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html gfx1100 - RDNA3 discrete (RX 7900 series, PRO W7900/W7800) gfx1101 - RDNA3 discrete (RX 7800/7700 series, PRO W7700) gfx1200 - RDNA4 discrete (RX 9060 series) gfx1201 - RDNA4 discrete (RX 9070 series, AI PRO R9700) Mirrors the existing is_cdna() pattern. Avoids silently applying unverified Triton kernel tuning to unsupported hardware.	2026-03-03 03:05:38 -08:00
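The difference between the old prefix match and the explicit list in commit 1ebf994da1 can be illustrated as follows. This is a sketch only: the function names and the plain-string input are hypothetical, and the real `is_rdna()` reads the architecture from the device rather than taking it as an argument. The four architecture names are the ones listed in the commit message.

```python
# Architectures named in the commit message as ROCm-supported RDNA parts.
SUPPORTED_RDNA_ARCHS = frozenset({"gfx1100", "gfx1101", "gfx1200", "gfx1201"})


def is_rdna_prefix(arch: str) -> bool:
    """Old behaviour: any gfx1xxx name counts as RDNA."""
    return arch.startswith("gfx1")


def is_rdna_allowlist(arch: str) -> bool:
    """New behaviour: only explicitly listed architectures count."""
    return arch in SUPPORTED_RDNA_ARCHS


# gfx1030 (RDNA2) and gfx1151 (RDNA3.5 APU) pass the prefix test but not the list.
comparison = {
    arch: (is_rdna_prefix(arch), is_rdna_allowlist(arch))
    for arch in ("gfx1030", "gfx1100", "gfx1151", "gfx1201", "gfx942")
}
```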
金黄色葡萄球君君	5e781900fb	Revert "perf(ROCm): optimize chunked CE loss num_warps for RDNA GPUs (#4123)" (#4139) This reverts commit `721bf4852a`.	2026-03-03 03:05:32 -08:00
Shine1i	0166cb6d38	feat(recipe-studio): add HF repo ID inference and reset logic for HF state	2026-03-03 11:37:49 +01:00
Shine1i	c88cce8185	refactor(seed): package unstructured seed reader as local Data Designer plugin	2026-03-03 11:22:04 +01:00
Shine1i	95dd202ab3	merge nightly into feature/data-reciper-enchansments	2026-03-03 11:14:18 +01:00
Shine1i	bdc825298d	feat(seed): backend unstructured seed reader + server-side chunking, remove client chunk splitter	2026-03-03 11:11:26 +01:00
Roland Tannous	bded396923	Merge pull request #297 from unslothai/fix/fix-pip-issues fix: make pip check non-fatal and install jedi for Colab compatibility	2026-03-03 13:35:29 +04:00
Manan17	f04c684d8a	variable changes and some cleanup	2026-03-03 09:35:11 +00:00
Roland Tannous	f190d5a16d	fix: make pip check non-fatal and install jedi for Colab compatibility	2026-03-03 09:34:35 +00:00
Shine1i	7d35463abc	feat(recipe-studio): add execution progress island and collapsible advanced options for validators	2026-03-03 10:34:32 +01:00
Michael Han	59f7a9006a	Qwen3.5 Update.md Updated with Qwen3.5 Small models	2026-03-02 23:33:22 -08:00
pre-commit-ci[bot]	2089c158a7	[pre-commit.ci] pre-commit autoupdate (#4141) updates: - [github.com/astral-sh/ruff-pre-commit: v0.15.2 → v0.15.4](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.2...v0.15.4) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-03-02 21:48:36 -08:00
Etherll	65f212b940	Add Qwen 3.5 to FORCE_FLOAT32 (#4134) * Add Qwen3.5 to FORCE_FLOAT32 * fix vision encoder dtype mismatch * revert vision cast changes	2026-03-02 13:36:28 -06:00
Roland Tannous	87f2b2a9db	Merge branch 'nightly' into feature/support-for-audio-models	2026-03-02 15:55:25 +04:00
Roland Tannous	c64e50b46f	Patch unsloth-zoo llama_cpp.py and unsloth save.py from windows-support branch	2026-03-02 10:45:09 +00:00
Roland Tannous	e280e457d1	Move llama.cpp clone/build from in-tree to ~/.unsloth/llama.cpp - setup.sh: builds at ~/.unsloth/llama.cpp instead of ./llama.cpp - setup.ps1: builds at %USERPROFILE%/.unsloth/llama.cpp - inference llama_cpp.py: searches ~/.unsloth/ first, in-tree as legacy - export.py: updated comments (unsloth-zoo handles path natively)	2026-03-02 04:04:41 +00:00
DoubleMathew	a835b266ef	Fix auto padding free logic to respect user passed False (#4128) * Fix auto padding free logic to respect user passed * Update unsloth/trainer.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-01 19:30:47 -08:00
Wasim Yousef Said	dc7976c534	Merge pull request #294 from unslothai/fix/navbar-center-tabs-shift Prevent navbar tab shift when navigating across pages	2026-03-01 15:42:19 +01:00
Wasim Yousef Said	2df2d671ce	Merge pull request #290 from unslothai/fix/model-dropdown-visual-consistency Standard OOM/TIGHT model status indicators across model dropdowns	2026-03-01 15:41:39 +01:00
Roland Tannous	674cc67d78	Tighten Python bounds to >= 3.11, < 3.14 (matching setup.sh), only auto-install if missing	2026-03-01 13:05:10 +00:00
Roland Tannous	d5644d2d0d	Add Python 3.12 prerequisite check with auto-install via winget	2026-03-01 13:05:10 +00:00
Roland Tannous	0267ba0a18	Auto-enable Windows Long Paths via UAC elevation during setup	2026-03-01 13:05:10 +00:00
Roland Tannous	6536bfb33b	Remove unused CMP0194 cmake policy (eliminates cmake warning)	2026-03-01 13:05:10 +00:00
Roland Tannous	453f423d22	Simplify: use winget OpenSSL.Dev instead of vcpkg for HTTPS support	2026-03-01 13:05:10 +00:00
Roland Tannous	2e102b683e	Add vcpkg/curl[ssl] for HTTPS support in llama-server, enable LLAMA_CURL=ON	2026-03-01 13:05:10 +00:00
Roland Tannous	6e5a3d1744	Download GGUF via huggingface_hub instead of llama-server -hf (fixes HTTPS not supported on Windows)	2026-03-01 13:05:10 +00:00
Roland Tannous	9eb0ff074b	Add .venv/Scripts to User PATH so unsloth-studio works without activation	2026-03-01 13:05:10 +00:00
Roland Tannous	8e22b16bd8	Simplify completion banner: no venv activation needed	2026-03-01 13:05:10 +00:00
Roland Tannous	12867f701b	Auto-add CUDA DLLs to PATH when launching llama-server on Windows	2026-03-01 13:05:10 +00:00
Roland Tannous	d79fe439ed	Warn user to uninstall incompatible CUDA toolkit instead of failed side-by-side	2026-03-01 13:05:10 +00:00
Roland Tannous	b22c5b6ed8	Fallback: try descending CUDA versions if exact driver-max install fails	2026-03-01 13:05:10 +00:00
Roland Tannous	7b4d074857	Always persist compatible CUDA_PATH to User registry (overwrite stale values)	2026-03-01 13:05:10 +00:00
Roland Tannous	7576552717	Fix: scan side-by-side CUDA installs, pick compatible toolkit version	2026-03-01 13:05:10 +00:00
Roland Tannous	3521de7040	Build llama.cpp in-tree, auto-detect driver CUDA version for compatible toolkit	2026-03-01 13:05:10 +00:00
Roland Tannous	8d272ff8d5	Auto-detect driver CUDA version, install compatible toolkit instead of latest	2026-03-01 13:05:10 +00:00
Roland Tannous	bccbd26f3a	Fix non-ASCII chars in test script for Windows PS 5.1	2026-03-01 13:05:10 +00:00
Roland Tannous	1684e48b1e	Add llama-cpp Windows test script, fix binary lookup paths	2026-03-01 13:05:10 +00:00
Roland Tannous	f036a70681	Fix llama-server binary lookup for Windows (.exe, Release dir, ~/.unsloth)	2026-03-01 13:05:10 +00:00
Roland Tannous	7e021886c8	Force num_proc=1 on Windows to avoid slow spawn overhead	2026-03-01 13:05:10 +00:00
Roland Tannous	bd7c17708b	Set short TORCHINDUCTOR_CACHE_DIR to fix Windows MAX_PATH crash	2026-03-01 13:05:10 +00:00
Roland Tannous	e1cc5e61b1	Fix npm Invalid Version: delete package-lock.json, relax Node constraint	2026-03-01 13:05:10 +00:00
Roland Tannous	aba3d8e29b	Enforce Node LTS (v20-v22), add npm error checking, clean node_modules	2026-03-01 13:05:10 +00:00
Roland Tannous	2dfe0abaa1	Fix npm stderr crash on Windows ErrorActionPreference	2026-03-01 13:05:10 +00:00
Roland Tannous	662a1eb9d5	Fix Windows frontend build, add setup.bat, ANSI colors, aliases	2026-03-01 13:05:10 +00:00
Roland Tannous	783f0caf5f	add setup.bat	2026-03-01 13:05:10 +00:00
Roland Tannous	28bac1859a	Extract shared install_python_stack.py for cross-platform setup	2026-03-01 13:05:10 +00:00
Roland Tannous	4aba18375b	Merge pull request #295 from unslothai/fix/fix-local-vision-gguf-loading fix: support mmproj for local vision GGUF models + fix Windows pipe d…	2026-03-01 17:03:24 +04:00
Roland Tannous	ff93c97024	fix: support mmproj for local vision GGUF models + fix Windows pipe deadlock	2026-03-01 12:58:38 +00:00
Shine1i	761c84f92b	feat(recipe-studio): introduce validator blocks for code validation with Python and SQL engines	2026-03-01 13:01:00 +01:00
Shine1i	891739a56a	feat(recipe-studio): add LLM trace modes and reasoning content extraction support	2026-03-01 12:01:48 +01:00
Shine1i	b7ee065ffd	refactor(recipe-studio): add image preview support for dataset and LLM configurations p2	2026-03-01 11:21:10 +01:00
Shine1i	c3c65cded8	feat(recipe-studio): add image preview support for dataset and LLM configurations p1	2026-03-01 10:57:51 +01:00
Shine1i	4d718db5a0	feat(recipe-studio): auto-fit editor viewport on tab switch, track manual viewport adjustments	2026-03-01 10:21:02 +01:00
Daniel Han	54119f2060	rl: guard warnings_issued before TRL estimate_tokens write (#4034) Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-03-01 00:42:37 -08:00
金黄色葡萄球君君	afd5b687aa	Fix global dequantize buffer dtype mismatch across mixed-precision loads (#4026) Fix global dequantize buffer dtype mismatch when loading multiple 4-bit models with different dtypes in the same process. Adds dtype check alongside existing None check for WEIGHT_BUFFER in both CUDA/HIP and XPU paths.	2026-03-01 00:15:47 -08:00
Manan17	c636fd5a42	code cleanup	2026-03-01 08:04:38 +00:00
金黄色葡萄球君君	721bf4852a	perf(ROCm): optimize chunked CE loss num_warps for RDNA GPUs (#4123 ) Use 16 warps for RDNA in the chunked cross-entropy forward kernel (large vocab > 65536), matching the existing CDNA optimization. Benchmarked on W7900 (gfx1100) with actual unsloth kernels (5 trials, median): - Chunked CE forward (BS=65536): 16 warps = 2.4-2.6x faster than 32 - All other kernels (LayerNorm, RoPE, SwiGLU): default heuristic is already optimal for RDNA; no modification needed. Depends on: #4109 (provides is_rdna() detection)	2026-02-28 23:59:34 -08:00
金黄色葡萄球君君	48e8f78042	fix(ROCm): prevent false TMA support detection on AMD GPUs (#4126 ) TMA (Tensor Memory Accelerator) is an NVIDIA Hopper+ feature that does not exist on AMD GPUs. However, _check_tma_support() incorrectly returns True on ROCm because: 1. torch.cuda.get_device_capability() returns (11, 0) for gfx1100, satisfying the >= 9 check intended for Hopper (sm_90). 2. ROCm Triton exports tl.make_tensor_descriptor (the symbol exists even though the hardware does not support TMA). This would cause MoE grouped_gemm to attempt TMA operations on AMD GPUs, leading to runtime failures. Fix: early-return False for HIP devices, matching the existing XPU guard.	2026-02-28 23:59:27 -08:00
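The false positive described in commit 48e8f78042 comes from two checks that both happen to pass on ROCm. A simplified sketch, with the device type, capability tuple and symbol check passed in as plain arguments: the function name, parameter names and the device-type strings are hypothetical, and the real `_check_tma_support()` queries torch and Triton directly.

```python
from typing import Tuple


def check_tma_support(
    device_type: str,
    capability: Tuple[int, int],
    has_descriptor_api: bool,
) -> bool:
    """Return True only where TMA can actually be used (illustrative sketch)."""
    # Early return for backends that never have TMA hardware, regardless of
    # what the capability number or the Triton symbol table suggest.
    if device_type in ("hip", "xpu"):
        return False
    # Check intended for NVIDIA Hopper (sm_90) and newer.
    return capability[0] >= 9 and has_descriptor_api


# gfx1100 reports capability (11, 0) and the descriptor symbol exists on ROCm
# Triton, so without the early return this call would come back True.
rocm_result = check_tma_support("hip", (11, 0), True)
hopper_result = check_tma_support("cuda", (9, 0), True)
ampere_result = check_tma_support("cuda", (8, 0), True)
```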
金黄色葡萄球君君	17795e4f14	fix(Triton): ensure float32 eps in RMS LayerNorm rsqrt for HIP/ROCm (#4110 ) * fix(Triton): ensure float32 eps in RMS LayerNorm rsqrt for HIP/ROCm On HIP (AMD ROCm), Triton constexpr eps may not promote to float32 in rsqrt, causing numerical instability (NaN/Inf) on RDNA GPUs (gfx1100, gfx1151 Strix Halo, etc.). Use tl.full((), eps, tl.float32) to explicitly create a float32 scalar before adding to row_var in rsqrt. Applied to both standard and Gemma RMS LayerNorm forward kernels. Tested on W7900 (gfx1100): full test suite passed (dim 512-2048, bf16/fp16, various seqlen). Related: #3385, #3588 * Apply same float32 eps fix to layernorm.py for PR #4110 layernorm.py has the identical tl.constexpr eps pattern in layernorm_forward that can misfire on HIP/ROCm. Apply the same tl.full((), eps, tl.float32) fix for consistency. Both testing_suite_layernorm (standard LayerNorm) and testing_suite_layernorm (RMS LayerNorm) pass on NVIDIA after this change. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-02-28 23:59:22 -08:00
金黄色葡萄球君君	8a8dcd48dd	fix(ROCm): Comprehensive RDNA GPU support - fix Gemma3 NaN & add is_rdna() (#4109 ) * fix(ROCm): comprehensive RDNA GPU support - fix Gemma3 NaN & add is_rdna() - Add is_rdna() detection for RDNA3/3.5/RDNA4 consumer GPUs (gfx11xx, gfx1151, gfx12xx) - Disable torch.compile for Gemma3 on HIP to fix NaN loss (fixes #3385, #4029) - Export is_cdna/is_rdna from kernels for downstream use - Import is_rdna into cross_entropy_loss for future RDNA-specific tuning Tested on AMD Radeon PRO W7900 (gfx1100) with ROCm 7.1: ✓ Gemma3-1B: loss 3.37→3.25 (no NaN) ✓ Llama-3.2-1B: loss 2.44→2.37 (no NaN) ✓ Qwen2.5-1.5B: loss 1.89→1.85 (no NaN) ✓ RMS LayerNorm Triton kernel: bf16/fp16 PASSED ✓ Cross Entropy Loss Triton kernel: 32K/256K vocab PASSED * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: scope compile disable to RDNA only, use partial mode, remove unused import Changes based on Daniel's review: 1. (HIGH) Replace DEVICE_TYPE=='hip' with is_rdna() to avoid disabling torch.compile on CDNA GPUs (MI250X/MI300X/MI350) where it works fine 2. (MEDIUM) Use 'partial' instead of '1' for UNSLOTH_COMPILE_DISABLE to only disable model forward compilation while keeping loss compilation, matching the existing Sesame pattern 3. (LOW) Remove unused is_rdna import from cross_entropy_loss.py (F401) * Remove redundant is_cdna/is_rdna exports from kernels/__init__.py These functions are imported directly from .utils where needed (e.g. cross_entropy_loss.py, loader.py). No external code imports them from the unsloth.kernels namespace. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-28 23:59:17 -08:00
金黄色葡萄球君君	a3a1c3457f	fix(ROCm): remove fix_rocm_triton_key_error — based on a false premise (#4125 ) The function (introduced in #3923) assumed that the absence of `triton.runtime.triton_key` on ROCm means torch.compile will crash. Investigation shows this is incorrect: 1. `triton.runtime.triton_key` was renamed/removed in the ROCm Triton fork — it does not exist at that path. However, `triton.compiler.compiler.triton_key` (the path torch._inductor actually imports) EXISTS and works correctly on ROCm. 2. Both call-sites in torch._inductor (codecache.py and async_compile.py) already wrap the import in try/except, so even a genuinely missing triton_key would be handled gracefully. 3. Comprehensive testing on ROCm 7.1 + Triton 3.4.0 + gfx1100 confirms torch.compile works correctly for matmul, cross-entropy, RMSNorm, multi-layer transformer forward+backward, and LoRA — all without triton.runtime.triton_key. The original code was also ineffective (environment variables set after torch import have no effect on torch._dynamo config), so removing it has zero behavioral change on existing installations. Supersedes the compile-disable portion of #3923.	2026-02-28 23:59:12 -08:00
imagineer99	471fc8fd90	fix: prevent navbar tab shift when navigating across pages	2026-03-01 03:02:49 +00:00
Manan17	c48437848d	revamping up the code and adding inference	2026-03-01 02:30:31 +00:00
Manan17	ab2ac39017	Changes with audio training	2026-03-01 02:27:45 +00:00
Manan17	ac27edde35	merging with nightly	2026-03-01 02:27:45 +00:00
imagineer99	42f5ba5fcc	fix: standardize OOM/TIGHT model status indicators across model dropdowns	2026-03-01 00:02:15 +00:00
samit	ece51dca13	updated fetch to auth fetch in the frontend	2026-02-28 02:10:12 -08:00
samit	d07397c81e	added auth to dataset endpoints	2026-02-28 01:17:43 -08:00
Shine1i	ce33d673b9	feat(markdown): fix Mermaid integration with error handling and copy button	2026-02-27 20:02:40 +01:00
samit	862b4100d2	deleted duplicate definitions	2026-02-27 06:00:28 -08:00
Roland Tannous	89d0a98192	Merge pull request #284 from unslothai/fix/drop-model-task-hard-filter Apply HF task filtering only for empty model queries	2026-02-27 15:46:17 +04:00
imagineer99	49c319c2f7	fix: only apply HF task filter for empty model search queries	2026-02-27 11:31:53 +00:00
Wasim Yousef Said	ad7c1ceb40	Merge pull request #268 from unslothai/fix/delete-custom-config Updated the delete custom preset in chat tab (filters)	2026-02-27 01:38:01 -08:00
Shine1i	6d74943dea	keep checkpoint on preset apply	2026-02-27 10:36:07 +01:00
Wasim Yousef Said	7c364b594c	Merge pull request #269 from unslothai/fix/fine-tuned-model-tooltip Added tooltip for the fine tuned models in chat page	2026-02-27 01:34:49 -08:00
Wasim Yousef Said	5cd3d85882	Merge pull request #262 from unslothai/fix/truncated-text Reduced name truncation on the training page	2026-02-27 01:33:05 -08:00
Shine1i	3c728f5eb3	merge nightly	2026-02-27 10:31:37 +01:00
Roland Tannous	6e12e2536d	Merge pull request #267 from unslothai/fix/config-switch Preserving model name during configuration type switch in chat page	2026-02-27 13:23:29 +04:00
Roland Tannous	5be7ae925d	Merge pull request #266 from unslothai/fix/dataset-search-remove-size-download-badges Remove dataset metadata badges from HF dataset dropdowns	2026-02-27 13:22:49 +04:00
Roland Tannous	cef36ee8e8	Merge pull request #282 from unslothai/fix/inference-auth Added auth to inference endpoints	2026-02-27 13:18:38 +04:00
Roland Tannous	32d68b99e3	Merge pull request #278 from unslothai/fix/reorder-model-type-cards-onboarding Reorder model type cards in onboarding to show Text first	2026-02-27 13:17:00 +04:00
Roland Tannous	90b06063d6	Merge pull request #275 from unslothai/fix/show-size-gguf Passes metadata to get model size	2026-02-27 13:15:38 +04:00
Manan17	168957a87a	Aggregating sharded models, showing fit/oom for quantizations	2026-02-27 08:23:15 +00:00
samit	b18a14d369	added auth to inference endpoints	2026-02-27 00:20:36 -08:00
Daniel Han	9248091500	Fix Whisper auto_model mapping fallback for concrete model classes (#4115)	2026-02-26 23:43:57 -08:00
Manan17	2fea4cadd3	Passes metadata to get model size	2026-02-27 07:38:55 +00:00
Roland Tannous	24c931c374	Merge pull request #276 from unslothai/fix/rebuild-llamacpp-setup rebuild llama cpp for setup	2026-02-27 10:20:32 +04:00
imagineer99	4aecfb80d4	fix: reorder model type cards in onboarding to show Text first	2026-02-27 06:16:26 +00:00
Manan17	cd7bdf4224	rebuild llama cpp for setup	2026-02-27 04:35:52 +00:00
Daniel Han	d9089de0f7	Guard Gemma3N variants from flex attention defaults (#4116)	2026-02-26 17:48:38 -08:00
Daniel Han	7c68ec439f	Update README.md (#4119)	2026-02-26 09:18:29 -08:00
Daniel Han	618ac74ae0	Update README.md (#4118)	2026-02-26 08:06:21 -08:00
Shine1i	bf594c89de	refactor(recipe-studio): split page logic into graph/runtime hooks + floating run controls	2026-02-26 16:42:35 +01:00
Roland Tannous	9a35f26307	Merge branch 'nightly'	2026-02-26 19:38:49 +04:00
Shine1i	4c97591e4c	fix(recipe-studio): prevent stale empty fitView from offsetting first block zoom	2026-02-26 16:23:27 +01:00
Shine1i	b7edf4e3cd	refactor(recipe-studio): simplify runtime graph flow + guard stale active execution lock p2	2026-02-26 15:37:48 +01:00
Shine1i	8a996afbfb	feat(recipe-studio): add live execution graph state (active flows, node status, editor lock) p1	2026-02-26 15:27:46 +01:00
Shine1i	7ed6ad1e0c	fix(recipe-studio): support user.* refs validation + toggle user badge details; style user refs/node amber	2026-02-26 14:27:36 +01:00
Wasim Yousef Said	77502494d5	Merge pull request #272 from unslothai/feature/data-reciper-enchansments feat(recipe-studio): UX + layout polish & WIP data-reciper client & backend finalization p1	2026-02-26 05:10:37 -08:00
Shine1i	00a869f837	refactor(data-recipe): centralize json+stage constants, tighten parser/errors, sync seed ui	2026-02-26 14:06:53 +01:00
Shine1i	e4b64f3cd5	refactor(data-recipe): split recipe backend routes for readability (seed/validate/jobs)	2026-02-26 14:05:32 +01:00
Shine1i	aaf62095fe	feat(recipe-studio): add jinja ref validation UI for llm/expression fields	2026-02-26 13:35:07 +01:00
Shine1i	1e3d50f876	feat(recipe-studio): add block sheet search + clearer icons in sheet	2026-02-26 12:55:09 +01:00
Shine1i	81a7e38aa6	chore: simplify recipe drag payload parsing	2026-02-26 12:48:04 +01:00
Shine1i	9ee7633bc1	feat(recipe-studio): add sidebar drag-drop block creation + spawn sheet added blocks at viewport center	2026-02-26 12:45:38 +01:00
Shine1i	04d6f5e67b	feat(recipe-studio): sanitize shared seed payload + add inline seed UX with HF search	2026-02-26 12:23:05 +01:00
Shine1i	75857cdcea	feat(recipe-studio): polish import/llm editors, refs preview, copy toast, and note layout behavior	2026-02-26 11:44:57 +01:00
Shine1i	a53d8cb272	fix(recipe-studio): preserve note positions during auto-layout and fit workflow only	2026-02-26 11:30:24 +01:00
Shine1i	d1047646a9	feat(recipe-studio): optimize model infra auto-layout handles and centering	2026-02-26 11:07:07 +01:00
Shine1i	dd3e1e7293	refactor(recipe-studio): simplify aux node graph logic and remove dead handle/sync code	2026-02-26 10:43:26 +01:00
Shine1i	48f7d40e87	feat(ui): normalize recipe dialogs + chip/category UX polish	2026-02-26 10:13:01 +01:00
Shine1i	1de4b73244	fix: recipe studio dialog combobox click-select + simplify model provider form	2026-02-26 10:03:47 +01:00
Roland Tannous	cbbccefcdf	Merge pull request #271 from unslothai/fix/setup-python-version-bounds fix(setup): enforce Python >= 3.11 and < 3.14 version bounds	2026-02-26 12:13:32 +04:00
Michael Han	e8ae589e84	Qwen3.5 update.md	2026-02-25 23:56:48 -08:00
Roland Tannous	e81516320d	fix(setup): restrict Python to >=3.11 and <3.14 Adds lower bound (>= 3.11) and tightens upper bound (< 3.14) for Python version discovery in setup.sh. Extracts bounds into MIN_PY_MINOR / MAX_PY_MINOR variables for easy future updates.	2026-02-26 11:54:16 +04:00
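The version window enforced by commit e81516320d (>= 3.11 and < 3.14) amounts to a half-open range check on the minor version. The actual check lives in setup.sh; the Python sketch below is only an illustration. `MIN_PY_MINOR` is named in the commit message, while `MAX_PY_MINOR_EXCLUSIVE` and `is_supported_python` are hypothetical names, and how setup.sh encodes its upper bound is not shown in the log.

```python
import sys

MIN_PY_MINOR = 11            # lowest accepted minor version (3.11)
MAX_PY_MINOR_EXCLUSIVE = 14  # first rejected minor version (3.14); hypothetical name


def is_supported_python(major: int, minor: int) -> bool:
    """Accept Python 3.11 up to, but not including, 3.14."""
    return major == 3 and MIN_PY_MINOR <= minor < MAX_PY_MINOR_EXCLUSIVE


# Check the interpreter running this sketch.
current_ok = is_supported_python(sys.version_info.major, sys.version_info.minor)
```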
Roland Tannous	28e0218263	Merge pull request #270 from unslothai/fix/gguf-export-relocation Fix GGUF exports saving to wrong directory and missing from chat model selector	2026-02-26 11:48:15 +04:00
Roland Tannous	ed18f9b9dd	Flatten GGUF subdirs in export and fix metadata lookup in scanner	2026-02-26 11:35:04 +04:00
Roland Tannous	90f012a444	Write export metadata for GGUF exports to fix Unknown base model	2026-02-26 11:24:32 +04:00
Roland Tannous	2ce63f09c4	Add gguf to toLoraSummary inline type	2026-02-26 10:59:39 +04:00
Roland Tannous	1b822a943c	Revert "Add gguf to frontend export_type unions" This reverts commit `782af39949`.	2026-02-26 10:56:26 +04:00
Roland Tannous	782af39949	Add gguf to frontend export_type unions	2026-02-26 10:54:49 +04:00
Roland Tannous	609ae4809a	Merge pull request #229 from unslothai/feat/dataset-list-sorting Feat: Sort and filter dataset search results by model type relevance	2026-02-26 10:40:02 +04:00
Roland Tannous	ea9b22000e	Merge pull request #245 from unslothai/fix/datetime-utc-python39-compatibility fix: replace datetime.UTC with timezone.utc for Python 3.9+ compatibility	2026-02-26 10:37:01 +04:00
imagineer99	852dff564e	feat: added datasets of size 5M and 10M to pretraining size category	2026-02-26 06:32:45 +00:00
imagineer99	6e535ed0eb	fix: filter OCR datasets from non-vision hub results	2026-02-26 06:27:52 +00:00
Roland Tannous	e0127c0d4c	Merge branch 'nightly'	2026-02-26 10:25:41 +04:00
Roland Tannous	808f4655c9	Merge pull request #243 from unslothai/fix/setup-unbound-variable resolved unbound variable error	2026-02-26 10:18:21 +04:00
samit	04aee4a4c6	updated to make the delete preset work	2026-02-25 21:36:50 -08:00
imagineer99	1c55e2fbaa	fix: remove dataset metadata badges from HF dataset dropdowns	2026-02-26 03:57:55 +00:00
samit	7bc752d5f9	passed checkpoint as a parameter to presets	2026-02-25 18:08:02 -08:00
Wasim Yousef Said	2f84bbf0e0	Merge pull request #264 from unslothai/fix/attachment-tsx-type-error fix(attachment): replace never exhaustive check to fix Colab TS2322 b…	2026-02-25 17:19:44 -08:00
Leo Borcherding	62c6fd9f46	fix(attachment): replace never exhaustive check to fix Colab TS2322 build error `attachment.type` resolves to `string & {}` via @assistant-ui/store@0.1.6's generic type chain when installed through npm (package-lock.json), breaking the `const _exhaustiveCheck: never = type` exhaustive check pattern. Replace with a direct throw that compiles cleanly across library versions while preserving identical runtime behaviour. Fixes #263 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 17:56:02 -06:00
Daniel Han	3fc6cfd32d	Fix transformers v5 RoPE inv_freq corruption and generate() BatchEncoding compat (#4112 ) * Fix transformers v5 RoPE inv_freq corruption during model loading Transformers v5 initializes models on the meta device, then _move_missing_keys_from_meta_to_device() replaces all non-persistent buffers with torch.empty_like() (uninitialized memory). Vanilla transformers restores inv_freq via _init_weights() checking for original_inv_freq, but Unsloth's LlamaRotaryEmbedding subclasses lack this attribute, so inv_freq stays corrupted with garbage values. This caused 5-11x higher training loss on transformers v5 for all models using Unsloth's rope (Llama 3.x, Qwen3, Mistral, TinyLlama, Granite). Models using native transformers rope (Gemma, Phi-4, Falcon-H1) were unaffected. The fix recomputes inv_freq from the stored base/dim after model loading, applies model-specific scaling via _apply_inv_freq_scaling(), and rebuilds cos/sin caches. Also handles LongRopeRotaryEmbedding (Phi-3.5 style short/long inv_freq). Guarded by transformers >= 5.0.0 so it is a no-op on v4. Tested on: Llama 3.1 8B, Llama 3.2 3B, Qwen3 14B, Qwen3 4B, Phi-4, TinyLlama, Mistral 7B, Gemma2 2B, Falcon-H1 -- all v5 losses now match v4 baselines to < 0.004 absolute difference. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Unpack BatchEncoding in generate() for v4/v5 backwards compatibility Old notebooks pass the full tokenizer output as input_ids: inputs = tokenizer(..., return_tensors="pt").to("cuda") model.generate(input_ids=inputs, ...) This worked on transformers v4 because generate() internally extracted the tensor. Transformers v5 calls .shape on input_ids directly, which crashes since BatchEncoding has no .shape attribute. Fix: in unsloth_fast_generate(), detect when input_ids is a dict-like object (BatchEncoding) and unpack its contents into separate kwargs before forwarding to the underlying generate(). 
This makes both old and new notebook patterns work on both v4 and v5. * Remove redundant seen_ids dedup in _fix_rope_inv_freq named_modules() already deduplicates with remove_duplicate=True (default). Also clarify that native v5 rotary classes (Gemma3 etc.) have original_inv_freq which transformers v5's _init_weights() uses to restore inv_freq, so they do not need this fix. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-25 08:18:45 -08:00
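The BatchEncoding unpacking described in the commit above can be sketched in plain Python. This is a minimal illustration, not the code from `unsloth_fast_generate()`: the helper name `unpack_generate_inputs` is hypothetical, and it assumes the tokenizer output behaves like a mapping (a plain dict stands in for it here).

```python
from collections.abc import Mapping


def unpack_generate_inputs(input_ids=None, **kwargs):
    """Return the kwargs that would be forwarded to the underlying generate().

    Hypothetical helper: if `input_ids` is dict-like (for example a full
    tokenizer output), its entries are spread into separate keyword arguments.
    """
    if isinstance(input_ids, Mapping):
        merged = dict(input_ids)
        # Keyword arguments passed explicitly by the caller take precedence.
        merged.update(kwargs)
        return merged
    if input_ids is not None:
        kwargs["input_ids"] = input_ids
    return kwargs


# Old notebook pattern: the whole tokenizer output is passed as input_ids.
old_style = unpack_generate_inputs(
    input_ids={"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]},
    max_new_tokens=8,
)

# New pattern: only the token tensor is passed as input_ids.
new_style = unpack_generate_inputs(input_ids=[1, 2, 3], max_new_tokens=8)
```

Both calls end up with `input_ids` holding only the token data, so the downstream call never sees a container that lacks `.shape`.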
DoubleMathew	6d0f864369	Fix/pr 3699 leftpad prefill main (#4100 ) * Fix left-padding masks and positions in batched decode/prefill * Fix batched generation with left padding * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix attention mask handling, padding_idx zeroing, and Mistral batched generation 1. attention_dispatch.py: Fall back from flash/xformers to SDPA when an attention_mask is present, since flash attention only supports causal masking via flag and cannot consume arbitrary padding masks. 2. gemma2.py: Apply attention_mask during decode inference for bsz > 1. Guard against boolean SWA/GA flags with isinstance check. Slice mask to match K/V length when sliding window is active. Remove dead commented-out SDPA branch (SDPA does not support softcapping). 3. granite.py: Apply attention_mask during decode inference for bsz > 1. Remove dead commented-out SDPA branch and misleading comment. 4. mistral.py: Fix 2D-to-4D padding mask conversion -- convert 0/1 mask to additive format (0 for keep, -inf for mask) before combining with the causal mask. Force SDPA backend when attention_mask is present. 5. llama.py: Skip zeroing embed_tokens.weight[padding_idx] when the embedding is weight-tied to lm_head, since zeroing the shared weight forces logit(pad) = 0 which is higher than real token logits in models like Gemma, causing the decoder to emit pad tokens as gibberish. Also add eos != pad guard, clean up unused _seq_length variable, and fix get_max_cache_shape handling. 6. vision.py: Same padding_idx fix as llama.py for the vision model loading path. Tested on gemma-2b-it, gemma-2-2b-it, Llama-3.2-1B, Mistral-7B-v0.3, Qwen2.5-0.5B, Qwen3-0.6B with flash-attn 2.8.3 active. All outputs coherent, zero crashes, zero resize warnings. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Inference path optimizations: eliminate per-layer GPU-CPU sync, cache inspect.signature, add Granite SDPA split * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * More inference path optimizations across model files - gemma: hoist rotary_seq_len computation to model level (eliminates N per-layer GPU-CPU syncs from position_ids.max().item()), pre-convert attention mask to bool once for all layers, use scalar float multiply instead of torch.tensor allocation for embedding scaling - gemma2: use in-place tanh_() for softcap attention, use scalar float multiply for embedding scaling - granite: pre-convert attention mask to bool once for all layers - cohere: use in-place neg_() for rotary embedding (consistent with all other model files) - falcon_h1: use in-place mul_() for key_multiplier scaling - llama: use in-place tanh_() for logit softcapping * Revert scalar multiply for Gemma/Gemma2 embedding scaling The original torch.tensor(..., dtype=hidden_states.dtype) is intentional: sqrt(3072) rounds to 55.5 in bfloat16 vs 55.4256 in float32. A plain scalar multiply may compute at higher precision internally, producing different results. Restore the explicit dtype-cast tensor to match the training path in LlamaModel_fast_forward. * Fix hardcoded cuda:0 device strings and add Cohere .eq(0) bool mask Replace 15 hardcoded "cuda:0" with f"{DEVICE_TYPE_TORCH}:0" across gemma.py, gemma2.py, cohere.py, and falcon_h1.py to support multi-GPU and non-CUDA devices (XPU, etc.). Add .eq(0) bool mask pre-conversion in CohereModel_fast_forward_inference for batched inference consistency with llama.py, granite.py, and gemma.py. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Disable flex_attention for Mllama (Llama 3.2 Vision) Mllama's _update_causal_mask uses the deprecated make_flex_block_causal_mask which creates a BlockMask with Q_LEN=KV_LEN=total_seq_len. During decode with KV cache, q_len=1 but the block_mask still has Q_LEN=total_seq_len, causing a ValueError. This is an upstream transformers issue -- newer models use flex_attention_mask from masking_utils which handles decode correctly via cache_position, but mllama has not been updated yet. Add mllama to the exclusion list in prefer_flex_attn_if_supported alongside gpt_oss so it falls back to sdpa, which works correctly for both training and inference. * Fix off-by-one in sliding window K/V slicing for gemma2, qwen3, falcon_h1, cohere The old formula `slicing_tokens = 1 - sliding_window` uses negative indexing that keeps `sliding_window - 1` tokens instead of `sliding_window`. For example with sliding_window=32 and kv_seq_len=100, `1-32 = -31` keeps indices 69..99 (31 tokens) instead of the correct 68..99 (32 tokens). Replace with `start = kv_seq_len - sliding_window` to match the fix already applied in llama.py and the canonical definition in transformers masking_utils (sliding_window_overlay: kv_idx > q_idx - W, which keeps exactly W tokens). Also add attention_mask slicing after K/V trim in qwen3, falcon_h1, and cohere to prevent mask/K dimension mismatch during batched SDPA inference, matching the pattern already used in llama.py. Currently only gemma2 (sliding_window=4096) is actively affected. The other three models have sliding_window=None in their configs so the code path is not triggered, but this keeps it correct for any future models that set it. * Fix Gemma2 softcapping order: apply mask after softcap, not before The attention mask must be applied AFTER logit softcapping, not before. 
Both the Google DeepMind reference implementation (google-deepmind/gemma, gm/nn/_modules.py lines 254-277) and transformers' eager_attention_forward (gemma2/modeling_gemma2.py lines 187-193) use this order: 1. logits = Q @ K^T * scale 2. logits = tanh(logits / softcap) * softcap # softcap first 3. logits = logits + mask # mask after 4. probs = softmax(logits) The PR had the mask addition before softcapping, which causes tanh to clamp the -inf mask values to -softcap instead of preserving them as -inf for softmax. While the practical impact is small (masked positions get ~1e-23 probability instead of exact zero), this should match upstream. * Clarify GQA condition precedence and remove stale comments Add explicit parentheses to grouped query attention conditions in llama.py, qwen3.py, granite.py to make operator precedence clear. The expression `bsz == 1 or not X and Y` relies on Python binding `not` > `and` > `or` which is correct but easy to misread. Remove dead commented-out code (`# else: # Knn, Vnn = Knn, Vnn`) and stale mask comments (`# if attention_mask ...`) from the bsz==1 fast path in llama, qwen3, cohere, falcon_h1, gemma2 inference functions. These were leftover from the pre-batched-inference structure and no longer apply. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-02-25 07:21:04 -08:00
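The sliding-window off-by-one from the commit above can be reproduced with a plain list standing in for the K/V sequence dimension. This is a sketch with hypothetical helper names (`trim_old`, `trim_fixed`); the `max(..., 0)` guard for sequences shorter than the window is an added assumption, not something the commit message states.

```python
def trim_old(kv, sliding_window):
    # Old formula: negative start index keeps sliding_window - 1 entries.
    slicing_tokens = 1 - sliding_window
    return kv[slicing_tokens:]


def trim_fixed(kv, sliding_window):
    # Fixed formula: keep exactly the last sliding_window entries.
    # The max() guard (an assumption here) covers len(kv) < sliding_window.
    start = max(len(kv) - sliding_window, 0)
    return kv[start:]


kv = list(range(100))          # positions 0..99, i.e. kv_seq_len = 100
old = trim_old(kv, 32)         # positions 69..99
new = trim_fixed(kv, 32)       # positions 68..99
print(len(old), old[0], len(new), new[0])  # prints: 31 69 32 68
```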
Daniel Han	9b51b14b2b	Support Python 3.14 in package metadata (#4113)	2026-02-25 07:17:16 -08:00
Roland Tannous	c21cf2ffcf	Add GGUF tag for exported models in chat page selector	2026-02-25 19:01:47 +04:00
Roland Tannous	bfb1403032	Relocate GGUF exports into exports/ directory	2026-02-25 18:54:39 +04:00
Datta Nimmaturi	3f9e03ff1b	Allow fp8 for non fast inference (#3904) * Allow fp8 for non fast inference * Extensive fp8 allow and quantizer patch * Clean up commented-out code, duplicate import, and revert unnecessary Version() changes - Delete commented-out FP8 fast_inference guard in FastModel (loader.py) instead of leaving it commented -- matches FastLanguageModel which was properly deleted - Delete commented-out fast_inference guard in loader_utils.py - Remove duplicate `from transformers import GenerationConfig, CompileConfig` in vision.py (line 112 already imports both plus AutoConfig) - Revert Version(trl.__version__) back to Version(trl) in trainer.py -- trainer.py imports Version from unsloth_zoo.utils which already handles module objects --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-02-25 06:52:18 -08:00
Daniel Han	00fe9a40c0	Add resilience to TRL internal API reclassification (#4111 ) * Add resilience to TRL internal API reclassification TRL is moving toward v1.0 and will reclassify several currently-importable symbols as internal with no stability guarantees. This adds try/except cascading imports with local fallbacks so Unsloth keeps working regardless of whether TRL removes, moves, or restructures these symbols. Changes: - rl.py: Add try/except cascade for unwrap_model_for_generation with local contextmanager fallback. Wire sanitize_logprob from RL_REPLACEMENTS into the compiled trainer template (same pipeline as selective_log_softmax and other global functions). Add import math and import logging to the template header. - rl_replacements.py: Remove inline import of sanitize_logprob from trl.scripts.vllm_serve in the regex replacement. The function is now a module-level global in the compiled file. - tokenizer_utils.py: Wrap dynamic exec import with per-item fallback so a single removed symbol does not break the entire bulk import. Depends on unslothai/unsloth-zoo#516. Tested across all TRL versions from 0.22.2 through 0.29.0.dev0 (git main). Training losses and grad norms are bit-identical to unpatched runs. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-25 06:34:21 -08:00
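The "try/except cascade with a local fallback" pattern named in the commit above might look roughly like the following. It is a generic sketch, not the code in rl.py: `import_first` and `local_unwrap` are hypothetical names, and the demo resolves a symbol from the standard library rather than from TRL so that it runs anywhere.

```python
import importlib
from contextlib import contextmanager


def import_first(candidates, name, fallback):
    """Return `name` from the first importable module path, else `fallback`."""
    for module_path in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, name)
        except (ImportError, AttributeError):
            # Module missing or symbol moved: try the next location.
            continue
    return fallback


@contextmanager
def local_unwrap(model, *args, **kwargs):
    # Hypothetical local fallback: hand the model back unchanged.
    yield model


# Demo with stdlib modules: the first path does not exist, the second does.
sqrt = import_first(["no_such_module_xyz", "math"], "sqrt", fallback=None)

# When no candidate provides the symbol, the local fallback is used instead.
unwrap = import_first(["no_such_module_xyz"], "unwrap_model_for_generation", local_unwrap)
```

The same shape applies per symbol, so one removed name only triggers its own fallback instead of failing a whole bulk import.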
Irfan Ali	30fac638ad	fix: correct gpt-oss Ollama generation prompt and add quantization wa… (#4087 ) * Warn when save_pretrained_gguf overrides quantization to MXFP4 for GPT-OSS GPT-OSS only supports MXFP4 format. If the user passes a different quantization_method, log a warning via logger.warning_once before overriding. Pass quantization_method=None to suppress the warning. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-25 04:39:16 -08:00
Roland Tannous	a8b5b7ed58	Fix GGUF models missing from chat page model search GGUF was in the global EXCLUDED_TAGS set which filtered it from all consumers of useHfModelSearch, including the chat page. Move GGUF exclusion to an opt-in excludeGguf option so only training and onboarding pages filter out GGUF models.	2026-02-25 16:21:08 +04:00
Roland Tannous	f92e4a3e1b	Merge pull request #261 from unslothai/feat/gguf-llama-cpp-inference Add GGUF model inference via llama-server with quantization variant selection	2026-02-25 16:07:39 +04:00
Roland Tannous	01082b84e5	Merge branch 'nightly' into feat/gguf-llama-cpp-inference	2026-02-25 16:06:03 +04:00
Roland Tannous	a1e064b1c4	Remove UNSLOTH_ENABLE_LOGGING from export pipeline	2026-02-25 16:00:24 +04:00
Roland Tannous	a7fe8a388c	Filter GGUF models from training page model selectors GGUF models can't be fine-tuned, so hide them from the training/studio page while keeping them available for inference on the chat page. - Add "gguf" to EXCLUDED_TAGS in HF model search hook - Filter local models with .gguf extension or -GGUF in ID	2026-02-25 15:47:45 +04:00
samit	bbe208ea38	reduced broad padding	2026-02-25 03:44:05 -08:00
samit	0eab635666	added space to show model/dataset name	2026-02-25 03:40:44 -08:00
Roland Tannous	299ce77467	added vision.py patch for vision processor from PR#260	2026-02-25 11:39:17 +00:00
Roland Tannous	cfaa2f2074	Merge pull request #249 from unslothai/fix/section-card-corner-bleed Fix: Clip section card overflow to prevent background bleed	2026-02-25 15:27:46 +04:00
Roland Tannous	cb3e4f2c26	Merge pull request #259 from unslothai/feat/dataset-subsets-split Feat/dataset subsets split	2026-02-25 15:27:12 +04:00
Roland Tannous	96217b5056	Merge pull request #246 from unslothai/fix/dataset-custom-mapping-heuristic adding custom mapping according to the chat templates	2026-02-25 15:26:36 +04:00
Roland Tannous	d2fe02ff04	Merge pull request #260 from unslothai/fix/fix-vision-processor-unsloth-bug fix: correct vision.py patch path to unsloth/models/vision.py + add V…	2026-02-25 15:21:01 +04:00
Daniel Han	78963ca19c	Fix Nemotron-H and Nemotron-VL model support (#4105 ) * Fix Nemotron-H and Nemotron-VL model support - Add Mamba kernel precision settings for Nemotron-H hybrid models - Fix VL model auto_model selection for models that only register AutoModelForCausalLM in their auto_map - Skip quantization of out_proj for Nemotron-H Mamba layers * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Simplify VLM auto_model selection logic Reduce three branches to two since the first and third both assign AutoModelForVision2Seq. The simplified condition checks whether the auto_map exclusively registers AutoModelForCausalLM without the VLM class, and defaults to AutoModelForVision2Seq otherwise. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-25 03:14:12 -08:00
Shine1i	db11f1a601	style(studio): align card heights and restore dataset advanced section placement	2026-02-25 12:11:06 +01:00
Roland Tannous	40719c4a6f	fix: correct vision.py patch path to unsloth/models/vision.py + add VLM processor diagnostic	2026-02-25 11:06:22 +00:00
Roland Tannous	6f0b7bc38a	fix: use raw github URL for vision.py patch + add VLM processor diagnostic logging	2026-02-25 10:29:05 +00:00
Shine1i	cb57d48e7e	chore: add tour label next to navbar tour icon	2026-02-25 11:23:43 +01:00
Manan17	6e8e70c987	fixing the chatml None error	2026-02-25 10:23:13 +00:00
Shine1i	122311a6b1	fix recipe output path, remove tracked root datasets	2026-02-25 11:19:10 +01:00
Wasim Yousef Said	c67da8f349	Merge pull request #257 from unslothai/feature/chat-model-switch-warning feat: chat model switching toast and add image detection logic	2026-02-25 01:58:56 -08:00
Shine1i	44d6abb36c	feat: chat model switching toast and add image detection logic	2026-02-25 10:55:53 +01:00
Wasim Yousef Said	a793660960	Merge pull request #254 from unslothai/feature/theme-fix feat: fix markdown rendering, UI adjustments	2026-02-25 00:48:09 -08:00
Shine1i	8732f3befb	feat: fix markdown rendering, UI adjustments	2026-02-25 09:46:13 +01:00
samit	a199e3f682	rebase with nightly	2026-02-25 00:36:25 -08:00
Wasim Yousef Said	468ec99489	Merge pull request #253 from unslothai/feature/theme-fix feat: fix dark mode support and refine UI assets	2026-02-25 00:27:43 -08:00
Shine1i	adcbf78553	feat: fix dark mode support and refine UI assets	2026-02-25 09:23:05 +01:00
Manan17	47fc79df6d	My changes for dataset	2026-02-25 08:15:44 +00:00
Manan17	60912e45e6	adding custom mapping according to the chat templates	2026-02-25 07:56:30 +00:00
imagineer99	0b47ab1eab	fix: clip section card overflow to prevent background bleed at rounded corners	2026-02-25 03:54:40 +00:00
imagineer99	dbf5acf486	feat: filter pretraining datasets from search results	2026-02-25 03:00:40 +00:00
Roland Tannous	875c6c8094	Merge pull request #247 from unslothai/fix/path-traversal-vuln Fix Content-Length crash and path traversal vulnerability in frontend serving	2026-02-25 05:16:00 +04:00
Roland Tannous	a6f1153f9a	fix: replace FileResponse with Response for index.html to prevent Content-Length mismatch and add path traversal guard	2026-02-25 01:05:04 +00:00
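The commit above does not spell out how its path traversal guard works. One common approach, shown here as a sketch under that caveat, is to resolve the requested path and reject anything that lands outside the static root. The function name `resolve_static_path` is hypothetical, and `Path.is_relative_to` requires Python 3.9 or newer.

```python
from pathlib import Path
from typing import Optional


def resolve_static_path(root: Path, requested: str) -> Optional[Path]:
    """Return the resolved file path under `root`, or None if it escapes."""
    root = root.resolve()
    # Joining with an absolute `requested` replaces root entirely, and ".."
    # segments are collapsed by resolve(), so both cases are caught below.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        return None
    return candidate
```

A caller would typically answer with a 404 (or fall back to `index.html`) when the helper returns `None`.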
Roland Tannous	da8e58fb8f	Merge pull request #30 from unslothai/feature/canvas-lab Draft: Data Recipes graph editor WIP	2026-02-25 04:01:53 +04:00
Shine1i	773f7945a6	chore: squircle! tooltip	2026-02-25 00:55:35 +01:00
Shine1i	faa36e9abb	feat: update icons and enhance dark mode styling for navbar and recipes - Replaced `CookBookIcon` with `ChefHatIcon` in navbar for improved clarity. - Added dark mode-specific gradient styles to recipe cards for better visual differentiation.	2026-02-25 00:39:49 +01:00
Shine1i	164560c6c9	feat: improve dark mode styling and simplify navbar	2026-02-25 00:31:20 +01:00
Roland Tannous	7adb69581e	Fix GGUF export cwd confusion: remove os.chdir, use absolute paths Remove os.chdir(save_directory) from export.py which was causing all of unsloth-zoo's relative-path internals (check_llama_cpp, use_local_gguf, _download_convert_hf_to_gguf) to resolve against the export directory instead of the repo root. This caused llama.cpp to be cloned inside each export dir and destroyed the repo root's llama-server build on cleanup. Now passes absolute paths to save_pretrained_gguf so unsloth resolves llama.cpp from the repo root where setup.sh already built it. Also builds llama-quantize in setup.sh (needed by unsloth-zoo's export pipeline) and symlinks it to llama.cpp root for check_llama_cpp().	2026-02-25 03:30:54 +04:00
Shine1i	2548720c01	Merge branch 'feature/canvas-lab' of https://github.com/unslothai/new-ui-prototype into feature/canvas-lab # Conflicts: # studio/frontend/bun.lock	2026-02-25 00:19:19 +01:00
Shine1i	929c7f86e4	feat: add animated theme toggler and refine dark mode styling	2026-02-25 00:18:20 +01:00
Manan17	fdbc60de77	adding custom mapping according to the chat templates	2026-02-24 21:15:56 +00:00
Leo Borcherding	a3daae1c40	fix: replace datetime.UTC with timezone.utc for Python 3.9+ compatibility - Replace datetime.UTC with datetime.timezone.utc in authentication.py and storage.py - Fixes ImportError on Python versions < 3.11 - timezone.utc works on Python 3.9+ Resolves #237	2026-02-24 14:37:00 -06:00
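For reference, the replacement described in the commit above amounts to the following. `datetime.UTC` only exists on Python 3.11 and newer, while `timezone.utc` is available on older versions too; the helper name `utc_now` is illustrative.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    # timezone.utc works on Python 3.9+; datetime.UTC needs 3.11+.
    return datetime.now(timezone.utc)


stamp = utc_now()
```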
Roland Tannous	0e7c8a2e5e	Switch GGUF backend from /v1/completions to /v1/chat/completions Fixes two bugs: 1. Chat template tags (<|im_start|>, <|im_end|>) leaking into output because /v1/completions treated them as literal text 2. Image hallucination because image_b64 was never passed to llama-server Now llama-server handles chat templates natively and receives images as OpenAI-format multimodal content parts for vision models.	2026-02-24 19:21:01 +04:00
Roland Tannous	ef1cd3ac98	Use llama-server -hf mode, add GGUF variant selector, fix vision detection Replace Python-side GGUF download with llama-server's native -hf flag for HuggingFace repos. Add frontend variant picker so users can choose quantization (Q4_K_M, Q8_0, BF16, etc.) with file sizes. Fix vision detection via mmproj files instead of hardcoding is_vision=False.	2026-02-24 19:03:06 +04:00
Roland Tannous	08aeeaee4b	Fix llama-server: build in-tree, fix path resolution, add LD_LIBRARY_PATH	2026-02-24 18:19:29 +04:00
Roland Tannous	4e88092452	Preflight llama-server check before downloading remote GGUF files	2026-02-24 18:02:43 +04:00
Daniel Han	0f5a1fa7c3	Fix FP8 model loading: redirect to BF16 sibling for BNB/16-bit (#4095 ) * Fix FP8 model loading for BNB/16-bit: redirect to BF16 sibling Models like Ministral-3-3B-Instruct-2512 ship with FP8 weights and an FP8 quantization_config in their config.json. Loading these with BNB 4-bit/8-bit fails because BNB cannot quantize FP8 tensors. Loading with 16-bit also fails because the FP8 quantization config has activation_scheme=static which is unsupported by transformers' FineGrainedFP8Config. When an FP8 model is detected and the user is not explicitly requesting FP8 loading, check if a BF16 sibling repo exists (model_name + "-BF16") and redirect to it. This happens early in the loading flow before any quantization config processing. Also pass the modified model_config to auto_model.from_pretrained to avoid transformers re-reading the original config from the model repo. Tested with Ministral-3-3B in 4-bit and 16-bit modes. Both now load and train correctly. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Simplify FP8 condition and narrow exception handling Simplify the load_in_fp8 check (works for bool and string values). Narrow inner except to KeyError and add comment for outer except. * Warn user when FP8 model has no BF16 sibling for redirect Previously the except block silently fell through with `pass`, so users would get a confusing BNB dtype error later. Now prints a clear message explaining the FP8 situation and suggesting load_in_fp8=True or uploading a BF16 version. * Fix FP8 redirect state corruption and add fbgemm_fp8 support - Fix state corruption: model_name was reassigned before AutoConfig.from_pretrained, so if config fetch failed, model_name pointed to BF16 repo while auto_config still had FP8. Now only updates state after both checks succeed. - Save original model_name so warning message is correct even on failure. - Handle fbgemm_fp8 quant method in addition to fp8. 
* Extract FP8 redirect to shared _redirect_fp8_to_bf16() in _utils.py Addresses reviewer feedback: - Move FP8 redirect logic to a shared function callable from both vision.py (FastBaseModel) and llama.py (FastLlamaModel) - Raise RuntimeError instead of warning when BF16 sibling not found - Add FP8 redirect to llama.py for text-only model loading path * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ministral 3B/8B/14B mapper entries Adds all 9 Ministral model variants to the mapper: - Instruct (3B, 8B, 14B) with FP8 variant mappings - Base (3B, 8B, 14B) - Reasoning (3B, 8B, 14B) This routes mistralai/Ministral-* to unsloth/Ministral-* repos (BF16 weights), which also avoids the FP8 config issue for the standard loading path through loader.py. * Add FP8 mapper entries for Mistral-Small-3.2 and Magistral-Small-2509 --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-253.us-east-2.compute.internal> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-24 05:56:07 -08:00
Roland Tannous	a900eb9ad7	Fix GGUF detection for HuggingFace repo IDs (not just local paths)	2026-02-24 17:49:09 +04:00
Roland Tannous	70c912d788	Fix CUDA detection for llama-server build on multi-GPU machines	2026-02-24 17:45:17 +04:00
Roland Tannous	a40ebb1aab	Add GGUF model inference via llama-server backend	2026-02-24 17:40:05 +04:00
Roland Tannous	3f34996288	Merge branch 'nightly' into feature/canvas-lab	2026-02-24 10:08:13 +00:00
Roland Tannous	7bbb1f0a0b	Merge pull request #218 from unslothai/fix/stop-startup-modal Added cancel training button on the overlay	2026-02-24 14:03:25 +04:00
Roland Tannous	d38656139d	Merge pull request #241 from unslothai/feature/adding-exported-models-for-chat Adding exported model for chat	2026-02-24 13:55:45 +04:00
Roland Tannous	3ffbee3586	fix(chat): strip /suffix from lora display name and show type tag instead of base model	2026-02-24 09:54:34 +00:00
Roland Tannous	2149bc74ee	Merge pull request #232 from unslothai/fix/disable-eval-by-default # fix/disable eval by default	2026-02-24 13:35:11 +04:00
Roland Tannous	f5057d86ed	use explicit float bounds for eval_steps input (0.0–1.0)	2026-02-24 09:31:30 +00:00
Roland Tannous	2be2933846	skip eval split and HF split detection when eval_steps is disabled	2026-02-24 09:26:54 +00:00
imagineer99	b8617a5544	feat: move cancel training button inside terminal startup card	2026-02-24 09:05:02 +00:00
Roland Tannous	c0f6012d77	Merge remote-tracking branch 'origin/nightly' into feature/canvas-lab # Conflicts: # studio/frontend/bun.lock # studio/frontend/package.json	2026-02-24 09:02:54 +00:00
Roland Tannous	8aca1cad29	Merge pull request #231 from unslothai/feat/custom-YAML-saving Feat: Add Upload / Save / Reset training config from local YAML	2026-02-24 12:43:42 +04:00
imagineer99	002fe3d879	feat: improve training config UX and remove unused logging options	2026-02-24 08:22:02 +00:00
Shine1i	a4b7d360de	chore: remove unused "Evaluate" navigation item and its icon from navbar	2026-02-24 09:09:56 +01:00
Shine1i	c8c844a4d6	feat: introduce single-env Python dependency management for streamlined compatibility - Added constrained dependency files for single-env installations: `constraints.txt`, `data-designer.txt`, and `data-designer-deps.txt`. - Implemented a `patch_metadata.py` script to resolve metadata conflicts between dependency versions. - Updated `setup.sh` to integrate single-env setup, including dependency installation and metadata patching. - Upgraded `fastmcp` and `websockets` versions in `extras.txt` for compatibility. - Commented out unused "Start Tutorial" button in `data-recipes-page.tsx`.	2026-02-24 07:45:40 +01:00
Shine1i	dad78ae0ce	feat: improve markdown note styles and layout logic	2026-02-24 04:04:02 +01:00
Shine1i	b80796a7cd	feat: enhance markdown note blocks with style options and double-click config access - Added support for configuring markdown note block styles, including color and opacity. - Enabled double-click on markdown notes to open their configuration dialog. - Adjusted layout styles in markdown previews for better interaction control. - Updated relevant payloads, types, and UI logic to support added styling features. - Integrated multiple example notes in learning recipes for better visualization.	2026-02-24 03:47:42 +01:00
Shine1i	3989cd6524	feat: introduce markdown note blocks for canvas documentation - Added "Markdown Note" block to allow users to add UI-only markdown notes to the canvas for documentation purposes. - Integrated note creation, editing, and rendering in the `recipe-studio` UI, including markdown previews. - Updated payload generation logic to omit markdown notes from backend payloads. - Enhanced block types, definitions, and dialog support to include the new "Markdown Note" feature.	2026-02-24 03:11:29 +01:00
Shine1i	ba000dc0f2	feat: add "Multi-Turn Chat" learning recipe with structured conversation outputs - Introduced "Multi-Turn Chat" recipe to generate structured user-assistant conversations with domain/topic-based goals and constraints. - Added `conversation.json` with model configuration, sampling strategies, and LLM prompts. - Updated UI nodes, layout, and graph rendering logic to support new recipe. - Enhanced `recipe-studio` fit view logic to improve editor layout responsiveness.	2026-02-24 02:38:36 +01:00
samit	32d5cd7198	resolved unbound variable error	2026-02-23 17:37:23 -08:00
Manan17	aeb198f52d	Fixing base model export issue for vlms	2026-02-24 01:34:11 +00:00
Manan17	4be677e45d	Adding exported model for chat	2026-02-24 01:17:09 +00:00
pre-commit-ci[bot]	36181bad96	[pre-commit.ci] pre-commit autoupdate (#4096) updates: - [github.com/astral-sh/ruff-pre-commit: v0.15.1 → v0.15.2](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.1...v0.15.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-23 17:04:34 -08:00
Shine1i	dec6b4b224	feat: add new learning recipes for diverse data transformations - Added three new learning recipes: "Instruction from Answer," "PDF Grounded QA," and "Structured Outputs Jinja," with respective metadata and configuration. - Integrated support for unstructured and structured input handling, including sampling strategies, prompt definitions, and model specifications. - Enhanced JSON structure and UI nodes to facilitate better recipe visualization and execution.	2026-02-24 01:50:55 +01:00
Shine1i	5254f04065	feat: add layout direction support and enhance handle logic - Introduced `layoutDirection` to control graph orientation ("LR" or "TB") and integrate into edges, nodes, and payloads. - Enhanced handle management with new default, semantic, and data-specific mappings based on layout direction. - Added handle normalization for consistent connections across layouts and semantic/data flows. - Updated UI to reflect layout-aware positioning and semantic connections.	2026-02-24 00:51:49 +01:00
Shine1i	9f574941f9	feat: normalize handle IDs and enhance scorer options UI - Added handle normalization functions to standardize handle IDs across connections. - Expanded UI for scorer options with real-time updates, input fields for values and descriptions, and support for adding/removing options. - Updated graph node handles and their layout logic for better connection visualization. - Stripped sensitive fields (e.g., `api_key`) from payloads during export.	2026-02-24 00:29:14 +01:00
Shine1i	ad95b4a951	feat: add "Instruction from Answer" learning recipe and badge display enhancements - Introduced a new "Instruction from Answer" learning recipe with related metadata, payload integration, and UI updates. - Enhanced badge display logic to include up to 3 badges with overflow indication for additional learning badges.	2026-02-24 00:25:21 +01:00
Shine1i	ab31aa9ed4	feat: add per-column seed drop support with UI integration, validation, and payload enhancements	2026-02-23 23:33:59 +01:00
Shine1i	54382c659c	feat: add support for learning recipes with template loading, dialog integration, and enhanced payload handling	2026-02-23 23:20:32 +01:00
Shine1i	71916f1dce	feat: add ShineBorder UI component and learning recipe templates to enhance data recipes page	2026-02-23 22:38:06 +01:00
Shine1i	8739a01f56	Merge branch 'nightly' into feature/canvas-lab	2026-02-23 21:54:35 +01:00
Shine1i	b8231a6be6	refactor: keep seed block pos	2026-02-23 21:53:32 +01:00
Shine1i	1323e0af53	refactor: add batch processing support with configuration options and execution enhancements	2026-02-23 21:32:20 +01:00
Shine1i	59a15cb5bc	refactor: enhance recipe validation flows with error collection, seed-specific updates, and improved UX in execution dialogs	2026-02-23 20:34:53 +01:00
Shine1i	d4655eb8bf	refactor: streamline recipe execution flows with validation support and enhanced run dialog interactions	2026-02-23 20:28:41 +01:00
Shine1i	91cbb0e933	refactor: improve dialog rendering and logging setup for stability and configurability	2026-02-23 20:16:03 +01:00
Leo Borcherding	cdeed53a97	fix: disable eval by default, set eval_steps to 0.0 - Changed default eval_steps from 0.01 to 0.0 across backend and frontend - Fixed UI to allow eval_steps=0 (removed min=0.001 constraint) - Added conditional eval logic with helpful console messages - Updated tooltip to explain how to disable evaluation - Tested: confirmed eval disabled by default with eval_steps=0.0	2026-02-23 13:07:47 -06:00
imagineer99	6cedc339c6	feat: add Upload / Save / Reset training config from local YAML	2026-02-23 18:55:08 +00:00
Shine1i	71ab9ff4b4	refactor: enhance seed configuration handling with added fields, dynamic chunking logic, and streamlined interactions	2026-02-23 19:40:13 +01:00
Shine1i	424b00b701	refactor: improve seed source handling with additional type support, enhanced parsing logic, and text chunking optimization	2026-02-23 19:29:54 +01:00
Shine1i	3e17e2b0f6	refactor: enhance seed source handling with new source types and streamlined inspection flows	2026-02-23 18:46:02 +01:00
imagineer99	71d698d182	feat: sort and filter dataset search results by model type relevance	2026-02-23 16:22:45 +00:00
Roland Tannous	77b0978d5f	Merge pull request #228 from unslothai/fix/cap-num-proc-multigpu-deadlock Cap dataset.map num_proc on multi-GPU machines to prevent fork deadlocks	2026-02-23 19:04:22 +04:00
Roland Tannous	d74174f7f5	Cap dataset.map num_proc on multi-GPU machines to prevent fork deadlocks	2026-02-23 14:25:31 +00:00
Roland Tannous	6acc2dbf8f	Merge branch 'nightly' into feature/transformers-v5-support	2026-02-23 13:40:16 +00:00
Roland Tannous	4cb0cfdaf5	Remove firebase-debug.log and setup_leo.sh from tracking and add to .gitignore	2026-02-23 17:38:22 +04:00
Roland Tannous	c017ce802f	Remove firebase-debug.log and setup_leo.sh from tracking and add to .gitignore	2026-02-23 17:37:46 +04:00
Roland Tannous	313e77c5fd	Merge branch 'nightly' into feature/transformers-v5-support	2026-02-23 13:32:51 +00:00
Roland Tannous	2e2aa54ad2	Merge pull request #225 from unslothai/fix/fix-response-on-completion-truncation fix: error on >30% sample drop after `train_on_responses_only` instead of silent DataLoader crash	2026-02-23 16:28:32 +04:00
Roland Tannous	3015916d26	fix: error on >30% sample drop after train_on_responses_only instead of silent DataLoader crash	2026-02-23 12:21:06 +00:00
Roland Tannous	b03938f6ad	Merge nightly into main Brings main up to date with nightly, including chat attachments, model-per-thread persistence, speech recognition, VRAM recommendations, MoE model configs, VLM fixes, and compile cache cleanup. Conflicts resolved by taking nightly's version for all diverged files (main-only changes were a feature add + immediate revert with net zero effect).	2026-02-23 14:52:57 +04:00
Daniel Han	2ed86865fb	Suppress FBGEMM CUTLASS stdout spam on Blackwell GPUs (#4092 ) * Suppress FBGEMM CUTLASS "Arch conditional MMA" stdout spam on Blackwell GPUs On Blackwell GPUs (B200/B100, SM100), FBGEMM's f8f8bf16_blockwise kernel is hardcoded to cutlass::arch::Sm90 with no SM100 code path. When test_has_fbgemm() probes this kernel, it fires 2304 "ERROR : Arch conditional MMA instruction used without targeting appropriate compute capability" lines before aborting and returning zeros. The existing HidePrintMessage filter on sys.stderr (line 109) does not catch these because CUDA device-side printf writes to stdout fd 1 at the C level, bypassing Python's sys.stdout/sys.stderr entirely. Fix: add suppress_cuda_printf() context manager in import_fixes.py that redirects fd 1 and fd 2 to /dev/null at the OS level, with torch.cuda.synchronize() and libc fflush before restoring. Wrap the test_has_fbgemm() call in fp8.py with this context manager. Tested on B200 with fbgemm-gpu-genai 1.4.0+cu130 and 1.5.0+cu130: - Before: 2304 warning lines on every import - After: 0 warning lines - UNSLOTH_HAS_FBGEMM correctly set to 0 (Triton fallback works) - Works with both UNSLOTH_ENABLE_LOGGING=0 and =1 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Guard _libc init and fflush to prevent fd leak on failure --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-253.us-east-2.compute.internal> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-23 01:27:10 -08:00
Daniel Han	fec06247c9	Fix VLM processor load degradation and vLLM CUDA version detection (#4091 ) * Fix VLM processor load degradation and vLLM CUDA version detection vision.py - Fix VLM processor load for issue #4085: - Before loading the processor, scan local config files and strip the _Unsloth_Patched_ prefix. AutoProcessor.from_pretrained silently degrades to a text-only tokenizer instead of raising an exception when it encounters the unrecognized class name, so the existing get_auto_processor fallback never triggers. Sanitizing the configs before loading fixes backwards compat for old corrupted saves. - After loading, detect when AutoProcessor returned a text-only tokenizer for a VLM model (has no image_processor attribute) and trigger the manual fallback constructor. import_fixes.py - Fix vLLM CUDA version mismatch detection: - _is_broken_vllm_error now also matches CUDA shared library errors (libcudart, libcublas, libnvrtc) with "cannot open shared object file". Previously it only matched errors containing "vllm._c" in the message text, which missed cases where the error message was about the missing CUDA library itself (e.g. vllm built for CUDA 12 on a CUDA 13 system). - New _get_vllm_cuda_mismatch_message function extracts the CUDA version from the error, compares to the system CUDA version via torch.version.cuda, and returns a targeted install command using the correct GitHub releases wheel URL. - disable_broken_vllm uses the targeted message when a CUDA mismatch is detected, falling back to the existing generic message otherwise. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-253.us-east-2.compute.internal> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-23 01:06:53 -08:00
Roland Tannous	5416bdd4e6	added shutil import to main.py	2026-02-23 07:44:59 +00:00
Roland Tannous	f16a7f2d17	Merge nightly into feature/transformers-v5-support	2026-02-23 07:40:28 +00:00
Roland Tannous	bfb84221fe	Merge pull request #221 from unslothai/feature/clear-unsloth-compile-cache feat: clear unsloth_compiled_cache on startup, shutdown, and between …	2026-02-23 11:30:46 +04:00
Roland Tannous	dbbcdb4f09	feat: clear unsloth_compiled_cache on startup, shutdown, and between model loads	2026-02-23 07:26:22 +00:00
Roland Tannous	2a117f57a9	Merge pull request #220 from unslothai/feature/moe-training-models-configs Add model defaults for MoE models (Qwen3 MoE, GLM Flash) and GLM response mapping	2026-02-23 10:09:27 +04:00
Roland Tannous	fb1c321ad3	Add GLM, Qwen3 MoE, TinyQwen3 MoE, and Ministral 3 VL model defaults and GLM train_on_responses_only mapping	2026-02-23 05:51:43 +00:00
Roland Tannous	3fe85d36cc	Remove stale .venv_overlay on server startup to prevent transformers version conflicts	2026-02-23 05:08:27 +00:00
Roland Tannous	6995aaf077	Clean up stale .venv_overlay directory during setup	2026-02-22 20:30:00 +00:00
Roland Tannous	e3fb4f53df	Patch adapter_config.json with unsloth_training_method and auto-detect load_in_4bit for LoRA inference	2026-02-22 20:27:52 +00:00
Roland Tannous	5de6246142	Purge own utils/core modules and use lazy imports so is_vision_model picks up fresh AutoConfig after version switch	2026-02-22 20:04:35 +00:00
samit	7dcaa52083	added cancel training button on the overlay	2026-02-22 12:03:26 -08:00
Roland Tannous	c12d75c472	Add transformers version switch to model config and vision check endpoints for dropdown selection	2026-02-22 19:56:12 +00:00
Roland Tannous	60997a75eb	Install transformers into both site-packages and overlay to fix sub-package resolution during version switch	2026-02-22 19:44:12 +00:00
Roland Tannous	7cde520176	Move transformers overlay to local .venv_overlay/, add huggingface-hub to overlay install	2026-02-22 19:34:46 +00:00
Roland Tannous	15cb9b0f37	Use sys.path overlay to switch transformers versions in-process instead of modifying site-packages	2026-02-22 19:19:12 +00:00
Roland Tannous	0050e78aa3	Fix in-memory transformers version detection and aggressive module purge for 5.1.0/4.57.1 switching	2026-02-22 19:08:18 +00:00
Roland Tannous	1c2653fcc2	aggressive reload_transformers	2026-02-22 18:52:06 +00:00
Roland Tannous	4d06258e93	Auto-switch transformers version (5.1.0/4.57.1) for Ministral-3, GLM-4.7-Flash, Qwen3-30B-A3B models with LoRA adapter resolution	2026-02-22 18:29:40 +00:00
Roland Tannous	bb1bd49a68	Merge pull request #217 from unslothai/fix/update-config-yamls Fix vision LoRA defaults for VLMs and clean up text-only model configs	2026-02-22 19:14:54 +04:00
Roland Tannous	132cdb0547	fix: correct vision LoRA defaults for VLMs and remove vision fields from text-only model configs	2026-02-22 15:09:10 +00:00
Roland Tannous	7ac391d1e0	Merge pull request #215 from unslothai/fix/vlm-processing-class fix: pass full Processor as processing_class for VLM SFTTrainer	2026-02-22 18:14:14 +04:00
Roland Tannous	202b7cdfa7	fix: pass full Processor as processing_class for VLM SFTTrainer	2026-02-22 14:11:12 +00:00
Roland Tannous	a4346b954e	Merge pull request #213 from unslothai/fix/fix-clear-chat-new-model Fix: Clear Chat and Manage Model Lifecycle on Model Switch	2026-02-22 17:54:27 +04:00
Roland Tannous	761953b50e	feat(chat): persist model per thread and auto-load on thread switch	2026-02-22 13:35:45 +00:00
Roland Tannous	536a735acc	feat(chat): eject current model and start fresh thread on model switch	2026-02-22 13:17:03 +00:00
Roland Tannous	a490dca8a0	Merge pull request #211 from unslothai/fix/param-count-display Fix: Remove download count fallback when model param count is unavailable	2026-02-22 16:52:25 +04:00
Roland Tannous	d3fbaa2256	Merge pull request #204 from unslothai/fix/image-preview-thumbnail Fix: Resolve image preview thumbnail not rendering before send	2026-02-22 16:09:24 +04:00
Roland Tannous	36f10ea90e	Merge pull request #212 from unslothai/fix/stop-unclickable Updated stop button to be unavailable during cancel training	2026-02-22 16:08:47 +04:00
Roland Tannous	c5a88ef90b	Merge pull request #208 from unslothai/revert-207-feature/attachment-restore Revert "fix(chat): persist + hydrate user attachments in IndexedDB history"	2026-02-22 12:22:25 +04:00
Roland Tannous	df8e4bdc40	Merge pull request #203 from unslothai/feat/sort-unsloth-models-first Feat: Sort unsloth models first in HF search dropdowns	2026-02-22 12:21:42 +04:00
Roland Tannous	75f3e5e2a1	feat: dual-query HF model search to surface all unsloth size variants first	2026-02-22 08:20:25 +00:00
imagineer99	4fe3772aa2	fix: remove download count fallback when model param count is unavailable	2026-02-22 06:30:50 +00:00
Wasim Yousef Said	4779ae8e61	Merge pull request #209 from unslothai/feature/attachment-restore fix(chat): persist + hydrate user attachments in IndexedDB history	2026-02-21 22:17:18 -08:00
Roland Tannous	ac83e8b668	Revert "fix(chat): persist + hydrate user attachments in IndexedDB history"	2026-02-22 10:15:24 +04:00
Wasim Yousef Said	4f2e434bc3	Merge pull request #207 from unslothai/feature/attachment-restore fix(chat): persist + hydrate user attachments in IndexedDB history	2026-02-21 22:09:53 -08:00
Shine1i	c726cff4c8	feat: add utils for deep cloning content and attachments in thread messages	2026-02-22 07:08:18 +01:00
Shine1i	7e1e25fb32	refactor: simplify execution overview tab by removing unused token metrics and refining layout spacing	2026-02-22 06:36:32 +01:00
Shine1i	b7dfa2b7e4	refactor: extract and modularize execution tabs and helpers for enhanced code reusability and maintainability	2026-02-22 06:26:40 +01:00
Shine1i	d6921042b0	refactor: enhance execution row handling and dataset pagination for improved interactivity and preview support	2026-02-22 05:44:23 +01:00
Shine1i	4cda750589	refactor: extract reusable runtime utilities and unify execution dialog flows for preview and full runs	2026-02-22 05:37:33 +01:00
Shine1i	17a22fe155	refactor: improve layout direction handling and auxiliary node visibility for LLMS	2026-02-22 05:18:33 +01:00
Shine1i	2cc9981ef9	refactor: enhance variable handling with structured entries and UI updates for badges	2026-02-22 04:07:01 +01:00
Shine1i	0786728323	refactor: extract reusable helpers and streamline seed inspection flow	2026-02-22 03:41:59 +01:00
imagineer99	52a2bfe016	fix: resolve image preview thumbnail not rendering before send	2026-02-22 02:39:56 +00:00
Shine1i	869ac64e18	feat: enhance dataset seed handling with inspection and UI improvements	2026-02-22 03:39:34 +01:00
imagineer99	c815fc045d	feat: sort unsloth models first in HF search dropdowns	2026-02-22 01:45:51 +00:00
Shine1i	3b29d088c0	merge: nightly into feature/canvas-lab	2026-02-22 02:31:32 +01:00
Shine1i	63c1b95f20	refactor: replace inline labels with reusable `FieldLabel` component in dialogs	2026-02-22 02:30:04 +01:00
Shine1i	77bed648ae	refactor: update UI styles for graph nodes and components with consistent transitions and rounded elements	2026-02-22 02:20:51 +01:00
Shine1i	e3c3bf75a0	refactor: streamline samplers and block handling, update dialogs and validation	2026-02-22 02:16:09 +01:00
Roland Tannous	000da20237	Merge pull request #202 from unslothai/fix/rename-downloading-model-to-loading-model renaming downloading model to loading model	2026-02-21 17:32:32 +04:00
Roland Tannous	9bc32789a6	renaming downloading model to loading model	2026-02-21 13:30:41 +00:00
Roland Tannous	44828f582f	Merge pull request #200 from unslothai/fix/sloth-z-index-overlay fix: enable navbar z-index by adding relative position	2026-02-21 17:15:05 +04:00
Roland Tannous	ea324f2806	Merge pull request #193 from unslothai/feature/vram-fit-chat Added vram fit indicator to models in chat	2026-02-21 17:09:31 +04:00
Roland Tannous	3a4a576128	Merge pull request #196 from unslothai/fix/compare-dictate-attachment-buttons Added dictate and add attachments feature in chat page	2026-02-21 16:52:56 +04:00
Roland Tannous	9b3cab6d5f	move microphone icon in compare page to be next to send button	2026-02-21 12:51:38 +00:00
samit	3fb9b4c056	added SpeechRecognition declarations and missing type packages	2026-02-21 01:00:08 -08:00
imagineer99	33b31c6be6	fix: enable navbar z-index by adding relative position	2026-02-21 08:37:47 +00:00
samit	f4a888cddb	Added VRAM fit indicator to recommended models	2026-02-20 23:55:10 -08:00
Roland Tannous	ea0964de75	Merge pull request #192 from unslothai/feature/model-download-status Updated to edit loading as "downloading model"	2026-02-21 11:07:08 +04:00
samit	e7d32a6461	updated stop button to unavailable during cancel training	2026-02-20 22:45:41 -08:00
Roland Tannous	470d12cf46	Merge pull request #188 from unslothai/fix/gemma-3-chat fixed the vlm's text only errors	2026-02-21 10:16:20 +04:00
samit	97f40bdc58	Added dictate and add attachments feature	2026-02-20 22:14:27 -08:00
Roland Tannous	c051e3d532	fix: load proper vision processor from base model when FastVisionModel returns raw tokenizer, add tokenize=False to vision chat template	2026-02-21 04:40:29 +00:00
Manan17	f6ebeb1d42	Mapping proper tokenizer for VLMs	2026-02-21 01:57:05 +00:00
samit	08c3c80d31	added vram fit indicator to models in chat	2026-02-20 17:25:03 -08:00
samit	0a3beade35	updated to edit loading as downloading model	2026-02-20 15:49:33 -08:00
Manan17	3fa9e773c2	fixed the vlm's text only errors	2026-02-20 22:23:26 +00:00
Roland Tannous	ef3bd22b02	Merge pull request #187 from unslothai/fix/fix-git-clone-branch-colab removed branch from colab git clone	2026-02-20 23:16:06 +04:00
Roland Tannous	a77b9717f8	removed branch from colab git clone	2026-02-20 19:15:03 +00:00
Roland Tannous	ed476534f7	Merge pull request #186 from unslothai/fix/colab-setup-fixes Fix/colab setup fixes	2026-02-20 23:08:25 +04:00
Roland Tannous	08ff8de31d	add huggingface-hub==0.36.0 due to colab error	2026-02-20 18:26:20 +00:00
Roland Tannous	48e232b38c	add huggingface-hub==0.36.0 due to colab error	2026-02-20 18:24:48 +00:00
Roland Tannous	34131da9a4	moved transformers4.57.1 to no-extra-deps	2026-02-20 18:08:58 +00:00
Roland Tannous	bf34ede725	Merge pull request #180 from unslothai/fix/dropdown-menu-prefill Fix: model and dataset dropdowns selecting stale value on Enter	2026-02-20 22:01:35 +04:00
Roland Tannous	e5f9ae5c9f	Merge branch 'nightly' into fix/dropdown-menu-prefill	2026-02-20 17:44:06 +00:00
Roland Tannous	63c564d7ec	Merge pull request #185 from unslothai/fix/remove-warmup-text-inference-status Fix: remove warmup text inference status	2026-02-20 21:14:37 +04:00
imagineer99	77b7e8a9ba	fix: remove warmup text inference status	2026-02-20 15:35:06 +00:00
Shine1i	0d2f81ab3d	refactor: extract and consolidate execution runtime and tracking logic	2026-02-20 14:41:38 +01:00
Shine1i	e0d65bb1cf	feat: add recipe execution stores, hooks, and logic for managing preview and full executions	2026-02-20 14:38:05 +01:00
Shine1i	4c19a2330a	refactor: simplify execution view by removing unused state and redundant logic	2026-02-20 14:19:52 +01:00
Shine1i	f50dd7bcd9	feat: refine execution view with enhanced summary and insights - Removed unused model usage properties (`total`, `tps`, `requestsSuccess`, etc.) for cleaner data handling. - Added new metrics: total input/output tokens, null rate, and low uniqueness flags. - Improved UI for execution summary cards with consolidated insights and model usage tables. - Introduced detailed analysis for dataset columns, including dropped columns and LLM column counts. - Optimized rendering logic to reduce clutter and enhance user experience.	2026-02-20 14:05:43 +01:00
Shine1i	360b4daf6d	feat: enhance execution log tracking, progress updates, and data visualization - Added `log_lines` field to track and display runtime logs for executions. - Enhanced progress tracking with terminal-like log outputs and live log scrolling. - Introduced detailed "model usage" and "dropped columns" analysis in `ExecutionsView`. - Optimized UI components for displaying dataset metrics, including input/output token averages.	2026-02-20 13:51:19 +01:00
Shine1i	e7fcfef8c7	feat: improve dataset column visibility and cell expansion in ExecutionsView - Added column visibility toggles using a dropdown menu for greater customization. - Introduced expandable table cells for long values with "expand/collapse" functionality. - Ensured hidden columns reset on execution change, providing a consistent user experience.	2026-02-20 13:04:57 +01:00
Shine1i	391b633cae	feat: enhance progress tracking for execution jobs - Added logic to calculate and manage column-level progress for job executions. - Introduced `progress_columns_total` and `_column_done` fields for more granular progress updates. - Improved overall progress computation by considering total columns and individual progress per column.	2026-02-20 12:52:45 +01:00
Shine1i	13e153e448	feat: refactor and extend recipe execution logic - Extracted shared execution utilities into `execution-helpers.ts` for reusability across features. - Replaced deprecated `/preview` endpoint and its logic with unified job execution handling. - Consolidated job execution flows ("Preview" and "Full Run") into shared `runJobExecution` logic. - Enhanced execution progress tracking with support for column-level progress reporting. - Added support for handling execution job events and improved error reporting from the backend. - Updated backend to better manage dataset access errors and provide more informative error messages. - Cleaned up redundant code in `use-recipe-studio-actions` and streamlined execution APIs.	2026-02-20 12:47:51 +01:00
imagineer99	759ae059db	Fix: model and dataset dropdowns selecting stale value on Enter	2026-02-20 11:33:20 +00:00
Shine1i	f3296b1953	feat: add dataset pagination support for recipe executions - Introduced backend changes to handle dataset pagination with limit, offset, and total row support. - Updated frontend execution view with dataset pagination controls, including "Next" and "Prev" buttons. - Extended recipe execution logic to manage dataset pagination details like page number, page size, and total records.	2026-02-20 12:12:02 +01:00
Shine1i	1259b75d15	feat: add support for full recipe executions with detailed progress and analysis - Introduced "Full Run" support in execution logic, including progress tracking, cancellation, and job status updates. - Extended backend to manage full execution jobs, handle dataset previews, and return detailed analysis and artifacts. - Updated frontend components to support full runs, with execution sorting, live updates, and detailed execution views. - Enhanced `ExecutionsView` with progress indicators, status filtering, and dataset preview capabilities. - Added IndexedDB schema migration to track additional execution metadata.	2026-02-20 12:05:42 +01:00
Roland Tannous	811a9243b2	Merge pull request #179 from unslothai/fix/copy-mac Added the copy feature on mac	2026-02-20 14:38:29 +04:00
Shine1i	d378e48c2a	feat: introduce execution tracking and analysis for recipe preview - Added `ExecutionsView` with execution history tracking, live updates, and detailed data analysis. - Implemented IndexedDB support via Dexie to persist execution records locally. - Enhanced backend preview logic to return execution analysis and artifacts. - Updated studio header with view toggling between "Editor" and "Executions."	2026-02-20 11:34:25 +01:00
Roland Tannous	cdd1f7fce2	Merge pull request #166 from unslothai/feat/download-progress-indicator feat: add download progress indicators for dataset preview and training overlay	2026-02-20 14:11:07 +04:00
Roland Tannous	168055a267	Merge pull request #172 from unslothai/fix/training-param Added optim and lr_scheduler_type in the frontend	2026-02-20 14:05:33 +04:00
Shine1i	763001b78e	refactor: remove Jinja autocomplete components and simplify variable handling - Deleted `jinja-ref-autocomplete` components and related hooks. - Replaced custom Jinja variable autocomplete with standard `Textarea` and `Input` components. - Streamlined variable handling logic by replacing `getAvailableRefItems` with `getAvailableVariables`. - Removed unused state (`flowMoving`) and redundant logic tied to Jinja-specific functionality.	2026-02-20 10:36:51 +01:00
samit	3d403c6c99	edited the font of the new parameters	2026-02-20 01:29:37 -08:00
Wasim Yousef Said	6dd0e11439	Merge branch 'nightly' into feature/canvas-lab	2026-02-20 01:23:44 -08:00
Roland Tannous	f61392cfb3	Merge pull request #173 from unslothai/fix/hf-token-model-hiding added hf token validation	2026-02-20 13:19:31 +04:00
samit	5ba8edf9fe	added the copy on mac	2026-02-20 00:49:38 -08:00
Roland Tannous	a8c1f5fa84	added NODE OPTIONS export, updating npm to 2.2.6	2026-02-20 08:27:09 +00:00
samit	68028bf7f3	added lr_scheduler type to the frontend	2026-02-19 23:31:46 -08:00
Roland Tannous	fa555c72e8	Merge pull request #175 from unslothai/fix/changing-num-proc-for-filtering Fix/changing num proc for filtering	2026-02-20 10:39:11 +04:00
Roland Tannous	3c1eff525f	Merge pull request #171 from unslothai/fix/compare-mode-race-and-adapter-toggle Fixing compare feature	2026-02-20 10:38:56 +04:00
Manan17	798bfb8f6f	Setting it to total cpu_count // 4	2026-02-20 06:32:01 +00:00
samit	035f765130	added hf token validation	2026-02-19 21:05:27 -08:00
samit	93b31f0db2	added optim in the frontend	2026-02-19 15:29:23 -08:00
Manan17	fdeccec259	Fixing compare feature	2026-02-19 20:15:44 +00:00
Roland Tannous	b6799a21a5	Merge pull request #169 from unslothai/feature/backwards-compatibility-unsloth-ui-command Add `unsloth-ui` alias for backwards compatibility	2026-02-19 22:44:41 +04:00
Roland Tannous	353cb13618	add unsloth-ui shell alias for backwards compatibility alongside unsloth-studio	2026-02-19 18:41:10 +00:00
Roland Tannous	53820f0eed	Merge pull request #168 from unslothai/fix/cli-studio-command-update Update `studio` CLI command to use new FastAPI backend	2026-02-19 22:17:37 +04:00
Roland Tannous	d486a2ef6f	rename cli studio command to use new FastAPI backend and add unsloth-ui alias for backwards compatibility	2026-02-19 17:59:22 +00:00
Daniel Han	3bddfed117	Patch trunc_normal_ for low-precision stability (#4027 ) * Fix low-precision trunc_normal initialization instability * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Document TorchTitan trunc_normal low-precision failure mode * Fix trunc_normal generator positional compatibility * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix trunc_normal generator TypeError fallback --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-19 04:40:14 -08:00
Daniel van Strien	8165266a37	Add optional datasets metadata support to save/push functions (#4076 ) * Add `datasets` metadata support to model cards Add an optional `datasets` parameter to all save/push functions so users can specify which datasets were used for training. The metadata is set via `ModelCard.data.datasets` for standard paths and via `metadata_update` for GGUF and generic save paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix datasets metadata for existing repos, add token, improve errors - Add metadata_update fallback in create_huggingface_repo and upload_to_huggingface so datasets metadata is set even when the repo already exists (previously only worked on first creation). - Pass token=token to all metadata_update calls so they work without a global HF login. - Replace silent except:pass with logger.warning_once for metadata failures so users know if something went wrong. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix generic datasets metadata repo resolution for PR #4076 * Fix create_huggingface_repo username resolution for PR #4076 --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-19 03:53:35 -08:00
Roland Tannous	f106ca5ed4	Merge pull request #167 from unslothai/fix/train_on_completions-dataset-check-improvements Simplify dataset check to 2-tier, improve multimodal detection, auto-set trainOnCompletions, recheck dataset on reload	2026-02-19 15:35:50 +04:00
Roland Tannous	18c41c2b08	Simplify dataset check to 2-tier, improve multimodal detection, auto-set trainOnCompletions, recheck dataset on reload	2026-02-19 11:25:54 +00:00
imagineer99	ee6d33fa32	feat: add download progress indicators for dataset preview and training overlay	2026-02-19 08:22:23 +00:00
Roland Tannous	21ee32df40	Auto-set trainOnCompletions based on vision/multimodal state, default all model configs to true, and re-fetch model defaults on page reload	2026-02-19 06:40:40 +00:00
Roland Tannous	e2b7b4b54c	change train_on_completions to true	2026-02-19 06:02:16 +00:00
Roland Tannous	9f1caa1ceb	Revert "Setting default for the train on responses only" This reverts commit `982a855df7`.	2026-02-19 09:55:21 +04:00
Roland Tannous	b35798c8ff	Merge pull request #163 from unslothai/fix/add-vision-and-dataset-to-local Fix/add vision and dataset to local	2026-02-19 09:46:54 +04:00
Roland Tannous	35194e4382	Merge pull request #161 from unslothai/fix/vision-model-detection Fixing Vision model detection	2026-02-19 09:43:30 +04:00
Manan17	b7b4cbc949	Adding vision and multimodal dataset to the localstorage	2026-02-19 05:06:41 +00:00
Manan17	982a855df7	Setting default for the train on responses only	2026-02-19 02:59:48 +00:00
Manan17	56869c63bd	Passing use_auth = True and also having different checks which is missed by the is_vision function	2026-02-19 02:55:46 +00:00
Kaitao Yang	fd38dc96c3	reduce code duplication by inheriting from LlamaRotaryEmbedding (#3878 ) * simplify_code_using_apply_time_scaling * modify LlamaRotaryEmbedding for better inheritance * reduce_code_duplication_LlamaExtendedRotaryEmbedding	2026-02-18 19:13:33 -06:00
Roland Tannous	559ba976bc	Merge pull request #159 from unslothai/fix/reduce_dataset_num_proc_25_pct Fix/reduce dataset num proc 25 pct	2026-02-19 00:55:02 +04:00
Roland Tannous	adc0c78dbc	reduce dataset_num_proc to 1/4 of cpu_count	2026-02-18 20:53:32 +00:00
Roland Tannous	6aceaec323	Merge pull request #155 from unslothai/fix/sft-tokenizer-unwrap-for-vlm-text fix: Unwrap ProcessorMixin to raw tokenizer for text-only SFTTrainer on VLM-architecture models	2026-02-18 23:19:57 +04:00
Roland Tannous	a5529fbb0e	fix: unwrap ProcessorMixin to raw tokenizer for text-only SFTTrainer on VLM-architecture models	2026-02-18 19:16:45 +00:00
Roland Tannous	e33920974b	fix: unwrap ProcessorMixin to raw tokenizer for text-only SFTTrainer on VLM-architecture models	2026-02-18 19:13:20 +00:00
Roland Tannous	a243ea411d	Merge pull request #154 from unslothai/feature/colab-notebook Add Google Colab Support for Unsloth Studio	2026-02-18 23:02:22 +04:00
Roland Tannous	a6e2fa5b3a	Merge remote-tracking branch 'origin/nightly' into fix/dataset-mapping-vlm-text-datasets	2026-02-18 18:11:45 +04:00
Roland Tannous	940328ce1e	Merge remote-tracking branch 'origin/nightly' into feature/colab-notebook	2026-02-18 17:41:07 +04:00
Roland Tannous	fb556d9a2c	Merge branch 'fix/sm_120-flex-attention-temp-disable' into nightly renamed UNSLOTH_FLASH_ATTENTION to UNSLOTH_ENABLE_FLASH_ATTENTION to match actual environment variable in unsloth	2026-02-18 09:24:52 +00:00
Roland Tannous	d57b2742ab	renamed UNSLOTH_FLEX_ATTENTION to UNSLOTH_ENABLE_FLEX_ATTENTION	2026-02-18 09:21:53 +00:00
Roland Tannous	ee5236228d	Merge pull request #152 from unslothai/fix/sm_120-flex-attention-temp-disable Disable flex attention on Blackwell+ GPUs (sm_120+) at startup	2026-02-18 13:00:43 +04:00
Roland Tannous	5a02ed4f0f	Disable flex attention on Blackwell+ GPUs (sm_120+) at startup	2026-02-18 08:58:25 +00:00
Roland Tannous	346d7cd13a	Merge pull request #151 from unslothai/fix/trainer-hang-resource-cleanup Fix/trainer hang resource cleanup	2026-02-18 12:39:40 +04:00
Roland Tannous	d69431fa57	Scale dataset num_proc dynamically to cpu_count//3 instead of hardcap 8	2026-02-18 08:38:53 +00:00
Manan17	76cd1dc24c	fixing the hangup of training after multiple back to back training processes	2026-02-18 08:18:13 +00:00
Manan17	c37bf686a6	Dividing the total cpu_count // 3	2026-02-18 07:59:57 +00:00
Roland Tannous	14edb08cf5	Merge pull request #148 from unslothai/fix/linear linear fix	2026-02-18 11:10:16 +04:00
Manan17	58116e7e7a	fix the linear path on backend	2026-02-18 07:08:32 +00:00
Roland Tannous	d2f7eaf085	Merge pull request #145 from unslothai/fix/check-format-sample fix: stream HF datasets in check-format endpoint to avoid full d…	2026-02-18 10:49:17 +04:00
Roland Tannous	713d4a6ddb	Merge pull request #146 from unslothai/feature/fix-cuda-fork-deadlock fix: cap dataset.map() num_proc to 8 to prevent CUDA fork deadlocks	2026-02-18 10:47:46 +04:00
Lee Jackson	73c7dbec21	Merge pull request #147 from unslothai/feat/disable-navbar-training feat: disable navbar navigation while training is active	2026-02-18 01:37:11 +00:00
imagineer99	064cd56a21	feat: disable navbar navigation while training is active Disable Export and Chat nav items (desktop + mobile) when isTrainingRunning is true, keeping only Studio clickable.	2026-02-18 01:30:13 +00:00
Roland Tannous	d7853efd21	debug statements	2026-02-18 00:37:09 +00:00
Roland Tannous	0d8b67b706	fix: normalize target_modules [all-linear] list to string for Unsloth/PEFT compatibility	2026-02-18 00:32:21 +00:00
Manan17	42ee6178ae	linear fix	2026-02-18 00:11:27 +00:00
Roland Tannous	6f1b782172	Update README.md	2026-02-18 03:58:44 +04:00
Roland Tannous	d2332622d1	fix: defensively rename VLM chat column to match model's forward() signature	2026-02-17 23:49:13 +00:00
Roland Tannous	3d0d1c7020	fix: cap dataset.map() num_proc to 8 to prevent CUDA fork deadlocks	2026-02-17 23:12:45 +00:00
Roland Tannous	5864dece26	fix: stream HF datasets in check-format endpoint to avoid full downloads; add info logging to model config endpoints	2026-02-17 22:53:29 +00:00
Roland Tannous	63196042ea	Merge pull request #144 from unslothai/fix/remove-hardcoded-port-in-dataset-preview fix: remove hardcoded port from dataset preview error message	2026-02-18 02:44:17 +04:00
imagineer99	e3fb46a4c6	fix: remove hardcoded port from dataset preview error message	2026-02-17 22:37:52 +00:00
Shine1i	dc0cec772d	feat: enhance training stop and reset flow with detailed checks	2026-02-17 23:32:22 +01:00
Leo Borcherding	3e3c315b01	Merge nightly into feature/colab-notebook - resolved setup.sh conflicts	2026-02-17 15:51:14 -06:00
Roland Tannous	5fbcd682b7	update gitignore	2026-02-17 21:45:05 +00:00
Leo Borcherding	7818e3efc8	Add GPU check as first cell in notebook	2026-02-17 15:29:29 -06:00
Wasim Yousef Said	765e1cfee2	Merge pull request #143 from unslothai/feature/local-models feat: add schemas for local model discovery and listing	2026-02-17 13:10:04 -08:00
Shine1i	972cde7971	feat: add schemas for local model discovery and listing	2026-02-17 21:53:42 +01:00
Roland Tannous	f88f0bc047	Merge pull request #142 from unslothai/feature/wsl-gguf-sudo-fix fix: skip sudo check on WSL during GGUF export to prevent password pr…	2026-02-18 00:46:15 +04:00
Wasim Yousef Said	43fac18d40	Merge pull request #141 from unslothai/feature/uxui-heuristics style: improve layout consistency and responsiveness across components	2026-02-17 12:34:37 -08:00
Shine1i	68f0321404	style: improve layout consistency and responsiveness across components - Adjusted padding, spacing, and grid configurations for better alignment and scaling across screen sizes. - Enhanced mobile responsiveness by updating flex and grid layouts, ensuring optimal display on smaller devices. - Tuned container dimensions and card styling to maintain design consistency.	2026-02-17 21:27:02 +01:00
Leo Borcherding	7cac94c930	Skip venv creation in Colab, install packages directly	2026-02-17 14:21:56 -06:00
Wasim Yousef Said	d9869dece1	Merge pull request #140 from unslothai/feature/uxui-heuristics ux: improve training-to-chat flow, param defaults UX, and guided onboarding/export polish	2026-02-17 12:10:27 -08:00
Shine1i	9c353f7bc2	refactor: wrap splash screen content in a card for improved layout and consistency	2026-02-17 21:08:45 +01:00
Leo Borcherding	25bf39c1e6	Detect Colab environment and upgrade npm directly instead of using nvm	2026-02-17 13:57:45 -06:00
Shine1i	08437def80	feat: add session-based storage for chat training comparison handoff - Implemented a utility to manage `training-compare-handoff` data in `sessionStorage` with strict validation and expiration logic. - Added methods to set, retrieve, and clear handoff data for improved chat training flow.	2026-02-17 20:43:08 +01:00
Shine1i	a0235025af	fix chat compare handoff: auto-load trained lora, stop refresh loop, add debug logs	2026-02-17 20:42:43 +01:00
Roland Tannous	c9fdce63e7	fix: skip sudo check on WSL during GGUF export to prevent password prompt hang	2026-02-17 19:30:02 +00:00
Leo Borcherding	057f3b9628	Remove unnecessary if statement for token check	2026-02-17 13:27:05 -06:00
Shine1i	ff458a36e9	feat: enhance training flow with new runtime hints, adjustable steps/epochs - Added halfway/completed training hints with actionable links. - Introduced sliders for adjusting max steps and epochs dynamically. - Refined tooltip explanations for configuration parameters. - Enabled custom overlay styling for `AlertDialogContent`.	2026-02-17 20:08:02 +01:00
Roland Tannous	41ad9fc3d9	Merge pull request #139 from unslothai/fix/move-backend-requirements move requirements/ to studio/backend/ and update paths in setup.sh	2026-02-17 23:04:44 +04:00
Roland Tannous	39b072d2ee	move requirements/ to studio/backend/ and update paths in setup.sh	2026-02-17 19:02:25 +00:00
Leo Borcherding	f48d815797	Clone feature/colab-notebook branch for colab.py	2026-02-17 13:01:16 -06:00
Leo Borcherding	2c5e0ce606	Simplify notebook to use existing setup.sh script	2026-02-17 12:54:07 -06:00
Roland Tannous	c13e1594de	Merge pull request #138 from unslothai/fix/reset-max-steps-epoch-defaults Override Model Defaults for num_epochs and max_steps	2026-02-17 22:51:40 +04:00
Roland Tannous	299bc65e36	chore: override model defaults to use max_steps=30, save_steps=30, num_epochs=0 for testing	2026-02-17 18:46:23 +00:00
Wasim Yousef Said	7874fa9066	Merge pull request #137 from unslothai/feature/chart-fixes refactor: fix training charts: sticky full-window follow latest + grad line render + stopped metric fallback	2026-02-17 10:41:01 -08:00
Shine1i	4e2b567c29	refactor: simplify chart view logic by removing pan controls and enhancing window size handling	2026-02-17 19:32:32 +01:00
Shine1i	2d90210a69	feat: add reusable chart components for training metrics visualization - Introduced `EvalLossChartCard`, `GradNormChartCard`, `LearningRateChartCard`, and `TrainingLossChartCard` components. - Implemented shared chart settings via `SharedChartSettings` to manage scale, outliers, and view configuration. - Added utilities for metrics formatting, step tick generation, data compression, and smoothing (`utils.ts`). - Created types and structures for chart data handling (`types.ts`).	2026-02-17 19:10:24 +01:00
Roland Tannous	7307e08bff	Merge pull request #130 from unslothai/integrate/exports-page Integrate/exports page	2026-02-17 22:10:10 +04:00
Roland Tannous	b8171a86ac	Merge branch 'nightly' into integrate/exports-page	2026-02-17 22:09:36 +04:00
Roland Tannous	e818d97f24	Merge pull request #136 from unslothai/setup/update-setup-sh-dependencies setup.sh: Replace inline pip installs with pinned requirements files	2026-02-17 22:04:03 +04:00
Roland Tannous	caa28e5d46	replace with patch from merged PR in unsloth-zoo	2026-02-17 17:55:05 +00:00
Wasim Yousef Said	e03ed04278	Merge pull request #135 from unslothai/feature/chart-fixes feat: integrate gradient norm tracking in training runtime and metrics	2026-02-17 09:49:29 -08:00
Shine1i	0be3e6f525	feat: integrate gradient norm tracking in training runtime and metrics - Enhanced chart logic to filter and visualize finite gradient norm values.	2026-02-17 18:26:59 +01:00
Wasim Yousef Said	91b67f197e	Merge pull request #134 from unslothai/feature/model-configs feat: Apply model-config defaults in onboarding + studio	2026-02-17 09:08:21 -08:00
Shine1i	7203766fac	refactor: streamline vision model detection and improve state persistence logic - Removed redundant vision-check controllers. - Added `NON_PERSISTED_STATE_KEYS` to manage persisted training state. - Introduced `partializePersistedState` for cleaner state filtering.	2026-02-17 18:01:45 +01:00
Shine1i	3badf6649c	feat: add default model configuration mapping and auto-apply logic - Implemented backend model configuration mapping to training state. - Added auto-apply logic for default configurations when models are selected. - Introduced utilities for type conversion and validation within training configuration.	2026-02-17 17:58:36 +01:00
Michael Han	ac70db5556	Update README Install.md Updating to include new installation links	2026-02-17 07:23:31 -08:00
Roland Tannous	e803e13d3e	add full dependency chain for unsloth + unsloth-extras	2026-02-17 14:21:18 +00:00
Leo Borcherding	1530d1c165	fix: Add GitHub authentication cell for private repo - Add cell 1 for token input (getpass) - Update clone command to use token from environment - Now 3 cells: auth, setup, start	2026-02-17 05:07:46 -06:00
Leo Borcherding	bebb45d847	fix: Clean up notebook to just 2 cells Remove all the overcomplicated markdown and extra cells. Now it's exactly like the POC: setup and start only.	2026-02-17 05:03:59 -06:00
Leo Borcherding	17df5bf3bf	feat: Add simple 2-cell Colab notebook (no tunnel needed) - Create studio/backend/colab.py using Colab's built-in proxy - Uses google.colab.kernel.proxyPort() for URL (no cloudflare) - Shows nice clickable link with IPython.display.HTML - Notebook has just 2 cells: setup and start - Much simpler than external tunneling approach	2026-02-17 04:57:30 -06:00
Roland Tannous	8054449606	Remove exports/ from tracking	2026-02-17 07:57:43 +00:00
Roland Tannous	d97e23dfd3	added llama.cpp python dependencies to setup.sh	2026-02-17 07:56:27 +00:00
Roland Tannous	8a4a1554b0	fix: vite build fail - suppress unused estimatedSize prop in export dialog	2026-02-17 07:21:23 +00:00
Roland Tannous	bbd7d6d122	Merge pull request #129 from unslothai/fix/adding-meta-data-for-checkpointing-api Adding metadata for checkpoints	2026-02-17 11:17:24 +04:00
Lee Jackson	d5b6a69b05	Merge pull request #132 from unslothai/fix/dataset-check-format-subset-param fix: subset param name	2026-02-17 05:57:05 +00:00
imagineer99	e1fbccfc57	fix: subset param name	2026-02-17 05:53:21 +00:00
pre-commit-ci[bot]	42f5a02f06	[pre-commit.ci] pre-commit autoupdate (#4072) updates: - [github.com/astral-sh/ruff-pre-commit: v0.15.0 → v0.15.1](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.0...v0.15.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-16 21:19:45 -08:00
Manan17	6f9dd90d56	Integration of the api with the EXPORT page with UI changes	2026-02-17 00:34:48 +00:00
Manan17	c7b7ecab4f	Adding metadata for checkpoints	2026-02-16 23:46:17 +00:00
Wasim Yousef Said	7b8220598e	Merge pull request #124 from unslothai/feature/bug-fixes feat: support disabling top-k sampling with -1 and standardize normalization	2026-02-16 13:21:12 -08:00
Roland Tannous	a6cefadbf7	Merge pull request #122 from unslothai/feature/eval-split-auto-detection [Feature] evaluation during training	2026-02-17 01:13:46 +04:00
Roland Tannous	ff0aec180a	Merge branch 'nightly' into feature/eval-split-auto-detection	2026-02-17 01:11:30 +04:00
Roland Tannous	369486246e	Merge pull request #121 from unslothai/fix/vision-model-fixes fix: Vision model detection and model-dataset compatibility	2026-02-17 00:39:58 +04:00
Shine1i	0db7da96cc	feat: support disabling top-k sampling with -1 and standardize normalization logic - Updated top-k parameter range to accept -1 in models and frontend. - Added utility to normalize top-k for backend compatibility.	2026-02-16 21:33:24 +01:00
Roland Tannous	fa0ca59215	feat: auto-detect model+dataset compatibility to select VLM vs LLM training path	2026-02-16 19:18:49 +00:00
Roland Tannous	09f3b6bce5	feat(frontend): auto-detect vision models via backend, separate search filter from model classification	2026-02-16 18:24:44 +00:00
Roland Tannous	43d84d7143	Merge pull request #120 from unslothai/fix/back-on-complete fix: show Back to configuration breadcrumb when training completes	2026-02-16 21:38:40 +04:00
Roland Tannous	f4f2f50364	fix: show Back to configuration breadcrumb when training completes	2026-02-16 17:34:52 +00:00
Roland Tannous	fc36696a52	Merge pull request #119 from unslothai/feature/base-model-chat-template-handling fix: Apply default chat template for base models without tokenizer chat_template	2026-02-16 20:52:17 +04:00
Roland Tannous	b32ad350c5	feat: apply default chat template for base models without tokenizer chat_template	2026-02-16 15:56:06 +00:00
Roland Tannous	5df3a0b250	feat: add eval_enabled flag and format-first-then-split for eval dataset	2026-02-16 14:13:55 +00:00
Roland Tannous	0aea3f149d	feat: add eval split auto-detection, eval_steps hyperparam, and eval_loss chart integration	2026-02-16 13:51:10 +00:00
Roland Tannous	37452d56cf	feat: add eval split auto-detection, eval_steps hyperparam, and eval_loss chart integration	2026-02-16 13:38:54 +00:00
Roland Tannous	321096f0cb	Merge pull request #118 from unslothai/feature/onboarding-hardware-info feat(onboarding): hook system info to live /api/system…	2026-02-16 16:29:22 +04:00
Roland Tannous	883ae7ff8c	feat(onboarding): replace hardcoded system info with live /api/system/hardware data	2026-02-16 12:26:49 +00:00
Roland Tannous	c18ccbb773	Merge pull request #116 from unslothai/feature/gpu-monitor-training feat: add live GPU monitor with nvidia-smi polling during training	2026-02-16 15:52:48 +04:00
Roland Tannous	a0ebd9183a	feat: add live GPU monitor with nvidia-smi polling during training	2026-02-16 11:47:43 +00:00
Roland Tannous	3cc2a8f326	Merge pull request #115 from unslothai/feature/api-hardware-info feat: add GET /api/system/hardware endpoint for GPU info and package versions	2026-02-16 14:35:06 +04:00
Roland Tannous	b20d50e8d0	feat: add GET /api/system/hardware endpoint for GPU info and package versions	2026-02-16 10:29:21 +00:00
Roland Tannous	da356afd33	Merge pull request #114 from unslothai/feature/checkpoint-loss-in-api feat: include training loss per checkpoint in API response	2026-02-16 13:53:53 +04:00
Roland Tannous	fd49c56481	feat: include training loss per checkpoint in /api/models/checkpoints response	2026-02-16 09:50:28 +00:00
Roland Tannous	d4f2fc8a8f	Merge pull request #113 from unslothai/feature/refactor-checkpoints-pull-endpoint Refactor: Move checkpoint scanning to models domain	2026-02-16 13:34:08 +04:00
Roland Tannous	f0298edeb8	refactor: move checkpoint scanning to utils/models and /checkpoints endpoint to models router	2026-02-16 09:32:11 +00:00
Datta Nimmaturi	f3b5090f24	[Feat] FP8 per tensor quant support (#4043) * FP8 per tensor quant support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-16 01:21:30 -08:00
Shine1i	fc71548a31	feat: add index route with auth guard and redirect logic	2026-02-16 09:57:41 +01:00
Roland Tannous	6ecc03485d	Merge pull request #97 from unslothai/fix/progress-metics Resolved the progress metrics	2026-02-16 11:55:01 +04:00
sshah229	63b34660ed	modified the num_tokens logic	2026-02-16 00:38:40 -07:00
Daniel Han	0212f7f7df	Fix regressions from security PRs #4042, #4044, and #4045 (#4062) * Fix security-regression fallout in chat templates and PDL patching * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Drop security regression test files from PR scope * Apply suggestion from @danielhanchen --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-15 23:16:17 -08:00
Daniel Han	be77c66a84	Add reinstall command to broken vLLM warning (#4070) * Add vLLM reinstall command to broken-extension warning * Apply suggestion from @danielhanchen --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-15 23:02:12 -08:00
Roland Tannous	e02c6e4caf	Merge pull request #109 from unslothai/feat/backend-generation-implement-min-p feat: add `min_p` sampling parameter to `/chat/completions` generation pipeline	2026-02-16 10:54:15 +04:00
Roland Tannous	909955767b	feat: add min_p sampling parameter to /chat/completions generation pipeline	2026-02-16 06:33:17 +00:00
Wasim Yousef Said	c1d2ed1449	Merge pull request #108 from unslothai/feature/inference-params feat(chat): apply recommended inference params on model load	2026-02-15 22:19:48 -08:00
Shine1i	96c30b6c38	feat: add inference parameter merging for model loading and runtime updates	2026-02-16 07:14:53 +01:00
Daniel Han	5f81ac8964	Guard optional vLLM imports when extension is broken (#4068) * Guard optional vLLM imports when extension is broken * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove vLLM import guard tests from PR scope * Block broken vLLM imports like causal_conv1d --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-15 22:09:29 -08:00
Lee Jackson	d0be326f96	Merge pull request #107 from unslothai/feat/chat-min-p feat: add min_p inference parameter to chat page	2026-02-16 05:49:10 +00:00
imagineer99	36703c46a8	feat: add min_p inference parameter to chat page	2026-02-16 05:41:36 +00:00
Roland Tannous	9cc8f02818	Merge pull request #106 from unslothai/fix/adding-checkpointing-data-to-api Fixing the get checkpoint api	2026-02-16 09:13:57 +04:00
Manan17	19276ae60b	Fixing the get checkpoint api	2026-02-16 04:47:28 +00:00
Roland Tannous	84427bec9e	Merge pull request #105 from unslothai/feature/setup-shell-improvements Auto-detect Python version & shell RC file in setup script	2026-02-16 08:15:58 +04:00
Roland Tannous	af4c145096	setup: auto-detect best Python ≤3.12 and write alias to user's default shell rc file	2026-02-16 04:10:05 +00:00
Roland Tannous	d064fafcd8	Merge pull request #104 from unslothai/feature/support-dataset-configs-splits dataset `subset`/`split` params from API routes through to `load_dataset` calls	2026-02-16 07:59:09 +04:00
Roland Tannous	d0964652af	feat: thread dataset subset/split params from API routes through to load_dataset calls	2026-02-16 03:56:22 +00:00
Daniel Han	61c8ea6342	Add torchvision upgrade hint to mismatch ImportError (#4067) Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-15 19:36:16 -08:00
Daniel Han	ec80fd3f66	Raise ImportError on stable torch/torchvision mismatch (#4065) * Raise ImportError for stable torchvision mismatches * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove torchvision compatibility tests from PR scope --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-15 19:14:19 -08:00
Wasim Yousef Said	dff074869c	Merge pull request #103 from unslothai/feature/format-mapping feat: dataset manual mapping (2-col)	2026-02-15 17:10:50 -08:00
Shine1i	d0bcf02784	feat: lock model selector during guided tour and refine open state logic	2026-02-16 02:05:53 +01:00
Shine1i	541179ba98	merge: nightly	2026-02-16 02:02:40 +01:00
Shine1i	1a90741046	refactor: create utilities for dataset manual mapping and improve UI logic consistency in dataset preview dialog	2026-02-16 01:54:09 +01:00
Shine1i	cc6cb00d56	fix: enforce unique input/output mapping and enhance UI feedback in dataset preview dialog	2026-02-16 01:47:28 +01:00
Shine1i	4fd4d2fe76	feat: implement dataset mapping UI and preview dialog	2026-02-16 01:44:26 +01:00
Wasim Yousef Said	34483f2325	Merge pull request #102 from unslothai/feature/guided-tour-p2 feat: Guided tours p2: per-page + navbar trigger	2026-02-15 16:10:07 -08:00
Shine1i	7b20b4848e	feat: enable conditional confetti in guided tour and update navbar icon	2026-02-16 01:04:54 +01:00
Shine1i	baf47ab332	chore: text update	2026-02-16 00:58:46 +01:00
Shine1i	448aaa13cb	feat: improve guided tour descriptions and add sidebar state management - Updated step descriptions across Studio, Chat, and Export tours for better clarity. - Added `openSidebar` state management function and integrated it into the tour logic. - Improved target detection in guided tours with retry logic for better handling of unavailable elements.	2026-02-16 00:55:13 +01:00
Shine1i	362e679b66	feat: implement guided tours and refactor model selector components	2026-02-15 23:43:10 +01:00
Shine1i	28d42794cd	gitignore: .omx	2026-02-15 23:00:01 +01:00
Wasim Yousef Said	ed583f2b78	Merge pull request #101 from unslothai/feature/guided-tour feat: studio: guided tour (1st visit, skippable)	2026-02-15 12:33:07 -08:00
Wasim Yousef Said	28364f3314	Merge pull request #100 from unslothai/feature/chat-compare feat: chat compare + inference stream cancel fix	2026-02-15 12:29:54 -08:00
Shine1i	923371d637	setup: nightly	2026-02-15 21:26:11 +01:00
Shine1i	b8d8150201	revert setup.sh	2026-02-15 21:24:30 +01:00
Wasim Yousef Said	a9ec5d894c	Merge pull request #98 from unslothai/feat/dataset-config-splits feat: check dataset configs and splits before hitting check-format	2026-02-15 12:17:22 -08:00
Wasim Yousef Said	b071ae9c63	Merge pull request #99 from unslothai/feature/vram-estimation feat: VRAM-based model filtering in frontend	2026-02-15 12:17:07 -08:00
Shine1i	3dce0d475d	rm vitest	2026-02-15 21:13:15 +01:00
Shine1i	51218702c2	cfg->subset	2026-02-15 21:09:45 +01:00
Shine1i	ef9e5ffc33	fix hf cfg/split ui	2026-02-15 20:53:59 +01:00
Shine1i	e341656d45	feat: add confetti fireworks effect on tour completion	2026-02-15 20:25:23 +01:00
Shine1i	a14e3feed9	refactor: reorganize tour steps and relocate `ReadMore` component for improved structure	2026-02-15 20:22:17 +01:00
Shine1i	a579fe4eea	refactor: define explicit prop types for GuidedTour and SpotlightOverlay components	2026-02-15 20:17:28 +01:00
imagineer99	f285b5379a	feat: VRAM-based model filtering in frontend	2026-02-15 19:12:28 +00:00
Shine1i	87cf3b834a	refactor: extract tour utils into separate modules for cleaner structure	2026-02-15 20:11:28 +01:00
Shine1i	080b196dfc	tour readmore	2026-02-15 19:58:10 +01:00
Shine1i	2b6229f049	tour steps split	2026-02-15 19:57:07 +01:00
Shine1i	ddffa2bbdc	rm shine + perf	2026-02-15 19:41:15 +01:00
Shine1i	0db321ab85	tour light card	2026-02-15 19:31:49 +01:00
imagineer99	eb2ba90a03	feat: check dataset configs and splits before hitting check-format	2026-02-15 18:30:54 +00:00
Shine1i	ea36112a05	feat: add guided tour component and integrate with Studio UI elements	2026-02-15 19:20:37 +01:00
Shine1i	1eb07f6ad2	refactor: streamline chat runtime logic and remove warming indicator - Replaced `setThreadWarming` logic with streamlined token settlement functions (`settleFirstTokenOk` and `settleFirstTokenErr`) for improved readability and reliability. - Simplified model loading/unloading functions with reusable `performLoad` and `performUnload` patterns. - Removed `warmingByThreadId` from runtime store and associated code for reduced complexity. - Enhanced title generation flow by consolidating logic for persisting and streaming titles.	2026-02-15 18:59:03 +01:00
Shine1i	2e9f756ca6	feat: improve model loading/unloading UX and remove `WarmupIndicator` from thread UI - Refactored loading/unloading logic to provide detailed toast notifications with statuses (loading, success, error). - Removed unused `WarmupIndicator` component from thread UI to simplify interface. - Introduced better error handling for model refresh and inference tasks.	2026-02-15 18:48:02 +01:00
Shine1i	e59ef60a5c	feat: conditionally render "Compare" button based on active checkpoint and lora selection, improve fallback title generation, and update default settings	2026-02-15 18:38:25 +01:00
Shine1i	571959e383	feat: add cancelation support for chat generation and streaming tasks	2026-02-15 18:23:27 +01:00
Shine1i	9a4f71c939	chore: remove unused `ComponentExample` and associated imports and auto title generate	2026-02-15 18:08:46 +01:00
Shine1i	b83eeab603	fix: ensure consistent message order in chat runtime by improving sort logic and adding fallback for createdAt	2026-02-15 17:34:32 +01:00
Shine1i	8529f89a75	fix lora: outputs path local	2026-02-15 16:58:24 +01:00
Shine1i	2ffdd59925	chat compare: send use_adapter	2026-02-15 16:44:14 +01:00
Shine1i	b1258bed0d	wip setup: py312	2026-02-15 16:36:53 +01:00
Shine1i	184b786116	fix setup: fish alias, venv no activate	2026-02-15 16:27:44 +01:00
Shine1i	43c66da783	merge nightly	2026-02-15 14:52:07 +01:00
Shine1i	992e60266f	chore: ignore frontend .omx	2026-02-15 14:50:26 +01:00
Shine1i	485f174202	feat: add support for event replay and resume in job events API, improve SSE handling, and fix regex patterns in log parsers	2026-02-15 14:49:04 +01:00
Roland Tannous	4dbd77786e	Merge pull request #96 from unslothai/feature/inference-yaml-ordered Added the inference defaults for models	2026-02-15 17:43:31 +04:00
Roland Tannous	6fc52d6535	Merge pull request #94 from unslothai/fix/pass-save-steps-to-trainer Adding save-steps to the SFTConfig	2026-02-15 17:25:48 +04:00
Shine1i	85653237ea	feat: add Data Recipe core functionality with job manager, API routes, and validation services	2026-02-15 13:43:46 +01:00
sshah229	0b1c635b43	resolved the progress metrics	2026-02-15 05:35:32 -07:00
Shine1i	8bd7cfaab1	feat: remove seed inspect/preview and MCP tools fetch support due to backend endpoint deprecation	2026-02-15 12:45:59 +01:00
sshah229	9e50e167d9	added the inference fetching from model mappers	2026-02-15 02:48:53 -07:00
Manan17	6e4cde3bf8	Adding save-steps to the SFTConfig	2026-02-15 09:37:54 +00:00
sshah229	625bc1bbc6	added default inference config for default.yaml	2026-02-15 02:13:24 -07:00
sshah229	238fdc5c4a	added default inference config from unsloth notebooks	2026-02-15 02:13:24 -07:00
sshah229	b5d93adcf2	added configs from Ollama	2026-02-15 02:13:24 -07:00
sshah229	ac20103e54	added inference defaults from unsloth guides	2026-02-15 02:13:24 -07:00
nole69	e3c9482cfb	[FIX] Move loss and n_items to logits device in fast_cross_entropy_loss loss for multi-GPU support (#4063) * bug fix for multi-GPU * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-15 01:09:40 -08:00
Roland Tannous	12db4799b1	Merge pull request #93 from unslothai/docs/add-readme add draft README.md file	2026-02-15 12:51:37 +04:00
Roland Tannous	0bada7e758	README.md draft	2026-02-15 08:49:56 +00:00
Roland Tannous	f453791916	Merge pull request #29 from unslothai/feature/export Added the export routes and pydantic models	2026-02-15 12:42:01 +04:00
Roland Tannous	0f07afcbbc	Merge pull request #91 from unslothai/fix/automatic-signup-if-unauthenticated Auto-redirect to signup/login on stale auth tokens instead of showing 401 spam	2026-02-15 12:30:49 +04:00
Roland Tannous	0d360d74df	fix: auto-redirect to signup/login when auth tokens are stale	2026-02-15 08:27:35 +00:00
Roland Tannous	7b5fc07e87	Merge pull request #89 from unslothai/ux/default-signup-tab-on-first-launch feat: Redirect first-time users to signup page instead of login	2026-02-15 10:45:17 +04:00
Manan Shah	0ec08ff4c1	Merge pull request #88 from unslothai/fix/change-labels-in-data-card Changing labels in dataset card	2026-02-15 00:38:57 -06:00
Manan17	e8f9610122	Changing labels in dataset card	2026-02-15 06:34:39 +00:00
Roland Tannous	46e8dd3fad	feat: redirect first-time users to signup page instead of login	2026-02-15 06:29:52 +00:00
Roland Tannous	fb712f73f4	Merge pull request #85 from unslothai/fix/password-hint-length Fix/password hint length	2026-02-15 10:19:03 +04:00
Roland Tannous	2ed9408b50	Merge pull request #83 from unslothai/fix/training-stuck-multiprocessing-cuda Fixing stuck training processes	2026-02-15 10:18:49 +04:00
Roland Tannous	7411be50a0	Merge pull request #86 from unslothai/fix/index-html-cache-headers Fix: Browser serving stale frontend after rebuild	2026-02-15 10:17:38 +04:00
Daniel Han	084ca10ac2	Silence Apex Aiter RoPE warning unless logging is enabled (#4058) * Silence Apex Aiter RoPE warning unless logging is enabled * Update unsloth/import_fixes.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-14 22:14:05 -08:00
anonymous dev	ba1688c609	[FIX] Move labels to logits device in cross-entropy loss for multi-GPU support (#4041) (#4059) When using device_map='balanced' with multiple GPUs, the labels tensor may reside on a different device than the logits/losses tensors. This causes a RuntimeError at the masked_fill_ call in the chunked cross-entropy forward path. Fix: explicitly move labels to the same device as logits at the start of Fast_CrossEntropyLoss.forward(). This is a no-op on single-GPU setups. Fixes #4041	2026-02-14 22:13:07 -08:00
Manan17	45618fb43c	Adding hint for password length	2026-02-15 06:12:55 +00:00
Roland Tannous	d5faeb058c	Remove test_lora.py from tracking	2026-02-15 06:11:06 +00:00
Roland Tannous	2840efcc08	fix: add no-cache headers to index.html to prevent stale frontend after rebuild	2026-02-15 06:06:27 +00:00
Daniel Han	defcbf8bea	Auto-configure AMDGPU_ASIC_ID_TABLE_PATH on ROCm startup (#4060) * Auto-configure AMDGPU_ASIC_ID_TABLE_PATH on ROCm startup * Remove ROCm fd2 amdgpu.ids noise filter wrappers * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use PyPI bitsandbytes for amd extra to avoid malformed wheel URL * Add amd-preview extra for bitsandbytes continuous wheel channel * Keep amd extra on bitsandbytes>=0.49.1 and remove amd-preview --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-14 21:52:31 -08:00
Manan17	ace86ed0b8	Fixing stuck training processes	2026-02-15 05:40:06 +00:00
Manan17	6ccbc4edce	Fixing stuck training processes	2026-02-15 05:38:06 +00:00
Roland Tannous	2c5a4359c6	Merge pull request #81 from unslothai/feature/early-stop-or-cancel-training-ui feat: early stop or cancel training UI	2026-02-15 08:28:39 +04:00
Manan17	52cbf9b699	feat: UI for cancel or save and stop training	2026-02-15 00:22:28 +00:00
Manan17	97c6a09b84	feat: add cancel or save and stop training	2026-02-15 00:00:22 +00:00
Shine1i	070b9d41a4	feat: enhance sampler builders with datetime unit mapping, uuid format handling, and error reporting	2026-02-14 23:07:11 +01:00
Roland Tannous	1d362a36c3	Merge pull request #79 from unslothai/feat/compare-use-adapter Adapter Toggling for chat compare feature	2026-02-15 00:25:53 +04:00
Roland Tannous	b334e49498	decouple reliance of backend on frontend for is_lora	2026-02-14 20:13:50 +00:00
Roland Tannous	be3934860f	strip extra debug statements	2026-02-14 19:23:51 +00:00
Roland Tannous	3ff3def555	replace model unloading and peft loading mechanism for compare feature	2026-02-14 19:18:49 +00:00
Shine1i	ebc411e508	chore: ignore agent.md	2026-02-14 19:22:07 +01:00
Shine1i	964f7d1548	feat: refactor block definitions and utilities into modular components for enhanced maintainability	2026-02-14 19:13:19 +01:00
Shine1i	f2a00d6e44	feat: add seed dataset support with configuration, preview, and builder utilities	2026-02-14 18:44:38 +01:00
Roland Tannous	e7ae901737	del model.peft_config instead of using model.delete_adapter	2026-02-14 17:32:15 +00:00
Roland Tannous	7d8e991c1f	added print statements for activate_lora_adapter	2026-02-14 17:25:37 +00:00
Roland Tannous	6fefbe9f0b	swapped logger for print statements as logger isn't propagating	2026-02-14 17:21:26 +00:00
Roland Tannous	d0b94eae75	added logging	2026-02-14 17:09:07 +00:00
Roland Tannous	b5c8136957	exclude default from model.delete_adapter	2026-02-14 17:03:52 +00:00
Roland Tannous	35a6e40268	_apply_adapter_state now calls revert_to_base_model and activate_lora_adapter properly	2026-02-14 16:57:24 +00:00
Shine1i	2bd20d7d15	ignore tests	2026-02-14 16:42:58 +01:00
Shine1i	ca30e9f004	merge nightly	2026-02-14 16:42:20 +01:00
Shine1i	d28cc1670b	lock	2026-02-14 16:37:38 +01:00
Shine1i	175fd0459c	feat: add Jinja reference autocomplete components and enhance graph edges styling	2026-02-14 16:30:01 +01:00
Roland Tannous	f67ee58347	feat(inference): add use_adapter field for per-request adapter toggling in compare mode	2026-02-14 14:52:13 +00:00
Daniel Han	842099f2b0	Wrap models import with ROCm amdgpu ids fd2 filter (#4057) Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-14 04:13:25 -08:00
Daniel Han	191cbe55ee	Wrap unsloth_zoo import with HIP amdgpu.ids filter (#4056) * Wrap unsloth_zoo import with HIP amdgpu.ids filter * Refactor ROCm ids filter helpers for readability * Rename ROCm ids filter helper and annotate call sites * Remove obsolete amdgpu ids filter alias * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-14 03:59:57 -08:00
Daniel Han	66db2a1417	Filter only amdgpu.ids fd2 noise during ROCm startup (#4054) Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-14 03:35:41 -08:00
Daniel Han	66b09f2481	Make ROCm suppression detection robust for custom torch builds (#4053) * Make ROCm suppression detection robust for custom torch builds * Add ROCm detection debug logging behind UNSLOTH_ENABLE_LOGGING --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-14 02:59:49 -08:00
金黄色葡萄球君君	dd5ff9dcef	ROCm: Add gfx950 (MI355X/CDNA4) to is_cdna() (#4051) MI355X (gfx950) has the same 1024-thread workgroup limit as MI300X (gfx942), but was missing from is_cdna(), causing all Triton kernels to use num_warps=32 (2048 threads) instead of 16 (1024 threads), resulting in OutOfResources crash. Tested on: 8x AMD Instinct MI355X (gfx950), ROCm 7.1	2026-02-14 02:50:05 -08:00
Daniel Han	6ec46f49a6	Suppress HIP amdgpu.ids stderr noise during causal_conv1d check (#4052) * Suppress HIP libdrm stderr noise in causal_conv1d probe * Broaden HIP libdrm stderr suppression for early ROCm startup --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-14 02:44:34 -08:00
Daniel Han	1a929ce6f1	Simplify MI300X startup banner name (#4049 ) * Improve HIP GPU name reporting in startup banner * Drop MI300X arch suffix in banner name * Normalize _utils.py file mode * Simplify FA2 fallback text and filter AMD ids noise * Strip trailing GPU arch suffix via regex * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use gfx lookup default and normalize Ryzen AI naming * Remove name-path Ryzen AI normalization * Expand ROCm gfx map to full documented GPU name aliases * Simplify HIP fallback naming to AMD gfx token * Remove Ryzen AI torch_name normalization --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-14 02:24:03 -08:00
Wasim Yousef Said	6ed179b459	Merge pull request #78 from unslothai/feature/vision-capabilities-chat feat(chat): add vision image attachments for OpenAI-compatible chat	2026-02-14 02:01:17 -08:00
Shine1i	7611c7122c	feat: add image handling support with Vision adapter and base64 serialization in chat runtime	2026-02-14 10:59:10 +01:00
Roland Tannous	c843a93797	Merge pull request #76 from unslothai/feature/inference-vision-openai-compatible PR: OpenAI-compatible multimodal vision support + true vision streaming	2026-02-14 13:47:39 +04:00
Roland Tannous	418a374125	migrate _generate_vision_response to use TextIteratorStreamer + background thread	2026-02-14 09:30:32 +00:00
Roland Tannous	9de38cb773	feat(inference): accept OpenAI multimodal content parts (image_url) in /chat/completions	2026-02-14 09:06:25 +00:00
Roland Tannous	44d52b4103	Merge pull request #72 from unslothai/fix/sse-progress-timeout Fix: SSE progress stream timeout during training	2026-02-14 09:58:14 +04:00
Roland Tannous	4f0fad2156	fix: increase SSE progress timeout to 30min and allow step-0 updates	2026-02-14 05:47:22 +00:00
Roland Tannous	957e39c50d	Merge pull request #71 from unslothai/fix/frontend-default-path Fix/frontend default path	2026-02-14 09:43:24 +04:00
Daniel Han	d3fcba134b	Improve HIP GPU name detection in startup banner (#4048 ) * Improve HIP GPU name reporting in startup banner * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-13 21:32:34 -08:00
Roland Tannous	414d173624	replace function with alias	2026-02-14 05:28:47 +00:00
Roland Tannous	e09c5dfc7e	fix alias command	2026-02-14 05:24:54 +00:00
Daniel Han	c14917b96e	Handle broken causal_conv1d at import time (#4047 ) * Handle broken causal_conv1d import at runtime Add a startup import-time probe for causal_conv1d and disable the fast path when the shared library is ABI broken. This keeps Falcon H1/model loading resilient without requiring env flags. - Add disable_broken_causal_conv1d in import_fixes. - Invoke it early from unsloth/__init__ during package init. - Make Falcon H1 optional imports in loader and models/__init__ soft-fail instead of failing hard. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Enforce unavailable semantics for broken causal_conv1d * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove Falcon H1 import swallowing * Restore optional Falcon H1 import guard * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove causal_conv1d regression tests * Trim FA2 fallback messaging --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-13 21:20:25 -08:00
Roland Tannous	7260c09268	change line arguments order in setup.sh	2026-02-14 05:13:49 +00:00
Roland Tannous	6b2a777f97	fix path in run_server	2026-02-14 05:06:17 +00:00
Roland Tannous	f07b919385	change default frontend path in run.py to studio/frontend/dist	2026-02-14 05:02:26 +00:00
Michael Han	2a7d098203	Update README with faster MoE.md Adding MoE	2026-02-13 19:38:23 -08:00
Roland Tannous	933048d4f7	Merge pull request #70 from unslothai/feat/wire-custom-format-mapping-to-training feat: wire `custom_format_mapping` through training pipeline	2026-02-14 01:09:35 +04:00
Roland Tannous	67edebfeb3	feat: wire custom_format_mapping through training pipeline to format_and_template_dataset	2026-02-13 21:07:36 +00:00
Roland Tannous	f12c5f61ef	Merge pull request #69 from unslothai/fix/auto-detect-lora-in-model-config fix: auto-detect LoRA adapters for both local and remote HF models in ModelConfig	2026-02-14 00:56:34 +04:00
Roland Tannous	8ce96df66f	fix: auto-detect LoRA adapters for both local and remote HF models in ModelConfig	2026-02-13 20:54:40 +00:00
Roland Tannous	3d33753899	Merge pull request #68 from unslothai/fix/datasets-fix-vlm-detection fix: auto-detect multimodal datasets in /check-format without requiri…	2026-02-14 00:04:51 +04:00
Wasim Yousef Said	bdcaf5518c	Merge pull request #66 from unslothai/ui-fixes UI fixes	2026-02-13 11:29:32 -08:00
Wasim Yousef Said	1f50b5b5f1	Merge pull request #67 from unslothai/style/polish-chat-sidebar-spacing style: polish chat page spacing, typography adjustment, and panel alignment	2026-02-13 11:28:56 -08:00
Manan17	2210545493	fixing padding for titles	2026-02-13 19:22:39 +00:00
imagineer99	59acc087b6	style: polish chat page spacing, small typography, and panel alignment	2026-02-13 19:18:20 +00:00
Manan17	c0fbe7d4a5	Change of font space for title	2026-02-13 19:03:51 +00:00
Roland Tannous	a62a30bd9b	Merge pull request #65 from unslothai/refactor/change-highlighted-text-color Refactor/change highlighted text color	2026-02-13 21:36:08 +04:00
Roland Tannous	adf1ef5ea5	fix: auto-detect multimodal datasets in /check-format without requiring is_vlm flag	2026-02-13 17:29:39 +00:00
Manan17	acc00bd0a4	Changing the highlighted text color to be black while keeping the checkmark emerald	2026-02-13 17:24:57 +00:00
Wasim Yousef Said	6935fc3593	Merge pull request #63 from unslothai/feature/chat-openai-integration feat(chat): integrate backend chat runtime + model load flow	2026-02-13 08:47:44 -08:00
Shine1i	ce7c9917ab	feat: refactor suggestion handling and centralize defaults for thread UI	2026-02-13 17:42:45 +01:00
Shine1i	5fe4258401	feat: add warm-up indicator, new thread feature, and runtime improvements in chat UI	2026-02-13 17:28:01 +01:00
Roland Tannous	fe5a46ede3	Merge pull request #62 from unslothai/feature/add-easydict-addict Feature/add easydict addict	2026-02-13 20:22:46 +04:00
Roland Tannous	965be3f5dc	add easydict and addict to setup file	2026-02-13 16:19:55 +00:00
Shine1i	c9c4463d5d	feat: integrate LoRA model management with UI and runtime synchronization	2026-02-13 17:14:49 +01:00
Shine1i	23d2cfd09d	feat: refactor chat runtime with modular APIs, state management, and runtime synchronization	2026-02-13 16:45:00 +01:00
Wasim Yousef Said	a7c6432ffd	Merge pull request #61 from unslothai/feature/training-frontend-integration training frontend integration + backend sync v2	2026-02-13 05:06:17 -08:00
Shine1i	d58fa17c81	feat: add support for serialized previews in dataset API and improve training initialization logging	2026-02-13 13:47:17 +01:00
Shine1i	b0062535a7	feat: add image previews in dataset dialog, enable popularity sorting in model search, refine training config serialization	2026-02-13 13:17:20 +01:00
Shine1i	b0cb7a7305	Merge remote-tracking branch 'origin/nightly' into feature/training-frontend-integration	2026-02-13 12:41:09 +01:00
Shine1i	a4a997eee6	feat: add training feature with state management, API integration, and runtime synchronization	2026-02-13 12:26:28 +01:00
Shine1i	4e0596c395	wip p1	2026-02-13 11:42:19 +01:00
Roland Tannous	8fdbd05cc2	Merge pull request #51 from unslothai/feature/print-outbound-interface-address Show External IP in Startup Banner	2026-02-13 14:32:39 +04:00
Roland Tannous	5f155010f6	read external ips with fallback to standard notation 0.0.0.0	2026-02-13 10:28:20 +00:00
Roland Tannous	837596a9e7	feat: show external IP in startup banner	2026-02-13 10:23:12 +00:00
Roland Tannous	b10b303f4e	Merge pull request #50 from unslothai/fix/auth-setup-rollback Auth Setup Failure: `auth.db` Created Before Token Generation	2026-02-13 14:14:34 +04:00
Roland Tannous	5602f7ccb4	fix: rollback auth.db user row if token generation fails during setup	2026-02-13 10:11:07 +00:00
Roland Tannous	d201935dae	Merge pull request #49 from unslothai/fix/replace-jwt-with-pyjwt add pyjwt as dependency. remove jwt. fix AttributeError: module 'jwt'…	2026-02-13 14:00:52 +04:00
Roland Tannous	b86503af3f	add pyjwt as dependency. remove jwt. fix AttributeError: module 'jwt' has no attribute 'encode'	2026-02-13 09:58:50 +00:00
Roland Tannous	1ab41f67d8	Merge pull request #48 from unslothai/feature/shorten-setup-alias shorten unsloth-ui alias. auto append frontend dist folder location	2026-02-13 13:57:33 +04:00
Roland Tannous	44cc46bfec	shorten unsloth-ui alias. auto append frontend dist folder location	2026-02-13 09:45:47 +00:00
Roland Tannous	0f61fbff6a	Merge pull request #47 from unslothai/fix/add-jwt-dependency add jwt dependency to setup.sh	2026-02-13 13:40:19 +04:00
Roland Tannous	14c5560ace	add jwt dependency to setup.sh	2026-02-13 09:39:36 +00:00
Roland Tannous	8a2a9030f6	Merge pull request #46 from unslothai/feature/update-setup-file Feature/update setup file	2026-02-13 13:33:05 +04:00
Roland Tannous	d68a2ddd67	chore: suppress verbose output in setup.sh, show errors only	2026-02-13 09:32:17 +00:00
Roland Tannous	8db52c7649	Merge pull request #44 from unslothai/fix/remove-gradio-training refactor: remove gradio dependency from training backend	2026-02-13 13:26:40 +04:00
Roland Tannous	f52bddc23f	refactor: remove gradio dependency from training backend	2026-02-13 09:25:49 +00:00
Roland Tannous	cec51ad6d2	Merge pull request #43 from unslothai/feature/setup-file Add `setup.sh` for automated environment setup	2026-02-13 13:13:38 +04:00
Roland Tannous	96c827020b	chore: add setup.sh for automated environment and frontend build	2026-02-13 09:11:07 +00:00
Roland Tannous	2df07aa224	Merge pull request #42 from unslothai/fix/epoch-type-float Fix: Change `epoch` type from `int` to `float`	2026-02-13 10:53:51 +04:00
Roland Tannous	75f775d088	fix: change epoch type from int to float to match TrainerState	2026-02-13 06:51:55 +00:00
Roland Tannous	c065b05179	Merge pull request #38 from unslothai/fix/model-config-directory Changed the directory for default configs	2026-02-13 09:20:40 +04:00
Wasim Yousef Said	37e8d43cd5	Merge pull request #35 from unslothai/feature/dataset-preview-table feat: add dataset viewer using /check-format endpoint	2026-02-12 21:18:36 -08:00
imagineer99	bc9645cbbf	feat: add dataset preview dialog using /check-format endpoint	2026-02-13 05:04:11 +00:00
sshah229	82be5b237f	fixed the script directory	2026-02-12 21:55:36 -07:00
Roland Tannous	363ffb7d1a	Merge pull request #34 from unslothai/feature/openai-chat-completions PR: OpenAI-Compatible Chat Completions Endpoint	2026-02-12 23:20:52 +04:00
Roland Tannous	8403190cdd	feat: add OpenAI-compatible POST /chat/completions endpoint with streaming and non-streaming support	2026-02-12 19:00:05 +00:00
Roland Tannous	78d2fe5ee3	Merge pull request #33 from unslothai/feature/sse-connection-resilience feat: SSE Connection Resilience	2026-02-12 22:03:20 +04:00
Roland Tannous	509659ba97	feat: add SSE reconnection resilience with spec-compliant event fields, Last-Event-ID resume, and metric_history fallback in /status	2026-02-12 17:58:48 +00:00
Roland Tannous	c6a1e9ca4b	Merge pull request #32 from unslothai/feature/datasets-endpoint-preview-samples Return Raw Preview Samples on Format Detection Failure	2026-02-12 19:53:13 +04:00
Roland Tannous	0dbce96700	untrack package-lock.json and add to gitignore	2026-02-12 15:41:20 +00:00
Roland Tannous	dd71b0f18a	return raw preview samples on format detection failure for manual column mapping	2026-02-12 15:39:56 +00:00
Roland Tannous	e0623cae6c	Merge pull request #31 from unslothai/feature/datasets-endpoint-return-top-10 Optimize `/check-format` to return preview samples	2026-02-12 15:26:13 +04:00
Roland Tannous	4c791bd5aa	feat(datasets): check-format to return preview samples	2026-02-12 11:25:47 +00:00
Daniel Han	08bb85fcda	Create CODEOWNERS (#4039 )	2026-02-12 02:56:13 -08:00
Shine1i	e145a72adb	feat: add builders and components for LLM configuration in Recipe Studio and refactor for readability, preparing for draft	2026-02-12 04:25:15 +01:00
sshah229	ee703dd6c6	added router in main	2026-02-11 18:51:53 -07:00
sshah229	40bfe42974	added the pydantic models and routes for export	2026-02-11 18:34:12 -07:00
Shine1i	30cc509197	feat: implement Data Recipes page feature subfolders for workflow management and saving logic	2026-02-12 02:07:53 +01:00
Shine1i	390e9ed9d2	feat: add Recipe Studio utilities and components for configuring synthetic data pipelines	2026-02-12 01:03:47 +01:00
Shine1i	93f45ffd07	chore: rename Canvas Lab components and utilities	2026-02-12 00:39:08 +01:00
Shine1i	0ed6c141b0	feat: add interactive viewport controls and refactor floating button styles - Introduced `ViewportControls` for zoom, fit view, and interactive toggle in canvas lab. - Extracted and reused `CANVAS_FLOATING_ICON_BUTTON_CLASS` for consistent button styling. - Updated API base paths and server proxy settings. - Enabled dynamic interaction states for nodes and connections in canvas lab.	2026-02-12 00:18:56 +01:00
shine1i	e39d03c21e	Merge remote-tracking branch 'origin/nightly' into feature/canvas-lab # Conflicts: # .gitignore # studio/frontend/.gitignore # studio/frontend/bun.lock # studio/frontend/src/app/router.tsx	2026-02-11 22:36:34 +01:00
Roland Tannous	a8b8da96d1	Merge pull request #28 from unslothai/refactor/centralize-device-selection-and-gpu-cache [MLX] - Centralize Device Selection & GPU Cache Management	2026-02-11 20:59:23 +04:00
Roland Tannous	da1cde971c	use get_device() for device selection and clear_gpu_cache() for GPU memory cleanup in inference, trainer, and export	2026-02-11 16:56:52 +00:00
Roland Tannous	f7529d1503	Merge pull request #27 from unslothai/feature/implement-silicon-utils-compatibility [MLX] Add Hardware Detection Module & Apple Silicon (MLX) Compatibility	2026-02-11 20:21:08 +04:00
Roland Tannous	1a6bfe51b6	added @needs_torch to test_cuda_oom	2026-02-11 16:12:37 +00:00
Roland Tannous	95038d6129	add @needs_mlx decorator on tests	2026-02-11 16:10:22 +00:00
Roland Tannous	63c583c54f	replace torch MPS with MLX	2026-02-11 16:04:35 +00:00
Roland Tannous	7db31723b9	reset DEVICE type on fastapi lifespan exit	2026-02-11 15:58:13 +00:00
Roland Tannous	e7c3e7b48d	fixed tests to be hardware specific	2026-02-11 15:40:00 +00:00
Roland Tannous	85fc481afe	fixed tests to be hardware specific	2026-02-11 15:37:34 +00:00
Roland Tannous	59d5f24eb5	integrate global hardware detection at lifespan entrypoint	2026-02-11 15:34:26 +00:00
Roland Tannous	107bd2be4c	feat: add Apple Silicon (MPS) compatibility to backend utils + tests	2026-02-11 14:00:39 +00:00
Wasim Yousef Said	eddb2d5405	Merge pull request #23 from unslothai/feature/auth-ui auth setup for the client and auth guard checks	2026-02-11 05:05:07 -08:00
shine1i	8c85fef59f	drop docs file	2026-02-11 14:03:45 +01:00
shine1i	77f8316546	feat: auth guard on routes	2026-02-11 14:01:30 +01:00
Roland Tannous	fde1fea5a9	Merge pull request #26 from unslothai/fix/fix-existing-routes-models Fix: Move Inline Pydantic Models & Add Response Models for Routes	2026-02-11 16:42:25 +04:00
shine1i	cdf7ead71b	feat: new auth and refresh token on unauthorized	2026-02-11 13:40:33 +01:00
Roland Tannous	7ee4381936	move inline pydantic models - fix existing models routes integration	2026-02-11 12:39:58 +00:00
Roland Tannous	1e51f0b791	Merge pull request #25 from unslothai/feature/remove-unsloth-compiled-cache-lifespan-exit Remove `unsloth_compiled_cache` on FastAPI Lifespan Exit	2026-02-11 16:31:57 +04:00
Roland Tannous	28c7df5925	remove unsloth_compiled_cache folder on fastapi lifespan exit	2026-02-11 12:30:05 +00:00
shine1i	3d0b90a862	Merge remote-tracking branch 'origin/nightly' into feature/auth-ui	2026-02-11 13:29:11 +01:00
Roland Tannous	fd3e0e5f09	chore: untrack auth.db (already in .gitignore)	2026-02-11 12:21:54 +00:00
Roland Tannous	e66280119d	Merge pull request #24 from unslothai/feature/refactor-authentication-mechanism Feature/refactor authentication mechanism	2026-02-11 16:13:44 +04:00
Roland Tannous	01fcb4f713	authentication refactor - added setup token and token refresh mechanism	2026-02-11 12:09:47 +00:00
Lei Zhenyuan	cdc9dc1fb1	fix for tma (#4023 )	2026-02-10 17:50:33 -08:00
shine1i	de0c142a1c	ignore claude, test folder and docs for arch of canvas-lab	2026-02-10 17:39:41 +01:00
Datta Nimmaturi	6804c05130	Misc fixes (#4018 ) * convert print to logger * Print but cleaner * Hide model on multiple devices * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo transfomers -> transformers, revert MoE message change * Update MoE detection message to show num_experts and target_modules * Fix llama-cli path in save info message * target_parameters warning for moe * fix should_convert_module for llm_int8_skip_modules * fix should_convert_module for llm_int8_skip_modules * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Logging filters * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * negation * remove should_convert_module patch * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-10 06:31:34 -08:00
Daniel Han	10338dbaa4	Fix warmup_ratio deprecation for transformers >= 5.0 (#4019 ) * Fix warmup_ratio deprecation warning for transformers >= 5.0 In transformers 5.0, warmup_ratio is deprecated in favor of warmup_steps which now accepts float values (< 1 = ratio, >= 1 = absolute steps). The compiler now conditionally sets warmup_steps=0.1 on transformers >= 5.0 (same semantics as warmup_ratio=0.1) and keeps warmup_ratio=0.1 on older versions where warmup_steps only accepts int. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-10 06:17:47 -08:00
Daniel Han	f106eec5e9	Fix Gemma3 4B training on transformers 5.x (token_type_ids) (#4017 ) * Inject token_type_ids for Gemma3 multimodal training on transformers 5.x In transformers 5.x, create_causal_mask_mapping() raises ValueError when is_training=True and token_type_ids is None. When doing text-only SFT on Gemma3 4B (a multimodal model), the dataset_utils detection for _needs_token_type_ids can miss because: - The model is wrapped in PeftModel, so type(model).__module__ points to peft.peft_model instead of transformers - The processing_class is a tokenizer (not Gemma3Processor), so the fallback MRO check resolves to a module without create_causal_mask_mapping This adds a fallback in _unsloth_pre_compute_loss that injects token_type_ids=zeros when: 1. token_type_ids is not already in inputs 2. The inner model config has model_type "gemma3" 3. The model's module has create_causal_mask_mapping (transformers 5.x) 4. The model is in training mode On transformers 4.x, create_causal_mask_mapping does not exist so this check is inert. Depends on: unslothai/unsloth-zoo#488 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-10 05:14:36 -08:00
andrewor14	cd24ea0e50	FP8: Load model on-the-fly in vLLM (#3717 ) * FP8: Load model on-the-fly in vLLM Summary: Existing support for `load_in_fp8=True` performs an offline quantization when loading the initial model. This is no longer necessary as of vllm==0.12.0 (after https://github.com/vllm-project/vllm/pull/23014), where we can quantize the model on-the-fly when we load it: ``` llm = LLM( ... hf_overrides={ "quantization_config_dict_str": json.dumps(torchao_config), }, ) ``` Note: Needs https://github.com/unslothai/unsloth-zoo/pull/380 Test Plan: https://gist.github.com/andrewor14/5b85119fae46845d07b608d420907423 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix on-the-fly FP8: always check mapper first, fallback to on-the-fly The original implementation bypasses the FP8 mapper entirely for vllm >= 0.12.0, meaning models like Llama-3.2-1B-Instruct and Qwen3-8B that have pre-quantized FP8-Block/FP8 checkpoints would never use them. This fixes the priority order: 1. Mapper has a pre-quantized model -> use it (always) 2. Mapper has no match + vllm >= 0.12.0 -> on-the-fly FP8 via torchao 3. Mapper has no match + vllm < 0.12.0 -> offline quantization Changes: - loader_utils.py: Move vllm >= 0.12.0 check after mapper lookups - loader.py: Set load_in_fp8=False when mapper resolves to a pre-quantized model to prevent double quantization Tested on B200 with Llama-3.2-1B-Instruct and Qwen3-8B. Corrected code produces results matching baseline (pre-quantized path preserved). --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-10 05:10:13 -08:00
Datta Nimmaturi	3df65308f3	[Misc] Fixes (#4015 ) * convert print to logger * Print but cleaner * Hide model on multiple devices * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo transfomers -> transformers, revert MoE message change * Update MoE detection message to show num_experts and target_modules --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-10 02:08:55 -08:00
Roland Tannous	fe5a7d11b6	add llama.cpp prefix to gguf conversion help messages (#4016 )	2026-02-10 01:59:05 -08:00
Fizza Mukhtar	a353fad514	Fix #3397 : Prevent trainer tokenization hang with safe num_proc (#4013 ) * Fix #3397: Prevent trainer tokenization hang with safe num_proc * Fix #3397: Add missing import sys for Windows-safe tokenization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Consolidate with existing num_proc guard in dataset_utils.py --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-10 01:53:46 -08:00
Daniel Han	acfe670357	Fix EmbeddingGemma float16 NaN via FORCE_FLOAT32 for gemma3_text (#4014 ) * Fix EmbeddingGemma float16 NaN by adding gemma3_text to FORCE_FLOAT32 and SDPA lists * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-10 01:40:13 -08:00
Daniel Han	a2f4f04ea5	Inject model reference for dynamic token_type_ids detection in SFTTrainer (#4012 ) * Inject model reference for dynamic token_type_ids detection in SFTTrainer * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-10 00:37:07 -08:00
Daniel Han	a35e866625	Suppress vLLM v1 executor sleep/wake log messages (#4011 ) * Suppress vLLM v1 executor sleep/wake log messages Add HideLoggingMessage filters for vllm.v1.executor.abstract logger to suppress repetitive sleep/wake INFO and WARNING messages that spam training output when UNSLOTH_VLLM_STANDBY is enabled. The existing filter at line 275 handles the legacy vllm.executor.executor_base path; this adds coverage for the v1 engine path used by vllm 0.11+. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 23:51:58 -08:00
pre-commit-ci[bot]	293b431e77	[pre-commit.ci] pre-commit autoupdate (#4009 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.14 → v0.15.0](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.14...v0.15.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 17:32:18 -08:00
shine1i	aab37f8dc2	refactor: consolidate `AvailableVariables` component and enhance variable display logic across dialogs - Moved `AvailableVariables` to shared directory. - Updated dialogs to use shared `AvailableVariables` component. - Enhanced inline expressions and processors dialog with better variable display.	2026-02-09 20:07:41 +01:00
shine1i	d360372d9c	add variable handling components and refactor inputs across dialogs - Introduce `AvailableVariables` for displaying variables linked to configs. - Implement `ChipInput` for dynamic value management in category and subcategory dialogs. - Add `AuxVariableBadges` to aux nodes for displaying variable references. - Update inline components with comboboxes for better user experience. - Replace badges and manual inputs with streamlined reusable components.	2026-02-09 19:48:42 +01:00
Daniel Han	4f5de9ba93	Silence peft target_parameters RuntimeWarning for MoE models (#4008 ) * Silence peft target_parameters RuntimeWarning for MoE models Wrap _get_peft_model calls with warnings.catch_warnings() to suppress the "target_parameters were set but no parameter was matched" warning. This fires on MoE models where expert layers use nn.Parameter naming that peft warns about but handles correctly. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 08:25:40 -08:00
Daniel Han	4924a5f6aa	Silence TRL's batch_size=1 padding-free warning in compiled trainer source (#4007 ) Strip the "anihilate"/"annihilate" warning block from compiled trainer source so it does not fire when Unsloth auto-enables padding-free mode with batch size 1 (the common single-GPU case). Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-09 07:55:29 -08:00
Daniel Han	f3f3c9dfb9	Fix dtype mismatch in fp16 + 4-bit/8-bit LoRA training (#4005 ) * Fix dtype mismatch in fp16 + 4-bit/8-bit LoRA training Two fixes for training with dtype=torch.float16 and load_in_4bit=True: 1. fast_lora.py: fast_dequantize() returns tensors in quant_state.dtype (typically bfloat16 or float32), but activations may be float16. The subsequent matmul/addmm operations require matching dtypes. Add dtype casts after each fast_dequantize() call in LoRA_MLP.backward and LoRA_QKV.backward (5 locations total). 2. rl.py: TRL unconditionally casts trainable parameters to bfloat16 in the peft init block. When training with fp16=True, this causes GradScaler to crash since it requires float32 parameters. Make the cast conditional -- use float32 when fp16 is enabled, bfloat16 otherwise. This is a no-op for GRPOTrainer (whose peft init block is already removed by the existing regex), but fixes SFTTrainer and other TRL trainers. Tested with Llama-3.2-1B-Instruct 4-bit on both fp16 and bf16 training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix fp16 + 4-bit LoRA: thread correct_dtype through post_patch Root cause: fast_dequantize returns tensors in quant_state.dtype, which for pre-quantized models is bfloat16 (from config.json). The post_patch methods in llama/gemma/gemma2 call patch_model_and_tokenizer without passing correct_dtype, so quant_state.dtype is never overridden to match the user's requested dtype. This causes a dtype mismatch crash in the backward pass when training with dtype=torch.float16. Fix: pass the user's dtype from from_pretrained through post_patch to patch_model_and_tokenizer as correct_dtype, matching the pattern already used by vision.py. Revert the 5 symptom-level dtype casts in fast_lora.py (upW, gateW, QW, KW, VW) since they are no longer needed with quant_state.dtype properly set at the source. 
Tested: fp16+4bit and bf16+4bit Llama-3.2-1B-Instruct 15-step SFT runs both complete successfully with similar losses (~1.558 vs ~1.563). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove TRL's unconditional bfloat16 cast instead of patching the dtype TRL 0.26.0+ hardcodes `param.data.to(torch.bfloat16)` for all trainable params in quantized models, citing the QLoRA paper recommendation. This is wrong: it ignores the user's requested dtype and breaks GradScaler when fp16=True. The block exists in sft_trainer, grpo_trainer, rloo_trainer, and reward_trainer (not dpo_trainer). Previous fix patched the cast to be dtype-conditional. This commit replaces the entire guard `if getattr(model, "is_loaded_in_4bit", ...) or getattr(model, "is_loaded_in_8bit", ...):` with `if False:` to disable the block entirely. Unsloth already handles adapter dtype via patch_model_and_tokenizer, making TRL's cast both unnecessary and harmful. For GRPOTrainer the enclosing peft init block is already removed by the regex above, making this a no-op for GRPO. --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 07:39:26 -08:00
Daniel Han	0a04b1b22c	Fix trl.experimental thin wrapper compilation and OOM from peft_config overwrite (#4006 ) * Fix trainer compilation failures from trl.experimental thin wrappers * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix OOM from prepare_model_for_kbit_training overwriting peft_config patching --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 07:04:55 -08:00
Daniel Han	14fe579629	Fix VLM model + text-only dataset ValueError in TRL 0.22.x (#4004 ) TRL 0.22.x checks _is_vlm (model type) instead of _is_vision_dataset (dataset content, added in 0.25.1+) in _set_signature_columns_if_needed. When _is_vlm=True (e.g. Gemma3), signature columns are set to vision-only ["messages","prompt","completion","images"], which has zero overlap with tokenized text columns [input_ids, labels, attention_mask, ...], causing a ValueError. Fix: expand the VLM branch signature columns to include both vision and text column names. Extra columns not present in the dataset are harmlessly ignored by _remove_unused_columns (it only raises when zero columns match). Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-09 06:24:58 -08:00
Daniel Han	ba7366be53	Fix notebook compatibility for transformers 4.57.6 and TRL 0.22-0.27 (#3998 ) * Patch before compile? * Fix notebook compatibility for transformers 4.57.6 and TRL 0.22-0.27 Fixes several notebook failures discovered during testing all 125 notebooks with transformers==4.57.6 + TRL 0.22.2 and TRL 0.27.1. Warning suppression (import_fixes.py): - Suppress torch 2.9+ pin_memory/is_pinned device deprecation warnings - Suppress cuda.cudart/cuda.nvrtc module deprecation FutureWarning - Filter vllm "Level is deprecated" stderr noise - Filter PydanticSerializationUnexpectedValue warnings - Filter Triton "df: No such file" stderr noise VLM tokenizer loading (vision.py): - Add _construct_vlm_processor_fallback() for models where AutoProcessor.from_pretrained fails (e.g., ERNIE 4.5 VL, LFM2.5-VL) - Wrap processor loading in try/except with fallback to manual construction from separate image_processor + tokenizer components - Add fallback to AutoTokenizer/PreTrainedTokenizerFast when tokenizer loading or patching fails TRL 0.27.1 trainer compatibility (trainer.py): - Add _resolve_trainer_params() to handle thin wrapper trainers that only have def __init__(self, *args, **kwargs) (e.g., ORPOTrainer in TRL 0.27.1) by walking MRO for real parameter signature VLM _is_vlm detection (rl.py): - Replace blanket _is_vlm=False override with model-architecture-based detection that checks vision_config or ForConditionalGeneration class name, fixing VLM training when bare tokenizer is passed as processing_class ModernBERT SDPA compatibility (loader.py, sentence_transformer.py): - Add "modernbert" to DISABLE_SDPA_MODEL_NAMES to avoid stride alignment issues with torch.compile backward pass - Add DISABLE_SDPA check for sentence transformer models Other fixes (_utils.py): - Suppress false uninitialized weight warnings for VLM multi_modal_projector.layer_norm Tested: 92/125 notebooks pass with TRL 0.22.2, 94/125 with TRL 0.27.1. 
Remaining failures are infra (missing FFmpeg, network timeouts, GPU arch) not code bugs. [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix KTO shape mismatch on TRL 0.27.2+ and truncation alignment - Patch KTO get_batch_logps to auto-align logits and labels when Unsloth model forward truncates input_ids beyond max_seq_length. TRL 0.27.2 changed _process_tokens to only truncate completions (not prompts), so sequences with long prompts exceed max_seq_length and trigger model-side truncation. The original ValueError is replaced with min-length alignment. - Also truncate attention_mask in LlamaModel forward when input_ids are truncated to max_seq_length, preventing shape mismatches in attention. - Widen except clause in rl_replacements.py openenv import from `except ImportError` to `except (ImportError, NameError, Exception)` to handle vllm SamplingParams NameError in TRL 0.27.2. * Fix TRL 0.26+ thin wrapper resolution, enable ModernBERT SDPA, clean up warning filters TRL 0.26+ thin wrapper resolution (rl.py): - Filter _-prefixed private imports when discovering Trainer/Config classes - Look up Config in separate _config.py module when not found in trainer module - Detect thin wrappers (<1000 chars source) and resolve to experimental parent via MRO walk; use resolved module for imports and create_new_function - Enables all 15 trainers to patch successfully (was 5/15 before) ModernBERT SDPA (loader.py): - Remove "modernbert" from DISABLE_SDPA_MODEL_NAMES - SDPA works correctly for both classification and sentence transformers - Verified: 88.9% accuracy on emotion classification, correct domain-specific embeddings after sentence transformer fine-tuning Warning filter cleanup (import_fixes.py): - Remove cuda.cudart/cuda.nvrtc FutureWarning filters (no such warnings exist in torch 2.9.1+; proactive suppression is unnecessary) [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove multi_modal_projector.layer_norm from uninitialized weight guard The LFM2.5-VL projector LayerNorm is properly initialized by transformers and does not need to be excluded from the uninitialized weight check. The original exclusion was added as a workaround but is no longer needed after the upstream fix. * Add transformers 5.0 compat: rope_theta helper, config-as-dim detection, BatchEncoding guard, try/except for TRL trainer source, push_to_hub_token compiler fix - llama.py: Add _get_rope_theta() helper handling both config.rope_theta and rope_parameters dict - llama.py: Handle BatchEncoding in unsloth_fast_generate (transformers 5.0+ returns BatchEncoding from apply_chat_template) - gemma.py: Detect config passed as dim arg in GemmaFixedRotaryEmbedding - tokenizer_utils.py: Add try/except for TRL trainer getsource in patch_sft_trainer_tokenizer - rl_replacements.py: Add compiler fix replacing bare pop("push_to_hub_token") with pop(..., None) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use trl.experimental string check instead of char-count heuristic for thin wrapper detection The <1000 / >1000 char threshold was fragile -- XPOConfig's parent is only 994 chars and would be skipped. All thin wrappers in TRL 0.26+ contain "trl.experimental" in their deprecation warning, while no real trainer or config class does, making it a reliable detection marker. * Move DISABLE_SDPA_MODEL_NAMES import to module level in sentence_transformer The function-level import was redundant since loader.py is already imported at module level. Move it to the existing loader import line. --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 05:11:50 -08:00
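The rope_theta helper described in this commit can be pictured roughly as follows. This is a minimal sketch, not Unsloth's actual `_get_rope_theta()`: the function name `get_rope_theta`, the `default` argument, and the assumption that the `rope_parameters` dict stores the value under a `"rope_theta"` key are illustrative.

```python
from types import SimpleNamespace


def get_rope_theta(config, default=10000.0):
    """Return the RoPE base from either config layout (hypothetical helper)."""
    # Newer layout (assumed): a dict of rope settings on the config.
    rope_parameters = getattr(config, "rope_parameters", None)
    if isinstance(rope_parameters, dict) and "rope_theta" in rope_parameters:
        return rope_parameters["rope_theta"]
    # Older layout: a plain attribute on the config.
    theta = getattr(config, "rope_theta", None)
    # Fall back to the caller-supplied default when neither is present.
    return default if theta is None else theta


# Stand-in config objects for illustration only.
old_style = SimpleNamespace(rope_theta=500000.0)
new_style = SimpleNamespace(rope_parameters={"rope_theta": 1000000.0})
```

Checking the dict first means a config that carries both fields resolves to the newer one; whether the real helper uses that precedence is not stated in the commit message.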
siddhu donda	884ce4601f	fix: add inputs_embeds support in _fast_prepare_inputs_for_generation (#3798 ) (#3814 ) Add `inputs_embeds` parameter to `_fast_prepare_inputs_for_generation` so `model.generate(inputs_embeds=...)` works with Unsloth-patched models. Changes: - Add `inputs_embeds=None` to function signature (fixes HF inspect check) - Track `use_inputs_embeds` flag: True when inputs_embeds provided and no cache - Conditionally return inputs_embeds on first step, input_ids on subsequent steps - Handle input_ids being None/empty for batch size and device extraction - Add attention_mask None-guard before slicing Fixes: https://github.com/unslothai/unsloth/issues/3798 Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: siddhudonda <siddhudonda@users.noreply.github.com>	2026-02-09 04:59:43 -08:00
Daniel Han	3b1e8d0ae6	Update README.md	2026-02-09 04:50:54 -08:00
Daniel Han	60dd7269a5	Fix broken documentation links, typos, and formatting in README (#4003 ) - Fix 14 broken documentation links (all returning 404) caused by docs site restructuring (install-and-update -> install, pages moved to /docs/blog/ and /docs/models/tutorials/) - Fix "Qwen2.3-VL" -> "Qwen3-VL" (model does not exist) - Fix incorrect "GSPO" label on gpt-oss GRPO notebook - Fix "4b-bit" typo -> "4-bit" - Fix "sodoku" typo -> "sudoku" - Fix double dash formatting on FP8 GRPO notebook list item - Fix citation URL from http:// to https:// - Update "MultiGPU coming soon" to "is now supported" - Fix Windows installation step numbering (1,3,5,6,7 -> 1,2,3,4,5) - Fix Advanced/Troubleshooting step numbering (5,6,5 -> 4,5,6) Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-09 04:46:46 -08:00
Fizza Mukhtar	c98312f229	Fix multi-GPU loading for quantized models in distributed training (#3917 ) When using torchrun with quantized models (4bit/8bit/fp8), each rank must load the model directly onto its own GPU. The default device_map ("sequential") places everything on GPU 0, causing illegal memory access errors when Accelerate tries to relocate quantized weights. Use the existing prepare_device_map() utility from loader_utils to detect distributed training via LOCAL_RANK/WORLD_SIZE env vars and override device_map to target each rank's local GPU. This is applied in both FastLanguageModel.from_pretrained and FastModel.from_pretrained, covering text, vision, and audio model paths. Fixes #3914 Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-09 04:26:21 -08:00
Mohammad Miadh Angkad	336bec216a	Refactor Ollama template wiring and harden packing helpers (#3890 ) * Refactor Ollama template wiring and harden packing helpers Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu> * Fix Qwen3 and Gemma3n template bindings and tidy packing test helper * Fix gptoss Ollama comment and tinyllama stop parameter - Fix wrong comment referencing gemma3n for gptoss_ollama in chat_templates.py - Add missing stop keyword to tinyllama PARAMETER in ollama_template_mappers.py * Fix _DummyTrainer compatibility across TRL versions The try/except only handled the removal of return_position_ids (TRL v0.24+) but not the absence of padding_free (TRL v0.18.2). Gracefully degrade through all optional collator flags so the test works from trl>=0.18.2 through v0.27+. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 04:04:48 -08:00
RektPunk	f868d8b073	[Feature] separate gguf file path (#3934 ) * separate gguf * fix Modelfile log * ollama Modelfile create * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix GGUF file placement: move initial conversion to _gguf dir, fix cleanup - Move initial GGUF files (from convert_to_gguf) into {model_directory}_gguf/ immediately after conversion, so all GGUF outputs live in the dedicated directory regardless of quantization method (fixes bf16-only case where quant == first_conversion skipped the loop and _gguf dir was never created) - Remove redundant gguf_directory/makedirs from inside the re-quant loop since the directory is now created before the loop - Use Path.unlink(missing_ok=True) for base GGUF cleanup robustness - Unify Modelfile location to {save_directory}_gguf/Modelfile for both VLM and non-VLM models - Fix print message to show actual modelfile_location path - Add gguf_directory key to return dict - Clean up {save_directory}_gguf in push_to_hub_gguf error/finally blocks * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-09 04:00:14 -08:00
Etherll	315178b5c3	Add push_to_hub_gguf support for FastSentenceTransformer (#4002 ) * Implement GGUF upload method for SentenceTransformer Added a method to convert and upload SentenceTransformer models to GGUF format, including handling of tokenizer, quantization methods, and repository management on Hugging Face Hub. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 00:51:26 -08:00
Daniel Han	b47b081f99	Fix triton 3.6.0 + torch 2.9.x torch.compile crash (missing cluster_dims) (#4001 ) Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-08 20:18:25 -08:00
Daniel Han	c43a5b8f02	Fix multiprocessing crash on Windows/macOS and unify num_proc logic (#3999 ) On Windows and macOS (Python 3.8+), multiprocessing uses the spawn start method. When datasets .map(num_proc=N) is called, it creates a Pool(N) which re-imports __main__ in each worker, causing infinite recursion and a RuntimeError during bootstrapping. Guard the auto-computed dataset_num_proc in the generated Config __init__ by checking multiprocessing.get_start_method() != 'fork'. When the start method is not fork (spawn/forkserver), force dataset_num_proc = None so datasets takes the single-process path. Linux fork behavior is unchanged. Also replace the fixed memory threshold logic with the simpler adaptive approach: cap at 64, then min(num_proc, int(available_gb)), with a safety floor of 1 when available memory is at or below 2GB. Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-08 02:50:06 -08:00
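The num_proc rule described in this commit can be sketched as a pure function. This is an illustration under stated assumptions, not the generated Config code: `pick_dataset_num_proc` and its parameters are hypothetical names, and the starting worker count is taken to be the CPU count.

```python
def pick_dataset_num_proc(cpu_count, available_gb, start_method):
    """Choose a worker count for datasets .map() (hypothetical helper)."""
    # spawn / forkserver re-import __main__ in each worker, so stay single-process.
    if start_method != "fork":
        return None
    # Safety floor: with 2 GB or less available, use exactly one worker.
    if available_gb <= 2:
        return 1
    # Cap at 64 workers, then allow at most one worker per available GB.
    num_proc = min(max(cpu_count, 1), 64)
    return max(1, min(num_proc, int(available_gb)))
```

In real use the inputs would come from `os.cpu_count()`, `multiprocessing.get_start_method()`, and a memory probe such as `psutil.virtual_memory().available`; passing them in as arguments keeps the rule easy to test.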
shine1i	9b4293c677	auth ui flow	2026-02-07 16:14:02 +01:00
shine1i	27a11b6383	merge nightly	2026-02-07 14:30:46 +01:00
Roland Tannous	d54671b1a6	Merge pull request #22 from unslothai/feature/auth Added jwt authentication	2026-02-07 14:27:35 +04:00
sshah229	0082627801	fixed the errors: renamed jwt to authentication, used raw jwt, and removed search route	2026-02-07 03:14:30 -07:00
shine1i	58b40a9015	inline sizes, aux nodes resize, and auto layout keep nodes close logic	2026-02-06 13:18:18 +01:00
shine1i	baa2a8426c	cleanup and restructure	2026-02-06 12:54:55 +01:00
shine1i	0db132813b	decouple aux nodes and text dom for aux nodes	2026-02-06 12:12:32 +01:00
shine1i	03b947d8a2	fix edge desync	2026-02-06 11:47:29 +01:00
shine1i	8f1e6622ca	ui buttons move, squircle boxes, and resize	2026-02-06 11:28:32 +01:00
sshah229	9c28c592a4	Merge branch 'feature/auth' of https://github.com/unslothai/new-ui-prototype into feature/auth	2026-02-06 03:19:22 -07:00
sshah229	50ff5626f1	refactored the code for username/password and added pydantic models and routes for the same	2026-02-06 03:15:30 -07:00
sshah229	009e93f079	Refactored the training and model routes and added the jwt authentication	2026-02-06 03:15:30 -07:00
sshah229	daa20821c1	refactored the code for username/password and added pydantic models and routes for the same	2026-02-06 02:56:54 -07:00
shine1i	b15f4e7ad5	inline dialogs and react flow ui refactor WIP p1	2026-02-06 00:42:35 +01:00
shine1i	271ddcfb4a	refactor of payload files	2026-02-05 23:41:27 +01:00
shine1i	1ca01e5d21	processors and drop column	2026-02-05 22:26:48 +01:00
shine1i	d67efb8516	new blocks timedelta, and some tweaks	2026-02-05 21:55:50 +01:00
shine1i	2a9a332ce3	model config and provider fixes and inline dialog	2026-02-05 20:55:39 +01:00
pluesclues	c6de138e62	Update rl_replacements.py (#3990 )	2026-02-05 08:22:42 -08:00
Daniel Han	3a4b1e7fc5	Disable torchcodec in transformers when FFmpeg is missing (#3989 ) * Disable torchcodec in transformers when FFmpeg is missing When torchcodec is installed but FFmpeg libraries are unavailable, transformers still thinks torchcodec is available (via find_spec check) and tries to use it for audio loading, causing RuntimeError. This adds disable_torchcodec_if_broken() which tests if torchcodec can actually load its native libraries, and if not, patches transformers' _torchcodec_available to False so it falls back to librosa instead. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-05 06:54:09 -08:00
Daniel Han	145f6aaeb1	Fix cutlass inductor options for PyTorch < 2.8.0 (#3988 ) The cuda.cutlass_epilogue_fusion_enabled and cuda.cutlass_tma_only inductor config options were added in PyTorch 2.8.0. Using these options on older PyTorch versions causes a RuntimeError during GRPOTrainer initialization. This fix adds a version check to only include these options when running PyTorch 2.8.0 or later, allowing GRPO training to work on older PyTorch versions (e.g., Colab environments with PyTorch 2.5-2.7). Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-05 06:40:11 -08:00
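The version gate described in this commit might look roughly like the sketch below. The two option names come from the commit message; the helper name `cutlass_inductor_options` and the `True` values are placeholders chosen for illustration, not the values Unsloth actually sets.

```python
import re


def cutlass_inductor_options(torch_version):
    """Return cutlass inductor options only for torch >= 2.8 (hypothetical helper)."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        return {}  # Unparseable version: add nothing rather than risk an error.
    major, minor = int(match.group(1)), int(match.group(2))
    options = {}
    if (major, minor) >= (2, 8):
        # Placeholder values; only the option names are taken from the commit.
        options["cuda.cutlass_epilogue_fusion_enabled"] = True
        options["cuda.cutlass_tma_only"] = True
    return options
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where `"2.10"` sorts before `"2.8"`.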
Daniel Han	7b42acae94	Fix RuntimeError not caught when torchcodec fails to load (#3987 ) When datasets library has torchcodec installed but FFmpeg libraries are missing, torchcodec raises a RuntimeError during import. The exception handler only caught ImportError and AttributeError, causing the error to propagate and crash Unsloth imports in environments like Colab where FFmpeg may not be installed. Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-02-05 06:35:10 -08:00
Daniel Han	ce256c43bc	Merge branch 'main' of https://github.com/unslothai/unsloth	2026-02-05 06:10:06 -08:00
Daniel Han	f463f692d6	MoE release	2026-02-05 06:09:56 -08:00
Datta Nimmaturi	fad6957555	[MoE] Improve moe kernels for unsloth fine tuning (#3812 ) * Improve MoE performance * small changes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix imports * disable autotune * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * LoRA for MoE * Make autotune default * make dy contiguous * use non lora model as base for RL * Revert "use non lora model as base for RL" This reverts commit bc8f15629d060593b2eaf436f158ff5ac9df0d5d. * fixup derp * non TMA [T4] * Revert "non TMA [T4]" This reverts commit 35304566690e7c9ab9632899920c85bff322409a. * Fixes for VL MoE and v5 transformers * [transformers] [v5] remove unused hybridcache (#3910) * remove unused hybridcache * cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * No double compile for qwen3moe * Fix top_k on trl GRPO * Recognise GLM as MoE * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing RotaryEmbeddingConfigMixin * Licensing for autotuning cache * Cleanup --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-02-05 06:03:25 -08:00
Daniel Han	2883ce4091	Update _utils.py	2026-02-05 05:58:00 -08:00
Daniel Han	ff3f78b6b9	Add PyTorch 2.10 and xformers 0.0.34 support (#3985 ) - Add cu126/cu128/cu130 xformers 0.0.34 wheel dependencies for torch 2.10 - Add cu126-torch2100, cu128-torch2100, cu130-torch2100 meta-dependencies - Add cu126-ampere-torch2100, cu128-ampere-torch2100, cu130-ampere-torch2100 variants - Update _auto_install.py version detection for torch 2.10.x - Add CUDA check for torch 2.10 (requires CUDA 12.6, 12.8, or 13.0) - Update README.md with torch 2.10 installation instructions Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-05 05:56:26 -08:00
Daniel Han	7ceebe4554	Silence non-actionable TRL trainer import failures (#3980 ) _patch_trl_rl_trainers enumerates all trainer modules from dir(trl.trainer) and attempts to import each one. Modules like alignprop_trainer fail because they depend on optional packages (diffusers) that may not be installed. The failure is harmless but the print() call produces noise on every import. Change print() to logger.info() so these messages only appear when UNSLOTH_ENABLE_LOGGING=1. Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-02-05 05:32:52 -08:00
Daniel Han	5798267401	Silence third-party deprecation warnings and fix socket leak (#3983 ) * Silence third-party deprecation warnings and fix socket resource leak - Add warning filters for TorchAO deprecated import paths - Filter SWIG builtin type warnings from bitsandbytes/triton - Filter Triton autotuner deprecation warnings - Filter Python 3.12+ multiprocessing fork warnings - Filter resource warnings for unclosed sockets/files - Fix socket leak in has_internet() by properly closing socket * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-05 04:55:52 -08:00
Daniel Han	649865caca	Fix GPT-OSS BlockMask error during inference (#3982 ) GPT-OSS models use eager attention during inference because flex attention returns incorrect results (likely due to left padding). However, when _attn_implementation is set to "flex_attention", transformers creates BlockMask objects which cause a TypeError when passed to the eager attention path: TypeError: unsupported operand type(s) for +=: 'Tensor' and 'BlockMask' This fix excludes GPT-OSS from using flex_attention, keeping it on the eager path to avoid the BlockMask/Tensor type mismatch.	2026-02-05 04:28:46 -08:00
shine1i	7350e52f2f	feat: Model Provider and Model config	2026-02-05 12:20:11 +01:00
Daniel Han	6f3e52bbcf	Prefer flex attention when available (#3979 ) * Enable flex attention by default * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Avoid dropping flex attention when SDPA unsupported --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-05 03:19:04 -08:00
shine1i	fbf5a30c77	sheet icons and llm judge	2026-02-05 11:38:11 +01:00
pluesclues	9b34982509	Trl 0.27.0 update (#3965 ) * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update rl_replacements.py * Update rl.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update rl_replacements.py, remove chat template from codexes commits * Update rl.py, got rid of gradient checkpointing code that did not work --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-04 23:01:16 -08:00
shine1i	4a909ded0e	handle layouting	2026-02-04 17:21:25 +01:00
shine1i	35721763c3	straight lines	2026-02-04 16:37:42 +01:00
shine1i	fbb8adbab5	convert to, lines and fixes	2026-02-04 16:33:42 +01:00
shine1i	4234e23f68	refactor: centralize block definitions and dialogs into registry, streamline node updates using helper utilities	2026-02-04 15:42:46 +01:00
shine1i	dc56229eda	save and import, and fixes	2026-02-04 14:32:49 +01:00
shine1i	9b10d81a46	import export	2026-02-04 14:22:03 +01:00
shine1i	3a4e54ef02	sheet tidy	2026-02-04 14:11:09 +01:00
shine1i	1d5d1b625b	canvaslab v1	2026-02-04 14:06:38 +01:00
Daniel Han	e1c682e6d2	Fix torchvision compatibility check for source builds and future torch versions (#3978 ) * Fix torchvision compatibility check for source builds and future torch versions The torchvision version check raised a hard ImportError for custom/source-built PyTorch installations (e.g. AMD ROCm from source with +git* suffixes), even when the actual build was functional. This also silently skipped any torch version not already in the hardcoded table, giving no warning at all for future releases. Changes: - Detect custom/source builds by checking the raw version string's local identifier against known standard prefixes (cu, rocm, cpu, xpu). Our custom Version() strips local identifiers via regex, so detection must happen on the raw string before parsing. - Downgrade to a warning (instead of ImportError) for custom/source builds, since their version numbers may not follow standard PyPI release pairings. - Add formula-based inference for future torch versions not yet in the table. The torch->torchvision minor version formula (torch 2.x -> tv 0.(x+15)) has held for every release from torch 2.0 through 2.9. For formula-predicted versions, mismatches produce a warning rather than a hard error. - Add UNSLOTH_SKIP_TORCHVISION_CHECK=1 env var to skip the check entirely. - Wrap importlib_version and Version calls in try/except so broken metadata never crashes the import. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: stricter regex, case insensitivity, pre-release detection Fixes three edge cases found during review: 1. Regex precision: cu/xpu now require a trailing digit (cu\d, xpu\d) to avoid false negatives on suffixes like "+custom_build" that happen to start with "cu". cpu/xpu match as exact strings only. 2. Case insensitivity: added re.IGNORECASE so "+ROCM6.3" and "+CPU" are correctly recognized as standard builds rather than custom ones. 3. 
Pre-release detection: nightly/dev/alpha/beta/rc builds with standard CUDA/ROCm suffixes (e.g. "2.7.0.dev20250301+cu124") now produce a warning instead of a hard ImportError. These builds commonly have version mismatches that are expected during development. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address PR review comments: fullmatch, env var casing, torchvision pre-release 1. Switch re.match to re.fullmatch for the custom build regex so the entire local identifier must match. Fixes false negatives where suffixes like +cu124_custom were misclassified as standard because re.match only checked the start of the string. 2. Use .lower() for the UNSLOTH_SKIP_TORCHVISION_CHECK env var so any casing of "true" / "TRUE" / etc. is accepted. 3. Check torchvision_version_raw for pre-release tags in addition to torch_version_raw, so a stable torch paired with a nightly torchvision (e.g. 0.23.0.dev...) also gets a warning instead of a hard ImportError. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-04 04:50:26 -08:00
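The torch-to-torchvision pairing formula quoted in this commit (torch 2.x -> torchvision 0.(x+15)) can be written out as a small function. This is a sketch of the arithmetic only; the name `predicted_torchvision_minor` is hypothetical and the real check also handles local identifiers and pre-release tags as described above.

```python
import re


def predicted_torchvision_minor(torch_version):
    """Predict the torchvision 0.x minor version paired with a torch 2.x release."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        return None
    major, minor = int(match.group(1)), int(match.group(2))
    # The formula is only stated for the torch 2.x series.
    if major != 2:
        return None
    return minor + 15
```

For example, torch 2.0 maps to torchvision 0.15 and torch 2.9 maps to torchvision 0.24, the endpoints of the range the commit says the formula has held for.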
sshah229	482e934e09	Refactored the training and model routes and added the jwt authentication	2026-02-04 05:49:29 -07:00
Wasim Yousef Said	82d42fd8e2	Merge pull request #21 from unslothai/feature/cleanup biome fixes, and linter fixes for some issues, and some readability changes	2026-02-04 04:33:06 -08:00
shine1i	a48bb53e14	cleanup	2026-02-04 13:28:39 +01:00
shine1i	931891b207	add canvaslab	2026-02-04 13:22:41 +01:00
Roland Tannous	a295a624ce	Merge pull request #20 from unslothai/feature/datasets-endpoint Add Datasets Check-Format Endpoint	2026-02-04 00:45:31 +04:00
Roland Tannous	75bb6c08a5	Add datasets check-format endpoint	2026-02-03 20:42:25 +00:00
Roland Tannous	ddf8fd59eb	Merge pull request #19 from unslothai/fix/dataset-utils-custom-mapping fix custom_format_mapping flow for manual column mapping	2026-02-03 22:43:06 +04:00
Roland Tannous	9f9618980d	fix custom_format_mapping flow for manual column mapping	2026-02-03 18:42:07 +00:00
Roland Tannous	5ca7de3698	Merge pull request #18 from unslothai/refactor/add-dataset-detection-status-flag Add `requires_manual_mapping` Flag for Dataset Detection	2026-02-03 22:22:50 +04:00
Roland Tannous	8bd06e3e35	Add Flag for Dataset Detection	2026-02-03 18:21:05 +00:00
Roland Tannous	c88bf7c1a7	Merge pull request #17 from unslothai/fix/refactor-dataset-utils-part2 Fix/refactor dataset utils part2	2026-02-03 22:04:03 +04:00
Roland Tannous	f57757231b	remove duplicates from dataset_utils.py	2026-02-03 18:03:01 +00:00
Roland Tannous	08cfa1ab64	Merge pull request #16 from unslothai/refactor/inference-api-routes-part-1 refactor/inference-api-routes-part-1	2026-02-03 21:00:18 +04:00
Roland Tannous	b4ec0389f0	refactor/inference-api-routes-part-1	2026-02-03 16:57:57 +00:00
Roland Tannous	6df5b1eded	Merge pull request #15 from unslothai/enhance/refactor-dataset-utils Refactor `dataset_utils.py` into focused modules	2026-02-03 18:40:53 +04:00
Roland Tannous	62ddcfa019	Refactor `dataset_utils.py` into focused modules	2026-02-03 14:38:02 +00:00
Daniel Han	4f75ec2fc8	Add vLLM + torch < 2.9.0 + SM100 compatibility check (#3973 ) vLLM's distributed module (device_communicators) crashes with std::bad_alloc when imported on SM100 GPUs (B200/B100/Blackwell) with torch < 2.9.0. This adds an early check that runs before vLLM is imported, providing a helpful error message instead of a cryptic C++ exception. The check: 1. Detects if vLLM is installed 2. Checks if torch version is < 2.9.0 3. Checks if any GPU is SM100 (Blackwell) 4. If all conditions met, raises RuntimeError with clear upgrade instructions	2026-02-03 03:10:24 -08:00
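The three-condition check listed in this commit can be sketched as a pure predicate. This is an illustration, not the code added by the commit: `should_block_vllm_import` and its parameters are hypothetical, and SM100 is taken to mean CUDA compute capability (10, 0).

```python
import re


def should_block_vllm_import(vllm_installed, torch_version, gpu_capabilities):
    """True when vLLM is installed, torch < 2.9, and any GPU is SM100 (hypothetical helper)."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        return False  # Unknown version: do not block on a guess.
    torch_too_old = (int(match.group(1)), int(match.group(2))) < (2, 9)
    # gpu_capabilities is a list of (major, minor) compute-capability pairs.
    has_sm100 = any(tuple(cap) == (10, 0) for cap in gpu_capabilities)
    return bool(vllm_installed and torch_too_old and has_sm100)
```

A caller could feed it `importlib.util.find_spec("vllm") is not None`, `torch.__version__`, and `torch.cuda.get_device_capability(i)` for each device, then raise a `RuntimeError` with upgrade instructions when it returns `True`.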
Daniel Han	d5f5b7d6a6	Add TRL truncation regression and metadata loss fixes (Fixes 1 and 3) (#3971 ) * Add TRL truncation regression and metadata loss fixes Fix 1: TRL 0.24.0-0.25.1 right-truncation regression - These versions pass max_length=self.max_prompt_length and truncation=True to the tokenizer, which right-truncates prompts and strips the assistant turn suffix - Use regex to remove these kwargs from the generated code Fix 3: Metadata loss for chat_template_kwargs - TRL 0.24.0+ extracts prompts = [x["prompt"] for x in inputs], losing metadata like reasoning_effort - Inject code to store per-sample chat_template_kwargs on self before extraction - Preserve these kwargs in prompts_text generation for all TRL versions Tested with TRL versions 0.22.2, 0.23.1, 0.24.0, 0.25.1, 0.26.2, and 0.27.1. * Update Fix 1 comment with detailed TRL version behavior explanation Expand the comment for the TRL 0.24.0-0.25.1 truncation regression fix to clarify what each TRL version does: - TRL 0.22.2-0.23.1: Uses truncate_with_protected_tokens() for smart truncation that preserves rightmost tokens and protects special tokens - TRL 0.24.0-0.25.1: Removed smart truncation, passes kwargs directly to tokenizer (max_length, truncation=True, add_special_tokens=False) - TRL 0.26.2+: Removed these kwargs entirely The fix removes these problematic kwargs so 0.24.0-0.25.1 behaves like 0.26.2+ (no tokenizer-level truncation). --------- Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>	2026-02-03 03:00:12 -08:00
Daniel Han	8f44ae0eda	Fix num_train_epochs=None causing TypeError in GRPOConfig (#3972 ) When users pass `num_train_epochs=None` to GRPOConfig (relying on max_steps to control training duration), Trainer.__init__ fails with: TypeError: '>' not supported between instances of 'NoneType' and 'int' This happens because transformers.Trainer does `args.num_train_epochs > 0` in its __init__ which fails when the value is None. This fix converts None to 3.0 (the default) before Trainer initialization. The actual training duration is still controlled by max_steps since it takes precedence when both are set. Example that now works: ```python config = GRPOConfig( num_train_epochs=None, # Previously caused TypeError max_steps=500, # This controls actual duration ... ) ```	2026-02-03 02:48:40 -08:00
Roland Tannous	d8183237f3	Merge pull request #14 from unslothai/feature/pydantic-models-update Feature/pydantic models update	2026-02-03 14:47:12 +04:00
Roland Tannous	7bb0aeb756	add grad_norm and num_tokens to TrainingProgress response object	2026-02-03 10:35:58 +00:00
Daniel Han	41417693e4	Fix Vision GRPO string prompts and OpenEnv async compatibility (#3964 ) * [fix] Vision GRPO string prompts and OpenEnv async compatibility - Guard prepare_multimodal_messages in GRPO trainer to skip processing when prompts are pre-templated strings. Notebooks that pre-apply apply_chat_template() produce strings with image tokens already embedded; calling prepare_multimodal_messages on those crashes with TypeError. - Apply nest_asyncio when OpenEnv EnvClient exposes async reset/step, so scripts using run_until_complete() wrappers work in all contexts. - Add wrapper to call patch_torchcodec_audio_decoder() from unsloth_zoo for AudioDecoder dict-compatibility. * Add apply_chat_template guard for pre-templated string prompts in Vision GRPO When notebooks pre-apply apply_chat_template, prompts become strings. The existing guard skips prepare_multimodal_messages for strings. This adds a second guard to skip apply_chat_template in the forward_kwargs block, using prompts directly as prompts_text instead. Covers both TRL 0.25.x (no tools param) and TRL 0.26.2+ (with tools=self.tools). Non-matching replacements silently pass for older TRL versions. * Add TRL 0.25.1 single-line variant for apply_chat_template guard TRL 0.25.1 uses single-line formatting for apply_chat_template: apply_chat_template({"prompt": prompt}, ...)["prompt"] While TRL 0.26.2+ uses multi-line formatting: apply_chat_template( {"prompt": prompt}, ... )["prompt"] Add both variants to ensure full backwards compatibility. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-03 02:03:46 -08:00
Daniel Han	949f1ce573	Fix TRL 0.27.0 GRPO compatibility and PEFT model handling (#3969 ) * Fix TRL 0.27.0 GRPO compatibility and PEFT model handling - Remove use_reentrant=False from gradient_checkpointing_kwargs for TRL 0.27.0+ TRL 0.27.0 auto-sets use_reentrant=False in GRPOConfig.__post_init__, but Unsloth gradient checkpointing requires use_reentrant=True. This adds a post-init cleanup that removes the setting when present. - Handle prepare_peft_model standalone function pattern for TRL 0.22.0+ TRL changed from self._prepare_peft_model() method to prepare_peft_model() standalone function. Both patterns are now bypassed to let Unsloth handle PEFT model preparation. Tested with TRL versions 0.22.2, 0.23.1, 0.24.0, 0.25.1, 0.26.2, and 0.27.1. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-03 01:56:31 -08:00
Kaitao Yang	7dd3ae8768	reduce code duplication (#3877 ) * reduce code duplication * address reviewer feedback: keep original function name - Keep original function name `_offload_frozen_module_for_training` - Make `offload_device` parameter Optional (can be None) - Keep original error handling (return None for missing modules_to_save) - Maintain code deduplication by reusing the helper function --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-02-03 00:27:49 -08:00
Daniel Han	8f0b57ae18	Use standard gradient checkpointing for small sequence lengths (#3867 ) * Use standard gradient checkpointing for small sequence lengths When max_seq_length < 512, the overhead of gradient offloading in gc="unsloth" mode is not worth it. Benchmarks on B200 show: \| seq_len \| gc=unsloth \| gc=True \| Difference \| \|---------\|------------\|----------\|------------\| \| 256 \| 6,803 t/s \| 6,993 t/s\| +2.8% \| \| 384 \| 9,889 t/s \| 9,963 t/s\| +0.7% \| \| 512 \| 13,151 t/s \| 13,092 t/s\| -0.4% \| \| 1024 \| 26,662 t/s \| 25,094 t/s\| -5.9% \| The crossover point is around seq_len 384-512. For sequences shorter than 512, we now automatically use standard gradient checkpointing instead of the custom offloading implementation. Additionally, when user explicitly sets use_gradient_checkpointing to True or False in get_peft_model, it now correctly overrides any previous "unsloth" patching from from_pretrained. This ensures consistent behavior regardless of the order of function calls. Updated in three locations: - FastLlamaModel.get_peft_model (llama.py) - FastLanguageModel.from_pretrained (loader.py) - FastModel.from_pretrained (loader.py) * Refactor: extract gradient checkpointing heuristic into utility function Addresses code review feedback to reduce duplication. The gradient checkpointing heuristic logic was duplicated in 3 places: - FastLlamaModel.get_peft_model (llama.py) - FastLanguageModel.from_pretrained (loader.py) - FastModel.from_pretrained (loader.py) Created apply_unsloth_gradient_checkpointing() utility function in _utils.py that handles: - Heuristic: seq < 512 falls back to standard gc - Explicit True/False overrides unpatch previous patching - Returns the effective use_gradient_checkpointing value Net reduction of ~6 lines while improving maintainability. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-02 23:57:09 -08:00
Lei Zhenyuan	8228d89b30	fix for intel devices torch compile configs (#3952 ) * fix for intel devices * Refactor torch_compile_options to use base options with device-specific extensions - Extract common options into base_options shared by all device types - CUDA devices get additional CUDA-specific options - XPU, HIP, and other devices use base options only - Reduces code duplication and improves maintainability * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-02 21:15:06 -08:00
Roland Tannous	6c017686d9	delete tmp directory	2026-02-02 20:02:18 +00:00
Roland Tannous	ecebd30eca	update pydantic models for Models and Training routes	2026-02-02 20:00:23 +00:00
Roland Tannous	4fc9bbf0f1	update pydantic models for Models and Training routes	2026-02-02 20:00:04 +00:00
Roland Tannous	c07b81c083	fix: add utils/models directory that was ignored by gitignore	2026-02-02 19:52:34 +00:00
Roland Tannous	95fe3bed83	fix: restore models directory files deleted during restructure	2026-02-02 19:36:30 +00:00
Roland Tannous	d1dd8a61d6	Merge pull request #11 from unslothai/fix/move-claude-file-to-frontend moved CLAUDE.md into frontend directory	2026-02-02 22:25:52 +04:00
Roland Tannous	54846cf59c	moved CLAUDE.md into frontend directory	2026-02-02 18:25:11 +00:00
Roland Tannous	ae7313193a	Merge pull request #10 from unslothai/feature/add-model-training-yaml-configs added model yaml config files	2026-02-02 22:23:38 +04:00
Roland Tannous	bfa03ebd3c	added model yaml config files	2026-02-02 18:22:14 +00:00
Roland Tannous	92d4f52d7d	Merge pull request #9 from unslothai/fix/remove-backend-backend-redundant-folder remove redundant backend.backend folder	2026-02-02 22:04:38 +04:00
Roland Tannous	7448d2401b	remove redundant backend.backend folder	2026-02-02 18:03:56 +00:00
Roland Tannous	55eb0bb66a	migrated cli. fixed imports. fixed unsloth studio command logic	2026-02-02 17:50:11 +00:00
Roland Tannous	a1b8cd6696	Merge cli from ui-early-access and fix imports	2026-02-02 17:23:30 +00:00
Roland Tannous	396b8fb9a4	Merge pull request #8 from unslothai/feature/frontendui-export-page-client-rebased feat: Add export page, HF model/dataset search, PDF/DOCX extraction logic, modern AUI API, and code cleanup	2026-02-02 19:55:54 +04:00
shine1i	13ce83baf6	refactor: update quantization options in export constants, remove unused entries, and add F32 option	2026-02-02 16:49:07 +01:00
shine1i	4dc19b63f5	feat: add DOCX attachment support using `mammoth`, extend attachment handling to process and extract text from DOCX files	2026-02-02 16:33:10 +01:00
shine1i	3a15b915fc	feat: add PDF attachment support using `unpdf`, extend attachment handling and runtime to process and extract text from PDFs	2026-02-02 16:11:39 +01:00
shine1i	db92ab230b	refactor: enhance chat and UI elements with animations, tooltips, and improved styling; streamline sidebar, navbar, and chat-page interactions in top bar	2026-02-02 15:25:15 +01:00
shine1i	303865438f	refactor: replace deprecated `useAssistantRuntime` with `useAui`, update runtime API calls across chat features for consistency	2026-02-02 15:03:01 +01:00
shine1i	a87f14eccd	refactor: remove unused components, mock data, and redundant logic across chat features; streamline settings and runtime handling for better maintainability	2026-02-02 14:52:58 +01:00
shine1i	0d30950b75	refactor: remove unused model and dataset configurations, simplify `export-page` logic by eliminating modelInfo dependency and redundant params display	2026-02-02 14:22:35 +01:00
shine1i	6abe1d6e35	refactor: streamline combobox logic, improve search handling, and remove unused elements across model and dataset sections	2026-02-02 14:06:34 +01:00
shine1i	99bea160b3	refactor: simplify model and dataset combobox logic, remove curated items, and streamline search handling across components	2026-02-02 13:16:08 +01:00
shine1i	af3e8c20ee	refactor: format and clean up imports, hooks, and UI components for consistent structure and readability across models and datasets sections	2026-02-02 12:51:04 +01:00
shine1i	e705230499	feat: add Hugging Face search integration for datasets and models, extend infinite scroll support, and improve UI components with animations and tooltips	2026-02-02 12:45:41 +01:00
shine1i	e9857dab0f	feat: replace config summary with model export feature, including export methods, quantization options, and new UI components	2026-02-02 11:08:31 +01:00
Roland Tannous	1179735255	Merge pull request #7 from unslothai/fix/restructure-repo-root Add studio root folder and make frontend and backend as subfolders	2026-02-02 13:18:20 +04:00
Roland Tannous	8b80c71fe1	add studio root folder	2026-02-02 09:14:35 +00:00
Roland Tannous	544d6944d1	root studio folder	2026-02-02 09:13:49 +00:00
Roland Tannous	6db66ab1ff	Merge pull request #6 from unslothai/fix/remove-backend-backend-directory Fix/remove backend backend directory	2026-02-02 10:35:50 +04:00
Roland Tannous	9aec1c6cd4	remove redundant backend.backend directory	2026-02-02 06:35:00 +00:00
Datta Nimmaturi	5cf7b4e34f	[fix] qwen3-guard tokenizer (#3959) * fix for qwen3-guard tokenizer * Better qwen3guard check * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-01 22:09:15 -08:00
Roland Tannous	ed8839e009	Merge pull request #5 from unslothai/feature/backend-core-restructuring backend restructuring and housekeeping Changes made: - Moved all files from backend/backend/ → backend/core/ with nested subdirectories - Created __init__.py for each submodule with proper exports - Updated all imports in routes (routes/training.py, routes/models.py) - Updated internal relative imports to use .. for parent references - Deleted old backend/backend/ directory - Moved shared modules (path_utils.py, model_config.py) to utils/ subfolder	2026-02-02 09:56:59 +04:00
Roland Tannous	b4861d345b	Merge branch 'nightly' into feature/backend-core-restructuring	2026-02-02 09:56:19 +04:00
Roland Tannous	ce34bcd0d2	merge conflict .gitignore	2026-02-02 05:53:43 +00:00
Roland Tannous	75bd759108	fix .gitignore merge conflict	2026-02-02 05:51:25 +00:00
Roland Tannous	023405c76a	backend restructuring and housekeeping	2026-02-02 05:48:09 +00:00
Roland Tannous	2761c59012	Merge pull request #4 from unslothai/feature/backend-draft Pushing the initial draft of the backend	2026-02-02 09:33:01 +04:00
sshah229	c042223a7a	moved utils, dataset_utils and datasets, updated the startTraining pydantic model	2026-02-01 16:49:42 -07:00
Roland Tannous	7b70d8fe70	Merge pull request #3 from unslothai/feature/frontendui-onboarding-dashboard feat: onboarding & dashboard UI	2026-02-01 14:00:54 +04:00
sshah229	d593b069e2	Added the training and models routes	2026-02-01 01:23:16 -07:00
shine1i	aeb5382f0a	feat: track and display reasoning duration, enhance runtime with adapter for copying during inference and edit and UI integration	2026-02-01 09:08:23 +01:00
shine1i	f05db56439	refactor: improve reasoning UI with animations and dynamic behavior, minor CSS and layout tweaks	2026-02-01 08:31:22 +01:00
shine1i	614500c117	refactor: chat input bg with fade, reuse it in composer view as well	2026-02-01 08:09:17 +01:00
shine1i	dea9cf1911	refactor: migrate chat sidebar and UI components to modular sidebar framework, some minor UI tweaks (sidebar, lines)	2026-02-01 07:50:53 +01:00
shine1i	cf0cda5b65	chore: update labels and UI minor adjustments for clarity	2026-02-01 06:57:15 +01:00
shine1i	52bd5ebebd	feat: add frontend UI codebase	2026-01-31 19:34:16 +01:00
Datta Nimmaturi	2deb583389	[trl] vllm trl topk fixup (#3935) * [transformers] [v5] remove unused hybridcache (#3910) * remove unused hybridcache * cleanup * Fix top_k on trl GRPO * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-01-31 06:34:07 -08:00
Roland Tannous	a30967a69c	added __init__.py for backend	2026-01-31 08:38:29 +00:00
Roland Tannous	0ef09af008	git repo skeleton structure	2026-01-31 08:27:01 +00:00
Roland Tannous	b5aa137b7f	first commit	2026-01-27 21:19:48 +04:00
Pádraic Slattery	a09bdb6adb	chore: Update outdated GitHub Actions version (#3936)	2026-01-27 07:19:38 -08:00
pre-commit-ci[bot]	a34eb55ecd	[pre-commit.ci] pre-commit autoupdate (#3937) updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.13 → v0.14.14](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.13...v0.14.14) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-01-27 07:18:26 -08:00
Daniel Han	29edef68a8	Update pyproject.toml	2026-01-27 07:17:45 -08:00
pluesclues	3fde3a91ee	Grpo compile settings update (#3927 ) * Add torch compile options for GRPOTrainer * Update CUDA settings based on device capability * Add triton persistent TMA matmul condition * Fix syntax for triton.enable_persistent_tma_matmul * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update rl.py * Update rl.py --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-01-24 17:17:55 -08:00
Michael Han	f3efb70823	Embedding model fine-tuning support	2026-01-22 21:35:46 -08:00
Rachel Li	ca9cb26eed	Guard torch.compile on ROCm when triton_key is missing (#3923 ) * Guard torch.compile on ROCm when triton_key missing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update unsloth/import_fixes.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Tighten ROCm Triton import handling * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Rachel Li <rachelliqx07@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-22 15:46:08 -08:00
Michael Han	08e07e7865	Embedding model support	2026-01-22 14:22:03 -08:00
Daniel Han	a78c6a62e4	Update vision.py	2026-01-22 07:40:51 -08:00
electroglyph	101ab17728	add FastSentenceTransformer for easily finetuning SentenceTransformer models (#3719 ) * add FastSentenceTransformer * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Gemini code review suggestions * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unsloth-zoo patch only fixed usage for XLMRobertaForMaskedLM, this is a fix for XLMRobertaModel * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactor do_lower_case * add some comments * force disable FP8 loading * refactor pooling detection, add missing pooling types * add save_pretrained_merged method which gets modules and config * fix _save_pretrained_merged * rename read_pooling_mode, load modules instead of hard-coding em * comment * revert save_pretrained_merged change * propagate trust_remote_code properly * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add super hacky mpnet patch from hell * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactor _load_modules, add for_inference to from_pretrained, add transformers 5 code for mpnet, add distilbert patches * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add ModernBert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * deberta-v2 support (provisional), fix remote_code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add generic add_pooling_layer logic * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix for missing config * add push_to_hub_merged * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci 
* edit messages, throw exception if no HF token * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix device_map mismatch * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add comments, move import, other suggestions by Datta0 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * re-add adapter removal to save_pretrained_merged, but if saving to folder which had adapters before, leave them * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add unsloth branding to save_pretrained_merged * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * propagate dtype to internal module when loading for inference * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix mpnet gradient checkpointing for torch >= 2.9 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * same thing for transformers 5, oops =) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix FastSentenceTransformer performance: 6x speedup via torch.compile + SDPA The original implementation was 31% slower than naive SentenceTransformer due to conflicting decorators from Unsloth's auto-compiler (@torch.compile on attention modules but @torch.compiler.disable on sub-modules). Changes: - Add fast encoder path that bypasses Unsloth patching for encoder models - Use native torch.compile with mode="reduce-overhead" for 6x speedup - Auto-detect and enable SDPA for models that support it (BERT, RoBERTa, etc.) 
- Change defaults: load_in_16bit=True, load_in_4bit=False (16-bit is optimal) - Change default: use_gradient_checkpointing=False (conflicts with torch.compile) - Add UNSLOTH_COMPILE_DISABLE=1 env var to fall back to old path if needed Supported encoder types: mpnet, bert, distilbert, roberta, xlm-roberta, albert, electra Benchmark results (BS=32, seq_len=128): - Naive 16-bit LoRA: 13-50ms per iter - Unsloth 16-bit LoRA: 2-9ms per iter (5.4x-6.7x faster) - Memory usage: 61MB-1.3GB (even largest model fits easily) Note: 4-bit + torch.compile has a PyTorch bug (pytorch/pytorch#90665). 4-bit is also 1.7-1.9x slower than 16-bit due to dequantization overhead, so 16-bit is recommended for these small encoder models anyway. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use Unsloth's prepare_model_for_kbit_training for consistency Changed from peft.prepare_model_for_kbit_training to unsloth.models._utils.prepare_model_for_kbit_training. 
Unsloth's version provides: - Float32 mixed precision upcasting for LoRA layers - Better numerical stability - Consistency with rest of Unsloth codebase * Use relative imports and add float16 machine support - Changed absolute import to relative: from ._utils import prepare_model_for_kbit_training - Added SUPPORTS_BFLOAT16 import for proper dtype detection - Handle devices that don't support bfloat16 by falling back to float16 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add save_pretrained_torchao * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add auto-compile for torch.compile based on training step breakeven analysis Changes: - Change default compile_mode from "reduce-overhead" to "default" since CUDA Graphs (used by reduce-overhead) is incompatible with PEFT/LoRA - Add _estimate_compile_threshold() to calculate minimum steps needed for torch.compile to be beneficial based on model parameter count - Add _apply_torch_compile() helper with accelerate unwrap_model bug workaround - Defer torch.compile application to trainer initialization time so we can check max_steps against the breakeven threshold - Patch SentenceTransformerTrainer to auto-apply compile when max_steps exceeds the calculated threshold Breakeven thresholds (with 1.2x safety margin): - 22M params (MiniLM): ~1388 steps - 110M params (mpnet): ~242 steps - 335M params (snowflake): ~203 steps This ensures torch.compile warmup cost is only paid when training is long enough to benefit from the speedup. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * do QAT preparation for fast path * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix double loading model, thanks Etherl * do mpnet gradient checkpoint patch if gc is enabled * remove distilbert patches from mpnet fix * sanity check on model params, thanks Etherl * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add save_pretrained_gguf, thanks Etherl * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refine compile threshold estimation for sentence transformers * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-01-22 07:35:55 -08:00
Daniel Han	09ebbf6e63	Versioning	2026-01-22 07:33:59 -08:00
Roland Tannous	affd52e868	Merge pull request #49 from unslothai/feature/default-param-yaml Yaml config for default parameters	2026-01-22 05:42:42 +04:00
sshah229	30c75e6d41	fixed parameters (finetune language, vision, attention layers, and mlp_modules) not updating	2026-01-21 02:40:25 -07:00
Daniel Han	509fd4227c	Handle Transformers 5 vLLM import errors (#3908) * Handle Transformers 5 vLLM import errors * Deduplicate vLLM transformers mismatch handling --------- Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>	2026-01-20 01:02:39 -08:00
Roland Tannous	02982ceeba	set create public gradio share link to true	2026-01-20 07:36:57 +00:00
Roland Tannous	32e72a12ae	add studio command line argument to start unsloth studio UI	2026-01-20 07:27:05 +00:00
pluesclues	9172be8cfc	Fix vllm ipykernel patch (#3907 ) * Implement vLLM patch for notebook detection Add patch for vLLM compatibility in notebook environments. * Fix sys.stdout.fileno for vLLM compatibility Patch sys.stdout.fileno for vLLM compatibility in notebooks. * Add patch_vllm_for_notebooks to initialization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden vLLM notebook stdout patch * Use logger for vLLM notebook patch * Clarify vLLM notebook patch log message --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>	2026-01-19 21:04:27 -08:00
pre-commit-ci[bot]	157c929354	[pre-commit.ci] pre-commit autoupdate (#3905) updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.11 → v0.14.13](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.11...v0.14.13) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-01-19 18:42:13 -08:00
electroglyph	d80e69258c	add weight-only int8 QAT scheme and update tests for torchao 0.15.0 (#3859 ) * add int8 weight-only QAT scheme, add test, fix tests for current torchao version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change quantization to PerAxis * lambda =/ * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add torchao messages, remove group_size from int8 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * raise exception on missing torchao * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * touch up the torchao imports * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-01-16 09:32:29 +05:30
Michael Han	fda54f2634	Update README.md	2026-01-15 08:01:01 -08:00
Daniel Han	c80faef722	Update pyproject.toml	2026-01-15 07:00:25 -08:00
Daniel Han	f719d2b7bd	Update _utils.py	2026-01-15 05:09:26 -08:00
pluesclues	e83cbc9fe0	Merge pull request #3628 from pluesclues/alternative_compute_chunked_loss Chunk Across Batch and Context length for logprob calculations for grpo	2026-01-15 05:01:19 -08:00
Daniel Han	4fc06bd7fb	Merge pull request #3895 from Datta0/rl_ref_trl [trl] use non lora model as base for RL	2026-01-15 03:33:09 -08:00
pre-commit-ci[bot]	e360386719	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-15 11:25:11 +00:00
Datta Nimmaturi	b204148136	use non lora model as base for RL	2026-01-15 11:23:21 +00:00
Daniel Han	5c8ccf0671	Merge pull request #3879 from ducviet00/fix-gc Disable gradient checkpointing when explicitly off for vision	2026-01-14 04:32:02 -08:00
Michael Han	b03b014336	Update template.md	2026-01-14 03:45:35 -08:00
Daniel Han	f4e378dcc3	Merge pull request #3880 from f14-bertolotti/f14-wrong-ndim wrong number of dimensions	2026-01-12 21:32:48 -08:00
Daniel Han	1e790b03b2	Apply suggestion from @danielhanchen	2026-01-12 21:32:20 -08:00
Daniel Han	640404a93e	Merge pull request #3881 from unslothai/pre-commit-ci-update-config [pre-commit.ci] pre-commit autoupdate	2026-01-12 21:29:58 -08:00
pre-commit-ci[bot]	ab68311fdd	[pre-commit.ci] pre-commit autoupdate updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.10 → v0.14.11](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.10...v0.14.11)	2026-01-12 19:08:13 +00:00
Francesco Bertolotti	e15445f13f	wrong number of dimensions	2026-01-12 16:19:43 +01:00
Duc-Viet Hoang	432864cb25	Completely disable `gradient_checkpointing` for vision when `use_gradient_checkpointing=False`	2026-01-12 10:03:54 +07:00
Daniel Han	6aedc769ee	Merge pull request #3865 from ykaitao/ktyang_configure_embedding_for_training reduce code duplication by _offload_frozen_module_for_training	2026-01-09 21:02:55 -08:00
danielhanchen	3ffc8b1a5b	fix: use peft.utils.other for ModulesToSaveWrapper import ModulesToSaveWrapper was removed from peft.tuners.tuners_utils in PEFT 0.16.0. The class has been available in peft.utils.other since at least PEFT 0.7.1, which is the minimum version Unsloth requires. This fixes the ImportError when using PEFT >= 0.16.0.	2026-01-09 23:24:39 +00:00
Kaitao Yang	f7e17fb513	reduce code duplication by _offload_frozen_module_for_training	2026-01-09 06:07:38 -08:00
Daniel Han	6b7063713a	Merge pull request #3869 from hnxnq7/fix-kaggle-telemetry-detection Fix Kaggle telemetry misclassification when COLAB_ keys exist	2026-01-08 17:29:23 -08:00
pre-commit-ci[bot]	76d3e469a0	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-09 00:33:01 +00:00
Rachel Li	93cf0ee805	Fix Kaggle telemetry detection & address review feedback - Fix Kaggle misclassification by prioritizing filesystem markers over env vars - Preserve telemetry pings when statistics is explicitly provided - Replace bare except with except Exception - Minor cleanup based on automated review feedback	2026-01-08 19:32:33 -05:00
Rachel Li	7b7287c9b9	Fix telemetry ping regression for explicit statistics Fixed Codex regression: keep snapshot_download pings for explicit statistics values; detection only runs when statistics is None. Also replaced bare except.	2026-01-08 19:20:24 -05:00
pre-commit-ci[bot]	d0f3e4d32e	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-09 00:04:59 +00:00
Rachel Li	8c7b89227e	Update _utils.py fixed indentation	2026-01-08 19:04:30 -05:00
pre-commit-ci[bot]	f52d370da6	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-08 23:49:55 +00:00
Rachel Li	c686140c68	Fix Kaggle telemetry misclassification when COLAB_ keys exist Problem: Kaggle notebook environments can expose both KAGGLE_* and COLAB_* environment keys. _get_statistics currently checks COLAB_ before KAGGLE_, causing Kaggle sessions to be labeled colab/colabpro. Prefer filesystem markers (e.g. /kaggle/working, /content + /opt/colab) before env-key heuristics, then fall back to the existing env-key checks. This avoids misclassification when providers leak overlapping env vars. Kaggle test notebook: https://www.kaggle.com/code/hnxnq07/kaggle-stats-gathering-test	2026-01-08 18:44:22 -05:00
Daniel Han	0f07e36813	Merge pull request #3612 from Vangmay/feature/raw-text-dataprep Feature/raw text dataprep	2026-01-08 03:38:15 -08:00
pre-commit-ci[bot]	3620564025	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-08 11:35:21 +00:00
Daniel Han	16a2d901fa	Fix bugs and add improvements to RawTextDataLoader - Fix test file: use return_tokenized instead of return_tensors - Fix test file: use text_dataset instead of undefined dataset variable - Move parameter validation to constructor (fail fast on invalid params) - Add labels field in tokenized output for causal LM training - Add empty file handling with clear error message - Add tests for constructor validation and labels field	2026-01-08 11:35:00 +00:00
Daniel Han	e6536a5884	Merge pull request #3863 from unslothai/fix/fbgemm-cutlass-errors-sm100 Fix FBGEMM/CUTLASS errors on SM100 (Blackwell) GPUs	2026-01-08 03:19:53 -08:00
pre-commit-ci[bot]	2ee55010d3	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-08 04:15:17 +00:00
danielhanchen	f26948b493	Fix FBGEMM/CUTLASS errors on SM100 (Blackwell) GPUs This PR fixes the "Arch conditional MMA instruction used without targeting appropriate compute capability. Aborting." errors that occur when using FBGEMM on Blackwell GPUs (B200/B100, SM100). Changes: - Add stderr filters in import_fixes.py for CUTLASS/FBGEMM MMA errors - Add warning filters for various deprecation messages - Update check_fbgemm_gpu_version() to disable FBGEMM instead of raising an error when old versions are detected - Update test_has_fbgemm() in fp8.py to catch broader CUTLASS/CUDA errors and gracefully fall back to Triton kernels - Update loader_utils.py to disable FBGEMM instead of raising ValueError for old fbgemm_gpu versions The key behavior change is that FBGEMM errors no longer crash the script. Instead, FBGEMM is disabled and Triton kernels are used automatically. This allows Unsloth to work on SM100 GPUs where CUTLASS SM90 kernels fail, and also gracefully handles old FBGEMM versions.	2026-01-08 04:14:53 +00:00
Daniel Han	d930479aa7	Merge pull request #3857 from Datta0/modelscope_stats [ModelScope] Disable stats when modelscope is being used	2026-01-06 02:56:55 -08:00
pre-commit-ci[bot]	b73f6a9be0	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-06 10:00:17 +00:00
Datta Nimmaturi	dc83a17239	Check env var explicitly Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-06 15:30:06 +05:30
Datta Nimmaturi	67caa21231	Disable stats when modelscope is being used	2026-01-06 09:53:20 +00:00
Daniel Han	52935bb00f	Versioning	2026-01-05 07:37:08 -08:00
Daniel Han	3b4ac4aa1c	Merge pull request #3843 from unslothai/fix-grpo-version-compat Unify Version usage and fix TRL version handling	2026-01-05 06:07:41 -08:00
Daniel Han	7e13c424c9	Merge pull request #3851 from unslothai/grpo-fix-on-pr3754 GRPO: restore model mode after generate (stacked on #3754)	2026-01-05 06:05:24 -08:00
danielhanchen	3240ab3391	Merge main into grpo-fix-on-pr3754	2026-01-05 14:02:18 +00:00
danielhanchen	3efb799aad	Revert rl_replacements GRPO edits	2026-01-05 13:55:08 +00:00
danielhanchen	6ede9a735d	Fix GRPO training state restoration	2026-01-05 13:50:48 +00:00
pre-commit-ci[bot]	18f335cec5	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-05 13:39:16 +00:00
danielhanchen	d5fcc7ddde	Restore TRL version fallback in rl.py	2026-01-05 13:39:03 +00:00
Daniel Han	7f93aa0a78	Merge branch 'main' into fix-grpo-version-compat	2026-01-05 05:31:42 -08:00
danielhanchen	9d4ccdbff5	Drop rl.py GRPO changes from this branch	2026-01-05 13:29:58 +00:00
Daniel Han	a1eaf90c7b	Merge pull request #3849 from unslothai/fix-pdl-use-vllm-version-check Replace GitHub API check with vLLM version check for PDL fix	2026-01-05 05:22:16 -08:00
pre-commit-ci[bot]	cd8c6d773d	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-05 13:19:44 +00:00
Daniel Han	44f420db4f	Address review feedback: add constant and debug logging	2026-01-05 13:19:37 +00:00
Daniel Han	e8def1194d	Replace GitHub API check with vLLM version check for PDL fix The GitHub issue check had issues: 1. Network latency on import 2. Issue being closed does not mean the fix is in the installed vLLM version Now skip the PDL workaround if vLLM version > 0.13.2, which is when the upstream fix is expected to be included.	2026-01-05 13:15:17 +00:00
Daniel Han	45e27a841d	Merge pull request #3836 from ykaitao/remove_unused_variable_BlockDiagonalCausalMask remove unused variable BlockDiagonalCausalMask	2026-01-05 04:42:25 -08:00
Daniel Han	6c79d84318	Merge pull request #3842 from unslothai/fix-vllm-chat-template-sync Sync chat_template from tokenizer to vLLM	2026-01-05 04:38:39 -08:00
Daniel Han	10926b0e3a	Merge pull request #3841 from unslothai/fix-vllm-pdl-blackwell Fix vLLM PDL bug on Blackwell GPUs (B200/B100)	2026-01-05 04:37:58 -08:00
Daniel Han	c8a585a589	Keep PDL module check but remove unnecessary env var setting The check skips the GitHub API call for old vLLM versions. No need to set TRITON_DISABLE_PDL for versions without PDL support.	2026-01-05 12:34:32 +00:00
Daniel Han	bc0f1514f2	Remove unnecessary PDL module existence check Old vLLM versions without PDL modules don't need the fix. The patching code already handles missing modules gracefully.	2026-01-05 12:32:16 +00:00
Daniel Han	f22a35d903	Add None check for vLLM tokenizer - Check _vllm_tok is not None before accessing attributes - Use getattr for safer chat_template access	2026-01-05 10:02:11 +00:00
pre-commit-ci[bot]	defbf038b2	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-05 07:03:35 +00:00
danielhanchen	8e941a6422	Improve TRL compatibility and GRPO state restore	2026-01-05 07:02:36 +00:00
Daniel Han	9860d8859d	Fix PDL patch: target utils.py source module and clear lru_cache - Patch vllm.lora.ops.triton_ops.utils directly where supports_pdl is defined - Clear lru_cache before patching to prevent stale cached results - Add fused_moe_lora_op to consumer modules list - Use *args, **kwargs in fake function for compatibility	2026-01-05 06:53:42 +00:00
Daniel Han	ff846db6f6	Combine nested if statements for clarity	2026-01-05 05:25:53 +00:00
pre-commit-ci[bot]	dabfa79a16	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-05 05:24:59 +00:00
Daniel Han	e627adb4d5	Address review feedback: refactor and scan all GPUs - Add _spec_exists helper function to reduce duplication - Scan all GPUs for SM100 instead of just device 0 - Use loop for module patching to improve maintainability	2026-01-05 05:24:52 +00:00
Daniel Han	5976c3f10f	Add tokenizer fallback for chat_template sync	2026-01-05 05:10:24 +00:00
Daniel Han	e727c43685	Sync chat_template from tokenizer to vLLM When using base models with custom chat templates applied after loading, vLLM's internal tokenizer may not have the chat_template set. This causes issues during RL training with vLLM inference. This fix syncs the chat_template from the processing_class (the tokenizer you loaded and configured) to vLLM's internal tokenizer during trainer initialization, but only if vLLM's tokenizer does not already have one set.	2026-01-05 05:03:56 +00:00
pre-commit-ci[bot]	1a329b9e4f	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-05 05:03:29 +00:00
Daniel Han	594d3baffe	Fix vLLM PDL bug on Blackwell GPUs (B200/B100) vLLM's LoRA Triton kernels use tl.extra.cuda.gdc_wait() for PDL optimization on SM90+ GPUs. This fails on SM100 (Blackwell) during CUDA graph capture because Triton's pipeliner cannot handle gdc_wait in complex kernels. This fix: - Detects SM100 GPUs and applies the workaround automatically - Sets TRITON_DISABLE_PDL=1 environment variable - Monkey-patches supports_pdl to return False in lora_expand_op and lora_shrink_op - Checks GitHub issue #30872 status (with 3s timeout) to auto-disable the workaround once the upstream fix is merged - Includes quick internet connectivity check (0.5s) to avoid delays when offline Fixes the error: 'tt.elementwise_inline_asm' op pipeliner doesn't know how to predicate this op LLVM ERROR: Fatal pipeliner error See: https://github.com/vllm-project/vllm/issues/30872	2026-01-05 05:02:53 +00:00
Kaitao Yang	d66548f904	remove unused variable BlockDiagonalCausalMask	2026-01-04 09:21:44 -08:00
Daniel Han	1dd67b372e	Versioning	2026-01-04 06:12:44 -08:00
Daniel Han	eef05330ca	Merge pull request #3835 from unslothai/quant-config-respect Respect user quantization_config	2026-01-04 05:43:20 -08:00
Daniel Han	7bf648a882	Merge pull request #3834 from unslothai/rl-fixes rl.py fixes: buffer reset, safer attribute access, typo fix	2026-01-04 05:25:45 -08:00
danielhanchen	15052dc8e7	Keep 4bit flag for fast_inference	2026-01-04 13:18:15 +00:00
danielhanchen	e72808553f	Handle dict quantization_config flags	2026-01-04 13:14:03 +00:00
danielhanchen	3bfc927984	Respect user quantization_config	2026-01-04 13:03:06 +00:00
danielhanchen	0b1dbefacb	Fix psutil.cpu_count() potentially returning None in save.py	2026-01-04 12:58:45 +00:00
danielhanchen	d1d9832e70	Handle older unsloth-zoo without reset_unsloth_gradient_checkpointing_buffers	2026-01-04 12:57:10 +00:00
danielhanchen	762ef9a20f	rl.py fixes: buffer reset, safer attribute access, typo fix 1. Auto-reset gradient checkpointing buffers after trainer.train() - Import and call reset_unsloth_gradient_checkpointing_buffers() in prepare_for_training_mode wrapper to free memory after training while keeping buffers ready for subsequent runs 2. Replace eval/exec with safer getattr/setattr - eval(f"trl.trainer.{trainer}") -> getattr(trl.trainer, trainer) - exec(f"...{unwrap} = ...") -> setattr(current_trainer, unwrap, ...) - exec(f"Trainer.prediction_step=...") -> direct assignment 3. Fix psutil.cpu_count() potentially returning None - Change psutil.cpu_count()+4 to (psutil.cpu_count() or 1)+4 - Prevents TypeError on systems where cpu_count() returns None 4. Fix typo: oriignal_is_vlm_text -> original_is_vlm_text	2026-01-04 12:21:39 +00:00
Daniel Han	dc538896a5	Merge pull request #3832 from ykaitao/ktyang_remove_redundant_code_has_block remove redundant code of has_block	2026-01-03 23:18:27 -08:00
Kaitao Yang	d84602e549	remove redundant code of has_block	2026-01-03 22:38:37 -08:00
Daniel Han	6753691c92	Merge pull request #3822 from Fizza-Mukhtar/fix/llama-build-curl Make llama.cpp CURL dependency optional when building from source	2026-01-03 22:12:50 -08:00
pre-commit-ci[bot]	975e36f888	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-02 16:58:04 +00:00
Fizza-Mukhtar	1b4abe7a4e	Make llama.cpp CURL support optional during CMake builds	2026-01-02 08:55:58 -08:00
Fizza-Mukhtar	22210112c6	Make llama.cpp CURL support optional during CMake builds	2026-01-02 08:42:59 -08:00
Daniel Han	dc6df32f81	Merge pull request #3821 from unslothai/nightly Bug fixes	2026-01-02 06:22:08 -08:00
Daniel Han	52aed3ad14	Bug fixes	2026-01-02 06:07:16 -08:00
Daniel Han	e109102a0d	Merge branch 'main' into nightly	2026-01-02 06:06:11 -08:00
Daniel Han	5f2bd3c6e1	Merge pull request #3820 from unslothai/fix/fast-generate-wrapper-helpful-errors Add helpful error messages for fast_generate when fast_inference=False	2026-01-02 06:02:52 -08:00
pre-commit-ci[bot]	bd45518ba0	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2026-01-02 13:58:50 +00:00
danielhanchen	a52c3e545a	Add helpful error messages for fast_generate when fast_inference=False When users load a model with fast_inference=False but then try to use vLLM-style arguments with fast_generate, they previously got confusing errors. This adds a wrapper that detects common mistakes and provides helpful guidance: - Using sampling_params: explains to use HF generate args instead - Using lora_request: explains LoRA weights are already merged - Passing text strings: shows how to tokenize input first Changes: - Add make_fast_generate_wrapper to _utils.py - Apply wrapper in llama.py when fast_inference=False - Apply wrapper in vision.py when fast_inference=False	2026-01-02 13:58:08 +00:00
Daniel Han	2f7f260213	Merge branch 'main' into nightly	2026-01-02 05:40:32 -08:00
Daniel Han	ee0a242429	Update import_fixes.py	2026-01-02 05:05:47 -08:00
Daniel Han	7459010ab3	Update import_fixes.py	2026-01-02 03:41:51 -08:00
Daniel Han	26fea0ff35	Update loader.py	2026-01-02 02:48:28 -08:00
Daniel Han	8d44bd35d3	fix_huggingface_hub	2026-01-02 00:14:44 -08:00
Daniel Han	27f84990b5	Merge pull request #3818 from unslothai/fix-gemma3-qat-stability Fix Gemma3 QAT training instability with int8-int4 scheme	2026-01-01 23:23:55 -08:00
danielhanchen	697ea5d1c1	Fix Gemma3 QAT training instability with int8-int4 scheme Gemma3 models have a large vocabulary (262144 tokens) which causes training loss to explode when using int8 embedding quantization. This fix auto-detects Gemma3 models and switches from int8-int4 (phone-deployment) to int4 weight-only QAT for stable training.	2026-01-02 07:19:08 +00:00
Dan Saunders	f47ebfd237	CLI command for UI	2026-01-01 13:50:22 -05:00
Daniel Han	12004df0cb	Merge pull request #3711 from oKatanaaa/ensure-weight-tying FIX: weight tying for LoRA embeddings and lm_head	2026-01-01 04:55:01 -08:00
Daniel	41e6fe557a	Add TODO comment for ensure_weight_tying in vision models 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-01 12:54:21 +00:00
Daniel Han	94c0329e38	Merge pull request #3806 from Fizza-Mukhtar/fix/3d-tensor-matmul Fix 3D tensor support for bitsandbytes 8-bit matmul in forward pass	2026-01-01 04:07:43 -08:00
Daniel Han	f205be0ea7	Fix correctness bugs across multiple model files (#3813) 1. cohere.py:347-348 - Fixed wrong variable names in QK normalization. Used `Q`/`K` but variables were named `Qn`/`Kn`. This caused NameError when `use_qk_norm=True` (e.g., c4ai-command-r-plus models). 2. cohere.py:482 - Fixed wrong object reference in inference loop. Used `self.mlp` but should be `decoder_layer.mlp` since we're iterating through decoder layers. Caused AttributeError during inference. 3. falcon_h1.py:459,461 - Fixed wrong attribute names in inference path. Used `post_attention_layernorm` and `mlp` but Falcon H1 uses `pre_ff_layernorm` and `feed_forward`. Caused AttributeError during generation. 4. qwen3_moe.py:210 - Fixed wrong module path with incorrect capitalization. Used `transformers.models.Qwen3Moe` but should be `transformers.models.qwen3_moe`. Caused AttributeError when patching rotary embeddings. 5. qwen3_moe.py:239 - Fixed wrong model_patcher class. Used `FastQwen3Model` but should be `FastQwen3MoeModel` for MoE models. Caused incorrect patching for Qwen3 MoE models. 6. hf_hub.py:21-22 - Fixed floor division and missing return for billion values. Used `//` instead of `/` for millions, and had no return for values >= 1B. Caused incorrect formatting and None return for large numbers. 7. save.py:550 - Fixed self-assignment that did nothing. `sharded_ram_usage = sharded_ram_usage` should be `= max_shard_size`. Caused integer shard sizes to be ignored. 8. rl.py:562-567 - Fixed orphan string not included in length_check. The elif branch for max_seq_length validation was a standalone string expression, not concatenated to length_check. Caused silent skip of the max_seq_length > model_max_seq_length warning. 9. granite.py:49-52 - Fixed wrong model name and version in error message. Said "Gemma2" and "4.42.3" but should be "Granite" and "4.45.0".	2026-01-01 02:36:33 -08:00
Daniel Han	cbff64131f	Fix correctness bugs in rl.py, rl_replacements.py, and vision.py (#3811) * Fix correctness bugs in rl.py, rl_replacements.py, and vision.py 1. rl_replacements.py (lines 864, 870): Fixed undefined `nanmin`/`nanmax` functions by using `.nan_to_num(nan=inf/-inf).min()/.max()` pattern. PyTorch doesn't have torch.nanmin/nanmax, so we replace NaN values before computing min/max. 2. vision.py (line 150): Fixed bug where code checked for "input" key but then accessed kwargs["input_ids"] instead of kwargs["input"]. 3. vision.py (line 159): Fixed bug where literal string "key" was used instead of the variable `key` when accessing kwargs. 4. rl.py (lines 903, 905): Fixed non-existent `MathError` exception by replacing with `ValueError`. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-31 21:35:48 -08:00
Michael Han	9b5571fb69	Refresh of Unsloth README.md with https://unsloth.ai/docs	2025-12-30 15:14:27 -08:00
pre-commit-ci[bot]	c5a1eccb51	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-30 15:58:41 +00:00
Fizza-Mukhtar	9116aa8dfd	Fix 3D tensor support for bitsandbytes 8-bit matmul in forward pass	2025-12-30 07:56:01 -08:00
Fizza-Mukhtar	4f59695810	Fix 3D tensor support for bitsandbytes 8-bit matmul in forward pass	2025-12-30 07:08:10 -08:00
lif	8ab0c0c913	fix: add support for init_lora_weights="corda" in get_peft_model (#3794) Add "corda" as an allowed value for the init_lora_weights parameter in FastLanguageModel.get_peft_model() and FastBaseModel.get_peft_model(). This enables users to use CorDA (Correlation-aware Decomposed Adaptation) initialization from PEFT, which provides an alternative LoRA initialization strategy for improved finetuning performance. Fixes #3693 Signed-off-by: majiayu000 <1835304752@qq.com>	2025-12-28 23:17:58 -08:00
ゆり	5f1361aea3	Fix Boolean value of Tensor ambiguity error in mistral.py (#3790 ) * Fix is_contiguous() method call and remove duplicate imports - Fix bug in rope_embedding.py where is_contiguous was used without parentheses, causing the method object (always truthy) to be evaluated instead of calling the method. This fixes issue #3781 where fast rope backpropagation was broken for zero strided/non-contiguous tensors. - Remove duplicate `import torch` in rl.py (lines 20 and 25) - Remove duplicate `import functools` and `import types` in vision.py 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Fix Boolean value of Tensor ambiguity error in mistral.py Replace `or` operator with explicit `is None` check when getting n_items from kwargs. The `or` operator fails when the value is a Tensor because Python cannot determine the boolean value of a multi-element tensor. Fixes #3766 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Update rope_embedding.py --------- Co-authored-by: yurekami <yurekami@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-28 21:30:55 -08:00
Fizza Mukhtar	091a801386	Fix crash when trl.experimental.openenv is unavailable (#3787) * Guard optional trl.experimental.openenv usage in RL patches * Simplify optional trl.openenv import handling * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-28 21:23:51 -08:00
Francesco Bertolotti	dabf2a901b	fastrope fix for zero strided tensors (#3782) Co-authored-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>	2025-12-28 21:21:48 -08:00
Alkın Ünlü	6180adda1b	fix(trainer): import psutil to prevent NameError in _prepare_dataset (#3780) * fix(trainer): import psutil to prevent NameError in _prepare_dataset Fixes #3777 * Update rl.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-28 21:18:02 -08:00
Daniel Han	f40fa7a0e8	Update FUNDING.yml (#3792)	2025-12-28 19:57:43 -08:00
Michael Han	96de7a817d	Update README for new unsloth.ai/docs.md	2025-12-27 00:49:19 -08:00
Fizza Mukhtar	f57cd25d46	Clarify NotImplementedError for fast_inference with full_finetuning (#3768) * Improve error message for fast_inference and full_finetuning * Refine error message string formatting * Update unsloth/models/vision.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-25 18:46:13 -08:00
Strahinja Stamenkovic	a058885b8a	Add missing import of inspect (#3778) * Add missing import of inspect * Update device_type.py	2025-12-25 18:43:59 -08:00
pre-commit-ci[bot]	7c1d528c00	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-23 20:06:37 +00:00
numb3r33	3aaca08a0e	Refactor return statement replacement to use explicit newlines Replace f-string triple-quoted approach with explicit newline characters for clearer string construction in the grpo_trainer patch.	2025-12-24 01:34:21 +05:30
pre-commit-ci[bot]	d8c9e6aafb	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-24 01:34:21 +05:30
numb3r33	fe21809bda	Fix indentation handling in grpo_trainer return statement replacement Use regex to dynamically detect and preserve the original indentation when replacing the 'return output' statement, instead of hardcoding spaces. This ensures the patched code maintains consistent indentation regardless of the original formatting.	2025-12-24 01:34:21 +05:30
pre-commit-ci[bot]	c8e7bd9f09	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-24 01:34:21 +05:30
numb3r33	a6405b36b4	Remove the comment.	2025-12-24 01:34:21 +05:30
abhishek.sharma	91671433b0	Fix model training state restoration in GRPO trainer Store the model's training state before generation and restore inference mode after completion if the model wasn't originally in training mode. This ensures the model returns to the correct state after generate and score operations.	2025-12-24 01:34:21 +05:30
Daniel Han	dea670a1b6	Merge branch 'main' into nightly	2025-12-23 05:51:04 -08:00
Daniel Han	1ff6fc85f0	llama.cpp fixes	2025-12-23 05:50:26 -08:00
Daniel Han	cbfa7a20f9	Update rl.py	2025-12-23 05:42:58 -08:00
Daniel Han	0ae7d2ba28	Update rl.py	2025-12-23 05:35:06 -08:00
Daniel Han	a4408f0e50	Merge branch 'main' into nightly	2025-12-23 04:52:58 -08:00
Daniel Han	fd42103a9b	Nightly (#3767 ) * Update _utils.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [FIX] [Transformers] VLM input embeds fix for gradients (#3715) * Fix get_input_embeds call for VLMs * patch input_require_grads instead * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup old patch * cleanup old patch * cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen * use logger instead of prints * Move unsloth present set * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rope_embedding.py * Fixes * Update _utils.py * Update import_fixes.py * Update rl_replacements.py * fix_openenv_no_vllm * Fix * Update __init__.py * Update __init__.py * Update __init__.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * logger * Update __init__.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update __init__.py * Update import_fixes.py * Update __init__.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update import_fixes.py * Update unsloth/import_fixes.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update save.py * [fbgemm] Silence tma fbgemm (#3735) * Silence fbgemm TMA print Also safer .push_to_hub * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update loader.py * Update save.py * Update save.py * Update _utils.py * Update _utils.py * Diffusers warnings * Update pyproject.toml * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [hf_hub] Token login (#3739) * login on token * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup old code * safer imports * cleanup * Return token after login * correct return types * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen * add back imports * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * finish return token --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Do not overwrite slots (#3752) * Do not overwrite slots * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update save.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-23 04:52:29 -08:00
pre-commit-ci[bot]	ad73b7e493	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-23 12:51:09 +00:00
Daniel Han	6866c4b3df	Update save.py	2025-12-23 04:46:43 -08:00
Daniel Han	10ef983541	Merge branch 'main' into nightly	2025-12-23 04:46:15 -08:00
Daniel Han	691b5d129f	Update save.py	2025-12-23 00:55:06 -08:00
pre-commit-ci[bot]	e134ceed79	[pre-commit.ci] pre-commit autoupdate (#3760) updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.9 → v0.14.10](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.9...v0.14.10) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-22 23:12:00 -08:00
Daniel Han	ad5b2f66af	Merge branch 'main' into nightly	2025-12-19 19:37:52 -08:00
Daniel Han	25a3141663	Update loader.py	2025-12-19 19:37:49 -08:00
Daniel Han	47ede31b8c	Nightly (#3753 ) * Update _utils.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [FIX] [Transformers] VLM input embeds fix for gradients (#3715) * Fix get_input_embeds call for VLMs * patch input_require_grads instead * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup old patch * cleanup old patch * cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen * use logger instead of prints * Move unsloth present set * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rope_embedding.py * Fixes * Update _utils.py * Update import_fixes.py * Update rl_replacements.py * fix_openenv_no_vllm * Fix * Update __init__.py * Update __init__.py * Update __init__.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * logger * Update __init__.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update __init__.py * Update import_fixes.py * Update __init__.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update import_fixes.py * Update unsloth/import_fixes.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update save.py * [fbgemm] Silence tma fbgemm (#3735) * Silence fbgemm TMA print Also safer .push_to_hub * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update loader.py * Update save.py * Update save.py * Update _utils.py * Update _utils.py * Diffusers warnings * Update pyproject.toml * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [hf_hub] Token login (#3739) * login on token * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup old code * safer imports * cleanup * Return token after login * correct return types * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen * add back imports * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * finish return token --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Do not overwrite slots (#3752) * Do not overwrite slots * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-19 19:35:41 -08:00
Daniel Han	bfa7301768	Merge branch 'main' into nightly	2025-12-19 19:31:23 -08:00
Daniel Han	9a6d703d3c	Update _utils.py	2025-12-19 19:24:49 -08:00
Strahinja Stamenkovic	490153500b	Enable 4-bit quantization on AMD Radeon GPUs (#3748) * Enable 4-bit quant on Radeon * Fix table centering * Update comments for clarity * Handle failure to import Bitsandbytes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update device_type.py * Apply suggestion from @danielhanchen * Update device_type.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-19 19:22:56 -08:00
Dan Saunders	1a507b4a82	Fix VLM DDP checkpointing (#3751)	2025-12-19 19:09:16 -08:00
Datta Nimmaturi	3918e07df8	Do not overwrite slots (#3752) * Do not overwrite slots * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-19 19:08:28 -08:00
Daniel Han	3f238efba5	Merge branch 'main' into nightly	2025-12-18 09:30:19 -08:00
Daniel Han	a36eb9b9a1	FunctionGemma	2025-12-18 09:27:46 -08:00
DoubleMathew	96bd2a7668	Fix Deepseek OCR Lora Model Load (#3738 ) * fix deepseek ocr lora_model load: trust_remote_code option check for import error in autoconfig/peftconfig from_pretrained error handle import * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-18 04:09:17 -08:00
Datta Nimmaturi	6832ce8098	[hf_hub] Token login (#3739) * login on token * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup old code * safer imports * cleanup * Return token after login * correct return types * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen * add back imports * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * finish return token --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-18 04:07:12 -08:00
Dan Saunders	44104fa83d	review comments	2025-12-17 15:28:03 -05:00
Dan Saunders	7833191626	review comments	2025-12-17 15:22:42 -05:00
Daniel Han	6b676abaad	Merge branch 'main' into nightly	2025-12-17 03:32:57 -08:00
Daniel Han	1e7302cd77	Nightly (#3737 ) * Update _utils.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [FIX] [Transformers] VLM input embeds fix for gradients (#3715) * Fix get_input_embeds call for VLMs * patch input_require_grads instead * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup old patch * cleanup old patch * cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen * use logger instead of prints * Move unsloth present set * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rope_embedding.py * Fixes * Update _utils.py * Update import_fixes.py * Update rl_replacements.py * fix_openenv_no_vllm * Fix * Update __init__.py * Update __init__.py * Update __init__.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * logger * Update __init__.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update __init__.py * Update import_fixes.py * Update __init__.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update import_fixes.py * Update unsloth/import_fixes.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update save.py * [fbgemm] Silence tma fbgemm (#3735) * Silence fbgemm TMA print Also safer .push_to_hub * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update loader.py * Update save.py * Update save.py * Update _utils.py * Update _utils.py * Diffusers warnings * Update pyproject.toml * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-17 03:31:48 -08:00
pre-commit-ci[bot]	ec0b96012e	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-17 11:29:40 +00:00
Daniel Han	150f7e017e	Update pyproject.toml	2025-12-17 03:26:19 -08:00
Daniel Han	37888f3e10	Diffusers warnings	2025-12-17 03:25:40 -08:00
Daniel Han	fecee1b386	Update _utils.py	2025-12-17 02:54:15 -08:00
Daniel Han	2f7132d7c9	Update _utils.py	2025-12-17 02:38:05 -08:00
Daniel Han	be4b5996d7	Update save.py	2025-12-17 02:28:03 -08:00
Daniel Han	7d82555dc8	Update save.py	2025-12-17 02:21:47 -08:00
Daniel Han	28120b8a88	Update loader.py	2025-12-17 01:51:29 -08:00
Daniel Han	b342b6cf28	Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly	2025-12-17 01:07:29 -08:00
Datta Nimmaturi	a32f28d30b	[fbgemm] Silence tma fbgemm (#3735 ) * Silence fbgemm TMA print Also safer .push_to_hub * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-17 01:07:21 -08:00
Daniel Han	59ed4fce4c	Update save.py	2025-12-16 23:15:31 -08:00
Daniel Han	85f361a15d	Merge branch 'main' into nightly	2025-12-16 21:52:58 -08:00
Daniel Han	23a7ac5d17	Update FUNDING.yml (#3736 )	2025-12-16 21:36:25 -08:00
Daniel Han	ae4208f3ae	Merge branch 'main' into nightly	2025-12-16 21:07:38 -08:00
Daniel Han	9ef9b60660	Bug fixes (#3734 ) * Update _utils.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [FIX] [Transformers] VLM input embeds fix for gradients (#3715) * Fix get_input_embeds call for VLMs * patch input_require_grads instead * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup old patch * cleanup old patch * cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen * use logger instead of prints * Move unsloth present set * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rope_embedding.py * Fixes * Update _utils.py * Update import_fixes.py * Update rl_replacements.py * fix_openenv_no_vllm * Fix * Update __init__.py * Update __init__.py * Update __init__.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * logger * Update __init__.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update __init__.py * Update import_fixes.py * Update __init__.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update import_fixes.py * Update unsloth/import_fixes.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: gemini-code-assist[bot] 
<176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-16 20:52:57 -08:00
Daniel Han	329c465245	Update unsloth/import_fixes.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-16 20:52:09 -08:00
Daniel Han	19b283459c	Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly	2025-12-16 20:39:00 -08:00
Daniel Han	3fa8a363ab	Update import_fixes.py	2025-12-16 20:34:34 -08:00
pre-commit-ci[bot]	c27a5beed8	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-17 04:26:03 +00:00
Daniel Han	66c97eb1ff	Update import_fixes.py	2025-12-16 20:15:22 -08:00
Daniel Han	a79766f747	Update import_fixes.py	2025-12-16 20:06:12 -08:00
Daniel Han	c74d5e4ed6	Update import_fixes.py	2025-12-16 19:59:04 -08:00
Daniel Han	967fd51990	Update import_fixes.py	2025-12-16 19:51:10 -08:00
Daniel Han	b206214bb7	Update __init__.py	2025-12-16 19:49:19 -08:00
Daniel Han	4b6afd75de	Update import_fixes.py	2025-12-16 19:48:41 -08:00
Daniel Han	c69e45b32f	Merge branch 'main' into nightly	2025-12-16 19:46:34 -08:00
Daniel Han	90db3f465e	Update import_fixes.py	2025-12-16 16:58:03 -08:00
Daniel Han	ba3f91f72f	Update import_fixes.py	2025-12-16 16:57:38 -08:00
Daniel Han	c4d44af095	Update import_fixes.py	2025-12-16 15:46:27 -08:00
Daniel Han	95ce4fa8fa	Update rl.py	2025-12-15 22:43:16 -08:00
pre-commit-ci[bot]	3104fd0942	[pre-commit.ci] pre-commit autoupdate (#3731 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.8 → v0.14.9](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.8...v0.14.9) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-15 17:54:15 -08:00
Dan Saunders	e012936d75	nits	2025-12-15 18:46:21 -05:00
Dan Saunders	64a13cd4b5	vision config -> lora	2025-12-15 18:37:26 -05:00
Dan Saunders	1a929732f6	bugfix	2025-12-15 18:37:26 -05:00
Dan Saunders	e79b1c7832	review comments, tests, etc.	2025-12-15 18:37:26 -05:00
Dan Saunders	6e7e52fb26	add export command, nested reorg commands	2025-12-15 18:37:26 -05:00
Dan Saunders	7828f77175	fixes / cleanup	2025-12-15 18:37:26 -05:00
Dan Saunders	cf966fe98e	autogen typer options from pydantic models	2025-12-15 18:37:26 -05:00
Dan Saunders	356fb08b03	add dry run	2025-12-15 18:37:26 -05:00
Dan Saunders	22f9a65772	refactor	2025-12-15 18:37:26 -05:00
Dan Saunders	4ef25032c1	add config support + example configs, etc.	2025-12-15 18:37:26 -05:00
Dan Saunders	42490cfbc4	train CLI	2025-12-15 18:37:26 -05:00
Michael Han	086ccd377f	Update README.md	2025-12-13 16:44:44 -08:00
oKatanaaa	e368a0bd2a	fix: add a log instead of silent exception	2025-12-13 00:06:41 +00:00
Daniel Han	3412452d76	Merge branch 'main' into nightly	2025-12-12 05:53:19 -08:00
Daniel Han	cdc95e33a9	Nightly (#3720 ) * Update _utils.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [FIX] [Transformers] VLM input embeds fix for gradients (#3715) * Fix get_input_embeds call for VLMs * patch input_require_grads instead * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup old patch * cleanup old patch * cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen * use logger instead of prints * Move unsloth present set * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rope_embedding.py * Fixes * Update _utils.py * Update import_fixes.py * Update rl_replacements.py * fix_openenv_no_vllm * Fix * Update __init__.py * Update __init__.py * Update __init__.py * Update import_fixes.py * Update import_fixes.py * Update import_fixes.py * logger * Update __init__.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update __init__.py --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-12-12 05:53:08 -08:00
Daniel Han	e0ee31d814	Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly	2025-12-12 05:51:51 -08:00
Daniel Han	679a77c8f2	Update __init__.py	2025-12-12 05:51:31 -08:00
pre-commit-ci[bot]	95bfd7ba33	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-12 13:48:24 +00:00
Daniel Han	f656a4bf75	Update __init__.py	2025-12-12 05:46:50 -08:00
Daniel Han	6be76ac5f0	logger	2025-12-12 05:44:38 -08:00
Daniel Han	a2e08e689d	Update import_fixes.py	2025-12-12 05:40:56 -08:00
Daniel Han	fed8f9a04e	Update import_fixes.py	2025-12-12 05:38:24 -08:00
Daniel Han	9c10317550	Update import_fixes.py	2025-12-12 05:36:40 -08:00
Daniel Han	2e814c3ca9	Update __init__.py	2025-12-12 05:34:29 -08:00
Daniel Han	bb06544bb1	Update __init__.py	2025-12-12 05:31:36 -08:00
Daniel Han	35022da494	Update __init__.py	2025-12-12 05:29:25 -08:00
Daniel Han	06223976f6	Fix	2025-12-12 05:27:42 -08:00
Daniel Han	79e959e4d6	fix_openenv_no_vllm	2025-12-12 05:20:09 -08:00
Daniel Han	39e182e732	Update rl_replacements.py	2025-12-12 05:11:12 -08:00
Daniel Han	e479bb71e4	Update import_fixes.py	2025-12-12 05:10:45 -08:00
Daniel Han	890c30fd46	Update _utils.py	2025-12-12 05:01:43 -08:00
Daniel Han	63b6041f07	Fixes	2025-12-12 04:58:43 -08:00
Daniel Han	997931a38a	Merge branch 'main' into nightly	2025-12-12 04:07:32 -08:00
Scott Roy	c91e99370b	Update torchao save (#3679 ) * Update torchao save * up * up * up * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-12 04:07:02 -08:00
Daniel Han	d43dcf704b	Update rope_embedding.py	2025-12-12 03:41:09 -08:00
Daniel Han	372764ae65	Merge branch 'main' into nightly	2025-12-12 03:38:09 -08:00
Datta Nimmaturi	3da42dff93	[FIX] [Transformers] VLM input embeds fix for gradients (#3715 ) * Fix get_input_embeds call for VLMs * patch input_require_grads instead * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup old patch * cleanup old patch * cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen * use logger instead of prints * Move unsloth present set * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-12 03:33:39 -08:00
Dan Saunders	43e134a6e9	Mistral packing, train on completions only, simplifications (#3709 ) * pipe kwargs through mistral model * simplify / bugfix * bugfix for train_on_completions_only * wire up is_unsupported_model * nits, edge cases	2025-12-10 23:15:59 -08:00
Lei Zhenyuan	beb83d7f28	[intel] skip xpu fbgemm fp8 (#3625 ) * skip xpu fbgemm fp8 * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-10 21:13:29 -08:00
Michael Han	401de54fba	Padding free packing update	2025-12-10 21:12:13 -08:00
Michael Han	bff336c7a3	Adding new padding free packing support	2025-12-10 21:10:19 -08:00
pre-commit-ci[bot]	8c21f54b74	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-11 03:31:41 +00:00
oKatanaaa	8f08e57d8e	fix: weights tying	2025-12-11 03:21:02 +00:00
Dan Saunders	2040946d68	update TRL filter (#3707 ) * update TRL filter * both filters * Apply suggestion from @danielhanchen * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-10 07:49:52 -08:00
Daniel Han	d45b63eb3d	Merge branch 'main' into nightly	2025-12-10 06:25:56 -08:00
Daniel Han	3bc349f2d8	Gemma issue	2025-12-10 06:25:49 -08:00
Daniel Han	22cb954f4d	Merge branch 'main' into nightly	2025-12-10 06:11:32 -08:00
Daniel Han	b16af7b0f5	Update _utils.py	2025-12-10 06:11:14 -08:00
Daniel Han	c010cc6421	Update trainer.py	2025-12-10 06:11:03 -08:00
Daniel Han	4761574752	Merge branch 'main' into nightly	2025-12-10 04:15:57 -08:00
Daniel Han	26a9b5b322	Nightly (#3706 ) * Update _utils.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-10 04:13:13 -08:00
pre-commit-ci[bot]	d8564a05b4	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-10 12:12:49 +00:00
Daniel Han	0372af5ca4	Merge branch 'main' into nightly	2025-12-10 04:12:19 -08:00
Daniel Han	f0c8e21d59	Update import_fixes.py	2025-12-10 04:05:14 -08:00
Daniel Han	f9564cf84e	Update _utils.py	2025-12-10 03:48:01 -08:00
Datta Nimmaturi	62d19a12ff	[FIX] fbgemm version check (#3704 ) * fbgemm version check * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * safer version check * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add check for torchvision-torch compatibility * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactor package check logic * Remove logs and enforce torch --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-10 03:46:30 -08:00
Dan Saunders	75e0d7ce62	Auto-enable padding-free SFT (#3672 ) * implement (sdpa, xformers, fa2) sample packing * attention dispatching * ddp working OOTB with CLI * packed SWA and softcap support * enable batch flattening * LGPL license headers * mask packed sequence boundaries * auto-enable sample packing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add explicit toggle for sample packing * Add explicit toggle for sample packing * Update __init__.py * Update unsloth/kernels/rope_embedding.py * Update unsloth/kernels/rope_embedding.py * remove grad output clones; restore deleted FastLanguageModel arg * fix * restore rope embedding clones * xformers mask cache * implement (sdpa, xformers, fa2) sample packing * attention dispatching * ddp working OOTB with CLI * packed SWA and softcap support * enable batch flattening * LGPL license headers * mask packed sequence boundaries * auto-enable sample packing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add explicit toggle for sample packing * Add explicit toggle for sample packing * Update __init__.py * Update unsloth/kernels/rope_embedding.py * Update unsloth/kernels/rope_embedding.py * remove grad output clones; restore deleted FastLanguageModel arg * fix * restore rope embedding clones * xformers mask cache * add back accidental deletion * Update unsloth/kernels/rope_embedding.py Co-authored-by: Daniel Han <danielhanchen@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix merge conflicts * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add *kwargs add back clobbered * Update rope_embedding.py * Update rope_embedding.py * simplify trl warnings filter * docstring * nit * bugfix * 
add padding-free seqlen metadata * auto-enable padding free * gemma2 disable * Apply suggestion from @danielhanchen * Update trainer.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update trainer.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-10 03:07:29 -08:00
pre-commit-ci[bot]	fb565d52f0	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-10 05:17:02 +00:00
vangmay	96eba88c90	Fix Chunking loop can hang when stride ≥ chunk_size	2025-12-10 10:46:29 +05:30
vangmay	07966659d8	Fix Incorrect non-relative import in dataprep package	2025-12-10 10:17:23 +05:30
vangmay	fe36643c66	Fix RawTextDataLoader import issue	2025-12-10 10:15:56 +05:30
Daniel Han	c81025b24d	Merge branch 'main' into nightly	2025-12-09 17:37:18 -08:00
Dan Saunders	496f84ff6b	SFT sample packing (#3566 ) * implement (sdpa, xformers, fa2) sample packing * attention dispatching * ddp working OOTB with CLI * packed SWA and softcap support * enable batch flattening * LGPL license headers * mask packed sequence boundaries * auto-enable sample packing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add explicit toggle for sample packing * Add explicit toggle for sample packing * Update __init__.py * Update unsloth/kernels/rope_embedding.py * Update unsloth/kernels/rope_embedding.py * remove grad output clones; restore deleted FastLanguageModel arg * fix * restore rope embedding clones * xformers mask cache * implement (sdpa, xformers, fa2) sample packing * attention dispatching * ddp working OOTB with CLI * packed SWA and softcap support * enable batch flattening * LGPL license headers * mask packed sequence boundaries * auto-enable sample packing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add explicit toggle for sample packing * Add explicit toggle for sample packing * Update __init__.py * Update unsloth/kernels/rope_embedding.py * Update unsloth/kernels/rope_embedding.py * remove grad output clones; restore deleted FastLanguageModel arg * fix * restore rope embedding clones * xformers mask cache * add back accidental deletion * Update unsloth/kernels/rope_embedding.py Co-authored-by: Daniel Han <danielhanchen@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix merge conflicts * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add *kwargs add back clobbered * Update rope_embedding.py * Update rope_embedding.py * simplify trl warnings filter * docstring * nit * bugfix * Apply 
suggestion from @danielhanchen * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update unsloth/trainer.py * Update unsloth/trainer.py * Update unsloth/trainer.py * Update unsloth/trainer.py --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-09 17:36:45 -08:00
Daniel Han	e561bb3ef5	Merge branch 'main' into nightly	2025-12-09 03:31:30 -08:00
Daniel Han	2b3cb06925	Update _utils.py (#3698 )	2025-12-09 03:31:20 -08:00
Daniel Han	cafa92a63b	Merge branch 'main' into nightly	2025-12-09 03:30:42 -08:00
Datta Nimmaturi	89787329d3	[Fix] [TRL] load_lora for multi line llm.chat/generate (#3696 ) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove reload_weights rpc call from grpo trainer * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use regex instead of static string * patch openenv reload_weights call * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Better handle sleep and wakeup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reset indentation * Handle multi line self.llm.chat better * Use logger * re-indent * Stricter regex to replace wildcard --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-09 03:30:23 -08:00
Daniel Han	6264afbf87	Update _utils.py	2025-12-09 01:02:26 -08:00
Datta Nimmaturi	9e5b4052e5	Remove reload_weights rpc call from grpo trainer (#3673 ) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove reload_weights rpc call from grpo trainer * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use regex instead of static string * patch openenv reload_weights call * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Better handle sleep and wakeup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reset indentation --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-08 23:36:22 -08:00
pre-commit-ci[bot]	c579cd7094	[pre-commit.ci] pre-commit autoupdate (#3694 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.7 → v0.14.8](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.7...v0.14.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-08 19:44:56 -08:00
Daniel Han	43ad66d37a	Versioning	2025-12-08 04:19:10 -08:00
Daniel Han	bebf042e0f	Update pyproject.toml	2025-12-08 04:13:45 -08:00
Daniel Han	e72e9d499d	Versioning	2025-12-08 04:06:01 -08:00
Noah Kirschmann	a80f1991c5	Update transformers version constraint in pyproject.toml (#3689 ) * Update transformers version constraint in pyproject.toml The latest transformers version just fixes the local training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformers version constraint in pyproject.toml --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-08 03:27:18 -08:00
Daniel Han	034a35f6d4	Add **kwargs	2025-12-08 03:24:51 -08:00
Daniel Han	f4de7baea1	Update rl.py	2025-12-08 02:23:43 -08:00
Daniel Han	4408ddc081	Update vision.py	2025-12-07 23:09:13 -08:00
Daniel Han	d86ded6799	Update _utils.py	2025-12-07 16:52:59 -08:00
Daniel Han	cb4d8da5a2	Xformers fix	2025-12-07 16:40:51 -08:00
Michael Han	3d4f236155	Update README.md	2025-12-04 08:21:20 -08:00
Daniel Han	845e61d351	Update README.md	2025-12-02 04:08:54 -08:00
Daniel Han	14e8e3137d	Update README.md	2025-12-02 03:52:50 -08:00
pre-commit-ci[bot]	13f6491fe6	[pre-commit.ci] pre-commit autoupdate (#3666 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.6 → v0.14.7](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.6...v0.14.7) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-01 17:45:13 -08:00
Daniel Han	4fd865cc99	Update _utils.py	2025-12-01 08:01:05 -08:00
Daniel Han	d655c7434a	Update rl.py	2025-12-01 08:00:08 -08:00
Daniel Han	66649d18bd	Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit `cad158a56c`.	2025-12-01 07:24:58 -08:00
pre-commit-ci[bot]	cad158a56c	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-01 15:24:34 +00:00
Daniel Han	487a951914	Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit `964c9fef95`.	2025-12-01 07:24:21 -08:00
pre-commit-ci[bot]	964c9fef95	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-01 15:23:44 +00:00
Daniel Han	5f27bc4db5	Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit `d34e0454ac`.	2025-12-01 07:23:31 -08:00
pre-commit-ci[bot]	d34e0454ac	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-12-01 15:20:22 +00:00
Daniel Han	d994280cdc	Update unsloth/models/rl.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-01 07:20:00 -08:00
Daniel Han	ebec564689	Update unsloth/models/rl.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-01 07:19:50 -08:00
Daniel Han	b4f5a70878	Update qwen3_moe.py	2025-12-01 07:19:07 -08:00
Datta Nimmaturi	04cfc0d139	Vllm guided decoding (#3663 ) * vllm sampling params fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * do not patch base_trainer * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * seperate vllm fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixup deletion * Fix indentation * revert to old style --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-12-01 07:11:28 -08:00
Daniel Han	3d62c38ada	Verisoning	2025-12-01 07:09:17 -08:00
Daniel Han	3f4768ad1e	Update rl.py	2025-12-01 06:23:23 -08:00
Daniel Han	ba2897a318	Revert "[FIX] Vllm guided decoding params (#3662 )" This reverts commit `fb4f0fdf56`.	2025-12-01 05:43:45 -08:00
Datta Nimmaturi	fb4f0fdf56	[FIX] Vllm guided decoding params (#3662 ) * vllm sampling params fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * do not patch base_trainer * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * seperate vllm fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit 58b483dc0d1790f99580665801d3fa0d7267c533. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit b2497519659a9f301e7a633795d9efdafdc2b277. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit de3daaf429f81aceb6632932b0cb1af5149652a8. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-12-01 05:42:37 -08:00
Daniel Han	bf28d686a0	Merge branch 'main' into nightly	2025-12-01 04:21:27 -08:00
Santosh Bhavani	f7be4b1140	Fix: Pass gradient_checkpointing parameter to model.for_training() calls (#3659 )	2025-12-01 04:18:41 -08:00
Daniel Han	f9fd3c43fa	Update vision.py	2025-12-01 01:21:26 -08:00
Daniel Han	085a0a9c2d	Typos	2025-12-01 00:01:07 -08:00
Daniel Han	7028dc02a5	Update qwen3_moe.py	2025-11-30 23:37:32 -08:00
Daniel Han	60fd9d870d	Update vision.py	2025-11-30 21:32:07 -08:00
VED	f23f17e8ba	set defualt [128, 128] insted of none (#3658 ) Co-authored-by: Ved <ved.work2024@gmail.com>	2025-11-30 17:00:31 -08:00
Daniel Han	bf11ba0c53	Update rl.py	2025-11-30 04:40:03 -08:00
Daniel Han	dbcedbbf65	Merge branch 'main' into nightly	2025-11-30 04:39:55 -08:00
DoubleMathew	f9d2a11dba	make unsloth_tiled_mlp a from_pretrained arg (#3655 ) * make unsloth_tiled_mlp a from_pretrained arg * adjust patching logic	2025-11-29 22:47:51 -08:00
Bhuvan Prakash	cd24a2896c	Fix: prevent load_in_fp8 kwarg from reaching Qwen3MoeForCausalLM constructor (Fix #3649 ) (#3654 ) * Fix: remove load_in_fp8 from kwargs to prevent Qwen3Moe init TypeError (Fix #3649) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-11-29 20:18:11 -08:00
gitpullpull	77d47ecee5	Fix broken link for Advanced pip install instructions (#3652 )	2025-11-29 15:33:48 -08:00
Michael Han	bfdf73fa66	Update README.md	2025-11-29 08:01:00 -08:00
DoubleMathew	8953a06764	fix rope_theta -> rope_parameters['rope_theta'] (#3651 )	2025-11-29 06:44:26 -08:00
Michael Han	f668897b3c	Update README.md	2025-11-27 20:52:27 -08:00
Michael Han	ef30739fd7	Update README.md	2025-11-27 20:49:47 -08:00
Daniel Han	1abf47e27c	Merge branch 'main' into nightly	2025-11-27 05:45:20 -08:00
mk0walsk	460f2cf6ad	Fix indefinite article usage in comments and docstrings (#3648 )	2025-11-26 18:15:27 -08:00
Dina Suehiro Jones	c740e14937	Fix llama tokenizer padding_side when using model.generate in inference mode (#3644 ) * Only restore training mode after generation, if the model started out in training mode Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-11-25 17:33:28 -08:00
Daniel Han	d7eafb1042	Merge branch 'main' into nightly	2025-11-25 07:59:24 -08:00
Daniel Han	ea4afe1154	Update mapper.py	2025-11-25 07:59:21 -08:00
Daniel Han	499758b0b5	Merge branch 'main' into nightly	2025-11-25 07:45:27 -08:00
Daniel Han	8119697202	Update loader.py	2025-11-25 07:45:14 -08:00
Daniel Han	8fc18d1a84	Merge branch 'main' into nightly	2025-11-25 07:39:00 -08:00
Daniel Han	528142dcda	Update loader.py	2025-11-25 07:38:51 -08:00
Daniel Han	fce90cc9b3	Merge branch 'main' into nightly	2025-11-25 07:23:55 -08:00
Daniel Han	86f708097d	Float8 GRPO, RL (#3640 ) * Enable FP8 + RL training for bf16 models (#3440) * Enable FP8 + RL training for bf16 models Summary: Enable FP8 + RL training using TorchAO for 1.33x faster training and 42% less model memory usage: - We quantize the frozen LoRA weights into fp8 and keep the LoRA adapters in bf16 - We leverage TorchAO's `Float8Tensor`, which calls into fbgemm's fp8 x fp8 rowwise matmul kernel - For now, we need to do an offline quantization first, because vllm doesn't support on-the-fly quantization for torchao yet (this is in progress: https://github.com/vllm-project/vllm/pull/26327) Example usage: ``` model, tokenizer = FastLanguageModel.from_pretrained( model_name = "unsloth/Qwen3-8B-Base", max_seq_length = 2048, load_in_4bit = False, fast_inference = True, max_lora_rank = 32, load_in_fp8 = True, # set this to True ) # the rest is the same as before model = FastLanguageModel.get_peft_model(...) ``` Initial results: ``` # fp8 {'train_runtime': 1725.4337, 'train_samples_per_second': 0.232, 'train_steps_per_second': 0.058, 'train_loss': 0.00015715716748673002, 'epoch': 0.01} # bf16 {'train_runtime': 2297.8145, 'train_samples_per_second': 0.174, 'train_steps_per_second': 0.044, 'train_loss': 0.00016081033063528594, 'epoch': 0.01} ``` <img width="1199" height="448" alt="Screenshot 2025-11-11 at 4 10 50 PM" src="https://github.com/user-attachments/assets/b6304afd-89e9-42b1-8064-775807e17b23" /> Test script: https://gist.github.com/andrewor14/5b85119fae46845d07b608d420907423 Requires: - https://github.com/pytorch/ao/pull/3158 (torchao nightly or 0.15.0+) - https://github.com/unslothai/unsloth-zoo/pull/351 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update utils.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * _get_inference_mode_context_manager * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Update utils.py * Update utils.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Fix/save torchao model loading logic (#3621) * make loading gpt-oss-BF16 faster. Linked to unsloth-zoo PR #314 * fix model loading and clean merged model directory * revert default quant * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert mapper.py --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update loader_utils.py * Update loader_utils.py * Add 128x128 PerBlock FP8 + RL (#3629) * Add 128x128 PerBlock FP8 + RL Summary: Following https://github.com/unslothai/unsloth/pull/3440, this PR extends torchao FP8 + RL support to also handle 128x128 PerBlock granularity (in addition to PerRow). 
Example usage: ``` model, tokenizer = FastLanguageModel.from_pretrained( model_name = "unsloth/Qwen3-8B-Base", max_seq_length = 2048, load_in_4bit = False, fast_inference = True, max_lora_rank = 32, load_in_fp8 = "block", # or "row" or True ) ``` Initial results: TBD Note: - Requires https://github.com/pytorch/ao/pull/3370 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Version * Update vision.py * Update rl.py * Add torch 2.9.1 * Fix auto installer * Update fp8.py * Float8 * Update fp8.py * Update mapper.py * Update mapper.py * Update loader_utils.py * Update loader.py * Update fp8.py * Versioning * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: andrewor14 <andrewor14@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2025-11-25 07:23:26 -08:00
pre-commit-ci[bot]	967434c948	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-11-25 15:20:19 +00:00
Daniel Han	7af84b491e	Versioning	2025-11-25 07:12:45 -08:00
Daniel Han	83aaeb150e	Update fp8.py	2025-11-25 07:11:30 -08:00
Daniel Han	ba0d16ce36	Update loader.py	2025-11-25 07:06:41 -08:00
Daniel Han	8bd5afbd50	Update loader_utils.py	2025-11-25 07:05:43 -08:00
Daniel Han	ee6ab2ec28	Update mapper.py	2025-11-25 07:02:20 -08:00
Daniel Han	ca8b938018	Update mapper.py	2025-11-25 06:53:06 -08:00
Daniel Han	6360dfbf5a	Update fp8.py	2025-11-25 06:50:58 -08:00
Daniel Han	f6509e6939	Float8	2025-11-25 06:48:10 -08:00
Daniel Han	06491d1b99	Update fp8.py	2025-11-25 05:35:34 -08:00
vangmay	646629884b	Remove training mode arg	2025-11-25 21:01:43 +08:00
Daniel Han	3fee93b48e	Fix auto installer	2025-11-25 01:47:58 -08:00
Daniel Han	49607bf27f	Add torch 2.9.1	2025-11-25 01:36:11 -08:00
Daniel Han	5c2c53afee	Update rl.py	2025-11-24 22:13:17 -08:00
pre-commit-ci[bot]	ba150c34b3	[pre-commit.ci] pre-commit autoupdate (#3634 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.5 → v0.14.6](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.5...v0.14.6) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-11-24 17:16:56 -08:00
Daniel Han	4ed44a2159	Update vision.py	2025-11-24 05:50:22 -08:00
Daniel Han	d7a9f801ff	Merge branch 'main' into nightly	2025-11-24 02:16:53 -08:00
Lei Zhenyuan	f746d854c5	[intel] change windows to remove windows-triton for intel xpu (#3168 ) * change windows to remove windows-triton for intel xpu * add changes for different platform * Update pyproject.toml * update mode windows * Update pyproject.toml Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update pyproject.toml Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update pyproject.toml Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update pyproject.toml Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update pyproject.toml Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update pyproject.toml Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update pyproject.toml Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update pyproject.toml Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-23 22:03:54 -08:00
Etherll	f91ea1e9a6	Add trust_remote_code parameter to tokenizer (#3631 )	2025-11-23 21:12:40 -08:00
Daniel Han	61ce3f0e73	Version	2025-11-22 06:20:00 -08:00
andrewor14	4320a8e82d	Add 128x128 PerBlock FP8 + RL (#3629 ) * Add 128x128 PerBlock FP8 + RL Summary: Following https://github.com/unslothai/unsloth/pull/3440, this PR extends torchao FP8 + RL support to also handle 128x128 PerBlock granularity (in addition to PerRow). Example usage: ``` model, tokenizer = FastLanguageModel.from_pretrained( model_name = "unsloth/Qwen3-8B-Base", max_seq_length = 2048, load_in_4bit = False, fast_inference = True, max_lora_rank = 32, load_in_fp8 = "block", # or "row" or True ) ``` Initial results: TBD Note: - Requires https://github.com/pytorch/ao/pull/3370 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-11-21 20:09:27 -08:00
Mercury	13b3e7e6a8	Fix missing code and support inputs_embeds only input. (#3623 )	2025-11-20 07:56:52 -08:00
vangmay	082da69cc4	remove old function	2025-11-20 21:40:45 +08:00
pre-commit-ci[bot]	3bf8ca7da2	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-11-20 13:09:08 +00:00
vangmay	f05169e56a	Make the chunk function efficient	2025-11-20 21:08:33 +08:00
pre-commit-ci[bot]	25e69f2d36	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-11-20 12:57:53 +00:00
vangmay	d253b392fb	Merge branch 'feature/raw-text-dataprep' of https://github.com/Vangmay/unsloth into feature/raw-text-dataprep	2025-11-20 20:57:27 +08:00
vangmay	c20a3b40ee	Integrate smart dataset loader	2025-11-20 20:53:22 +08:00
pre-commit-ci[bot]	d429363c23	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-11-20 12:51:18 +00:00
Daniel Han	3fd334c2ee	Update loader_utils.py	2025-11-20 04:04:19 -08:00
Daniel Han	3d099a3bb6	Update loader_utils.py	2025-11-20 00:06:11 -08:00
Roland Tannous	22e0c63166	Fix/save torchao model loading logic (#3621 ) * make loading gpt-oss-BF16 faster. Linked to unsloth-zoo PR #314 * fix model loading and clean merged model directory * revert default quant * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert mapper.py --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-11-20 00:02:31 -08:00
Daniel Han	1b0852c42e	Update __init__.py	2025-11-19 23:56:38 -08:00
andrewor14	f1b24a6152	Enable FP8 + RL training for bf16 models (#3440 ) * Enable FP8 + RL training for bf16 models Summary: Enable FP8 + RL training using TorchAO for 1.33x faster training and 42% less model memory usage: - We quantize the frozen LoRA weights into fp8 and keep the LoRA adapters in bf16 - We leverage TorchAO's `Float8Tensor`, which calls into fbgemm's fp8 x fp8 rowwise matmul kernel - For now, we need to do an offline quantization first, because vllm doesn't support on-the-fly quantization for torchao yet (this is in progress: https://github.com/vllm-project/vllm/pull/26327) Example usage: ``` model, tokenizer = FastLanguageModel.from_pretrained( model_name = "unsloth/Qwen3-8B-Base", max_seq_length = 2048, load_in_4bit = False, fast_inference = True, max_lora_rank = 32, load_in_fp8 = True, # set this to True ) # the rest is the same as before model = FastLanguageModel.get_peft_model(...) ``` Initial results: ``` # fp8 {'train_runtime': 1725.4337, 'train_samples_per_second': 0.232, 'train_steps_per_second': 0.058, 'train_loss': 0.00015715716748673002, 'epoch': 0.01} # bf16 {'train_runtime': 2297.8145, 'train_samples_per_second': 0.174, 'train_steps_per_second': 0.044, 'train_loss': 0.00016081033063528594, 'epoch': 0.01} ``` <img width="1199" height="448" alt="Screenshot 2025-11-11 at 4 10 50 PM" src="https://github.com/user-attachments/assets/b6304afd-89e9-42b1-8064-775807e17b23" /> Test script: https://gist.github.com/andrewor14/5b85119fae46845d07b608d420907423 Requires: - https://github.com/pytorch/ao/pull/3158 (torchao nightly or 0.15.0+) - https://github.com/unslothai/unsloth-zoo/pull/351 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update utils.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * _get_inference_mode_context_manager * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Update utils.py * Update utils.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-11-19 23:51:43 -08:00
DoubleMathew	5b0f19624c	Remove grpo requirement bs=num_generations (#3609 ) * Remove grpo requirement bs=num_generations * Update rl.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-11-19 19:57:01 -08:00
DoubleMathew	e8e0f6aa58	Add an int64 path for mlp kernels (#3614 ) * Add an int64 path for mlp kernels * move constant expressions to globals * fix name	2025-11-19 19:45:10 -08:00
Dan Saunders	a3ed3c395d	remove pre-commit workflow (covered by pre-commit app) (#3618 )	2025-11-19 15:34:32 -08:00
mk0walsk	8efbd5ac9c	Fix broken links and typo in README (#3611 ) * README Link Fixes * Update README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-18 20:04:14 -08:00
vangmay	171fb12573	Add module to init	2025-11-18 22:44:48 +08:00
vangmay	ee37dd9f92	Write simple test	2025-11-18 22:36:38 +08:00
vangmay	8d482c2129	Add validation code	2025-11-18 22:02:35 +08:00
vangmay	6014bb4dd2	Add logic to clean and extract text sections	2025-11-18 22:01:36 +08:00
vangmay	ed5820e667	Write chunking logic	2025-11-18 22:00:07 +08:00
vangmay	aecfbe1fff	Add support for multiple files	2025-11-18 21:59:01 +08:00
vangmay	d75fbb5d0a	Add implementation to cli	2025-11-18 21:53:20 +08:00
vangmay	face46d188	Write file and template for raw_text dataprep	2025-11-18 21:46:41 +08:00
pre-commit-ci[bot]	2f68f246a4	[pre-commit.ci] pre-commit autoupdate (#3606 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.4 → v0.14.5](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.4...v0.14.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-11-17 17:02:44 -08:00
Datta Nimmaturi	4571ecaca3	Do not force set beta to 0 for DAPO (#3604 )	2025-11-16 22:39:36 -08:00
DoubleMathew	daeb4d57a3	fix qwen3 vl gradient accumulation (#3598 ) * fix qwen3 vl gradient accumulation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update unsloth/models/_utils.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-15 02:26:34 -08:00
Daniel Han	3f0dde40d1	Update pyproject.toml	2025-11-14 20:01:02 -08:00
Scott Roy	20bd66f49f	Extend TorchAOConfig to support mobile usecases (#3587 ) * up * up * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-11-14 03:08:21 -08:00
Yuxiao Cheng	ed829b672a	Fix: prevent rope_embedding AssertionError by checking kv_seq_len before reuse (#3578 ) * fix: add kv_seq_len boundary check before reusing RoPE embeddings Prevented AssertionError in rope_embedding.forward when kv_seq_len exceeds the cached rope size. Added condition to verify kv_seq_len <= position_embeddings[0].shape[0] before reuse, ensuring dynamic extension triggers correctly. Fixes #3036 #3216 * fix falcon h1 --------- Co-authored-by: jarrycyx <dzdzzd@126.com>	2025-11-14 03:06:33 -08:00
Giuseppe Franco	069781bcd6	Support for out-of-source quantizers (#3534 ) * Support for out-of-source quantizers * Fix decorators and functions to be staticmethod Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-11-14 02:52:24 -08:00
DoubleMathew	a3d42aaa28	Patch in tiled mlp (#3584 ) * Patch in tiled mlp * Update unsloth/models/llama.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-11-13 21:26:49 -08:00
DoubleMathew	ccebde2cb3	Resize rope embeddings for long sequence training (#3586 )	2025-11-11 18:11:31 -08:00
pre-commit-ci[bot]	3d34ed4def	[pre-commit.ci] pre-commit autoupdate (#3576 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.0 → v0.14.4](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.0...v0.14.4) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-11-11 18:10:49 -08:00
Daniel Han	03733cf180	Update _utils.py	2025-11-10 04:49:27 -08:00
Dan Saunders	45865ead0c	pre-commit CI config (#3565 )	2025-11-07 14:44:18 -08:00
DoubleMathew	01d3794828	add trust_remote_code kwarg (#3564 )	2025-11-07 14:16:35 -08:00
Daniel Han	d6bb89ad44	Formatting & bug fixes (#3563 ) * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model 
registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update 
loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * 
Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. (#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. 
Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * 
Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * 
Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * 
setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update 
pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * 
Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update 
_auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * 
fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models * Update llama.py * Versioning * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Fix AMD * Update _utils.py * Update llama.py * Update vision.py * DEVICE_TYPE_TORCH * Update __init__.py * Update __init__.py * Update _utils.py * Move DEVICE_TYPE * Update rl_replacements.py * Update loader.py * AMD install script * Move AMD * Update _amd_install.sh * Update pyproject.toml * Update pyproject.toml * Delete _amd_install.sh * Update device_type.py * Update loader.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Versioning * Update pyproject.toml * Update 
loader.py * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update _utils.py * Update loader.py * Update _utils.py * Update _utils.py * local_files_only * Cut Cross Entropy * Update llama.py * Update vision.py * Update vision.py * Update vision.py * Qwen 3 VL vLLM (#3489) * Update __init__.py * patch_torchao * torchao_logger * Update rl_replacements.py * Fix * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Versioning * fbgemm fp8 block quant support (>=1.4.0) (#3531) * fbgemm fp8 block quant support (>=1.4.0) * Verify for fp8 support before proceeding * Use unsloth zoo's Version and improve comments * spacessss * Update vision.py * Update vision.py * Update rl.py * vllm_sampling_params * Update rl.py * Update rl.py * Update rl.py * Add `ruff` pre-commit hook and apply it (#3424) * Add Ruff pre-commit config and workflow * Add kwarg spacing enforcement helper * Apply Ruff formatting * Update fp8.py * Revert ruff on some files * Update * force-exclude = true * Datasets issue * Ruff * Remove mapper * Update mapper.py * Update pyproject.toml --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> Co-authored-by: Dan Saunders <danjsaund@gmail.com>	2025-11-07 06:00:22 -08:00
mk0walsk	d8ae1e266e	Fix typos in comment (#3557)	2025-11-05 19:29:36 -08:00
Michael Han	c8421a939b	Update README.md	2025-11-04 22:00:06 -08:00
pluesclues	91db850488	Detach logits before returning from function (#3554)	2025-11-04 07:29:27 -08:00
Datta Nimmaturi	7fe58d8c15	Sleep trl patch (#3517) * Patch sleep mode properly for trl * empty cache after sleep/wakeup * no extra wakeups * Do not redo wakeups * cleanup * post trl 0.23 sleep patch	2025-11-03 23:00:54 -08:00
Daniel Han	a9ff4e23c9	Bug fixes (#3546 ) * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many 
bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient 
checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. (#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. 
Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * 
Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * 
Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * 
setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update 
pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * 
Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update 
_auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * 
fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models * Update llama.py * Versioning * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Fix AMD * Update _utils.py * Update llama.py * Update vision.py * DEVICE_TYPE_TORCH * Update __init__.py * Update __init__.py * Update _utils.py * Move DEVICE_TYPE * Update rl_replacements.py * Update loader.py * AMD install script * Move AMD * Update _amd_install.sh * Update pyproject.toml * Update pyproject.toml * Delete _amd_install.sh * Update device_type.py * Update loader.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Versioning * Update pyproject.toml * Update 
loader.py * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update _utils.py * Update loader.py * Update _utils.py * Update _utils.py * local_files_only * Cut Cross Entropy * Update llama.py * Update vision.py * Update vision.py * Update vision.py * Qwen 3 VL vLLM (#3489) * Update __init__.py * patch_torchao * torchao_logger * Update rl_replacements.py * Fix * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Versioning --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-11-03 06:47:26 -08:00
pluesclues	c449c7b06e	Handle TRL version compatibility in rl_replacements.py (#3540)	2025-11-01 05:17:27 -07:00
Daniel Han	f67c4a172a	Update mapper.py	2025-10-30 06:56:22 -07:00
Daniel Han	d6aa072c29	Update pyproject.toml	2025-10-30 06:48:14 -07:00
Daniel Han	1fd8c72aee	Nightly (#3532 ) * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * 
Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update 
rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. 
(#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * 
Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * 
Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update 
pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * 
UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update 
__init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * 
Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update _auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * 
Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models * Update llama.py * Versioning * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Fix AMD * Update _utils.py * Update llama.py * Update vision.py * 
DEVICE_TYPE_TORCH * Update __init__.py * Update __init__.py * Update _utils.py * Move DEVICE_TYPE * Update rl_replacements.py * Update loader.py * AMD install script * Move AMD * Update _amd_install.sh * Update pyproject.toml * Update pyproject.toml * Delete _amd_install.sh * Update device_type.py * Update loader.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Versioning * Update pyproject.toml * Update loader.py * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update _utils.py * Update loader.py * Update _utils.py * Update _utils.py * local_files_only * Cut Cross Entropy * Update llama.py * Update vision.py * Update vision.py * Update vision.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-10-30 06:45:57 -07:00
Daniel Han	067db89dc3	Update vision.py	2025-10-30 06:30:43 -07:00
Daniel Han	a3d6b3a4bf	Update vision.py	2025-10-30 06:28:05 -07:00
Daniel Han	64136e6336	Update vision.py	2025-10-30 06:27:58 -07:00
Daniel Han	dfe35fb441	Update vision.py	2025-10-30 06:27:31 -07:00
Daniel Han	be1c2ca95c	Update vision.py	2025-10-30 06:23:28 -07:00
Daniel Han	b3aa029c7a	Update rl_replacements.py	2025-10-30 06:04:30 -07:00
Daniel Han	8f7e0164df	Update vision.py	2025-10-30 05:54:26 -07:00
Daniel Han	ab98999a3f	Update vision.py	2025-10-30 05:53:43 -07:00
Daniel Han	60081c2f24	Update import_fixes.py	2025-10-30 05:41:06 -07:00
Daniel Han	6ef73397f2	Update vision.py	2025-10-30 05:38:16 -07:00
Daniel Han	e88cb620ab	Bug fixes	2025-10-30 05:35:47 -07:00
Daniel Han	810171d82c	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-10-29 06:31:40 -07:00
Daniel Han	df4133ac36	Update import_fixes.py	2025-10-29 05:43:36 -07:00
pluesclues	45b1c7f7c8	Grpo gradient accumulation edits (#3390) * Update rl_replacements.py grpo accumulation kwargs * Update rl.py, remove bnpo default when setting dapo * Update rl.py * Update rl_replacements.py, add support for vllm importance sampling * Update rl_replacements.py, added ability to get metrics * Update rl_replacements.py send sampling per token logps to backend * Update rl_replacements.py, corrected if statement in monkey patch * Update rl_replacements.py, updating to handle nan cases as well * Update rl_replacements.py, imported text warp * Update rl_replacements.py, yes * Add error handling for sampling_per_token_logps Handle NameError for sampling_per_token_logps assignment. * Add delta check for use_vllm condition * Refactor vision model flag to use is_vlm variable	2025-10-28 22:54:34 -07:00
Daniel Han	0e766b28f0	Versioning	2025-10-28 05:35:47 -07:00
Daniel Han	160ba77142	Quant Method missing	2025-10-28 05:26:51 -07:00
Daniel Han	2c47b8a7ac	Update fp8.py	2025-10-26 23:29:14 -07:00
Daniel Han	52765eff31	Update fp8.py	2025-10-26 23:26:51 -07:00
Daniel Han	3ba905d0cc	Update fp8.py	2025-10-26 23:24:57 -07:00
Datta Nimmaturi	2585e57b6e	FP8 training enhancements (#3496) * Fix FP8 for models with non 8 multiple weights * patch fp8 forward methods for compiled models * patch hf quantizer for fp8 * Failsafe import of fbgemmfp8linear and fp8linear * Beautify	2025-10-26 23:22:20 -07:00
Daniel Han	b72306d148	Update pyproject.toml	2025-10-26 23:17:59 -07:00
Lei Zhenyuan	0079619063	enable support 2.9 for intel xpu (#3514)	2025-10-26 23:14:42 -07:00
Lei Zhenyuan	57a03c35f4	fix for intel memory (#3513)	2025-10-26 23:12:18 -07:00
Daniel Han	c9274533d2	Fix GPU name	2025-10-26 22:50:52 -07:00
Daniel Han	6f0f05518b	Update loader.py	2025-10-26 22:40:59 -07:00
Daniel Han	0528b4ce71	Fixes	2025-10-26 22:39:38 -07:00
Daniel Han	5273eb5cd5	Update import_fixes.py	2025-10-26 22:34:39 -07:00
Daniel Han	b0498fc4dd	OpenEnv patches	2025-10-26 22:31:04 -07:00
Daniel Han	9346b5ab6b	Update pyproject.toml	2025-10-26 21:59:51 -07:00
Daniel Han	30631866de	Add Torch 2.9 options	2025-10-26 21:49:30 -07:00
Lei Zhenyuan	281e38c918	add code for intel qlora (#3370) * add code for intel qlora * add specified code for xpu device	2025-10-26 21:44:29 -07:00
Lei Zhenyuan	e09787ab9d	add code changes for pyproject.toml (#3381)	2025-10-26 21:43:17 -07:00
DoubleMathew	1c1f7033cd	move PYTORCH_CUDA_ALLOC_CONF into zoo (#3499)	2025-10-26 21:29:18 -07:00
wangxunx	5d86b6e756	fix cross entropy loss issue for small vocab size on amd gpu (#3503)	2025-10-26 21:20:47 -07:00
Michael Han	c2e2474e51	Update CODE_OF_CONDUCT.md	2025-10-25 19:31:05 -07:00
Michael Han	381e181e99	Update README.md	2025-10-25 19:26:05 -07:00
Daniel Han	60ab88301e	Versioning	2025-10-23 05:53:12 -07:00
Datta Nimmaturi	635cfdbbb0	Sleep trl patch (#3494) * Patch sleep mode properly for trl * empty cache after sleep/wakeup * no extra wakeups * Do not redo wakeups * cleanup	2025-10-23 01:43:55 -07:00
Daniel Han	ee473f6c52	Update pyproject.toml	2025-10-22 07:57:55 -07:00
Daniel Han	54cfe1f241	Update _utils.py	2025-10-22 05:16:22 -07:00
Daniel Han	0cd0635a90	More Qwen3-VL	2025-10-22 05:12:09 -07:00
Datta Nimmaturi	26ddbb5b8e	Patch sleep mode properly for trl (#3492)	2025-10-22 05:00:52 -07:00
Daniel Han	06162ad350	Update save.py	2025-10-21 11:33:40 -07:00
Daniel Han	5e1b4e744e	Bug fixes (#3484 ) * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * 
Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update 
synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient 
GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. 
(#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * 
Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * 
Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update 
pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * 
UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update 
__init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * 
Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update _auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * 
Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models * Update llama.py * Versioning * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Fix AMD * Update _utils.py * Update llama.py * Update vision.py * 
DEVICE_TYPE_TORCH * Update __init__.py * Update __init__.py * Update _utils.py * Move DEVICE_TYPE * Update rl_replacements.py * Update loader.py * AMD install script * Move AMD * Update _amd_install.sh * Update pyproject.toml * Update pyproject.toml * Delete _amd_install.sh * Update device_type.py * Update loader.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Versioning * Update pyproject.toml * Update loader.py * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update _utils.py * Update loader.py * Update _utils.py * Update _utils.py * local_files_only * Cut Cross Entropy * Update llama.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-10-20 04:57:01 -07:00
Daniel Han	462e59b5e1	Bug fixes (#3483 ) * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * 
Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update 
rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. 
(#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * 
Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * 
Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update 
pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * 
UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update 
__init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * 
Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update _auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * 
Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models * Update llama.py * Versioning * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Fix AMD * Update _utils.py * Update llama.py * Update vision.py * 
DEVICE_TYPE_TORCH * Update __init__.py * Update __init__.py * Update _utils.py * Move DEVICE_TYPE * Update rl_replacements.py * Update loader.py * AMD install script * Move AMD * Update _amd_install.sh * Update pyproject.toml * Update pyproject.toml * Delete _amd_install.sh * Update device_type.py * Update loader.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Versioning * Update pyproject.toml --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-10-19 23:21:39 -07:00
Daniel Han	267e74b624	Update pyproject.toml	2025-10-18 18:11:58 -07:00
Daniel Han	db4f000e07	Update __init__.py	2025-10-18 17:35:50 -07:00
Daniel Han	7e0ea4c66f	Zoo	2025-10-18 17:34:32 -07:00
Daniel Han	7520006b6a	Update __init__.py	2025-10-18 17:32:01 -07:00
Daniel Han	9714ab85f2	Update utils.py	2025-10-17 20:51:54 -07:00
Daniel Han	f05f7c019d	Update utils.py	2025-10-17 17:07:02 -07:00
wangxunx	caeb7f7cb9	fix out of resources issue for llama3.2 sft on amd gpu (#3455) Co-authored-by: Xun Wang <xunwang2@amd.com>	2025-10-17 16:24:02 -07:00
Dan Saunders	f845cf964f	EOL LF (unix line endings) normalization (#3478)	2025-10-17 16:22:42 -07:00
Daniel Han	f62c454a86	GRPO bug fixes (#3474 ) * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update 
synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update 
rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. 
(#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * 
Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * 
Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update 
pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * 
UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update 
__init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * 
Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update _auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * 
Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models * Update llama.py * Versioning * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Fix AMD * Update _utils.py * Update llama.py * Update vision.py * 
DEVICE_TYPE_TORCH * Update __init__.py * Update __init__.py * Update _utils.py * Move DEVICE_TYPE * Update rl_replacements.py * Update loader.py * AMD install script * Move AMD * Update _amd_install.sh * Update pyproject.toml * Update pyproject.toml * Delete _amd_install.sh * Update device_type.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-10-17 06:56:12 -07:00
Daniel Han	b80f110a44	Update _utils.py	2025-10-17 06:43:11 -07:00
Daniel Han	657580cd67	Update loader.py	2025-10-17 05:19:26 -07:00
Daniel Han	287b67eb91	Update device_type.py	2025-10-17 04:57:39 -07:00
Daniel Han	279063c7ed	Update device_type.py	2025-10-17 04:55:21 -07:00
Daniel Han	33b67f1ebe	Missing inspect	2025-10-17 04:54:58 -07:00
Daniel Han	7271f79284	Disable BnB for AMD	2025-10-17 04:48:29 -07:00
Daniel Han	edff83544b	Update pyproject.toml	2025-10-17 04:29:48 -07:00
Daniel Han	e5c7fe9c53	Delete _amd_install.sh	2025-10-17 04:13:25 -07:00
Daniel Han	cb8484eaf1	Fix transformers 4.57.1 (#3473 ) * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update 
synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * 
Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for 
feed_forward name for falcon h1 (#2913) * Move inputs to right devices. (#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update 
synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient 
checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update 
pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * 
UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update 
__init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * 
Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update _auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * 
Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models * Update llama.py * Versioning * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Fix AMD * Update _utils.py * Update llama.py * Update vision.py * 
DEVICE_TYPE_TORCH * Update __init__.py * Update __init__.py * Update _utils.py * Move DEVICE_TYPE * Update rl_replacements.py * Update loader.py * AMD install script * Move AMD * Update _amd_install.sh * Update pyproject.toml --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-10-17 04:05:10 -07:00
Daniel Han	77b5256f71	Update _utils.py	2025-10-16 15:59:38 -07:00
Daniel Han	c51abba19f	Update _utils.py	2025-10-16 15:36:58 -07:00
Daniel Han	0ede099ef0	AMD fixes (#3467 ) * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update 
synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py 
* Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * 
Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly 
check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. (#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update 
synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * 
Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update 
pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * 
UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update 
__init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * 
Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update _auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * 
Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models * Update llama.py * Versioning * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Fix AMD * Update _utils.py * Update llama.py * Update vision.py * 
DEVICE_TYPE_TORCH * Update __init__.py * Update __init__.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-10-16 07:05:58 -07:00
Daniel Han	6440419207	Fix (#3466 ) * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * 
Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update 
synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update 
loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support 
(#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. (#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py 
* Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update 
pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * 
UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update 
__init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * 
Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update _auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * 
Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models * Update llama.py * Versioning * Update _utils.py * Update llama.py * Update _utils.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> 
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-10-16 05:38:54 -07:00
Daniel Han	c95dea33a9	Update rl.py	2025-10-16 05:26:36 -07:00
Daniel Han	516f771697	Update _utils.py	2025-10-16 05:13:07 -07:00
Daniel Han	2019d0653f	Update _utils.py	2025-10-16 05:06:37 -07:00
Daniel Han	56a4eb4212	Update _utils.py	2025-10-16 04:48:55 -07:00
Daniel Han	7770a24ab2	Update _utils.py	2025-10-16 04:45:27 -07:00
Daniel Han	a6cf91d869	TorchAO	2025-10-16 03:55:20 -07:00
Datta Nimmaturi	1dd5485e95	vLLM FP8 quantized support for SFT/GRPO (#3414) * Prefer loading model from pretrained instead of config * Fixup FP8 forward pass and inference * [WIP] Fix lora forwards * Infer block size from weight shapes * reconstruct weights from fp8 quants for lora matmul * Return weight transpose and fix dtype * Refactor FP8 operations * Fix naming :) * Saner compile * do not depend on transformers * [WIP] fix training * Update comment * fixup training * use dequant kernel from deepseek * Differentiate between fp8 and fbgemmfp8 * fixup differentiation b/w fp8 and fbgemm_fp8 * make inputs contiguous if required * Improve dequant * More robust handling * Fixup backward pass for fbgemm_fp8 * refactor and use bf16 for dequant * Use torch fp8 block matmul * Disable torch block matmul for now * safer import and cosmetics * more cosmetics * add torchao operations * Spaceeeeeee	2025-10-16 03:07:05 -07:00
Daniel Han	7797f373d1	Update _utils.py	2025-10-16 02:42:26 -07:00
Daniel Han	e7ab86db0d	Update _utils.py	2025-10-15 14:30:50 -07:00
Daniel Han	dc060f0f7d	Update _utils.py	2025-10-15 14:29:06 -07:00
Daniel Han	0dcfb03b8b	Update _utils.py	2025-10-15 14:28:19 -07:00
Michael Han	e1a9c130e5	Update README.md Qwen3-VL + DGX	2025-10-14 20:23:32 -07:00
Daniel Han	a219198d41	Update mapper.py	2025-10-14 16:22:47 -07:00
Daniel Han	7710adc318	Update mapper.py	2025-10-14 08:02:03 -07:00
Daniel Han	b81935bd21	Versioning	2025-10-14 07:27:38 -07:00
Daniel Han	7d2f07a0b2	Update mapper.py	2025-10-14 07:20:22 -07:00
Daniel Han	41ad82c1ba	Update __init__.py	2025-10-14 05:54:32 -07:00
Daniel Han	781a4507e5	Update import_fixes.py	2025-10-14 05:40:21 -07:00
Daniel Han	abdf91927c	Update loader.py	2025-10-14 05:29:20 -07:00
Daniel Han	b3fc77f1fa	Update pyproject.toml	2025-10-14 01:55:39 -07:00
Daniel Han	f98ebd192f	Update _utils.py	2025-10-14 01:52:05 -07:00
Roland Tannous	e35be2b490	[Part2] Reinstate llama.cpp Compatibility and GGUF Conversion with Multiple Quantizations and Automated Ollama Modelfile Creation (#3356) * GGUF conversion code + model to template mappers + chat template adds/fixes * syntax fixes * extract tokenizer from video processor * model file cleanup after multiple quantizations * flip is_vlm flag is mmproj has text only llama.cpp support for MLM * preserve processor files for merge operation * reinstate chr(92) * fixed starling mapping * ollama Modelfile from gguf for text models * specify bf16 ollama model precision for vision models * fix keyError in templatedict when no mapping * revert chat_templates.py to original syntax * ollama modelfile template to model mapper * link save to ollama mapper, fix some bugs * rename to ollama_template_mappers * Remove old template_mappers file (renamed ollama_template_mappers) * fix final printout * fix model list and printout * remove yi base model, keep chat/instruct * fixed dangling > in HF repo readme for uploaded models * added granite model ollama support * Combine use_local_gguf() blocks * model_name relative to base_model_name	2025-10-14 01:23:14 -07:00
pluesclues	0aed5ae94a	Fix eval metric issue (#3420) * Update rl.py, added fix eval metric issue from online DPO * Update rl.py, enabled unsloth return logits flag for metrics	2025-10-13 17:20:46 -07:00
Etherll	3d73aebec8	improve qat (#3446) * Update save.py * Update vision.py * Update save.py	2025-10-13 17:03:15 -07:00
DoubleMathew	33023b9ac9	Handle transformers rename from PretrainedConfig to PreTrainedConfig (#3445)	2025-10-13 16:00:28 -07:00
Michael Han	a049fcd460	Update README.md	2025-10-12 05:32:42 -07:00
Daniel Han	57e7589b55	Update pyproject.toml	2025-10-07 20:52:38 -07:00
Daniel Han	41ee93a2e8	Update pyproject.toml	2025-10-07 20:50:31 -07:00
Daniel Han	452ad1959e	Gemma 3 bug fixes (#3410 ) * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update 
pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * 
Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update 
vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import 
is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. (#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. 
Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * 
Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * 
Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * 
setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update 
pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * 
Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update 
_auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * 
fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-10-05 00:54:24 -07:00
Michael Han	b235ec7f7f	Update README.md	2025-10-04 16:12:02 -07:00
Michael Han	afe9d39981	Update README.md	2025-10-03 04:18:21 -07:00
Michael Han	aeb2829ec9	Update README.md	2025-10-03 04:01:17 -07:00
Scott Roy	291987113b	up (#3391)	2025-10-01 19:14:32 -07:00
Michael Han	5745677718	Adding Docker support	2025-10-01 17:04:46 -07:00
Daniel Han	a0ce4a982a	Nightly (#3394 ) * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * 
Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme 
updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py 
* Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * 
silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. (#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. 
Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * 
Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * 
Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * 
setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update 
pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * 
Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update 
_auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * 
fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-10-01 05:07:14 -07:00
Daniel Han	f78abbc751	Update vision.py	2025-10-01 04:36:58 -07:00
Daniel Han	cc810e17cb	Update vision.py	2025-10-01 04:15:14 -07:00
Daniel Han	0ca2a140a0	execute_with_time_limit	2025-10-01 01:20:30 -07:00
Daniel Han	72d4ce88c0	Update __init__.py	2025-09-30 23:14:35 -07:00
Daniel Han	d3e04ca1d6	Update _utils.py	2025-09-30 23:10:29 -07:00
Daniel Han	83bf7d1435	Update _utils.py	2025-09-30 23:07:19 -07:00
Daniel Han	032bfb01e5	Nightly (#3392 ) * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * 
get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when 
executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame 
* Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: 
Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. (#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. 
Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * 
Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * 
Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * 
setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update 
pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * 
Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update 
_auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * 
fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-09-30 05:03:34 -07:00
Michael Han	f529568194	Merge pull request #3384 from Etherll/patch-928 Fix loading as 8bit	2025-09-28 16:55:55 -07:00
Etherll	4e2b12101b	Update vision.py	2025-09-28 23:52:08 +03:00
Michael Han	a6dfb2894d	Update README.md	2025-09-26 17:31:46 -07:00
Daniel Han	2fd5cc70d8	Versioning	2025-09-26 07:11:48 -07:00
Daniel Han	82f76a8609	Update pyproject.toml	2025-09-26 05:29:02 -07:00
Daniel Han	61da0d3237	GPT OSS RL (#3362 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * 
Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename 
model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force 
float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update 
setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. (#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. 
Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * 
Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * 
Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * 
setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update 
pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * 
Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update 
_auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * 
fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-09-26 04:55:12 -07:00
laz-001	373c3188e1	correct python support statement (#3374)	2025-09-26 04:52:23 -07:00
Michael Han	9c9f85b28a	Update README.md Fresh update	2025-09-26 02:50:02 -07:00
DoubleMathew	ab6eb686dd	specify different tokenizer_path/name (#3343)	2025-09-19 20:01:33 -07:00
DoubleMathew	35ff0f4564	peft_config before model_config (#3342)	2025-09-19 20:01:13 -07:00
Daniel Han	d5f1abfddd	Update vision.py	2025-09-19 04:01:52 -07:00
Daniel Han	8817a91984	Update _utils.py	2025-09-19 01:17:36 -07:00
Daniel Han	a4ad3e0d70	Update loader.py	2025-09-19 01:07:34 -07:00
Daniel Han	af60134d7d	Update vision.py	2025-09-18 22:33:13 -07:00
Daniel Han	5d6fbda29a	Bug fixes (#3335 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * 
Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-09-18 19:00:17 -07:00
Etherll	727c938ac5	Update vision.py (#3339 )	2025-09-18 14:27:39 -07:00
DoubleMathew	a26d0de384	Synthetic Data updates (#3333 )	2025-09-17 21:43:00 -07:00
DoubleMathew	0faea1fb86	update (#3332 )	2025-09-17 21:42:12 -07:00
andrewor14	3ffb3bdcfe	Fix QAT + LoRA fast path, add tests (#3307 ) Summary: The existing QAT + LoRA path only applied fake quantization to the original slow path, but the default is the fast path that calls unsloth's fast LoRA primitives. This commit integrates fake quantization into these fast primitives as well, and adds unit tests to assert that fake quantization is actually taking place. Test Plan: Unit tests: ``` pytest tests/utils/test_qat.py ``` End-to-end test: https://gist.github.com/andrewor14/6360dd69b5784c71c46e80c14f53e6b6 Full fine-tuning Llama3.1-8B with and without QAT + LoRA on yahma/alpaca-cleaned for 1 epoch: - Batch size = 8 (no grad accum) - Learning rate = 2e-4 - Quantization scheme = int4 weight only (with bf16 activations) Wikitext perplexity: - Baseline = int4 quantized model finetuned without QAT - QAT int4 quantized model (with this PR) recovered roughly a third of the perplexity degradation caused by int4 quantization (8.3548 vs. 8.7655 for the int4 baseline, against 7.5551 in float) - QAT int4 quantized model without this PR was worse than the int4 baseline ``` ==> unsloth_model_lora_baseline_output/lm_eval_float.log <== \| \| \|none \| 0\|word_perplexity\|↓ \|7.5551\|± \| N/A\| ==> unsloth_model_lora_baseline_output/lm_eval_quantized.log <== \| \| \|none \| 0\|word_perplexity\|↓ \|8.7655\|± \| N/A\| ==> unsloth_model_lora_qat_int4_output/lm_eval_quantized.log <== \| \| \|none \| 0\|word_perplexity\|↓ \|8.3548\|± \| N/A\| ```	2025-09-17 15:18:17 -07:00
Daniel Han	70f790a8e4	Bug fixes (#3329 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * 
Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-09-17 05:06:23 -07:00
Michael Han	1b3fdd5565	Update README.md	2025-09-16 10:07:02 -07:00
Daniel Han	bf0267450b	Versioning	2025-09-16 08:49:42 -07:00
Daniel Han	8b1ad0ae82	Update pyproject.toml	2025-09-16 08:17:21 -07:00
Daniel Han	a6fcbbe814	Update _utils.py	2025-09-16 06:23:19 -07:00
Daniel Han	0569186f14	Update rl.py	2025-09-16 06:20:06 -07:00
pluesclues	d4c653dc2e	TRL Updated version of VLM GRPO update along with GSPO (#3132 ) * Kept, padding logic * Made sure prediction step in rl.py allows logging for callbacks in RL trainers * updated llama.py to new online_dpo changes * Update rl.py to make logic simpler * Update rl.py, made sure tokenized_output on eval step was on same device * Update rl.py, corrected tokenized_outputs to inputs * Update rl.py, removed sagemaker stuff * Update llama.py, figures out if there is right padding automatically * Update llama.py, changed conditional statement for right padding slightly * Update llama.py, updated os.environ variable to temp variable * Update rl.py, made it account for right padding in online dpo and reward modeling * Update llama.py, automatically figures out if right padding is needed * Update rl_replacements.py, fixed up passing image data to functions * Update rl_replacements.py, for VLM GRPO support with TRL * Update rl_replacements.py, gspo added * Update rl.py, forgot about Online_DPO changes in this branch * Update rl.py, forgot to not include Online DPO PR changes * Update llama.py, forgot to exclude Online DPO PR changes * Update rl_replacements.py, updated generate and score completions to be up to date for trl * Update rl_replacements.py * Update rl_replacements.py, fixed nan issues with vlms * Update rl_replacements.py, added indent * Update rl_replacements.py, added attention mask to calculations of old and ref hidden states * Update unsloth/models/rl_replacements.py * Update unsloth/models/rl_replacements.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-09-16 05:43:07 -07:00
Datta Nimmaturi	51879e513a	Fast Inference with vLLM for VLMs (#2975 ) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update 
synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * 
Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. 
(#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * 
Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * 
Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update 
pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * 
UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update 
__init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * 
Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update _auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * 
Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * allow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * fixup quantized fast inference model name --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com>	2025-09-16 05:29:08 -07:00
lightsource	67d918b00b	Add support for modules_to_save in FastModel.get_peft_model (#3317 ) * patch modules_to_save * Update vision.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-09-15 22:41:34 -07:00
Daniel Han	977689b2a2	Bug fixes (#3322 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * 
Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-09-15 01:55:24 -07:00
Daniel Han	6f5b6e90fd	Update README.md	2025-09-15 01:46:07 -07:00
Daniel Han	846a5dcbc4	Update README.md	2025-09-15 01:43:11 -07:00
Daniel Han	29ed805a13	Update README.md	2025-09-15 01:42:59 -07:00
Daniel Han	db4f3cde14	Update README.md	2025-09-15 01:40:28 -07:00
Daniel Han	2f8baabd7a	Update README.md	2025-09-15 01:40:06 -07:00
Daniel Han	92f972bb7c	Update README.md	2025-09-15 01:39:39 -07:00
Daniel Han	46e7370878	Blackwell support	2025-09-15 01:39:03 -07:00
Michael Han	bf92d129b4	Update README.md	2025-09-13 21:45:22 -07:00
Michael Han	8a1ff4a3f0	Update README.md Adding new install instructions	2025-09-13 21:30:52 -07:00
Daniel Han	2003da34f8	importlib_version	2025-09-12 03:13:15 -07:00
billishyahao	71ae760aa0	[ROCm] add hip device path (#3301 )	2025-09-12 02:57:19 -07:00
Daniel Han	6087e203f9	Bug fix	2025-09-10 05:14:15 -07:00
Daniel Han	9e837e2dc3	Update rl.py	2025-09-10 01:39:35 -07:00
Daniel Han	39e532de39	Bug fixes (#3295 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * 
Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-09-10 01:28:13 -07:00
Daniel Han	fe2c8eb76e	Update loader.py	2025-09-09 02:06:07 -07:00
Daniel Han	66b45b5aaa	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-09-09 01:22:27 -07:00
DoubleMathew	034db11215	simplify uns inference (#3291 )	2025-09-08 21:07:02 -07:00
Daniel Han	4ae5db3287	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-09-08 17:15:57 -07:00
Daniel Han	14791dc6c2	Update __init__.py	2025-09-08 17:15:55 -07:00
andrewor14	fa93e36312	Add support for QAT full fine-tuning (#3238 ) Summary: Following https://github.com/unslothai/unsloth/pull/2976, which adds support for QAT + LoRA, this PR adds support for QAT during full fine-tuning. See the [torchao QAT README](https://github.com/pytorch/ao/blob/main/torchao/quantization/qat/README.md) for more details. Current QAT schemes supported are: ``` fp8-int4, targeting the torch.ops.fbgemm.f8i4bf16_shuffled kernel fp8-fp8, targeting the torch.ops.fbgemm.f8f8bf16_rowwise kernel ``` Test Plan: https://gist.github.com/andrewor14/048b5c1bd01b7fa23c53913856a8ef9f Full fine-tuning Llama3.1-8B with and without QAT on `yahma/alpaca-cleaned` for 1 epoch: - Batch size = 16 (no grad accum) - Learning rate = 4e-5 - Quantization scheme = fp8-int4 Wikitext perplexity: - QAT improved perplexity by 19.2% compared to regular fine-tuning - QAT's int4 quantized model even outperformed the bf16 baseline - Regular int4 quantized model (without QAT) was significantly worse than the bf16 baseline ``` ==> unsloth_model_full_baseline_output/eval_float.log <== \| \| \|none \| 0\|word_perplexity\|↓ \|9.8446\|± \| N/A\| ==> unsloth_model_full_baseline_output/eval_quantized.log <== \| \| \|none \| 0\|word_perplexity\|↓ \|11.4595\|± \| N/A\| ==> unsloth_model_full_qat_fp8-int4_output/eval_quantized.log <== \| \| \|none \| 0\|word_perplexity\|↓ \|9.2336\|± \| N/A\| ``` Fibonacci test: - Both bf16 baseline and int4 quantized models correctly identified 13 as the next number - QAT quantized model was more succinct in its response - No substantial differences here ``` ### Instruction: Continue the fibonnaci sequence. 
### Input: 1, 1, 2, 3, 5, 8 ==> unsloth_model_full_baseline_output/eval_float.log <== ### Response: The next number in the Fibonacci sequence is 13.<\|end_of_text\|> ==> unsloth_model_full_baseline_output/eval_quantized.log <== ### Response: The next number in the Fibonacci sequence is 13.<\|end_of_text\|> ==> unsloth_model_full_qat_fp8-int4_output/eval_quantized.log <== ### Response: 13<\|end_of_text\|> ```	2025-09-08 15:07:50 -07:00
DoubleMathew	c9c068fa0b	GptAttention turn training off during inference (#3289 )	2025-09-08 13:47:32 -07:00
Daniel Han	6e237fac7f	Versioning	2025-09-08 06:06:04 -07:00
Daniel Han	8a0de46a71	Update __init__.py	2025-09-08 04:57:04 -07:00
Daniel Han	63b2e8fc35	Update __init__.py	2025-09-08 02:02:11 -07:00
Roland Tannous	2011859430	Add TorchAO quantization tests with FP16 models and serialization workarounds (#3269 ) * Add TorchAO quantization tests with FP16 models and serialization workarounds * remove unrelated files * cleaned submission	2025-09-04 17:22:07 -07:00
DoubleMathew	b969975ba5	llama vision inference fix (#3270 ) * llama vision inference fix * fix via can_compile_fullgraph instead	2025-09-04 16:06:49 -07:00
Datta Nimmaturi	2c2662b51c	Filter executor not sleeping log (#3268 )	2025-09-04 22:05:42 +05:30
Daniel Han	5c1b0ae9dd	Bug fixes (#3266 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * custom_datatype * recheck * Float16 * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * 
Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * torch_dtype * Update rl.py * Fix CE Loss * Versioning --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-09-04 05:17:59 -07:00
DoubleMathew	490027f988	disable _is_vlm (#3265 )	2025-09-04 04:52:35 -07:00
Daniel Han	721eee6a80	Update rl.py	2025-09-04 03:25:40 -07:00
Roland Tannous	0135d126df	fixed save_pretrained_torchao and associated tests (#3264 )	2025-09-03 20:24:12 -07:00
Daniel Han	f42f0d2116	Update import_fixes.py	2025-09-03 20:17:36 -07:00
Daniel Han	4a52d0f78e	Update import_fixes.py	2025-09-03 20:12:42 -07:00
Daniel Han	f1f0036a92	Move logging	2025-09-03 20:07:27 -07:00
Daniel Han	7094b4843a	Update _utils.py	2025-09-03 20:00:21 -07:00
Daniel Han	fa3575920c	Update pyproject.toml	2025-09-03 19:55:10 -07:00
Daniel Han	33ed154e81	Update llama.py	2025-09-03 19:11:53 -07:00
Jerry Zhang	969c6a0bd8	Support saving locally in `model.save_pretrained_torchao` (#3263 ) Summary: Previously the test was not run correctly and saving to a local path was not tested. This PR adds support for that and tests it properly. Note: `python tests/saving/test_unsloth_save.py` doesn't run the test. Test Plan: pytest tests/saving/test_unsloth_save.py -k test_save_torchao	2025-09-03 17:51:33 -07:00
Daniel Han	15f3ce1372	Update save.py	2025-09-03 15:19:02 -07:00
Lei Zhenyuan	781c890c65	[Intel] make intel device support ROPE (#3164 ) * make intel device pass * abstract torch device stream	2025-09-03 04:39:57 -07:00
stevenxdavis	56dd244340	Fix incorrect function call in test_qwen3_grpo.py (#3212 ) * Update test_qwen3_grpo.py to correct function call This test file uses the incorrect name for the function, which is gradient_checkpointing_disable(), not disable_gradient_checkpointing(). I copied the line from test_llama32_sft.py - I'm not sure if this actually is required, just wanted it consistent for when other people like me test this and have no clue what they're doing when it throws an exception. * Update blackwell/test_qwen3_grpo.py Co-authored-by: Daniel Han <danielhanchen@gmail.com> --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-09-03 04:39:12 -07:00
Defi Wimar	8920d2eed2	chore: Fix Typos (#3246 )	2025-09-03 04:38:41 -07:00
Tim Paine	3f6ac1ce25	Remove old version constraint in dependency list (#3237 ) xref: https://github.com/unslothai/unsloth-zoo/pull/258	2025-09-01 02:29:58 -07:00
pluesclues	2b88f93bce	Update mistral.py, showed flag to not call cut cross entropy (#3233 ) * Update mistral.py, showed flag to not call cut cross entropy * Update mistral.py, made it so if its not equal to zero * Update unsloth/models/mistral.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-08-29 01:32:21 -07:00
Roland Tannous	711ec4a3ac	tests for mxfp4 and quantized models merge fix unsloth zoo pr 254 (#3223 )	2025-08-29 01:30:48 -07:00
Daniel Han	25b21f4899	GPT OSS Bug fixes (#3231 ) * Update rl.py * Update rl.py * Update rl.py * GPT OSS float32 * Update vision.py * Update loader.py * Update loader.py	2025-08-28 09:39:46 -07:00
Daniel Han	4058d7861a	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-08-28 03:19:16 -07:00
Daniel Han	01500fdcbb	Versioning	2025-08-28 03:19:14 -07:00
Michael Han	a10e9d6d49	Merge pull request #3224 from DefiWimar7/typos chore: Fix Typos Thank you @DefiWimar7	2025-08-28 02:46:27 -07:00
DoubleMathew	1c08e89cc7	Handle transformers move to dtype from torch_dtype (#3225 )	2025-08-28 02:43:41 -07:00
DefiWimar7	8c39cb45e4	chore: Fix Typos	2025-08-28 10:44:28 +08:00
DoubleMathew	ceff1b43b3	Fix gemma-3n (#3219 ) * place gemma-3n handling inside gemma-3 conditional * cleanup	2025-08-26 19:46:27 -07:00
Jerry Zhang	f3ab8c21af	Support `model.save_pretrained_torchao` (#3111 ) Summary: Allow users to merge the LoRA weights and then do post-training quantization with torchao. Usage: ``` from torchao.quantization import Int8DynamicActivationInt8WeightConfig torchao_config = Int8DynamicActivationInt8WeightConfig() model.save_pretrained_torchao( save_path, tokenizer=tokenizer, torchao_config=torchao_config, ) ``` Test Plan: python tests/saving/test_unsloth_save.py	2025-08-26 04:53:39 -07:00
Lei Zhenyuan	ac78311261	fix is_causal for qwen3 (#3213 )	2025-08-26 04:45:20 -07:00
Daniel Han	f35077388d	Update vision.py	2025-08-22 04:02:59 -07:00
Daniel Han	a33ff972c1	Update loader.py	2025-08-22 04:02:19 -07:00
DoubleMathew	651970094d	allow float32 dtype in FastLanguageModel (#3204 )	2025-08-21 16:54:39 -07:00
Daniel Han	2525052e4f	Bug fixes (#3195 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-08-20 07:39:43 -07:00
Daniel Han	dfb936743d	Update _utils.py	2025-08-19 23:43:58 -07:00
Michael Han	6a9f1ada59	Merge pull request #3187 from parth2510/fix-transformers-typo-extras Fix extras transformers typo in pyproject.toml	2025-08-19 15:06:11 -07:00
parth2510	3e9ef8024c	Fix extras transformers typo in pyproject.toml	2025-08-19 19:55:29 +05:30
Daniel Han	308f1b422b	Update pyproject.toml	2025-08-19 05:20:19 -07:00
Daniel Han	6fc745c731	Protobuf issue	2025-08-19 05:19:41 -07:00
Daniel Han	17a1e13b8b	Update rl.py	2025-08-19 05:04:17 -07:00
Daniel Han	7bf39fcef2	Update pyproject.toml	2025-08-19 03:24:02 -07:00
Daniel Han	5a8c81c4f9	Update _auto_install.py	2025-08-19 03:20:37 -07:00
Daniel Han	089a0056e2	Torch 2.8 (#3186 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-08-19 03:16:49 -07:00
Daniel Han	10f68527d8	Bug fixes (#3180 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-08-18 06:12:49 -07:00
andrewor14	bb7b2f40fc	Add support for QAT + LoRA (#2976 ) Summary: Quantization-aware training (QAT) helps mitigate quantization degradation by simulating quantization numerics in high precision during training (fake quantization). This PR combines QAT with LoRA by applying torchao's QAT support to the peft model. See the following for more details: - torchao QAT: https://github.com/pytorch/ao/blob/main/torchao/quantization/qat/README.md - torchtune QAT + LoRA: https://dev-discuss.pytorch.org/t/speeding-up-qat-by-1-89x-with-lora/2700 Current QAT schemes supported are: ``` fp8-int4, targeting the torch.ops.fbgemm.f8i4bf16_shuffled kernel fp8-fp8, targeting the torch.ops.fbgemm.f8f8bf16_rowwise kernel ``` Test Plan: ``` from unsloth import FastLanguageModel lora_rank = 32 model, tokenizer = FastLanguageModel.from_pretrained( model_name = "unsloth/Qwen3-4B-Base", max_seq_length = 2048, load_in_4bit = False, fast_inference = False, max_lora_rank = lora_rank, ) model = FastLanguageModel.get_peft_model( model, r = lora_rank, target_modules = [ "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", ], lora_alpha = lora_rank*2, use_gradient_checkpointing = "unsloth", random_state = 3407, qat_scheme = "fp8-fp8", ) lora.Linear( (base_layer): FakeQuantizedLinear( in_features=2560, out_features=4096, bias=False (activation_fake_quantizer): FakeQuantizer(Float8FakeQuantizeConfig(dtype=torch.float8_e4m3fn, granularity=PerRow(), hp_value_lb=None, hp_value_ub=None)) (weight_fake_quantizer): FakeQuantizer(Float8FakeQuantizeConfig(dtype=torch.float8_e4m3fn, granularity=PerRow(), hp_value_lb=None, hp_value_ub=None)) ) ... ) ```	2025-08-18 05:56:35 -07:00
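The "fake quantization" idea named in the QAT + LoRA commit above can be sketched roughly as follows. This is a generic illustration, not torchao's or Unsloth's implementation. It assumes a symmetric integer grid with one scale per tensor instead of the fp8 formats the commit targets, and `fake_quantize` is a hypothetical helper name.

```python
def fake_quantize(values, num_bits=8):
    """Round values onto a low-precision grid, then map them back to floats.

    Assumption: symmetric integer quantization with one scale for the whole
    list. The result stays in high precision but carries the rounding error
    a quantized model would see.
    """
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                 # nothing to scale
    scale = max_abs / qmax
    out = []
    for v in values:
        q = round(v / scale)                # quantize to an integer level
        q = max(-qmax, min(qmax, q))        # clamp to the representable range
        out.append(q * scale)               # dequantize back to a float
    return out


weights = [0.5, -1.0, 0.123456, 0.0]
fq = fake_quantize(weights)
# Each entry of fq differs from the original by at most half a quantization step.
```

During QAT the forward pass would use values like `fq` while the optimizer still updates the high-precision weights, so the model can adapt to the rounding error before it is actually quantized.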
Ball	94c7392c40	fix original_push_to_hub fallback (#3115 ) Co-authored-by: root <root@LAPTOP-VEI2ITL9.localdomain>	2025-08-18 05:54:24 -07:00
RJ Nowling	36c563fffd	Replace back ticks with single quotes (#3157 ) Back ticks attempt to execute program and capture output. Should be using single quote marks.	2025-08-18 05:52:30 -07:00
Daniel Han	176805802f	Update _utils.py	2025-08-18 05:28:30 -07:00
Roland Tannous	208f68f164	fix save_to_gguf_generic quantization_method type error (#3173 )	2025-08-18 04:10:17 -07:00
Roland Tannous	bacfc57380	Convert generator expression to list to prevent potential bugs if the files variable is used multiple times in future modifications. (#3167 )	2025-08-18 04:10:08 -07:00
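The motivation for the generator-to-list commit above can be illustrated with a small sketch. The file names and the `find_matching` helper are hypothetical and not taken from the Unsloth code base; the point is only that a generator can be consumed once, while a list can be iterated repeatedly.

```python
def find_matching(names, suffix):
    # Hypothetical helper: lazily yields the names ending with `suffix`.
    return (n for n in names if n.endswith(suffix))


names = ["a.gguf", "b.txt", "c.gguf"]

# Generator: the second iteration silently sees nothing.
files_gen = find_matching(names, ".gguf")
first_pass = list(files_gen)
second_pass = list(files_gen)   # generator is already exhausted

# List: safe to reuse any number of times.
files_list = list(find_matching(names, ".gguf"))
first_list_pass = [f for f in files_list]
second_list_pass = [f for f in files_list]
```

Materialising the result once costs some memory for large inputs, but it removes the class of bugs where later code reuses the variable and gets an empty sequence.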
Daniel Han	19b2fa3ac8	Fix Blackwell	2025-08-18 03:46:39 -07:00
QL	ea4b7c2c6b	Update install instructions for latest vLLM release (#3175 ) 1. Removed the `--extra-index-url https://wheels.vllm.ai/nightly` from the uv install instructions because this causes it to crash; Removing that flag solves the issue and is more stable overall. Tested with RTX 5090 CUDA 12.8 on Linux. 2. Removed `uv pip install -U triton>=3.3.1` because triton 3.3.1 is already installed with the vllm command.	2025-08-18 03:39:05 -07:00
Daniel Han	7e1581b929	Nightly (#3169 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-08-15 05:03:38 -07:00
Daniel Han	b45a977c8d	Update loader.py	2025-08-15 04:08:21 -07:00
Daniel Han	64dff1456b	GPT OSS MXFP4 fix	2025-08-15 04:00:00 -07:00
Daniel Han	bbac3e3de7	Update tokenizer_utils.py	2025-08-14 19:30:33 -07:00
Daniel Han	b11d9cc935	Encoding UTF-8	2025-08-14 15:24:25 -07:00
Daniel Han	abbf1f0a43	Fix GPT OSS (#3154 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-08-14 05:06:58 -07:00
Daniel Han	6b5fb59eb7	Update loader.py	2025-08-13 06:33:39 -07:00
Daniel Han	806bee2433	Nightly (#3148 ) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>	2025-08-13 06:12:38 -07:00
Michael Han	413ec45d8b	Update README.md	2025-08-09 15:53:29 -07:00
Michael Han	230061cd48	Merge pull request #3120 from Etherll/patch-8825 Add Qwen3 4B to mapper.py	2025-08-08 14:31:31 -07:00
Etherll	b0004c0001	Update mapper.py	2025-08-08 23:27:21 +03:00
Michael Han	2a8ca1ef5a	Update README.md	2025-08-08 12:14:38 -07:00
Daniel Han	9975cf889a	Update pyproject.toml	2025-08-08 11:49:56 -07:00
Daniel Han	424e648005	Update _utils.py	2025-08-08 11:48:13 -07:00
Daniel Han	3fc4e4dff9	Update loader.py	2025-08-08 11:44:07 -07:00
Daniel Han	628bb6c97f	Update loader.py	2025-08-08 11:38:03 -07:00
Daniel Han	20e7c33550	Merge branch 'main' into nightly	2025-08-08 10:55:52 -07:00
Daniel Han	84ff585855	Update loader.py	2025-08-08 10:55:30 -07:00
Daniel Han	b277a09401	Update _utils.py	2025-08-08 09:59:24 -07:00
Daniel Han	5931011d6c	Update vision.py	2025-08-08 09:35:54 -07:00
Daniel Han	34a0f2901f	Update vision.py	2025-08-08 09:31:38 -07:00
Michael Han	22326b7b15	Merge pull request #3110 from Etherll/qwen3-chat-template Add Qwen3 Instruct / Thinking chat templates	2025-08-08 09:27:12 -07:00
Daniel Han	6220a4c700	Update _utils.py	2025-08-08 09:16:37 -07:00
Daniel Han	52c6458fdd	Update vision.py	2025-08-08 09:15:52 -07:00
Daniel Han	8372b318c5	Update chat_templates.py	2025-08-08 08:29:52 -07:00
Daniel Han	34776a6b84	Update vision.py	2025-08-08 03:35:56 -07:00
Daniel Han	f83f3f0efe	Update vision.py	2025-08-07 16:21:25 -07:00
Daniel Han	11b5136d99	Update vision.py	2025-08-07 16:19:31 -07:00
Daniel Han	82c4bd6005	Merge branch 'main' into nightly	2025-08-07 16:19:20 -07:00
Etherll	ab03d44d9b	Update chat_templates.py add qwen3-instruct/thinking-chat-template	2025-08-08 01:01:58 +03:00
Daniel Han	180917f148	Update loader.py	2025-08-07 09:38:49 -07:00
Daniel Han	aa96d65534	Update loader.py	2025-08-07 09:38:24 -07:00
Daniel Han	9889015485	Update mapper.py	2025-08-07 08:15:59 -07:00
Daniel Han	1a8eabebab	Update vision.py	2025-08-07 07:13:40 -07:00
Daniel Han	accb7b1ead	Update vision.py	2025-08-07 07:10:36 -07:00
Daniel Han	5a31edef5f	Update loader.py	2025-08-07 05:18:53 -07:00
Daniel Han	4d89527df6	Update vision.py	2025-08-07 05:03:21 -07:00
Daniel Han	d605b629ec	Update mapper.py	2025-08-07 04:59:19 -07:00
Daniel Han	a1746fc03e	Update pyproject.toml	2025-08-07 03:47:08 -07:00
Daniel Han	b7dcf7e5ed	Update pyproject.toml	2025-08-07 03:43:25 -07:00
Daniel Han	0171432daf	Update pyproject.toml	2025-08-07 03:29:44 -07:00
Daniel Han	382042f3b0	GPT OSS fixes	2025-08-07 02:57:19 -07:00
DoubleMathew	92c024df79	gpt-oss manually call temporary patch (#3104 ) Co-authored-by: Mathew Mathew <mathew@Mathews-MacBook-Pro.local>	2025-08-06 12:13:35 -07:00
Daniel Han	3cfdd5f59c	Merge branch 'main' into nightly	2025-08-06 06:56:44 -07:00
Daniel Han	806f926750	Nightly (#3102 ) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update 
rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update _utils.py * Update vision.py	2025-08-06 06:40:59 -07:00
Daniel Han	45b6f6e860	Update vision.py	2025-08-06 06:37:03 -07:00
Daniel Han	57d858b27a	Update _utils.py	2025-08-06 06:34:07 -07:00
Daniel Han	31fb573e1e	Merge branch 'main' into nightly	2025-08-06 06:24:27 -07:00
Daniel Han	5fa56a5cda	GPT-OSS	2025-08-06 06:21:28 -07:00
DoubleMathew	9eab2cbc45	GPT-OSS support (#3099 )	2025-08-05 20:58:33 -07:00
Daniel Han	d3172a852b	Update __init__.py	2025-08-02 03:39:17 -07:00
Daniel Han	5202538bf5	Update	2025-08-02 03:36:05 -07:00
Daniel Han	c14b20010a	Merge branch 'main' into nightly	2025-08-02 03:35:03 -07:00
dongbin-lunark	eff74dc966	docs: Add WSL installation guide for xformers (#3079 )	2025-08-02 03:32:47 -07:00
DoubleMathew	8a69a68ece	get_per_token_logps_and_entropies: return tuple instead of dict (#3080 )	2025-08-02 03:31:41 -07:00
Datta Nimmaturi	d75d1dce6d	fixup rope sync for everything (#3061 )	2025-08-02 03:30:34 -07:00
Daniel Han	8e040e5870	Merge branch 'main' into nightly	2025-07-29 03:07:00 -07:00
Daniel Han	7b8505b4b7	Update _utils.py	2025-07-29 02:19:43 -07:00
Daniel Han	d0ce49732a	Update rl_replacements.py	2025-07-29 01:59:08 -07:00
Daniel Han	121785dd23	Update rl.py	2025-07-29 01:32:39 -07:00
Daniel Han	b9bcce89e9	Update pyproject.toml	2025-07-29 01:03:19 -07:00
Daniel Han	461ff94bcc	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-07-29 00:43:52 -07:00
Daniel Han	c2540a1897	Merge branch 'pr/3055'	2025-07-29 00:43:19 -07:00
Etherll	b14841facf	Add gemma-3n chat template to chat_templates.py (#3051) * Update chat_templates.py * Update chat_templates.py	2025-07-29 00:39:38 -07:00
Daniel Han	4968d3d059	Merge branch 'pr/3052'	2025-07-29 00:38:50 -07:00
Daniel Han	284f7ed538	Fix TRL 0.20.0	2025-07-29 00:36:25 -07:00
Daniel Han	96b1b5ac2b	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-07-29 00:35:41 -07:00
Sekinal	5899d5441b	Fixed wrong syntax in f-string for exception	2025-07-28 23:16:05 -06:00
Sekinal	55fc4a53f7	Fix: Added specific check for Gemma so models like BERT properly initialize	2025-07-28 22:47:58 -06:00
Etherll	4c3139c069	Update loader.py	2025-07-28 22:15:23 +03:00
Etherll	c2b06fc279	Update _utils.py	2025-07-28 21:48:36 +03:00
Etherll	49d75674f4	Update vision.py	2025-07-28 20:55:07 +03:00
Etherll	b98948db85	Update loader.py	2025-07-28 20:53:38 +03:00
Daniel Han	e812382eaf	Update _utils.py	2025-07-28 08:28:18 -07:00
Datta Nimmaturi	9deeaeebeb	Fixup multi GPU workload. (#3049) * sync all instead * sync after move and rope init instead * sync after rope inside * Return new tensors and no sync * Sync only current stream * Fixup mask for xformers * sync for prefill only * clean up	2025-07-28 03:04:49 -07:00
Daniel Han	aec983ea3f	Merge branch 'main' into nightly	2025-07-24 23:40:23 -07:00
Edd	8e994f07fb	Fix Llama and Gemma inference (#3034) * Fix Llama and Gemma inference * Add simple quality life for CUDA link error (which is not captured since we bypass all error)	2025-07-24 23:38:20 -07:00
Daniel Han	ffbd2deaea	Merge branch 'main' into nightly	2025-07-23 06:08:35 -07:00
Daniel Han	aa391aef66	Update _utils.py	2025-07-23 06:08:31 -07:00
Daniel Han	cc8fe6908b	Merge branch 'main' into nightly	2025-07-23 06:04:39 -07:00
Daniel Han	a0836ffdaf	Update pyproject.toml	2025-07-23 06:04:35 -07:00
Daniel Han	08f577854f	Merge branch 'main' into nightly	2025-07-23 05:53:09 -07:00
Daniel Han	d27e4e44d1	Fix torch compile issues (#3028 ) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * 
Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. 
* skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance`	2025-07-23 05:52:28 -07:00
Daniel Han	fec1b2d5f6	Fix `set_stance`	2025-07-23 05:19:08 -07:00
Daniel Han	56cc02b230	Update gemma2.py	2025-07-23 03:27:52 -07:00
Daniel Han	a5f26f4f76	Update rope_embedding.py	2025-07-23 03:26:16 -07:00
Daniel Han	11d8e5fe53	Cleanup	2025-07-23 03:23:23 -07:00
Daniel Han	bf8049c1c9	check stride	2025-07-23 02:52:29 -07:00
Daniel Han	8fd8a051a9	Merge branch 'main' into nightly	2025-07-23 02:51:00 -07:00
DoubleMathew	f2ef5bd16b	falcon force float32 on sm<75 machines (#3026)	2025-07-22 13:18:42 -07:00
Daniel Han	282ae72862	Fix Gemma 2 (#3024 ) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update 
rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized 
model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py	2025-07-22 04:43:43 -07:00
Daniel Han	e402cc2a74	Update _utils.py	2025-07-22 04:42:51 -07:00
Daniel Han	d23a96b060	Update _utils.py	2025-07-22 04:42:06 -07:00
Daniel Han	9884e991c0	Update _utils.py	2025-07-22 04:39:04 -07:00
Daniel Han	5906e612a0	Merge branch 'main' into nightly	2025-07-22 03:34:55 -07:00
Daniel Han	35dada2a99	Update llama.py	2025-07-22 03:34:51 -07:00
Lei Zhenyuan	7123849cf8	[intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-07-22 03:33:26 -07:00
Lei Zhenyuan	f3e41d0b1e	fix for casual mask (#3011)	2025-07-22 03:27:57 -07:00
Daniel Han	673476393b	Merge branch 'main' into nightly	2025-07-21 05:35:09 -07:00
Daniel Han	80e7af5b9f	Bug fixes (#3017 ) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * 
Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update 
rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning	2025-07-21 05:30:14 -07:00
Daniel Han	7e07941383	versioning	2025-07-21 05:15:48 -07:00
Daniel Han	06f1b961f6	Fix `quantization_method`	2025-07-21 05:14:17 -07:00
Daniel Han	bef2b47599	Update llama.py	2025-07-20 03:19:10 -07:00
Daniel Han	48296987a3	Update llama.py	2025-07-20 03:17:02 -07:00
Daniel Han	9c6b199716	Update synthetic.py	2025-07-20 00:10:41 -07:00
Daniel Han	27503af0dc	Update synthetic.py	2025-07-19 03:36:53 -07:00
Daniel Han	75f615891a	Update synthetic.py	2025-07-19 03:18:18 -07:00
Daniel Han	6a65ee478c	Update synthetic.py	2025-07-19 03:11:33 -07:00
Daniel Han	36ba3c7c69	Update synthetic.py	2025-07-19 02:54:28 -07:00
Daniel Han	bb0abf54df	Merge branch 'main' into nightly	2025-07-19 01:29:41 -07:00
Quentin Gallouédec	580b5bca11	Update README.md (#2991) * Update README.md * Update README.md	2025-07-18 15:43:59 -07:00
Daniel Han	f32ee75b45	Merge branch 'main' into nightly	2025-07-18 05:37:10 -07:00
Daniel Han	67b16ae5c0	Bug fixes (#2998 ) * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update 
_utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update 
rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`. * skip_guard_eval_unsafe fix	2025-07-18 05:36:15 -07:00
Daniel Han	83892cd097	skip_guard_eval_unsafe fix	2025-07-18 05:31:21 -07:00
Daniel Han	ce6a73986d	Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit `4021da634a`.	2025-07-17 15:37:23 -07:00
Daniel Han	5ee84edade	Merge branch 'main' into nightly	2025-07-17 15:36:24 -07:00
Daniel Han	4021da634a	Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model merge erro…" (#2988) This reverts commit `4565698ca5`.	2025-07-17 15:35:28 -07:00
Roland Tannous	4565698ca5	Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model merge error (#2986)	2025-07-17 15:35:05 -07:00
DoubleMathew	6cdc11816d	use fastmodel (#2987)	2025-07-17 15:34:14 -07:00
Quentin Gallouédec	2960bf8d94	Update unsloth-cli.py (#2985)	2025-07-17 15:08:38 -07:00
Daniel Han	3824a6ad78	Bug fixes (#2982 ) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * 
Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update 
rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py	2025-07-17 07:02:38 -07:00
Daniel Han	06da87c79a	Update rl_replacements.py	2025-07-17 06:54:25 -07:00
Daniel Han	9d8f2e4c83	Update rl_replacements.py	2025-07-17 06:51:59 -07:00
Daniel Han	8d983b01ff	Update rl_replacements.py	2025-07-17 06:49:21 -07:00
Daniel Han	640032f115	Update rl_replacements.py	2025-07-17 06:47:23 -07:00
Daniel Han	2ae069ab2d	Update rl_replacements.py	2025-07-17 06:44:28 -07:00
Daniel Han	5ad0f54c2d	Update rl_replacements.py	2025-07-17 06:44:13 -07:00
Daniel Han	23dbf731a9	Update rl_replacements.py	2025-07-17 06:39:59 -07:00
Daniel Han	6c1a57ee4d	Update rl.py	2025-07-17 06:30:26 -07:00
Daniel Han	f6b6ab7a1b	Update rl_replacements.py	2025-07-17 06:26:10 -07:00
Daniel Han	0a67f44bb6	Update rl_replacements.py	2025-07-17 06:25:21 -07:00
Daniel Han	e66792eb05	Update rl_replacements.py	2025-07-17 06:24:33 -07:00
Daniel Han	771d5ff25f	Update rl_replacements.py	2025-07-17 06:23:37 -07:00
Daniel Han	72e2debbd6	Update pyproject.toml	2025-07-17 05:38:04 -07:00
Daniel Han	f3606ea3d9	Update pyproject.toml	2025-07-17 05:26:58 -07:00
Daniel Han	1185706555	Merge branch 'main' into nightly	2025-07-17 05:14:52 -07:00
Daniel Han	2c468550e6	Revert "GRPO Fix - Support vllm pre-dequantized quantization states in fast_dequantize kernel (#2943)" This reverts commit `1cefffa2d2`.	2025-07-17 05:02:08 -07:00
Daniel Han	03c880f5da	Update _utils.py	2025-07-17 02:08:49 -07:00
Daniel Han	c88758124a	compiler stance	2025-07-17 01:40:49 -07:00
Daniel Han	aa8e172396	Update vision.py	2025-07-17 01:19:07 -07:00
Daniel Han	b112838f58	Update vision.py	2025-07-17 00:33:27 -07:00
Daniel Han	98cebf3d06	Update _utils.py	2025-07-14 02:45:19 -07:00
Daniel Han	e718c27474	Merge branch 'main' into nightly	2025-07-14 02:44:42 -07:00
Roland Tannous	1cefffa2d2	GRPO Fix - Support vllm pre-dequantized quantization states in fast_dequantize kernel (#2943) * Support pre-dequantized quantization states in fast_dequantize kernel * has_nested_quant conditional set to only * Update utils.py * Update utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-07-14 02:41:15 -07:00
Roland Tannous	32738da0d1	fix dataloader_num_workers value error in GRPOTrainer (#2944)	2025-07-14 01:43:33 -07:00
Muzammil Khan	1eaa52ae55	fix: change lora_dropout from int to float for type consistency (#2949) Fixes "Argument of type 'float' cannot be assigned to parameter 'lora_dropout' of type 'int'" error by ensuring lora_dropout is consistently a float (0.0) rather than int (0) across vision.py, llama.py, and unsloth-cli.py	2025-07-14 01:42:07 -07:00
Datta Nimmaturi	665a8e4b1d	Fix falcon H1 dropout issue (#2938) Because we don't have down and gate multipliers, the MLP output values are too huge, causing NaN and unstable training. To bypass that lets rely on HF's implementation for the time being	2025-07-12 15:53:07 -07:00
DoubleMathew	bbcba7fc21	patch falcon h1 inference (#2932)	2025-07-12 15:52:24 -07:00
Daniel Han	01f649f6fc	Update rl.py	2025-07-11 03:13:55 -07:00
Daniel Han	7c8cd3dd6d	Update rl.py	2025-07-11 03:11:07 -07:00
Daniel Han	7d2106e5c6	Merge branch 'main' into nightly	2025-07-11 03:04:28 -07:00
Daniel Han	f155097cd8	Uninitialized handler	2025-07-11 03:04:17 -07:00
Daniel Han	681b10dc0c	Fixes	2025-07-11 00:01:37 -07:00
Daniel Han	2dae012308	Update llama.py	2025-07-10 17:18:40 -07:00
Daniel Han	17f7447a39	Merge branch 'main' into nightly	2025-07-10 17:12:59 -07:00
Daniel Han	9071adb723	Fix GRPO	2025-07-10 17:12:49 -07:00
Michael Han	6503961a33	Merge pull request #2929 from rolandtannous/fix/fix-grpo-get-per-token-logps-argument-mismatch Fix argument mismatch in GRPO _get_per_token_logps lambda function	2025-07-10 14:30:05 -07:00
Roland Tannous	8d6da15c2e	Fix argument mismatch in GRPO _get_per_token_logps lambda function	2025-07-10 18:24:53 +00:00
Daniel Han	07df9c1233	Merge branch 'main' into nightly	2025-07-10 07:03:59 -07:00
Daniel Han	5a2dcc924b	Many bug fixes (#2927 ) * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py 
* Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update 
setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-07-10 07:03:48 -07:00
Daniel Han	c73406d8ba	Update __init__.py	2025-07-10 07:03:28 -07:00
Daniel Han	ee93391155	versioning	2025-07-10 07:01:44 -07:00
Daniel Han	509cf61834	Update vision.py	2025-07-10 05:15:03 -07:00
Daniel Han	d3bd189e47	Update vision.py	2025-07-10 04:34:23 -07:00
Daniel Han	ace8de011d	Small fixes	2025-07-10 04:04:51 -07:00
Daniel Han	d4a302ff3d	Merge branch 'main' into nightly	2025-07-10 04:02:01 -07:00
Datta Nimmaturi	c77f6c3719	Move inputs to right devices. (#2919 ) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Do not move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Do not typecast * Use tuple instead of list for tensors. Use device index directly * fixup move to device logic	2025-07-10 04:01:03 -07:00
Daniel Han	13a32054b7	Merge branch 'main' into nightly	2025-07-10 01:50:14 -07:00
Daniel Han	62c5c315dd	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-07-10 01:50:04 -07:00
Daniel Han	87e1a933d8	Update llama.py	2025-07-10 01:50:03 -07:00
DoubleMathew	643f9b068b	if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913 )	2025-07-09 23:29:41 -07:00
Daniel Han	20f665a98a	Update __init__.py	2025-07-09 16:30:57 -07:00
Datta Nimmaturi	ced87c6059	Explicitly check if xformers exists for attention (#2889 )	2025-07-09 14:15:35 -07:00
Lei Zhenyuan	6a36b6e1fc	[bugs] fix for causal mask (#2868 ) * fix for causal mask * use un_casual in sdpa * add missing mask * fix for type	2025-07-09 14:10:25 -07:00
Lei Zhenyuan	74b9feb674	add intel gpu with vllm support (#2903 )	2025-07-09 14:08:38 -07:00
Datta Nimmaturi	772f15ca49	Dynamically adjust get_per_token_logps function and patch as well (#2911 )	2025-07-09 14:07:33 -07:00
DoubleMathew	c4901fd894	silently skip falcon h1 import if transformers_version < 4.53.0 (#2912 )	2025-07-09 14:05:41 -07:00
Daniel Han	1fb1b72ae1	Merge branch 'main' into nightly	2025-07-09 14:03:38 -07:00
Daniel Han	7dde992481	Many bug fixes (#2908 ) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update 
vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-07-09 07:14:18 -07:00
Daniel Han	3cd089ec02	Update __init__.py	2025-07-09 07:00:17 -07:00
Daniel Han	8cf37d23dc	Update __init__.py	2025-07-09 03:58:13 -07:00
Daniel Han	bf7bf02f3e	Update _utils.py	2025-07-07 04:56:04 -07:00
Daniel Han	45c73f7c36	Merge branch 'main' into nightly	2025-07-06 23:01:53 -07:00
Daniel Han	735466b102	versioning	2025-07-06 22:58:14 -07:00
Daniel Han	924283602a	Merge branch 'main' into nightly	2025-07-06 22:44:43 -07:00
Daniel Han	538558df53	MoE kernels AGPLv3	2025-07-06 22:44:35 -07:00
Daniel Han	6c98d33275	Merge branch 'main' into nightly	2025-07-06 22:40:31 -07:00
Daniel Han	f0a9442a06	Update README.md (#2885 )	2025-07-05 02:07:20 -07:00
Michael Han	78e17304a0	Update README.md Updating icon sizes	2025-07-04 15:50:31 -07:00
Michael Han	9ecc97a67c	Update README.md Editing icon sizes	2025-07-04 15:37:44 -07:00
DoubleMathew	e858f08047	only warn about prepare causal attention mask when transformers<=4.52.4 (#2867 )	2025-07-04 01:54:27 -07:00
Daniel Han	68c279eaef	Merge branch 'main' into nightly	2025-07-03 15:55:30 -07:00
Michael Han	eb97606c53	Merge pull request #2873 from Erland366/fix/unslothtrainingarguments Fix `UnslothTrainingArguments` not patching `trl.Config` properly	2025-07-03 14:00:12 -07:00
Erland366	7a306bfbdc	Initialize parent class in UnslothTrainingArguments constructor	2025-07-03 15:47:58 +00:00
Erland366	d2d3875596	Refactor UnslothTrainingArguments to initialize embedding_learning_rate in constructor	2025-07-03 15:47:10 +00:00
Erland366	fe5358bacc	Refactor UnslothTrainingArguments to support fallback for TrainingArguments import	2025-07-03 15:39:56 +00:00
Erland366	80206ec1eb	Always use SFTConfig	2025-07-03 14:08:04 +00:00
Daniel Han	172084f14f	Merge branch 'main' into nightly	2025-07-02 18:02:57 -07:00
DoubleMathew	bd32b2fd66	Update CSM for faster inference (no compile) (#2865 )	2025-07-02 16:59:56 -07:00
Roland Tannous	59b30f335c	fix quantized model parameter count method (#2855 ) * fix quantized model parameter count method * function cleanup * parameter space cleanup	2025-07-01 23:36:59 -07:00
Michael Han	f4a922fc6f	Update README.md	2025-07-01 09:28:59 -07:00
Daniel Han	ed969515ee	Merge branch 'main' into nightly	2025-07-01 07:01:42 -07:00
Daniel Han	9ad09f9ab2	Fix Gemma 3N (#2854 ) * add llama model registration * fix quant tag mapping * add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore 
* Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update 
rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-07-01 07:01:31 -07:00
Daniel Han	d11d7a2c16	Update vision.py	2025-07-01 06:59:30 -07:00
Daniel Han	0b1545d218	Update pyproject.toml	2025-07-01 06:57:58 -07:00
Daniel Han	93487dbf81	Update vision.py	2025-07-01 04:27:40 -07:00
Daniel Han	34464fa436	Update vision.py	2025-07-01 02:49:48 -07:00
Daniel Han	48986eb6c4	Merge branch 'main' into nightly	2025-07-01 02:47:20 -07:00
Daniel Han	00c947798d	Update vision.py	2025-07-01 02:47:17 -07:00
Daniel Han	c2ecdecd5c	Merge branch 'main' into nightly	2025-07-01 02:37:07 -07:00
Daniel Han	a98505a5ae	Update _utils.py	2025-07-01 02:36:54 -07:00
Daniel Han	19448b2517	Update pyproject.toml	2025-07-01 01:34:17 -07:00
Daniel Han	622f3fc85a	Move AMD to AMD branch	2025-07-01 01:10:40 -07:00
Daniel Han	f1e1b890ac	Move AMD to AMD branch	2025-07-01 01:02:51 -07:00
Daniel Han	9886a2a3a0	subprocess	2025-07-01 00:51:04 -07:00
Daniel Han	46bc57ce49	Update setup.py	2025-07-01 00:34:27 -07:00
Daniel Han	b4b75a9ea8	Update setup.py	2025-07-01 00:32:23 -07:00
Daniel Han	95d5d7dbcc	Update setup.py	2025-07-01 00:27:34 -07:00
Daniel Han	4f7185dc08	Update setup.py	2025-07-01 00:26:34 -07:00
Daniel Han	725c4616ef	Update setup.py	2025-07-01 00:19:25 -07:00
Daniel Han	71f7bb26df	Update setup.py	2025-07-01 00:18:22 -07:00
Daniel Han	ccd15f09bc	Update setup.py	2025-07-01 00:14:58 -07:00
Daniel Han	6093e18e26	Cmake and ninja move	2025-07-01 00:13:00 -07:00
Daniel Han	9d0f44cdf6	Fix setup.py	2025-07-01 00:05:56 -07:00
Daniel Han	507b5c41e4	Update _utils.py	2025-06-30 23:13:33 -07:00
Daniel Han	fba0bff2f4	Remove stale bot	2025-06-30 23:11:57 -07:00
Daniel Han	97a5f2c7e1	Update pyproject.toml	2025-06-30 21:01:55 -07:00
Daniel Han	c6155cc6d3	Update pyproject.toml	2025-06-30 21:01:32 -07:00
Daniel Han	6e612e3aa7	Update pyproject.toml	2025-06-30 21:00:49 -07:00
Daniel Han	3aec8de53b	Update pyproject.toml	2025-06-30 20:25:46 -07:00
Daniel Han	713b59a04a	Update pyproject.toml	2025-06-30 20:01:13 -07:00
Daniel Han	a409f2a430	Update pyproject.toml	2025-06-30 20:00:20 -07:00
Daniel Han	2b50604876	Update setup.py	2025-06-30 19:50:15 -07:00
Daniel Han	c7d765c425	Update setup.py	2025-06-30 19:48:00 -07:00
Daniel Han	e21ac3c3ba	Update setup.py	2025-06-30 19:47:22 -07:00
Daniel Han	719a626af0	Prints	2025-06-30 19:35:59 -07:00
Daniel Han	28a968d118	setup.py	2025-06-30 19:32:32 -07:00
Daniel Han	d7ac7f46c0	Fix setup.py	2025-06-30 19:25:49 -07:00
Daniel Han	021d1f78fc	Merge branch 'main' into nightly	2025-06-30 16:57:19 -07:00
billishyahao	25d73efe8a	[Feature] enable unsloth on amd gpu (#2520 ) * [Feature] enable unsloth on amd gpu * fix the comment --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-06-30 16:52:05 -07:00
Rishabh	3ff04c3d09	Convert torch.bfloat16, torch.float16, etc. to vLLM valid dtypes (#2811 ) * Convert torch.bfloat16, torch.float16, etc. to vLLM valid dtypes * removed newlines and extra whitespace	2025-06-30 16:39:36 -07:00
DoubleMathew	35b09e2d2a	Fix loftq None config for FastBaseModel (#2848 ) add new validate_loftq_config to __all__	2025-06-30 16:38:51 -07:00
Daniel Han	e1693c1caf	Gemma 3N bug fixes (#2842 ) * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py * Update README.md * add model registry * move hf hub utils to unsloth/utils * refactor global model info dicts to dataclasses * fix dataclass init * fix llama registration * remove deprecated key function * start registry reog * add llama vision * quant types -> Enum * remap literal quant types to QuantType Enum * add llama model registration * fix quant tag mapping * add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update 
synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update 
rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py --------- Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-06-30 07:15:48 -07:00
Daniel Han	21ec11eaa4	Update vision.py	2025-06-30 07:07:38 -07:00
Daniel Han	b8e661b39e	Update loader.py	2025-06-30 07:06:35 -07:00
Daniel Han	1444b1011a	Update vision.py	2025-06-30 07:03:36 -07:00
Daniel Han	ac09d15750	Update vision.py	2025-06-30 06:53:28 -07:00
Daniel Han	743a469d9f	Gemma 3N fixes	2025-06-30 06:51:49 -07:00
Daniel Han	e9257e5ecd	Versioning	2025-06-30 06:08:14 -07:00
Daniel Han	244d97d1b6	Update loader.py	2025-06-30 06:04:19 -07:00
Roland Tannous	ed93ec6049	Added conda/mamba section to blackwell installation readme (#2817 ) * Added conda/mamba section to blackwell installation readme * fix conda creation suffix and vllm install syntax	2025-06-30 06:02:57 -07:00
Daniel Han	3b998dbe9d	Gemma 3N fixes	2025-06-30 06:01:50 -07:00
Daniel Han	ee54d928f6	Merge branch 'main' into nightly	2025-06-30 05:52:19 -07:00
Daniel Han	b0088817cd	Update stale.yml	2025-06-30 02:16:09 -07:00
Daniel Han	95d2bdbec3	Create stale.yml (#2836 )	2025-06-29 21:59:43 -07:00
Daniel Han	550f19fc0d	Delete stale.yml	2025-06-29 21:58:55 -07:00
Daniel Han	901c3216d0	Update stale.yml	2025-06-29 21:57:30 -07:00
Daniel Han	cc69f5c3cb	Create stale.yml (#2832 )	2025-06-29 17:36:23 -07:00
Daniel Han	4d4d5e6da2	Delete stale.yml	2025-06-29 17:35:01 -07:00
Daniel Han	018f7677a5	Update stale.yml	2025-06-29 17:29:36 -07:00
Daniel Han	c396630163	Create stale.yml	2025-06-29 17:28:18 -07:00
Daniel Han	a59da3065d	Merge branch 'main' into nightly	2025-06-29 16:38:27 -07:00
Mehmet Oguz Derin	32eaa27b1a	Fix LoftQ with FastBaseModel (#2826 ) Pass `init_lora_weights` and `loftq_config` to `LoraConfig` constructor, which enables classes like `FastModel` to use LoftQ support. Thank you very much in advance!	2025-06-29 16:14:02 -07:00
Daniel Han	6cd3980099	gradient checkpointing	2025-06-29 03:19:20 -07:00
Daniel Han	a4a15aa9f3	Merge branch 'main' into nightly	2025-06-28 17:44:21 -07:00
DoubleMathew	f02a29e017	import undefined transformers_version for falcon model (#2822 ) * import undefined transformers_version for falcon model fixed falcon transformers version check and added error handling for FalconH1Attention bad import * Also, conditionally load module from falcon_h1 depending on if the transformers version supports it	2025-06-28 17:41:19 -07:00
DoubleMathew	cbc80ca97c	granite force layernorm upcast (#2799 )	2025-06-28 05:27:56 -07:00
Dhia Eddine Rhaiem	302691a5c1	Add falcon h1 (#2650 ) * add falcon h1 * feat: add Falcon-H1 into unsloth * address comments * fix * Update unsloth/models/llama.py Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update unsloth/models/llama.py Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fixes * fix comments --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Younes Belkada <younes.belkada@tii.ae> Co-authored-by: ilyasch2 <ilyas.chahed@tii.ae>	2025-06-28 04:55:13 -07:00
jeromeku	dd50a76af8	add instructions for installing on blackwell (#2812 )	2025-06-27 04:54:55 -07:00
Daniel Han	6f74526a98	Update vision.py	2025-06-27 04:12:10 -07:00
Daniel Han	bf032eab0b	Merge branch 'main' into nightly	2025-06-27 03:10:11 -07:00
Daniel Han	1aa4fa6fe7	Update loader.py	2025-06-26 12:03:29 -07:00
Daniel Han	3a37cccc6b	Update _utils.py	2025-06-26 11:31:17 -07:00
Daniel Han	ad0dbb9616	Update loader.py	2025-06-26 11:31:08 -07:00
Daniel Han	4ee91eb5ad	Merge branch 'main' into nightly	2025-06-26 10:30:51 -07:00
Daniel Han	c3c2fa2e1b	Update pyproject.toml	2025-06-26 09:15:14 -07:00
Daniel Han	e49e2e13f0	Gemma 3N (#2809 ) * Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Revert * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py * Update README.md * add model registry * move hf hub utils to unsloth/utils * refactor global model info dicts to dataclasses * fix dataclass init * fix llama registration * remove deprecated key function * start registry reog * add llama vision * quant types -> Enum * remap literal quant types to QuantType Enum * add llama model registration * fix quant tag mapping * add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update 
_auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update 
mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-06-26 09:14:28 -07:00
Daniel Han	48f4ac7888	Update _utils.py	2025-06-26 09:13:20 -07:00
Daniel Han	de24fd3191	Update loader.py	2025-06-26 09:11:30 -07:00
Daniel Han	65102f9b75	Update vision.py	2025-06-26 09:01:07 -07:00
Daniel Han	9298d90853	Update loader.py	2025-06-26 08:57:06 -07:00
Daniel Han	43fb58672b	Update vision.py	2025-06-26 08:54:51 -07:00
Daniel Han	3023dc63aa	Update mapper.py	2025-06-26 08:53:47 -07:00
Daniel Han	71b910a769	Update loader.py	2025-06-26 08:51:09 -07:00
Daniel Han	d82ebea900	Update mapper.py	2025-06-26 08:37:41 -07:00
Daniel Han	83e8b47a0b	Merge branch 'main' into nightly	2025-06-26 04:41:55 -07:00
Daniel Han	9746799feb	Bug fixes (#2807 ) * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update vision.py * HF Transfer * fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError. * Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Revert * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py * Update README.md * add model registry * move hf hub utils to unsloth/utils * refactor global model info dicts to dataclasses * fix dataclass init * fix llama registration * remove deprecated key function * start registry reog * add llama vision * quant types -> Enum * remap literal quant types to QuantType Enum * add llama model registration * fix quant tag mapping * add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * 
rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame 
force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py --------- Co-authored-by: naliazheli <nalia0316@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-06-26 04:40:47 -07:00
Daniel Han	4457366562	Update vision.py	2025-06-26 03:55:07 -07:00
Daniel Han	4663ba3be0	Update _utils.py	2025-06-26 03:39:13 -07:00
Daniel Han	388f0203df	Merge branch 'main' into nightly	2025-06-26 03:16:08 -07:00
Daniel Han	6a83bb53a5	Bug fixes (#2805 ) * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update vision.py * HF Transfer * fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError. * Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Revert * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py * Update README.md * add model registry * move hf hub utils to unsloth/utils * refactor global model info dicts to dataclasses * fix dataclass init * fix llama registration * remove deprecated key function * start registry reog * add llama vision * quant types -> Enum * remap literal quant types to QuantType Enum * add llama model registration * fix quant tag mapping * add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when 
constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * 
Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning --------- Co-authored-by: naliazheli <nalia0316@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-06-26 02:17:12 -07:00
Daniel Han	83c0e52e90	versioning	2025-06-26 02:13:49 -07:00
Daniel Han	c6e8c516a5	Update rl_replacements.py	2025-06-26 01:57:56 -07:00
Daniel Han	0922ec5544	Update llama.py	2025-06-26 01:56:55 -07:00
Daniel Han	5f28dbe8e4	Merge branch 'main' into nightly	2025-06-26 01:51:20 -07:00
Datta Nimmaturi	e402be69b7	Fix grpo sleep regex and indentation (#2804)	2025-06-26 01:50:47 -07:00
Lei Zhenyuan	48d51bac5f	fix for inductor no attribute prop.multi_processor_count (#2803)	2025-06-26 01:44:15 -07:00
Daniel Han	29e4870a45	Update vision.py	2025-06-26 01:27:16 -07:00
Daniel Han	2e9b504b28	Update rl_replacements.py	2025-06-26 01:12:21 -07:00
Daniel Han	5e99e7467f	Update rl_replacements.py	2025-06-26 00:56:05 -07:00
Daniel Han	f41bfbc092	Remove debugging	2025-06-26 00:41:37 -07:00
Daniel Han	e1ca077164	Update rl_replacements.py	2025-06-26 00:35:20 -07:00
Daniel Han	c2a901493f	Update rl_replacements.py	2025-06-26 00:27:17 -07:00
Daniel Han	33f20f0289	Generic efficient GRPO	2025-06-26 00:10:07 -07:00
DoubleMathew	c928612ee0	[4/N] Enable intel GPU for unsloth (#2801) * add code for xpu llama * refine code * change version check to 2.6.0 * remove unuse blank * reslove commits * Cleaned up statistics printing * Update unsloth/models/_utils.py --------- Co-authored-by: lei,zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-06-25 20:28:07 -07:00
Daniel Han	b526f8b091	Update rl_replacements.py	2025-06-25 20:13:17 -07:00
Daniel Han	2a67486109	Update rl_replacements.py	2025-06-25 19:51:43 -07:00
Daniel Han	5dea7d9076	Update rl_replacements.py	2025-06-25 18:53:19 -07:00
Daniel Han	ba9e9dcef8	Update rl_replacements.py	2025-06-25 18:51:54 -07:00
Daniel Han	2414e577da	Update rl_replacements.py	2025-06-25 18:50:45 -07:00
Daniel Han	cda0aacb5c	Merge branch 'main' into nightly	2025-06-25 16:50:18 -07:00
Daniel Han	7dc1ce9eb3	Update llama.py	2025-06-25 02:09:56 -07:00
Daniel Han	cd1aff0222	Update llama.py	2025-06-25 01:57:15 -07:00
Daniel Han	537d4b217c	Debugging only	2025-06-25 01:44:07 -07:00
Michael Han	b017f2395a	Update README.md Updating links	2025-06-25 01:32:24 -07:00
Daniel Han	404052510b	Merge branch 'main' into nightly	2025-06-24 02:03:47 -07:00
Lei Zhenyuan	dcf26ac3fb	[3/N] Enable intel GPU for unsloth (#2620) * enable intel xpu changes within kernels * reslove torch.version < 2.6 * change version check to 2.6.0 * resolve comments for torch_gpu_device * resolve amp fwd comments * fix typo * change cuda default logic * clean this pr * add HAS_CUDA_STREAM as default False * split GPU streams to cuda and xpu streams * add optional	2025-06-24 02:01:28 -07:00
Daniel Han	9476eccb31	Merge branch 'main' of https://github.com/unslothai/unsloth into nightly	2025-06-24 01:36:03 -07:00
Daniel Han	48ccca95e8	Merge branch 'main' into nightly	2025-06-24 01:36:02 -07:00
DoubleMathew	0a14b795d0	move min_sms in is_big_gpu inside DEVICE_TYPE if else (#2792) log is not defined in torch inductor so remove Remove log.warning entirely	2025-06-23 18:57:55 -07:00
pluesclues	22c4d45d48	Fixed Sequence Classification errors, loaded model weirdly (#2793)	2025-06-23 18:56:56 -07:00
Michael Han	853a72592b	Update issue templates	2025-06-23 05:34:46 -07:00
Daniel Han	97c10f9494	Update issue templates	2025-06-23 05:26:28 -07:00
Lei Zhenyuan	1ae3425b07	[5/N] Enable intel GPU for unsloth (#2768) * add is_big_gpu support for xpu * make code unsloth's style	2025-06-23 04:47:34 -07:00
kilavvy	1d5af06e00	Docs: Fix typo and improve MoE docstrings (#2784) * Update qwen3_moe.py * Update interface.py	2025-06-23 01:09:23 -07:00
Daniel Han	d7b0653a2a	Fix GRPO (#2787 ) * Update _utils.py * Update _utils.py * versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update vision.py * HF Transfer * fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError. * Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Revert * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py * Update README.md * add model registry * move hf hub utils to unsloth/utils * refactor global model info dicts to dataclasses * fix dataclass init * fix llama registration * remove deprecated key function * start registry reog * add llama vision * quant types -> Enum * remap literal quant types to QuantType Enum * add llama model registration * fix quant tag mapping 
* add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update 
synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py --------- Co-authored-by: naliazheli <nalia0316@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-06-22 05:54:29 -07:00
Daniel Han	1e8201f465	Update rl_replacements.py	2025-06-22 05:51:07 -07:00
Daniel Han	6f8711e2e6	Update rl_replacements.py	2025-06-22 05:34:52 -07:00
Daniel Han	ae992555e4	Update pyproject.toml	2025-06-22 05:33:57 -07:00
Daniel Han	c8e5a88001	Update rl_replacements.py	2025-06-22 05:32:38 -07:00
Daniel Han	d9601bd14a	logits / temperature	2025-06-22 04:54:18 -07:00
Daniel Han	d36abd3b71	Update rl_replacements.py	2025-06-22 03:44:56 -07:00
Daniel Han	7837f6f40c	Update rl_replacements.py	2025-06-22 03:39:30 -07:00
Daniel Han	9b612677f9	Update rl.py	2025-06-22 03:29:20 -07:00
Daniel Han	c465cf2fd2	Update rl_replacements.py	2025-06-22 02:55:00 -07:00
Daniel Han	0815525b15	Update rl_replacements.py	2025-06-22 02:37:52 -07:00
Daniel Han	0012c8906c	Merge branch 'main' into nightly	2025-06-22 02:37:11 -07:00
Daniel Han	c7c6f2c88d	Update rl.py	2025-06-21 23:29:08 -07:00
Daniel Han	d9be145849	Fix bf16 = None	2025-06-21 22:50:29 -07:00
Daniel Han	2f2930ee50	Update rl.py	2025-06-21 22:26:46 -07:00
Daniel Han	6af83b76b0	Merge branch 'main' into nightly	2025-06-21 22:21:10 -07:00
Daniel Han	05867537c1	Update _utils.py	2025-06-21 22:20:46 -07:00
Daniel Han	d71dfb1d01	Update rl_replacements.py	2025-06-21 22:20:32 -07:00
Daniel Han	3461b987fd	Fix DAPO, TRL 0.19.0	2025-06-21 22:14:21 -07:00
simpissa	8a202d6175	Fix for grpo_compute_loss_slow (#2702) * slice last logit * move slicing	2025-06-21 21:58:06 -07:00
Daniel Han	447ce0fb4f	Mistral Small 3.2	2025-06-21 06:44:14 -07:00
amrothemich	ca150bb27a	Update pyproject.toml (#2778) Switched pyproject license to dictionary type	2025-06-21 02:44:24 -07:00
Michael Han	2a200e739a	Merge pull request #2780 from rolandtannous/fix/gemma3-grpo-self-llm Fix AttributeError in GRPO trainer for models without llm attribute	2025-06-20 21:15:54 -07:00
Roland Tannous	8c563abb87	Fix Gemma3ForCausalLm does not have attribute self.llm	2025-06-21 01:07:32 +00:00
Roland Tannous	061f038ec7	Additional tests for unsloth-zoo PR#174	2025-06-21 00:22:00 +00:00
Daniel Han	1be2a6e90a	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-06-20 06:30:43 -07:00
Daniel Han	d8846ebdfd	Update pyproject.toml	2025-06-20 06:30:35 -07:00
marcandrelarochelle	481292a96a	Fix TRL 1.8.2 (#2774) * Fix for TRL 1.8.2 Regex matching LLM initialization * Update Regex	2025-06-20 06:28:58 -07:00
Daniel Han	e43babe76f	Update __init__.py	2025-06-20 06:13:45 -07:00
Daniel Han	2e9724f279	Fix bugs	2025-06-20 06:09:03 -07:00
Datta Nimmaturi	b87ff3f528	Enable vLLM to share memory space (#2712) * vLLM sleep once generation is done * Make enable_sleep_model configurable * Make default to false Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> * Force standby under environment variable --------- Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>	2025-06-19 04:04:14 -07:00
Edd	a398484d0a	Fix renaming on other model than Llama (#2762)	2025-06-18 13:38:36 -07:00
leopardracer	c6e0366e0d	Fix Typos in Documentation and Comments (#2721) * Update ocr_eval.md * Update backward.py	2025-06-17 04:34:51 -07:00
pluesclues	440bbf5b52	Reward modeling update (There seems to be another patch) (#2710) * Update llama.py, sequence_classifcaiton update * Update llama.py, adapting to original commit * Update llama.py, for seqeuence classifcation update * Update llama.py, added transformer import * Update llama.py, dealt with output weight * Update llama.py, renamed it peft model fast forward * Update llama.py, set up is classification varaiable * Update llama.py, updated lora dict to initialize sequence classification object * Update llama.py, gets model name correctly before Lora dict is initialized * Update llama.py, Task_type_SEQ_CLS doesnt work but it does work with Task_type.CAUSAL_LM	2025-06-17 04:33:45 -07:00
Michael Han	ee76be7d58	Update issue templates Adding Reddit link	2025-06-12 01:23:36 -07:00
Roland Tannous	efe2cc43a7	tests for additional merge fix unsloth zoo pr 163 (#2719) * tests for additional merge fix unsloth zoo pr 163 * fixed load_dataset indent in mistral perplexity test file	2025-06-11 14:08:41 -07:00
Daniel Han	d535bf067e	Versioning	2025-06-10 06:51:07 -07:00
user799595	19399e09f9	Making protobuf version more flexible (#2637) * Making protobuf version more flexible * Update pyproject.toml * Update pyproject.toml --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-06-10 04:13:25 -07:00
Daniel Han	af47cfb9a3	Update pyproject.toml	2025-06-10 04:04:25 -07:00
Lei Zhenyuan	fc78af6d76	add support for torch270 (#2709)	2025-06-10 03:59:15 -07:00
Daniel Han	0ab4544d37	Merge branch 'main' into nightly	2025-06-06 05:55:46 -07:00
Daniel Han	16af0ceb8e	versioning	2025-06-06 05:46:49 -07:00
Salpingopharyngeus	0012b13573	Ignore None to Subprocess_Commands (#2680) Ignores none params when building the subprocess_command for vllm. As none values stop vllm from deploying properly, as --quantize will be passed with none if quantization type isn't specified in the model name.	2025-06-05 01:25:12 -07:00
DoubleMathew	aa50ef2862	Update prepare 4d causal attention call (#2678)	2025-06-04 12:58:50 -07:00
Daniel Han	8f465e21c5	Update rl.py	2025-06-03 00:07:52 -07:00
DoubleMathew	9bf691061d	patch sft_trainer to favor max_seq_length over max_length in config (#2669)	2025-06-03 00:06:44 -07:00
DoubleMathew	90a4aacbf8	unsloth checkpointing fix for latest transformers==4.52.x (#2674)	2025-06-03 00:06:06 -07:00
Roland Tannous	58f3a6e29d	reroute merge logic language models + comprehensive tests + eval kits (#2673)	2025-06-02 20:32:57 -07:00
Daniel Han	a58fec36a5	Merge branch 'main' into nightly	2025-06-02 18:58:24 -07:00
RunFMe	332eabf309	Fix batched generation for prompts of different lengths (#2216) * fix ignoring of attention mask after prefill stage in decoding * update naming to avoid confusion --------- Co-authored-by: Неизвестный Пользователь722497 <dolegosmirnov@sberbank.ru>	2025-06-02 03:59:10 -07:00
Michael Han	e76172c638	Merge pull request #2662 from Datta0/model_param_fix Fix quant model param fetch regex	2025-06-01 04:19:12 -07:00
datta0	e8d6ede1fd	Make replacement logic conscise	2025-06-01 05:57:43 +00:00
Michael Han	45a32bc599	Update issue templates	2025-05-31 14:38:55 -07:00
datta0	f2a8a437b4	Fix quant model param fetch regex	2025-05-31 18:52:46 +00:00
Daniel Han	03965930e7	DeepSeek R1 Qwen	2025-05-30 01:38:53 -07:00
Daniel Han	125e9f5f84	Merge branch 'main' into nightly	2025-05-29 09:59:48 -07:00
Daniel Han	f9677b6cae	Bug fixes (#2651 ) * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * check * Update _utils.py * Update loader.py * Update loader.py * Remove prints * Update README.md typo * Update _utils.py * Update _utils.py * versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update vision.py * HF Transfer * fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError. 
* Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Revert * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py * Update README.md * add model registry * move hf hub utils to unsloth/utils * refactor global model info dicts to dataclasses * fix dataclass init * fix llama registration * remove deprecated key function * start registry reog * add llama vision * quant types -> Enum * remap literal quant types to QuantType Enum * add llama model registration * fix quant tag mapping * add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py 
* Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py --------- Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com> Co-authored-by: naliazheli <nalia0316@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-05-29 09:59:29 -07:00
Daniel Han	b087bfaf55	Update rl.py	2025-05-28 12:27:55 -07:00
Daniel Han	b290557814	Update rl.py	2025-05-28 12:27:43 -07:00
Daniel Han	acb972cee0	versioning	2025-05-28 12:19:43 -07:00
Daniel Han	85b959b9bb	Merge branch 'main' into nightly	2025-05-28 12:11:41 -07:00
DoubleMathew	95452eed81	Fix SFT training for new trl (#2647) * fix sft training with trl>0.15.2 with trl DataCollator * Update fix to accommodate both trl and transformers DataCollatorForLanguageModeling	2025-05-28 11:55:48 -07:00
Daniel Han	2b1462e813	Merge branch 'main' into nightly	2025-05-28 06:15:34 -07:00
Daniel Han	623060ba29	Latest TRL, GRPO + Bug fixes (#2645 ) * Update vision.py * Update vision.py * Update vision.py * Update vision.py * model_type_arch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * check * Update _utils.py * Update loader.py * Update loader.py * Remove prints * Update README.md typo * Update _utils.py * Update _utils.py * versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update vision.py * HF Transfer * fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError. 
* Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Revert * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py * Update README.md * add model registry * move hf hub utils to unsloth/utils * refactor global model info dicts to dataclasses * fix dataclass init * fix llama registration * remove deprecated key function * start registry reog * add llama vision * quant types -> Enum * remap literal quant types to QuantType Enum * add llama model registration * fix quant tag mapping * add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py 
* Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py --------- Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com> Co-authored-by: naliazheli <nalia0316@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-05-28 06:15:12 -07:00
Daniel Han	251066289d	Update rl.py	2025-05-28 06:04:23 -07:00
Daniel Han	12912022f4	Update pyproject.toml	2025-05-28 05:58:02 -07:00
Daniel Han	c94e31888c	logging	2025-05-28 05:56:19 -07:00
Daniel Han	725d84997b	Update rl.py	2025-05-28 05:26:01 -07:00
Daniel Han	1dabfcd2d3	Update rl.py	2025-05-28 05:23:07 -07:00
Daniel Han	86cd1d2786	Update rl.py	2025-05-28 05:09:03 -07:00
Daniel Han	e7f76d53bb	Update rl.py	2025-05-28 05:06:24 -07:00
Daniel Han	4fb4d3a36c	Update rl.py	2025-05-28 05:04:12 -07:00
Daniel Han	dc48358b2f	versioning	2025-05-28 05:02:05 -07:00
Daniel Han	89c30967df	Update rl.py	2025-05-28 04:57:30 -07:00
Daniel Han	026ba8e678	Create LICENSE	2025-05-28 03:27:48 -07:00
jeromeku	0b5ac8f2ab	Llama4 MoE Grouped GEMM (#2639) * add llama4 reference layer * add llama4 reference impl * formatting	2025-05-28 03:26:35 -07:00
Daniel Han	86ecf655a9	Merge branch 'main' into nightly	2025-05-28 03:24:15 -07:00
Premik	d8bf17959a	Check the `skip_prepare_dataset` before accessing dataset fields. #2496 (#2633)	2025-05-28 03:23:59 -07:00
Daniel Han	0ba75a4359	Merge branch 'main' into nightly	2025-05-28 02:06:42 -07:00
Michael Han	ce9e54755f	Update README.md Better Qwen3 notebook	2025-05-26 23:44:41 -07:00
Daniel Han	5327d1d36d	Flash Attention whls	2025-05-26 22:48:46 -07:00
Datta Nimmaturi	811422bbb4	Upgrade trl fix (#2544) * Update llama.py making set and reset functions in order to properly use autoSequenceClassification * Update fast_lora.py, added mixed precision pytorch autocasting * Update llama.py did not include rotary embeddings in the reset functions correctly * Update rl.py: correct get reward model added as well as the eval step stuff * Update rl.py removed function that did not need to be patched * Update llama.py: kept reset functions and made their names generic * Update fast_lora.py * Update rl.py, try except * Update fast_lora.py, removing downcasting stuff * Update llama.py removed deprecated LLamaLinearScalingRotaryEmbedding * Update rl.py for VLLM RLOO and PPO * Update rl.py reverted * Update rl.py with peft changes * Update rl.py, disabling adapters screws inference up * Update rl.py getting PPO support * Update rl.py cleanup * Update rl.py cleaned up not useful commented code * Update llama.py, enabled new flag, keep padding * Upgrade trl fix Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Update rl.py made changes relative to the review * Revert accidental patch block for non grpo Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Fixup sampling params issue * Fix rl.py regex Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * loss type: grpo, drgrpo and bnpo Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Add trl version check for vllm colocate mode for RL trainers * Update rl.py For TRL 0.18.0 (Main branch of TRL at the time because it's on 0.17.0), the SFT trainer for some reason deletes the labels column and unsloth internal loss functions need that column for the calculations so I add it back in like this. 
* Update llama.py, merge it to be dattas llama version * Update rl.py, sft changes to get 0.18.0 to be working * Update rl_replacements.py, added hidden state stuff * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py, rechanged the accumulated loss * Fixup num_iterations>1 for grpo Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> * Update rl_replacements.py * no unnecessary logits upcast. fix naming Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> * Update rl_replacements.py returned hidden states from logprobs * Update rl_replacements.py removed debug logic * Update rl_replacements.py, should be fine now * Update rl_replacements.py, should take new args for GRPO trainer * Update rl_replacements.py, made it compatible with trl 0.15.2 * Update rl_replacements.py, fixed typo in per-token logps --------- Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> Co-authored-by: pluesclues <136766175+pluesclues@users.noreply.github.com>	2025-05-26 17:20:57 -07:00
Daniel Han	a66f3f4cda	Colocate vLLM	2025-05-26 00:37:04 -07:00
Michael Han	1f4e74cb96	Update README.md	2025-05-25 03:35:43 -07:00
Quentin Gallouédec	ce5c2d2145	Remove dataset_text_field from SFTConfig (#2609)	2025-05-25 03:20:16 -07:00
Richi	f6c4be39b7	add: path checking for failed llama cpp builds (#2603)	2025-05-25 03:18:07 -07:00
Daniel Han	9b8e6b1e22	Merge branch 'main' into nightly	2025-05-21 23:21:12 -07:00
Daniel Han	dd43200718	Devstral, MedGemma	2025-05-21 07:35:36 -07:00
Michael Han	e771760e53	Update README.md Updating model support	2025-05-20 09:51:55 -07:00
Michael Han	61b68725e9	Update README.md	2025-05-19 21:26:19 -07:00
Daniel Han	b5269556c2	Update issue templates	2025-05-17 18:30:17 -07:00
Daniel Han	25c11cf2d8	Update issue templates	2025-05-17 18:29:27 -07:00
Daniel Han	7a0c8c7da1	Update issue templates	2025-05-17 05:42:10 -07:00
Daniel Han	4d5f2172f4	Fix Whisper, ModernBERT (#2565 ) * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091) * fix: config.torch_dtype in LlamaModel_fast_forward_inference * Update llama.py * update for consistency --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * model_type_arch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * check * Update _utils.py * Update loader.py * Update loader.py * Remove prints * Update README.md typo * Update _utils.py * Update _utils.py * versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update vision.py * HF Transfer * fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError. 
* Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Revert * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py * Update README.md * add model registry * move hf hub utils to unsloth/utils * refactor global model info dicts to dataclasses * fix dataclass init * fix llama registration * remove deprecated key function * start registry reog * add llama vision * quant types -> Enum * remap literal quant types to QuantType Enum * add llama model registration * fix quant tag mapping * add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py 
* Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py --------- Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com> Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com> Co-authored-by: naliazheli <nalia0316@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-05-17 05:11:50 -07:00
Daniel Han	416b89e901	Update _utils.py	2025-05-17 05:11:36 -07:00
Daniel Han	f0bf761832	Merge branch 'main' into nightly	2025-05-17 05:10:00 -07:00
Emmanuel Ferdman	2a1caa746b	Display the model name in RoPE scaling unsupported error (#2564) Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>	2025-05-17 05:09:06 -07:00
Daniel Han	382406a9ef	Update loader.py	2025-05-17 00:29:55 -07:00
Daniel Han	87b98cf970	Update loader.py	2025-05-17 00:29:04 -07:00
Daniel Han	811a86007d	Update loader.py	2025-05-17 00:24:27 -07:00
Daniel Han	611463b428	Update loader.py	2025-05-17 00:03:20 -07:00
Daniel Han	74d44dc611	Update vision.py	2025-05-16 23:31:15 -07:00
Daniel Han	c0ae3602e2	Update vision.py	2025-05-16 23:25:32 -07:00
Daniel Han	fe3f1c02f9	Update vision.py	2025-05-16 23:24:32 -07:00
Daniel Han	18489d1c99	Update vision.py	2025-05-16 23:20:57 -07:00
Daniel Han	b790b82e42	Update vision.py	2025-05-16 23:20:18 -07:00
Daniel Han	3c475c834d	Update vision.py	2025-05-16 23:03:04 -07:00
Daniel Han	9b571970f8	Update mapper.py	2025-05-16 23:02:06 -07:00
Daniel Han	0078d7bfb5	Merge branch 'main' into nightly	2025-05-16 23:02:00 -07:00
Michael Han	f48e240bad	Merge pull request #2563 from davedgd/main fix issue with qwen3 template double quote escapes	2025-05-16 22:38:39 -07:00
David Dobolyi	a063c4a41e	fix issue with qwen3 template double quote escapes	2025-05-16 23:26:03 -06:00
Etherll	99a2a36f73	Fix trust remote code (#2357) * Update _utils.py * Update loader.py * Update loader.py * Update vision.py * Update unsloth/models/vision.py * Update unsloth/models/vision.py * Update unsloth/models/vision.py * Update unsloth/models/vision.py * Update unsloth/models/_utils.py * Update unsloth/models/vision.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-05-16 16:06:42 -07:00
Daniel Han	2524de493e	Update pyproject.toml	2025-05-16 15:38:19 -07:00
Daniel Han	15b6ac613a	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-05-16 15:34:41 -07:00
Daniel Han	299c8a94a4	Update _utils.py	2025-05-16 15:33:49 -07:00
Michael Han	4937cd97f0	Merge pull request #2554 from Erland366/fix/generation_config Quick fix on the CompileConfig error	2025-05-16 12:48:00 -07:00
Erland366	3cdbd879f7	Fix NoneType on the compile_config	2025-05-16 13:16:34 +00:00
Michael Han	b22e654ef0	Update README.md	2025-05-16 01:56:40 -07:00
Michael Han	41e3701251	Update README.md TTS support	2025-05-15 15:15:53 -07:00
Daniel Han	dc6c4dc385	TTS (#2545 ) * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Remove double generate patch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091) * fix: config.torch_dtype in LlamaModel_fast_forward_inference * Update llama.py * update for consistency --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * model_type_arch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * check * Update _utils.py * Update loader.py * Update loader.py * Remove prints * Update README.md typo * Update _utils.py * Update _utils.py * versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update vision.py * HF Transfer * fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError. 
* Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Revert * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py * Update README.md * add model registry * move hf hub utils to unsloth/utils * refactor global model info dicts to dataclasses * fix dataclass init * fix llama registration * remove deprecated key function * start registry reog * add llama vision * quant types -> Enum * remap literal quant types to QuantType Enum * add llama model registration * fix quant tag mapping * add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py 
* Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: lurf21 
<93976703+lurf21@users.noreply.github.com> Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com> Co-authored-by: naliazheli <nalia0316@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-05-15 09:23:52 -07:00
Daniel Han	b9b7b2c054	Update loader.py	2025-05-15 09:23:11 -07:00
Daniel Han	2579cd982b	Update loader.py	2025-05-15 08:52:10 -07:00
Daniel Han	ed5712c397	Update loader.py	2025-05-15 08:39:37 -07:00
Daniel Han	b425d9da9e	Whisper	2025-05-15 08:27:17 -07:00
Daniel Han	a0dd7663c2	Sesame	2025-05-15 07:29:32 -07:00
Michael Han	b5ba71a3d3	Update README.md	2025-05-15 06:54:05 -07:00
Daniel Han	b77d59e97c	Auto vision detection	2025-05-15 06:39:03 -07:00
Daniel Han	d77a4a9d5c	Update vision.py	2025-05-15 04:52:10 -07:00
Daniel Han	2fe9067d6b	UNSLOTH_DISABLE_STATIC_GENERATION	2025-05-15 04:48:42 -07:00
Daniel Han	b02f6035ae	Update vision.py	2025-05-15 04:34:49 -07:00
Daniel Han	8865207ba4	Update vision.py	2025-05-15 04:26:05 -07:00
Janusz	b781a7ad38	Add use_rslora reference to LoraConfig initialisation (#2539) Co-authored-by: jkumz <janusz.kumor01@gmail.com>	2025-05-15 04:24:18 -07:00
omahs	28304e4101	Fix typos (#2540)	2025-05-15 04:23:27 -07:00
Daniel Han	dafdcbe88c	Update vision.py	2025-05-15 03:54:22 -07:00
Daniel Han	ba6dcd2498	Update loader.py	2025-05-15 03:46:48 -07:00
Daniel Han	30a1b25752	Update loader.py	2025-05-15 03:39:20 -07:00
Daniel Han	fda9007d77	Update loader.py	2025-05-15 03:18:16 -07:00
Daniel Han	7f22fdd9f9	Update loader.py	2025-05-15 03:00:25 -07:00
Daniel Han	5982e34484	is_multimodal	2025-05-15 02:32:29 -07:00
Daniel Han	b4b073a59d	Update loader.py	2025-05-15 02:30:12 -07:00
Daniel Han	f781c871c7	Update vision.py	2025-05-15 02:17:03 -07:00
Daniel Han	633221f8c9	Update vision.py	2025-05-15 02:07:29 -07:00
Daniel Han	151b1ae2e3	Update vision.py	2025-05-15 01:54:05 -07:00
Daniel Han	630f2b5f66	Update loader.py	2025-05-15 01:47:34 -07:00
Daniel Han	c4d8ed4939	Fix Sesame	2025-05-15 01:45:15 -07:00
Daniel Han	9dac587254	Sesame force float16 / float32	2025-05-15 01:27:14 -07:00
Daniel Han	189ed215e6	Update chat_templates.py	2025-05-15 01:13:20 -07:00
Daniel Han	992f9248bb	Merge branch 'main' into nightly	2025-05-15 01:07:32 -07:00
Michael Han	b64c84ef33	Merge pull request #2537 from kiankyars/main Add Qwen-3 chat template and Ollama template support	2025-05-14 20:45:53 -07:00
Daniel Han	c9b93f7b10	Merge branch 'main' into nightly	2025-05-14 20:30:41 -07:00
Kian Kyars	e147af330e	undo accident	2025-05-14 19:00:23 -06:00
Kian Kyars	059ccd8221	style: Place Qwen-3 template after Gemma-3, match style with other templates	2025-05-14 18:59:17 -06:00
Kian Kyars	48dc104728	Update Qwen-3 chat and Ollama templates to official full version, placed after Gemma-3	2025-05-14 18:42:55 -06:00
Kian Kyars	40ac241994	Add Qwen-3 chat template and Ollama template support	2025-05-14 18:35:02 -06:00
Daniel Han	e18a41d10f	Update pyproject.toml	2025-05-14 05:42:47 -07:00
Daniel Han	da41d4c21f	Update _utils.py	2025-05-14 05:42:11 -07:00
Daniel Han	0bbf131238	Update synthetic.py	2025-05-14 04:05:23 -07:00
Daniel Han	f971fce721	Update synthetic.py	2025-05-14 03:54:46 -07:00
Daniel Han	17d8517144	Update synthetic.py	2025-05-14 03:49:27 -07:00
Daniel Han	b47bbd3f55	Update synthetic.py	2025-05-14 03:47:29 -07:00
Daniel Han	e05735db0c	Update synthetic.py	2025-05-14 03:44:15 -07:00
Michael Han	074573a13b	Merge pull request #2527 from mmathew23/csm Add Sesame CSM	2025-05-14 02:26:12 -07:00
DoubleMathew	e317fc222d	Merge branch 'unslothai:main' into csm	2025-05-13 17:58:11 -05:00
Daniel Han	cdb8eaaf42	Versioning	2025-05-13 09:10:24 -07:00
Daniel Han	99a0627c64	Update synthetic.py	2025-05-13 08:26:17 -07:00
Daniel Han	76e632ea35	Update synthetic.py	2025-05-13 08:25:48 -07:00
Daniel Han	4f9dfadd91	Merge branch 'main' into nightly	2025-05-13 03:39:49 -07:00
Michael Han	f4cbf303fe	Update README.md	2025-05-13 01:39:59 -07:00
Daniel Han	65710647b5	Update loader_utils.py	2025-05-12 21:06:30 -07:00
Daniel Han	e99b66e711	Update pyproject.toml	2025-05-12 16:28:50 -07:00
feng lui	48cb9c724c	vLLM Windows CUDA support [tested] (#2158 ) * Update loader.py change vllm installed check by transformers utils function * Update llama.py change vllm installed check by transformers utils function * add sample notebook * fix Indentation * add global is_vLLM_available function * Pythonic style * Delete nb/Qwen2.5_(3B)-GRPO-windows.ipynb Would be great to move it to https://github.com/unslothai/notebooks - appreciate it! --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-05-12 05:33:42 -07:00
Daniel Han	67026e28ee	Update pyproject.toml	2025-05-12 04:24:18 -07:00
Daniel Han	cecdfc5a34	Fix Intel GPU	2025-05-12 03:10:21 -07:00
Lei Zhenyuan	fe6b83fd7e	[2/N] Enable intel GPU for unsloth (#2388 ) * add DEVICE_TYPE and resolve device specific API * reuse import torch * move env under device type * resolve comments * add more comments * add more comments	2025-05-12 02:58:21 -07:00
Lei Zhenyuan	5bf77aabf5	first pr for intel GPU, resolve __init__.py and pyproject.toml (#2350 ) add better comments	2025-05-12 02:38:40 -07:00
Daniel Han	c37380c63b	Fix GRPO eval	2025-05-12 02:35:31 -07:00
Michael Han	e3f6c5eff4	Merge pull request #2466 from mmathew23/fix_pop_token_type_ids the pixtral vision notebook fails during inference	2025-05-09 14:57:31 -07:00
Mathew Mathew	3a56b3a24a	turn off compilation and fast generation for csm	2025-05-09 16:50:16 -05:00
Michael Han	1a99b4dc94	Merge pull request #2492 from yuanzhedong/yz/dev/fix-readme Fix readme example	2025-05-07 23:29:59 -07:00
Yuanzhe Dong	75f3f8a7e5	Fix readme example	2025-05-06 19:26:35 -07:00
Michael Han	8821057420	Update README.md Adding extra synthetic data notebook, cleaning repo	2025-05-05 20:56:01 -07:00
Daniel Han	6c0b8a57e4	Update __init__.py	2025-05-04 18:06:39 -07:00
Michael Han	9e2ef7c50c	Uploading HQ Unsloth Sticker	2025-05-04 05:31:57 -07:00
Michael Han	c4d0fd42be	Updating HQ logos	2025-05-04 05:25:06 -07:00
Daniel Han	84779ee11b	Better vllm deletion	2025-05-04 05:03:58 -07:00
Daniel Han	f9bf537130	Update pyproject.toml	2025-05-04 04:29:10 -07:00
Daniel Han	bad8069807	Update _utils.py	2025-05-04 03:03:50 -07:00
Michael Han	bb802c8a4a	Update README.md	2025-05-02 23:14:34 -07:00
Daniel Han	077fe260e6	Update pyproject.toml	2025-05-02 22:52:58 -07:00
Daniel Han	e287e55906	Update pyproject.toml	2025-05-02 22:47:09 -07:00
Daniel Han	fcaf726dda	Update pyproject.toml	2025-05-02 22:40:12 -07:00
Roland Tannous	4190502a1a	Added missing code of conduct (#2416 ) * Added code of conduct * fixed CONTRIBUTING -> CODE_OF_CONDUCT url link	2025-05-02 21:08:27 -07:00
Johnny	fb3fb77d43	Update pyproject.toml (#2458 )	2025-05-02 21:07:59 -07:00
jeromeku	b7fc12c8be	MoE Kernel (#2465 ) * add moe grouped gemm kernel * add benchmark, README * remove formatting from __init__.py	2025-05-02 20:59:23 -07:00
Mathew Mathew	bbe5e2a221	the pixtral vision notebook fails during inference with unused kwargs token_type_ids. This fixes the error	2025-05-02 21:40:15 -05:00
Daniel Han	9edbe23259	Bug fix	2025-05-02 14:06:14 -07:00
Daniel Han	d957caeae7	Fix Qwen 3 mapping	2025-05-02 09:44:01 -07:00
Michael Han	8bfe5fd4ab	Update README.md	2025-05-02 09:06:57 -07:00
Daniel Han	2249a5ff88	Qwen3 bug fixes	2025-05-02 07:18:33 -07:00
Daniel Han	8fbd80231c	Update mapper.py	2025-05-02 06:26:43 -07:00
Daniel Han	d74e7c19b3	Versioning	2025-05-02 05:05:09 -07:00
Daniel Han	439cae9fae	Remove hf_xet warning	2025-05-02 04:23:53 -07:00
Daniel Han	f90224b02f	Update __init__.py	2025-05-02 03:43:44 -07:00
Daniel Han	2a231c7ba4	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-05-02 03:09:52 -07:00
Daniel Han	eb495f171f	Qwen 3	2025-05-02 03:09:44 -07:00
cblomert	629a3fcfe3	Added k_norm & q_norm to merged Qwen3 layers (#2452 )	2025-05-02 03:07:37 -07:00
Michael Han	97a63f809f	Update README.md Qwen3 notebook	2025-05-01 22:52:42 -07:00
Daniel Han	6e2b5a767f	Nightly (#2448 ) * move float32 * Ensure trust_remote_code propegates down to unsloth_compile_transformers (#2075) * Update _utils.py * Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080) When loading a PEFT model fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this: ``` RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ... ``` This PR just changes it so `autoconfig_error` and `peft_error` are both displayed. * fix error message (#2046) * Update vision.py * Update _utils.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Remove double generate patch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091) * fix: config.torch_dtype in LlamaModel_fast_forward_inference * Update llama.py * update for consistency --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * model_type_arch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * check * Update _utils.py * Update loader.py * Update loader.py * Remove prints * Update README.md typo * Update _utils.py * Update _utils.py * versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update vision.py * HF Transfer * fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError. 
* Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Revert * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py * Update README.md * add model registry * move hf hub utils to unsloth/utils * refactor global model info dicts to dataclasses * fix dataclass init * fix llama registration * remove deprecated key function * start registry reog * add llama vision * quant types -> Enum * remap literal quant types to QuantType Enum * add llama model registration * fix quant tag mapping * add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py 
* Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py --------- Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com> Co-authored-by: Isaac Breen <isaac.breen@icloud.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com> Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com> Co-authored-by: naliazheli <nalia0316@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-05-01 07:42:08 -07:00
Daniel Han	88988a8eae	Update synthetic.py	2025-05-01 07:41:45 -07:00
Daniel Han	ea30448540	Update synthetic.py	2025-05-01 07:26:51 -07:00
Daniel Han	8590f903e0	Update pyproject.toml	2025-05-01 07:18:29 -07:00
Daniel Han	b6efc4f985	Update _utils.py	2025-05-01 07:17:36 -07:00
Daniel Han	9b8556a100	Update synthetic.py	2025-05-01 06:56:02 -07:00
Daniel Han	8e17139f21	Update synthetic.py	2025-05-01 06:55:43 -07:00
Daniel Han	3ff18046d4	Update synthetic.py	2025-05-01 06:49:46 -07:00
Daniel Han	9aff268a0c	Update synthetic.py	2025-05-01 06:46:04 -07:00
Daniel Han	be85c14fec	Update synthetic.py	2025-05-01 06:42:49 -07:00
Daniel Han	24f6940bf2	Update synthetic.py	2025-05-01 06:38:16 -07:00
Daniel Han	c7385aa85d	Update synthetic.py	2025-05-01 06:38:08 -07:00
Daniel Han	94c90d35be	Update synthetic.py	2025-05-01 06:36:16 -07:00
Daniel Han	f2fb23e532	Update synthetic.py	2025-05-01 06:35:28 -07:00
Daniel Han	a4d8dc31c4	Update synthetic.py	2025-05-01 06:32:20 -07:00
Daniel Han	c7d953c452	Update synthetic.py	2025-05-01 06:26:09 -07:00
Daniel Han	09470acdb8	Update synthetic.py	2025-05-01 06:23:54 -07:00
Daniel Han	72ca86306a	Update synthetic.py	2025-05-01 06:20:47 -07:00
Daniel Han	8853a0fee4	Update synthetic.py	2025-05-01 06:20:02 -07:00
Daniel Han	d93fe5e656	Update synthetic.py	2025-05-01 06:19:25 -07:00
Daniel Han	706d14ea51	Update synthetic.py	2025-05-01 06:16:58 -07:00
Daniel Han	9b31af17b4	Update synthetic.py	2025-05-01 06:16:35 -07:00
Daniel Han	080fd1b4da	Merge branch 'main' into nightly	2025-05-01 05:54:31 -07:00
Daniel Han	382961c9d3	Update mapper.py	2025-05-01 03:07:56 -07:00
Daniel Han	9a930bb095	Qwen 3, Bug Fixes (#2445 ) * bug fix #2008 (#2039) * fix (#2051) * Update loader.py * Update pyproject.toml * Update pyproject.toml * Update vision.py * more prints * Update loader.py * LoRA 16bit fix * Update vision.py * Update vision.py * Update _utils.py * Update vision.py * move forced float32 * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * move print * Update _utils.py * disable bfloat16 * Fix forced float32 * move float32 * Ensure trust_remote_code propegates down to unsloth_compile_transformers (#2075) * Update _utils.py * Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080) When loading a PEFT model fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this: ``` RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ... ``` This PR just changes it so `autoconfig_error` and `peft_error` are both displayed. 
* fix error message (#2046) * Update vision.py * Update _utils.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Remove double generate patch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091) * fix: config.torch_dtype in LlamaModel_fast_forward_inference * Update llama.py * update for consistency --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * model_type_arch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * check * Update _utils.py * Update loader.py * Update loader.py * Remove prints * Update README.md typo * Update _utils.py * Update _utils.py * versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * 
Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update vision.py * HF Transfer * fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError. * Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Revert * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py * Update README.md * add model registry * move hf hub utils to unsloth/utils * refactor global model info dicts to dataclasses * fix dataclass init * fix llama registration * remove deprecated key function * start registry reog * add llama vision * quant types -> Enum * remap literal quant types to QuantType Enum * add llama model registration * fix quant tag mapping * add qwen2.5 models to registry * add option to include original model in registry * handle quant types per model size * separate registration of base and instruct llama3.2 * add QwenQVQ to registry * add gemma3 to registry * add phi * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration 
methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore --------- Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com> Co-authored-by: Isaac Breen <isaac.breen@icloud.com> Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com> Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com> Co-authored-by: naliazheli <nalia0316@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han 
<107991372+shimmyshimmer@users.noreply.github.com>	2025-04-30 22:38:39 -07:00
Daniel Han	962e788eb1	Delete .gitignore	2025-04-30 22:36:15 -07:00
Daniel Han	64cf5d8e64	Update pyproject.toml	2025-04-30 22:35:22 -07:00
Daniel Han	599a15ea8d	Update pyproject.toml	2025-04-30 22:34:07 -07:00
Daniel Han	86a4506309	Update synthetic.py	2025-04-30 12:58:58 -07:00
Daniel Han	676febe4f0	Update synthetic.py	2025-04-30 11:17:00 -07:00
Daniel Han	62d3a9aee2	Update synthetic.py	2025-04-30 10:49:13 -07:00
Daniel Han	a075028f00	Update synthetic.py	2025-04-30 10:22:31 -07:00
Daniel Han	e27063fd5c	Update synthetic.py	2025-04-30 10:22:01 -07:00
Daniel Han	71095ce435	Update synthetic.py	2025-04-30 10:08:50 -07:00
Daniel Han	f7bd9718bb	Update synthetic.py	2025-04-30 10:07:49 -07:00
Daniel Han	4c20d2a499	Update synthetic.py	2025-04-30 10:07:38 -07:00
Daniel Han	e0ba82eebb	Update synthetic.py	2025-04-30 09:59:54 -07:00
Daniel Han	40b623a61d	Update synthetic.py	2025-04-30 09:49:27 -07:00
Daniel Han	e08a4ceeb1	Update synthetic.py	2025-04-30 09:48:53 -07:00
Daniel Han	b60af64236	Update synthetic.py	2025-04-30 09:40:13 -07:00
Daniel Han	72b84b8e73	Update synthetic.py	2025-04-30 09:32:57 -07:00
Daniel Han	bc7ac80890	Update synthetic.py	2025-04-30 09:30:33 -07:00
Daniel Han	237075e18e	Update synthetic.py	2025-04-30 09:25:46 -07:00
Daniel Han	1236aa851f	Update synthetic.py	2025-04-30 09:25:27 -07:00
Daniel Han	b91e804a75	Update synthetic.py	2025-04-30 09:24:31 -07:00
Daniel Han	44cf9b6d37	Update synthetic.py	2025-04-30 09:22:40 -07:00
Daniel Han	6d3f8871d1	Update synthetic.py	2025-04-30 09:21:05 -07:00
Daniel Han	621a1316d6	Update synthetic.py	2025-04-30 09:19:23 -07:00
Daniel Han	c6d8158389	Update synthetic.py	2025-04-30 08:57:23 -07:00
Daniel Han	6d2bb0eda2	Update synthetic.py	2025-04-30 08:56:01 -07:00
Daniel Han	390d55ee8e	Update synthetic.py	2025-04-30 08:52:26 -07:00
Daniel Han	69323c5498	Update synthetic.py	2025-04-30 08:50:27 -07:00
Daniel Han	7e70c81342	Update synthetic.py	2025-04-30 08:49:04 -07:00
Daniel Han	45a173217c	Update synthetic.py	2025-04-30 08:47:23 -07:00
Daniel Han	ca3db2980d	Update synthetic.py	2025-04-30 08:45:04 -07:00
Daniel Han	6c122be95e	Update loader.py	2025-04-30 08:44:15 -07:00
Daniel Han	ec9892f636	Update synthetic.py	2025-04-30 08:39:49 -07:00
Daniel Han	ed9709bdcf	Xet and Synthetic	2025-04-30 08:37:56 -07:00
Daniel Han	fd07824f0f	Update mapper.py	2025-04-30 07:35:02 -07:00
Daniel Han	bf19381ea7	Merge branch 'main' into nightly	2025-04-30 07:31:43 -07:00
Daniel Han	43d483122f	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-04-30 07:31:34 -07:00
Daniel Han	46a0d8c7e5	Synthetic data	2025-04-30 07:31:27 -07:00
Michael Han	f4283a800e	Merge pull request #2439 from Etherll/patch-1 Update mapper.py to add Qwen3 base	2025-04-30 06:18:46 -07:00
Daniel Han	6bde1af86e	Update synthetic.py	2025-04-30 05:21:36 -07:00
Etherll	d4ef475c33	Update mapper.py	2025-04-30 14:39:23 +03:00
Daniel Han	3dee30e5cd	Update synthetic.py	2025-04-30 00:16:37 -07:00
Daniel Han	58111d52b0	Update synthetic.py	2025-04-30 00:10:39 -07:00
Daniel Han	8ad73ecd51	Update synthetic.py	2025-04-30 00:09:00 -07:00
Daniel Han	771f5502c0	Update synthetic.py	2025-04-30 00:07:45 -07:00
Daniel Han	d5759b7e51	Update synthetic.py	2025-04-30 00:05:28 -07:00
Daniel Han	b0c256571e	Update synthetic.py	2025-04-30 00:03:54 -07:00
Daniel Han	85944576e3	Update synthetic.py	2025-04-30 00:02:52 -07:00
Daniel Han	71daff0aff	Update synthetic.py	2025-04-30 00:00:09 -07:00
Daniel Han	7930d38c72	Update synthetic.py	2025-04-29 23:57:46 -07:00
Daniel Han	163f7a1c6d	Update synthetic.py	2025-04-29 23:53:13 -07:00
Daniel Han	f331d5e537	Merge branch 'main' into nightly	2025-04-29 23:48:43 -07:00
Daniel Han	a7eb02790e	Update synthetic.py	2025-04-29 23:48:33 -07:00
Daniel Han	f03961da09	Update synthetic.py	2025-04-29 23:47:09 -07:00
Michael Han	af337af35c	Merge pull request #2436 from Datta0/qwen3_support Qwen3 inference fixes	2025-04-29 20:18:03 -07:00
Dattu Sharma	5640401435	Qwen3 inference fixes	2025-04-30 03:03:38 +00:00
Daniel Han	ecfd56aabe	Update _utils.py	2025-04-29 11:43:24 -07:00
Daniel Han	7b97cf2304	Update synthetic.py	2025-04-29 11:43:13 -07:00
Daniel Han	7e6dbd9ccd	Create __init__.py	2025-04-29 11:18:03 -07:00
Daniel Han	8babbeaded	Create synthetic.py	2025-04-29 11:17:10 -07:00
Daniel Han	9b5446d5a2	Versioning	2025-04-29 09:50:51 -07:00
Daniel Han	d3f419f6ac	Update mapper.py	2025-04-29 00:50:46 -07:00
Michael Han	9945bcb629	Merge pull request #2427 from Datta0/qwen3_support Fixup qwen3 qk norm	2025-04-28 23:45:21 -07:00
Dattu Sharma	8cb2400e45	fixup qwen3 qk norm Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>	2025-04-29 06:22:07 +00:00
Michael Han	f637694085	Merge pull request #2423 from Datta0/qwen3_support Fixup qwen3	2025-04-28 19:32:45 -07:00
Dattu Sharma	26cf059574	fixup qwen3 Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>	2025-04-29 02:31:55 +00:00
Michael Han	53e6fba362	Update README.md	2025-04-28 19:08:12 -07:00
Michael Han	e4ab75296d	Merge pull request #2211 from Datta0/qwen3_support [WIP] Initial support for Qwen3. Will update when the model is released	2025-04-28 19:05:40 -07:00
Daniel Han	9d9447b7e1	Merge branch 'main' into nightly	2025-04-26 03:56:23 -07:00
Daniel Han	acea2c09bb	Update pyproject.toml	2025-04-26 03:45:29 -07:00
Daniel Han	de64444965	Versioning	2025-04-26 03:44:21 -07:00
Daniel Han	6366a4764b	Merge branch 'main' into nightly	2025-04-19 19:46:17 -07:00
Michael Han	31a37cb2ce	Merge pull request #2381 from Erland366/fix/saving_vlm_4bit Fix saving 4bit for VLM	2025-04-19 14:26:11 -07:00
Erland366	ed16a50bf9	feat: Add validation for 4bit save method and implement corresponding error handling	2025-04-19 20:36:30 +00:00
Michael Han	f1128eea1b	Merge pull request #2375 from unslothai/revert-2358-patch-1 Revert "fix: improved error handling when llama.cpp build fails"	2025-04-17 20:33:40 -07:00
Michael Han	6b261264a2	Revert "fix: improved error handling when llama.cpp build fails"	2025-04-17 20:33:25 -07:00
Michael Han	7a5b81db74	Merge pull request #2358 from Hansehart/patch-1 fix: improved error handling when llama.cpp build fails	2025-04-17 14:03:45 -07:00
Richi	4b3ae022a9	fix: improved error handling when llama.cpp build fails	2025-04-16 09:17:25 +02:00
Etherll	9fc1e3b945	feat: Support custom `auto_model` for wider model compatibility (Whisper, Bert,etc) & `attn_implementation` support (#2263 ) * Update loader.py * Update vision.py * Update vision.py fix attn_implementation * Refactor: Improve parameter handling and checks in loader/vision	2025-04-14 14:10:05 -07:00
Daniel Han	0e17bfb282	Merge branch 'main' into nightly	2025-04-09 23:39:14 -07:00
Michael Han	a41cda7dfb	Update question.md	2025-04-09 15:31:56 -07:00
Michael Han	76f24eb8a2	Update feature_request.md	2025-04-09 15:30:56 -07:00
Michael Han	ea8b427fc2	Update documentation.md	2025-04-09 15:29:20 -07:00
Michael Han	3b10fbde0a	Update bug_report.md	2025-04-09 15:24:35 -07:00
Michael Han	92c87a4e97	Update question.md	2025-04-09 15:20:44 -07:00
Michael Han	9c0663af1a	Update feature_request.md	2025-04-09 15:20:06 -07:00
Michael Han	61386a6a1a	Update documentation.md	2025-04-09 15:19:40 -07:00
Michael Han	0b5b98d688	Merge pull request #2323 from unslothai/shimmyshimmer-patch-2 Update bug_report.md	2025-04-09 15:19:21 -07:00
Michael Han	f96ff98cc5	Update bug_report.md	2025-04-09 15:17:33 -07:00
Daniel Han	a40b3c5578	Llama4	2025-04-06 01:43:42 -07:00
Michael Han	29b25e36eb	Update README.md	2025-04-05 14:56:01 -07:00
Daniel Han	ea14a66e21	Update _auto_install.py	2025-04-05 14:30:10 -07:00
Michael Han	56f7c1edac	Merge pull request #2119 from jackswl/patch-1 Update README.md	2025-04-02 21:15:45 -07:00
Michael Han	ad9a0e7672	Merge pull request #2267 from Kimizhao/main Update README.md	2025-04-02 02:07:35 -07:00
zhaozh	c107f46b5e	Update README.md Gemma3 HF uploaded GGUFs, 4-bit models link.	2025-04-02 16:10:21 +08:00
datta0	00589310df	add comments and use modified function	2025-04-02 06:49:06 +00:00
Michael Han	fd20192aef	Merge pull request #2255 from jeromeku/registry-refactor Registry refactor	2025-04-01 23:16:27 -07:00
Daniel Han	5ae9c4359b	Merge branch 'main' into nightly	2025-04-01 14:15:13 -07:00
jeromeku	9a14edcd2f	more registry readme updates	2025-03-31 18:34:18 -07:00
jeromeku	b33970525c	clear registry when executing individual model registration file	2025-03-31 18:24:15 -07:00
jeromeku	8d393f29c1	make llama registration more specific	2025-03-31 18:12:52 -07:00
jeromeku	e2cfec6339	add registry readme	2025-03-31 18:11:58 -07:00
jeromeku	ecf70d6caa	add quant type test	2025-03-31 17:58:44 -07:00
jeromeku	bb66d454e2	remove deprecated registration api	2025-03-31 17:36:42 -07:00
jeromeku	959727a2d2	add model search method	2025-03-31 17:36:19 -07:00
jeromeku	d93120db9d	refactor model registration tests for new registry apis	2025-03-31 17:22:26 -07:00
jeromeku	2ff490e23b	add global register models	2025-03-31 17:11:35 -07:00
jeromeku	65ea6356e4	refactor naming for mistral and phi	2025-03-31 17:08:11 -07:00
jeromeku	9a276978d2	rename deepseek registration methods	2025-03-31 17:03:05 -07:00
jeromeku	16f644e95d	rename model registration methods	2025-03-31 17:01:51 -07:00
jeromeku	5c402c9e82	add mistral small to registry	2025-03-31 15:31:01 -07:00
jeromeku	025e22b666	remove redundant code when constructing model names	2025-03-31 15:06:08 -07:00
Michael Han	a8517a3009	Merge pull request #2250 from unslothai/jeromeku-patch-1 Fix feature_request ISSUE_TEMPLATE	2025-03-31 12:51:21 -07:00
jeromeku	79299596cd	Fix feature_request ISSUE_TEMPLATE	2025-03-31 12:28:44 -07:00
jeromeku	7157f3c47c	add deepseek distill models	2025-03-31 12:04:57 -07:00
jeromeku	7dec39e3b6	add deepseek distill llama	2025-03-31 11:47:51 -07:00
jeromeku	767044e7f2	add deepseek r1 zero	2025-03-31 11:32:21 -07:00
jeromeku	9f8f78c90b	add deepseek r1 base	2025-03-31 11:30:47 -07:00
jeromeku	e2ff538fc5	add deepseek v3	2025-03-31 11:23:22 -07:00
jeromeku	a46811c471	add phi	2025-03-31 10:22:50 -07:00
jeromeku	756af9f35f	add gemma3 to registry	2025-03-31 10:10:20 -07:00
jeromeku	2222e5ad58	add QwenQVQ to registry	2025-03-31 09:45:15 -07:00
jeromeku	76a2b62766	separate registration of base and instruct llama3.2	2025-03-31 09:35:11 -07:00
jeromeku	0f0aa0c476	handle quant types per model size	2025-03-31 09:27:43 -07:00
jeromeku	671dd3dc14	add option to include original model in registry	2025-03-31 09:09:34 -07:00
jeromeku	0395604928	add qwen2.5 models to registry	2025-03-31 08:45:53 -07:00
Michael Han	79c35a51ba	Merge pull request #2242 from jeromeku/issues-templates Issues templates	2025-03-30 22:04:47 -07:00
jeromeku	95da062046	fix quant tag mapping	2025-03-30 16:14:33 -07:00
jeromeku	7f0059c881	make wording more user-friendly	2025-03-30 15:52:31 -07:00
jeromeku	ee953ef710	improve wording	2025-03-30 15:47:33 -07:00
jeromeku	39926f1730	more edits	2025-03-30 15:46:25 -07:00
jeromeku	bbc5981054	fix typos, better wording	2025-03-30 15:39:20 -07:00
jeromeku	7b113bbe99	make templates more concise	2025-03-30 15:36:45 -07:00
jeromeku	e204fdbd0e	more template edits	2025-03-30 15:34:01 -07:00
jeromeku	e00e3dd3a2	generalize documentation template	2025-03-30 15:21:47 -07:00
jeromeku	c3d517406d	fix question template	2025-03-30 15:19:58 -07:00
jeromeku	7d93ca559f	clean up bug template	2025-03-30 15:19:00 -07:00
jeromeku	d620b54909	fix template labels	2025-03-30 15:11:00 -07:00
jeromeku	c2ea782cd0	add question template	2025-03-30 15:08:38 -07:00
jeromeku	aeffab10f8	Update custom.md	2025-03-30 15:06:42 -07:00
jeromeku	8a587f232e	Update issue templates	2025-03-30 15:06:42 -07:00
jeromeku	dcbc2fa776	Update issue templates	2025-03-30 15:06:41 -07:00
jeromeku	dac21f8bdf	Update and rename custom.md to documentation_request.md	2025-03-30 15:06:41 -07:00
jeromeku	4e62b2180e	Update issue templates	2025-03-30 15:06:41 -07:00
jeromeku	0130265ca8	add llama model registration	2025-03-30 15:05:33 -07:00
jeromeku	6abdb1fef6	remap literal quant types to QuantType Enum	2025-03-30 14:39:57 -07:00
jeromeku	6b4bf12873	quant types -> Enum	2025-03-30 14:37:30 -07:00
jeromeku	3d1249a551	add llama vision	2025-03-30 11:44:52 -07:00
jeromeku	ab7c51b4a5	start registry reorg	2025-03-30 11:36:48 -07:00
jeromeku	35e3b48c2b	remove deprecated key function	2025-03-30 11:06:59 -07:00
jeromeku	85209602f3	fix llama registration	2025-03-30 11:06:11 -07:00
jeromeku	410c4b4c76	fix dataclass init	2025-03-30 10:58:51 -07:00
jeromeku	4b3df3d214	refactor global model info dicts to dataclasses	2025-03-30 10:43:00 -07:00
jeromeku	f21d61ae5d	move hf hub utils to unsloth/utils	2025-03-28 16:54:38 -07:00
jeromeku	6f6a7a5e9b	add model registry	2025-03-28 16:49:12 -07:00
datta0	406eb8cc71	Enable qwen3 and qwen3moe	2025-03-28 05:58:01 +00:00
datta0	71424b35ad	Add Qwen3Moe and necessitate transformers version	2025-03-28 05:50:43 +00:00
datta0	73c42e5ded	Initial support for Qwen3. Will update when the model is released	2025-03-27 14:35:57 +00:00
Michael Han	0b8e01ddb9	Update README.md	2025-03-27 00:26:18 -07:00
Jack Shi Wei Lun	949ac8eb6f	Update README.md	2025-03-26 21:20:16 +08:00
Daniel Han	7afd6afe2c	Merge branch 'main' into nightly	2025-03-26 05:21:24 -07:00
Daniel Han	2dc3930a53	Bug Fixes (#2197 ) * Update loader.py * model names * Gemma 3 chat template * Bug fixes * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update rl.py * Update chat_templates.py * Update chat_templates.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Revert * Update _utils.py * forced precision * Autocast * Update vision.py * Update vision.py * Update rl.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl.py * vLLM fixes * constexpr * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update save.py * New models * Triton windows update (#1976) * Update pyproject.toml * Update README.md * Update RMS LayerNorm implementation, and list compr. 
change in chat templates (#1974) * Update RMS LayerNorm implementation with optimizations and testing suite * perf: optimize list comprehension in get_ollama_eos_tokens * Update Zoo * Update llama.py * Update llama.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * grpo fix * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update mapper.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update rl.py * Update _utils.py * Version * Update pyproject.toml * Update llama.py * Update llama.py * bug fix #2008 (#2039) * fix (#2051) * Update loader.py * Update pyproject.toml * Update pyproject.toml * Update vision.py * more prints * Update loader.py * LoRA 16bit fix * Update vision.py * Update vision.py * Update _utils.py * Update vision.py * move forced float32 * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * move print * Update _utils.py * disable bfloat16 * Fix forced float32 * move float32 * Ensure trust_remote_code propegates down to unsloth_compile_transformers (#2075) * Update _utils.py * Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080) When loading a PEFT model fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this: ``` RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ... ``` This PR just changes it so `autoconfig_error` and `peft_error` are both displayed. 
* fix error message (#2046) * Update vision.py * Update _utils.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Remove double generate patch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091) * fix: config.torch_dtype in LlamaModel_fast_forward_inference * Update llama.py * update for consistency --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * model_type_arch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * check * Update _utils.py * Update loader.py * Update loader.py * Remove prints * Update _utils.py * Update _utils.py * versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update 
llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update vision.py * HF Transfer * fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError. * Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Revert * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Bug fix * Update mapper.py * check SDPA for Mistral 3, Pixtral * Update vision.py * Versioning * Update rl_replacements.py --------- Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com> Co-authored-by: Isaac Breen <isaac.breen@icloud.com> Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com> Co-authored-by: naliazheli <nalia0316@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com>	2025-03-26 05:19:48 -07:00
Daniel Han	9bbba3a511	Update rl_replacements.py	2025-03-26 05:18:18 -07:00
Daniel Han	8e6dfed0ec	Versioning	2025-03-26 04:29:44 -07:00
Daniel Han	c6588c73b9	Update vision.py	2025-03-26 04:13:43 -07:00
Daniel Han	599bfeb38c	check SDPA for Mistral 3, Pixtral	2025-03-26 04:11:27 -07:00
Daniel Han	98d946acad	Update mapper.py	2025-03-26 03:54:57 -07:00
Daniel Han	f197aa2b0a	Bug fix	2025-03-26 03:50:46 -07:00
Daniel Han	adf8fb9b80	Update vision.py	2025-03-26 03:13:48 -07:00
Daniel Han	b9914974cd	Update vision.py	2025-03-25 23:46:35 -07:00
Daniel Han	fc9f708e8d	Update vision.py	2025-03-25 23:31:33 -07:00
Daniel Han	59d04bb523	Update vision.py	2025-03-25 23:20:15 -07:00
Daniel Han	416cfd5cab	Update vision.py	2025-03-25 23:20:00 -07:00
Daniel Han	5ade8353d2	Revert	2025-03-25 23:17:19 -07:00
Daniel Han	1a67227923	Update loader.py	2025-03-25 23:15:14 -07:00
Daniel Han	8080c4edb0	Update loader.py	2025-03-25 23:13:26 -07:00
Daniel Han	ebccb588b7	Update vision.py	2025-03-25 23:07:35 -07:00
Daniel Han	8e6d90bec7	Update vision.py	2025-03-25 23:04:11 -07:00
Daniel Han	81984ed2a2	Update vision.py	2025-03-25 22:50:08 -07:00
Daniel Han	92c612b100	Update vision.py	2025-03-25 22:26:24 -07:00
Daniel Han	d2b5c807cc	Merge branch 'main' into nightly	2025-03-21 18:02:09 -07:00
Daniel Han	b126e8947d	Update pyproject.toml	2025-03-21 18:02:05 -07:00
Daniel Han	6a50448564	Merge branch 'main' into nightly	2025-03-21 17:55:39 -07:00
Daniel Han	c466303956	Fix Transformers 4.45 (#2151 ) * Update pyproject.toml * Update _utils.py * Update _utils.py * Update _utils.py * Batch samples * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * Temporary patches * Update loader.py * model names * Gemma 3 chat template * Bug fixes * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update rl.py * Update chat_templates.py * Update chat_templates.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Revert * Update _utils.py * forced precision * Autocast * Update vision.py * Update vision.py * Update rl.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl.py * vLLM fixes * constexpr * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update save.py * New models * Triton windows update (#1976) * Update pyproject.toml * Update README.md * Update RMS LayerNorm implementation, and list compr. 
change in chat templates (#1974) * Update RMS LayerNorm implementation with optimizations and testing suite * perf: optimize list comprehension in get_ollama_eos_tokens * Update Zoo * Update llama.py * Update llama.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * grpo fix * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update mapper.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update rl.py * Update _utils.py * Version * Update pyproject.toml * Update llama.py * Update llama.py * bug fix #2008 (#2039) * fix (#2051) * Update loader.py * Update pyproject.toml * Update pyproject.toml * Update vision.py * more prints * Update loader.py * LoRA 16bit fix * Update vision.py * Update vision.py * Update _utils.py * Update vision.py * move forced float32 * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * move print * Update _utils.py * disable bfloat16 * Fix forced float32 * move float32 * Ensure trust_remote_code propegates down to unsloth_compile_transformers (#2075) * Update _utils.py * Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080) When loading a PEFT model fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this: ``` RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ... ``` This PR just changes it so `autoconfig_error` and `peft_error` are both displayed. 
* fix error message (#2046) * Update vision.py * Update _utils.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Remove double generate patch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091) * fix: config.torch_dtype in LlamaModel_fast_forward_inference * Update llama.py * update for consistency --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * model_type_arch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * check * Update _utils.py * Update loader.py * Update loader.py * Remove prints * Update _utils.py * Update _utils.py * versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update 
llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update vision.py * HF Transfer * fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError. * Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license * Update pyproject.toml --------- Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com> Co-authored-by: Isaac Breen <isaac.breen@icloud.com> Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com> Co-authored-by: naliazheli <nalia0316@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com>	2025-03-21 17:55:12 -07:00
Daniel Han	65aca02a25	Update pyproject.toml	2025-03-21 17:54:07 -07:00
jeromeku	4a18c881c5	Add QLoRA Train and Merge16bit Test (#2130) * add reference and unsloth lora merging tests * add test / dataset printing to test scripts * allow running tests from repo root * add qlora test readme * more readme edits * ruff formatting * additional readme comments * forgot to add actual tests * add apache license	2025-03-21 17:53:37 -07:00
naliazheli	472dd5462f	fix(utils): add missing importlib import to fix NameError (#2134) This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled. By adding the missing import statement, the code will no longer throw a NameError.	2025-03-21 17:44:25 -07:00
Daniel Han	db7c23ec5a	HF Transfer	2025-03-21 17:41:40 -07:00
Daniel Han	7bf58c2465	Update vision.py	2025-03-21 17:39:51 -07:00
Daniel Han	90a5593bf0	Update llama.py	2025-03-21 17:38:39 -07:00
Daniel Han	8e9446e848	Update llama.py	2025-03-21 17:34:27 -07:00
Daniel Han	7d04000128	Update llama.py	2025-03-21 17:26:20 -07:00
Daniel Han	75e0dee7fc	Update llama.py	2025-03-21 17:24:41 -07:00
Daniel Han	f8b1b21d43	Update llama.py	2025-03-21 17:14:52 -07:00
Daniel Han	dbb7bd3a7a	Update llama.py	2025-03-21 17:13:14 -07:00
Daniel Han	4ce55e6201	Update llama.py	2025-03-21 17:08:00 -07:00
Daniel Han	eaded6d504	Update llama.py	2025-03-21 17:06:06 -07:00
Daniel Han	f900297fca	Update llama.py	2025-03-21 17:05:43 -07:00
Daniel Han	aca67486c3	Update llama.py	2025-03-21 17:01:11 -07:00
Daniel Han	df44d6a4fb	Update llama.py	2025-03-21 17:00:35 -07:00
Daniel Han	cc3c81e2ed	Update llama.py	2025-03-21 16:58:00 -07:00
Daniel Han	d360225a84	Update llama.py	2025-03-21 16:53:44 -07:00
Daniel Han	8014699bec	Update llama.py	2025-03-21 16:50:29 -07:00
Daniel Han	6efb8b1143	Update llama.py	2025-03-21 16:48:43 -07:00
Daniel Han	ce20a1f250	Update llama.py	2025-03-21 16:44:10 -07:00
Daniel Han	e666023a5e	Update llama.py	2025-03-21 16:42:32 -07:00
Daniel Han	8c5be02e29	Update llama.py	2025-03-21 15:50:32 -07:00
Daniel Han	db4b5c9715	Update llama.py	2025-03-21 15:50:05 -07:00
Daniel Han	0b847e7c91	Update llama.py	2025-03-21 15:46:47 -07:00
Daniel Han	826e7775d8	Update llama.py	2025-03-21 15:44:38 -07:00
Daniel Han	678bdda2b8	Update llama.py	2025-03-21 15:43:21 -07:00
Daniel Han	254988c404	Update llama.py	2025-03-21 15:39:43 -07:00
Daniel Han	f79b6b4cbe	Update llama.py	2025-03-21 15:31:00 -07:00
Daniel Han	f12572ee0d	Update llama.py	2025-03-21 15:27:47 -07:00
Daniel Han	e783ba5795	Update _utils.py	2025-03-21 15:25:06 -07:00
Daniel Han	2f6d65c934	Update _utils.py	2025-03-21 15:24:17 -07:00
Daniel Han	4057bfe021	Update _utils.py	2025-03-21 15:20:48 -07:00
Daniel Han	82c5e7a45c	versioning	2025-03-21 15:18:11 -07:00
Daniel Han	d7dfe1e9b0	Update _utils.py	2025-03-21 15:17:40 -07:00
Daniel Han	544824eb32	Update _utils.py	2025-03-21 15:17:30 -07:00
Daniel Han	b69472a198	Merge branch 'main' into nightly	2025-03-21 15:17:16 -07:00
Jack Shi Wei Lun	f616b65a17	Update README.md typo	2025-03-20 13:13:58 +08:00
Daniel Han	eaf27d5b43	Small fix (#2114 ) * versioning * Update _utils.py * Update llama.py * Update llama.py * Bug fixes * FastModel * __doc__ * Update vision.py * Update loader.py * Update loader.py * Update loader.py * version * move use_modelscope to _utils (#1938) * move use_modelscope to _utils * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Don't use revision when loading model_config and is_peft=True (#1949) * More syntax warnings (#1944) * move use_modelscope to _utils * fix * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Full finetuning and other fixes * UNSLOTH_ENABLE_FULL_FINETUNING * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * full finetuning * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * max_seq_length * Update rl.py * Update rl.py * Update rl.py * Update pyproject.toml * AutoModelForImageTextToText * Update mapper.py * Update pyproject.toml * Update _utils.py * Update _utils.py * Update _utils.py * Batch samples * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * Temporary patches * Update loader.py * model names * Gemma 3 chat template * Bug fixes * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update rl.py * Update chat_templates.py * Update chat_templates.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Revert * Update _utils.py * forced precision * Autocast * Update vision.py * Update vision.py * Update rl.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update rl.py * vLLM fixes * constexpr * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update save.py * New models * Triton windows update (#1976) * Update pyproject.toml * Update README.md * Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974) * Update RMS LayerNorm implementation with optimizations and testing suite * perf: optimize list comprehension in get_ollama_eos_tokens * Update Zoo * Update llama.py * Update llama.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * grpo fix * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update mapper.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update rl.py * Update _utils.py * Version * Update pyproject.toml * Update llama.py * Update llama.py * bug fix #2008 (#2039) * fix (#2051) * Update loader.py * Update pyproject.toml * Update pyproject.toml * Update vision.py * more prints * Update loader.py * LoRA 16bit fix * Update vision.py * Update vision.py * Update _utils.py * Update vision.py * move forced float32 * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * move print * Update _utils.py * disable bfloat16 * Fix forced float32 * move float32 * Ensure trust_remote_code propegates down to unsloth_compile_transformers (#2075) * Update _utils.py * Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080) When loading a PEFT model 
fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this: ``` RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ... ``` This PR just changes it so `autoconfig_error` and `peft_error` are both displayed. * fix error message (#2046) * Update vision.py * Update _utils.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Remove double generate patch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091) * fix: config.torch_dtype in LlamaModel_fast_forward_inference * Update llama.py * update for consistency --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * model_type_arch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * check * Update _utils.py * Update loader.py * Update loader.py * Remove prints 
--------- Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com> Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com> Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com> Co-authored-by: Isaac Breen <isaac.breen@icloud.com> Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com>	2025-03-19 08:45:52 -07:00
Daniel Han	194508d561	Remove prints	2025-03-19 08:44:22 -07:00
Daniel Han	063cca03c8	Update loader.py	2025-03-19 08:41:40 -07:00
Daniel Han	305c362ba8	Update loader.py	2025-03-19 08:37:53 -07:00
Daniel Han	3424ad1599	Update _utils.py	2025-03-19 08:31:00 -07:00
Daniel Han	1ca384f8ab	check	2025-03-19 08:27:36 -07:00
Daniel Han	2e7da38488	Update loader.py	2025-03-19 08:24:31 -07:00
Daniel Han	40d1f36e5c	Merge branch 'main' into nightly	2025-03-19 08:24:29 -07:00
Daniel Han	1c5676a83f	Bug fixes (#2113 ) * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Update _utils.py * Version * versioning * Update _utils.py * Update llama.py * Update llama.py * Bug fixes * FastModel * __doc__ * Update vision.py * Update loader.py * Update loader.py * Update loader.py * version * move use_modelscope to _utils (#1938) * move use_modelscope to _utils * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Don't use revision when loading model_config and is_peft=True (#1949) * More syntax warnings (#1944) * move use_modelscope to _utils * fix * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Full finetuning and other fixes * UNSLOTH_ENABLE_FULL_FINETUNING * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * full finetuning * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * max_seq_length * Update rl.py * Update rl.py * Update rl.py * Update pyproject.toml * AutoModelForImageTextToText * Update mapper.py * Update pyproject.toml * Update _utils.py * Update _utils.py * Update _utils.py * Batch samples * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * Temporary patches * Update loader.py * model names * Gemma 3 chat template * Bug fixes * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update rl.py * Update chat_templates.py * Update chat_templates.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Revert * Update _utils.py * forced precision * Autocast * Update vision.py * Update vision.py * 
Update rl.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl.py * vLLM fixes * constexpr * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update save.py * New models * Triton windows update (#1976) * Update pyproject.toml * Update README.md * Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974) * Update RMS LayerNorm implementation with optimizations and testing suite * perf: optimize list comprehension in get_ollama_eos_tokens * Update Zoo * Update llama.py * Update llama.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * grpo fix * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update mapper.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update rl.py * Update _utils.py * Version * Update pyproject.toml * Update llama.py * Update llama.py * bug fix #2008 (#2039) * fix (#2051) * Update loader.py * Update pyproject.toml * Update pyproject.toml * Update vision.py * more prints * Update loader.py * LoRA 16bit fix * Update vision.py * Update vision.py * Update _utils.py * Update vision.py * move forced float32 * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * move print * Update _utils.py * disable bfloat16 * Fix forced float32 * move float32 * Ensure trust_remote_code propegates down to unsloth_compile_transformers (#2075) * Update _utils.py * Show both 
`peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080) When loading a PEFT model fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this: ``` RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ... ``` This PR just changes it so `autoconfig_error` and `peft_error` are both displayed. * fix error message (#2046) * Update vision.py * Update _utils.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Remove double generate patch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091) * fix: config.torch_dtype in LlamaModel_fast_forward_inference * Update llama.py * update for consistency --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * model_type_arch * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py --------- 
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com> Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com> Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com> Co-authored-by: Isaac Breen <isaac.breen@icloud.com> Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com>	2025-03-19 07:12:08 -07:00
Daniel Han	4c89bec1bd	Merge branch 'main' into nightly	2025-03-19 05:37:36 -07:00
Daniel Han	c43a785e7c	Update vision.py	2025-03-19 04:34:39 -07:00
Daniel Han	db49a4de37	Update vision.py	2025-03-19 04:30:06 -07:00
Michael Han	2d0885f33f	Merge pull request #2110 from unslothai/shimmyshimmer-patch-1 Updating new FFT 8bit support	2025-03-19 04:24:32 -07:00
Michael Han	d8fc81f47b	Update README.md	2025-03-19 04:23:52 -07:00
Michael Han	2f0de2be1f	Update README.md	2025-03-19 04:21:39 -07:00
Daniel Han	3c1b5a09b6	Update vision.py	2025-03-19 03:45:28 -07:00
Daniel Han	bd9c7d353e	Update vision.py	2025-03-19 03:27:49 -07:00
Daniel Han	d8bf75417e	Update vision.py	2025-03-19 03:20:55 -07:00
Daniel Han	84ae94b124	Update vision.py	2025-03-19 03:08:17 -07:00
Daniel Han	d2ab2860c8	model_type_arch	2025-03-19 03:03:43 -07:00
Daniel Han	f5569ca2b4	Update vision.py	2025-03-19 02:59:40 -07:00
Daniel Han	f2dffa4537	Update vision.py	2025-03-19 02:52:25 -07:00
Daniel Han	a706ec3d17	Update vision.py	2025-03-19 02:50:24 -07:00
Daniel Han	ab1441ccfd	Update vision.py	2025-03-19 02:50:08 -07:00
Daniel Han	a6ae11a426	Update vision.py	2025-03-19 02:47:29 -07:00
Daniel Han	52c62b9420	Update vision.py	2025-03-19 02:41:43 -07:00
Daniel Han	69ab65dfc7	Update vision.py	2025-03-19 02:39:50 -07:00
Daniel Han	67b48d9db4	Update vision.py	2025-03-19 02:29:32 -07:00
Daniel Han	a4d9b192b5	Update vision.py	2025-03-19 02:28:09 -07:00
Daniel Han	17a6bc1bd4	Update vision.py	2025-03-19 02:17:17 -07:00
Daniel Han	c417b1f67e	Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly	2025-03-19 02:17:05 -07:00
Daniel Han	333aff6e84	versioning	2025-03-19 02:14:54 -07:00
lurf21	00a98f17f5	fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091) * fix: config.torch_dtype in LlamaModel_fast_forward_inference * Update llama.py * update for consistency --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-03-19 02:06:48 -07:00
Daniel Han	10d27a5179	Update vision.py	2025-03-19 02:04:12 -07:00
Daniel Han	09b1d254f8	Update mapper.py	2025-03-19 01:59:17 -07:00
Daniel Han	1f9e5769f3	Update vision.py	2025-03-19 01:33:43 -07:00
Daniel Han	d14b36e157	Update vision.py	2025-03-19 01:31:26 -07:00
Daniel Han	cdd005bf5a	Update vision.py	2025-03-18 23:53:39 -07:00
Daniel Han	8886f3aa31	Update vision.py	2025-03-18 23:42:26 -07:00
Daniel Han	71c405eda5	Update vision.py	2025-03-18 23:37:34 -07:00
Daniel Han	2795865d6f	Remove double generate patch	2025-03-18 23:06:20 -07:00
Daniel Han	897aef9899	Update vision.py	2025-03-18 22:46:30 -07:00
Daniel Han	3109603785	Update vision.py	2025-03-18 22:33:18 -07:00
Daniel Han	3622d7e76d	Update vision.py	2025-03-18 22:29:55 -07:00
Daniel Han	f5c94ba3a7	Update vision.py	2025-03-18 22:26:58 -07:00
Daniel Han	8831d1d440	Update vision.py	2025-03-18 21:08:07 -07:00
Daniel Han	9660605540	Update vision.py	2025-03-18 05:29:09 -07:00
Daniel Han	1556e64864	Merge branch 'main' into nightly	2025-03-18 05:25:09 -07:00
Daniel Han	49eece4d94	Many bug fixes (#2087 ) * _wrap_fast_inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * SFT dataset prepare * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update utils.py * bug fix * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Update _utils.py * Version * versioning * Update _utils.py * Update llama.py * Update llama.py * Bug fixes * FastModel * __doc__ * Update vision.py * Update loader.py * Update loader.py * Update loader.py * version * move use_modelscope to _utils (#1938) * move use_modelscope to _utils * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Don't use revision when loading model_config and is_peft=True (#1949) * More syntax warnings (#1944) * move use_modelscope to _utils * fix * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Full finetuning and other fixes * UNSLOTH_ENABLE_FULL_FINETUNING * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * full finetuning * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * max_seq_length * Update rl.py * Update rl.py * Update rl.py * Update pyproject.toml * AutoModelForImageTextToText * Update mapper.py * Update pyproject.toml * Update _utils.py * Update _utils.py * Update _utils.py * Batch samples * Update loader.py * Update loader.py * Update loader.py * Update 
loader.py * Update _utils.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * Temporary patches * Update loader.py * model names * Gemma 3 chat template * Bug fixes * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update rl.py * Update chat_templates.py * Update chat_templates.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Revert * Update _utils.py * forced precision * Autocast * Update vision.py * Update vision.py * Update rl.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl.py * vLLM fixes * constexpr * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update save.py * New models * Triton windows update (#1976) * Update pyproject.toml * Update README.md * Update RMS LayerNorm implementation, and list compr. 
change in chat templates (#1974) * Update RMS LayerNorm implementation with optimizations and testing suite * perf: optimize list comprehension in get_ollama_eos_tokens * Update Zoo * Update llama.py * Update llama.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * grpo fix * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update mapper.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update rl.py * Update _utils.py * Version * Update pyproject.toml * Update llama.py * Update llama.py * bug fix #2008 (#2039) * fix (#2051) * Update loader.py * Update pyproject.toml * Update pyproject.toml * Update vision.py * more prints * Update loader.py * LoRA 16bit fix * Update vision.py * Update vision.py * Update _utils.py * Update vision.py * move forced float32 * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * move print * Update _utils.py * disable bfloat16 * Fix forced float32 * move float32 * Ensure trust_remote_code propegates down to unsloth_compile_transformers (#2075) * Update _utils.py * Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080) When loading a PEFT model fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this: ``` RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ... ``` This PR just changes it so `autoconfig_error` and `peft_error` are both displayed. 
* fix error message (#2046) * Update vision.py * Update _utils.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update rl_replacements.py --------- Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com> Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com> Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com> Co-authored-by: Isaac Breen <isaac.breen@icloud.com>	2025-03-18 05:15:57 -07:00
Daniel Han	53d1fac54f	Update rl_replacements.py	2025-03-18 05:09:35 -07:00
Daniel Han	51503620cc	Update vision.py	2025-03-18 05:01:27 -07:00
Daniel Han	3eb4d76b28	Update rl_replacements.py	2025-03-18 04:59:01 -07:00
Daniel Han	a8f24ea882	Update vision.py	2025-03-18 04:51:06 -07:00
Daniel Han	cc0b7135e3	Update vision.py	2025-03-18 04:50:18 -07:00
Daniel Han	fad1ad9122	Update vision.py	2025-03-18 04:49:35 -07:00
Daniel Han	aca62cfa01	Update vision.py	2025-03-18 03:45:55 -07:00
Daniel Han	141084e7a1	Update vision.py	2025-03-18 02:27:20 -07:00
Daniel Han	16237517de	Update rl_replacements.py	2025-03-18 02:26:50 -07:00
Daniel Han	0c91c02196	Update rl_replacements.py	2025-03-18 02:23:03 -07:00
Daniel Han	a3f76fb4ad	Update rl_replacements.py	2025-03-18 02:05:49 -07:00
Daniel Han	9a60fdce08	Update rl_replacements.py	2025-03-18 02:05:30 -07:00
Daniel Han	f906b6de13	Update vision.py	2025-03-18 01:59:39 -07:00
Daniel Han	043c9f12b8	Update vision.py	2025-03-18 01:53:52 -07:00
Daniel Han	28f80c8957	Update vision.py	2025-03-18 01:44:50 -07:00
Daniel Han	1f12ce24c8	Update vision.py	2025-03-18 01:43:34 -07:00
Daniel Han	1ce93de34b	Update vision.py	2025-03-18 01:42:28 -07:00
Daniel Han	f095c5d51d	Update vision.py	2025-03-18 01:33:38 -07:00
Daniel Han	f9dee6fd1f	Update vision.py	2025-03-18 01:33:29 -07:00
Daniel Han	e1470efb3d	Update vision.py	2025-03-18 01:28:49 -07:00
Daniel Han	6b7f14f5b1	Update vision.py	2025-03-18 01:27:39 -07:00
Daniel Han	4f78da7e93	Update __init__.py	2025-03-18 01:10:14 -07:00
Daniel Han	91bfdca738	Update __init__.py	2025-03-18 01:10:04 -07:00
Daniel Han	3cf27c1039	Update pyproject.toml	2025-03-18 01:09:51 -07:00
Daniel Han	098692bd96	Update _utils.py	2025-03-18 01:05:58 -07:00
Daniel Han	28b96e7e77	Update vision.py	2025-03-18 01:01:46 -07:00
Kareem	5d87090e4d	fix error message (#2046)	2025-03-17 21:46:20 -07:00
Isaac Breen	7eba1b9708	Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080) When loading a PEFT model fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this: ``` RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ... ``` This PR just changes it so `autoconfig_error` and `peft_error` are both displayed.	2025-03-17 21:45:29 -07:00
Daniel Han	a22a6ba6bc	Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly	2025-03-17 21:44:00 -07:00
Daniel Han	1bd7c733a6	Update _utils.py	2025-03-17 21:43:51 -07:00
Xander Hawthorne	04bcab022c	Ensure trust_remote_code propegates down to unsloth_compile_transformers (#2075)	2025-03-17 21:43:47 -07:00
Daniel Han	ebf2a4abb6	move float32	2025-03-17 19:42:49 -07:00
Daniel Han	56a8dc6067	Fix forced float32	2025-03-17 19:24:21 -07:00
Daniel Han	a731e2fb6c	disable bfloat16	2025-03-17 18:28:46 -07:00
Daniel Han	7dd9817bd0	Update _utils.py	2025-03-17 16:57:14 -07:00
Daniel Han	845a253ed4	move print	2025-03-17 04:51:28 -07:00
Daniel Han	f8d198a947	Update _utils.py	2025-03-17 04:49:26 -07:00
Daniel Han	96f4073208	Update _utils.py	2025-03-17 04:47:58 -07:00
Daniel Han	5a2c87fa5d	Update _utils.py	2025-03-17 04:45:49 -07:00
Daniel Han	d247f1e20d	Update _utils.py	2025-03-17 04:42:50 -07:00
Daniel Han	3679194e0e	move forced float32	2025-03-17 04:41:44 -07:00
Daniel Han	947f201856	Update vision.py	2025-03-17 04:17:59 -07:00
Daniel Han	47a2492f87	Update _utils.py	2025-03-16 23:37:46 -07:00
Daniel Han	d50f42a8b4	Update vision.py	2025-03-16 23:27:59 -07:00
Daniel Han	74fc293c99	Update vision.py	2025-03-16 23:20:43 -07:00
Daniel Han	268930e2cd	LoRA 16bit fix	2025-03-16 23:18:57 -07:00
Daniel Han	3c1dfe2f58	Update loader.py	2025-03-16 22:15:54 -07:00
Daniel Han	ce2884f563	more prints	2025-03-16 22:13:57 -07:00
Daniel Han	f01757601f	Update vision.py	2025-03-16 22:10:36 -07:00
Daniel Han	1eafb4a3b3	Update pyproject.toml	2025-03-16 20:31:54 -07:00
Daniel Han	35caede8ea	Update pyproject.toml	2025-03-16 20:30:46 -07:00
Daniel Han	4824e17063	Update loader.py	2025-03-16 20:18:53 -07:00
Kareem	136837e5cc	fix (#2051)	2025-03-16 15:19:58 -07:00
Mukkesh Ganesh	745e0da8ae	bug fix #2008 (#2039)	2025-03-16 15:19:14 -07:00
Daniel Han	9aa93db23c	Update llama.py	2025-03-15 23:34:52 -07:00
Daniel Han	344e6616a8	Update llama.py	2025-03-15 22:39:09 -07:00
Daniel Han	7b020eff46	Update pyproject.toml	2025-03-15 19:40:02 -07:00
Daniel Han	10ab6e32b8	Version	2025-03-15 19:22:44 -07:00
Daniel Han	79aab4b74a	Merge branch 'main' into nightly	2025-03-15 19:15:26 -07:00
Daniel Han	9233e42c9e	Update _utils.py	2025-03-15 17:58:14 -07:00
Michael Han	d82a707a4a	Update README.md	2025-03-15 17:47:25 -07:00
Daniel Han	ad08cb9730	Update rl.py	2025-03-15 17:13:18 -07:00
Daniel Han	50d41d6d9f	Merge branch 'main' into nightly	2025-03-15 16:36:18 -07:00
Daniel Han	e1c24a01f8	Update README.md (#2028)	2025-03-14 22:06:53 -07:00
Daniel Han	05fdaff970	Gemma 3 readme (#2019) * Update README.md * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2025-03-14 11:12:02 -07:00
Daniel Han	71baea8e55	Update _utils.py	2025-03-14 10:04:10 -07:00
Daniel Han	0f1d78d8e4	Update save.py	2025-03-14 09:54:48 -07:00
Daniel Han	8cfe8a57e6	Precision issues	2025-03-14 08:33:33 -07:00
Daniel Han	b4cd82d59f	Update vision.py	2025-03-14 08:19:02 -07:00
Daniel Han	360cc66779	Update _utils.py	2025-03-14 08:17:49 -07:00
Daniel Han	afd297d281	Update vision.py	2025-03-14 08:17:36 -07:00
Daniel Han	0f587bfe2c	Update save.py	2025-03-14 08:08:43 -07:00
Daniel Han	b8aaf550a7	GGUF saving (#2017 ) * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl_replacements.py * Update pyproject.toml * Update pyproject.toml * Export Model to ollama.com (#1648) * Ollama Export Model to ollama.com Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Check for model_name Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * subprocess use instead of requests \| added check for ollama server Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model \| fix Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Push to Ollama Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Update cross_entropy_loss.py * torch_cuda_device * Update utils.py * Update utils.py * Update utils.py * device * device * Update loader.py * Update llama.py * Update README.md * Update llama.py * Update llama.py * Update _utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * __version__ * Update rl.py * Bug fixes * Bug fixes * Update llama.py * Update _utils.py * _wrap_fast_inference * Update llama.py * Update llama.py * 
Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * SFT dataset prepare * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update utils.py * bug fix * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Update _utils.py * Version * versioning * Update _utils.py * Update llama.py * Update llama.py * Bug fixes * FastModel * __doc__ * Update vision.py * Update loader.py * Update loader.py * Update loader.py * version * move use_modelscope to _utils (#1938) * move use_modelscope to _utils * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Don't use revision when loading model_config and is_peft=True (#1949) * More syntax warnings (#1944) * move use_modelscope to _utils * fix * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Full finetuning and other fixes * UNSLOTH_ENABLE_FULL_FINETUNING * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * full finetuning * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * max_seq_length * Update rl.py * Update rl.py * Update rl.py * Update pyproject.toml * AutoModelForImageTextToText * Update mapper.py * Update pyproject.toml * Update _utils.py * Update _utils.py * Update _utils.py * Batch samples * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update 
vision.py * Update vision.py * Update mapper.py * Update vision.py * Temporary patches * Update loader.py * model names * Gemma 3 chat template * Bug fixes * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update rl.py * Update chat_templates.py * Update chat_templates.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Revert * Update _utils.py * forced precision * Autocast * Update vision.py * Update vision.py * Update rl.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl.py * vLLM fixes * constexpr * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update save.py * New models * Triton windows update (#1976) * Update pyproject.toml * Update README.md * Update RMS LayerNorm implementation, and list compr. 
change in chat templates (#1974) * Update RMS LayerNorm implementation with optimizations and testing suite * perf: optimize list comprehension in get_ollama_eos_tokens * Update Zoo * Update llama.py * Update llama.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * grpo fix * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update mapper.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update save.py * Update save.py * Update save.py --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com> Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com> Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com>	2025-03-14 07:58:57 -07:00
Daniel Han	fecb6492de	Update save.py	2025-03-14 07:56:56 -07:00
Daniel Han	2b07988c11	Update save.py	2025-03-14 07:36:35 -07:00
Daniel Han	dd0e790d3f	Update save.py	2025-03-14 07:26:23 -07:00
Daniel Han	cb4579c199	Update vision.py	2025-03-14 07:09:23 -07:00
Daniel Han	10c166b452	Merge branch 'main' into nightly	2025-03-14 07:09:16 -07:00
Daniel Han	3410744e88	Gemma 3, bug fixes (#2014 ) * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl_replacements.py * Update pyproject.toml * Update pyproject.toml * Export Model to ollama.com (#1648) * Ollama Export Model to ollama.com Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Check for model_name Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * subprocess use instead of requests \| added check for ollama server Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model \| fix Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Push to Ollama Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Update cross_entropy_loss.py * torch_cuda_device * Update utils.py * Update utils.py * Update utils.py * device * device * Update loader.py * Update llama.py * Update README.md * Update llama.py * Update llama.py * Update _utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * __version__ * Update rl.py * Bug fixes * Bug fixes * Update 
llama.py * Update _utils.py * _wrap_fast_inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * SFT dataset prepare * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update utils.py * bug fix * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Update _utils.py * Version * versioning * Update _utils.py * Update llama.py * Update llama.py * Bug fixes * FastModel * __doc__ * Update vision.py * Update loader.py * Update loader.py * Update loader.py * version * move use_modelscope to _utils (#1938) * move use_modelscope to _utils * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Don't use revision when loading model_config and is_peft=True (#1949) * More syntax warnings (#1944) * move use_modelscope to _utils * fix * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Full finetuning and other fixes * UNSLOTH_ENABLE_FULL_FINETUNING * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * full finetuning * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * max_seq_length * Update rl.py * Update rl.py * Update rl.py * Update pyproject.toml * AutoModelForImageTextToText * Update mapper.py * Update pyproject.toml * Update _utils.py * Update _utils.py * Update _utils.py * Batch samples * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update 
_utils.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * Temporary patches * Update loader.py * model names * Gemma 3 chat template * Bug fixes * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update rl.py * Update chat_templates.py * Update chat_templates.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Revert * Update _utils.py * forced precision * Autocast * Update vision.py * Update vision.py * Update rl.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl.py * vLLM fixes * constexpr * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update save.py * New models * Triton windows update (#1976) * Update pyproject.toml * Update README.md * Update RMS LayerNorm implementation, and list compr. 
change in chat templates (#1974) * Update RMS LayerNorm implementation with optimizations and testing suite * perf: optimize list comprehension in get_ollama_eos_tokens * Update Zoo * Update llama.py * Update llama.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update vision.py * grpo fix * Update rl_replacements.py * Update vision.py * Update rl_replacements.py * Update vision.py * Update mapper.py * Update vision.py * Update vision.py * Update loader.py --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com> Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com> Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com>	2025-03-14 06:42:44 -07:00
Daniel Han	c53be2c3b8	Update loader.py	2025-03-14 06:26:01 -07:00
Daniel Han	f125d6f9ef	Update vision.py	2025-03-14 06:17:35 -07:00
Daniel Han	12dadb8c35	Update vision.py	2025-03-14 06:17:20 -07:00
Daniel Han	695ebd956c	Update mapper.py	2025-03-14 06:15:44 -07:00
Daniel Han	d42f11f41e	Update vision.py	2025-03-14 06:11:51 -07:00
Daniel Han	64820ea4bd	Update rl_replacements.py	2025-03-14 06:11:19 -07:00
Daniel Han	c0c42e9716	Update vision.py	2025-03-14 06:06:10 -07:00
Daniel Han	026fb59e2f	Update rl_replacements.py	2025-03-14 06:01:42 -07:00
Daniel Han	86bc0a4761	grpo fix	2025-03-14 05:57:17 -07:00
Daniel Han	fdb2f8e177	Update vision.py	2025-03-14 05:54:57 -07:00
Daniel Han	7322f95971	Update rl_replacements.py	2025-03-14 05:51:08 -07:00
Daniel Han	b8500b055d	Update vision.py	2025-03-14 05:41:17 -07:00
Daniel Han	0a64bb88f6	Update vision.py	2025-03-14 05:40:56 -07:00
Daniel Han	7c4889a84c	Update vision.py	2025-03-14 04:59:29 -07:00
Daniel Han	4d49983b8b	Update vision.py	2025-03-14 04:58:32 -07:00
Daniel Han	70d09475fa	Update vision.py	2025-03-14 04:51:43 -07:00
Daniel Han	5c2ce48bb1	Update vision.py	2025-03-14 04:43:11 -07:00
Daniel Han	490b1b087a	Update vision.py	2025-03-14 04:40:29 -07:00
Daniel Han	99c2eec8e0	Update vision.py	2025-03-14 04:39:00 -07:00
Daniel Han	43db95099b	Update vision.py	2025-03-14 04:37:31 -07:00
Daniel Han	87071e4f4c	Update vision.py	2025-03-14 04:30:26 -07:00
Daniel Han	a9bd11e336	Update vision.py	2025-03-14 04:30:02 -07:00
Daniel Han	20512f8a9b	Update vision.py	2025-03-14 04:26:10 -07:00
Daniel Han	5d85b29a2a	Update llama.py	2025-03-14 04:08:17 -07:00
Daniel Han	ca3a09e7dd	Update llama.py	2025-03-14 04:02:41 -07:00
Daniel Han	3ff81862ef	Merge branch 'main' into nightly	2025-03-14 02:59:25 -07:00
Daniel Han	cbfb67441c	Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly	2025-03-14 02:54:47 -07:00
Daniel Han	67e498ae82	Update Zoo	2025-03-14 02:54:40 -07:00
Nino Risteski	f4d97caf5e	Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974) * Update RMS LayerNorm implementation with optimizations and testing suite * perf: optimize list comprehension in get_ollama_eos_tokens	2025-03-14 02:53:21 -07:00
Akshay Behl	8baebe46a0	Triton windows update (#1976) * Update pyproject.toml * Update README.md	2025-03-14 02:51:30 -07:00
Daniel Han	d66f5ff082	New models	2025-03-14 02:41:52 -07:00
Daniel Han	5d6fa45c3f	Update save.py	2025-03-14 02:32:53 -07:00
Daniel Han	0a09349096	Update _utils.py	2025-03-14 02:21:15 -07:00
Daniel Han	c92ef07ab7	Update _utils.py	2025-03-14 02:07:13 -07:00
Daniel Han	c508113286	Update _utils.py	2025-03-14 02:00:03 -07:00
Daniel Han	758bca7414	Update _utils.py	2025-03-14 01:44:04 -07:00
Daniel Han	abfe34f7c0	Update llama.py	2025-03-14 01:39:45 -07:00
Daniel Han	f25063f060	Update llama.py	2025-03-14 01:36:12 -07:00
Daniel Han	aaca723291	Update llama.py	2025-03-14 01:26:45 -07:00
Daniel Han	82835c2904	Update llama.py	2025-03-14 01:25:35 -07:00
Daniel Han	88c042500b	Update llama.py	2025-03-14 01:20:18 -07:00
Daniel Han	456e05dc57	Update llama.py	2025-03-14 01:18:54 -07:00
Daniel Han	d628218ac8	Update llama.py	2025-03-14 01:18:04 -07:00
Daniel Han	c2b9855084	Update llama.py	2025-03-14 00:16:24 -07:00
Daniel Han	813fb7edcf	Update rl.py	2025-03-13 23:08:46 -07:00
Daniel Han	1d7dba52d2	Update vision.py	2025-03-13 18:34:26 -07:00
Daniel Han	6df2ef3667	Update vision.py	2025-03-13 18:31:03 -07:00
Daniel Han	b28b7fa364	Update vision.py	2025-03-13 18:26:30 -07:00
Daniel Han	5843d784f5	constexpr	2025-03-13 18:23:29 -07:00
Daniel Han	78584619f5	vLLM fixes	2025-03-13 17:57:46 -07:00
Daniel Han	325741b7b7	Update rl.py	2025-03-13 16:02:56 -07:00
Daniel Han	d0e0dad7d0	Gemma 3 bug fixes (#2005 ) * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649) * edit save.py to fix gguf saving breaks. * add check for .exe or not exe file extension for linux and windows * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * unsloth_num_chunks * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py (#1754) Fix typo in comment: know -> now. This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well. 
* Optional logits * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl_replacements.py * Update pyproject.toml * Update pyproject.toml * Export Model to ollama.com (#1648) * Ollama Export Model to ollama.com Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Check for model_name Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * subprocess use instead of requests \| added check for ollama server Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model \| fix Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Push to Ollama Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Update cross_entropy_loss.py * torch_cuda_device * Update utils.py * Update utils.py * Update utils.py * device * device * Update loader.py * Update llama.py * Update README.md * Update llama.py * Update llama.py * Update _utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * __version__ * Update rl.py * Bug fixes * Bug fixes * 
Update llama.py * Update _utils.py * _wrap_fast_inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * SFT dataset prepare * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update utils.py * bug fix * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Update _utils.py * Version * versioning * Update _utils.py * Update llama.py * Update llama.py * Bug fixes * FastModel * __doc__ * Update vision.py * Update loader.py * Update loader.py * Update loader.py * version * move use_modelscope to _utils (#1938) * move use_modelscope to _utils * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Don't use revision when loading model_config and is_peft=True (#1949) * More syntax warnings (#1944) * move use_modelscope to _utils * fix * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Full finetuning and other fixes * UNSLOTH_ENABLE_FULL_FINETUNING * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * full finetuning * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * max_seq_length * Update rl.py * Update rl.py * Update rl.py * Update pyproject.toml * AutoModelForImageTextToText * Update mapper.py * Update pyproject.toml * Update _utils.py * Update _utils.py * Update _utils.py * Batch samples * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update 
_utils.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py * Update vision.py * Temporary patches * Update loader.py * model names * Gemma 3 chat template * Bug fixes * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update rl.py * Update chat_templates.py * Update chat_templates.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Revert * Update _utils.py * forced precision * Autocast * Update vision.py * Update vision.py * Update rl.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com> Co-authored-by: Seth Weidman <seth@sethweidman.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com> Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com>	2025-03-13 06:41:42 -07:00
Daniel Han	df9573c754	Update vision.py	2025-03-13 06:39:17 -07:00
Daniel Han	188ccdde3e	Update vision.py	2025-03-13 06:34:55 -07:00
Daniel Han	bf54c7e8a7	Update vision.py	2025-03-13 06:25:18 -07:00
Daniel Han	ed00fb213c	Update vision.py	2025-03-13 06:24:37 -07:00
Daniel Han	35664afebd	Update vision.py	2025-03-13 06:15:42 -07:00
Daniel Han	43e8a6d714	Update rl.py	2025-03-13 06:12:36 -07:00
Daniel Han	0aab37286c	Update vision.py	2025-03-13 06:08:33 -07:00
Daniel Han	f07c4ebc68	Update vision.py	2025-03-13 06:06:24 -07:00
Daniel Han	b2a3c36963	Autocast	2025-03-13 06:02:33 -07:00
Daniel Han	eef49713e4	forced precision	2025-03-13 05:46:02 -07:00
Daniel Han	87c4fcf1bb	Update _utils.py	2025-03-13 05:38:21 -07:00
Daniel Han	da0048d2b5	Revert	2025-03-13 05:13:17 -07:00
Daniel Han	b9b68c9ee3	Update vision.py	2025-03-13 01:30:06 -07:00
Daniel Han	1dc76614d8	Update vision.py	2025-03-13 01:27:50 -07:00
Daniel Han	b1eff862e0	Update loader.py	2025-03-13 01:17:48 -07:00
Daniel Han	e6a65ca866	Update vision.py	2025-03-13 01:15:25 -07:00
Daniel Han	66a734b22a	Update vision.py	2025-03-12 22:23:29 -07:00
Daniel Han	0dd88f91ec	Update vision.py	2025-03-12 22:22:23 -07:00
Daniel Han	d0517a527d	Update chat_templates.py	2025-03-12 21:52:38 -07:00
Daniel Han	437eb8184f	Update chat_templates.py	2025-03-12 21:47:53 -07:00
Daniel Han	7fa4bf813f	Update rl.py	2025-03-12 21:47:01 -07:00
Daniel Han	2441039ce1	Update llama.py	2025-03-12 21:42:27 -07:00
Daniel Han	1d07d4ded2	Update llama.py	2025-03-12 21:40:20 -07:00
Daniel Han	fb5230ae6d	Update vision.py	2025-03-12 21:38:51 -07:00
Daniel Han	79b59cc8a6	Update vision.py	2025-03-12 21:35:29 -07:00
Daniel Han	702f85bd54	Update vision.py	2025-03-12 21:34:34 -07:00
Daniel Han	eaa5947342	Update vision.py	2025-03-12 21:33:22 -07:00
Daniel Han	f904e66e7b	Update vision.py	2025-03-12 21:31:37 -07:00
Daniel Han	be660d3bb1	Bug fixes	2025-03-12 21:30:02 -07:00
Daniel Han	7fe2874157	Gemma 3 chat template	2025-03-12 20:37:19 -07:00
Daniel Han	5349526e35	model names	2025-03-12 20:04:27 -07:00
Daniel Han	fa6628b3e9	Update loader.py	2025-03-12 19:25:41 -07:00
Daniel Han	914bd92a8c	Temporary patches	2025-03-12 19:20:17 -07:00
Daniel Han	8e9f52f16d	Update vision.py	2025-03-12 06:51:23 -07:00
Daniel Han	f8a490e16e	Merge branch 'main' into nightly	2025-03-12 05:00:45 -07:00
Daniel Han	33e020c064	Update _utils.py	2025-03-12 04:52:07 -07:00
Daniel Han	2c54bfd7ff	Update _utils.py	2025-03-12 04:46:34 -07:00
Daniel Han	356a74d4dd	Update mapper.py	2025-03-12 04:07:45 -07:00
Daniel Han	f35d5977d6	Gemma 3 (#1986 ) * Update llama.py * GRPO optimized * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Selective Log softmax * Fix GRPO bsz * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Fix TRL * Metrics GRPO * Update rl_replacements.py * Update rl_replacements.py * No compile * Update rl.py * Remove docs * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649) * edit save.py to fix gguf saving breaks. * add check for .exe or not exe file extension for linux and windows * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * unsloth_num_chunks * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py (#1754) Fix 
typo in comment: know -> now. This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well. * Optional logits * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl_replacements.py * Update pyproject.toml * Update pyproject.toml * Export Model to ollama.com (#1648) * Ollama Export Model to ollama.com Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Check for model_name Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * subprocess use instead of requests \| added check for ollama server Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model \| fix Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Push to Ollama Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Update cross_entropy_loss.py * torch_cuda_device * Update utils.py * Update utils.py * Update utils.py * device * device * Update loader.py * Update llama.py * Update README.md * Update llama.py * Update llama.py * Update _utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * Update llama.py * Update llama.py * Update 
llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * __version__ * Update rl.py * Bug fixes * Bug fixes * Update llama.py * Update _utils.py * _wrap_fast_inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * SFT dataset prepare * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update utils.py * bug fix * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Update _utils.py * Version * versioning * Update _utils.py * Update llama.py * Update llama.py * Bug fixes * FastModel * __doc__ * Update vision.py * Update loader.py * Update loader.py * Update loader.py * version * move use_modelscope to _utils (#1938) * move use_modelscope to _utils * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Don't use revision when loading model_config and is_peft=True (#1949) * More syntax warnings (#1944) * move use_modelscope to _utils * fix * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Full finetuning and other fixes * UNSLOTH_ENABLE_FULL_FINETUNING * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * full finetuning * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * max_seq_length * Update rl.py * Update rl.py * Update rl.py * Update pyproject.toml * AutoModelForImageTextToText * Update mapper.py * Update pyproject.toml * 
Update _utils.py * Update _utils.py * Update _utils.py * Batch samples * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update mapper.py --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com> Co-authored-by: Seth Weidman <seth@sethweidman.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com> Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com>	2025-03-12 01:23:34 -07:00
Daniel Han	89a2517a8d	Update mapper.py	2025-03-11 23:35:46 -07:00
Daniel Han	c319cacac4	Update vision.py	2025-03-11 22:55:34 -07:00
Daniel Han	a3c2856977	Update vision.py	2025-03-11 22:36:51 -07:00
Daniel Han	8148f3e7f0	Update vision.py	2025-03-11 22:34:57 -07:00
Daniel Han	7208177c66	Update loader.py	2025-03-11 22:31:14 -07:00
Daniel Han	d4e0f42951	Update vision.py	2025-03-11 22:29:06 -07:00
Daniel Han	7f03388c66	Update loader.py	2025-03-11 22:27:53 -07:00
Daniel Han	b061ab453e	Update _utils.py	2025-03-11 22:22:36 -07:00
Daniel Han	032d109d98	Update loader.py	2025-03-11 22:02:39 -07:00
Daniel Han	7e76b42b46	Update loader.py	2025-03-11 20:45:53 -07:00
Daniel Han	50803b0a2f	Update loader.py	2025-03-11 20:44:01 -07:00
Daniel Han	2249e266b7	Update loader.py	2025-03-11 20:43:25 -07:00
Daniel Han	904fe3485e	Batch samples	2025-03-11 19:46:41 -07:00
Daniel Han	1b71f08854	Update _utils.py	2025-03-11 15:25:21 -07:00
Daniel Han	b7131a0697	Update _utils.py	2025-03-11 15:22:16 -07:00
Daniel Han	57c6894583	Update _utils.py	2025-03-11 15:20:20 -07:00
Daniel Han	361fa338d8	Update pyproject.toml	2025-03-11 14:59:29 -07:00
Daniel Han	683bd5fd9b	Update mapper.py	2025-03-11 05:37:29 -07:00
Daniel Han	b78f12cff8	AutoModelForImageTextToText	2025-03-11 04:30:01 -07:00
Daniel Han	e680904d46	Update pyproject.toml	2025-03-11 00:18:55 -07:00
Daniel Han	a564d888bd	Update rl.py	2025-03-10 05:00:45 -07:00
Daniel Han	2cedc89ac8	Update rl.py	2025-03-10 04:58:59 -07:00
Daniel Han	b97ac4e76f	Update rl.py	2025-03-10 04:57:47 -07:00
Daniel Han	c6f73dd521	max_seq_length	2025-03-10 04:39:25 -07:00
Daniel Han	3443a8503f	Update _utils.py	2025-03-10 00:31:49 -07:00
Daniel Han	c9dafd5eaf	Update loader.py	2025-03-09 23:33:21 -07:00
Daniel Han	49dce011e9	Update loader.py	2025-03-09 23:24:21 -07:00
Daniel Han	5e23241d6e	Update loader.py	2025-03-09 23:22:13 -07:00
Daniel Han	a6215fcc50	full finetuning	2025-03-09 23:18:52 -07:00
Daniel Han	2f737d1144	Update vision.py	2025-03-09 23:13:30 -07:00
Daniel Han	4560215e41	Update vision.py	2025-03-09 23:11:28 -07:00
Daniel Han	4122cc7547	Update loader.py	2025-03-09 23:08:34 -07:00
Daniel Han	4cf033f6c7	Update loader.py	2025-03-09 23:06:06 -07:00
Daniel Han	186a2f6ee6	Update loader.py	2025-03-09 23:03:27 -07:00
Daniel Han	d41e5578e1	UNSLOTH_ENABLE_FULL_FINETUNING	2025-03-09 22:57:24 -07:00
Daniel Han	66f661ea02	Full finetuning and other fixes	2025-03-09 22:52:34 -07:00
Daniel Han	0a33a1d5fc	Update loader.py	2025-03-08 18:48:54 -08:00
Kareem	f1d77f7857	More syntax warnings (#1944) * move use_modelscope to _utils * fix * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-03-08 17:55:37 -08:00
Wilson Wu	c5e163db74	Don't use revision when loading model_config and is_peft=True (#1949)	2025-03-08 17:51:53 -08:00
Kareem	7a1199cf0b	move use_modelscope to _utils (#1938) * move use_modelscope to _utils * Update _utils.py * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-03-08 17:51:07 -08:00
Daniel Han	b79ce73b2c	Merge branch 'main' into nightly	2025-03-08 17:50:50 -08:00
Daniel Han	08815f9f57	Bug fixes (#1951 ) * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * GRPO optimized * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Selective Log softmax * Fix GRPO bsz * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Fix TRL * Metrics GRPO * Update rl_replacements.py * Update rl_replacements.py * No compile * Update rl.py * Remove docs * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649) * edit save.py to fix gguf saving breaks. 
* add check for .exe or not exe file extension for linux and windows * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * unsloth_num_chunks * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py (#1754) Fix typo in comment: know -> now. This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well. 
* Optional logits * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl_replacements.py * Update pyproject.toml * Update pyproject.toml * Export Model to ollama.com (#1648) * Ollama Export Model to ollama.com Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Check for model_name Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * subprocess use instead of requests \| added check for ollama server Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model \| fix Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Push to Ollama Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Update cross_entropy_loss.py * torch_cuda_device * Update utils.py * Update utils.py * Update utils.py * device * device * Update loader.py * Update llama.py * Update README.md * Update llama.py * Update llama.py * Update _utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * __version__ * Update rl.py * Bug fixes * Bug fixes * 
Update llama.py * Update _utils.py * _wrap_fast_inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * SFT dataset prepare * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update utils.py * bug fix * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Update _utils.py * Version * versioning * Update _utils.py * Update llama.py * Update llama.py * Bug fixes * FastModel * __doc__ * Update vision.py * Update loader.py * Update loader.py * Update loader.py * version --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com> Co-authored-by: Seth Weidman <seth@sethweidman.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com> Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>	2025-03-08 04:34:55 -08:00
Daniel Han	e18896439c	version	2025-03-08 04:29:19 -08:00
Daniel Han	723dbcd576	Update loader.py	2025-03-08 03:38:22 -08:00
Daniel Han	09b53c79ed	Update loader.py	2025-03-08 03:36:45 -08:00
Daniel Han	87c643cb42	Update loader.py	2025-03-08 03:35:22 -08:00
Daniel Han	454b19929a	Update vision.py	2025-03-08 03:23:23 -08:00
Daniel Han	964339a236	__doc__	2025-03-08 03:20:03 -08:00
Daniel Han	ff17bdbc11	FastModel	2025-03-08 03:02:45 -08:00
Daniel Han	a9026d9e9e	Bug fixes	2025-03-07 01:43:39 -08:00
Daniel Han	7a0ce38a0d	Merge branch 'main' into nightly	2025-03-06 14:47:39 -08:00
Daniel Han	de6be085ca	Merge branch 'main' of https://github.com/unslothai/unsloth	2025-03-06 14:44:16 -08:00
Daniel Han	0a646e40f5	Big bug fixes. Fixes: 1. #1932 2. #1931 3. #1928 4. #1925 5. #1921 6. #1918 7. #1923 8. #1922 9. #1921. Please do: `pip install --upgrade --force-reinstall --no-deps unsloth unsloth_zoo` for local machines. Colab / Kaggle: please restart and delete / disconnect runtime and redo. Apologies on the issues!	2025-03-06 14:43:48 -08:00
Daniel Han	c60621670d	Bug fixes	2025-03-06 14:39:04 -08:00
Daniel Han	53d0ba079e	Bug fixes (#1920 ) * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update pyproject.toml * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * GRPO optimized * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Selective Log softmax * Fix GRPO bsz * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Fix TRL * Metrics GRPO * Update rl_replacements.py * Update rl_replacements.py * No compile * Update rl.py * Remove docs * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649) * edit save.py to fix gguf saving breaks. 
* add check for .exe or not exe file extension for linux and windows * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * unsloth_num_chunks * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py (#1754) Fix typo in comment: know -> now. This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well. 
* Optional logits * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl_replacements.py * Update pyproject.toml * Update pyproject.toml * Export Model to ollama.com (#1648) * Ollama Export Model to ollama.com Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Check for model_name Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * subprocess use instead of requests \| added check for ollama server Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model \| fix Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Push to Ollama Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Update cross_entropy_loss.py * torch_cuda_device * Update utils.py * Update utils.py * Update utils.py * device * device * Update loader.py * Update llama.py * Update README.md * Update llama.py * Update llama.py * Update _utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * __version__ * Update rl.py * Bug fixes * Bug fixes * 
Update llama.py * Update _utils.py * _wrap_fast_inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * SFT dataset prepare * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update utils.py * bug fix * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Update _utils.py * Version * versioning * Update _utils.py * Update llama.py * Update llama.py --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com> Co-authored-by: Seth Weidman <seth@sethweidman.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com> Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>	2025-03-06 05:16:15 -08:00
Daniel Han	4000d2ec6b	Update llama.py	2025-03-06 03:46:00 -08:00
Daniel Han	ed2cfd1e0a	Update llama.py	2025-03-06 03:44:40 -08:00
Daniel Han	5ed619522b	Update _utils.py	2025-03-06 03:22:10 -08:00
Daniel Han	a666967405	versioning	2025-03-06 03:14:01 -08:00
Daniel Han	9325426c3e	Merge branch 'main' into nightly	2025-03-06 02:35:00 -08:00
Daniel Han	83eaa2f087	Logits fixes (#1916 ) * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update pyproject.toml * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * GRPO optimized * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Selective Log softmax * Fix GRPO bsz * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Fix TRL * Metrics GRPO * Update rl_replacements.py * Update rl_replacements.py * No compile * Update rl.py * Remove docs * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649) * edit save.py to fix gguf saving breaks. 
* add check for .exe or not exe file extension for linux and windows * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * unsloth_num_chunks * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py (#1754) Fix typo in comment: know -> now. This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well. 
* Optional logits * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl_replacements.py * Update pyproject.toml * Update pyproject.toml * Export Model to ollama.com (#1648) * Ollama Export Model to ollama.com Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Check for model_name Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * subprocess use instead of requests \| added check for ollama server Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model \| fix Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Push to Ollama Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Update cross_entropy_loss.py * torch_cuda_device * Update utils.py * Update utils.py * Update utils.py * device * device * Update loader.py * Update llama.py * Update README.md * Update llama.py * Update llama.py * Update _utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * __version__ * Update rl.py * Bug fixes * Bug fixes * 
Update llama.py * Update _utils.py * _wrap_fast_inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * SFT dataset prepare * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update utils.py * bug fix * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Update _utils.py * Version --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com> Co-authored-by: Seth Weidman <seth@sethweidman.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com> Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>	2025-03-06 02:32:32 -08:00
Daniel Han	85b2b7e941	Version	2025-03-06 02:29:59 -08:00
Daniel Han	1f7001b5c3	Update _utils.py	2025-03-06 01:31:01 -08:00
Daniel Han	033d6dcf60	Update __init__.py	2025-03-06 01:10:50 -08:00
Daniel Han	a52b5a48dd	Update _utils.py	2025-03-05 23:48:00 -08:00
Daniel Han	f52f12a073	Update rl.py	2025-03-05 23:32:04 -08:00
Daniel Han	ed2d8a9303	Update rl.py	2025-03-05 23:19:12 -08:00
Daniel Han	76cfa1a14f	Update rl.py	2025-03-05 23:17:51 -08:00
Daniel Han	c97447e983	Update _utils.py	2025-03-05 23:10:48 -08:00
Daniel Han	50397d8510	Update _utils.py	2025-03-05 23:05:52 -08:00
Daniel Han	94d50c367a	Update _utils.py	2025-03-05 23:02:15 -08:00
Daniel Han	9313007ded	Update _utils.py	2025-03-05 22:59:43 -08:00
Daniel Han	a877c3700b	Merge branch 'main' into nightly	2025-03-05 22:58:40 -08:00
Daniel Han	105020e4cc	Update _utils.py	2025-03-05 22:58:14 -08:00
Daniel Han	303daeb1e1	Python 3.12 fix	2025-03-05 12:58:24 -08:00
Daniel Han	2afeb37839	Many bug fixes (#1900 ) * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * autocast * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update pyproject.toml * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * GRPO optimized * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Selective Log softmax * Fix GRPO bsz * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Fix TRL * Metrics GRPO * Update rl_replacements.py * Update rl_replacements.py * No compile * Update rl.py * Remove docs * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py 
* Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649) * edit save.py to fix gguf saving breaks. * add check for .exe or not exe file extension for linux and windows * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * unsloth_num_chunks * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py (#1754) Fix typo in comment: know -> now. This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well. 
* Optional logits * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl_replacements.py * Update pyproject.toml * Update pyproject.toml * Export Model to ollama.com (#1648) * Ollama Export Model to ollama.com Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Check for model_name Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * subprocess use instead of requests \| added check for ollama server Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model \| fix Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Push to Ollama Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Update cross_entropy_loss.py * torch_cuda_device * Update utils.py * Update utils.py * Update utils.py * device * device * Update loader.py * Update llama.py * Update README.md * Update llama.py * Update llama.py * Update _utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * __version__ * Update rl.py * Bug fixes * Bug fixes * 
Update llama.py * Update _utils.py * _wrap_fast_inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * SFT dataset prepare * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update utils.py * bug fix * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update __init__.py --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com> Co-authored-by: Seth Weidman <seth@sethweidman.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com> Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>	2025-03-05 05:13:32 -08:00
Daniel Han	f557986862	Update __init__.py	2025-03-05 04:55:06 -08:00
Daniel Han	5a4c03fc60	Update llama.py	2025-03-05 04:50:26 -08:00
Daniel Han	b28f5dfe64	Update llama.py	2025-03-05 04:47:44 -08:00
Daniel Han	a434bf37f6	Update llama.py	2025-03-05 04:45:46 -08:00
Daniel Han	752c826719	Update llama.py	2025-03-05 04:22:20 -08:00
Daniel Han	3290a0c736	Update llama.py	2025-03-05 03:59:59 -08:00
Daniel Han	fc4101646f	bug fix	2025-03-05 03:49:38 -08:00
Daniel Han	5de84d1ead	Update utils.py	2025-03-05 03:46:21 -08:00
Daniel Han	0597d535bb	Update llama.py	2025-03-05 03:38:16 -08:00
Daniel Han	ea4e8e4371	Update llama.py	2025-03-05 03:30:13 -08:00
Daniel Han	8c181dfdbe	Update rl.py	2025-03-05 03:24:08 -08:00
Daniel Han	7d669e8b18	Update rl_replacements.py	2025-03-05 01:11:03 -08:00
Daniel Han	f1b158bdda	Update rl_replacements.py	2025-03-05 01:03:13 -08:00
Daniel Han	dd364f3b80	Update rl_replacements.py	2025-03-05 01:00:01 -08:00
Daniel Han	363208bf16	Update pyproject.toml	2025-03-05 00:56:56 -08:00
Daniel Han	2cd1793e83	SFT dataset prepare	2025-03-05 00:51:10 -08:00
Daniel Han	7e84cddb97	Update _utils.py	2025-03-04 19:44:54 -08:00
Daniel Han	2569b0f245	Update llama.py	2025-03-04 19:39:49 -08:00
Daniel Han	ba944d6ef8	Update llama.py	2025-03-04 19:20:38 -08:00
Daniel Han	a95d7322f0	Update llama.py	2025-03-04 19:19:12 -08:00
Daniel Han	6e49701337	Update llama.py	2025-03-04 19:15:40 -08:00
Daniel Han	f54cc4f15c	Update llama.py	2025-03-04 19:11:54 -08:00
Daniel Han	6ba41c59ab	Update llama.py	2025-03-04 19:08:59 -08:00
Daniel Han	c7e980a208	Update llama.py	2025-03-04 19:04:46 -08:00
Daniel Han	81d0ed64e4	Update llama.py	2025-03-04 19:02:12 -08:00
Daniel Han	f40e145559	Update llama.py	2025-03-04 18:59:55 -08:00
Daniel Han	d006df611f	Update llama.py	2025-03-04 18:21:54 -08:00
Daniel Han	fa5e37de2d	Update llama.py	2025-03-04 18:18:02 -08:00
Daniel Han	0b1cf7640e	_wrap_fast_inference	2025-03-04 18:15:45 -08:00
Daniel Han	ed90df2c4f	Update _utils.py	2025-03-04 16:30:57 -08:00
Daniel Han	fe68005095	Update llama.py	2025-03-04 16:16:21 -08:00
Daniel Han	d84f723a82	Merge branch 'main' into nightly	2025-03-04 14:20:27 -08:00
Daniel Han	6a03ea2f81	Bug fixes	2025-03-04 14:19:56 -08:00
Daniel Han	6208d99833	Bug fix	2025-03-04 13:26:47 -08:00
Daniel Han	c813f2de1d	Bug fix	2025-03-04 04:22:23 -08:00
Daniel Han	d5c427adea	Bug fix	2025-03-04 04:12:38 -08:00
Daniel Han	3e5f061133	Bug fixes (#1891 ) * Update rl.py * Patching * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * NEFTune * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Extra replacements * Update rl_replacements.py * Update rl.py * extra RL replacements * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update _utils.py * Update loader_utils.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * autocast * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update pyproject.toml * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update 
llama.py * Update llama.py * GRPO optimized * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Selective Log softmax * Fix GRPO bsz * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Fix TRL * Metrics GRPO * Update rl_replacements.py * Update rl_replacements.py * No compile * Update rl.py * Remove docs * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649) * edit save.py to fix gguf saving breaks. * add check for .exe or not exe file extension for linux and windows * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * unsloth_num_chunks * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py (#1754) Fix typo in comment: know -> now. 
This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well. * Optional logits * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl_replacements.py * Update pyproject.toml * Update pyproject.toml * Export Model to ollama.com (#1648) * Ollama Export Model to ollama.com Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Check for model_name Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * subprocess use instead of requests \| added check for ollama server Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model \| fix Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Push to Ollama Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Update cross_entropy_loss.py * torch_cuda_device * Update utils.py * Update utils.py * Update utils.py * device * device * Update loader.py * Update llama.py * Update README.md * Update llama.py * Update llama.py * Update _utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * 
Update utils.py * Update utils.py * Update utils.py * Update utils.py * __version__ * Update rl.py * Bug fixes --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com> Co-authored-by: Seth Weidman <seth@sethweidman.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com> Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>	2025-03-04 03:55:49 -08:00
Daniel Han	94bda540e2	Bug fixes	2025-03-04 03:47:51 -08:00
Daniel Han	f22fe83726	Update rl.py	2025-03-04 03:31:38 -08:00
Daniel Han	607539435e	__version__	2025-03-04 02:57:54 -08:00
Daniel Han	e0c0b71831	Update utils.py	2025-03-04 02:38:44 -08:00
Daniel Han	0387f997f7	Update utils.py	2025-03-04 02:37:09 -08:00
Daniel Han	c38c1d52dd	Update utils.py	2025-03-04 02:35:19 -08:00
Daniel Han	963e876ea4	Update utils.py	2025-03-04 02:28:35 -08:00
Daniel Han	71735c342f	Update llama.py	2025-03-03 23:59:11 -08:00
Daniel Han	a4fa7f920f	Update llama.py	2025-03-03 23:58:58 -08:00
Daniel Han	63b7b34424	Update llama.py	2025-03-03 23:41:37 -08:00
Daniel Han	0460a67ed4	Update llama.py	2025-03-03 23:32:02 -08:00
Daniel Han	61a3daa667	Update llama.py	2025-03-03 23:00:26 -08:00
Michael Han	6491abfb78	Merge pull request #1885 from unslothai/shimmyshimmer-patch-6 Update README.md	2025-03-03 21:28:00 -08:00
Michael Han	c018ea28db	Update README.md	2025-03-03 21:27:20 -08:00
Daniel Han	9c3bb53a0a	Update utils.py	2025-03-03 18:46:16 -08:00
Daniel Han	baed76e0a5	Update utils.py	2025-03-03 18:33:18 -08:00
Daniel Han	2f9767887a	Update utils.py	2025-03-03 18:29:35 -08:00
Daniel Han	3671756291	Update utils.py	2025-03-03 17:27:59 -08:00
Daniel Han	806bf910fc	Update utils.py	2025-03-03 17:17:25 -08:00
Daniel Han	7e0bb36f9f	Update _utils.py	2025-03-03 17:12:56 -08:00
Daniel Han	ce708bef1c	Update llama.py	2025-03-03 15:49:04 -08:00
Daniel Han	6662fd652c	Update llama.py	2025-03-03 15:48:55 -08:00
Daniel Han	d7ffc09329	Update README.md	2025-03-03 14:58:30 -08:00
Daniel Han	643637dd5e	Update llama.py	2025-03-03 02:36:16 -08:00
Daniel Han	64ab4df808	Update loader.py	2025-03-03 02:30:53 -08:00
Daniel Han	391fe2907b	device	2025-03-03 00:04:08 -08:00
Daniel Han	37541f149a	device	2025-03-02 23:58:17 -08:00
Daniel Han	cb6318299d	Update utils.py	2025-03-02 23:43:02 -08:00
Daniel Han	ed45bd9cd3	Update utils.py	2025-03-02 23:41:35 -08:00
Daniel Han	0b69bf34cc	Update utils.py	2025-03-02 23:38:32 -08:00
Daniel Han	ed75ff330a	torch_cuda_device	2025-03-02 23:31:16 -08:00
Daniel Han	b73a2c39d7	Update cross_entropy_loss.py	2025-03-02 23:08:44 -08:00
Michael Han	e02561d883	Update README.md	2025-03-02 20:44:26 -08:00
Michael Han	8b5883275d	Update README.md	2025-03-02 20:35:27 -08:00
Michael Han	788563f8fe	Update README.md	2025-03-02 20:34:36 -08:00
Daniel Han	f47415973a	Merge branch 'main' into nightly	2025-03-02 20:28:24 -08:00
J. M Areeb Uzair	c6d2433547	Added Python version warning to Windows Install Section (#1872) I spent half a day on the wrong Python version, so I am adding this big, red sign.	2025-03-02 03:48:21 -08:00
Mohamed Mekkouri	8180e9803c	Fix Layernorm when num_cols not a power of 2 (#1867) * fix * Update layernorm.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-03-01 14:23:22 -08:00
Daniel Han	8a72e20aa5	Update vision.py	2025-03-01 13:47:01 -08:00
Daniel Han	4f166026d7	LoRA	2025-03-01 02:52:20 -08:00
Daniel Han	481e3e41b2	Update llama.py	2025-03-01 00:23:20 -08:00
Daniel Han	f4748c020d	Update _utils.py	2025-03-01 00:21:28 -08:00
Daniel Han	f3b5469213	Update granite.py	2025-03-01 00:20:34 -08:00
Daniel Han	6b735edbcb	Update granite.py	2025-03-01 00:19:12 -08:00
Daniel Han	6485cbb499	Prelim release	2025-03-01 00:13:11 -08:00
Aditya Ghai	08bc291300	Direct windows support for unsloth (#1841) * Direct Windows Support(main) * Update pyproject.toml * Update README.md Added the suggested changes to README	2025-02-27 20:25:46 -08:00
Daniel Han	841626c405	Update rl_replacements.py	2025-02-27 03:46:47 -08:00
Daniel Han	f55395b5f0	Update rl_replacements.py	2025-02-27 03:42:03 -08:00
Michael Han	569b4422c4	Update README.md	2025-02-26 17:03:47 -08:00
Michael Han	86aea0b4f8	Update README.md	2025-02-26 16:58:32 -08:00
Kareem	71d2a24575	fixed syntax warnings (#1522) * fixed most of syntax warnings * all syntaxwarnings fixed * Syntax fixes --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-02-26 03:50:37 -08:00
Charles London	96346a5f35	Fix key error in GRPOTrainer (#1818) * fix keyerror in GRPOTrainer * check for train in _metrics	2025-02-25 15:22:35 -08:00
Igor Kilbas	455517aae9	Fix: GRPO with Mistral and importing (#1831) * fix: mistral and importing * minor change * Style :) * Update mistral.py * Update mistral.py * Update mistral.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-02-25 15:21:26 -08:00
Jyotin Goel	c316ad8910	Export Model to ollama.com (#1648) * Ollama Export Model to ollama.com Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Check for model_name Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * subprocess use instead of requests | added check for ollama server Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * create_ollama_model | fix Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> * Push to Ollama Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in> --------- Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>	2025-02-22 02:37:01 -08:00
Michael Han	ab701257d6	Update README.md	2025-02-21 22:59:19 -08:00
Daniel Han	734a9a0611	Update _utils.py	2025-02-20 09:24:07 -08:00
Daniel Han	4570c8b41e	Bug Fixes (#1774 ) * Update rl.py * Update tokenizer_utils.py * Auto patching * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update tokenizer_utils.py * Update rl.py * Update rl.py * Update rl.py * max seq length * Update rl.py * Update rl.py * Patching * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * NEFTune * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Extra replacements * Update rl_replacements.py * Update rl.py * extra RL replacements * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update _utils.py * Update loader_utils.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * autocast * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update pyproject.toml * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update 
rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * GRPO optimized * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Selective Log softmax * Fix GRPO bsz * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Fix TRL * Metrics GRPO * Update rl_replacements.py * Update rl_replacements.py * No compile * Update rl.py * Remove docs * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649) * edit save.py to fix gguf saving breaks. 
* add check for .exe or not exe file extension for linux and windows * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * unsloth_num_chunks * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py (#1754) Fix typo in comment: know -> now. This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well. 
* Optional logits * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl_replacements.py * Update pyproject.toml * Update pyproject.toml --------- Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com> Co-authored-by: Seth Weidman <seth@sethweidman.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>	2025-02-20 09:20:49 -08:00
Daniel Han	b39a8605cc	Update pyproject.toml	2025-02-20 09:02:41 -08:00
Daniel Han	92ea46eae4	Update pyproject.toml	2025-02-20 08:43:46 -08:00
Daniel Han	0cf0a33c41	Update rl_replacements.py	2025-02-20 08:31:37 -08:00
Daniel Han	7a6e004288	Update rl_replacements.py	2025-02-20 08:28:21 -08:00
Daniel Han	3a06c5054c	Update _utils.py	2025-02-20 07:51:33 -08:00
Daniel Han	0d28429ba1	Update llama.py	2025-02-20 07:46:14 -08:00
Daniel Han	4dc880dae8	Update llama.py	2025-02-20 07:40:45 -08:00
Daniel Han	9c812cc72a	Update llama.py	2025-02-20 07:36:23 -08:00
Daniel Han	1a0f18e598	Update llama.py	2025-02-20 07:25:31 -08:00
Daniel Han	2c438b3b99	Update llama.py	2025-02-20 07:01:14 -08:00
Daniel Han	0345c8b1c3	Merge branch 'main' into nightly	2025-02-20 05:14:57 -08:00
Daniel Han	cd3a733602	bug fix	2025-02-20 04:47:12 -08:00
Daniel Han	a45a08f91b	Memory Efficient GRPO (#1773 ) * Update __init__.py * Update loader.py * Update rl.py * Update rl.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Better TRL handling * Update rl.py * Update tokenizer_utils.py * Auto patching * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update tokenizer_utils.py * Update rl.py * Update rl.py * Update rl.py * max seq length * Update rl.py * Update rl.py * Patching * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * NEFTune * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Extra replacements * Update rl_replacements.py * Update rl.py * extra RL replacements * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update _utils.py * Update loader_utils.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * autocast * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * 
Update llama.py * Update pyproject.toml * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * GRPO optimized * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Selective Log softmax * Fix GRPO bsz * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Fix TRL * Metrics GRPO * Update rl_replacements.py * Update rl_replacements.py * No compile * Update rl.py * Remove docs * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649) * edit save.py to fix gguf saving breaks. 
* add check for .exe or not exe file extension for linux and windows * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * unsloth_num_chunks * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py (#1754) Fix typo in comment: know -> now. This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well. 
* Optional logits * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * SamplingParams * Convert mask to float (#1762) * [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs * vLLMSamplingParams * Update __init__.py * default num_chunks == -1 * Versioning --------- Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com> Co-authored-by: Seth Weidman <seth@sethweidman.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>	2025-02-20 04:23:28 -08:00
Daniel Han	89711bf4f8	Versioning	2025-02-20 04:22:17 -08:00
Daniel Han	3a0fb38744	default num_chunks == -1	2025-02-19 23:51:06 -08:00
Daniel Han	50acb0f4f8	Update __init__.py	2025-02-19 23:45:07 -08:00
Daniel Han	940bce0b04	vLLMSamplingParams	2025-02-19 23:43:52 -08:00
Daniel Han	ad46d1a4a7	Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly	2025-02-19 23:40:50 -08:00
Ben	b8a2ceca14	[Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753) * Add latest xformers * Add a couple of lines to docs	2025-02-19 23:40:07 -08:00
Edd	16e69efc96	Convert mask to float (#1762)	2025-02-19 23:38:48 -08:00
Daniel Han	c27074c28b	SamplingParams	2025-02-19 23:37:52 -08:00
Nino Risteski	1b329a6731	fix an import error (#1767) * fix an import error * Delete .gitignore * Update loader.py * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-02-19 23:32:25 -08:00
Daniel Han	e9c0591119	Update rl.py	2025-02-19 23:27:16 -08:00
Daniel Han	f3488a88c5	Merge branch 'main' into nightly	2025-02-19 23:24:21 -08:00
Daniel Han	fbe9ee80d4	Update README.md (#1768)	2025-02-19 23:24:05 -08:00
Daniel Han	55e8c086dd	Update rl.py	2025-02-19 23:23:39 -08:00
Daniel Han	4fdbe839ea	Update rl.py	2025-02-19 23:15:13 -08:00
Daniel Han	b4bd52f978	Update rl.py	2025-02-19 23:06:18 -08:00
Daniel Han	93804d9998	Update rl_replacements.py	2025-02-19 22:03:47 -08:00
Daniel Han	fe3c84d250	Update rl.py	2025-02-19 21:41:17 -08:00
Daniel Han	4dfbbb2b9a	Update rl.py	2025-02-19 17:58:25 -08:00
Daniel Han	90158637d3	Update rl.py	2025-02-19 16:40:37 -08:00
Daniel Han	5129cf3049	Update rl.py	2025-02-19 16:38:44 -08:00
Daniel Han	98ad241821	Update rl.py	2025-02-19 16:37:12 -08:00
Daniel Han	09d8c9b2ca	Update rl.py	2025-02-19 12:51:22 -08:00
Daniel Han	e06905724d	Update rl.py	2025-02-19 12:47:15 -08:00
Daniel Han	dc04a82307	Update rl.py	2025-02-19 03:41:51 -08:00
Daniel Han	3797412198	Optional logits	2025-02-19 02:23:00 -08:00
Seth Weidman	c2938ccc83	Update rl_replacements.py (#1754) Fix typo in comment: know -> now. This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well.	2025-02-19 02:12:07 -08:00
Daniel Han	b7a53c37bd	Update rl_replacements.py	2025-02-18 01:57:18 -08:00
Daniel Han	9eb8e34a15	Update rl_replacements.py	2025-02-18 00:17:07 -08:00
Daniel Han	ab27ddc6a3	Update rl.py	2025-02-18 00:13:26 -08:00
Daniel Han	6148ce8d46	Update rl.py	2025-02-18 00:09:11 -08:00
Daniel Han	79141331f1	Update rl.py	2025-02-18 00:05:40 -08:00
Daniel Han	16f0cc2214	Update rl.py	2025-02-18 00:01:09 -08:00
Daniel Han	7a8ae1f272	Update rl.py	2025-02-17 23:57:57 -08:00
Daniel Han	2c81afe484	Update rl_replacements.py	2025-02-17 23:47:52 -08:00
Daniel Han	d6325aa94d	Update rl_replacements.py	2025-02-17 22:30:20 -08:00
Daniel Han	6741b050d7	Update rl_replacements.py	2025-02-17 22:30:13 -08:00
Daniel Han	99b56e4193	Update rl.py	2025-02-17 22:24:57 -08:00
Daniel Han	6688732d2e	unsloth_num_chunks	2025-02-17 22:17:11 -08:00
Daniel Han	cc523685ae	Update rl_replacements.py	2025-02-17 21:17:58 -08:00
Daniel Han	df9c98bc14	Update rl_replacements.py	2025-02-17 21:10:26 -08:00
Daniel Han	afb60b7772	Update rl_replacements.py	2025-02-17 21:08:32 -08:00
Daniel Han	cb8a2a5550	Update rl_replacements.py	2025-02-17 21:03:44 -08:00
Daniel Han	bd3bd2103d	Update rl_replacements.py	2025-02-17 20:38:18 -08:00
Daniel Han	b6f473b804	Update rl_replacements.py	2025-02-17 20:21:20 -08:00
Daniel Han	d05e13c70c	Update rl.py	2025-02-17 20:08:15 -08:00
Daniel Han	c9b15eb00f	Update rl.py	2025-02-17 20:04:26 -08:00
Daniel Han	d5002e1ebf	Update rl_replacements.py	2025-02-17 20:04:14 -08:00
Daniel Han	798f8daaf1	Update rl.py	2025-02-17 20:00:04 -08:00
Daniel Han	c2917b06c7	Update rl.py	2025-02-17 19:58:17 -08:00
Daniel Han	53e14f4b2d	Update rl_replacements.py	2025-02-17 19:53:49 -08:00
Daniel Han	fe483b6210	Update rl_replacements.py	2025-02-17 19:45:07 -08:00
Daniel Han	86512dd59f	Update rl_replacements.py	2025-02-17 19:44:43 -08:00
Daniel Han	74f9c9e1fd	Update llama.py	2025-02-17 19:16:41 -08:00
Daniel Han	397a0e49d5	Update llama.py	2025-02-17 00:49:35 -08:00
Daniel Han	f68ab5ad84	Update rl_replacements.py	2025-02-17 00:43:11 -08:00
Daniel Han	518ecd6638	Update rl_replacements.py	2025-02-17 00:05:32 -08:00
Daniel Han	da456076d3	Update rl_replacements.py	2025-02-16 23:49:20 -08:00
Daniel Han	759d23c7a4	Update llama.py	2025-02-16 21:22:35 -08:00
Daniel Han	74463e92d9	Update rl_replacements.py	2025-02-16 21:06:47 -08:00
Daniel Han	d4e9e38dfe	Update rl_replacements.py	2025-02-16 20:55:26 -08:00
Daniel Han	f665fb243d	Update rl_replacements.py	2025-02-16 20:55:11 -08:00
Daniel Han	834d3d69a1	Update rl_replacements.py	2025-02-16 20:20:01 -08:00
Daniel Han	3db0f8a0b3	Update rl_replacements.py	2025-02-16 19:47:39 -08:00
Daniel Han	ccc609fa43	Update rl_replacements.py	2025-02-16 18:34:45 -08:00
Daniel Han	0174dfc14f	Update rl_replacements.py	2025-02-16 18:27:04 -08:00
Daniel Han	bdf52260d4	Update rl_replacements.py	2025-02-16 18:09:03 -08:00
Daniel Han	435552fc8a	Update rl_replacements.py	2025-02-16 17:13:24 -08:00
Daniel Han	4914c563e4	Update rl_replacements.py	2025-02-16 16:14:21 -08:00
Gennadii Manzhos	3584c1b855	llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649) * edit save.py to fix gguf saving breaks. * add check for .exe or not exe file extension for linux and windows	2025-02-16 02:04:08 -08:00
Daniel Han	a841d358a9	Update rl_replacements.py	2025-02-16 01:49:29 -08:00
Daniel Han	2cc8a54f7a	Update rl_replacements.py	2025-02-15 20:04:35 -08:00
Daniel Han	bfb3494b64	Update rl.py	2025-02-15 18:35:07 -08:00
Daniel Han	e86e739087	Update rl.py	2025-02-15 18:34:25 -08:00
Daniel Han	11e9251e89	Update rl_replacements.py	2025-02-15 18:06:12 -08:00
Daniel Han	780e02656b	Update rl.py	2025-02-15 18:03:57 -08:00
Daniel Han	d056dd5c80	Update rl.py	2025-02-15 18:00:08 -08:00
Daniel Han	a87a0a6193	Update rl.py	2025-02-15 17:57:47 -08:00
Daniel Han	28fbb67281	Update rl.py	2025-02-15 17:48:52 -08:00
Daniel Han	cb43671874	Remove docs	2025-02-15 17:36:18 -08:00
Daniel Han	57e6b9ddcc	Update rl.py	2025-02-15 16:45:57 -08:00
Daniel Han	630e4258e6	No compile	2025-02-15 16:45:25 -08:00
Daniel Han	6eb707f48d	Merge branch 'main' into nightly	2025-02-15 03:12:52 -08:00
Daniel Han	66a1345421	Fix weird tokenizer issue	2025-02-15 03:12:43 -08:00
Daniel Han	895c7ca320	Update mapper.py	2025-02-15 02:48:36 -08:00
Daniel Han	c5e9299a73	Add GRPO metrics (#1718 ) * Update llama.py * Update llama.py * Faster inference? * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update mapper.py * Fast Inference via vLLM * Update llama.py * Update llama.py * Update utils.py * Create rl.py * PatchRL * Update rl.py * Update rl.py * Update rl.py * PatchRLStatistics * Update rl.py * Update rl.py * Update rl.py * Update utils.py * Update utils.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * RL metrics * Update rl.py * RL metrics * Update __init__.py * Update rl.py * Update rl.py * Update rl.py * Update chat_templates.py * Update mapper.py * Fp8 cache * Update llama.py * Update llama.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update __init__.py * Update loader.py * Update rl.py * Update rl.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Better TRL handling * Update rl.py * Update tokenizer_utils.py * Auto patching * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update 
rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update tokenizer_utils.py * Update rl.py * Update rl.py * Update rl.py * max seq length * Update rl.py * Update rl.py * Patching * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * NEFTune * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Extra replacements * Update rl_replacements.py * Update rl.py * extra RL replacements * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update _utils.py * Update loader_utils.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * autocast * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update pyproject.toml * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * 
Update rl.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * GRPO optimized * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Selective Log softmax * Fix GRPO bsz * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Fix TRL * Metrics GRPO * Update rl_replacements.py * Update rl_replacements.py	2025-02-15 02:24:01 -08:00
Daniel Han	49321fb88e	Update rl_replacements.py	2025-02-15 02:17:26 -08:00
Daniel Han	b014f87c5c	Update rl_replacements.py	2025-02-15 02:12:49 -08:00
Daniel Han	7d1e6ae263	Metrics GRPO	2025-02-15 02:08:33 -08:00
Daniel Han	6d385c6f92	Merge branch 'main' into nightly	2025-02-15 01:56:40 -08:00
Daniel Han	c66350a48a	Memory efficient GRPO, DPO etc (#1716 ) * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Faster inference? * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update mapper.py * Fast Inference via vLLM * Update llama.py * Update llama.py * Update utils.py * Create rl.py * PatchRL * Update rl.py * Update rl.py * Update rl.py * PatchRLStatistics * Update rl.py * Update rl.py * Update rl.py * Update utils.py * Update utils.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * RL metrics * Update rl.py * RL metrics * Update __init__.py * Update rl.py * Update rl.py * Update rl.py * Update chat_templates.py * Update mapper.py * Fp8 cache * Update llama.py * Update llama.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update __init__.py * Update loader.py * Update rl.py * Update rl.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Better TRL handling * Update rl.py * Update tokenizer_utils.py * Auto patching * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update 
tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update tokenizer_utils.py * Update rl.py * Update rl.py * Update rl.py * max seq length * Update rl.py * Update rl.py * Patching * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * NEFTune * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Extra replacements * Update rl_replacements.py * Update rl.py * extra RL replacements * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update _utils.py * Update loader_utils.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * autocast * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update pyproject.toml * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py * Update llama.py 
* Update _utils.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * GRPO optimized * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Selective Log softmax * Fix GRPO bsz * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Fix TRL	2025-02-15 01:19:39 -08:00
Daniel Han	a7f32f3412	Fix TRL	2025-02-15 01:13:41 -08:00
Daniel Han	40cc4cbe05	Update rl_replacements.py	2025-02-14 16:08:49 -08:00
Daniel Han	b1e963b2ea	Update rl_replacements.py	2025-02-14 16:03:13 -08:00
Daniel Han	02b45397d6	Update rl_replacements.py	2025-02-14 16:01:29 -08:00
Daniel Han	caca33f401	Update rl_replacements.py	2025-02-14 15:58:13 -08:00
Daniel Han	ee672a214e	Update rl.py	2025-02-14 15:56:05 -08:00
Daniel Han	294037324f	Fix GRPO bsz	2025-02-14 15:32:02 -08:00
Daniel Han	4b385df264	Selective Log softmax	2025-02-14 14:56:37 -08:00
Daniel Han	a204e6ecb9	Update rl_replacements.py	2025-02-14 04:53:06 -08:00
Daniel Han	9adbd6909b	Update rl_replacements.py	2025-02-14 04:49:41 -08:00
Daniel Han	6de69db2be	Update rl_replacements.py	2025-02-14 04:45:48 -08:00
Daniel Han	10c359f231	Update rl.py	2025-02-14 04:44:03 -08:00
Daniel Han	5a9f9b7f24	Update rl.py	2025-02-14 04:42:05 -08:00
Daniel Han	a142182cc9	Update rl.py	2025-02-14 04:38:03 -08:00
Daniel Han	409762bd19	Update rl.py	2025-02-14 04:35:03 -08:00
Daniel Han	546be40339	Update rl_replacements.py	2025-02-14 04:33:41 -08:00
Daniel Han	c1d028ced8	Update rl_replacements.py	2025-02-14 04:32:24 -08:00
Daniel Han	79f345d484	Update rl.py	2025-02-14 04:31:27 -08:00
Daniel Han	194c1869b9	GRPO optimized	2025-02-14 04:30:15 -08:00
Daniel Han	dc2ef0b255	Merge branch 'main' into nightly	2025-02-13 22:18:26 -08:00
Daniel Han	3136fd9611	Fix bugs (#1706 ) * Bug fixes * fix: flash_attn_detection_error (#1556) * fix: flash_attn_detection_error * Update _utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mapper.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * dim fix * Update _utils.py * Torch 2.6 support * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Faster inference? * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update mapper.py * Fast Inference via vLLM * Update llama.py * Update llama.py * Update utils.py * Create rl.py * PatchRL * Update rl.py * Update rl.py * Update rl.py * PatchRLStatistics * Update rl.py * Update rl.py * Update rl.py * Update utils.py * Update utils.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * RL metrics * Update rl.py * RL metrics * Update __init__.py * Update rl.py * Update rl.py * Update rl.py * Update chat_templates.py * Update mapper.py * Fp8 cache * Update llama.py * Update llama.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update __init__.py * Update loader.py * Update rl.py * Update rl.py * Update _utils.py * Update tokenizer_utils.py * Update 
tokenizer_utils.py * Better TRL handling * Update rl.py * Update tokenizer_utils.py * Auto patching * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update tokenizer_utils.py * Update rl.py * Update rl.py * Update rl.py * max seq length * Update rl.py * Update rl.py * Patching * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * NEFTune * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Extra replacements * Update rl_replacements.py * Update rl.py * extra RL replacements * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update _utils.py * Update loader_utils.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * autocast * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update pyproject.toml * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update 
rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py * Update llama.py * Update _utils.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py --------- Co-authored-by: Zhe Zhang <2631992879@qq.com>	2025-02-13 19:12:19 -08:00
Daniel Han	4d1de7af9e	Update llama.py	2025-02-13 19:11:38 -08:00
Daniel Han	4377b396c3	Update llama.py	2025-02-13 17:25:12 -08:00
Daniel Han	4dbfdcebbd	Update llama.py	2025-02-13 17:22:39 -08:00
Daniel Han	a1db8042cb	Update llama.py	2025-02-13 17:18:42 -08:00
Daniel Han	90375db93b	Update rl_replacements.py	2025-02-13 17:15:29 -08:00
Daniel Han	ee1c2b4abd	Update llama.py	2025-02-13 17:12:06 -08:00
Daniel Han	c51e4e2d21	Update llama.py	2025-02-13 17:11:33 -08:00
Daniel Han	e20a459343	Update llama.py	2025-02-13 17:05:04 -08:00
Daniel Han	c17bf7e04c	Update llama.py	2025-02-13 17:00:14 -08:00
Daniel Han	dccd3999f3	Update rl.py	2025-02-13 16:47:27 -08:00
Daniel Han	8a5b163fe3	Update rl.py	2025-02-13 16:38:21 -08:00
Daniel Han	0651fe19ab	Update rl.py	2025-02-13 16:37:05 -08:00
Daniel Han	5346a5f96a	Update rl.py	2025-02-13 16:35:09 -08:00
Daniel Han	b9c4ab96cb	Update rl.py	2025-02-13 16:27:23 -08:00
Daniel Han	4190857458	Update rl_replacements.py	2025-02-13 16:27:02 -08:00
Daniel Han	144b9b9e53	Update _utils.py	2025-02-13 16:23:51 -08:00
Daniel Han	8ae57e25a7	Update llama.py	2025-02-13 16:11:51 -08:00
Daniel Han	834c3a4492	Merge branch 'main' into nightly	2025-02-13 15:14:35 -08:00
Daniel Han	4f9301d321	Update _utils.py	2025-02-13 15:14:17 -08:00
Daniel Han	26f8d8580b	Update pyproject.toml	2025-02-13 15:13:55 -08:00
Daniel Han	bf2ee8eed2	Merge branch 'main' into nightly	2025-02-13 15:02:56 -08:00
Daniel Han	39eaefce14	Update rl.py	2025-02-13 15:02:50 -08:00
Daniel Han	3ad4076e4b	Merge branch 'main' into nightly	2025-02-13 14:59:59 -08:00
Daniel Han	635e921506	Update _utils.py	2025-02-13 14:59:52 -08:00
Daniel Han	5c19724e7f	Update dpo.py	2025-02-13 14:59:42 -08:00
Daniel Han	f11079ba65	Merge branch 'main' into nightly	2025-02-13 14:57:58 -08:00
Daniel Han	43cda3240c	Update __init__.py	2025-02-13 14:55:14 -08:00
Daniel Han	010de17c90	Update _utils.py	2025-02-13 14:54:49 -08:00
Daniel Han	95fb1d699d	Fix bugs (#1701 ) * Phi 4 * Update llama.py * Torch.Cuda Is Available Condition and Warning (#1545) * check for torch.cuda and triton if available on my machine(mac m3) the cuda were not available * Update pyproject.toml * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update mistral.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix * Bug fixes * Update mapper.py * Add dropout to granite to match HF's implementation (#1557) Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> * Update llama.py * Update llama.py * Bug fixes * fix: flash_attn_detection_error (#1556) * fix: flash_attn_detection_error * Update _utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mapper.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * dim fix * Update _utils.py * Torch 2.6 support * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Faster inference? 
* Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update mapper.py * Fast Inference via vLLM * Update llama.py * Update llama.py * Update utils.py * Create rl.py * PatchRL * Update rl.py * Update rl.py * Update rl.py * PatchRLStatistics * Update rl.py * Update rl.py * Update rl.py * Update utils.py * Update utils.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * RL metrics * Update rl.py * RL metrics * Update __init__.py * Update rl.py * Update rl.py * Update rl.py * Update chat_templates.py * Update mapper.py * Fp8 cache * Update llama.py * Update llama.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update __init__.py * Update loader.py * Update rl.py * Update rl.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Better TRL handling * Update rl.py * Update tokenizer_utils.py * Auto patching * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update rl.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update 
rl.py * Update tokenizer_utils.py * Update rl.py * Update rl.py * Update rl.py * max seq length * Update rl.py * Update rl.py * Patching * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * NEFTune * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Extra replacements * Update rl_replacements.py * Update rl.py * extra RL replacements * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update _utils.py * Update loader_utils.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * autocast * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update pyproject.toml * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update llama.py * Update _utils.py --------- Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> Co-authored-by: AminWhat <88392440+aminwhat@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Zhe 
Zhang <2631992879@qq.com>	2025-02-13 14:50:44 -08:00
Daniel Han	44e2879efb	Update _utils.py	2025-02-13 14:47:54 -08:00
Daniel Han	50568f1c15	Update llama.py	2025-02-13 14:42:38 -08:00
Daniel Han	c59c2738d9	Merge branch 'main' into nightly	2025-02-13 14:40:49 -08:00
Daniel Han	b1d2c9fb30	Update rl_replacements.py	2025-02-13 04:44:21 -08:00
Daniel Han	de7e76c250	Update rl_replacements.py	2025-02-13 04:40:53 -08:00
Daniel Han	1013ab5891	Update rl_replacements.py	2025-02-13 04:40:42 -08:00
Daniel Han	a0ed33d99a	Update rl_replacements.py	2025-02-13 04:34:13 -08:00
Daniel Han	dffbba5ff4	Update rl_replacements.py	2025-02-13 04:25:56 -08:00
Daniel Han	8a7a6c1a8e	Update rl_replacements.py	2025-02-13 02:06:36 -08:00
Daniel Han	ef9ad91e36	Update rl_replacements.py	2025-02-13 02:04:11 -08:00
Daniel Han	ca8080290c	Update rl_replacements.py	2025-02-13 02:01:51 -08:00
Daniel Han	386cb81c23	Update llama.py	2025-02-13 01:57:45 -08:00
Daniel Han	b181a8ff2b	Update rl_replacements.py	2025-02-13 01:52:07 -08:00
Daniel Han	5cd34b0b95	Update rl_replacements.py	2025-02-13 01:48:38 -08:00
Daniel Han	d594ba5812	Update rl_replacements.py	2025-02-13 01:44:00 -08:00
Daniel Han	c9da968975	Update rl_replacements.py	2025-02-13 01:42:55 -08:00
Michael Han	28cb38675f	Merge pull request #1688 from unslothai/shimmyshimmer-patch-5 Update README.md	2025-02-13 01:14:23 -08:00
Michael Han	6097db77bb	Update README.md	2025-02-13 01:14:06 -08:00
Daniel Han	53653128ff	Update llama.py	2025-02-13 00:29:34 -08:00
Daniel Han	3ddad518b6	Update llama.py	2025-02-13 00:26:58 -08:00
Daniel Han	8a35445c23	Update llama.py	2025-02-13 00:10:44 -08:00
Daniel Han	005a397436	Update llama.py	2025-02-12 23:03:40 -08:00
Daniel Han	7d881ae18a	Update llama.py	2025-02-12 23:00:51 -08:00
Daniel Han	a94e37db16	Update llama.py	2025-02-12 22:56:47 -08:00
Daniel Han	310a40b3f9	Update llama.py	2025-02-12 22:34:50 -08:00
Daniel Han	85bb48288a	Update pyproject.toml	2025-02-12 21:16:44 -08:00
Daniel Han	df43bdb8bf	Update llama.py	2025-02-12 21:06:33 -08:00
Daniel Han	eb5a14617f	Update llama.py	2025-02-12 20:33:37 -08:00
Daniel Han	318a59f6b4	Update llama.py	2025-02-12 20:29:10 -08:00
Daniel Han	35bad803b5	Update llama.py	2025-02-12 20:24:48 -08:00
Daniel Han	e270294550	Update rl_replacements.py	2025-02-12 20:18:07 -08:00
Daniel Han	3f500fdd12	Update llama.py	2025-02-12 19:11:09 -08:00
Daniel Han	56dffe007c	Update llama.py	2025-02-12 19:10:48 -08:00
Daniel Han	2f25ff2698	Update llama.py	2025-02-12 19:07:50 -08:00
Daniel Han	46007979eb	Update llama.py	2025-02-12 19:01:23 -08:00
Daniel Han	43458dafd4	Update llama.py	2025-02-12 18:57:01 -08:00
Daniel Han	2b7655527b	Update rl_replacements.py	2025-02-12 18:50:45 -08:00
Daniel Han	07616241a0	Update llama.py	2025-02-12 18:44:47 -08:00
Daniel Han	77f40b1ed9	Update rl_replacements.py	2025-02-12 16:23:47 -08:00
Daniel Han	5d040c70a7	Update rl_replacements.py	2025-02-12 16:19:34 -08:00
Daniel Han	2b360989c6	Update rl_replacements.py	2025-02-12 16:19:13 -08:00
Daniel Han	956ccb79e3	Update rl_replacements.py	2025-02-12 16:16:31 -08:00
Daniel Han	7681bff612	Update llama.py	2025-02-12 03:56:12 -08:00
Daniel Han	20acdcef31	Update rl_replacements.py	2025-02-12 03:50:32 -08:00
Daniel Han	0ddf688a56	autocast	2025-02-12 03:50:07 -08:00
Daniel Han	6ebf29b9fe	Update llama.py	2025-02-12 03:32:12 -08:00
Daniel Han	1f4b8e0c9c	Update llama.py	2025-02-12 03:24:32 -08:00
Daniel Han	4a08c6fc32	Update llama.py	2025-02-12 03:08:40 -08:00
Daniel Han	60ba876dc9	Update llama.py	2025-02-12 03:03:43 -08:00
Daniel Han	b88b77efce	Update rl.py	2025-02-12 02:27:34 -08:00
Daniel Han	771a6c95e3	Update rl_replacements.py	2025-02-12 01:53:16 -08:00
Daniel Han	c915b0ae2f	Update rl_replacements.py	2025-02-11 23:58:26 -08:00
Daniel Han	0e51ebdd58	Update rl.py	2025-02-11 23:47:33 -08:00
Daniel Han	bae1d69611	Update loader_utils.py	2025-02-11 23:45:26 -08:00
Daniel Han	08dea00cfb	Update _utils.py	2025-02-11 23:10:11 -08:00
Daniel Han	56d3ea2a7a	Update rl_replacements.py	2025-02-11 22:02:41 -08:00
Daniel Han	30adb81fc2	Update llama.py	2025-02-11 22:02:22 -08:00
Daniel Han	725c59bfd2	Update rl_replacements.py	2025-02-11 22:00:44 -08:00
Daniel Han	01e6c71d7c	Merge branch 'main' into nightly	2025-02-11 21:31:34 -08:00
Daniel Han	26e5d6ac08	Update rl_replacements.py	2025-02-11 21:31:23 -08:00
Daniel Han	b333b3064d	Update rl_replacements.py	2025-02-11 21:18:55 -08:00
Daniel Han	947649af63	Update rl_replacements.py	2025-02-11 21:16:56 -08:00
Daniel Han	f41d01e74d	Update rl_replacements.py	2025-02-11 21:14:41 -08:00
Daniel Han	b7b7213295	Update rl_replacements.py	2025-02-11 21:13:31 -08:00
Daniel Han	4bdd2ed59c	extra RL replacements	2025-02-11 21:10:32 -08:00
Daniel Han	fd48c77ff7	Update rl.py	2025-02-11 20:39:55 -08:00
Daniel Han	cf14867bc0	Update rl_replacements.py	2025-02-11 20:37:18 -08:00
Daniel Han	c4fdd39c08	Extra replacements	2025-02-11 20:35:34 -08:00
Daniel Han	44810d7876	Update rl.py	2025-02-11 19:34:41 -08:00
Daniel Han	fcebdb08bb	Update rl.py	2025-02-11 19:34:29 -08:00
Daniel Han	f24d897a29	Update rl.py	2025-02-11 19:00:53 -08:00
Daniel Han	275b836de1	Update rl.py	2025-02-11 18:57:35 -08:00
Daniel Han	e35dfacb2a	Update rl.py	2025-02-11 18:56:09 -08:00
Daniel Han	7b265dbe0c	Update rl.py	2025-02-11 18:54:39 -08:00
Daniel Han	2af746b1bd	Update rl.py	2025-02-11 18:49:09 -08:00
Daniel Han	7ea918d85b	NEFTune	2025-02-11 18:19:16 -08:00
Daniel Han	800536774a	Update rl.py	2025-02-11 16:20:14 -08:00
Daniel Han	af4cd27eb9	Update rl.py	2025-02-11 16:04:56 -08:00
Daniel Han	722c4ecca6	Update rl.py	2025-02-11 16:03:33 -08:00
Daniel Han	da96bd8374	Update rl.py	2025-02-11 15:57:32 -08:00
Daniel Han	7c3b51100f	Update rl.py	2025-02-11 15:53:46 -08:00
Daniel Han	aae24f5ae3	Patching	2025-02-11 15:11:16 -08:00
Daniel Han	875181b6d2	Update rl.py	2025-02-11 15:00:44 -08:00
Daniel Han	da7e35fd35	Update rl.py	2025-02-11 14:27:31 -08:00
Daniel Han	9680d0f73a	max seq length	2025-02-11 14:22:19 -08:00
Daniel Han	6fbba44ff0	Update rl.py	2025-02-11 14:11:33 -08:00
Daniel Han	766c71844f	Update rl.py	2025-02-11 03:25:37 -08:00
Daniel Han	f1a924c31f	Update rl.py	2025-02-11 03:23:05 -08:00
Daniel Han	c792fa4f20	Update tokenizer_utils.py	2025-02-11 03:21:59 -08:00
Daniel Han	9f1b839d34	Update rl.py	2025-02-11 03:20:41 -08:00
Daniel Han	6f6a544b5c	Update rl.py	2025-02-11 03:16:07 -08:00
Daniel Han	764d22e0bd	Update rl.py	2025-02-11 03:08:04 -08:00
Daniel Han	cd4778f78f	Update rl.py	2025-02-11 03:05:44 -08:00
Daniel Han	edb2a62bdb	Update rl.py	2025-02-11 03:04:05 -08:00
Daniel Han	cf4ccb543d	Update rl.py	2025-02-11 01:39:51 -08:00
Daniel Han	815ef563e4	Update rl.py	2025-02-11 01:36:57 -08:00
Daniel Han	9a64373b27	Update rl.py	2025-02-11 01:36:30 -08:00
Daniel Han	be18ce5db9	Update rl.py	2025-02-11 01:33:43 -08:00
Daniel Han	0b61f8e86e	Update tokenizer_utils.py	2025-02-11 01:30:02 -08:00
Daniel Han	835dab9903	Update tokenizer_utils.py	2025-02-11 01:28:28 -08:00
Daniel Han	d57f8a36ec	Update tokenizer_utils.py	2025-02-11 01:27:50 -08:00
Daniel Han	95029e4163	Update tokenizer_utils.py	2025-02-11 01:25:17 -08:00
Daniel Han	3261dffeed	Update tokenizer_utils.py	2025-02-11 01:22:58 -08:00
Daniel Han	3ded920491	Update tokenizer_utils.py	2025-02-11 01:17:13 -08:00
Daniel Han	afbad75e20	Update tokenizer_utils.py	2025-02-11 01:14:06 -08:00
Daniel Han	b3bbe3d3f9	Update tokenizer_utils.py	2025-02-11 00:42:47 -08:00
Daniel Han	12cccc52c9	Update rl.py	2025-02-11 00:37:08 -08:00
Daniel Han	0288ca825f	Update tokenizer_utils.py	2025-02-11 00:36:12 -08:00
Daniel Han	9d6ad6b400	Update rl.py	2025-02-11 00:24:31 -08:00
Daniel Han	07f39e0b05	Update tokenizer_utils.py	2025-02-11 00:23:24 -08:00
Daniel Han	543e6d5ab3	Update tokenizer_utils.py	2025-02-11 00:22:02 -08:00
Daniel Han	a4cde480a9	Update tokenizer_utils.py	2025-02-11 00:06:08 -08:00
Daniel Han	2aa87bd8f1	Auto patching	2025-02-10 23:33:15 -08:00
Daniel Han	0767a5eccb	Update tokenizer_utils.py	2025-02-10 23:30:08 -08:00
Daniel Han	d3fefc2095	Update rl.py	2025-02-10 23:25:37 -08:00
Daniel Han	56921915f5	Better TRL handling	2025-02-10 23:24:36 -08:00
Michael Han	a5d1391d12	Merge pull request #1654 from unslothai/shimmyshimmer-patch-4 Update README.md	2025-02-09 19:57:27 -08:00
Michael Han	9807456b29	Update README.md	2025-02-09 19:57:15 -08:00
Daniel Han	d07aa0e4d3	Update tokenizer_utils.py	2025-02-09 19:21:58 -08:00
Daniel Han	766f9e5d47	Update tokenizer_utils.py	2025-02-09 19:19:51 -08:00
Daniel Han	a80d468199	Merge branch 'main' into nightly	2025-02-09 19:06:32 -08:00
Diogo Neves	36c3d36e74	Fixed Triton url (#1607) Triton's link was pointing to the old research url	2025-02-08 19:41:39 -08:00
Daniel Han	bc7897805b	Merge branch 'main' into nightly	2025-02-07 00:53:37 -08:00
Michael Han	74fce13683	Update README.md	2025-02-06 17:20:19 -08:00
Daniel Han	cd52ac2e16	GRPO Bug fixes (#1623 ) * use exact model name * Update save.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * print * Update _utils.py * Update _utils.py * Update llama.py * Update _utils.py * Update vision.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * accurate_accumulation * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update pyproject.toml * Update __init__.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Fix Triton heuristics https://github.com/triton-lang/triton/issues/5224 * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Xformers * Update loader.py * Update loader.py * Rewind * Update _utils.py * Update _utils.py * requires grad * Update loader.py * Update _utils.py * Update loader.py * changing model to base_model if peft model is already used * Improve debugging experience (#1512) * Create CONTRIBUTING.md (#1472) Creating contributing guidelines * Update CONTRIBUTING.md improved sentence * Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> * Update loader.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a8edd0931a`. 
* Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Auto change is_bfloat16_supported * Update llama.py * Force data-type * Update llama.py * All attention refactor fix (#1491) * change initilization of n_heads, n_kv_heads, hidden_size in llama.py * do the same for cohere, mistral, gemma2, granite * do the same for flexattention,cohere, mistral, granite * Update llama.py * Update llama.py * Update granite to work with latest post_patch methods (#1502) * Update granite to work with latest post_patch methods * Pass position_embeddings for granite even if transformers<4.47 * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Minor fixes for granite models (#1503) * Update granite.py Grab residual multiplier directly from layer * Update llama.py Version should read >= 4.47.1 as that is the version requiring the changes * Update granite.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * support modelscope models and datasets (#1481) * support modelscope * change modelscope args * remove useless import * remove useless import * fix * wip * fix * remove useless code * add readme * add some comments * change print to raise error * update comment * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Merge branch 'main' into nightly * Phi 4 * Update llama.py * Torch.Cuda Is Available Condition and Warning (#1545) * check for torch.cuda and triton if available on my machine(mac m3) the cuda were not available * Update pyproject.toml * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update mistral.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix * Bug fixes * Update mapper.py * Add dropout to 
granite to match HF's implementation (#1557) Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> * Update llama.py * Update llama.py * Bug fixes * fix: flash_attn_detection_error (#1556) * fix: flash_attn_detection_error * Update _utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mapper.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * dim fix * Update _utils.py * Torch 2.6 support * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Faster inference? * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update mapper.py * Fast Inference via vLLM * Update llama.py * Update llama.py * Update utils.py * Create rl.py * PatchRL * Update rl.py * Update rl.py * Update rl.py * PatchRLStatistics * Update rl.py * Update rl.py * Update rl.py * Update utils.py * Update utils.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * RL metrics * Update rl.py * RL metrics * Update __init__.py * Update rl.py * Update rl.py * Update rl.py * Update chat_templates.py * Update mapper.py * Fp8 cache * Update llama.py * Update llama.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update __init__.py * Update 
loader.py * Update rl.py * Update rl.py * Update _utils.py --------- Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> Co-authored-by: Itsuro Tajima <tajima@georepublic.de> Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Z <coffeevampirebusiness@gmail.com> Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com> Co-authored-by: AminWhat <88392440+aminwhat@users.noreply.github.com> Co-authored-by: Zhe Zhang <2631992879@qq.com>	2025-02-06 05:08:22 -08:00
Daniel Han	10b604431a	Update _utils.py	2025-02-06 05:07:37 -08:00
Daniel Han	d437235d15	Update rl.py	2025-02-06 04:29:04 -08:00
Daniel Han	0974094f02	Update rl.py	2025-02-06 04:23:20 -08:00
Daniel Han	7854bc2cb8	Merge branch 'main' into nightly	2025-02-06 03:49:16 -08:00
Daniel Han	c8e4d4b767	Update	2025-02-06 03:23:54 -08:00
Daniel Han	5d6275c957	Update _utils.py	2025-02-06 02:44:46 -08:00
Daniel Han	e288d96272	Update pyproject.toml	2025-02-06 02:44:23 -08:00
Daniel Han	144190bd06	GRPO, vLLM, Bug Fixes, Reinforcement Learning (#1620 ) * use exact model name * Update save.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * print * Update _utils.py * Update _utils.py * Update llama.py * Update _utils.py * Update vision.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * accurate_accumulation * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update pyproject.toml * Update __init__.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Fix Triton heuristics https://github.com/triton-lang/triton/issues/5224 * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Xformers * Update loader.py * Update loader.py * Rewind * Update _utils.py * Update _utils.py * requires grad * Update loader.py * Update _utils.py * Update loader.py * changing model to base_model if peft model is already used * Improve debugging experience (#1512) * Create CONTRIBUTING.md (#1472) Creating contributing guidelines * Update CONTRIBUTING.md improved sentence * Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> * Update loader.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a8edd0931a`. 
* Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Auto change is_bfloat16_supported * Update llama.py * Force data-type * Update llama.py * All attention refactor fix (#1491) * change initilization of n_heads, n_kv_heads, hidden_size in llama.py * do the same for cohere, mistral, gemma2, granite * do the same for flexattention,cohere, mistral, granite * Update llama.py * Update llama.py * Update granite to work with latest post_patch methods (#1502) * Update granite to work with latest post_patch methods * Pass position_embeddings for granite even if transformers<4.47 * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Minor fixes for granite models (#1503) * Update granite.py Grab residual multiplier directly from layer * Update llama.py Version should read >= 4.47.1 as that is the version requiring the changes * Update granite.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * support modelscope models and datasets (#1481) * support modelscope * change modelscope args * remove useless import * remove useless import * fix * wip * fix * remove useless code * add readme * add some comments * change print to raise error * update comment * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Merge branch 'main' into nightly * Phi 4 * Update llama.py * Torch.Cuda Is Available Condition and Warning (#1545) * check for torch.cuda and triton if available on my machine(mac m3) the cuda were not available * Update pyproject.toml * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update mistral.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix * Bug fixes * Update mapper.py * Add dropout to 
granite to match HF's implementation (#1557) Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> * Update llama.py * Update llama.py * Bug fixes * fix: flash_attn_detection_error (#1556) * fix: flash_attn_detection_error * Update _utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mapper.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * dim fix * Update _utils.py * Torch 2.6 support * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Faster inference? * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update mapper.py * Fast Inference via vLLM * Update llama.py * Update llama.py * Update utils.py * Create rl.py * PatchRL * Update rl.py * Update rl.py * Update rl.py * PatchRLStatistics * Update rl.py * Update rl.py * Update rl.py * Update utils.py * Update utils.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * RL metrics * Update rl.py * RL metrics * Update __init__.py * Update rl.py * Update rl.py * Update rl.py * Update chat_templates.py * Update mapper.py * Fp8 cache * Update llama.py * Update llama.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update __init__.py * Update 
loader.py --------- Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> Co-authored-by: Itsuro Tajima <tajima@georepublic.de> Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Z <coffeevampirebusiness@gmail.com> Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com> Co-authored-by: AminWhat <88392440+aminwhat@users.noreply.github.com> Co-authored-by: Zhe Zhang <2631992879@qq.com>	2025-02-06 02:41:12 -08:00
Daniel Han	9dd1219358	Update loader.py	2025-02-06 02:36:42 -08:00
Daniel Han	7d9a7aaa3d	Update __init__.py	2025-02-06 02:32:34 -08:00
Daniel Han	79d2b0aea3	Update rl.py	2025-02-06 02:31:01 -08:00
Daniel Han	d6ff188cec	Update rl.py	2025-02-06 02:24:13 -08:00
Daniel Han	965dee5f5e	Update rl.py	2025-02-06 02:21:08 -08:00
Daniel Han	e71a8aa101	Update rl.py	2025-02-06 02:14:59 -08:00
Daniel Han	68da8b69b7	Update rl.py	2025-02-06 01:57:37 -08:00
Daniel Han	da966a4e1c	Update rl.py	2025-02-06 01:50:22 -08:00
Daniel Han	fc4555c288	Update rl.py	2025-02-06 01:47:44 -08:00
Daniel Han	e20efe32a3	Update rl.py	2025-02-06 01:16:43 -08:00
Daniel Han	10d65ea437	Update rl.py	2025-02-06 01:12:33 -08:00
Daniel Han	4724be861e	Update rl.py	2025-02-06 01:10:42 -08:00
Daniel Han	eb1f9b91f1	Update rl.py	2025-02-06 01:08:31 -08:00
Daniel Han	5c0ed78b9b	Update rl.py	2025-02-06 01:07:00 -08:00
Daniel Han	c998665f0b	Update rl.py	2025-02-06 01:06:40 -08:00
Daniel Han	c7f2ec3338	Update rl.py	2025-02-06 01:06:20 -08:00
Daniel Han	d52341f06b	Update rl.py	2025-02-06 01:05:48 -08:00
Daniel Han	71c34db351	Update rl.py	2025-02-06 01:02:31 -08:00
Daniel Han	32f09a14d6	Update rl.py	2025-02-06 00:59:13 -08:00
Daniel Han	b196ba85ca	Update llama.py	2025-02-05 20:30:36 -08:00
Daniel Han	6719a6134b	Update llama.py	2025-02-05 20:24:33 -08:00
Daniel Han	18de8f29c8	Fp8 cache	2025-02-05 20:00:02 -08:00
Daniel Han	b8a833e1af	Update mapper.py	2025-02-05 18:31:01 -08:00
Daniel Han	7cc03b2b51	Update chat_templates.py	2025-02-05 17:52:36 -08:00
Daniel Han	9455082740	Update rl.py	2025-02-05 15:36:59 -08:00
Daniel Han	1ff75a7dc7	Update rl.py	2025-02-05 15:21:53 -08:00
Daniel Han	c0805c415f	Update rl.py	2025-02-05 15:16:44 -08:00
Daniel Han	ba0a2871c2	Update __init__.py	2025-02-05 15:11:40 -08:00
Daniel Han	093cd0cde3	RL metrics	2025-02-05 15:08:10 -08:00
Daniel Han	cc6bb7d1db	Update rl.py	2025-02-05 15:02:52 -08:00
Daniel Han	8ce4a73bfc	RL metrics	2025-02-05 14:59:01 -08:00
Daniel Han	b7bd548779	Update rl.py	2025-02-05 07:28:16 -08:00
Daniel Han	9b76d49761	Update rl.py	2025-02-05 07:27:07 -08:00
Daniel Han	cc927d2d18	Update rl.py	2025-02-05 07:25:05 -08:00
Daniel Han	d7c3f9cba6	Update rl.py	2025-02-05 07:13:40 -08:00
Daniel Han	aebeeb4901	Update rl.py	2025-02-05 06:58:04 -08:00
Daniel Han	8506d6be95	Update rl.py	2025-02-05 06:56:12 -08:00
Daniel Han	e59e196448	Update rl.py	2025-02-05 06:50:39 -08:00
Daniel Han	7e7fc35625	Update rl.py	2025-02-05 06:50:14 -08:00
Daniel Han	6aa0fc7e28	Update rl.py	2025-02-05 06:48:54 -08:00
Daniel Han	a6f919f60c	Update rl.py	2025-02-05 06:44:18 -08:00
Daniel Han	eed2ac7329	Update rl.py	2025-02-05 06:41:37 -08:00
Daniel Han	bfe87a51bf	Update rl.py	2025-02-05 06:37:32 -08:00
Daniel Han	0aa4c035e8	Update rl.py	2025-02-05 06:32:51 -08:00
Daniel Han	ea55289b5a	Update rl.py	2025-02-05 06:28:54 -08:00
Daniel Han	dc7b58bad3	Update rl.py	2025-02-05 06:14:12 -08:00
Daniel Han	648efd0525	Update utils.py	2025-02-05 06:02:42 -08:00
Daniel Han	e7a1f0458e	Update utils.py	2025-02-05 06:01:38 -08:00
Daniel Han	fc02b50a56	Update rl.py	2025-02-05 05:47:23 -08:00
Daniel Han	c0c4f56208	Update rl.py	2025-02-05 05:45:05 -08:00
Daniel Han	6386de4cce	Update rl.py	2025-02-05 05:36:51 -08:00
Daniel Han	a94afad455	PatchRLStatistics	2025-02-05 05:36:04 -08:00
Daniel Han	19b40e883b	Update rl.py	2025-02-05 05:24:36 -08:00
Daniel Han	e702cfa179	Update rl.py	2025-02-05 05:23:19 -08:00
Daniel Han	7dfd171c55	Update rl.py	2025-02-05 05:19:37 -08:00
Daniel Han	5e69427fda	PatchRL	2025-02-05 05:17:38 -08:00
Daniel Han	665f52065f	Create rl.py	2025-02-05 05:14:01 -08:00
Daniel Han	b20253f713	Update utils.py	2025-02-05 04:02:40 -08:00
Daniel Han	0cab914893	Update llama.py	2025-02-05 02:56:16 -08:00
Daniel Han	e4ac52fe85	Update llama.py	2025-02-05 02:30:51 -08:00
Daniel Han	4157c640b7	Fast Inference via vLLM	2025-02-05 02:21:15 -08:00
Daniel Han	5b7b456514	Update mapper.py	2025-02-03 19:43:52 -08:00
Daniel Han	82e011e8d4	Update utils.py	2025-02-03 15:43:40 -08:00
Daniel Han	dc7f0fca4f	Update utils.py	2025-02-02 17:12:21 -08:00
Daniel Han	a274e75db7	Update utils.py	2025-02-02 17:07:31 -08:00
Daniel Han	9296a6a93d	Update utils.py	2025-02-02 17:04:37 -08:00
Daniel Han	126d804d2a	Update utils.py	2025-02-02 17:00:24 -08:00
Daniel Han	ed31cf80c6	Update utils.py	2025-02-02 16:31:28 -08:00
Daniel Han	6e5a6af0d3	Update utils.py	2025-02-02 16:29:26 -08:00
Daniel Han	31449956a3	Update utils.py	2025-02-02 16:26:55 -08:00
Daniel Han	e961d68733	Update utils.py	2025-02-02 16:24:10 -08:00
Daniel Han	35609081cd	Update utils.py	2025-02-02 16:23:31 -08:00
Daniel Han	56b6c37ec0	Update utils.py	2025-02-02 16:18:56 -08:00
Daniel Han	1f42dc194a	Update utils.py	2025-02-02 15:23:05 -08:00
Daniel Han	8d457854ac	Update utils.py	2025-02-02 15:17:50 -08:00
Daniel Han	61eb34b6c2	Update llama.py	2025-02-02 14:54:58 -08:00
Daniel Han	e91706febf	Update llama.py	2025-02-02 14:51:14 -08:00
Daniel Han	24b84d89ec	Update utils.py	2025-02-02 13:52:15 -08:00
Daniel Han	ffe31d5cea	Update llama.py	2025-02-02 13:49:25 -08:00
Daniel Han	50d5250a57	Update llama.py	2025-02-02 13:47:14 -08:00
Daniel Han	9314e46e07	Faster inference?	2025-02-02 13:45:25 -08:00
Daniel Han	14d6199e63	Update llama.py	2025-02-02 04:00:47 -08:00
Daniel Han	f6bffaee84	Update llama.py	2025-02-02 03:57:24 -08:00
Daniel Han	f5d65f6570	Update llama.py	2025-02-02 03:56:25 -08:00
Daniel Han	865b7d685f	Update llama.py	2025-02-02 03:56:13 -08:00
Daniel Han	6af2af1b48	Update llama.py	2025-02-02 03:49:18 -08:00
Daniel Han	c77471114b	Update llama.py	2025-02-02 02:23:53 -08:00
Daniel Han	52aeca630e	Update llama.py	2025-02-02 02:18:33 -08:00
Daniel Han	69c2cd23c0	Update llama.py	2025-02-02 02:15:54 -08:00
Daniel Han	2b5250f601	Update llama.py	2025-02-02 02:14:13 -08:00
Daniel Han	17c486acf8	Update llama.py	2025-02-02 02:11:33 -08:00
Daniel Han	ceea79e3a2	Update llama.py	2025-02-02 02:10:17 -08:00
Daniel Han	c571e2395e	Update llama.py	2025-02-02 02:06:13 -08:00
Daniel Han	4936049259	Torch 2.6 support	2025-02-02 00:43:26 -08:00
Daniel Han	70fa248fe7	Merge branch 'main' into nightly	2025-02-01 23:46:58 -08:00
Daniel Han	e91ae59e84	Update _utils.py	2025-02-01 23:46:40 -08:00
Daniel Han	890c9cf818	dim fix	2025-02-01 19:27:33 -08:00
Daniel Han	a75869f545	Update gemma.py	2025-02-01 19:15:55 -08:00
Daniel Han	ff7cb20f93	Update gemma.py	2025-02-01 19:11:20 -08:00
Daniel Han	c360fcd526	Update gemma.py	2025-02-01 19:02:36 -08:00
Daniel Han	147623d270	Update gemma.py	2025-02-01 17:54:00 -08:00
Daniel Han	e2b23f17b1	Mistral 24B, Qwen 2.5 VL support (#1598 ) * use exact model name * Update save.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * print * Update _utils.py * Update _utils.py * Update llama.py * Update _utils.py * Update vision.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * accurate_accumulation * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update pyproject.toml * Update __init__.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Fix Triton heuristics https://github.com/triton-lang/triton/issues/5224 * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Xformers * Update loader.py * Update loader.py * Rewind * Update _utils.py * Update _utils.py * requires grad * Update loader.py * Update _utils.py * Update loader.py * changing model to base_model if peft model is already used * Improve debugging experience (#1512) * Create CONTRIBUTING.md (#1472) Creating contributing guidelines * Update CONTRIBUTING.md improved sentence * Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> * Update loader.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a8edd0931a`. 
* Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Auto change is_bfloat16_supported * Update llama.py * Force data-type * Update llama.py * All attention refactor fix (#1491) * change initilization of n_heads, n_kv_heads, hidden_size in llama.py * do the same for cohere, mistral, gemma2, granite * do the same for flexattention,cohere, mistral, granite * Update llama.py * Update llama.py * Update granite to work with latest post_patch methods (#1502) * Update granite to work with latest post_patch methods * Pass position_embeddings for granite even if transformers<4.47 * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Minor fixes for granite models (#1503) * Update granite.py Grab residual multiplier directly from layer * Update llama.py Version should read >= 4.47.1 as that is the version requiring the changes * Update granite.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * support modelscope models and datasets (#1481) * support modelscope * change modelscope args * remove useless import * remove useless import * fix * wip * fix * remove useless code * add readme * add some comments * change print to raise error * update comment * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Merge branch 'main' into nightly * Phi 4 * Update llama.py * Torch.Cuda Is Available Condition and Warning (#1545) * check for torch.cuda and triton if available on my machine(mac m3) the cuda were not available * Update pyproject.toml * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update mistral.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix * Bug fixes * Update mapper.py * Add dropout to 
granite to match HF's implementation (#1557) Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> * Update llama.py * Update llama.py * Bug fixes * fix: flash_attn_detection_error (#1556) * fix: flash_attn_detection_error * Update _utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mapper.py --------- Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> Co-authored-by: Itsuro Tajima <tajima@georepublic.de> Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Z <coffeevampirebusiness@gmail.com> Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com> Co-authored-by: AminWhat <88392440+aminwhat@users.noreply.github.com> Co-authored-by: Zhe Zhang <2631992879@qq.com>	2025-01-31 03:34:36 -08:00
Daniel Han	0240dc2c5e	Merge branch 'main' into nightly	2025-01-31 03:33:30 -08:00
Daniel Han	4f57c09ee4	Update _utils.py	2025-01-31 03:33:24 -08:00
Daniel Han	7267b22c4e	Merge branch 'main' into nightly	2025-01-31 03:02:42 -08:00
Daniel Han	b5763a886d	Update mapper.py	2025-01-31 03:02:37 -08:00
Michael Han	edec640658	Merge pull request #1595 from unslothai/shimmyshimmer-patch-3 Update README.md	2025-01-30 21:05:57 -08:00
Michael Han	789af5b7f9	Update README.md	2025-01-30 21:05:45 -08:00
Michael Han	69d879970a	Merge pull request #1580 from unslothai/shimmyshimmer-patch-2 Update README.md	2025-01-26 14:12:10 -08:00
Michael Han	748d1f1fd0	Update README.md Updating super old benchmarks	2025-01-26 14:11:58 -08:00
Daniel Han	d847f90a29	Fix triton.ops	2025-01-22 17:49:20 -08:00
Daniel Han	73d58170b2	move TritonOps	2025-01-22 16:56:01 -08:00
Daniel Han	021bdad687	triton.ops error	2025-01-22 16:53:35 -08:00
Daniel Han	780a799542	Update __init__.py	2025-01-22 16:46:54 -08:00
Daniel Han	5509502af4	Merge branch 'main' into nightly	2025-01-22 16:46:04 -08:00
Daniel Han	2c79a95a0d	Update __init__.py	2025-01-22 16:45:41 -08:00
Daniel Han	8ea67d78ac	Fix triton.ops missing Triton 3.2	2025-01-22 16:44:48 -08:00
Michael Han	065b49867c	Merge pull request #1569 from unslothai/shimmyshimmer-patch-1 Update README.md	2025-01-20 22:13:30 -08:00
Michael Han	b4c3b5eea9	Update README.md	2025-01-20 22:13:07 -08:00
Daniel Han	e633d7f056	Update mapper.py	2025-01-20 08:10:20 -08:00
Daniel Han	f90bd4ec49	Fix Mistral, Qwen (#1565 ) * use exact model name * Update save.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * print * Update _utils.py * Update _utils.py * Update llama.py * Update _utils.py * Update vision.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * accurate_accumulation * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update pyproject.toml * Update __init__.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Fix Triton heuristics https://github.com/triton-lang/triton/issues/5224 * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Xformers * Update loader.py * Update loader.py * Rewind * Update _utils.py * Update _utils.py * requires grad * Update loader.py * Update _utils.py * Update loader.py * changing model to base_model if peft model is already used * Improve debugging experience (#1512) * Create CONTRIBUTING.md (#1472) Creating contributing guidelines * Update CONTRIBUTING.md improved sentence * Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> * Update loader.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a8edd0931a`. 
* Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Auto change is_bfloat16_supported * Update llama.py * Force data-type * Update llama.py * All attention refactor fix (#1491) * change initilization of n_heads, n_kv_heads, hidden_size in llama.py * do the same for cohere, mistral, gemma2, granite * do the same for flexattention,cohere, mistral, granite * Update llama.py * Update llama.py * Update granite to work with latest post_patch methods (#1502) * Update granite to work with latest post_patch methods * Pass position_embeddings for granite even if transformers<4.47 * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Minor fixes for granite models (#1503) * Update granite.py Grab residual multiplier directly from layer * Update llama.py Version should read >= 4.47.1 as that is the version requiring the changes * Update granite.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * support modelscope models and datasets (#1481) * support modelscope * change modelscope args * remove useless import * remove useless import * fix * wip * fix * remove useless code * add readme * add some comments * change print to raise error * update comment * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Merge branch 'main' into nightly * Phi 4 * Update llama.py * Torch.Cuda Is Available Condition and Warning (#1545) * check for torch.cuda and triton if available on my machine(mac m3) the cuda were not available * Update pyproject.toml * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update mistral.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix * Bug fixes * Update mapper.py * Add dropout to 
granite to match HF's implementation (#1557) Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> * Update llama.py * Update llama.py * Bug fixes * fix: flash_attn_detection_error (#1556) * fix: flash_attn_detection_error * Update _utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> --------- Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com> Co-authored-by: Itsuro Tajima <tajima@georepublic.de> Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Z <coffeevampirebusiness@gmail.com> Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com> Co-authored-by: AminWhat <88392440+aminwhat@users.noreply.github.com> Co-authored-by: Zhe Zhang <2631992879@qq.com>	2025-01-20 01:27:24 -08:00
Zhe Zhang	9c5accea7b	fix: flash_attn_detection_error (#1556) * fix: flash_attn_detection_error * Update _utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-01-20 01:25:31 -08:00
Daniel Han	6b90239a33	Bug fixes	2025-01-20 01:10:55 -08:00
Daniel Han	928a9c631b	Update llama.py	2025-01-19 19:19:08 -08:00
Daniel Han	7995b56526	Merge branch 'main' into nightly	2025-01-19 19:18:05 -08:00
Daniel Han	b979a01ba9	Update llama.py	2025-01-19 15:24:14 -08:00
Daniel Han	db16fb3022	Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly	2025-01-19 14:03:11 -08:00
Datta Nimmaturi	f5fb462bec	Add dropout to granite to match HF's implementation (#1557) Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>	2025-01-19 03:54:12 -08:00
Daniel Han	0adfa0bc7d	Update mapper.py	2025-01-19 01:37:13 -08:00
Daniel Han	fdd0ace6fd	Update issue templates	2025-01-17 00:43:11 -08:00
Daniel Han	1576396cd0	Bug fixes	2025-01-16 03:09:02 -08:00
Daniel Han	3b908c36e8	Fix	2025-01-16 01:22:13 -08:00
Daniel Han	883d793607	Update _utils.py	2025-01-16 01:18:15 -08:00
Daniel Han	77e2c4a0d7	Update _utils.py	2025-01-16 01:15:42 -08:00
Daniel Han	d806dcaf8d	Update _utils.py	2025-01-16 01:10:40 -08:00
Daniel Han	d05463bcd0	Update _utils.py	2025-01-16 01:09:23 -08:00
Daniel Han	624ad17c9a	Update _utils.py	2025-01-16 01:07:23 -08:00
Daniel Han	6b5bfa5147	Update mistral.py	2025-01-16 00:58:46 -08:00
Daniel Han	d996955627	Update mistral.py	2025-01-16 00:56:56 -08:00
AminWhat	6b22725df7	Torch.Cuda Is Available Condition and Warning (#1545) * check for torch.cuda and triton if available on my machine(mac m3) the cuda were not available * Update pyproject.toml * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-01-15 23:02:23 -08:00
Daniel Han	5266a43d0b	Merge branch 'main' into nightly	2025-01-15 22:55:45 -08:00
Michael Han	3c789bc88e	Merge pull request #1542 from unslothai/shimmyshimmer-patch-3 Update README.md	2025-01-14 23:20:19 -08:00
Michael Han	e3162dc5bf	Update README.md Update to benchmark tables	2025-01-14 23:20:07 -08:00
Daniel Han	ef86ce5cba	Update llama.py	2025-01-14 22:32:44 -08:00
Daniel Han	fb1397d926	Merge branch 'main' into nightly	2025-01-14 22:32:02 -08:00
Daniel Han	0a2c397393	Update issue templates	2025-01-14 03:13:35 -08:00
Daniel Han	64c54c284e	Update bug_report.md (#1538)	2025-01-14 03:12:17 -08:00
Daniel Han	a732ae88f9	Update issue templates	2025-01-14 03:10:29 -08:00
Michael Han	2033f40135	Merge pull request #1529 from unslothai/shimmyshimmer-patch-2 Update README.md	2025-01-11 17:35:11 -08:00
Michael Han	08c330b7cc	Update README.md	2025-01-11 17:34:51 -08:00
Michael Han	9569392187	Merge pull request #1515 from unslothai/shimmyshimmer-patch-1 Update README.md for Notebooks	2025-01-10 10:13:04 -08:00
Daniel Han	a72a9d9b06	Update mapper.py	2025-01-10 04:34:23 -08:00
Michael Han	db14c7f182	Update README.md	2025-01-09 16:59:43 -08:00
Michael Han	59d7cd9888	Update README.md	2025-01-08 23:02:27 -08:00
Daniel Han	e42bd98706	Update Unsloth-Zoo	2025-01-08 16:46:04 -08:00
Daniel Han	4feae9ae42	Update _utils.py	2025-01-08 15:48:40 -08:00
Daniel Han	1767be3692	Update tokenizer_utils.py	2025-01-08 15:48:11 -08:00
Daniel Han	3b4364985f	Phi-4 bug fix	2025-01-08 15:40:27 -08:00
Daniel Han	6cbfca8c63	Phi-4 (#1523 ) * use exact model name * Update save.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * print * Update _utils.py * Update _utils.py * Update llama.py * Update _utils.py * Update vision.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * accurate_accumulation * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update pyproject.toml * Update __init__.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Fix Triton heuristics https://github.com/triton-lang/triton/issues/5224 * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Xformers * Update loader.py * Update loader.py * Rewind * Update _utils.py * Update _utils.py * requires grad * Update loader.py * Update _utils.py * Update loader.py * changing model to base_model if peft model is already used * Improve debugging experience (#1512) * Create CONTRIBUTING.md (#1472) Creating contributing guidelines * Update CONTRIBUTING.md improved sentence * Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> * Update loader.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `b7ddf962d2`. 
* Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Auto change is_bfloat16_supported * Update llama.py * Force data-type * Update llama.py * All attention refactor fix (#1491) * change initilization of n_heads, n_kv_heads, hidden_size in llama.py * do the same for cohere, mistral, gemma2, granite * do the same for flexattention,cohere, mistral, granite * Update llama.py * Update llama.py * Update granite to work with latest post_patch methods (#1502) * Update granite to work with latest post_patch methods * Pass position_embeddings for granite even if transformers<4.47 * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Minor fixes for granite models (#1503) * Update granite.py Grab residual multiplier directly from layer * Update llama.py Version should read >= 4.47.1 as that is the version requiring the changes * Update granite.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * support modelscope models and datasets (#1481) * support modelscope * change modelscope args * remove useless import * remove useless import * fix * wip * fix * remove useless code * add readme * add some comments * change print to raise error * update comment * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Merge branch 'main' into nightly * Phi 4 --------- Co-authored-by: Itsuro Tajima <tajima@georepublic.de> Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Datta Nimmaturi 
<datta.nimmaturi@nutanix.com> Co-authored-by: Z <coffeevampirebusiness@gmail.com> Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com>	2025-01-08 15:10:46 -08:00
Daniel Han	1820995bae	Merge branch 'main' into nightly	2025-01-08 15:10:13 -08:00
Daniel Han	f77d6d608c	Phi 4	2025-01-08 14:38:41 -08:00
Daniel Han	0554918864	Merge branch 'main' into nightly	2025-01-08 12:42:18 -08:00
sebaxakerhtc	71ca60c7f0	Update __init__.py (#1520) * Update __init__.py This PR is solving the (issue)[https://github.com/unslothai/unsloth/issues/1518] with some GPUs * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-01-07 14:51:17 -08:00
Daniel Han	d90aefea98	Update pyproject.toml	2025-01-07 04:29:09 -08:00
Daniel Han	63782ea3af	Bug fixes (#1516 ) * use exact model name * Update save.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * print * Update _utils.py * Update _utils.py * Update llama.py * Update _utils.py * Update vision.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * accurate_accumulation * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update pyproject.toml * Update __init__.py * Update pyproject.toml * Update __init__.py * Update __init__.py * Fix Triton heuristics https://github.com/triton-lang/triton/issues/5224 * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Xformers * Update loader.py * Update loader.py * Rewind * Update _utils.py * Update _utils.py * requires grad * Update loader.py * Update _utils.py * Update loader.py * changing model to base_model if peft model is already used * Improve debugging experience (#1512) * Create CONTRIBUTING.md (#1472) Creating contributing guidelines * Update CONTRIBUTING.md improved sentence * Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> * Update loader.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `b7ddf962d2`. 
* Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Auto change is_bfloat16_supported * Update llama.py * Force data-type * Update llama.py * All attention refactor fix (#1491) * change initilization of n_heads, n_kv_heads, hidden_size in llama.py * do the same for cohere, mistral, gemma2, granite * do the same for flexattention,cohere, mistral, granite * Update llama.py * Update llama.py * Update granite to work with latest post_patch methods (#1502) * Update granite to work with latest post_patch methods * Pass position_embeddings for granite even if transformers<4.47 * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Minor fixes for granite models (#1503) * Update granite.py Grab residual multiplier directly from layer * Update llama.py Version should read >= 4.47.1 as that is the version requiring the changes * Update granite.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * support modelscope models and datasets (#1481) * support modelscope * change modelscope args * remove useless import * remove useless import * fix * wip * fix * remove useless code * add readme * add some comments * change print to raise error * update comment * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> --------- Co-authored-by: Itsuro Tajima <tajima@georepublic.de> Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com> Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Z 
<coffeevampirebusiness@gmail.com> Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com>	2025-01-07 04:23:14 -08:00
tastelikefeet	83421fd2b5	support modelscope models and datasets (#1481) * support modelscope * change modelscope args * remove useless import * remove useless import * fix * wip * fix * remove useless code * add readme * add some comments * change print to raise error * update comment * Update loader.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-01-07 04:09:36 -08:00
Z	3cde4e1922	Minor fixes for granite models (#1503) * Update granite.py Grab residual multiplier directly from layer * Update llama.py Version should read >= 4.47.1 as that is the version requiring the changes * Update granite.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-01-07 03:58:40 -08:00
Datta Nimmaturi	c8e9dcf4f8	Update granite to work with latest post_patch methods (#1502) * Update granite to work with latest post_patch methods * Pass position_embeddings for granite even if transformers<4.47 * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-01-07 03:49:11 -08:00
Daniel Han	84fa1af77e	Update llama.py	2025-01-07 03:39:08 -08:00
Daniel Han	b7d47c1d8a	Update llama.py	2025-01-07 03:33:46 -08:00
Kareem	0e2110f7b8	All attention refactor fix (#1491) * change initilization of n_heads, n_kv_heads, hidden_size in llama.py * do the same for cohere, mistral, gemma2, granite * do the same for flexattention,cohere, mistral, granite	2025-01-07 02:41:15 -08:00
Michael Han	4ce92cfe2c	Update README.md Notebook links	2025-01-07 02:02:59 -08:00
Daniel Han	a4aba47ebb	Update llama.py	2025-01-07 01:56:32 -08:00
Daniel Han	0672e71b17	Force data-type	2025-01-07 01:51:49 -08:00
Daniel Han	6320381fb6	Update llama.py	2025-01-07 01:43:20 -08:00
Daniel Han	ab2b72c5f0	Auto change is_bfloat16_supported	2025-01-07 01:40:04 -08:00
Daniel Han	9e00262be6	Update llama.py	2025-01-07 01:10:41 -08:00
Daniel Han	4df3af2f57	Update llama.py	2025-01-07 01:05:04 -08:00
Daniel Han	358316522f	Update llama.py	2025-01-07 01:02:35 -08:00
Daniel Han	97c3e282fb	Update llama.py	2025-01-07 01:02:24 -08:00
Daniel Han	020c793a1e	Update llama.py	2025-01-07 00:50:56 -08:00
Daniel Han	656099cfc3	Update llama.py	2025-01-07 00:45:22 -08:00
Daniel Han	837d620dbd	Update llama.py	2025-01-07 00:41:26 -08:00
Daniel Han	c95380cc9c	Update llama.py	2025-01-07 00:41:16 -08:00
Daniel Han	689ca57214	Update llama.py	2025-01-07 00:38:04 -08:00
Daniel Han	dc33cc94a7	Update llama.py	2025-01-07 00:34:46 -08:00
Daniel Han	f791766ab9	Update llama.py	2025-01-07 00:34:09 -08:00
Daniel Han	a7740ba8e9	Update llama.py	2025-01-07 00:30:32 -08:00
Daniel Han	883c25d34c	Merge branch 'pr/1509' into nightly	2025-01-06 22:08:07 -08:00
Daniel Han	c4720f1baf	Update llama.py	2025-01-06 22:06:00 -08:00
Daniel Han	294cd8ea32	Revert "Update llama.py" This reverts commit `a8edd0931a`.	2025-01-06 22:05:44 -08:00
Daniel Han	a8edd0931a	Update llama.py	2025-01-06 22:05:14 -08:00
Daniel Han	0f6b518ee1	Update llama.py	2025-01-06 18:56:26 -08:00
Daniel Han	adb2dcfd2b	Update loader.py	2025-01-06 18:13:48 -08:00
Edd	9940583287	Improve debugging experience (#1512) * Create CONTRIBUTING.md (#1472) Creating contributing guidelines * Update CONTRIBUTING.md improved sentence * Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>	2025-01-06 18:04:27 -08:00
Daniel Han	8cf3e6fa2b	Merge branch 'main' into nightly	2025-01-06 18:03:53 -08:00
Muhammad Osama	6b9e11bdf3	changing model to base_model if peft model is already used	2025-01-05 18:18:42 -06:00
Michael Han	48627f876c	Merge pull request #1507 from NinoRisteski/patch-1 Update CONTRIBUTING.md	2025-01-05 01:56:40 -08:00
Nino Risteski	8063abc004	Update CONTRIBUTING.md improved sentence	2025-01-05 09:24:10 +01:00
Michael Han	fb49390494	Create CONTRIBUTING.md (#1472) Creating contributing guidelines	2025-01-04 22:09:25 -08:00
Daniel Han	d08c8afd6c	Merge branch 'pr/1339' into nightly	2025-01-04 22:08:00 -08:00
Daniel Han	3e1c5ec3a0	Update loader.py	2025-01-04 22:03:11 -08:00
Daniel Han	c697d6d01a	Update _utils.py	2025-01-04 00:42:39 -08:00
Daniel Han	5cf47b3e63	Update loader.py	2025-01-02 22:44:17 -08:00
Daniel Han	75ffad921f	requires grad	2025-01-02 18:25:25 -08:00
Daniel Han	69017405db	Update _utils.py	2025-01-01 22:34:40 -08:00
Daniel Han	163ef43181	Update _utils.py	2025-01-01 22:32:58 -08:00
Daniel Han	f7a322e5d5	Rewind	2025-01-01 16:34:18 -08:00
Daniel Han	16b42fd5e2	Update loader.py	2025-01-01 16:27:08 -08:00
Daniel Han	b3dec6af35	Update loader.py	2025-01-01 02:06:41 -08:00
Daniel Han	f2ff798c4e	Xformers	2024-12-31 22:42:28 -08:00
Daniel Han	48c743d508	Update __init__.py	2024-12-31 12:35:56 -08:00
Daniel Han	6e05c84a26	Update __init__.py	2024-12-31 12:31:30 -08:00
Daniel Han	c83f5422a1	Update __init__.py	2024-12-31 00:38:18 -08:00
Daniel Han	fddf14ebc8	Update __init__.py	2024-12-30 14:13:49 -08:00
Daniel Han	5c439a0bd6	Fix Triton heuristics https://github.com/triton-lang/triton/issues/5224	2024-12-30 13:52:42 -08:00
Daniel Han	9611ee433e	Update __init__.py	2024-12-29 19:36:11 -08:00
Daniel Han	7c74db901b	Update __init__.py	2024-12-29 19:35:05 -08:00
Daniel Han	10a7d4fc93	Update pyproject.toml	2024-12-29 19:34:30 -08:00
Daniel Han	4b7aa371fa	Update __init__.py	2024-12-29 19:29:19 -08:00
Daniel Han	87e8b675e5	Merge branch 'main' into nightly	2024-12-29 03:58:02 -08:00
Daniel Han	408563debc	Update _utils.py	2024-12-29 03:57:58 -08:00
Daniel Han	e254125954	Bug fixes (#1484) * Update save.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * print * Update _utils.py * Update _utils.py * Update llama.py * Update _utils.py * Update vision.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * accurate_accumulation * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update pyproject.toml	2024-12-29 03:57:31 -08:00
Daniel Han	4b9c6110b1	Update pyproject.toml	2024-12-29 03:57:21 -08:00
Daniel Han	19beafcebd	Update loader.py	2024-12-29 03:53:46 -08:00
Daniel Han	47879d01ae	Update loader.py	2024-12-28 21:28:49 -08:00
Daniel Han	ea1286faf4	Update loader.py	2024-12-28 18:02:00 -08:00
Daniel Han	f98b98ae46	Update loader.py	2024-12-28 03:29:58 -08:00
Daniel Han	5930e5e8e2	Update _utils.py	2024-12-28 03:24:18 -08:00
Daniel Han	3402f7403d	Update loader.py	2024-12-28 03:21:41 -08:00
Daniel Han	5395a053c0	Update loader.py	2024-12-28 03:12:03 -08:00
Daniel Han	aacc8e227b	accurate_accumulation	2024-12-28 03:11:53 -08:00
Daniel Han	6cdf1868ce	Update loader.py	2024-12-27 01:02:40 -08:00
Daniel Han	7fdcb7f24d	Update _utils.py	2024-12-27 00:59:16 -08:00
Daniel Han	e1880e2b76	Update _utils.py	2024-12-27 00:54:12 -08:00
Daniel Han	8ac2d07946	Update _utils.py	2024-12-27 00:33:41 -08:00
Daniel Han	05d975a591	Update _utils.py	2024-12-26 23:45:19 -08:00
Daniel Han	9e1004f95c	Update _utils.py	2024-12-26 23:41:44 -08:00
Daniel Han	59ffd06268	Update _utils.py	2024-12-26 23:37:25 -08:00
Daniel Han	9837ec964d	Update _utils.py	2024-12-26 23:35:15 -08:00
Daniel Han	414d55cf89	Update _utils.py	2024-12-26 22:25:19 -08:00
Daniel Han	2ab4dca36c	Update vision.py	2024-12-26 21:39:38 -08:00
Daniel Han	4da5306917	Update _utils.py	2024-12-26 19:30:49 -08:00
Daniel Han	1f81f9f5ab	Update llama.py	2024-12-26 19:16:58 -08:00
Daniel Han	19cb433a7b	Update _utils.py	2024-12-26 19:12:13 -08:00
Daniel Han	f1f390fd88	Update _utils.py	2024-12-26 19:12:02 -08:00
Daniel Han	9197ce07cb	print	2024-12-26 18:45:16 -08:00
Daniel Han	bbe37dcc51	Update _utils.py	2024-12-26 18:21:30 -08:00
Daniel Han	03c889dc21	Update _utils.py	2024-12-26 18:19:45 -08:00
Daniel Han	de96d480e0	Update _utils.py	2024-12-26 18:12:52 -08:00
Daniel Han	734c5338dc	Update _utils.py	2024-12-26 17:11:22 -08:00
Daniel Han	264b85da26	Update save.py	2024-12-26 17:07:25 -08:00
Daniel Han	da4741bbef	Update pyproject.toml	2024-12-26 04:12:46 -08:00
Daniel Han	17b591675a	Update _utils.py	2024-12-26 04:05:07 -08:00
Daniel Han	47f210922d	Update pyproject.toml	2024-12-26 04:04:23 -08:00
Daniel Han	30bb143a87	Update pyproject.toml	2024-12-26 03:26:01 -08:00
Daniel Han	25d7b25ab5	Merge branch 'main' into nightly	2024-12-26 01:44:31 -08:00
Daniel Han	c8d2f5b8da	Update save.py	2024-12-26 01:44:12 -08:00
Daniel Han	160ff801f7	Bug fixes (#1473 ) * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. 
* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut 
Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md 
--------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update mapper.py * modules * Fix vision model tokenizer padding side. (#1384) * Dynamic quants (#1379) * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling * Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * 
CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. 
* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut 
Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md 
--------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update mapper.py * modules --------- Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> * Update README.md Unsloth Dynamic 4-bit Quantization Update * Fix vision model tokenizer padding side. 
* Update vision.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Add citation section to README.md (#1377) * Add citation section to README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Granite support (#1218) * [WIP] Support for Granite * Fixup inference * Cleanup flex attention * remove sliding window * Use torch.add for residual multiplier * Llama 3.3 * Update llama.py * Update llama.py * fullgraph * Fix loader.py to work on Windows (#1453) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Fix loader.py to work on Windows --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update save.py warning message (#1425) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Update save.py warning message --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Change _fix_chat_template in case a template has both endif and endfor (#1388) * Update llama and derivatives to pass position embeddings explicitly for transformers v4.47+ (#1442) * Update save.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Temp fix * Update _utils.py * Update _utils.py * Update pyproject.toml * Name Error Bug Fix - import from packaging.version import Version (#1468) * Version * 
Update pyproject.toml * Update pyproject.toml * Version * Update pyproject.toml * Update pyproject.toml * dependencies * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update mistral.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update granite.py * Update cohere.py * Triton windows * Update gemma2.py * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Residual & LoRA * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Bug fix * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update loader.py --------- Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Zewen Shen <zewen.public@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Scott Phillips <polygonguru@gmail.com> Co-authored-by: qingy1337 <qxli2@students.everettcc.edu> Co-authored-by: Giulia Baldini <44327645+giuliabaldini@users.noreply.github.com> Co-authored-by: Yonghye Kwon <developer.0hye@gmail.com>	2024-12-26 01:35:19 -08:00
Daniel Han	ae636f21c9	Update loader.py	2024-12-26 01:32:56 -08:00
Daniel Han	16e7cbf989	Update _utils.py	2024-12-26 01:32:00 -08:00
Daniel Han	c767f5adb1	Merge branch 'main' into nightly	2024-12-26 01:31:22 -08:00
Daniel Han	db2b947223	Update loader.py	2024-12-26 01:29:07 -08:00
Daniel Han	c933c0c30f	Update loader.py	2024-12-26 01:26:27 -08:00
Daniel Han	e74fd06280	Update loader.py	2024-12-26 01:23:02 -08:00
Daniel Han	ea04b72532	Bug fix	2024-12-26 01:22:06 -08:00
Daniel Han	e39365018f	Update loader.py	2024-12-26 00:33:15 -08:00
Daniel Han	b3da38237c	Update loader.py	2024-12-25 23:57:42 -08:00
Daniel Han	d6f27f3084	Update loader.py	2024-12-25 23:49:40 -08:00
Daniel Han	6f2b7be36a	Update loader.py	2024-12-25 23:01:54 -08:00
Daniel Han	488a649af0	Residual & LoRA	2024-12-25 23:01:34 -08:00
Daniel Han	a20b380d5e	Bug Fixes (#1470 ) * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. 
* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. 
It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update mapper.py * modules * Fix vision model tokenizer padding side. 
(#1384) * Dynamic quants (#1379) * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling * Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when 
inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. 
Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * 
Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update mapper.py * modules --------- Co-authored-by: Edd 
<68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> * Update README.md Unsloth Dynamic 4-bit Quantization Update * Fix vision model tokenizer padding side. * Update vision.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Add citation section to README.md (#1377) * Add citation section to README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Granite support (#1218) * [WIP] Support for Granite * Fixup inference * Cleanup flex attention * remove sliding window * Use torch.add for residual multiplier * Llama 3.3 * Update llama.py * Update llama.py * fullgraph * Fix loader.py to work on Windows (#1453) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Fix loader.py to work on Windows --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update save.py warning message (#1425) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Update save.py warning message --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Change _fix_chat_template in case a template has 
both endif and endfor (#1388) * Update llama and derivatives to pass position embeddings explicitly for transformers v4.47+ (#1442) * Update save.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Temp fix * Update _utils.py * Update _utils.py * Update pyproject.toml * Name Error Bug Fix - import from packaging.version import Version (#1468) * Version * Update pyproject.toml * Update pyproject.toml * Version * Update pyproject.toml * Update pyproject.toml * dependencies * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update mistral.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update granite.py * Update cohere.py * Triton windows * Update gemma2.py * Update pyproject.toml * Update _utils.py * Update pyproject.toml --------- Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Zewen Shen <zewen.public@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Scott Phillips <polygonguru@gmail.com> Co-authored-by: qingy1337 <qxli2@students.everettcc.edu> Co-authored-by: Giulia Baldini <44327645+giuliabaldini@users.noreply.github.com> Co-authored-by: Yonghye Kwon <developer.0hye@gmail.com>	2024-12-24 03:37:03 -08:00
Daniel Han	258d8135e7	Update pyproject.toml	2024-12-24 03:35:04 -08:00
Daniel Han	b705130e59	Update _utils.py	2024-12-24 03:34:54 -08:00
Daniel Han	fd2295d6a6	Update pyproject.toml	2024-12-24 01:55:04 -08:00
Daniel Han	1d04daaad8	Update gemma2.py	2024-12-24 00:18:47 -08:00
Daniel Han	77ca6e2dca	Triton windows	2024-12-24 00:17:33 -08:00
Daniel Han	739509e3a7	Update cohere.py	2024-12-24 00:11:17 -08:00
Daniel Han	1f8be5012e	Update granite.py	2024-12-24 00:09:23 -08:00
Daniel Han	f97f49b08d	Update pyproject.toml	2024-12-24 00:08:49 -08:00
Daniel Han	660df7b9ab	Update pyproject.toml	2024-12-24 00:08:03 -08:00
Daniel Han	61147bfbad	Update pyproject.toml	2024-12-24 00:07:37 -08:00
Daniel Han	c728dd9e87	Update mistral.py	2024-12-24 00:04:36 -08:00
Daniel Han	6e6e65bd46	Update pyproject.toml	2024-12-24 00:02:30 -08:00
Daniel Han	e6d50c8839	Update pyproject.toml	2024-12-23 22:49:16 -08:00
Daniel Han	5323157160	Update pyproject.toml	2024-12-23 22:47:20 -08:00
Daniel Han	fba4cf7fb4	Update pyproject.toml	2024-12-23 22:47:11 -08:00
Daniel Han	753b954702	dependencies	2024-12-23 22:27:35 -08:00
Daniel Han	c939334570	Update pyproject.toml	2024-12-23 21:53:43 -08:00
Daniel Han	01b256b4cf	Update pyproject.toml	2024-12-23 21:52:50 -08:00
Daniel Han	4cc97b23cc	Version	2024-12-23 21:43:32 -08:00
Daniel Han	2f969fa137	Update pyproject.toml	2024-12-23 21:16:30 -08:00
Daniel Han	25c5b0524d	Update pyproject.toml	2024-12-23 21:14:01 -08:00
Daniel Han	69dc1ad694	Version	2024-12-23 21:07:51 -08:00
Yonghye Kwon	cecbb08c03	Name Error Bug Fix - import from packaging.version import Version (#1468)	2024-12-22 23:22:36 -08:00
Daniel Han	e8ae4727d0	Update pyproject.toml	2024-12-21 01:24:09 -08:00
Daniel Han	14886235e3	Update _utils.py	2024-12-21 01:11:30 -08:00
Daniel Han	6181fb7126	Merge branch 'main' into nightly	2024-12-20 03:30:58 -08:00
Daniel Han	42b624bb77	Bug fix	2024-12-20 03:30:01 -08:00
Daniel Han	97ff8efaca	Merge branch 'main' into nightly	2024-12-20 03:26:58 -08:00
Daniel Han	87b1ce2824	Typo	2024-12-20 03:26:55 -08:00
Daniel Han	d5830e9c2f	Merge branch 'main' into nightly	2024-12-20 03:24:37 -08:00
Daniel Han	703e8b96f1	Typo	2024-12-20 03:24:17 -08:00
Daniel Han	6bf5d8d626	Bug fixes (#1458 ) * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling * Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * 
Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. 
It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update mapper.py * modules * Fix vision model tokenizer padding side. 
(#1384) * Dynamic quants (#1379) * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling * Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when 
inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. 
Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * 
Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update mapper.py * modules --------- Co-authored-by: Edd 
<68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> * Update README.md Unsloth Dynamic 4-bit Quantization Update * Fix vision model tokenizer padding side. * Update vision.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Add citation section to README.md (#1377) * Add citation section to README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Granite support (#1218) * [WIP] Support for Granite * Fixup inference * Cleanup flex attention * remove sliding window * Use torch.add for residual multiplier * Llama 3.3 * Update llama.py * Update llama.py * fullgraph * Fix loader.py to work on Windows (#1453) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Fix loader.py to work on Windows --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update save.py warning message (#1425) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Update save.py warning message --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Change _fix_chat_template in case a template has 
both endif and endfor (#1388) * Update llama and derivatives to pass position embeddings explicitly for transformers v4.47+ (#1442) * Update save.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Temp fix * Update _utils.py --------- Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Zewen Shen <zewen.public@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Scott Phillips <polygonguru@gmail.com> Co-authored-by: qingy1337 <qxli2@students.everettcc.edu> Co-authored-by: Giulia Baldini <44327645+giuliabaldini@users.noreply.github.com>	2024-12-20 03:09:59 -08:00
Daniel Han	f5bcf31169	Update _utils.py	2024-12-20 03:08:59 -08:00
Daniel Han	58bcf66d27	Temp fix	2024-12-20 03:07:04 -08:00
Daniel Han	2afe47565a	Update llama.py	2024-12-20 03:03:26 -08:00
Daniel Han	69728ffd14	Update llama.py	2024-12-20 03:01:28 -08:00
Daniel Han	cf8ec670ed	Update llama.py	2024-12-20 02:59:27 -08:00
Daniel Han	05c99a207b	Update llama.py	2024-12-20 02:53:45 -08:00
Daniel Han	faa6825049	Update llama.py	2024-12-20 02:50:23 -08:00
Daniel Han	3d8fc11d96	Update llama.py	2024-12-20 02:47:32 -08:00
Daniel Han	7e5a6ffab1	Update mistral.py	2024-12-20 02:46:29 -08:00
Daniel Han	3019b5967b	Update llama.py	2024-12-20 02:45:43 -08:00
Daniel Han	57b8ddf21f	Update save.py	2024-12-20 02:40:42 -08:00
Datta Nimmaturi	8e5d68286e	Update llama and derivatives to pass position embeddings explicitly for transformers v4.47+ (#1442)	2024-12-20 02:35:42 -08:00
Giulia Baldini	f3e6a28c3f	Change _fix_chat_template in case a template has both endif and endfor (#1388)	2024-12-20 02:23:30 -08:00
qingy1337	47cee7fd7e	Update save.py warning message (#1425) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Update save.py warning message --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-12-20 02:22:27 -08:00
Scott Phillips	104eeac1db	Fix loader.py to work on Windows (#1453) * Update README.md Llama 3.3 + Reddit * Update README.md Apple ML Cross Entropy * Update README.md Removing double citation * Fix loader.py to work on Windows --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-12-20 02:20:15 -08:00
Daniel Han	d9ed4bef09	fullgraph	2024-12-12 01:14:27 -08:00
Michael Han	6dd9d4e2b1	Merge pull request #1412 from unslothai/shimmyshimmer-patch-5 Update README.md	2024-12-10 14:43:50 -08:00
Michael Han	4aa34a0afa	Update README.md Removing double citation	2024-12-10 14:43:27 -08:00
Michael Han	0b25173b10	Merge pull request #1411 from unslothai/shimmyshimmer-patch-4 Update README.md	2024-12-10 12:15:21 -08:00
Michael Han	18a830c971	Update README.md Apple ML Cross Entropy	2024-12-10 12:15:03 -08:00
Daniel Han	b9fe2588e5	Update llama.py	2024-12-10 02:46:14 -08:00
Daniel Han	67d1e9eb50	Update llama.py	2024-12-09 14:19:56 -08:00
Michael Han	4eec751ba7	Merge pull request #1401 from unslothai/shimmyshimmer-patch-3 Update README.md	2024-12-07 14:25:05 -08:00
Michael Han	89e3ddcb55	Update README.md Llama 3.3 + Reddit	2024-12-07 14:24:48 -08:00
Daniel Han	e79b11d31d	Merge branch 'main' into nightly	2024-12-07 00:15:56 -08:00
Daniel Han	39de04dbdd	Update _utils.py	2024-12-07 00:15:48 -08:00
Daniel Han	0e0d8fc322	Llama 3.3 (#1393 ) * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling * Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han 
<danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. 
Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * 
Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update mapper.py * modules * Fix vision model tokenizer padding side. 
(#1384) * Dynamic quants (#1379) * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling * Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when 
inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. 
Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * 
Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update mapper.py * modules --------- Co-authored-by: Edd 
<68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> * Update README.md Unsloth Dynamic 4-bit Quantization Update * Fix vision model tokenizer padding side. * Update vision.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Add citation section to README.md (#1377) * Add citation section to README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Granite support (#1218) * [WIP] Support for Granite * Fixup inference * Cleanup flex attention * remove sliding window * Use torch.add for residual multiplier * Llama 3.3 --------- Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Zewen Shen <zewen.public@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-12-06 13:05:15 -08:00
Daniel Han	11f901a19e	Llama 3.3	2024-12-06 12:25:08 -08:00
Datta Nimmaturi	0c6813df2f	Granite support (#1218) * [WIP] Support for Granite * Fixup inference * Cleanup flex attention * remove sliding window * Use torch.add for residual multiplier	2024-12-05 00:01:53 -08:00
Edd	eaee5ddfa9	Add citation section to README.md (#1377) * Add citation section to README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-12-04 23:59:13 -08:00
Zewen Shen	a0377c529d	Fix vision model tokenizer padding side. (#1384 ) * Dynamic quants (#1379) * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling * Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing 
longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. 
So to manually avoid this, this os config is required. Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update 
save.py * FastBaseVisionModel * Update loader_utils.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update mapper.py * modules --------- 
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> * Update README.md Unsloth Dynamic 4-bit Quantization Update * Fix vision model tokenizer padding side. * Update vision.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-12-04 23:54:18 -08:00
Daniel Han	80dc5bd26f	Merge branch 'main' into nightly	2024-12-04 23:53:08 -08:00
Michael Han	4ecfdb5450	Merge pull request #1383 from unslothai/shimmyshimmer-patch-2 Update README.md	2024-12-04 21:32:36 -08:00
Michael Han	da7cdb2c8c	Update README.md Unsloth Dynamic 4-bit Quantization Update	2024-12-04 21:32:23 -08:00
Daniel Han	35ca26c898	Dynamic quants (#1379 ) * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling * Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw 
error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. 
Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * 
Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update mapper.py * modules --------- Co-authored-by: Edd 
<68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>	2024-12-04 05:38:05 -08:00
Daniel Han	9d6ab2ce78	modules	2024-12-04 04:26:36 -08:00
Daniel Han	f5bdfee1a3	Update mapper.py	2024-12-04 02:36:42 -08:00
Daniel Han	a8a00edbda	Merge branch 'main' into nightly	2024-12-04 02:36:37 -08:00
Daniel Han	e7edb9b339	Fix llama.cpp GGUF (#1375 ) * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling * Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han 
<danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. 
As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update 
_utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py * Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * skip modules * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Fix llama.cpp * Update save.py * Update save.py * Update vision.py * Update save.py * Update save.py * Update save.py * 
Update save.py * Update save.py * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py --------- Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com> Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>	2024-12-03 17:29:59 -08:00
Daniel Han	c802979c4d	Merge branch 'main' into nightly	2024-12-03 17:29:24 -08:00
Daniel Han	b807f3fd2d	Update save.py	2024-12-03 17:28:46 -08:00
Daniel Han	31e472ec9e	Update save.py	2024-12-03 17:11:10 -08:00
Daniel Han	b3585ba73b	Update _utils.py	2024-12-03 17:01:16 -08:00
Daniel Han	16e475bbd9	Update save.py	2024-12-03 16:59:56 -08:00
Daniel Han	933cfe1ccf	Update save.py	2024-12-03 16:57:32 -08:00
Michael Han	a94f1548f9	Merge pull request #1374 from unslothai/shimmyshimmer-patch-1 Update README.md	2024-12-03 16:52:21 -08:00
Michael Han	16cf998173	Update README.md Fixing Qwen links	2024-12-03 16:50:52 -08:00
Daniel Han	dfcbb8ac26	Update save.py	2024-12-03 16:40:43 -08:00
Daniel Han	8311b13827	Update save.py	2024-12-03 16:40:36 -08:00
Daniel Han	f4f5f32e85	Update save.py	2024-12-03 16:35:38 -08:00
Daniel Han	c09c7ab3a7	Update save.py	2024-12-03 16:34:15 -08:00
Daniel Han	79fa6d3829	Update save.py	2024-12-03 16:25:40 -08:00
Daniel Han	c3b3d3bd03	Update vision.py	2024-12-03 16:25:13 -08:00
Daniel Han	9953ab1593	Update save.py	2024-12-03 16:24:23 -08:00
Daniel Han	853e7c3687	Update save.py	2024-12-03 16:16:28 -08:00
Daniel Han	133772a416	Fix llama.cpp	2024-12-03 16:03:38 -08:00
Daniel Han	e0908e0d30	Update llama.py	2024-12-01 02:36:17 -08:00
Daniel Han	d7d8591f83	Update llama.py	2024-12-01 02:30:21 -08:00
Daniel Han	f730e997b6	Update llama.py	2024-12-01 02:29:11 -08:00
Daniel Han	51bf5eae95	Update llama.py	2024-12-01 02:25:48 -08:00
Daniel Han	4620e76e3d	Update llama.py	2024-12-01 02:22:21 -08:00
Daniel Han	f67a062010	Update llama.py	2024-12-01 02:20:54 -08:00
Daniel Han	8f14160dbb	Update llama.py	2024-12-01 02:19:07 -08:00
Daniel Han	d44a8e0bdd	Update llama.py	2024-12-01 02:15:21 -08:00
Daniel Han	479b4824dc	Update llama.py	2024-12-01 02:13:34 -08:00
Daniel Han	f4b8710843	Update llama.py	2024-12-01 02:10:43 -08:00
Daniel Han	a45d642641	Update llama.py	2024-12-01 02:02:31 -08:00
Daniel Han	ec30c12cbb	Update vision.py	2024-11-28 00:02:13 -08:00
Daniel Han	aa6ef77fad	skip modules	2024-11-28 00:01:25 -08:00
Daniel Han	a823352381	Merge branch 'main' into nightly	2024-11-26 16:40:05 -08:00
cell-dame	5eeb53fb42	Fix orpo/dpo trainer (#1286) * change the colab notebook for dpo zephyr and orpo * use original tokenizer * Update README.md * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-11-26 14:32:06 -08:00
Edd	ae7aabd648	Feat/kto (#1316) * Add PatchKTOTrainer and update model imports * Update dpo.py * Update __init__.py * Delete unsloth/models/kto.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-11-26 14:22:02 -08:00
Daniel Han	8144766f78	Update pyproject.toml	2024-11-26 03:29:59 -08:00
Daniel Han	67fd43f6f5	Bug fixes for vision (#1340 ) * Update __init__.py * Update __init__.py * Patching * Update cross_entropy_loss.py * CE Loss * Update _utils.py * Update _utils.py * CE Loss * Update _utils.py * Update _utils.py * Layernorm * Update _utils.py * Update _utils.py * Post patch * Update _utils.py * Update llama.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable 
compiling * Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. 
* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. 
It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update vision.py * Update _utils.py * Update loader.py * kwargs * logits * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * error * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update loader.py * Update llama.py * Update vision.py * Update loader.py * Old torch versions * Update loader.py * Update loader.py * prints * recheck * Update loader.py * Update loader.py * Update _utils.py * Update _utils.py * Update mapper.py --------- Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>	2024-11-26 03:25:55 -08:00
Itsuro Tajima	4cae1aa0be	use exact model name	2024-11-26 20:20:34 +09:00
Daniel Han	0f1fa094c7	Update mapper.py	2024-11-26 03:20:21 -08:00
Daniel Han	3c598843b6	Update _utils.py	2024-11-26 03:18:09 -08:00
Daniel Han	bba36b7d5e	Update _utils.py	2024-11-26 03:09:32 -08:00
Daniel Han	41e6f94c03	Update loader.py	2024-11-26 03:07:50 -08:00
Daniel Han	aa17dbedbd	Update loader.py	2024-11-26 02:58:16 -08:00
Daniel Han	59da965df7	recheck	2024-11-26 02:52:19 -08:00
Daniel Han	9d8bb08270	prints	2024-11-26 00:47:27 -08:00
Daniel Han	a02de20cb8	Update loader.py	2024-11-26 00:35:05 -08:00
Daniel Han	73bc801914	Update loader.py	2024-11-26 00:34:51 -08:00
Daniel Han	14920eb3eb	Old torch versions	2024-11-26 00:26:37 -08:00
Daniel Han	bc028b0dd1	Update loader.py	2024-11-26 00:18:00 -08:00
Daniel Han	485efdcd13	Update vision.py	2024-11-26 00:16:33 -08:00
Daniel Han	aaba695e00	Update llama.py	2024-11-26 00:08:15 -08:00
Daniel Han	e59a427aee	Update loader.py	2024-11-26 00:01:13 -08:00
Daniel Han	77a2e3dda5	Update _utils.py	2024-11-25 23:39:26 -08:00
Daniel Han	4017c470b2	Update _utils.py	2024-11-25 23:35:36 -08:00
Daniel Han	66253a0007	Update _utils.py	2024-11-25 23:33:19 -08:00
Daniel Han	90618c4304	Update _utils.py	2024-11-25 23:03:35 -08:00
Daniel Han	b72cdbd5dd	Update _utils.py	2024-11-25 22:56:24 -08:00
Daniel Han	946a46a618	Update _utils.py	2024-11-25 22:52:57 -08:00
Daniel Han	2bce754b23	Update _utils.py	2024-11-25 22:37:02 -08:00
Daniel Han	cde73c424c	Update _utils.py	2024-11-25 22:35:55 -08:00
Daniel Han	61fbf5757a	Update _utils.py	2024-11-25 22:33:18 -08:00
Daniel Han	ad5ca0d59f	Update _utils.py	2024-11-25 22:32:27 -08:00
Daniel Han	1c7a2bbe99	Update _utils.py	2024-11-25 22:31:09 -08:00
Daniel Han	f395434291	Update _utils.py	2024-11-25 22:27:49 -08:00
Daniel Han	6e7e6a52ef	Update _utils.py	2024-11-25 22:20:16 -08:00
Daniel Han	5d08a89f36	Update _utils.py	2024-11-25 22:18:33 -08:00
Daniel Han	41008f7ece	Update _utils.py	2024-11-25 22:17:20 -08:00
Daniel Han	307fd67a83	error	2024-11-25 22:12:22 -08:00
Daniel Han	0487293f4c	Update _utils.py	2024-11-25 22:03:08 -08:00
Daniel Han	7188082852	Update _utils.py	2024-11-25 21:59:40 -08:00
Daniel Han	4475041c36	Update _utils.py	2024-11-25 21:57:56 -08:00
Daniel Han	7c298a79ed	Update llama.py	2024-11-25 21:52:42 -08:00
Daniel Han	b8631b7bd8	Update llama.py	2024-11-25 21:46:36 -08:00
Daniel Han	549c5be61c	Update llama.py	2024-11-25 21:39:19 -08:00
Daniel Han	41878b581f	logits	2024-11-25 21:31:43 -08:00
Daniel Han	360a4c8702	kwargs	2024-11-25 18:48:41 -08:00
Daniel Han	7e7656c5a1	Update loader.py	2024-11-22 16:04:53 -08:00
Daniel Han	3bae0beca6	Merge branch 'main' into nightly	2024-11-22 15:03:28 -08:00
Daniel Han	4fe74c93ee	Update pyproject.toml	2024-11-21 17:46:22 -08:00
Daniel Han	aebf67a61e	Delete docs github button.png	2024-11-21 11:25:12 -08:00
Daniel Han	6d34ab821b	Vision (#1318) * Add files via upload * Add files via upload * Add files via upload * Add files via upload * Update README.md * Update README.md * Update README.md * Update README.md --------- Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>	2024-11-21 11:24:12 -08:00
Daniel Han	7296f5eed7	Update _utils.py	2024-11-21 06:45:40 -08:00
Daniel Han	967f9fb23d	Update vision.py	2024-11-21 06:07:06 -08:00
Daniel Han	ddf118a8fc	Vision support (#1315 ) * Fix pad token * Update llama.py * Typo * ignored labels * Revert "ignored labels" This reverts commit `4b25138ac7`. * More patching * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Feat/all tmp (#1219) * Update save.py Check whether path is in /tmp dir for Kaggle environment * Update save.py Move temporary_location to /tmp in Kaggle * Enhance Kaggle environment support in save and tokenizer utilities --------- Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> * Bug fixes * Update pyproject.toml * Update _utils.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Tied weights * Revert "Tied weights" This reverts commit `820cd4efef`. 
* Tied weights * Utils * CE Loss patching * Update __init__.py * Update __init__.py * Patching * Update cross_entropy_loss.py * CE Loss * Update _utils.py * Update _utils.py * CE Loss * Update _utils.py * Update _utils.py * Layernorm * Update _utils.py * Update _utils.py * Post patch * Update _utils.py * Update llama.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling 
* Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. 
* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. 
It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py * Update llama.py * Fix #853 * fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update rms_layernorm.py * Update rms_layernorm.py * Gemma * Update rms_layernorm.py * Update gemma2.py * Cut Cross Entropy * Update llama.py * Cut Cross Entropy * Update llama.py * Update llama.py * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update mapper.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * patch_fast_lora * vision * Update fast_lora.py * Update _utils.py * Update _utils.py * Vision * Update trainer.py * Update save.py * FastBaseVisionModel * Update loader_utils.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update vision.py * Update _utils.py * tokenizer_name * Update loader.py * Update vision.py * Update save.py * Update save.py * Update vision.py * Update vision.py * Update vision.py * Update 
vision.py * Update vision.py * Update vision.py * Update _utils.py --------- Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>	2024-11-21 05:01:44 -08:00
Daniel Han	a02a11a10c	Update _utils.py	2024-11-21 04:13:54 -08:00
Daniel Han	6c050b805a	Update vision.py	2024-11-21 04:10:43 -08:00
Daniel Han	044d1962c9	Update vision.py	2024-11-21 04:10:24 -08:00
Daniel Han	bef3496dbd	Update vision.py	2024-11-21 04:05:05 -08:00
Daniel Han	e22a03948a	Update vision.py	2024-11-21 04:01:17 -08:00
Daniel Han	02ed7bfd7f	Update vision.py	2024-11-21 03:58:18 -08:00
Daniel Han	8d7b0d9298	Update vision.py	2024-11-21 03:54:09 -08:00
Daniel Han	4c3473082d	Update save.py	2024-11-21 03:47:57 -08:00
Daniel Han	331c7d62e4	Update save.py	2024-11-21 03:18:31 -08:00
Daniel Han	38767cd63f	Update vision.py	2024-11-21 03:05:13 -08:00
Daniel Han	ef64ffc879	Update loader.py	2024-11-21 02:29:24 -08:00
Daniel Han	104594648d	tokenizer_name	2024-11-21 02:28:00 -08:00
Daniel Han	8f2db72aec	Update _utils.py	2024-11-21 02:27:07 -08:00
Daniel Han	dd78e6ed9a	Update vision.py	2024-11-21 02:23:35 -08:00
Daniel Han	99cdccd91a	Update loader.py	2024-11-21 02:23:11 -08:00
Daniel Han	0570871947	Update vision.py	2024-11-21 02:20:47 -08:00
Daniel Han	52e089e17a	Update loader.py	2024-11-21 02:18:59 -08:00
Daniel Han	cf302f0d17	Update vision.py	2024-11-21 02:17:20 -08:00
Daniel Han	83c79ba1d9	Update loader_utils.py	2024-11-21 02:15:43 -08:00
Daniel Han	528b7f58af	FastBaseVisionModel	2024-11-21 02:12:45 -08:00
Daniel Han	9b69eb9144	Update save.py	2024-11-21 01:51:46 -08:00
Daniel Han	2b47e95f53	Merge branch 'main' into nightly	2024-11-21 01:51:06 -08:00
Daniel Han	0cdda6b3a3	Update trainer.py	2024-11-21 01:49:46 -08:00
Daniel Han	925a63120e	Vision	2024-11-21 01:48:20 -08:00
Daniel Han	67d40f3f6d	Update _utils.py	2024-11-20 19:40:08 -08:00
Daniel Han	4be7341e49	Update _utils.py	2024-11-20 19:15:05 -08:00
Daniel Han	c67183f183	Update fast_lora.py	2024-11-20 17:07:53 -08:00
Daniel Han	d9f042d7e0	vision	2024-11-20 04:15:53 -08:00
Daniel Han	81cd49d2e7	patch_fast_lora	2024-11-20 03:36:41 -08:00
Michael	778359ee9e	Add files via upload	2024-11-20 01:47:23 -08:00
Michael	26a3095d76	Add files via upload	2024-11-20 01:44:15 -08:00
Daniel Han	6c3b7f0e32	Update _utils.py	2024-11-19 16:56:53 -08:00
Daniel Han	65a5049423	Update _utils.py	2024-11-19 15:49:01 -08:00
Daniel Han	e5b2f577de	Update _utils.py	2024-11-19 13:01:38 -08:00
Daniel Han	56a19d82de	Update _utils.py	2024-11-19 03:54:35 -08:00
Daniel Han	1f62b73677	Update _utils.py	2024-11-19 03:53:17 -08:00
Daniel Han	f8ccb5758a	Update _utils.py	2024-11-19 03:50:20 -08:00
Daniel Han	d67bd4cfb7	Update _utils.py	2024-11-19 03:46:32 -08:00
Daniel Han	699a9ff81e	Update _utils.py	2024-11-19 03:45:23 -08:00
Daniel Han	80f9f6a225	Update _utils.py	2024-11-19 02:25:22 -08:00
Daniel Han	ee98b75c06	Update mapper.py	2024-11-18 12:18:34 -08:00
Daniel Han	5f59faf526	Update _utils.py	2024-11-17 22:01:20 -08:00
Daniel Han	096a77d9e6	Update _utils.py	2024-11-17 22:01:07 -08:00
Daniel Han	0e77184c23	Update _utils.py	2024-11-17 21:31:17 -08:00
Daniel Han	c8082e46aa	Update _utils.py	2024-11-17 20:24:10 -08:00
Daniel Han	26546e68b7	Update _utils.py	2024-11-17 20:16:33 -08:00
Daniel Han	4b30c7a89b	Update _utils.py	2024-11-17 17:12:34 -08:00
Daniel Han	db1c5f414a	Update _utils.py	2024-11-17 16:56:26 -08:00
Daniel Han	d8c6c3e903	Update _utils.py	2024-11-17 16:54:44 -08:00
Daniel Han	ccf033893e	Update __init__.py	2024-11-17 16:13:13 -08:00
Daniel Han	3c2794ecee	Update __init__.py	2024-11-17 16:01:06 -08:00
Daniel Han	73bbd9e795	Update llama.py	2024-11-17 16:00:49 -08:00
Daniel Han	df62b6242d	Update llama.py	2024-11-17 15:58:00 -08:00
Daniel Han	05fb970edd	Update llama.py	2024-11-17 15:54:12 -08:00
Daniel Han	fa8e59eb1b	Cut Cross Entropy	2024-11-17 14:32:41 -08:00
Daniel Han	1dc066afda	Update llama.py	2024-11-17 00:21:44 -08:00
Daniel Han	c4eacf50da	Cut Cross Entropy	2024-11-16 23:53:46 -08:00
Daniel Han	e7ad484169	Update gemma2.py	2024-11-16 15:18:38 -08:00
Daniel Han	d47d838ee8	Update rms_layernorm.py	2024-11-16 15:00:28 -08:00
Daniel Han	263eaaa27f	Gemma	2024-11-16 13:55:11 -08:00
Daniel Han	e49a4d9277	Update rms_layernorm.py	2024-11-16 13:01:54 -08:00
Daniel Han	2cf0203166	Update rms_layernorm.py	2024-11-16 12:18:47 -08:00
Edd	b69fee4a36	fix/sfttrainer-compatibility (#1293) * Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance * Update trainer.py * Update trainer.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-11-14 17:07:29 -08:00
Daniel Han	786aea6365	Fix #853	2024-11-14 01:26:13 -08:00
Daniel Han	8e899bf956	Update llama.py	2024-11-14 01:11:16 -08:00
Daniel Han	686a97d750	Merge branch 'main' into nightly	2024-11-13 19:07:33 -08:00
Daniel Han	892115606d	Update _utils.py	2024-11-13 19:07:26 -08:00
Daniel Han	2dca0cb94b	Bug fixes (#1288 ) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (#1180) * Fix DPO, ORPO (#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * donot upcast lm_head and embeddings to float32 (#1186) * Cleanup upcast logs (#1188) * Fix/phi-longrope (#1193) * Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding * Typo * Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache * Update llama.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update transformers * Unk token issues * 
Update _utils.py * Fix pad token * Update llama.py * Typo * ignored labels * Revert "ignored labels" This reverts commit `4b25138ac7`. * More patching * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Feat/all tmp (#1219) * Update save.py Check whether path is in /tmp dir for Kaggle environment * Update save.py Move temporary_location to /tmp in Kaggle * Enhance Kaggle environment support in save and tokenizer utilities --------- Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> * Bug fixes * Update pyproject.toml * Update _utils.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Tied weights * Revert "Tied weights" This reverts commit `820cd4efef`. 
* Tied weights * Utils * CE Loss patching * Update __init__.py * Update __init__.py * Patching * Update cross_entropy_loss.py * CE Loss * Update _utils.py * Update _utils.py * CE Loss * Update _utils.py * Update _utils.py * Layernorm * Update _utils.py * Update _utils.py * Post patch * Update _utils.py * Update llama.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling 
* Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder * Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. 
* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. 
It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update __init__.py * Update trainer.py * Update trainer.py * Update trainer.py * Update tokenizer_utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk> Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>	2024-11-13 19:05:40 -08:00
Daniel Han	f554e663ec	Update tokenizer_utils.py	2024-11-13 19:05:15 -08:00
Daniel Han	2e4bca5cbf	Update trainer.py	2024-11-13 18:53:40 -08:00
Daniel Han	022d571835	Update trainer.py	2024-11-13 18:48:59 -08:00
Daniel Han	33c85a3bd0	Update trainer.py	2024-11-13 18:44:54 -08:00
Daniel Han	cec6e570a8	Update __init__.py	2024-11-13 17:38:26 -08:00
Edd	cad6df52c5	fix/sft-trainer (#1276) * Add patch for SFTTrainer to maintain backward compatibility with TRL changes * Update trainer.py * Update trainer.py * Refactor trainer patch to maintain backward compatibility with TRL changes * Update trainer.py * Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-11-13 17:33:30 -08:00
Edd	cb3608b72d	fix/get_chat_template (#1246) * Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to * Remove type hinting * Update chat_templates.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-11-13 00:06:48 -08:00
Uday Girish Maradana	b230fa13eb	DOC Update - Update README.md with os.environ in example (#1269) * Update README.md with os.environ in example Added OS Environ in example to avoid device conflicts , for a user at least in jupyter notebook this allows to select GPU in a multi GPU setup. As currently the unsloth init checks all GPU's and takes the first in the order which can be a issue when some GPU's are in use and the list still shows them. So to manually avoid this, this os config is required. Small change but a bit time saver for those who straight away copies the tutorials * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-11-12 23:55:28 -08:00
Edd	fb9a3ca1a1	Fix/export mistral (#1281) * Enhance install_python_non_blocking to handle protobuf installation and process management * Revert "Enhance install_python_non_blocking to handle protobuf installation and process management" This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266" This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58. * Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266 * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-11-12 23:53:50 -08:00
Daniel Han	c94fa058b0	Merge branch 'main' into nightly	2024-11-12 23:51:46 -08:00
Daniel Han	6007831cef	Update _utils.py	2024-11-12 10:54:58 -08:00
Daniel Han	6cc21e378d	Qwen 2.5 (#1280 ) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (#1180) * Fix DPO, ORPO (#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * donot upcast lm_head and embeddings to float32 (#1186) * Cleanup upcast logs (#1188) * Fix/phi-longrope (#1193) * Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding * Typo * Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache * Update llama.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update transformers * Unk token issues * 
Update _utils.py * Fix pad token * Update llama.py * Typo * ignored labels * Revert "ignored labels" This reverts commit `9d07be077b`. * More patching * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Feat/all tmp (#1219) * Update save.py Check whether path is in /tmp dir for Kaggle environment * Update save.py Move temporary_location to /tmp in Kaggle * Enhance Kaggle environment support in save and tokenizer utilities --------- Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> * Bug fixes * Update pyproject.toml * Update _utils.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Tied weights * Revert "Tied weights" This reverts commit `8090b7c01a`. 
* Tied weights * Utils * CE Loss patching * Update __init__.py * Update __init__.py * Patching * Update cross_entropy_loss.py * CE Loss * Update _utils.py * Update _utils.py * CE Loss * Update _utils.py * Update _utils.py * Layernorm * Update _utils.py * Update _utils.py * Post patch * Update _utils.py * Update llama.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling 
* Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update cross_entropy_loss.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * triton_cast * Update utils.py * Qwen 2.5 Coder --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: 
Erland366 <erland.pg366@gmail.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk>	2024-11-12 03:22:41 -08:00
Daniel Han	58810bf37e	Qwen 2.5 Coder	2024-11-11 18:46:05 -08:00
Daniel Han	451ed2952d	Update utils.py	2024-11-11 00:37:37 -08:00
Daniel Han	bbe1dda8c0	triton_cast	2024-11-11 00:17:22 -08:00
Daniel Han	a935d80026	Update tokenizer_utils.py	2024-11-11 00:04:02 -08:00
Daniel Han	61665e96d8	Update tokenizer_utils.py	2024-11-09 17:40:32 -08:00
Daniel Han	3c9e9b4a44	Update tokenizer_utils.py	2024-11-09 17:37:11 -08:00
Daniel Han	58f370ab26	Update tokenizer_utils.py	2024-11-09 17:34:47 -08:00
Daniel Han	6530a66a82	Update tokenizer_utils.py	2024-11-09 17:01:33 -08:00
Daniel Han	16edb6bdcc	Update _utils.py	2024-11-07 01:11:45 -08:00
Daniel Han	e7d0ce17a8	Update cross_entropy_loss.py	2024-11-06 21:07:50 -08:00
Daniel Han	b3ce5868ca	Merge branch 'main' into nightly	2024-11-06 19:02:44 -08:00
Daniel Han	a93762532d	Update _utils.py	2024-11-06 19:00:42 -08:00
Daniel Han	01cd5b3370	Update loader.py	2024-11-06 19:00:23 -08:00
Daniel Han	a8ddc39482	Update loader.py	2024-11-06 19:00:13 -08:00
Daniel Han	070128c5c1	Merge branch 'main' into nightly	2024-11-06 17:21:21 -08:00
Daniel Han	c9b5d5cea3	Bug fixes (#1259 ) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (#1180) * Fix DPO, ORPO (#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * donot upcast lm_head and embeddings to float32 (#1186) * Cleanup upcast logs (#1188) * Fix/phi-longrope (#1193) * Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding * Typo * Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache * Update llama.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update transformers * Unk token issues * 
Update _utils.py * Fix pad token * Update llama.py * Typo * ignored labels * Revert "ignored labels" This reverts commit `9d07be077b`. * More patching * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Feat/all tmp (#1219) * Update save.py Check whether path is in /tmp dir for Kaggle environment * Update save.py Move temporary_location to /tmp in Kaggle * Enhance Kaggle environment support in save and tokenizer utilities --------- Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> * Bug fixes * Update pyproject.toml * Update _utils.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Tied weights * Revert "Tied weights" This reverts commit `8090b7c01a`. 
* Tied weights * Utils * CE Loss patching * Update __init__.py * Update __init__.py * Patching * Update cross_entropy_loss.py * CE Loss * Update _utils.py * Update _utils.py * CE Loss * Update _utils.py * Update _utils.py * Layernorm * Update _utils.py * Update _utils.py * Post patch * Update _utils.py * Update llama.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling 
* Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk> * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update flex_attention.py * Update loader.py * Update loader.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com> Co-authored-by: root <root@ieeres.chu.cam.ac.uk>	2024-11-06 17:17:19 -08:00
Daniel Han	c62f901d00	Update _utils.py	2024-11-06 17:14:56 -08:00
Daniel Han	7ce530b5cc	Update flex_attention.py	2024-11-06 16:59:15 -08:00
Daniel Han	c8005418c5	Update flex_attention.py	2024-11-06 15:56:52 -08:00
Daniel Han	92c99ce97c	Update flex_attention.py	2024-11-06 15:56:26 -08:00
Daniel Han	297e25007c	Update flex_attention.py	2024-11-06 15:39:32 -08:00
Daniel Han	746ff24ed2	Update loader.py	2024-11-06 15:13:03 -08:00
Daniel Han	52e3a2bf9a	Update loader.py	2024-11-06 15:05:10 -08:00
Daniel Han	1d11e3e391	Update flex_attention.py	2024-11-06 14:54:53 -08:00
Daniel Han	b3f1a866f4	Update flex_attention.py	2024-11-06 14:51:52 -08:00
Daniel Han	981bf005a6	Update _utils.py	2024-11-06 14:49:37 -08:00
Daniel Han	0a6bdf93b5	Update _utils.py	2024-11-06 14:05:52 -08:00
Daniel Han	33874414b8	Update flex_attention.py	2024-11-06 13:09:29 -08:00
Edwin Fennell	08db916009	CLI now handles user input strings for dtype correctly (#1235) Co-authored-by: root <root@ieeres.chu.cam.ac.uk>	2024-11-06 12:23:09 -08:00
Datta Nimmaturi	13ab93547e	Throw error when inferencing longer than max_popsition_embeddings (#1236) * Throw error when inferencing longer than max_popsition_embeddings without rope scaling * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-11-06 12:22:08 -08:00
Edd	8a14fe3f1e	Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254) * Fix: cast logits to float32 in cross_entropy_forward to prevent errors * Update cross_entropy_loss.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-11-06 12:16:02 -08:00
Daniel Han	ccce1f24e4	Merge branch 'main' into nightly	2024-11-06 12:13:56 -08:00
Daniel Han	4f8bf42442	Bug fixes (#1255 ) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (#1180) * Fix DPO, ORPO (#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * donot upcast lm_head and embeddings to float32 (#1186) * Cleanup upcast logs (#1188) * Fix/phi-longrope (#1193) * Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding * Typo * Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache * Update llama.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update transformers * Unk token issues * 
Update _utils.py * Fix pad token * Update llama.py * Typo * ignored labels * Revert "ignored labels" This reverts commit `9d07be077b`. * More patching * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Feat/all tmp (#1219) * Update save.py Check whether path is in /tmp dir for Kaggle environment * Update save.py Move temporary_location to /tmp in Kaggle * Enhance Kaggle environment support in save and tokenizer utilities --------- Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> * Bug fixes * Update pyproject.toml * Update _utils.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Tied weights * Revert "Tied weights" This reverts commit `8090b7c01a`. 
* Tied weights * Utils * CE Loss patching * Update __init__.py * Update __init__.py * Patching * Update cross_entropy_loss.py * CE Loss * Update _utils.py * Update _utils.py * CE Loss * Update _utils.py * Update _utils.py * Layernorm * Update _utils.py * Update _utils.py * Post patch * Update _utils.py * Update llama.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling 
* Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com>	2024-11-06 12:08:55 -08:00
Daniel Han	4df520f08e	Update _utils.py	2024-11-06 02:10:37 -08:00
Daniel Han	126b5a3ff7	Update _utils.py	2024-11-06 00:18:04 -08:00
Daniel Han	7cda12fd10	Update _utils.py	2024-11-05 22:48:36 -08:00
Daniel Han	362d187f53	Update _utils.py	2024-11-05 21:44:38 -08:00
Daniel Han	a12cc6b55e	Update _utils.py	2024-11-05 21:24:56 -08:00
Daniel Han	1fa35b1e69	Merge branch 'main' into nightly	2024-11-05 21:24:47 -08:00
Daniel Han	7c684fb793	Bug fix (#1249 ) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (#1180) * Fix DPO, ORPO (#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * donot upcast lm_head and embeddings to float32 (#1186) * Cleanup upcast logs (#1188) * Fix/phi-longrope (#1193) * Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding * Typo * Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache * Update llama.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update transformers * Unk token issues * Update 
_utils.py * Fix pad token * Update llama.py * Typo * ignored labels * Revert "ignored labels" This reverts commit `9d07be077b`. * More patching * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Feat/all tmp (#1219) * Update save.py Check whether path is in /tmp dir for Kaggle environment * Update save.py Move temporary_location to /tmp in Kaggle * Enhance Kaggle environment support in save and tokenizer utilities --------- Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> * Bug fixes * Update pyproject.toml * Update _utils.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Tied weights * Revert "Tied weights" This reverts commit `8090b7c01a`. 
* Tied weights * Utils * CE Loss patching * Update __init__.py * Update __init__.py * Patching * Update cross_entropy_loss.py * CE Loss * Update _utils.py * Update _utils.py * CE Loss * Update _utils.py * Update _utils.py * Layernorm * Update _utils.py * Update _utils.py * Post patch * Update _utils.py * Update llama.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling 
* Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml * Update _utils.py * Update llama.py * CE Loss * Update cross_entropy_loss.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com>	2024-11-05 21:08:11 -08:00
Daniel Han	1df89aec1e	Update llama.py	2024-11-05 21:07:02 -08:00
Daniel Han	6fed054c2e	Merge branch 'main' into nightly	2024-11-05 21:06:13 -08:00
Daniel Han	3f9b395d48	Update cross_entropy_loss.py	2024-11-05 21:01:33 -08:00
Daniel Han	84cf5e1585	Update cross_entropy_loss.py	2024-11-05 20:58:15 -08:00
Daniel Han	792c473b7d	Update cross_entropy_loss.py	2024-11-05 20:29:30 -08:00
Daniel Han	e9451a36d4	Update _utils.py	2024-11-05 14:54:07 -08:00
Daniel Han	cf2a714d74	Update cross_entropy_loss.py	2024-11-05 14:43:49 -08:00
Daniel Han	27cd2b3a0d	CE Loss	2024-11-05 14:40:57 -08:00
Daniel Han	fef9932bf0	Update llama.py	2024-11-05 13:52:12 -08:00
Daniel Han	92a044197d	Update _utils.py	2024-11-05 13:35:43 -08:00
Daniel Han	15268ba184	Bug fixes (#1245 ) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (#1180) * Fix DPO, ORPO (#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * donot upcast lm_head and embeddings to float32 (#1186) * Cleanup upcast logs (#1188) * Fix/phi-longrope (#1193) * Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding * Typo * Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache * Update llama.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update transformers * Unk token issues * 
Update _utils.py * Fix pad token * Update llama.py * Typo * ignored labels * Revert "ignored labels" This reverts commit `9d07be077b`. * More patching * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Feat/all tmp (#1219) * Update save.py Check whether path is in /tmp dir for Kaggle environment * Update save.py Move temporary_location to /tmp in Kaggle * Enhance Kaggle environment support in save and tokenizer utilities --------- Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> * Bug fixes * Update pyproject.toml * Update _utils.py * Update __init__.py * Update __init__.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Tied weights * Revert "Tied weights" This reverts commit `8090b7c01a`. 
* Tied weights * Utils * CE Loss patching * Update __init__.py * Update __init__.py * Patching * Update cross_entropy_loss.py * CE Loss * Update _utils.py * Update _utils.py * CE Loss * Update _utils.py * Update _utils.py * Layernorm * Update _utils.py * Update _utils.py * Post patch * Update _utils.py * Update llama.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * typing * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * int64 * Update _utils.py * Update cross_entropy_loss.py * constexpr * constexpr * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Update _utils.py * CE * Update cross_entropy_loss.py * Update _utils.py * Update llama.py * Update _utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update utils.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * typing * Update rope_embedding.py * types * Disable compiling 
* Update _utils.py * Update _utils.py * Forward hook * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update _utils.py * Update pyproject.toml --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com> Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com>	2024-11-05 13:29:37 -08:00
Daniel Han	5bb3e0d462	Update pyproject.toml	2024-11-05 13:28:25 -08:00
Daniel Han	09ad3757c9	Update _utils.py	2024-11-05 12:05:17 -08:00
Daniel Han	0f9b6390a6	Update llama.py	2024-11-05 02:06:27 -08:00
Daniel Han	250109be45	Update llama.py	2024-11-05 01:55:17 -08:00
Daniel Han	116f286d0c	Update _utils.py	2024-11-05 01:52:27 -08:00
Daniel Han	c18bac88be	Update llama.py	2024-11-05 01:42:44 -08:00
Daniel Han	ed61c33d2c	Update _utils.py	2024-11-05 01:41:04 -08:00
Daniel Han	a184f38f7a	Forward hook	2024-11-05 01:36:11 -08:00
Daniel Han	09c1f6d677	Update _utils.py	2024-11-05 01:30:34 -08:00
Daniel Han	b0f010259f	Update _utils.py	2024-11-05 00:15:56 -08:00
Daniel Han	f627c1bf35	Disable compiling	2024-11-05 00:11:32 -08:00
Daniel Han	81a87cc262	types	2024-11-05 00:09:11 -08:00
Daniel Han	4076dc49cb	Update rope_embedding.py	2024-11-05 00:08:16 -08:00
Daniel Han	0ab731d40b	typing	2024-11-05 00:05:56 -08:00
Daniel Han	9384cb9f72	Update rms_layernorm.py	2024-11-04 23:54:02 -08:00
Daniel Han	0b2363bacd	Update rms_layernorm.py	2024-11-04 23:47:38 -08:00
Daniel Han	6836e43f27	Update rms_layernorm.py	2024-11-04 23:45:25 -08:00
Daniel Han	eddf0c46f4	Update rms_layernorm.py	2024-11-04 23:39:03 -08:00
Daniel Han	bf97546a16	Update rms_layernorm.py	2024-11-04 23:34:11 -08:00
Daniel Han	4485a342e0	Update rms_layernorm.py	2024-11-04 23:33:08 -08:00
Daniel Han	34abdb309d	Update rms_layernorm.py	2024-11-04 23:30:55 -08:00
Daniel Han	fd07c2ee73	Update rms_layernorm.py	2024-11-04 23:28:57 -08:00
Daniel Han	c35f115537	Update rms_layernorm.py	2024-11-04 23:27:50 -08:00
Daniel Han	d80444fe3e	Update rms_layernorm.py	2024-11-04 23:24:33 -08:00
Daniel Han	40858e4f76	Update rms_layernorm.py	2024-11-04 23:20:32 -08:00
Daniel Han	a3429d208d	Update rms_layernorm.py	2024-11-04 23:19:02 -08:00
Daniel Han	4e4bdc4fcf	Update utils.py	2024-11-04 23:16:01 -08:00
Daniel Han	578c8fde96	Update rms_layernorm.py	2024-11-04 23:11:03 -08:00
Daniel Han	15fdce80b9	Update rms_layernorm.py	2024-11-04 23:08:05 -08:00
Daniel Han	106d35fe8a	Update rms_layernorm.py	2024-11-04 23:06:13 -08:00
Daniel Han	9de3649586	Update rms_layernorm.py	2024-11-04 23:04:34 -08:00
Daniel Han	74e5310ba1	Update rms_layernorm.py	2024-11-04 23:01:14 -08:00
Daniel Han	08f6b3ac3c	Update rms_layernorm.py	2024-11-04 22:38:47 -08:00
Daniel Han	1c8582dd34	Update _utils.py	2024-11-04 22:10:52 -08:00
Daniel Han	3524fb6fd9	Update llama.py	2024-11-04 22:08:49 -08:00
Daniel Han	344b32931f	Update _utils.py	2024-11-04 21:47:48 -08:00
Daniel Han	1518da6eef	Update cross_entropy_loss.py	2024-11-04 19:48:09 -08:00
Daniel Han	543d31f6ec	CE	2024-11-04 19:45:10 -08:00
Daniel Han	72c568498a	Update _utils.py	2024-11-04 17:56:16 -08:00
Daniel Han	a1b608e889	Update _utils.py	2024-11-04 01:23:24 -08:00
Daniel Han	1f0fd97f46	Update _utils.py	2024-11-04 01:20:31 -08:00
Daniel Han	80dea539a7	Update cross_entropy_loss.py	2024-11-04 00:32:35 -08:00
Daniel Han	287d50c257	Update cross_entropy_loss.py	2024-11-04 00:07:52 -08:00
Daniel Han	0c48e0c179	constexpr	2024-11-04 00:05:51 -08:00
Daniel Han	d2725e7910	constexpr	2024-11-04 00:03:24 -08:00
Daniel Han	006bafb437	Update cross_entropy_loss.py	2024-11-04 00:01:04 -08:00
Daniel Han	ffe5d81c4f	Update _utils.py	2024-11-03 23:55:39 -08:00
Daniel Han	eaff11e1c3	int64	2024-11-03 23:53:03 -08:00
Daniel Han	5eddfd8cfe	Update cross_entropy_loss.py	2024-11-03 21:42:13 -08:00
Daniel Han	fee02b903b	Update cross_entropy_loss.py	2024-11-03 21:40:13 -08:00
Daniel Han	c61a9b5593	Update cross_entropy_loss.py	2024-11-03 21:38:19 -08:00
Daniel Han	c1f3875371	Update cross_entropy_loss.py	2024-11-03 21:35:27 -08:00
Daniel Han	9384452eae	Update cross_entropy_loss.py	2024-11-03 21:35:01 -08:00
Daniel Han	539a1406bd	Update cross_entropy_loss.py	2024-11-03 21:33:26 -08:00
Daniel Han	9875cbd810	Update cross_entropy_loss.py	2024-11-03 21:31:18 -08:00
Daniel Han	9d13739145	Update cross_entropy_loss.py	2024-11-03 21:29:19 -08:00
Daniel Han	f6acbbeb03	Update cross_entropy_loss.py	2024-11-03 21:27:37 -08:00
Daniel Han	1e98cc3eb2	typing	2024-11-03 21:25:32 -08:00
Daniel Han	43c3da8776	Update cross_entropy_loss.py	2024-11-03 21:23:28 -08:00
Daniel Han	c5b142e7b5	Update cross_entropy_loss.py	2024-11-03 21:22:08 -08:00
Daniel Han	4cedfeae87	Update cross_entropy_loss.py	2024-11-03 21:19:42 -08:00
Daniel Han	c49cf23ac1	Update cross_entropy_loss.py	2024-11-03 21:18:26 -08:00
Daniel Han	adebfa1874	Update cross_entropy_loss.py	2024-11-03 21:16:19 -08:00
Daniel Han	66b807b6dd	Update cross_entropy_loss.py	2024-11-03 21:11:34 -08:00
Daniel Han	86f5a300e0	Update cross_entropy_loss.py	2024-11-03 21:09:34 -08:00
Daniel Han	a4471de988	Update cross_entropy_loss.py	2024-11-03 21:07:23 -08:00
Daniel Han	dc552adcf1	Update cross_entropy_loss.py	2024-11-03 21:03:24 -08:00
Daniel Han	7fd6bed2d3	Update cross_entropy_loss.py	2024-11-03 20:58:59 -08:00
Daniel Han	ad79b86e86	Update cross_entropy_loss.py	2024-11-03 20:58:31 -08:00
Daniel Han	f13c60b73d	Update cross_entropy_loss.py	2024-11-03 20:49:59 -08:00
Daniel Han	630ec299ac	Update cross_entropy_loss.py	2024-11-03 20:03:03 -08:00
Daniel Han	9a68c7fec9	Update cross_entropy_loss.py	2024-11-03 19:59:59 -08:00
Daniel Han	2cff29f70c	Update cross_entropy_loss.py	2024-11-03 19:57:26 -08:00
Daniel Han	e773a4ffe6	Update cross_entropy_loss.py	2024-11-03 18:35:43 -08:00
Daniel Han	f3a93319b0	Update cross_entropy_loss.py	2024-11-03 18:33:02 -08:00
Daniel Han	90af825bed	Update _utils.py	2024-11-03 18:13:11 -08:00
Daniel Han	5eeffb8965	Update llama.py	2024-11-03 18:08:22 -08:00
Daniel Han	74f0223765	Update _utils.py	2024-11-03 17:56:03 -08:00
Daniel Han	6ca6013f1d	Post patch	2024-11-03 17:49:08 -08:00
Daniel Han	374e0d9294	Update _utils.py	2024-11-03 17:15:27 -08:00
Daniel Han	0b79fb1618	Update _utils.py	2024-11-03 17:09:37 -08:00
Daniel Han	68b6bebded	Layernorm	2024-11-03 17:05:33 -08:00
Daniel Han	73ed5dcfa0	Update _utils.py	2024-11-03 15:25:18 -08:00
Daniel Han	cc5c57b764	Update _utils.py	2024-11-03 15:21:03 -08:00
Daniel Han	37b160d724	CE Loss	2024-11-03 15:15:58 -08:00
Daniel Han	3591f347eb	Update _utils.py	2024-11-03 15:02:38 -08:00
Daniel Han	21d50ad49d	Update _utils.py	2024-11-03 15:01:29 -08:00
Daniel Han	2ea161c913	CE Loss	2024-11-03 14:15:08 -08:00
Daniel Han	29a73e562d	Update cross_entropy_loss.py	2024-11-03 14:09:47 -08:00
Daniel Han	13e8f1a52d	Patching	2024-11-03 01:18:48 -07:00
Daniel Han	b2260942fa	Update __init__.py	2024-11-03 00:22:38 -07:00
Daniel Han	0ebecb759a	Update __init__.py	2024-11-02 23:14:41 -07:00
Daniel Han	726e7c933b	CE Loss patching	2024-11-02 19:20:48 -07:00
Daniel Han	4b8906fe50	Utils	2024-11-02 19:08:57 -07:00
Daniel Han	27d2d1df49	Tied weights	2024-10-31 16:25:26 -07:00
Daniel Han	e2aa4d6a1a	Revert "Tied weights" This reverts commit `820cd4efef`.	2024-10-31 12:38:20 -07:00
Daniel Han	820cd4efef	Tied weights	2024-10-31 12:36:21 -07:00
Daniel Han	f3bd5d6d33	Update cross_entropy_loss.py	2024-10-31 02:00:04 -07:00
Daniel Han	dd5d035a99	Update cross_entropy_loss.py	2024-10-31 01:56:03 -07:00
Daniel Han	8942b2fc76	Update cross_entropy_loss.py	2024-10-31 01:23:41 -07:00
Daniel Han	67ca220d94	Update cross_entropy_loss.py	2024-10-30 17:35:20 -07:00
Daniel Han	dea8630305	Update cross_entropy_loss.py	2024-10-30 17:26:34 -07:00
Daniel Han	93785d3578	Update cross_entropy_loss.py	2024-10-30 17:19:30 -07:00
Daniel Han	4a369c264a	Update cross_entropy_loss.py	2024-10-30 16:55:24 -07:00
Daniel Han	65e99c28cb	Update cross_entropy_loss.py	2024-10-30 16:53:04 -07:00
Daniel Han	c8056af236	Update cross_entropy_loss.py	2024-10-30 16:50:56 -07:00
Daniel Han	895a5364f0	Update cross_entropy_loss.py	2024-10-30 16:46:33 -07:00
Daniel Han	6ef574b58a	Update cross_entropy_loss.py	2024-10-30 16:37:37 -07:00
Daniel Han	c60053e4af	Update cross_entropy_loss.py	2024-10-30 15:46:09 -07:00
Daniel Han	9bf58f1770	Update cross_entropy_loss.py	2024-10-30 15:33:37 -07:00
Daniel Han	cc5a1728ff	Update cross_entropy_loss.py	2024-10-30 14:57:14 -07:00
Daniel Han	233ec38c26	Update _utils.py	2024-10-30 14:01:49 -07:00
Daniel Han	353f7eeb9f	Update _utils.py	2024-10-30 14:00:03 -07:00
Daniel Han	6ce3958c87	Update _utils.py	2024-10-30 13:54:11 -07:00
Daniel Han	cfc84d83d9	Update _utils.py	2024-10-30 13:51:10 -07:00
Daniel Han	6e6cb3c0da	Update __init__.py	2024-10-30 13:47:21 -07:00
Daniel Han	a919f17648	Update __init__.py	2024-10-30 13:44:05 -07:00
Daniel Han	bedce9ac2c	Update _utils.py	2024-10-30 13:35:01 -07:00
Daniel Han	f88122bd43	Update pyproject.toml	2024-10-30 13:16:00 -07:00
Daniel Han	2d86834813	Bug fixes	2024-10-30 13:11:43 -07:00
Daniel Han	09f667a533	Feat/all tmp (#1219) * Update save.py Check whether path is in /tmp dir for Kaggle environment * Update save.py Move temporary_location to /tmp in Kaggle * Enhance Kaggle environment support in save and tokenizer utilities --------- Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com>	2024-10-30 00:43:03 -07:00
Daniel Han	8e30e2e646	Update cross_entropy_loss.py	2024-10-28 15:01:04 -07:00
Daniel Han	d320355de4	Update cross_entropy_loss.py	2024-10-28 14:47:11 -07:00
Daniel Han	1726f04b97	Update cross_entropy_loss.py	2024-10-28 14:30:06 -07:00
Daniel Han	5c669defd5	Update _utils.py	2024-10-28 10:41:31 -07:00
Daniel Han	54ed0fa410	Update _utils.py	2024-10-28 01:12:33 -07:00
Daniel Han	cbbdff23fc	More patching	2024-10-28 01:10:23 -07:00
Daniel Han	38e5b23223	Revert "ignored labels" This reverts commit `4b25138ac7`.	2024-10-27 22:18:05 -07:00
Daniel Han	4b25138ac7	ignored labels	2024-10-27 22:10:59 -07:00
Daniel Han	e4205ffad5	Typo	2024-10-27 19:09:27 -07:00
Daniel Han	65f754e0f4	Update llama.py	2024-10-27 19:08:14 -07:00
Daniel Han	ac8d5fc3cb	Fix pad token	2024-10-27 19:06:57 -07:00
Daniel Han	6f19b9aecd	Update _utils.py	2024-10-27 17:34:33 -07:00
Daniel Han	d818697900	Unk token issues	2024-10-27 17:32:26 -07:00
Daniel Han	b040e3407d	Merge branch 'main' into nightly	2024-10-27 16:24:20 -07:00
Daniel Han	9e9d6fe660	Merge branch 'main' of https://github.com/unslothai/unsloth	2024-10-27 15:09:42 -07:00
Daniel Han	55fd65a6ed	Update _utils.py	2024-10-27 15:09:35 -07:00
Edd	539fcea071	Fix/casting continue pretraining (#1200) * Bring back float32 if float16 instead of bfloat16 * Refactor mixed precision handling for lm_head and embed_tokens to ensure correct dtype usage * Fix dtype retrieval for embed_tokens and lm_head in mixed precision training * Fix dtype retrieval for embed_tokens and lm_head to use weight dtype in mixed precision training * Fix dtype handling for embed_tokens and lm_head to ensure correct float32 usage in mixed precision training * Fix dtype assignment for lm_head modules to ensure correct weight dtype usage in mixed precision training	2024-10-27 15:06:45 -07:00
Daniel Han	9d5f58224d	Update pyproject.toml	2024-10-26 18:05:55 -07:00
Daniel Han	e7ede2f7db	Torch 2.5	2024-10-26 18:03:15 -07:00
Daniel Han	a4a724bab2	Merge branch 'main' into nightly	2024-10-26 01:22:21 -07:00
Daniel Han	c58dc701c8	Bug fixes (#1195 ) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (#1180) * Fix DPO, ORPO (#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * donot upcast lm_head and embeddings to float32 (#1186) * Cleanup upcast logs (#1188) * Fix/phi-longrope (#1193) * Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding * Typo * Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache * Update llama.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update transformers --------- Co-authored-by: 
timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com> Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>	2024-10-26 01:21:24 -07:00
Daniel Han	1e8980127c	Update transformers	2024-10-26 01:20:37 -07:00
Edd	c8c4cb3a6d	Fix/phi-longrope (#1193) * Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding * Typo * Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache * Update llama.py * Update llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-10-25 15:44:10 -07:00
Datta Nimmaturi	1ba5a0161d	Cleanup upcast logs (#1188)	2024-10-25 12:17:54 -07:00
Datta Nimmaturi	06050f1802	donot upcast lm_head and embeddings to float32 (#1186)	2024-10-25 01:28:12 -07:00
Daniel Han	dcf27bcca7	Merge branch 'main' into nightly	2024-10-24 12:17:57 -07:00
Daniel Han	519c0df00c	Update _utils.py	2024-10-24 12:17:48 -07:00
Daniel Han	06a5c752e3	Fix 4.47 issue (#1182 ) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py * Update _utils.py * fix/transformers-unpack (#1180) * Fix DPO, ORPO (#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>	2024-10-24 12:17:21 -07:00
Daniel Han	e24b2db194	Update _utils.py	2024-10-24 12:17:09 -07:00
Daniel Han	6f34885c29	Update _utils.py	2024-10-24 12:14:14 -07:00
Daniel Han	8603d08f3b	Update cross_entropy_loss.py	2024-10-24 12:11:38 -07:00
Edd	79effae03d	fix/transformers-unpack (#1180 ) * Fix DPO, ORPO (#1177) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> * Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>	2024-10-24 12:10:52 -07:00
Daniel Han	e3e4b7dfc3	Update _utils.py	2024-10-24 01:11:20 -07:00
Daniel Han	a6e4a8bf76	Fix DPO, ORPO (#1177 ) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py * n_items * Update cross_entropy_loss.py * Fix DPO, ORPO * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>	2024-10-24 00:36:37 -07:00
Daniel Han	0bd8517b1e	Update _utils.py	2024-10-24 00:25:28 -07:00
Daniel Han	d6382ca656	Merge branch 'main' into nightly	2024-10-24 00:24:27 -07:00
Daniel Han	ccf1f946f3	Fix DPO, ORPO	2024-10-24 00:17:26 -07:00
Daniel Han	7fa3179e88	Update cross_entropy_loss.py	2024-10-23 22:18:24 -07:00
Daniel Han	dd8487a63e	n_items	2024-10-23 22:13:45 -07:00
Daniel Han	aa48184c41	Update _utils.py	2024-10-23 12:39:58 -07:00
Edd	da8e547678	Fix/patch tokenizer (#1171) * fix: correct tokenizer handling in patch_sft_trainer_tokenizer * Revert "fix: correct tokenizer handling in patch_sft_trainer_tokenizer" This reverts commit 7a98e465cbd4f980c8b364b0396d44f2d052090f. * fix: correct condition for test_text assignment in patch_sft_trainer_tokenizer	2024-10-23 12:32:33 -07:00
Daniel Han	4c85177719	Many bug fixes (#1162 ) * Fix TRL * Update mistral.py * Patch processing_class * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Installation guide (#1165) * chore: update chat_templates.py (#1166) orginal -> original * Disable Flex Attention * Update tokenizer_utils.py * Update _utils.py --------- Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>	2024-10-23 03:14:57 -07:00
Daniel Han	8e4cd551e7	Update _utils.py	2024-10-23 03:14:48 -07:00
Daniel Han	f1d3a8ae6c	Update tokenizer_utils.py	2024-10-23 03:04:22 -07:00
Daniel Han	6a1bef2c4a	Disable Flex Attention	2024-10-23 02:58:40 -07:00
Ikko Eltociear Ashimine	5d1de9d42e	chore: update chat_templates.py (#1166) orginal -> original	2024-10-23 00:59:02 -07:00
timothelaborie	8dbf2d5daa	Installation guide (#1165)	2024-10-23 00:55:26 -07:00
Daniel Han	1623c3242b	Update tokenizer_utils.py	2024-10-22 01:28:38 -07:00
Daniel Han	444bc97f13	Update tokenizer_utils.py	2024-10-22 01:22:13 -07:00
Daniel Han	e749d57859	Update tokenizer_utils.py	2024-10-22 01:09:20 -07:00
Daniel Han	48eda0ba34	Update tokenizer_utils.py	2024-10-22 01:05:41 -07:00
Daniel Han	ccf31a0285	Update tokenizer_utils.py	2024-10-22 00:57:36 -07:00
Daniel Han	e8bc00bd33	Update tokenizer_utils.py	2024-10-22 00:55:47 -07:00
Daniel Han	30230e6a02	Patch processing_class	2024-10-22 00:53:38 -07:00
Daniel Han	0b20946bcf	Update mistral.py	2024-10-22 00:29:30 -07:00
Daniel Han	4a51572765	Fix TRL	2024-10-21 01:02:53 -07:00
Daniel Han	a51a84f62d	Update save.py	2024-10-20 01:52:21 -07:00
Daniel Han	108fa8dfbe	Update _utils.py	2024-10-18 23:10:30 -07:00
Daniel Han	828ef9afc5	Fix `get_token`	2024-10-18 23:08:30 -07:00
vo1d-ai	f63a2a5026	fix: compute_loss bug (#1151) Currently, Unsloth doesn't pass additional parameters to Trainer.compute_loss such as return_outputs. This leads to errors when calling trainer.evaluate(). This change fixes the bug by properly passing parameters to Trainer.compute_loss.	2024-10-18 20:46:07 -07:00
Daniel Han	cde7401259	Update _utils.py	2024-10-17 20:50:05 -07:00
Daniel Han	139c3b29b3	Update README.md	2024-10-17 20:46:11 -07:00
Daniel Han	3a33dad3c9	Update README.md	2024-10-17 20:45:40 -07:00
Daniel Han	d57dcf58a1	Gradient Accumulation Fix (#1146 ) * Unsloth Zoo * Update trainer.py * Update trainer.py * Update cross_entropy_loss.py * n_items * Update llama.py * kwargs * Remove extraneous f prefixes (#1133) Co-authored-by: Emil Sadek <esadek@users.noreply.github.com> * Update __init__.py * kwargs * Update trainer.py * Update trainer.py * Update trainer.py * Fix GA * Update _utils.py * Update llama.py * Update tokenizer_utils.py * Warn on old versions * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py --------- Co-authored-by: Emil Sadek <esadek@hotmail.com> Co-authored-by: Emil Sadek <esadek@users.noreply.github.com>	2024-10-17 20:43:07 -07:00
Daniel Han	6ba202708c	Update mapper.py	2024-10-16 21:48:05 -07:00
Daniel Han	d2a032e117	Gradient Accumulation Fix (#1134 ) * Unsloth Zoo * Update trainer.py * Update trainer.py * Update cross_entropy_loss.py * n_items * Update llama.py * kwargs * Remove extraneous f prefixes (#1133) Co-authored-by: Emil Sadek <esadek@users.noreply.github.com> * Update __init__.py --------- Co-authored-by: Emil Sadek <esadek@hotmail.com> Co-authored-by: Emil Sadek <esadek@users.noreply.github.com>	2024-10-14 19:17:35 -07:00
Daniel Han	5bd7d3640f	Update save.py	2024-10-11 00:00:06 -07:00
Daniel Han	9dd4462bf9	Update save.py	2024-10-10 23:22:17 -07:00
Giulia Baldini	592191b061	Only remove folder in sentenpiece check if it was created (#1121)	2024-10-10 23:21:27 -07:00
Giulia Baldini	5f2d5a3021	Handle absolute paths using pathlib (#1120)	2024-10-10 23:20:34 -07:00
Daniel Han	e130e748f0	Reload	2024-10-05 17:21:48 -07:00
Daniel Han	c89ae6b9b4	Merge branch 'nightly'	2024-10-01 00:45:16 -07:00
Daniel Han	3c47723bb2	Update README.md	2024-10-01 00:40:17 -07:00
Daniel Han	7fc9b07b94	Update tokenizer_utils.py	2024-10-01 00:35:54 -07:00
Daniel Han	3017eae097	Update tokenizer_utils.py	2024-10-01 00:20:01 -07:00
Daniel Han	dfc5cd3c80	Update tokenizer_utils.py	2024-10-01 00:14:52 -07:00
Daniel Han	ac3f564f7e	Update chat_templates.py	2024-09-30 23:08:46 -07:00
Daniel Han	248c27d205	Fix merges (#1079 ) * Layernorm * Update layernorm.py * Update layernorm.py * Update layernorm.py * Update layernorm.py * Update layernorm.py * Update layernorm.py * Patch layernorm * Update layernorm.py * RMS Layernorm * Update rms_layernorm.py * Causal LM * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update layernorm.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Llama 3.2 * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update loader.py * Update loader.py * Update loader.py * Dependencies * Update pyproject.toml * Update _utils.py	2024-09-30 03:03:01 -07:00
Daniel Han	0f1f2cf728	Update _utils.py	2024-09-30 02:51:32 -07:00
Daniel Han	906f88bb4d	Update pyproject.toml	2024-09-30 02:48:31 -07:00
Daniel Han	e31152134e	Dependencies	2024-09-30 02:09:15 -07:00
Daniel Han	45916d36cf	Update loader.py	2024-09-29 23:22:05 -07:00
Daniel Han	a529a39c81	Update loader.py	2024-09-29 23:15:22 -07:00
Daniel Han	f744b3159e	Update loader.py	2024-09-29 23:13:44 -07:00
Daniel Han	ab43e02a94	Merge branch 'main' into nightly	2024-09-29 23:13:16 -07:00
Daniel Han	afbb140a79	Update loader.py	2024-09-29 01:42:58 -07:00
Daniel Han	b314837622	Update pyproject.toml	2024-09-27 01:36:45 -07:00
Daniel Han	c0b4d640f2	Update tokenizer_utils.py	2024-09-26 01:23:40 -07:00
Daniel Han	88a542a129	Update README.md	2024-09-26 00:12:42 -07:00
Daniel Han	6bbca3aaa8	Update README.md	2024-09-26 00:05:38 -07:00
Daniel Han	4f4ef22035	Update README.md	2024-09-26 00:02:15 -07:00
Daniel Han	930d2ad1a8	Update pyproject.toml	2024-09-25 23:47:15 -07:00
Daniel Han	5b345ec757	Update pyproject.toml	2024-09-25 23:13:49 -07:00
Daniel Han	c331c886ee	Remove version checks	2024-09-25 23:00:09 -07:00
Daniel Han	63e3a85efb	Update _utils.py	2024-09-25 22:56:41 -07:00
Daniel Han	dc8bca6713	Update llama.py	2024-09-25 22:12:21 -07:00
Daniel Han	8c6acbc6ce	Update llama.py	2024-09-25 19:38:01 -07:00
Daniel Han	50b9003936	Update llama.py	2024-09-25 19:15:40 -07:00
Daniel Han	70775fa740	Update llama.py	2024-09-25 18:24:30 -07:00
Daniel Han	013a2ed95b	Update llama.py	2024-09-25 18:10:46 -07:00
Daniel Han	8d910ecba9	Update llama.py	2024-09-25 18:07:21 -07:00
Daniel Han	ff921b8601	Update llama.py	2024-09-25 17:55:42 -07:00
Daniel Han	b61d75592a	Update llama.py	2024-09-25 17:51:36 -07:00
Daniel Han	2aae911368	Update llama.py	2024-09-25 17:46:29 -07:00
Daniel Han	48777c492a	Update vision.py	2024-09-25 17:44:13 -07:00
Daniel Han	62791d8f98	Update llama.py	2024-09-25 14:32:29 -07:00
Daniel Han	2af2e6e439	Update _utils.py	2024-09-25 14:12:16 -07:00
Daniel Han	54e59f3f49	Update _utils.py	2024-09-25 13:28:05 -07:00
Daniel Han	4e516b76c5	Update _utils.py	2024-09-25 13:19:01 -07:00
Daniel Han	6bc0a12470	Merge branch 'main' into nightly	2024-09-25 13:18:42 -07:00
Daniel Han	3cc9c2410e	Update llama.py	2024-09-25 12:42:24 -07:00
Daniel Han	0c07b760de	Fix version	2024-09-25 12:35:38 -07:00
Daniel Han	f22a14801d	Llama 3.2	2024-09-25 12:18:43 -07:00
Daniel Han	0a05da55f0	Llama 3.2 (#1058 ) * Layernorm * Update layernorm.py * Update layernorm.py * Update layernorm.py * Update layernorm.py * Update layernorm.py * Update layernorm.py * Patch layernorm * Update layernorm.py * RMS Layernorm * Update rms_layernorm.py * Causal LM * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update layernorm.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update _utils.py * Update _utils.py * Llama 3.2	2024-09-25 11:48:24 -07:00
Daniel Han	fa0b63b10b	Llama 3.2	2024-09-25 11:42:41 -07:00
Daniel Han	50edf0fd32	Update _utils.py	2024-09-25 00:23:23 -07:00
Daniel Han	b40724b730	Update _utils.py	2024-09-25 00:22:54 -07:00
Daniel Han	fe61245404	Update cross_entropy_loss.py	2024-09-25 00:11:57 -07:00
Daniel Han	dc7b21305d	Update cross_entropy_loss.py	2024-09-25 00:09:19 -07:00
Daniel Han	91930ec80c	Update cross_entropy_loss.py	2024-09-25 00:07:49 -07:00
Daniel Han	ad4b093ece	Update cross_entropy_loss.py	2024-09-25 00:06:10 -07:00
Daniel Han	bb12d6bd2b	Update cross_entropy_loss.py	2024-09-25 00:04:58 -07:00
Daniel Han	cb875a5b45	Update cross_entropy_loss.py	2024-09-25 00:01:56 -07:00
Daniel Han	507a490355	Update cross_entropy_loss.py	2024-09-25 00:00:51 -07:00
Daniel Han	e56ce736ff	Update layernorm.py	2024-09-24 23:59:09 -07:00
Daniel Han	5bb6b3bbd2	Update cross_entropy_loss.py	2024-09-24 23:57:22 -07:00
Daniel Han	45554fb66d	Update cross_entropy_loss.py	2024-09-24 23:55:25 -07:00
Daniel Han	6f39ef47c0	Update cross_entropy_loss.py	2024-09-24 23:54:14 -07:00
Daniel Han	7f74268e01	Update cross_entropy_loss.py	2024-09-24 23:52:54 -07:00
Daniel Han	85556bc385	Causal LM	2024-09-24 23:49:57 -07:00
Daniel Han	634489002d	Update rms_layernorm.py	2024-09-24 23:13:51 -07:00
Daniel Han	fc4ca43ee1	RMS Layernorm	2024-09-24 22:50:33 -07:00
Daniel Han	d23fd17447	Update layernorm.py	2024-09-24 17:24:39 -07:00
Daniel Han	5103781c3f	Patch layernorm	2024-09-24 17:22:20 -07:00
Daniel Han	9079e3c6e4	Update layernorm.py	2024-09-24 17:03:19 -07:00
Daniel Han	6801979ef3	Update layernorm.py	2024-09-24 17:00:23 -07:00
Daniel Han	4f7e68a3fe	Update layernorm.py	2024-09-24 16:54:52 -07:00
Daniel Han	9b2c5d4814	Update layernorm.py	2024-09-24 16:45:50 -07:00
Daniel Han	55d41d72a0	Update layernorm.py	2024-09-24 16:44:33 -07:00
Daniel Han	873f245009	Update layernorm.py	2024-09-24 16:29:54 -07:00
Daniel Han	88269a26a8	Layernorm	2024-09-24 02:49:48 -07:00
Daniel Han	b41c182296	Update _utils.py	2024-09-23 10:56:55 -07:00
Daniel Han	9c26f9d3bb	Update README.md	2024-09-23 01:36:50 -07:00
Daniel Han	45ca9501a4	Qwen 2.5	2024-09-23 01:27:12 -07:00
Daniel Han	6e387d8ff8	Update chat_templates.py	2024-09-23 01:07:06 -07:00
Daniel Han	388d5149a9	Upgrade Ollama presets	2024-09-23 01:02:24 -07:00
Daniel Han	a08812cc52	Update chat_templates.py	2024-09-23 00:29:21 -07:00
Daniel Han	1b8ef43c14	Update tokenizer_utils.py	2024-09-23 00:04:33 -07:00
Daniel Han	96fd381293	Update tokenizer_utils.py	2024-09-22 23:00:24 -07:00
Daniel Han	c1c37c49a6	Update tokenizer_utils.py	2024-09-22 22:48:35 -07:00
Daniel Han	7e2654ab7a	Update mapper.py	2024-09-22 22:13:59 -07:00
Daniel Han	f216cdc289	Update _utils.py	2024-09-22 02:38:28 -07:00
Nazim Ali	0904f7395d	fix: chat_templates.py bug (#1048) * fix: chat_template bug * fix: check trainer attribute values are not None	2024-09-22 01:18:37 -07:00
Daniel Han	cf3e072867	Update chat_templates.py	2024-09-21 01:56:17 -07:00
Daniel Han	c7c674472f	Merge branch 'nightly'	2024-09-18 14:30:33 -07:00
Daniel Han	9fd82c95aa	Update mapper.py	2024-09-18 14:23:22 -07:00
Daniel Han	1d4ae059c5	Update README.md (#1036)	2024-09-18 13:23:45 -07:00
Daniel Han	6d8b4c53a5	Update llama.py	2024-09-17 17:38:12 -07:00
Daniel Han	3289975025	Update mapper.py	2024-09-17 10:50:57 -07:00
Daniel Han	563432635a	Update mapper.py	2024-09-15 21:50:00 -07:00
Daniel Han	89ada9ef0b	Update _utils.py	2024-09-15 18:04:18 -07:00
Daniel Han	c5d7bb591d	Update README.md (#1033)	2024-09-15 17:42:09 -07:00
Daniel Han	d27d992f58	Update utils.py	2024-09-08 19:47:21 -07:00
Daniel Han	0ad16b7c1f	Update __init__.py	2024-09-08 15:51:27 -07:00
Daniel Han	ffb6aa905f	Update README.md	2024-09-08 14:30:54 -07:00
Daniel Han	1bba6954f1	Update README.md	2024-09-08 12:29:31 -07:00
Daniel Han	74c2141bb3	Bug fixes (#1004 ) * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * update token retrieval logic (#952) * Fix DPO (#947) * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * update hf token retrieval logic --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * get_token * Update README.md * Update gemma2.py * Update rms_layernorm.py * synchronize * Update gemma2.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * layernorm * Update rms_layernorm.py * Update gemma2.py * Update rms_layernorm.py * Update rms_layernorm.py * revert * Gemma * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma2.py * Change UnslothTrainingArguments base class to SFTConfig (#979) * Cohere * Update trainer.py * Cohere * Cohere * New models * Update llama.py * Update llama.py * Update cohere.py * Update llama.py * Update cohere.py * retry * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * _apply_lora_mlp * Update _utils.py * Gemma fixes * Update llama.py * Update flex_attention.py * Update llama.py * layernorm * Update llama.py * 
Update llama.py * Flex Attention * Update gemma2.py * Update __init__.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update chat_templates.py (#999) fix all misspelled "unsued" to "unused" * Update key from "from" to "user" (#1000) When use [tokenizer.apply_chat_template](https://huggingface.co/docs/transformers/main/en/chat_templating), the key should be "role" rather than "from", this is liknk to [this issue](https://github.com/unslothai/unsloth/issues/994) I don't know it is suitable for all situation, I also can add a dedicated parameter of the key if you think it is better. * Update chat_templates.py * Also patch the KTO trainer (#1001) * flex attention * Update llama.py * Update flex_attention.py * Update flex_attention.py * Update _utils.py * Update _utils.py * Update flex_attention.py * Update gemma2.py * Update gemma2.py --------- Co-authored-by: Hafedh <70411813+not-lain@users.noreply.github.com> Co-authored-by: Tuan Pham <82665400+vTuanpham@users.noreply.github.com> Co-authored-by: Yihao Wang <42559837+AgainstEntropy@users.noreply.github.com> Co-authored-by: Peng <zphu1024@gmail.com> Co-authored-by: Kyle Corbitt <kyle@openpipe.ai>	2024-09-08 03:16:09 -07:00
Daniel Han	658e162032	Bug fixes	2024-09-04 00:28:53 -07:00
Daniel Han	a8490a2a8a	Fix bug	2024-09-03 17:30:40 -07:00
Daniel Han	1b81cf1859	Gemma faster inference (#987 ) * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * update token retrieval logic (#952) * Fix DPO (#947) * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * update hf token retrieval logic --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * get_token * Update README.md * Update gemma2.py * Update rms_layernorm.py * synchronize * Update gemma2.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * layernorm * Update rms_layernorm.py * Update gemma2.py * Update rms_layernorm.py * Update rms_layernorm.py * revert * Gemma * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma2.py * Change UnslothTrainingArguments base class to SFTConfig (#979) * Cohere * Update trainer.py * Cohere * Cohere * New models * Update llama.py * Update llama.py * Update cohere.py * Update llama.py * Update cohere.py * retry * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * _apply_lora_mlp * Update _utils.py * Gemma fixes * Update llama.py * Update flex_attention.py --------- Co-authored-by: Hafedh 
<70411813+not-lain@users.noreply.github.com> Co-authored-by: Tuan Pham <82665400+vTuanpham@users.noreply.github.com>	2024-09-03 13:52:12 -07:00
Daniel Han	7c3d1091ba	Cohere, Bug fixes (#984 ) * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * update token retrieval logic (#952) * Fix DPO (#947) * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * update hf token retrieval logic --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * get_token * Update README.md * Update gemma2.py * Update rms_layernorm.py * synchronize * Update gemma2.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * layernorm * Update rms_layernorm.py * Update gemma2.py * Update rms_layernorm.py * Update rms_layernorm.py * revert * Gemma * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma2.py * Change UnslothTrainingArguments base class to SFTConfig (#979) * Cohere * Update trainer.py * Cohere * Cohere * New models * Update llama.py * Update llama.py * Update cohere.py * Update llama.py * Update cohere.py * retry * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * _apply_lora_mlp * Update _utils.py --------- Co-authored-by: Hafedh <70411813+not-lain@users.noreply.github.com> Co-authored-by: Tuan 
Pham <82665400+vTuanpham@users.noreply.github.com>	2024-09-03 01:52:32 -07:00
Daniel Han	b140367a87	Update save.py	2024-08-27 00:08:39 -07:00
Daniel Han	9f36b83db0	Update gemma2.py	2024-08-23 23:43:57 -07:00
Daniel Han	353991f14a	Phi 3.5 bug fix (#955 ) * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * update token retrieval logic (#952) * Fix DPO (#947) * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * update hf token retrieval logic --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * get_token * Update README.md --------- Co-authored-by: Hafedh <70411813+not-lain@users.noreply.github.com>	2024-08-23 17:38:24 -07:00
Daniel Han	199766c644	Fix DPO (#947) * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py	2024-08-22 02:18:03 -07:00
Daniel Han	cadff4f883	Update README.md (#941) Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>	2024-08-20 17:59:50 -07:00
Daniel Han	8e61906e6f	Update chat_templates.py	2024-08-20 16:54:11 -07:00
Daniel Han	fb60340a90	Phi 3.5 (#940) * LongRoPE * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mapper.py * Phi 3.5	2024-08-20 16:51:39 -07:00
Daniel Han	0927c34392	Update README.md (#938)	2024-08-19 17:18:30 -07:00
Daniel Han	861f232047	Merge branch 'main' into nightly	2024-08-19 17:13:12 -07:00
Daniel Han	92a3aec9e9	Update _auto_install.py	2024-08-19 17:12:46 -07:00
Daniel Han	4450110756	Create _auto_install.py	2024-08-19 17:12:32 -07:00
Daniel Han	bb9539cc82	Fix NEFTune (#937) * untrained tokens llama 3.1 base * Update tokenizer_utils.py * Update tokenizer_utils.py * Bug fixes * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py	2024-08-19 16:17:52 -07:00
Daniel Han	3b4ce17bc9	Merge branch 'main' into nightly	2024-08-19 16:17:00 -07:00
Daniel Han	91bdf27729	Update llama.py	2024-08-19 16:14:01 -07:00
Daniel Han	ce1863d91f	Update llama.py	2024-08-19 16:11:09 -07:00
Daniel Han	dcf7e6e952	Update llama.py	2024-08-19 16:08:14 -07:00
Daniel Han	4f51fe0a8c	Update tokenizer_utils.py	2024-08-19 16:03:24 -07:00
Daniel Han	b9f71049a4	Update tokenizer_utils.py	2024-08-19 15:52:13 -07:00
Daniel Han	1387a4a23f	Update tokenizer_utils.py	2024-08-19 15:50:16 -07:00
Daniel Han	7b35c195c0	Update llama.py	2024-08-19 15:08:53 -07:00
Daniel Han	c16e95decd	Bug fixes	2024-08-19 15:04:25 -07:00
Daniel Han	23752a7ab1	Bug #930 (#931) * untrained tokens llama 3.1 base * Update tokenizer_utils.py * Update tokenizer_utils.py	2024-08-16 23:39:44 -07:00
Daniel Han	a8f9f177b3	Update tokenizer_utils.py	2024-08-16 23:38:43 -07:00
Daniel Han	733075c5cd	Update tokenizer_utils.py	2024-08-16 23:38:02 -07:00
Daniel Han	a0fa23e66c	Merge branch 'main' into nightly	2024-08-16 23:37:16 -07:00
Daniel Han	a3eee645f1	untrained tokens llama 3.1 base (#929)	2024-08-16 19:57:19 -07:00
Daniel Han	bd60ad7d1c	untrained tokens llama 3.1 base	2024-08-16 19:28:43 -07:00
Daniel Han	a8bed84683	Update __init__.py	2024-08-15 15:07:42 -07:00
Daniel Han	cdd961d2c1	Bug fixes	2024-08-15 15:04:46 -07:00
Daniel Han	c90bb8dc32	Fix mapping (#921 ) * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * fix_tokenizer * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update pyproject.toml * Update _utils.py * Update gemma2.py * Update gemma2.py * Update _utils.py * gemma 2 mask * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Torch 2.4 Xformers 0.0.27post2 * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Gemma 2 fixes * Update gemma2.py * Update llama.py * Update llama.py * Update save.py * Update save.py * Update llama.py * Update cross_entropy_loss.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Providing more flexibility for users to customize their llama when using LoRA (#910) * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * return model * Update tokenizer_utils.py * Update chat_templates.py * Update 
tokenizer_utils.py * Train on completions * load_in_4bit=False broken * Update llama.py * MAP_TO_UNSLOTH_16bit * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update mapper.py * works! --------- Co-authored-by: Po-Lung Wang <Brownwang0426@gmail.com>	2024-08-15 01:15:35 -07:00
Daniel Han	1091f03d77	Bug Fixes (#920 ) * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * fix_tokenizer * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update pyproject.toml * Update _utils.py * Update gemma2.py * Update gemma2.py * Update _utils.py * gemma 2 mask * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Torch 2.4 Xformers 0.0.27post2 * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Gemma 2 fixes * Update gemma2.py * Update llama.py * Update llama.py * Update save.py * Update save.py * Update llama.py * Update cross_entropy_loss.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Providing more flexibility for users to customize their llama when using LoRA (#910) * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * return model * Update tokenizer_utils.py * Update chat_templates.py * Update 
tokenizer_utils.py * Train on completions * load_in_4bit=False broken --------- Co-authored-by: Po-Lung Wang <Brownwang0426@gmail.com>	2024-08-15 00:31:30 -07:00
Daniel Han	56dbb23135	Fix chat templates (#917 ) * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * fix_tokenizer * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update pyproject.toml * Update _utils.py * Update gemma2.py * Update gemma2.py * Update _utils.py * gemma 2 mask * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Torch 2.4 Xformers 0.0.27post2 * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Gemma 2 fixes * Update gemma2.py * Update llama.py * Update llama.py * Update save.py * Update save.py * Update llama.py * Update cross_entropy_loss.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Providing more flexibility for users to customize their llama when using LoRA (#910) * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * return model * Update tokenizer_utils.py * Update chat_templates.py * Update 
tokenizer_utils.py * Train on completions --------- Co-authored-by: Po-Lung Wang <Brownwang0426@gmail.com>	2024-08-14 00:58:02 -07:00
Daniel Han	ec413e63e0	Fix Chat Templates (#916 ) * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * fix_tokenizer * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update pyproject.toml * Update _utils.py * Update gemma2.py * Update gemma2.py * Update _utils.py * gemma 2 mask * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Torch 2.4 Xformers 0.0.27post2 * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Gemma 2 fixes * Update gemma2.py * Update llama.py * Update llama.py * Update save.py * Update save.py * Update llama.py * Update cross_entropy_loss.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Providing more flexibility for users to customize their llama when using LoRA (#910) * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * return model * Update tokenizer_utils.py * Update chat_templates.py * Update 
tokenizer_utils.py --------- Co-authored-by: Po-Lung Wang <Brownwang0426@gmail.com>	2024-08-13 17:54:02 -07:00
Daniel Han	1204107724	Fix DPO stats (#906 ) * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * fix_tokenizer * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update pyproject.toml * Update _utils.py * Update gemma2.py * Update gemma2.py * Update _utils.py * gemma 2 mask * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Torch 2.4 Xformers 0.0.27post2 * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Gemma 2 fixes * Update gemma2.py * Update llama.py * Update llama.py * Update save.py * Update save.py * Update llama.py * Update cross_entropy_loss.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py * Update dpo.py	2024-08-11 18:26:20 -07:00
Daniel Han	1397f2e1ab	Torch 2.4, Xformers>0.0.27, TRL>0.9, Python 3.12 + bug fixes (#902 ) * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * fix_tokenizer * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update pyproject.toml * Update _utils.py * Update gemma2.py * Update gemma2.py * Update _utils.py * gemma 2 mask * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Torch 2.4 Xformers 0.0.27post2 * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Gemma 2 fixes * Update gemma2.py * Update llama.py * Update llama.py * Update save.py * Update save.py	2024-08-10 19:59:40 -07:00
Daniel Han	064ff70bc5	Update _utils.py	2024-08-07 10:48:39 -07:00
Daniel Han	0ea0802d47	Update _utils.py	2024-08-07 10:47:11 -07:00
Daniel Han	d60422fb68	Update _utils.py	2024-08-07 01:11:06 -07:00
Daniel Han	734e478605	Fix tokenizers (#887) * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * fix_tokenizer * Update tokenizer_utils.py * Update tokenizer_utils.py	2024-08-06 20:24:44 -07:00
Daniel Han	9be6480ec5	Update README.md	2024-08-05 00:00:53 -07:00
Daniel Han	ba87b3dd31	Update README.md	2024-08-04 23:59:57 -07:00
Daniel Han	d9e330ded7	Update README.md	2024-08-04 23:50:40 -07:00
Daniel Han	7f9c0d8c20	Update llama.py	2024-08-04 23:49:35 -07:00
emuchogu	fe4b9da764	pascal support (#870) Co-authored-by: Edward Muchogu <muchogu@gmail.com>	2024-08-04 23:45:51 -07:00
moontidef	8ee7c42a32	fix: fix config.torch_dtype bug (#874) fix the bug #404 and the bug https://github.com/hiyouga/LLaMA-Factory/issues/4698#issue-2393500878	2024-08-04 23:45:34 -07:00
Daniel Han	9283909b6d	Update pyproject.toml	2024-08-04 11:28:21 -07:00
Daniel Han	8633d860d5	Merge branch 'main' into nightly	2024-08-01 18:19:38 -07:00
Daniel Han	c069555926	Fix RoPE extension (#846) * bugs * Update _utils.py * flash-attn softcapping * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update mapper.py * Update README.md * Update _utils.py * Fix ROPE extension issue and device mismatch (#840) * When an exception has been assigned using as target, it is cleared at the end of the except clause.(https://docs.python.org/3/reference/compound_stmts.html#the-try-statement) * Update loader.py * round up to extend rope size * inv_freq.device changed, make sure they are on the same device --------- Co-authored-by: xiaoyang <xiaoyang@youzan.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update gemma.py --------- Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: xiaoyang <xiaoyang@youzan.com>	2024-07-31 12:10:33 -07:00
Daniel Han	9c29d37e6a	Update gemma.py	2024-07-31 12:09:33 -07:00
Daniel Han	4e3fde6539	Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly	2024-07-31 12:05:28 -07:00
XiaoYang	32735460a8	Fix ROPE extension issue and device mismatch (#840) * When an exception has been assigned using as target, it is cleared at the end of the except clause.(https://docs.python.org/3/reference/compound_stmts.html#the-try-statement) * Update loader.py * round up to extend rope size * inv_freq.device changed, make sure they are on the same device --------- Co-authored-by: xiaoyang <xiaoyang@youzan.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-07-31 12:05:08 -07:00
Daniel Han	64d8a32358	Merge branch 'main' into nightly	2024-07-31 12:05:01 -07:00
Daniel Han	2521a8b39f	Update README.md	2024-07-31 09:50:11 -07:00
Daniel Han	4e03b77673	Gemma (#843) * bugs * Update _utils.py * flash-attn softcapping * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update mapper.py * Update README.md * Update _utils.py	2024-07-31 08:54:58 -07:00
Daniel Han	0989eb3265	Update _utils.py	2024-07-31 08:54:22 -07:00
Daniel Han	8c58eb3901	Update README.md	2024-07-31 08:53:20 -07:00
Daniel Han	2b49611392	Update mapper.py	2024-07-30 23:51:47 -07:00
Daniel Han	5f2990df0d	Update gemma2.py	2024-07-30 23:11:35 -07:00
Daniel Han	a82d18d41b	Update gemma2.py	2024-07-30 23:11:23 -07:00
Daniel Han	5c403cd0db	Update gemma2.py	2024-07-30 23:07:47 -07:00
Daniel Han	04a9f4da74	Update gemma2.py	2024-07-30 23:04:00 -07:00
Daniel Han	282d8a5794	flash-attn softcapping	2024-07-30 22:48:53 -07:00
Daniel Han	b7bdb18552	Update _utils.py	2024-07-30 19:57:53 -07:00
Daniel Han	c0abd06a06	bugs	2024-07-30 19:56:36 -07:00
Daniel Han	9953839b92	Update llama.py	2024-07-30 10:29:54 -07:00
Daniel Han	34838216cc	Update loader.py	2024-07-30 10:18:51 -07:00
XiaoYang	fd904b4e97	fix UnboundLocalError (#834) * When an exception has been assigned using as target, it is cleared at the end of the except clause.(https://docs.python.org/3/reference/compound_stmts.html#the-try-statement) * Update loader.py --------- Co-authored-by: xiaoyang <xiaoyang@youzan.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-07-30 10:15:09 -07:00
Daniel Han	7c68bab3d9	Merge branch 'main' into nightly	2024-07-30 10:01:11 -07:00
Daniel Han	1cb2412d85	Better debugging (#826) * Update __init__.py * Edits * Checks * Update _utils.py * Update _utils.py * Update loader.py * Update _utils.py * Update mapper.py * Update loader.py * Update loader.py * Update _utils.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update loader.py	2024-07-28 00:10:02 -07:00
Daniel Han	7a096afec9	Update loader.py	2024-07-27 23:04:23 -07:00
Daniel Han	2b63faeff7	Update mapper.py	2024-07-27 23:03:29 -07:00
Daniel Han	8788703ae3	Update loader.py	2024-07-27 23:03:13 -07:00
Daniel Han	f679e03a06	Update loader.py	2024-07-27 23:02:10 -07:00
Daniel Han	f1f1f7a8c9	Update loader.py	2024-07-27 23:00:47 -07:00
Daniel Han	6195a308af	Update loader.py	2024-07-27 22:59:47 -07:00
Daniel Han	961e78fb5e	Update loader.py	2024-07-27 22:59:16 -07:00
Daniel Han	1bbf268de1	Update loader.py	2024-07-27 22:58:15 -07:00
Daniel Han	fc5d565e35	Update _utils.py	2024-07-27 22:56:59 -07:00
Daniel Han	e34d92635d	Update loader.py	2024-07-27 22:54:49 -07:00
Daniel Han	25025801c0	Update loader.py	2024-07-27 22:52:06 -07:00
Daniel Han	031f552743	Update mapper.py	2024-07-27 22:50:38 -07:00
Daniel Han	971ab6485a	Update _utils.py	2024-07-27 22:48:52 -07:00
Daniel Han	28b3c211e4	Update loader.py	2024-07-27 22:35:24 -07:00
Daniel Han	61aba43554	Update _utils.py	2024-07-27 22:32:39 -07:00
Daniel Han	c53b4394dc	Update _utils.py	2024-07-27 22:30:48 -07:00
Daniel Han	e6159e0279	Checks	2024-07-27 22:16:33 -07:00
Daniel Han	d28527ce62	Edits	2024-07-27 20:30:32 -07:00
Daniel Han	e0748c93dd	Update __init__.py	2024-07-26 16:31:40 -07:00
Daniel Han	9852fdb642	Update llama.py	2024-07-25 08:53:21 -07:00
Daniel Han	393549fa48	Update _utils.py	2024-07-25 00:33:38 -07:00
Daniel Han	1ab912d528	Update _utils.py	2024-07-25 00:29:12 -07:00
Daniel Han	dc39485ec3	Update loader.py	2024-07-25 00:28:20 -07:00
Daniel Han	ef42e61d68	Update llama.py	2024-07-25 00:19:32 -07:00
Daniel Han	f20bc23e84	Fix PEFT	2024-07-25 00:17:19 -07:00
Daniel Han	fcf92c54ca	Patch PEFT	2024-07-24 23:45:39 -07:00
Daniel Han	e9ab1ad5bc	Mistral	2024-07-24 14:05:31 -07:00
Daniel Han	41bad372af	Merge branch 'main' into nightly	2024-07-24 12:37:12 -07:00
Daniel Han	27b23a9bd4	Update README.md	2024-07-23 15:08:09 -07:00
Daniel Han	8ca886825c	Create Run.png	2024-07-23 13:14:21 -07:00
Daniel Han	2217e8d86c	Update llama.py	2024-07-23 12:28:12 -07:00
Daniel Han	92b3752cad	Update llama.py	2024-07-23 12:27:46 -07:00
Daniel Han	affa585b3f	Update _utils.py	2024-07-23 12:25:24 -07:00
Daniel Han	da93a2237a	Update loader.py	2024-07-23 12:12:29 -07:00
Daniel Han	dd781e0c60	Update README.md	2024-07-23 12:07:27 -07:00
Daniel Han	faa36e853a	Update README.md	2024-07-23 11:51:08 -07:00
Daniel Han	56cbd06f1f	Llama 3.1 (#797) * Llama 3.1 * Update _utils.py * Llama 3.1 * Update _utils.py * Update llama.py * Update llama.py * hack for rotary * patch RoPE * refix rope * Update _utils.py * Update llama.py * Llama 3.1 check * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py	2024-07-23 11:40:49 -07:00
Daniel Han	9d415882bf	Update llama.py	2024-07-23 11:24:58 -07:00
Daniel Han	eb03e4111a	Update llama.py	2024-07-23 11:23:18 -07:00
Daniel Han	7fb8015c88	Update llama.py	2024-07-23 11:23:00 -07:00
Daniel Han	565d2ce6f1	Update llama.py	2024-07-23 11:22:27 -07:00
Daniel Han	a654779617	Update llama.py	2024-07-23 11:21:29 -07:00
Daniel Han	fdce7bff90	Update llama.py	2024-07-23 11:18:12 -07:00
Daniel Han	04096dcff8	Update llama.py	2024-07-23 11:16:40 -07:00
Daniel Han	e89ab65b02	Update llama.py	2024-07-23 11:16:31 -07:00
Daniel Han	122025ed58	Update llama.py	2024-07-23 11:15:35 -07:00
Daniel Han	3b266ebc7b	Update llama.py	2024-07-23 11:13:15 -07:00
Daniel Han	e2ef589460	Update llama.py	2024-07-23 11:12:58 -07:00
Daniel Han	a2403852b4	Llama 3.1 check	2024-07-23 11:09:24 -07:00
Daniel Han	a9d6c731ee	Update llama.py	2024-07-23 10:58:31 -07:00
Daniel Han	4eca9215a2	Update _utils.py	2024-07-23 10:54:54 -07:00
Daniel Han	c4dc08309e	refix rope	2024-07-23 10:53:31 -07:00
Daniel Han	f285c33046	patch RoPE	2024-07-23 10:48:45 -07:00
Daniel Han	d587ce218d	hack for rotary	2024-07-23 10:43:36 -07:00
Daniel Han	c17e8ca33d	Update llama.py	2024-07-23 10:36:06 -07:00
Daniel Han	9a18fee63f	Update llama.py	2024-07-23 10:35:03 -07:00
Daniel Han	19cf853157	Update _utils.py	2024-07-23 10:33:07 -07:00
Daniel Han	5efbd701ba	Llama 3.1	2024-07-23 10:27:36 -07:00
Daniel Han	daa4d13564	Update _utils.py	2024-07-22 23:01:18 -07:00
Daniel Han	eda2343056	Llama 3.1	2024-07-22 22:58:02 -07:00
Daniel Han	0690914c62	Update tokenizer_utils.py	2024-07-20 13:25:59 -07:00
Daniel Han	71c4aed1be	Update tokenizer_utils.py	2024-07-20 13:22:36 -07:00
Daniel Han	50c51e6ec1	Update llama.py	2024-07-20 12:47:36 -07:00
Daniel Han	07228828a0	Update llama.py	2024-07-20 11:53:32 -07:00
Daniel Han	11dcf38761	Merge branch 'main' into nightly	2024-07-20 09:47:22 -07:00
Daniel Han	32466f7bc4	Update mistral.py	2024-07-19 09:32:27 -07:00
Daniel Han	bffc936663	Fix Gemma	2024-07-19 09:27:18 -07:00
Daniel Han	256b55fcdd	Update README.md	2024-07-19 03:05:15 -07:00
Daniel Han	b8e6560b8d	Update README.md	2024-07-19 03:03:50 -07:00
Daniel Han	100ac9c052	Nightly (#784) * Update __init__.py * dynamic RoPE * Update mistral.py * Update llama.py * Update tokenizer_utils.py * Update mistral.py * Update llama.py * Update __init__.py * Update flex_attention.py * Update llama.py * Update llama.py * Mistral Nemo * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py	2024-07-19 01:39:08 -07:00
Daniel Han	e6002b1b32	Merge branch 'main' into nightly	2024-07-19 01:38:47 -07:00
Daniel Han	8ae30938d3	Update tokenizer_utils.py	2024-07-19 01:29:52 -07:00
Daniel Han	f6d47c99df	Nightly (#783) * Update __init__.py * dynamic RoPE * Update mistral.py * Update llama.py * Update tokenizer_utils.py * Update mistral.py * Update llama.py * Update __init__.py * Update flex_attention.py * Update llama.py * Update llama.py * Mistral Nemo * Update tokenizer_utils.py * Update tokenizer_utils.py	2024-07-19 01:24:46 -07:00
Daniel Han	f302c074b7	Update tokenizer_utils.py	2024-07-19 01:06:41 -07:00
Daniel Han	14144ad6fc	Update tokenizer_utils.py	2024-07-19 01:03:51 -07:00
Daniel Han	881ee0ed37	Merge branch 'main' into nightly	2024-07-19 00:59:48 -07:00
Daniel Han	8783596962	Update tokenizer_utils.py	2024-07-19 00:41:35 -07:00
Daniel Han	47e08076e6	Mistral Nemo (#782) * Update __init__.py * dynamic RoPE * Update mistral.py * Update llama.py * Update tokenizer_utils.py * Update mistral.py * Update llama.py * Update __init__.py * Update flex_attention.py * Update llama.py * Update llama.py * Mistral Nemo	2024-07-19 00:14:24 -07:00
Daniel Han	2f9556c428	Mistral Nemo	2024-07-18 22:53:23 -07:00
Daniel Han	187157f548	Update llama.py	2024-07-18 22:07:23 -07:00
Daniel Han	ebbbf6be52	Update llama.py	2024-07-18 21:57:33 -07:00
Daniel Han	e1dc32c2a6	Merge branch 'main' into nightly	2024-07-18 21:55:10 -07:00
Daniel Han	742a7629c2	Fix bugs (#779) * Update __init__.py * dynamic RoPE * Update mistral.py * Update llama.py * Update tokenizer_utils.py * Update mistral.py * Update llama.py * Update __init__.py * Update flex_attention.py	2024-07-18 18:19:24 -07:00
Daniel Han	0da004c70e	Update flex_attention.py	2024-07-18 18:18:09 -07:00
Daniel Han	d4fa9a0cdf	Update __init__.py	2024-07-18 14:43:03 -07:00
Daniel Han	765a7a9330	Update llama.py	2024-07-18 14:31:35 -07:00
Daniel Han	125b3727ff	Update mistral.py	2024-07-18 13:33:13 -07:00
Daniel Han	fcac73786c	Update tokenizer_utils.py	2024-07-18 13:25:30 -07:00
Daniel Han	1144bbb15c	Update llama.py	2024-07-18 12:33:22 -07:00
Daniel Han	72d9e5f5a0	Update mistral.py	2024-07-18 12:08:49 -07:00
Daniel Han	54dd81de67	dynamic RoPE	2024-07-18 12:06:38 -07:00
Daniel Han	5dc52e6b2e	Update __init__.py	2024-07-18 11:07:32 -07:00
Daniel Han	66ce2d401a	Update pyproject.toml	2024-07-18 10:59:09 -07:00
Daniel Han	1a7c3e1b3c	Update __init__.py	2024-07-18 10:58:12 -07:00
Daniel Han	6a437e43f5	Mistral Nemo 12b (#777 ) * Update gemma2.py * Update llama.py * Update llama.py * Update gemma2.py * init * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * All RoPE Scaling support * cleanup * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * exec * exec * Attention_Module * attention_module * imports * exec * Update llama.py * Update llama.py * boolean mask * revert masking * Update llama.py * Update save.py * Update llama.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update utils.py * retry * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update chat_templates.py * Gemma 2 Ollama support * Update llama.py * Update llama.py * error handling * Update _utils.py * Update _utils.py * Stats for debugging * Update _utils.py * Update _utils.py * Debugging * Update tokenizer_utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Check exec, eval * Update _utils.py * Update _utils.py * Images * Bug fixes * Update pyproject.toml * Bug fixes * Update _utils.py * Update _utils.py * Deprecation fix * Update chat_templates.py * Now permitting use of pre-installed llama.cpp (#763) * Now permitting use of pre-installed llama.cpp * Update save.py --------- Co-authored-by: Giuseppe Strafforello <giuseppe.strafforello@titantechnologies.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Deprecation & compile * typo * Update chat_templates.py * Update chat_templates.py * train_on_responses_only * Update llama.py * Update llama.py * Update save.py * 
Update gemma2.py * Flex Attention * typos * Update _utils.py * Update llama.py * Update __init__.py * Update flex_attention.py * Update llama.py * Update llama.py * emulation * Update __init__.py * Update rope_embedding.py * Update flex_attention.py * Update flex_attention.py * Update rope_embedding.py * libdevice * triton_tanh * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * score * Update llama.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update llama.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * Flex Attention removal * upload tensorboard training stats to hub if available (#773) * causal_mask * Update llama.py * Update llama.py * Update flex_attention.py * Update _utils.py * Update mapper.py * Update _utils.py --------- Co-authored-by: pepistrafforello <pepi.strafforello@gmail.com> Co-authored-by: Giuseppe Strafforello <giuseppe.strafforello@titantechnologies.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>	2024-07-18 10:51:10 -07:00
Daniel Han	fa893e7d67	Chat templates	2024-07-15 14:36:44 -07:00
Daniel Han	ca6c3dcc99	Train on responses only (#770 ) * Update gemma2.py * Update llama.py * Update llama.py * Update gemma2.py * init * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * All RoPE Scaling support * cleanup * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * exec * exec * Attention_Module * attention_module * imports * exec * Update llama.py * Update llama.py * boolean mask * revert masking * Update llama.py * Update save.py * Update llama.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update utils.py * retry * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update chat_templates.py * Gemma 2 Ollama support * Update llama.py * Update llama.py * error handling * Update _utils.py * Update _utils.py * Stats for debugging * Update _utils.py * Update _utils.py * Debugging * Update tokenizer_utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Check exec, eval * Update _utils.py * Update _utils.py * Images * Bug fixes * Update pyproject.toml * Bug fixes * Update _utils.py * Update _utils.py * Deprecation fix * Update chat_templates.py * Now permitting use of pre-installed llama.cpp (#763) * Now permitting use of pre-installed llama.cpp * Update save.py --------- Co-authored-by: Giuseppe Strafforello <giuseppe.strafforello@titantechnologies.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Deprecation & compile * typo * Update chat_templates.py * Update chat_templates.py * train_on_responses_only * Update llama.py * Update llama.py * Update 
save.py * Update gemma2.py --------- Co-authored-by: pepistrafforello <pepi.strafforello@gmail.com> Co-authored-by: Giuseppe Strafforello <giuseppe.strafforello@titantechnologies.com>	2024-07-14 22:41:04 -07:00
Daniel Han	f176cbd36a	Many bug fixes (#754 ) * Update gemma2.py * Update llama.py * Update llama.py * Update gemma2.py * init * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * All RoPE Scaling support * cleanup * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * exec * exec * Attention_Module * attention_module * imports * exec * Update llama.py * Update llama.py * boolean mask * revert masking * Update llama.py * Update save.py * Update llama.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update utils.py * retry * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update chat_templates.py * Gemma 2 Ollama support * Update llama.py * Update llama.py * error handling * Update _utils.py * Update _utils.py * Stats for debugging * Update _utils.py * Update _utils.py * Debugging * Update tokenizer_utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Check exec, eval * Update _utils.py * Update _utils.py * Images * Bug fixes * Update pyproject.toml * Bug fixes * Update _utils.py * Update _utils.py	2024-07-10 01:59:06 -07:00
Daniel Han	316aaefdf2	Update llama.py	2024-07-08 10:44:19 -07:00
Daniel Han	2eb950872a	Update llama.py	2024-07-08 10:38:49 -07:00
Daniel Han	1f1211fbd6	Update _utils.py	2024-07-08 10:01:20 -07:00
Daniel Han	55bf35be5d	Update llama.py	2024-07-07 15:46:36 -07:00
Daniel Han	fcc2833767	Nightly (#744 ) * Update gemma2.py * Update llama.py * Update llama.py * Update gemma2.py * init * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * All RoPE Scaling support * cleanup * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * exec * exec * Attention_Module * attention_module * imports * exec * Update llama.py * Update llama.py * boolean mask * revert masking * Update llama.py * Update save.py * Update llama.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update utils.py * retry * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update chat_templates.py * Gemma 2 Ollama support * Update llama.py * Update llama.py * error handling * Update _utils.py * Update _utils.py * Stats for debugging * Update _utils.py * Update _utils.py * Debugging * Update tokenizer_utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Check exec, eval * Update _utils.py * Update _utils.py	2024-07-07 10:22:59 -07:00
Daniel Han	5813e85bb9	Merge branch 'main' of https://github.com/unslothai/unsloth	2024-07-07 09:49:55 -07:00
Daniel Han	05393cc85b	Update _utils.py	2024-07-07 09:49:45 -07:00
Daniel Han	775fb647d5	Fix exec, eval (#743 ) * Update gemma2.py * Update llama.py * Update llama.py * Update gemma2.py * init * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * All RoPE Scaling support * cleanup * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * exec * exec * Attention_Module * attention_module * imports * exec * Update llama.py * Update llama.py * boolean mask * revert masking * Update llama.py * Update save.py * Update llama.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update utils.py * retry * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update chat_templates.py * Gemma 2 Ollama support * Update llama.py * Update llama.py * error handling * Update _utils.py * Update _utils.py * Stats for debugging * Update _utils.py * Update _utils.py * Debugging * Update tokenizer_utils.py * Update _utils.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Check exec, eval	2024-07-07 09:33:01 -07:00
Daniel Han	de99c84625	Update llama.py	2024-07-06 23:59:03 -07:00
Daniel Han	75df21a314	Debugging (#739 ) * Update gemma2.py * Update llama.py * Update llama.py * Update gemma2.py * init * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * All RoPE Scaling support * cleanup * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * exec * exec * Attention_Module * attention_module * imports * exec * Update llama.py * Update llama.py * boolean mask * revert masking * Update llama.py * Update save.py * Update llama.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update utils.py * retry * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update chat_templates.py * Gemma 2 Ollama support * Update llama.py * Update llama.py * error handling * Update _utils.py * Update _utils.py * Stats for debugging * Update _utils.py * Update _utils.py * Debugging * Update tokenizer_utils.py * Update _utils.py	2024-07-06 18:50:00 -07:00
Daniel Han	86c5675a67	Gemma 2 bug fixes + All RoPE Scaling Support (#736 ) * Update gemma2.py * Update llama.py * Update llama.py * Update gemma2.py * init * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * All RoPE Scaling support * cleanup * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * exec * exec * Attention_Module * attention_module * imports * exec * Update llama.py * Update llama.py * boolean mask * revert masking * Update llama.py * Update save.py * Update llama.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update utils.py * retry * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update chat_templates.py * Gemma 2 Ollama support * Update llama.py * Update llama.py	2024-07-05 23:48:42 -07:00
Daniel Han	c1009008e3	Fix GGUF (#731 ) * Update mapper.py * Update Model Conversion Command in `save.py` to `convert_hf_to_gguf.py` (#730) * Updated convert_hf_to_gguf.py call to align with changes in llama.cpp repository * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Typo Fix (#690) --------- Co-authored-by: M. Ali Bayram <malibayram91@gmail.com> Co-authored-by: johnpaulbin <johnpaulbin@gmail.com>	2024-07-04 13:26:57 -07:00
Daniel Han	2510a4abc4	Gemma2 (#723 ) * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md --------- Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>	2024-07-03 12:12:21 -07:00
Daniel Han	cc4c5d7785	Gemma2 (#709 ) * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. 
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651) * Nightly (#649) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * 
Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. 
* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving * Implemented better list management and then forgot to actually call the new list variable, fixed * Check type of given quantization method and return type error if not list or string * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir 
<58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652) This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960. * Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653) This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58. * Update llama.py * peft * patch * Update loader.py * retrain * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * offload * Update llama.py * Create a starter script for command-line training to integrate in ML ops pipelines. 
(#623) * Update chat_templates.py * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Ollama * Update chat_templates.py * ollama * Update mapper.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Fixes * clearer messages * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * log * Update __init__.py * Update llama.py * Update __init__.py * Create Merge.png * Create ollama.png * Gemma2 * Update llama.py * Update loader.py * Update pyproject.toml * Update pyproject.toml * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Revert Gemma2 * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update rms_layernorm.py * Update gemma2.py * logit softcapping * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update gemma2.py * Update gemma2.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update llama.py * Update gemma2.py * Update llama.py * Update llama.py * Update gemma2.py * Update gemma2.py * Update llama.py * Update gemma2.py * Update gemma2.py * Update gemma2.py 
* Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update gemma2.py * Update _utils.py * Update _utils.py * Update gemma2.py * compile flags * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update gemma2.py * Update gemma2.py * fixes * Update _utils.py * Fix generation * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * pad token * Update gemma2.py * pad token * Update _utils.py * Update llama.py * Update gemma2.py * edit warning * Update tokenizer_utils.py --------- Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-07-02 22:51:01 -07:00
Daniel Han	cfddc79bc8	Nightly (#676 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update 
llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. 
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651) * Nightly (#649) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * 
Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. 
* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving * Implemented better list management and then forgot to actually call the new list variable, fixed * Check type of given quantization method and return type error if not list or string * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir 
<58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652) This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960. * Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653) This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58. * Update llama.py * peft * patch * Update loader.py * retrain * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * offload * Update llama.py * Create a starter script for command-line training to integrate in ML ops pipelines. 
(#623) * Update chat_templates.py * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Ollama * Update chat_templates.py * ollama * Update mapper.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Fixes * clearer messages * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * log * Update __init__.py * Update llama.py * Update __init__.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>	2024-06-21 15:32:26 +10:00
Daniel Han	1508654836	Nightly (#673 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update 
llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. 
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651) * Nightly (#649) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * 
Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. 
* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving * Implemented better list management and then forgot to actually call the new list variable, fixed * Check type of given quantization method and return type error if not list or string * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir 
<58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652) This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960. * Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653) This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58. * Update llama.py * peft * patch * Update loader.py * retrain * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * offload * Update llama.py * Create a starter script for command-line training to integrate in ML ops pipelines. 
(#623) * Update chat_templates.py * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Ollama * Update chat_templates.py * ollama * Update mapper.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Fixes --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>	2024-06-21 00:28:52 +10:00
Daniel Han	c2066592aa	Ollama (#671 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update 
llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. 
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651) * Nightly (#649) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * 
Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. 
* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving * Implemented better list management and then forgot to actually call the new list variable, fixed * Check type of given quantization method and return type error if not list or string * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir 
<58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652) This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960. * Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653) This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58. * Update llama.py * peft * patch * Update loader.py * retrain * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * offload * Update llama.py * Create a starter script for command-line training to integrate in ML ops pipelines. 
(#623) * Update chat_templates.py * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Ollama * Update chat_templates.py * ollama * Update mapper.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>	2024-06-20 22:28:28 +10:00
Daniel Han-Chen	08d22b8853	Update chat_templates.py	2024-06-20 19:49:38 +10:00
Daniel Han-Chen	55a8016ae6	Update chat_templates.py	2024-06-20 19:45:02 +10:00
Daniel Han	0e6b31dd84	Ollama bug fixes (#667 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * 
Update llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. 
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651) * Nightly (#649) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * 
Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. 
* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving * Implemented better list management and then forgot to actually call the new list variable, fixed * Check type of given quantization method and return type error if not list or string * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir 
<58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652) This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960. * Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653) This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58. * Update llama.py * peft * patch * Update loader.py * retrain * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * offload * Update llama.py * Create a starter script for command-line training to integrate in ML ops pipelines. 
(#623) * Update chat_templates.py * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Ollama * Update chat_templates.py * ollama * Update mapper.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>	2024-06-20 04:55:13 +10:00
Daniel Han	9a7f3baa15	Ollama (#665 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update 
llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. 
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651) * Nightly (#649) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * 
Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. 
* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving * Implemented better list management and then forgot to actually call the new list variable, fixed * Check type of given quantization method and return type error if not list or string * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir 
<58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652) This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960. * Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653) This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58. * Update llama.py * peft * patch * Update loader.py * retrain * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * offload * Update llama.py * Create a starter script for command-line training to integrate in ML ops pipelines. 
(#623) * Update chat_templates.py * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>	2024-06-19 04:53:26 +10:00
Daniel Han	34f65c1eaf	Fix continuing LoRA finetuning (#656 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update 
save.py * Update llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. 
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651) * Nightly (#649) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * 
Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. 
* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving * Implemented better list management and then forgot to actually call the new list variable, fixed * Check type of given quantization method and return type error if not list or string * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir 
<58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652) This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960. * Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653) This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58. * Update llama.py * peft * patch * Update loader.py * retrain * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>	2024-06-17 00:39:20 +10:00
Daniel Han	12d294bcb3	Fix GGUF (#654 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update 
llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. 
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py * Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651) * Nightly (#649) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * 
Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. 
* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving * Implemented better list management and then forgot to actually call the new list variable, fixed * Check type of given quantization method and return type error if not list or string * Update save.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir 
<58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> * Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652) This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960. * Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653) This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58. --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com> Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>	2024-06-16 14:51:58 +10:00
Daniel Han	0df0509c28	Nightly (#649 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update 
llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwated tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional ags added. * Temporray remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantize versions for the trained models. 
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py * check PEFT and base * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update chat_templates.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com>	2024-06-16 04:32:21 +10:00
Daniel Han	ff6fee6785	Nightly (#648 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update 
llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwanted tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional args added. * Temporarily remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantized versions for the trained models.
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo * Update gemma.py * gpu * Multiple GGUF saving * Update save.py * Update save.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com>	2024-06-16 03:39:00 +10:00
Daniel Han	7be0f03eb4	Nightly (#646 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update 
llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwanted tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional args added. * Temporarily remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantized versions for the trained models.
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py * Update llama.py * Update llama.py * GPU support * Typo --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com>	2024-06-15 18:26:25 +10:00
Daniel Han	659889c5bc	Fix segfaults (#641 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update 
llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwanted tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional args added. * Temporarily remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantized versions for the trained models.
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Update mistral.py * Auto check rope scaling * Update llama.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com>	2024-06-15 00:52:33 +10:00
Daniel Han	a3fb597fe1	Qwen bug fixes (#639 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update 
llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwanted tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional args added. * Temporarily remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantized versions for the trained models.
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size * Update qwen2.py * docs * Update README.md * Update qwen2.py * README: Fix minor typo. (#559) * README: Fix minor typo. One-character typo fix while reading. 
* Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update mistral.py * Update qwen2.py * Update qwen2.py * Update qwen2.py * Update llama.py * Update llama.py * Update llama.py * Update README.md * FastMistralModel --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de> Co-authored-by: Walter Korman <lemurware@gmail.com>	2024-06-14 20:59:45 +10:00
Daniel Han-Chen	dee170293f	Update __init__.py	2024-06-14 16:00:26 +10:00
Daniel Han-Chen	a2ea54f62e	Update tokenizer_utils.py	2024-06-14 03:24:28 +10:00
Daniel Han	c33afda563	Nightly (#632 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * Update 
llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer * Update tokenizer_utils.py * fix case where gguf saving fails due to first_conversion dtype (#630) * Support revision parameter in FastLanguageModel.from_pretrained (#629) * support `revision` parameter * match unsloth formatting of named parameters * clears any selected_adapters before calling internal_model.save_pretrained (#609) * Update __init__.py (#602) Check for incompatible modules before importing unsloth * Fixed unsloth/tokenizer_utils.py for chat training (#604) * Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345) * Add save to llama.cpp GGML to save.py. * Fix conversion command and path of convert to GGML function. * Add autosaving lora to the GGML function * Create lora save function for conversion to GGML * Test fix #2 for saving lora * Test fix #3 to save the lora adapters to convert to GGML * Remove unwanted tokenizer saving for conversion to ggml and added a few print statements. * Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages. * Positional arguments didn't work out, so reverted to older version of the code, and added a few comments. * Test fix 1 for arch * Test fix 2 new Mistral error. * Test fix 3 * Revert to old version for testing. * Upload issue test fix 1 * Fix 2 uploading ggml * Positional args added. * Temporarily remove positional args * Fix upload again!!! * Add print statements and fix link * Make the calling name better * Create local saving for GGML * Add choosing directory to save local GGML. * Fix lil variable error in the save_to_custom_dir func * docs: Add LoraConfig parameters documentation (#619) * llama.cpp failing (#371) llama.cpp is failing to generate quantized versions for the trained models.
Error: ```bash You might have to compile llama.cpp yourself, then run this again. You do not need to close this Python program. Run the following commands in a new terminal: You must run this in the same folder as you're saving your model. git clone https://github.com/ggerganov/llama.cpp cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j Once that's done, redo the quantization. ``` But when i do clone this with recursive it works. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * fix libcuda_dirs import for triton 3.0 (#227) * fix libcuda_dirs import for triton 3.0 * Update __init__.py * Update __init__.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update __init__.py * Update fast_lora.py * Update save.py * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * quantize now llama-quantize * Update chat_templates.py * Update loader.py * Update mapper.py * Update __init__.py * embedding size --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com> Co-authored-by: Rickard Edén <rickardeden@gmail.com> Co-authored-by: XiaoYang <xyangk@gmail.com> Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com> Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>	2024-06-14 02:58:08 +10:00
Daniel Han	be0bba4fc8	Ollama Chat Templates (#582 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert * Update save.py * Update save.py * Update save.py * Update save.py * remove_special_tokens * Ollama * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update llama.py * Update chat_templates.py * Support bfloat16 GGUF * Update save.py * 
Update llama.py * fast_forward_inference * Update mapper.py * Update loader.py * Update llama.py * Update tokenizer_utils.py * info * edits * Create chat template * Fix tokenizer --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-06-13 05:04:54 +10:00
Daniel Han-Chen	dff5c4271b	Update llama.py	2024-06-07 04:54:57 +10:00
Daniel Han-Chen	85fac9e038	Update llama.py	2024-06-07 04:25:33 +10:00
Daniel Han-Chen	a89f888dc9	Update utils.py	2024-06-07 03:47:44 +10:00
Daniel Han-Chen	5774e75a1e	Qwen2	2024-06-07 02:53:17 +10:00
Daniel Han-Chen	0451f82140	Update pyproject.toml	2024-06-06 01:22:29 +10:00
Daniel Han-Chen	bddf7fd9e9	Update llama.py	2024-06-05 20:57:08 +10:00
Daniel Han-Chen	86bb9f50fb	Update README.md	2024-06-05 06:15:27 +10:00
Daniel Han-Chen	a8ad3cef9f	Update README.md	2024-06-05 06:14:11 +10:00
Daniel Han	669552b4bf	Fix #563 (#564 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * accelerate * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * train_dataloader * Update llama.py * Update llama.py * Update llama.py * use_fast_convert --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-05-31 00:41:44 +10:00
Daniel Han	b2d09de4d4	Fix Phi-3 (#556 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint * Update _utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-05-29 14:30:26 +10:00
Daniel Han	5cff582ccf	Nightly (#548 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3 * Update save.py * Update README.md Mistral v3 to Mistral v0.3 * Untrained tokens * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update save.py * Update save.py * Update save.py * checkpoint --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-05-29 00:30:38 +10:00
Daniel Han-Chen	426949c3a8	Update tokenizer_utils.py	2024-05-24 11:21:29 +10:00
Z	ef3d513e76	Update _utils.py (#520) Fixed a typo in the tokenizer fixer.	2024-05-24 11:10:06 +10:00
Daniel Han	7486340721	Phi-3, Llama-3 bug fixes (#519 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py * Phi-3	2024-05-24 06:36:10 +10:00
Daniel Han	bf4a1aefa3	Phi 3 Medium (#518 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3 * Phi 3 medium * Update chat_templates.py * Update chat_templates.py	2024-05-24 04:24:01 +10:00
Daniel Han-Chen	87176de87b	Update README.md	2024-05-23 04:31:26 +10:00
Daniel Han	b90d9c42c6	Mistral v3 (#514 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py * Mistral v3	2024-05-23 04:15:02 +10:00
Daniel Han	289b7fcca5	Fix `is_bfloat16_supported` missing (#510 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py * is_bfloat16_supported * Update __init__.py	2024-05-22 20:40:57 +10:00
Daniel Han	72b19da6bd	Nightly (#506 ) * Update llama.py * offload * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * continued pretraining trainer * Update trainer.py * Update trainer.py * Update trainer.py * Update trainer.py * is_bfloat16_supported * Update __init__.py * Update README.md * Update llama.py	2024-05-22 04:45:57 +10:00
Daniel Han	dde6a0f0d3	Nightly (#483) * peft issue * Update save.py * Update __init__.py * Update pyproject.toml	2024-05-17 23:46:33 +10:00
Daniel Han-Chen	617804de0b	Squashed commit of the following: commit 23bd794c246e9c90c453c9f2ab41a21ac1e41b9d Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri May 17 14:16:39 2024 +1000 Update save.py commit c12cc6c2b13333c4c6709e0ee88665b08c672887 Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri May 17 04:14:05 2024 +1000 peft issue	2024-05-17 14:18:52 +10:00
Daniel Han	63e175a77d	peft issue (#480)	2024-05-17 04:18:18 +10:00
Daniel Han	995dbe5043	Fix generation (#472 ) * Fix prompt * Update chat_templates.py * fix_untrained_tokens * Update llama.py * add tokens * Update _utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * pad_token * Update chat_templates.py * Update chat_templates.py * tokenizer * Update save.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer padding * Update tokenizer_utils.py * Update save.py * Fix: loading models with resized vocabulary (#377) * new: vocab resize on load * new: gitignore * GGUF fix * Readme (#390) * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update README.md * Delete .gitignore * Phi-3 * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Fix reserved tokens * Update save.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * Update save.py * Update _utils.py * Update chat_templates.py * Adds dependencies and extras for torch 2.3.0 with new xformers versions (#415) * Adds dependencies and extras for torch 2.3.0 with new xformers versions * Add 2.3.0 section to readme * Support Qwen2 (#428) * support Qwen2 * support Qwen2 * Delete README.md * Revert "Delete README.md" This reverts commit 9dde82c35d446393946c3497ad5cf96a2b59197e. 
* Update README.md * Qwen2 == Mistral * Update llama.py * Update __init__.py * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update save.py * test_hf_gguf_equivalence * Update chat_templates.py * Update chat_templates.py * --pad-vocab * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Unspecified max_seq_length * possible_pad_token * Update tokenizer_utils.py * past_key_values * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * _wrap_fast_inference * Update llama.py * Update llama.py * flag --------- Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com> Co-authored-by: Yang JianXin <995462226@qq.com>	2024-05-16 15:09:42 +10:00
Daniel Han	3329eb6a2c	Nightly (#461 ) * Fix prompt * Update chat_templates.py * fix_untrained_tokens * Update llama.py * add tokens * Update _utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * pad_token * Update chat_templates.py * Update chat_templates.py * tokenizer * Update save.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer padding * Update tokenizer_utils.py * Update save.py * Fix: loading models with resized vocabulary (#377) * new: vocab resize on load * new: gitignore * GGUF fix * Readme (#390) * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update README.md * Delete .gitignore * Phi-3 * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Fix reserved tokens * Update save.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * Update save.py * Update _utils.py * Update chat_templates.py * Adds dependencies and extras for torch 2.3.0 with new xformers versions (#415) * Adds dependencies and extras for torch 2.3.0 with new xformers versions * Add 2.3.0 section to readme * Support Qwen2 (#428) * support Qwen2 * support Qwen2 * Delete README.md * Revert "Delete README.md" This reverts commit 9dde82c35d446393946c3497ad5cf96a2b59197e. 
* Update README.md * Qwen2 == Mistral * Update llama.py * Update __init__.py * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update save.py * test_hf_gguf_equivalence * Update chat_templates.py * Update chat_templates.py * --pad-vocab * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Unspecified max_seq_length * possible_pad_token * Update tokenizer_utils.py --------- Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com> Co-authored-by: Yang JianXin <995462226@qq.com>	2024-05-14 04:51:23 +10:00
Daniel Han	9b4ed21ade	May 2024 Prelim (#447 ) * Fix prompt * Update chat_templates.py * fix_untrained_tokens * Update llama.py * add tokens * Update _utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * pad_token * Update chat_templates.py * Update chat_templates.py * tokenizer * Update save.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer padding * Update tokenizer_utils.py * Update save.py * Fix: loading models with resized vocabulary (#377) * new: vocab resize on load * new: gitignore * GGUF fix * Readme (#390) * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update README.md * Delete .gitignore * Phi-3 * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Fix reserved tokens * Update save.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * Update save.py * Update _utils.py * Update chat_templates.py * Adds dependencies and extras for torch 2.3.0 with new xformers versions (#415) * Adds dependencies and extras for torch 2.3.0 with new xformers versions * Add 2.3.0 section to readme * Support Qwen2 (#428) * support Qwen2 * support Qwen2 * Delete README.md * Revert "Delete README.md" This reverts commit 9dde82c35d446393946c3497ad5cf96a2b59197e. 
* Update README.md * Qwen2 == Mistral * Update llama.py * Update __init__.py * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update save.py * test_hf_gguf_equivalence * Update chat_templates.py * Update chat_templates.py * --pad-vocab * Update tokenizer_utils.py --------- Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com> Co-authored-by: Yang JianXin <995462226@qq.com>	2024-05-13 05:22:03 +10:00
Daniel Han	0a433c33ca	llama-3 bug fixes (#429 ) * Fix prompt * Update chat_templates.py * fix_untrained_tokens * Update llama.py * add tokens * Update _utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * pad_token * Update chat_templates.py * Update chat_templates.py * tokenizer * Update save.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer padding * Update tokenizer_utils.py * Update save.py * Fix: loading models with resized vocabulary (#377) * new: vocab resize on load * new: gitignore * GGUF fix * Readme (#390) * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update README.md * Delete .gitignore * Phi-3 * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Fix reserved tokens * Update save.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * Update save.py * Update _utils.py * Update chat_templates.py --------- Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-05-08 07:40:41 +10:00
Daniel Han-Chen	073a987a91	Update save.py	2024-05-05 13:28:21 +10:00
Daniel Han	533f4ba136	Fix llama-3 (#423 ) * Fix prompt * Update chat_templates.py * fix_untrained_tokens * Update llama.py * add tokens * Update _utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * pad_token * Update chat_templates.py * Update chat_templates.py * tokenizer * Update save.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer padding * Update tokenizer_utils.py * Update save.py * Fix: loading models with resized vocabulary (#377) * new: vocab resize on load * new: gitignore * GGUF fix * Readme (#390) * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update README.md * Delete .gitignore * Phi-3 * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Fix reserved tokens * Update save.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py --------- Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-05-05 05:45:01 +10:00
Daniel Han-Chen	d4e23b5d86	Update README.md	2024-04-30 20:26:10 +10:00
Daniel Han	a0d8184a0e	Phi-3 (#397 ) * Fix prompt * Update chat_templates.py * fix_untrained_tokens * Update llama.py * add tokens * Update _utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * pad_token * Update chat_templates.py * Update chat_templates.py * tokenizer * Update save.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer padding * Update tokenizer_utils.py * Update save.py * Fix: loading models with resized vocabulary (#377) * new: vocab resize on load * new: gitignore * GGUF fix * Readme (#390) * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update README.md * Delete .gitignore * Phi-3 * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md --------- Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-04-30 05:59:02 +10:00
Daniel Han-Chen	308ed4d15d	Update save.py	2024-04-29 17:55:04 +10:00
Daniel Han	838ecde97a	Nightly (#370 ) * Fix prompt * Update chat_templates.py * fix_untrained_tokens * Update llama.py * add tokens * Update _utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * pad_token * Update chat_templates.py * Update chat_templates.py * tokenizer * Update save.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer padding * Update tokenizer_utils.py * Update save.py * Fix: loading models with resized vocabulary (#377) * new: vocab resize on load * new: gitignore * GGUF fix * Readme (#390) * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update README.md * Delete .gitignore --------- Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-04-29 04:47:03 +10:00
Daniel Han	4a88539991	Fix Llama-3 (#366 ) * Fix prompt * Update chat_templates.py * fix_untrained_tokens * Update llama.py * add tokens * Update _utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * pad_token * Update chat_templates.py * Update chat_templates.py * tokenizer * Update save.py * Update chat_templates.py * Update chat_templates.py	2024-04-22 05:12:11 +10:00
Daniel Han-Chen	7a0ded3ab4	Update README.md	2024-04-20 14:22:52 +10:00
Daniel Han	79396e367f	Fix prompt (#357)	2024-04-20 04:59:19 +10:00
Daniel Han	68d7f13dc2	Update README.md (#352)	2024-04-19 05:50:19 +10:00
Daniel Han	a32efe4dde	Update README.md (#351)	2024-04-19 05:47:04 +10:00
Daniel Han-Chen	95b276ceb1	Update mapper.py	2024-04-19 05:37:53 +10:00
Daniel Han-Chen	2771e5042b	Update _utils.py	2024-04-19 03:36:45 +10:00
Daniel Han-Chen	faa66d209d	Update _utils.py	2024-04-19 03:33:12 +10:00
Daniel Han-Chen	a4123e16c7	Update tokenizer_utils.py	2024-04-19 03:05:54 +10:00
Daniel Han-Chen	69cedf9fcb	Update tokenizer_utils.py	2024-04-19 03:03:05 +10:00
Daniel Han-Chen	8f89ca62f1	Llama-3	2024-04-19 02:55:20 +10:00
Daniel Han	6dad5dd932	Tokenizers fix (#336 ) * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py * Patch tokenizer * Update chat_templates.py * Heal tokenizers * Update chat_templates.py * Update mapper.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update 
tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * tokenizer patching * patch_tokenizer * Update chat_templates.py * Update tokenizer_utils.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update tokenizer_utils.py * Edit * Update mistral.py * Update mistral.py * Stats * Update mistral.py * attention_mask * Update llama.py * Update llama.py * batch * Temp fix batch inference * Update llama.py * Update gemma.py * Fix inference * swiglu * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * fast inference * model * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update utils.py * inference * Update llama.py * Update llama.py * Update llama.py * overhead * Update llama.py * Update llama.py * compile * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * lora mamtul * Update llama.py * Update llama.py * Update llama.py * offloaded checkpointing * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update gemma.py * Revert "Update gemma.py" This reverts commit e3c3c5f3fa3d04a87f854056f6b547ced610d712. 
* Update _utils.py * Update _utils.py * Update _utils.py * Saving * sentencepiece_model_pb2 * Update llama.py * Update save.py * Update llama.py * padding side * Update tokenizer_utils.py * cache dir * Update tokenizer_utils.py * Update tokenizer_utils.py * Update pyproject.toml * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update save.py * Update save.py * checkpoint * Gemma 1.1 * more models * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * dtype * Update llama.py * CodeGemma * Fix downcasting * Some bugs * Fix Yi tokenizer * HF_TOKEN * Update llama.py * Update tokenizer_utils.py	2024-04-15 04:18:01 +10:00
Daniel Han	6d36f6b9a9	Readme Changes (#324) * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>	2024-04-11 01:43:34 +10:00
Daniel Han-Chen	66874d9918	Update _utils.py	2024-04-10 00:51:06 +10:00
Daniel Han	648bde7f06	Fix downcasting LoRA (#318 ) * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py * Patch tokenizer * Update chat_templates.py * Heal tokenizers * Update 
chat_templates.py * Update mapper.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * tokenizer patching * patch_tokenizer * Update chat_templates.py * Update tokenizer_utils.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update tokenizer_utils.py * Edit * Update mistral.py * Update mistral.py * Stats * Update mistral.py * attention_mask * Update llama.py * Update llama.py * batch * Temp fix batch inference * Update llama.py * Update gemma.py * Fix inference * swiglu * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * fast inference * model * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update utils.py * inference * Update llama.py * Update llama.py * Update llama.py * overhead * Update llama.py * Update llama.py * compile * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * lora mamtul * Update llama.py * Update llama.py * Update llama.py * offloaded checkpointing * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update gemma.py * Revert "Update gemma.py" This reverts commit `c68b59bbfd`. 
* Update _utils.py * Update _utils.py * Update _utils.py * Saving * sentencepiece_model_pb2 * Update llama.py * Update save.py * Update llama.py * padding side * Update tokenizer_utils.py * cache dir * Update tokenizer_utils.py * Update tokenizer_utils.py * Update pyproject.toml * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update save.py * Update save.py * checkpoint * Gemma 1.1 * more models * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * dtype * Update llama.py * CodeGemma * Fix downcasting	2024-04-10 00:44:58 +10:00
Daniel Han	c7649138ee	CodeGemma (#317 ) * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py * Patch tokenizer * Update 
chat_templates.py * Heal tokenizers * Update chat_templates.py * Update mapper.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * tokenizer patching * patch_tokenizer * Update chat_templates.py * Update tokenizer_utils.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update tokenizer_utils.py * Edit * Update mistral.py * Update mistral.py * Stats * Update mistral.py * attention_mask * Update llama.py * Update llama.py * batch * Temp fix batch inference * Update llama.py * Update gemma.py * Fix inference * swiglu * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * fast inference * model * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update utils.py * inference * Update llama.py * Update llama.py * Update llama.py * overhead * Update llama.py * Update llama.py * compile * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * lora mamtul * Update llama.py * Update llama.py * Update llama.py * offloaded checkpointing * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update gemma.py * Revert "Update gemma.py" This reverts commit `c68b59bbfd`. 
* Update _utils.py * Update _utils.py * Update _utils.py * Saving * sentencepiece_model_pb2 * Update llama.py * Update save.py * Update llama.py * padding side * Update tokenizer_utils.py * cache dir * Update tokenizer_utils.py * Update tokenizer_utils.py * Update pyproject.toml * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update save.py * Update save.py * checkpoint * Gemma 1.1 * more models * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * dtype * Update llama.py * CodeGemma	2024-04-09 23:35:02 +10:00
Daniel Han	6a53964f6b	Torch dtype (#314 ) * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py * Patch tokenizer * Update 
chat_templates.py * Heal tokenizers * Update chat_templates.py * Update mapper.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * tokenizer patching * patch_tokenizer * Update chat_templates.py * Update tokenizer_utils.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update tokenizer_utils.py * Edit * Update mistral.py * Update mistral.py * Stats * Update mistral.py * attention_mask * Update llama.py * Update llama.py * batch * Temp fix batch inference * Update llama.py * Update gemma.py * Fix inference * swiglu * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * fast inference * model * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update utils.py * inference * Update llama.py * Update llama.py * Update llama.py * overhead * Update llama.py * Update llama.py * compile * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * lora mamtul * Update llama.py * Update llama.py * Update llama.py * offloaded checkpointing * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update gemma.py * Revert "Update gemma.py" This reverts commit `c68b59bbfd`. 
* Update _utils.py * Update _utils.py * Update _utils.py * Saving * sentencepiece_model_pb2 * Update llama.py * Update save.py * Update llama.py * padding side * Update tokenizer_utils.py * cache dir * Update tokenizer_utils.py * Update tokenizer_utils.py * Update pyproject.toml * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update save.py * Update save.py * checkpoint * Gemma 1.1 * more models * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * dtype	2024-04-08 23:19:46 +10:00
Daniel Han	4474e4bca4	Fix Gemma GGUF (#311 ) * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update 
save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py * Patch tokenizer * Update chat_templates.py * Heal tokenizers * Update chat_templates.py * Update mapper.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * tokenizer patching * patch_tokenizer * Update chat_templates.py * Update tokenizer_utils.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update tokenizer_utils.py * Edit * Update mistral.py * Update mistral.py * Stats * Update mistral.py * attention_mask * Update llama.py * Update llama.py * batch * Temp fix batch inference * Update llama.py * Update gemma.py * Fix inference * swiglu * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * fast inference * model * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update utils.py * inference * Update llama.py * Update llama.py * Update llama.py * overhead * Update llama.py * Update llama.py * compile * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * lora mamtul * Update llama.py * Update llama.py * Update llama.py * offloaded checkpointing * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update gemma.py * Revert "Update gemma.py" This reverts commit `c68b59bbfd`. 
* Update _utils.py * Update _utils.py * Update _utils.py * Saving * sentencepiece_model_pb2 * Update llama.py * Update save.py * Update llama.py * padding side * Update tokenizer_utils.py * cache dir * Update tokenizer_utils.py * Update tokenizer_utils.py * Update pyproject.toml * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py * Update save.py * Update save.py * checkpoint * Gemma 1.1 * more models	2024-04-08 01:28:00 +10:00
Daniel Han	f3d05d19e3	Bug fixes (#308 ) * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * 
trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py * Patch tokenizer * Update chat_templates.py * Heal tokenizers * Update chat_templates.py * Update mapper.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * tokenizer patching * patch_tokenizer * Update chat_templates.py * Update tokenizer_utils.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update tokenizer_utils.py * Edit * Update mistral.py * Update mistral.py * Stats * Update mistral.py * attention_mask * Update llama.py * Update llama.py * batch * Temp fix batch inference * Update llama.py * Update gemma.py * Fix inference * swiglu * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * fast inference * model * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update utils.py * inference * Update llama.py * Update llama.py * Update llama.py * overhead * Update llama.py * Update llama.py * compile * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * lora mamtul * Update llama.py * Update llama.py * Update llama.py * offloaded checkpointing * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update gemma.py * Revert 
"Update gemma.py" This reverts commit `c68b59bbfd`. * Update _utils.py * Update _utils.py * Update _utils.py * Saving * sentencepiece_model_pb2 * Update llama.py * Update save.py * Update llama.py * padding side * Update tokenizer_utils.py * cache dir * Update tokenizer_utils.py * Update tokenizer_utils.py * Update pyproject.toml * Update pyproject.toml * Update tokenizer_utils.py * Update tokenizer_utils.py * Update llama.py	2024-04-07 03:44:45 +10:00
Daniel Han	920f0ae6e5	Bug fixes (#306 ) * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update 
pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py * Patch tokenizer * Update chat_templates.py * Heal tokenizers * Update chat_templates.py * Update mapper.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * tokenizer patching * patch_tokenizer * Update chat_templates.py * Update tokenizer_utils.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update tokenizer_utils.py * Edit * Update mistral.py * Update mistral.py * Stats * Update mistral.py * attention_mask * Update llama.py * Update llama.py * batch * Temp fix batch inference * Update llama.py * Update gemma.py * Fix inference * swiglu * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * fast inference * model * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update utils.py * inference * Update llama.py * Update llama.py * Update llama.py * overhead * Update llama.py * Update llama.py * compile * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * lora mamtul * Update llama.py * Update llama.py * Update llama.py * offloaded 
checkpointing * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update llama.py * Update llama.py * Update gemma.py * Revert "Update gemma.py" This reverts commit `c68b59bbfd`. * Update _utils.py * Update _utils.py * Update _utils.py * Saving * sentencepiece_model_pb2 * Update llama.py * Update save.py * Update llama.py * padding side	2024-04-06 04:31:24 +11:00
Daniel Han-Chen	db4a24e602	Gemma inference fix	2024-04-05 03:51:53 +11:00
Daniel Han-Chen	b122b76b26	Update gemma.py eabdullin	2024-04-04 22:46:32 +11:00
Daniel Han	7e1c6a62e2	Nightly (#299 ) * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py 
* Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py 
* Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py * Patch tokenizer * Update chat_templates.py * Heal tokenizers * Update chat_templates.py * Update mapper.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update 
chat_templates.py * tokenizer patching * patch_tokenizer * Update chat_templates.py * Update tokenizer_utils.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update tokenizer_utils.py * Edit * Update mistral.py * Update mistral.py * Stats * Update mistral.py * attention_mask * Update llama.py * Update llama.py * batch * Temp fix batch inference * Update llama.py * Update gemma.py * Fix inference * swiglu * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * fast inference * model * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update utils.py * inference * Update llama.py * Update llama.py * Update llama.py * overhead * Update llama.py * Update llama.py * compile * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * lora mamtul	2024-04-03 05:38:31 +11:00
Daniel Han	537577720c	Fix batched inference (#298 ) * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * 
Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update 
llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py * Patch tokenizer * Update chat_templates.py * Heal tokenizers * Update chat_templates.py * Update mapper.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update 
tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * tokenizer patching * patch_tokenizer * Update chat_templates.py * Update tokenizer_utils.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update tokenizer_utils.py * Edit * Update mistral.py * Update mistral.py * Stats * Update mistral.py * attention_mask * Update llama.py * Update llama.py * batch * Temp fix batch inference * Update llama.py * Update gemma.py * Fix inference * swiglu * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * fast inference * model * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update llama.py * Update utils.py * inference * Update llama.py * Update llama.py * Update llama.py * overhead * Update llama.py * Update llama.py * compile * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py	2024-04-03 04:56:11 +11:00
Daniel Han-Chen	8921867157	Revert "Temp fix batch inference (#294)" This reverts commit `e209991ba1`.	2024-04-02 13:18:31 +11:00
Daniel Han	e209991ba1	Temp fix batch inference (#294 ) * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * 
Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update 
llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * 
Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py * Patch tokenizer * Update chat_templates.py * Heal tokenizers * Update chat_templates.py * Update mapper.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * tokenizer patching * patch_tokenizer * Update chat_templates.py * Update tokenizer_utils.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update tokenizer_utils.py * Edit * Update mistral.py * Update mistral.py * Stats * Update mistral.py * attention_mask * Update llama.py * Update llama.py * batch * Temp fix batch inference * Update llama.py * Update gemma.py	2024-04-02 04:35:28 +11:00
Daniel Han	8e263b8b7d	Nightly (#293) Env checking	2024-04-01 04:38:12 +11:00
Daniel Han	74f79684da	Auto Healing Tokenizer (#283 ) * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * 
original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * 
Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py 
* Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py * Patch tokenizer * Update chat_templates.py * Heal tokenizers * Update chat_templates.py * Update mapper.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * tokenizer patching * patch_tokenizer * Update chat_templates.py * Update tokenizer_utils.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update tokenizer_utils.py * Edit	2024-03-28 04:16:50 +11:00
Daniel Han-Chen	bb45fdabb6	Update mapper.py	2024-03-24 14:00:04 +11:00
Daniel Han	b9d5ca53dc	lm_head issue (#266 ) * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, 
Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * 
revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * 
Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py * Update llama.py	2024-03-20 04:48:15 +11:00
Daniel Han	e269b0dc59	Fix Saving (#264 ) * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update 
llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account 
for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * state_dict * Update save.py * whoami * Update llama.py * Update save.py	2024-03-19 19:55:02 +11:00
Daniel Han	696e8817ea	Fix GGUF and saving (#261 ) * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * 
Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * 
Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * 
Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update 
chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code * lm_head * Update llama.py * save_pretrained_settings * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py	2024-03-19 04:32:28 +11:00
Daniel Han	36473e2d6e	Fix lm_head, embed_tokens (#258 ) * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * 
Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * 
Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update 
chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py * Update __init__.py * Update fast_lora.py * dtype * Update llama.py * Update llama.py * Update llama.py * dtype * Update mistral.py * trust_remote_code	2024-03-18 04:18:15 +11:00
Daniel Han-Chen	16b3dd86b7	Update fast_lora.py	2024-03-17 22:46:44 +11:00
Daniel Han-Chen	4b9aaee6b1	Update fast_lora.py	2024-03-17 22:46:17 +11:00
Daniel Han-Chen	3685ec607d	Squashed commit of the following: commit `61d45c60db` Merge: `f024ce2` `64d847b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 17 22:12:44 2024 +1100 Merge branch 'main' into nightly commit `f024ce2821` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 17 22:12:18 2024 +1100 Update __init__.py commit `d38cf5387c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 17 22:09:23 2024 +1100 Update fast_lora.py commit `9c35a2c4b0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 17 22:08:33 2024 +1100 Update pyproject.toml commit `6edc35f686` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 17 20:18:00 2024 +1100 Update fast_lora.py commit `2a9d4fb947` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 17 20:10:30 2024 +1100 Bugs commit `14717be070` Merge: `5c24a3b` `c599ae0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 17 20:10:26 2024 +1100 Merge branch 'main' into nightly commit `5c24a3bc2e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 17 02:46:35 2024 +1100 Update README.md commit `fd729a7131` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 17 02:44:58 2024 +1100 Update README.md commit `7e9f092e9f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 16 23:17:35 2024 +1100 Update save.py commit `e3efca8778` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 16 22:36:57 2024 +1100 Update save.py commit `d58fa31e0c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 16 22:31:02 2024 +1100 Update save.py commit `64d954bd13` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 16 20:32:12 2024 +1100 Update save.py commit `815202f832` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 16 20:03:51 2024 +1100 GGUF commit `338b2c928b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 16 04:40:21 2024 +1100 Update README.md commit 
`f342425c1e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 16 04:34:51 2024 +1100 Update README.md commit `cef733a420` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 16 04:10:54 2024 +1100 Update fast_lora.py commit `e5bcab2a74` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 16 03:43:16 2024 +1100 Update fast_lora.py commit `80cfe132f6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 16 03:38:13 2024 +1100 Fix bugs commit `d8e98be90d` Merge: `51c2484` `39713e6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 16 00:14:02 2024 +1100 Merge branch 'main' into nightly commit `51c2484ffd` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 22:35:44 2024 +1100 Update rope_embedding.py commit `3e93a78794` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 20:14:56 2024 +1100 Update rope_embedding.py commit `82d80e9e98` Merge: `718a6a1` `990c7a8` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 19:50:20 2024 +1100 Merge branch 'main' into nightly commit `718a6a1c84` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 19:50:03 2024 +1100 Update pyproject.toml commit `385c6d44a8` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 19:49:09 2024 +1100 Update pyproject.toml commit `9c9ede4680` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 19:48:39 2024 +1100 Update pyproject.toml commit `9d6c9c9ebc` Merge: `a12e4ea` `2c5c5bb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 05:07:09 2024 +1100 Merge branch 'main' into nightly commit `a12e4ea3e0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 04:44:54 2024 +1100 Update chat_templates.py commit `c3e0e518d9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 04:30:03 2024 +1100 Update chat_templates.py commit `fe0e2d7baa` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 03:50:32 
2024 +1100 Update chat_templates.py commit `5dfe582c96` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 03:37:37 2024 +1100 Update chat_templates.py commit `ec73a776ec` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 15 03:34:32 2024 +1100 Update chat_templates.py commit `5c4241a240` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 14 20:27:20 2024 +1100 Update pyproject.toml commit `6841a303d9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 14 20:20:16 2024 +1100 Update pyproject.toml commit `341b5f46e6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 14 20:12:54 2024 +1100 Update pyproject.toml commit `6809846464` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 14 20:12:02 2024 +1100 Update pyproject.toml commit `77edfb10b7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 14 20:11:24 2024 +1100 Update pyproject.toml commit `0ed19ec170` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 14 20:10:10 2024 +1100 Update pyproject.toml commit `1f4c625f2a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 14 20:06:05 2024 +1100 Update pyproject.toml commit `8fa0aab61c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 14 20:04:01 2024 +1100 Update pyproject.toml commit `eb61377632` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 14 19:02:29 2024 +1100 Fix Colab commit `d6ac9b56c1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Mar 12 20:12:55 2024 +1100 upcasting commit `b7c3190e97` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Mar 12 00:46:57 2024 +1100 Update pyproject.toml commit `4d77c32425` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 23:28:03 2024 +1100 Update pyproject.toml commit `98573e8618` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 23:21:27 2024 +1100 kaggle new commit `4a794dd880` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Mon Mar 11 20:28:09 2024 +1100 Update pyproject.toml commit `a0c18c9880` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 20:05:06 2024 +1100 Update save.py commit `684eaae9ea` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 19:52:18 2024 +1100 GGUF incorrect commit `029d588603` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 19:13:42 2024 +1100 Update save.py commit `252a38a2df` Merge: `63b1f58` `3222377` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 19:13:29 2024 +1100 Merge branch 'main' into nightly commit `63b1f5879e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 04:05:43 2024 +1100 Update llama.py commit `cb0d937469` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 03:53:23 2024 +1100 Account for DoRA commit `2e87755a5c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 03:19:20 2024 +1100 Update llama.py commit `93d88ad68d` Merge: `aba595d` `8bea94c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 03:15:30 2024 +1100 Merge branch 'main' into nightly commit `aba595de9c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 03:07:44 2024 +1100 Update llama.py commit `133097d995` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 02:25:10 2024 +1100 Update save.py commit `b5d3d63df5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 02:15:33 2024 +1100 Update save.py commit `8be91b050f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 11 02:10:51 2024 +1100 Update chat_templates.py commit `23b7a5764c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 19:32:02 2024 +1100 Update fast_lora.py commit `c1728a9904` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 19:11:00 2024 +1100 Update fast_lora.py commit `c192ce3ed4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 
19:09:10 2024 +1100 Update fast_lora.py commit `7e3abd19ba` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 18:36:35 2024 +1100 Update fast_lora.py commit `c1f3e70394` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 18:33:52 2024 +1100 Update fast_lora.py commit `08da057f04` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 18:09:58 2024 +1100 Update save.py commit `1e8922af2b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 17:42:41 2024 +1100 Revert commit `74fc5caa60` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 16:20:20 2024 +1100 Accuracy commit `35c6d776c4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 14:13:53 2024 +1100 Update save.py commit `6d2bc97117` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 14:08:38 2024 +1100 Update llama.py commit `baf8e4c0a8` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 13:11:42 2024 +1100 Update llama.py commit `c0d9516255` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 13:10:03 2024 +1100 Update llama.py commit `91877f5506` Merge: `f887080` `1fcf9d4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 13:08:23 2024 +1100 Merge branch 'main' into nightly commit `f887080a6d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 04:31:40 2024 +1100 Tokenizer overwritten commit `1c1461ae09` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 04:08:02 2024 +1100 Update loader.py commit `14f063819a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 03:54:18 2024 +1100 model_name commit `daba749ee1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 03:48:05 2024 +1100 Update llama.py commit `457b7ba6d6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 10 02:57:41 2024 +1100 Update chat_templates.py commit `5c9629f5fe` Author: Daniel Han-Chen <danielhanchen@gmail.com> 
Date: Sun Mar 10 02:55:07 2024 +1100 Update save.py commit `e6c3cdfc2d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 9 23:22:14 2024 +1100 Update save.py commit `f0a3c05b07` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 9 23:19:02 2024 +1100 Update save.py commit `58d1f1e03c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 9 20:05:52 2024 +1100 Update chat_templates.py commit `9b3dd3e9f8` Merge: `b321ada` `70f271b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 8 19:40:25 2024 +1100 Merge branch 'main' into nightly commit `b321adac2c` Merge: `03232df` `fedcafe` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 7 04:32:51 2024 +1100 Merge branch 'main' into nightly commit `03232dff4e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 7 02:48:15 2024 +1100 Update chat_templates.py commit `e67c6b4ec9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Mar 7 02:05:38 2024 +1100 Fix warning commit `5a7f52819e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Mar 6 19:21:44 2024 +1100 Update rms_layernorm.py commit `837ba610cf` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Mar 6 18:18:07 2024 +1100 RoPE and Gemma precision commit `1e41fa0c8c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Mar 6 05:07:37 2024 +1100 Update save.py commit `333d3d9a51` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Mar 6 04:52:59 2024 +1100 Update gemma.py commit `85c052d8fb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Mar 6 04:48:26 2024 +1100 sqrt commit `ed1aa0007a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Mar 6 04:24:10 2024 +1100 Update gemma.py commit `be81c07d43` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Mar 6 04:14:11 2024 +1100 Gemma precision commit `5a693a107d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Mar 5 23:58:00 2024 +1100 Layernorms commit 
`e0a2463738` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Mar 5 20:07:44 2024 +1100 Update pyproject.toml commit `160320b3d9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Mar 5 18:55:50 2024 +1100 Update gemma.py commit `9f7f205f67` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Mar 5 18:27:11 2024 +1100 Update rms_layernorm.py commit `43e710b064` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Mar 5 18:21:59 2024 +1100 Fix Gemma merging commit `3137392ccf` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Mar 5 02:41:56 2024 +1100 Update gemma.py commit `fe35b127fc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Mar 5 02:40:20 2024 +1100 Update gemma.py commit `5fae81b628` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Mar 5 02:22:31 2024 +1100 Update gemma.py commit `961813b35f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Mar 5 01:45:08 2024 +1100 Update gemma.py commit `c027dacb18` Merge: `cb193f7` `7b7665d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 4 16:19:13 2024 +1100 Merge branch 'main' into nightly commit `cb193f7e0a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 4 16:14:50 2024 +1100 Update gemma.py commit `440c29273f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 4 16:11:12 2024 +1100 Update rms_layernorm.py commit `c31b27b7dc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 4 16:04:29 2024 +1100 Update rms_layernorm.py commit `6fa081ae77` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Mar 4 16:00:35 2024 +1100 Update rms_layernorm.py commit `aa2fb63048` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 3 19:32:18 2024 +1100 Update gemma.py commit `ac23e4b612` Merge: `1fea4ff` `fa2a43b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 3 19:29:41 2024 +1100 Merge branch 'main' into nightly commit `1fea4ffcf2` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Sun Mar 3 18:07:55 2024 +1100 Update geglu.py commit `245fe4716c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 3 03:39:36 2024 +1100 Update _utils.py commit `e27523cd5e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 3 03:39:10 2024 +1100 Update __init__.py commit `65fbde6cd9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 3 03:37:43 2024 +1100 Update __init__.py commit `9a2e791e6c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 3 03:35:21 2024 +1100 Update llama.py commit `786885c38c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 3 03:17:59 2024 +1100 Approx gelu commit `1c7f0d21ee` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Mar 3 02:24:39 2024 +1100 Update geglu.py commit `393e53b016` Merge: `c88ab10` `307f2da` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 2 18:29:02 2024 +1100 Merge branch 'main' into nightly commit `c88ab10a5c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 2 18:28:39 2024 +1100 Approx gelu commit `e032445694` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 2 02:39:11 2024 +1100 Update pyproject.toml commit `c970a2b3be` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Mar 2 02:38:40 2024 +1100 Small fixes commit `db87262625` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Mar 1 04:16:25 2024 +1100 Update pyproject.toml commit `d44bbf5f2e` Merge: `0866020` `d0c15bb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 29 00:17:47 2024 +1100 Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly commit `0866020037` Merge: `54b26c0` `2561964` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 29 00:17:12 2024 +1100 Merge branch 'main' into nightly commit `d0c15bb508` Author: Daniel Han <danielhanchen@gmail.com> Date: Thu Feb 29 00:17:03 2024 +1100 Hotfix - fix DoRA, Gemma prompt template (#202) 
(#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py commit `54b26c0466` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 29 00:16:19 2024 +1100 Update llama.py commit `074aa737ce` Merge: `b1892b5` `e7c53fb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 29 00:14:42 2024 +1100 Merge branch 'main' into nightly commit `b1892b5511` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 29 00:09:00 2024 +1100 Update chat_templates.py commit `072dc0c447` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 29 00:02:40 2024 +1100 Update _utils.py commit `d37f284eaa` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 28 23:59:49 2024 +1100 DoRA commit `967fed83c7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 28 22:02:18 2024 +1100 Update llama.py commit `a4cd4a6e41` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 27 01:40:45 2024 +1100 Update README.md commit `90a5c2c121` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 27 01:39:52 2024 +1100 Update README.md commit `4b7df80abd` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 27 00:16:57 2024 +1100 Chat Templates commit `54ff6eb169` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon 
Feb 26 17:27:42 2024 +1100 Update cross_entropy_loss.py commit `7cf69dd9f2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 16:19:39 2024 +1100 Update cross_entropy_loss.py commit `1f450cd6ee` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 14:44:22 2024 +1100 Update gemma.py commit `7a5db6758c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 14:39:43 2024 +1100 correct_dtype commit `cf16811ab9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 14:38:27 2024 +1100 Update gemma.py commit `637d01643c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 14:37:48 2024 +1100 Update llama.py commit `3d73560df3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 14:37:09 2024 +1100 Update llama.py commit `0d24acf0f5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 14:36:39 2024 +1100 Update llama.py commit `1aeca6f00c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 14:35:44 2024 +1100 RoPE commit `4ec1b216e0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 14:15:37 2024 +1100 Update save.py commit `11c9ea4d1b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 05:06:04 2024 +1100 Update gemma.py commit `24e9d0af28` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 05:05:41 2024 +1100 Update gemma.py commit `0b693df646` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 04:37:12 2024 +1100 Update gemma.py commit `ae336e9663` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 04:36:10 2024 +1100 Update gemma.py commit `1d53b366a0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 04:32:49 2024 +1100 Update gemma.py commit `b1864710c5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 04:30:29 2024 +1100 Update gemma.py commit `686d9fa4c4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 04:27:55 
2024 +1100 Update gemma.py commit `40169f9fd3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 04:23:38 2024 +1100 Update gemma.py commit `b0b38f770b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 04:20:19 2024 +1100 Update gemma.py commit `13e06a062b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:56:34 2024 +1100 Update gemma.py commit `fd5389a485` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:55:15 2024 +1100 Update cross_entropy_loss.py commit `a5abe39ded` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:53:49 2024 +1100 Update cross_entropy_loss.py commit `78c96ef83c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:50:34 2024 +1100 Update cross_entropy_loss.py commit `88cfe5eeb2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:48:16 2024 +1100 Update cross_entropy_loss.py commit `dba4c0355d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:46:18 2024 +1100 Update cross_entropy_loss.py commit `199460d5ef` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:43:22 2024 +1100 Update cross_entropy_loss.py commit `13718b9847` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:36:38 2024 +1100 Update cross_entropy_loss.py commit `48736a0755` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:35:45 2024 +1100 Update cross_entropy_loss.py commit `c1dbf6710e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:29:19 2024 +1100 Update cross_entropy_loss.py commit `b55d7ad4b2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:29:11 2024 +1100 Update cross_entropy_loss.py commit `bc3ff0f3a6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:28:22 2024 +1100 Update cross_entropy_loss.py commit `7bbce701c8` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:26:42 2024 +1100 
Update cross_entropy_loss.py commit `9c0b3cd431` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:19:26 2024 +1100 Update cross_entropy_loss.py commit `e696487e84` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:18:01 2024 +1100 Update cross_entropy_loss.py commit `c675220373` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:16:31 2024 +1100 Update cross_entropy_loss.py commit `bb6b6f238e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:15:19 2024 +1100 Update cross_entropy_loss.py commit `096544eff4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:14:04 2024 +1100 Update cross_entropy_loss.py commit `b5eff3bfbc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:12:23 2024 +1100 Update cross_entropy_loss.py commit `fdaa9493be` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 03:09:37 2024 +1100 Update cross_entropy_loss.py commit `8259b16a5d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 26 02:49:02 2024 +1100 gemma commit `df8034d185` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:48:34 2024 +1100 Update llama.py commit `db2b387b22` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:47:45 2024 +1100 llama commit `36d29078cd` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:35:39 2024 +1100 Update gemma.py commit `4cc14b2074` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:30:59 2024 +1100 Update gemma.py commit `bc428f4e7b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:29:01 2024 +1100 Update gemma.py commit `7c958ab9e7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:25:57 2024 +1100 Update gemma.py commit `d530f95135` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:25:15 2024 +1100 Update gemma.py commit `6ad44835d2` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Sun Feb 25 04:22:35 2024 +1100 Update gemma.py commit `46723124f7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:21:03 2024 +1100 Update gemma.py commit `ad1ce483db` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:20:54 2024 +1100 Update gemma.py commit `47d4a33f41` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:20:09 2024 +1100 Update gemma.py commit `6b531eff93` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:18:23 2024 +1100 Update gemma.py commit `147d129772` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:17:16 2024 +1100 Update gemma.py commit `884fadc744` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:14:57 2024 +1100 Update gemma.py commit `9d648cbccf` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:14:38 2024 +1100 Update gemma.py commit `6238f16625` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:08:04 2024 +1100 Update gemma.py commit `9a20dff5d2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 04:07:47 2024 +1100 Update gemma.py commit `94833602da` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:55:44 2024 +1100 Update gemma.py commit `40c244a7bc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:53:40 2024 +1100 Update gemma.py commit `0e1578dedb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:51:32 2024 +1100 Update gemma.py commit `33a72ba122` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:50:59 2024 +1100 Update gemma.py commit `890d73e6d0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:49:55 2024 +1100 Update gemma.py commit `33eeb7add2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:44:22 2024 +1100 Update gemma.py commit `096a4192fb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: 
Sun Feb 25 03:39:40 2024 +1100 Update gemma.py commit `f270f377ab` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:39:31 2024 +1100 Update gemma.py commit `765e54fe55` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:38:18 2024 +1100 Update gemma.py commit `29721a86d1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:37:30 2024 +1100 Update gemma.py commit `208e2c1189` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:32:23 2024 +1100 Update gemma.py commit `aff1db7a4c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:22:24 2024 +1100 Update gemma.py commit `98903e72c2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:21:37 2024 +1100 Update gemma.py commit `20da5547f5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:19:34 2024 +1100 Update gemma.py commit `20e9ca2fac` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:16:55 2024 +1100 Update gemma.py commit `842b310767` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:16:34 2024 +1100 Update gemma.py commit `11067cf849` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:14:02 2024 +1100 Update gemma.py commit `6725dc93f1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:13:18 2024 +1100 Update gemma.py commit `baf11e0e34` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:12:07 2024 +1100 Update gemma.py commit `e0ada344ac` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:11:20 2024 +1100 Update gemma.py commit `39a58f7d2e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:07:20 2024 +1100 Update gemma.py commit `c96e8c28f8` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:03:31 2024 +1100 Update gemma.py commit `10d03d8315` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:03:23 2024 +1100 
Update gemma.py commit `45a33bae1c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:01:56 2024 +1100 Update gemma.py commit `77fea80a1b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:01:30 2024 +1100 Update gemma.py commit `e3517f0248` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 03:00:22 2024 +1100 Update gemma.py commit `03c7f211a6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:49:14 2024 +1100 Update gemma.py commit `105a0325f1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:42:56 2024 +1100 Update gemma.py commit `9d9e38a74a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:41:24 2024 +1100 Update gemma.py commit `49a124243e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:37:48 2024 +1100 Update gemma.py commit `5c31fd1e5b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:34:18 2024 +1100 Update gemma.py commit `c0f6b6c0cc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:31:22 2024 +1100 Update gemma.py commit `911745377a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:30:06 2024 +1100 Update gemma.py commit `f73698859e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:28:01 2024 +1100 Update gemma.py commit `e56eb3f406` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:26:57 2024 +1100 Update gemma.py commit `045bce2fa5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:25:39 2024 +1100 Update gemma.py commit `a847719499` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:18:43 2024 +1100 Update gemma.py commit `c058d1aa96` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:17:45 2024 +1100 Update gemma.py commit `dfb2e250f6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:17:12 2024 +1100 Update gemma.py commit 
`f2fca2d6c1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:10:35 2024 +1100 Update gemma.py commit `ae473e0f2b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:09:08 2024 +1100 rope commit `ce2d74732d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:07:29 2024 +1100 Update llama.py commit `ed5042aa07` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:04:11 2024 +1100 Update llama.py commit `42840647d0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:02:59 2024 +1100 Update llama.py commit `ecd21b375b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 02:02:02 2024 +1100 Update llama.py commit `5bf8cacebd` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 01:58:33 2024 +1100 Update gemma.py commit `66a4380d08` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 01:50:42 2024 +1100 Update gemma.py commit `b85b45e7a0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 01:49:39 2024 +1100 Update gemma.py commit `72870a1ed3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 01:48:02 2024 +1100 Update gemma.py commit `7b903368ab` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 01:45:44 2024 +1100 Update gemma.py commit `0c2d7e503b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 01:39:25 2024 +1100 Update cross_entropy_loss.py commit `9ae5abc83f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 01:12:44 2024 +1100 Update gemma.py commit `3907ea9003` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 01:10:12 2024 +1100 Update gemma.py commit `c4c8558a90` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 01:04:09 2024 +1100 Update gemma.py commit `e0e96ef5ce` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 01:03:04 2024 +1100 Update gemma.py commit `df22d0cb3a` Author: Daniel 
Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 00:46:58 2024 +1100 Update gemma.py commit `b9d9aa1b07` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 25 00:44:44 2024 +1100 Update gemma.py commit `785760ba0d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 22:43:00 2024 +1100 Update gemma.py commit `103b1cdbed` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 22:42:09 2024 +1100 Update gemma.py commit `b482ce13e9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 22:40:46 2024 +1100 Update gemma.py commit `7be57334cb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 22:39:28 2024 +1100 Update gemma.py commit `2242bdf0a3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 22:30:14 2024 +1100 Update gemma.py commit `a066c348f2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 22:24:43 2024 +1100 Update llama.py commit `b3d7e61608` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 20:20:37 2024 +1100 Update gemma.py commit `5b5652de36` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 20:17:24 2024 +1100 Update gemma.py commit `0f5cc839f3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 20:15:25 2024 +1100 Update gemma.py commit `6011a490c6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 20:13:24 2024 +1100 Update gemma.py commit `6d94cf88a7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 20:11:23 2024 +1100 Update gemma.py commit `c706447fd4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 20:04:31 2024 +1100 Update gemma.py commit `ed3f139a9c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 19:55:22 2024 +1100 Update gemma.py commit `b7ba95857f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 19:47:10 2024 +1100 Update gemma.py commit `f0b21f9b7b` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Sat Feb 24 19:45:20 2024 +1100 Update gemma.py commit `762790d6a9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 19:35:03 2024 +1100 Update gemma.py commit `b174e54507` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 19:33:04 2024 +1100 Update gemma.py commit `47c6feaf89` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 19:12:52 2024 +1100 Update gemma.py commit `b2b658cbee` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 19:10:36 2024 +1100 Update gemma.py commit `407205dd87` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 18:58:31 2024 +1100 Update gemma.py commit `82e45c02a5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 18:25:25 2024 +1100 revert commit `94201755db` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 18:17:38 2024 +1100 revert commit `d5e625b674` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 18:04:40 2024 +1100 Update cross_entropy_loss.py commit `ba19344fb9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 18:03:45 2024 +1100 Update cross_entropy_loss.py commit `7916872408` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:48:19 2024 +1100 Update llama.py commit `49129bc66c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:43:19 2024 +1100 Update gemma.py commit `ebbd4756d4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:41:53 2024 +1100 Update gemma.py commit `68e280405c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:40:21 2024 +1100 Update gemma.py commit `9af090f6ea` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:32:04 2024 +1100 Update gemma.py commit `61add47f37` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:31:10 2024 +1100 Update gemma.py commit `de96285442` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Sat Feb 24 17:29:51 2024 +1100 Update gemma.py commit `0abbcdc15a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:29:09 2024 +1100 Update gemma.py commit `17036a6e54` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:27:49 2024 +1100 Update llama.py commit `8b7de591c1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:26:19 2024 +1100 Update gemma.py commit `041aa7d909` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:18:09 2024 +1100 Update gemma.py commit `71483e6c8d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:17:26 2024 +1100 Update gemma.py commit `8cb12078f9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:15:44 2024 +1100 Update gemma.py commit `23e6ebf14d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:13:40 2024 +1100 Update gemma.py commit `a6752f3f16` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:12:00 2024 +1100 Update gemma.py commit `9c565580fb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:09:03 2024 +1100 Update gemma.py commit `cd479b4374` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:05:59 2024 +1100 Update cross_entropy_loss.py commit `0080357a88` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:03:54 2024 +1100 Update gemma.py commit `e8de606be9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:02:54 2024 +1100 Update gemma.py commit `35adcbf83c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 17:02:14 2024 +1100 Update gemma.py commit `7e33aa2520` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 16:59:24 2024 +1100 Update gemma.py commit `4c7d21e41b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 16:57:16 2024 +1100 Update gemma.py commit `3feae56451` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Sat Feb 24 16:55:21 2024 +1100 Update gemma.py commit `c060bad7ce` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 16:47:39 2024 +1100 Update gemma.py commit `8ffaf5f109` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 15:44:10 2024 +1100 Update gemma.py commit `be098122f1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 15:42:12 2024 +1100 Update llama.py commit `8a618c60ed` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 14:29:15 2024 +1100 pos commit `ef235c3508` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 14:27:39 2024 +1100 Update gemma.py commit `a48adb0a83` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 14:27:31 2024 +1100 Update gemma.py commit `8920cda9a7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 14:13:31 2024 +1100 position_ids commit `a1eab801fa` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 14:06:48 2024 +1100 Update gemma.py commit `c07aff5188` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 14:05:18 2024 +1100 Update gemma.py commit `4c6e122caa` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 14:02:26 2024 +1100 norm commit `5c5bb53241` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 13:59:30 2024 +1100 Update llama.py commit `0e1826dd8c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 13:55:37 2024 +1100 Update llama.py commit `893b5dfe2b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 13:54:00 2024 +1100 revert commit `9ab87f3648` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 13:33:05 2024 +1100 Update cross_entropy_loss.py commit `d2dc658077` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 13:22:17 2024 +1100 Update geglu.py commit `1fcbd61f76` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 03:40:56 
2024 +1100 Update cross_entropy_loss.py commit `fadcb311c3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 03:40:43 2024 +1100 Update llama.py commit `1a2a10d028` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 03:38:25 2024 +1100 Update llama.py commit `17a1a855e0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 03:33:46 2024 +1100 CE commit `097629108f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 03:30:43 2024 +1100 Update llama.py commit `30cc4ffd67` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 03:27:21 2024 +1100 Update llama.py commit `20227ba5c7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 03:14:29 2024 +1100 Update llama.py commit `73a8616f99` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 03:03:00 2024 +1100 Update llama.py commit `bd4fd22f34` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 02:58:05 2024 +1100 Update cross_entropy_loss.py commit `2814f02087` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 02:33:28 2024 +1100 Update cross_entropy_loss.py commit `dd321363f2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 02:30:49 2024 +1100 Update cross_entropy_loss.py commit `574b2f787f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 02:29:33 2024 +1100 Update cross_entropy_loss.py commit `6184f770e7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 02:10:38 2024 +1100 Update cross_entropy_loss.py commit `f5f3d6794e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 02:08:26 2024 +1100 Update cross_entropy_loss.py commit `88bf684c61` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 02:06:11 2024 +1100 Update cross_entropy_loss.py commit `603c71c7f0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 24 02:02:11 2024 +1100 Fast CE Loss commit `25339b71f7` Author: Daniel 
Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 18:12:15 2024 +1100 Update fast_lora.py commit `ebfc8f8a55` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 17:57:42 2024 +1100 Update fast_lora.py commit `6ab5eb75f4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 17:55:58 2024 +1100 Update llama.py commit `f999bcd52d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 17:46:33 2024 +1100 Update llama.py commit `13d0cee4f6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 17:37:17 2024 +1100 Update llama.py commit `b66f6dbfd3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 17:28:29 2024 +1100 Update llama.py commit `5866b08f71` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 04:11:08 2024 +1100 gemma commit `dd478be2fa` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 03:59:09 2024 +1100 Update llama.py commit `2ff33fb270` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 03:52:24 2024 +1100 Update llama.py commit `4c9f366688` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 03:27:42 2024 +1100 Update cross_entropy_loss.py commit `cfebc8d979` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 03:27:03 2024 +1100 Update llama.py commit `4b009aad33` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 03:16:50 2024 +1100 Update llama.py commit `6f340f5417` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 02:38:16 2024 +1100 Update fast_lora.py commit `ff27a824fa` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 02:33:37 2024 +1100 Update llama.py commit `5d728270d3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 02:28:05 2024 +1100 Update llama.py commit `e743e58cb6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 02:23:25 2024 +1100 Update gemma.py commit `879bdd2efb` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Fri Feb 23 02:20:26 2024 +1100 Update gemma.py commit `9052a85f92` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 02:19:06 2024 +1100 Update gemma.py commit `bad295a413` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 02:15:12 2024 +1100 Update llama.py commit `b30d2502cc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 02:12:02 2024 +1100 Update llama.py commit `11233cb3d4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 02:10:11 2024 +1100 model_type commit `76de9c1fe3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 02:05:31 2024 +1100 FastGemmaModel commit `c0d32de795` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 02:02:39 2024 +1100 Update fast_lora.py commit `bd2fa264c9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 01:57:34 2024 +1100 Update mapper.py commit `9a1b28d691` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 01:55:25 2024 +1100 Update pyproject.toml commit `0beaf18908` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 23 01:50:51 2024 +1100 Gemma commit `0659d90d95` Merge: `7372768` `1b7bf71` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 21 17:49:57 2024 +1100 Merge branch 'main' into nightly commit `7372768d14` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 21 03:29:49 2024 +1100 original commit `10d9d56434` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 21 03:25:46 2024 +1100 spaces commit `edd03f66fb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 21 03:14:27 2024 +1100 trainer commit `028ee5ca06` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 21 02:05:33 2024 +1100 save commit `917f791ab7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 21 00:53:17 2024 +1100 Update save.py commit `bf3e10b26e` Author: Daniel Han-Chen <danielhanchen@gmail.com> 
Date: Wed Feb 21 00:48:18 2024 +1100 Update save.py commit `d266332141` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 23:28:37 2024 +1100 Update save.py commit `4d1e575047` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 20:01:33 2024 +1100 Update __init__.py commit `6aac6c4be8` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 19:50:29 2024 +1100 Update save.py commit `83d906a2c0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 18:35:11 2024 +1100 Update save.py commit `5d88ffefd5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 18:00:51 2024 +1100 Update save.py commit `4c9be6d057` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 17:12:18 2024 +1100 Update save.py commit `d3eac595a1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 17:11:33 2024 +1100 Update save.py commit `632705b9b0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 17:00:19 2024 +1100 Update save.py commit `8f60bf57be` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 15:59:38 2024 +1100 Update save.py commit `164fa807fe` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 04:44:17 2024 +1100 Update save.py commit `11e04a7057` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 04:42:47 2024 +1100 Update save.py commit `d2e76580ae` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 04:41:40 2024 +1100 Update save.py commit `3e3dc37640` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 04:41:34 2024 +1100 Update save.py commit `d9751e6553` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 04:35:07 2024 +1100 Update save.py commit `44659a05a3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 04:13:58 2024 +1100 Update save.py commit `34998914ac` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 03:35:19 2024 +1100 saving 
commit `3728b8e876` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 02:19:36 2024 +1100 Update save.py commit `c20bb30710` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 02:07:01 2024 +1100 Update save.py commit `2c207b4989` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 20 00:10:04 2024 +1100 llama.cpp bugs commit `0ffd7b46f3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 19 19:39:47 2024 +1100 linking commit `ad0c2bcd87` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 19 13:08:03 2024 +1100 Update save.py commit `0c1b71ac63` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 19 12:51:25 2024 +1100 Update save.py commit `14db89f4fc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 19 04:13:40 2024 +1100 Update save.py commit `e31071f368` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 19 04:01:27 2024 +1100 Update save.py commit `aa45208f5a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 19 03:39:09 2024 +1100 Update save.py commit `28de27dd8d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 19 03:22:21 2024 +1100 PeftModel token + saving commit `99cdf0bd5a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 19 02:45:54 2024 +1100 Update save.py commit `71555c7678` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 19 02:37:43 2024 +1100 Update save.py commit `89d2418cd7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 23:35:18 2024 +1100 Update save.py commit `3e89d4e7ec` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 23:31:24 2024 +1100 Update save.py commit `457c044473` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 23:25:36 2024 +1100 install commit `22378e95cc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 23:12:37 2024 +1100 Update pyproject.toml commit `3d789cb09e` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Sun Feb 18 22:59:37 2024 +1100 Update save.py commit `31a62c52d3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 22:54:38 2024 +1100 trainer commit `870edf3534` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 22:52:51 2024 +1100 Update save.py commit `b6a6e90b7a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 22:03:22 2024 +1100 Update save.py commit `02b7b7f3f2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 20:28:56 2024 +1100 Update save.py commit `53f6a07b92` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 20:22:52 2024 +1100 Update save.py commit `7ee8243e8b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 20:19:28 2024 +1100 Update save.py commit `94b5c58c2e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 20:11:51 2024 +1100 Update save.py commit `c6ad5f97a3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 20:09:11 2024 +1100 Update loader.py commit `cc2764cddd` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 19:54:34 2024 +1100 Update save.py commit `0af3a1b143` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 19:44:59 2024 +1100 Update save.py commit `ef8abf4c6e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 19:25:56 2024 +1100 apache commit `164b950cad` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 19:24:35 2024 +1100 spaces commit `7aedc92637` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 19:21:02 2024 +1100 slashes commit `7573ee2c22` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 19:15:41 2024 +1100 slash commit `e51a381305` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 19:08:04 2024 +1100 globals commit `c660879cdc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 18:30:03 2024 +1100 spaces commit `dcbbab3c22` 
Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 18:25:19 2024 +1100 spaces commit `5edae1cd79` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 18:21:54 2024 +1100 readme commit `22f1f52513` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 18:18:12 2024 +1100 Update llama.py commit `2e65e63678` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 18:09:38 2024 +1100 saving bugs commit `c9a524a99a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 18 04:15:20 2024 +1100 Bugs commit `6d3a3b4286` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 15 19:24:07 2024 +1100 Fix RoPE precision issues commit `84b37f8548` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 15 03:53:58 2024 +1100 Update mapper.py commit `bd4a701e7a` Merge: `629c39d` `0439b85` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 15 03:35:47 2024 +1100 Merge branch 'main' into nightly commit `629c39d2e5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 14 23:17:38 2024 +1100 Update mistral.py commit `227c26ca44` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 14 23:15:45 2024 +1100 Update llama.py commit `b660132380` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 14 23:11:06 2024 +1100 Saving, LlamaRotaryEmbedding issues commit `91a3c43468` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 14 17:58:41 2024 +1100 Update chat_templates.py commit `efbb1e6049` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 14 17:56:14 2024 +1100 patch tokenizer commit `5f5910ffee` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 14 17:45:02 2024 +1100 Update chat_templates.py commit `d40a12852e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 14 17:31:33 2024 +1100 Update chat_templates.py commit `7c713cb58c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 14 17:27:03 2024 
+1100 Update chat_templates.py commit `b28a383d1c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 14 04:20:41 2024 +1100 Update chat_templates.py commit `4f20e20e28` Merge: `2cdf43d` `474fd32` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 14 03:25:40 2024 +1100 Merge branch 'main' into nightly commit `2cdf43d8b7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 12 04:28:41 2024 +1100 Chat Templates commit `acd635aa0d` Merge: `b7c5296` `3d5cf37` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 11 16:42:37 2024 +1100 Merge branch 'main' into nightly commit `b7c52963ad` Merge: `868fb27` `99b8d23` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 9 03:52:36 2024 +1100 Merge branch 'main' into nightly commit `868fb27e11` Merge: `601dc9e` `b7f24e8` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 8 16:50:19 2024 +1100 Merge branch 'main' into nightly commit `601dc9ec4b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 8 03:39:45 2024 +1100 Update llama.py commit `31de486f1c` Merge: `81128a4` `25cfc7f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 8 03:39:19 2024 +1100 revert commit `81128a4504` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 8 03:00:53 2024 +1100 Update llama.py commit `998097394a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 8 02:57:09 2024 +1100 Update llama.py commit `277ca9eecf` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 8 02:52:43 2024 +1100 Update llama.py commit `d6ab9c92d7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 8 02:49:26 2024 +1100 Update llama.py commit `e094914b0e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 8 02:30:49 2024 +1100 Update llama.py commit `9a54a6f05e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 20:15:19 2024 +1100 Update llama.py commit `e94647c2dc` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Wed Feb 7 19:25:46 2024 +1100 Update llama.py commit `e8593b7103` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 18:25:08 2024 +1100 Update llama.py commit `1065936ecb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 18:21:36 2024 +1100 Update llama.py commit `60b47f6130` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 18:18:43 2024 +1100 Update llama.py commit `e43e819a57` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 18:02:58 2024 +1100 Update llama.py commit `1e77ab25b6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 17:43:51 2024 +1100 Update llama.py commit `b36e0bfc89` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 17:34:59 2024 +1100 Update llama.py commit `ab0a3c9976` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 04:40:19 2024 +1100 Update mistral.py commit `2b346dc498` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 03:45:07 2024 +1100 Update save.py commit `17a6b12ee3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 03:41:32 2024 +1100 Update save.py commit `7da7afcc78` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 03:20:55 2024 +1100 __version__ commit `9e2b00e167` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 03:19:33 2024 +1100 __version__ commit `9c3849fc1d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 03:18:02 2024 +1100 Update pyproject.toml commit `bfb3ea7179` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 02:57:54 2024 +1100 Update save.py commit `213cfee903` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 02:57:24 2024 +1100 Update save.py commit `8b52dc027e` Merge: `8e9d9c3` `bb66faa` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Feb 7 02:50:35 2024 +1100 Merge branch 'main' into nightly commit `8e9d9c38b1` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Wed Feb 7 01:48:31 2024 +1100 SWA inference commit `d0b1144423` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 6 18:53:32 2024 +1100 Fix llm_int8_skip_modules commit `39a2a7c57d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 6 18:50:48 2024 +1100 Fix SWA inference commit `aa0427f898` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 6 02:11:06 2024 +1100 Update save.py commit `262289d5a5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 6 01:54:22 2024 +1100 Update save.py commit `53ca91f744` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Feb 6 01:30:48 2024 +1100 Update save.py commit `e487abd94a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 5 19:07:00 2024 +1100 Update save.py commit `797a87a3e3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 5 18:53:27 2024 +1100 Update save.py commit `e9031ceabe` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 5 18:48:40 2024 +1100 mistral swa commit `33192d6dc8` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 5 18:05:25 2024 +1100 Update save.py commit `03ed3a83d5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 5 17:38:01 2024 +1100 Torch 2.2.0 commit `a50daa19d8` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 5 17:32:43 2024 +1100 Update save.py commit `d69cef9f23` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 5 17:16:20 2024 +1100 Update save.py commit `1750b13a63` Merge: `63ed23a` `efa0d23` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Feb 5 02:29:56 2024 +1100 Merge branch 'main' into nightly commit `63ed23ae98` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 14:02:26 2024 +1100 Update utils.py commit `990068b977` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 13:50:21 2024 +1100 Update utils.py commit `6ab4019e4a` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Sun Feb 4 04:07:47 2024 +1100 Update llama.py commit `7ab3426608` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:57:29 2024 +1100 Update llama.py commit `201d90c3ba` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:54:12 2024 +1100 Update llama.py commit `00242d5f89` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:52:08 2024 +1100 Update llama.py commit `3bfb0ebf9d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:50:31 2024 +1100 Update llama.py commit `71899d7060` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:48:18 2024 +1100 Update llama.py commit `9cd2517949` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:46:30 2024 +1100 Update llama.py commit `24431b4347` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:45:25 2024 +1100 Update llama.py commit `63816fc119` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:42:41 2024 +1100 Update llama.py commit `7166b11599` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:41:15 2024 +1100 Update llama.py commit `d867b9bbdf` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:27:15 2024 +1100 Update llama.py commit `9a5ebef148` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:23:26 2024 +1100 Update llama.py commit `65270cec60` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:13:42 2024 +1100 Update llama.py commit `80fa8e93c9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:12:07 2024 +1100 Update llama.py commit `36b400e598` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:08:49 2024 +1100 Update llama.py commit `4a5d3b1de7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 03:05:49 2024 +1100 Update llama.py commit `88a695da3a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 
02:49:51 2024 +1100 Update llama.py commit `ac9bc79251` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 02:42:30 2024 +1100 Update llama.py commit `8c8685eeef` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 02:40:30 2024 +1100 Update llama.py commit `607dfa1d0e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 02:35:17 2024 +1100 Update llama.py commit `54802ecbb9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 02:32:21 2024 +1100 New version commit `711e5c0922` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 02:19:42 2024 +1100 attention_mask commit `74d7fc65c6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 02:03:23 2024 +1100 SDPA commit `fcb884643b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 02:02:51 2024 +1100 Update llama.py commit `aa032fc80d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 01:46:04 2024 +1100 Update mistral.py commit `31578f2010` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Feb 4 01:42:46 2024 +1100 fast inference commit `257cd7d531` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 23:22:21 2024 +1100 Update llama.py commit `e500b785f7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 23:12:16 2024 +1100 Update llama.py commit `cf0fae9a55` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 23:01:56 2024 +1100 Update llama.py commit `381b991c45` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 22:42:11 2024 +1100 Update llama.py commit `665908eb70` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 22:28:20 2024 +1100 Update llama.py commit `5534f8a8f1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 20:22:42 2024 +1100 more temp matrices commit `68db1c7af7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 19:44:36 2024 +1100 fast inference again commit 
`d76f583349` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 19:30:22 2024 +1100 Update mistral.py commit `9225dd6708` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 19:19:13 2024 +1100 Update llama.py commit `8270821269` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 19:08:42 2024 +1100 Update llama.py commit `1c6e1f18b4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 18:32:18 2024 +1100 Update llama.py commit `dd03abedd6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 18:20:06 2024 +1100 Update llama.py commit `522f6dbfd2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 18:07:55 2024 +1100 Update llama.py commit `ad357dec89` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 03:52:25 2024 +1100 fast inference + saving config.json commit `a78d6fba7e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 03:34:02 2024 +1100 Update llama.py commit `fa3d23406e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 03:10:39 2024 +1100 Update utils.py commit `8e3f0296be` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 02:55:22 2024 +1100 Update utils.py commit `404e177adb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 02:30:47 2024 +1100 Update utils.py commit `a81d193830` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 02:20:00 2024 +1100 Update utils.py commit `4436a1099d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 01:50:48 2024 +1100 Update llama.py commit `a15ffc95e0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 01:48:03 2024 +1100 Update llama.py commit `0be46406d8` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 01:39:39 2024 +1100 Update llama.py commit `e911047188` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 01:35:45 2024 +1100 Update llama.py commit `c934c16e21` Author: 
Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 01:29:50 2024 +1100 Update llama.py commit `705bbba5c5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Feb 3 01:22:53 2024 +1100 Update llama.py commit `9c2bed35b9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 23:45:02 2024 +1100 Update llama.py commit `ea9b4eea0c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 23:40:52 2024 +1100 Update llama.py commit `c6ad936f88` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 23:32:58 2024 +1100 Update llama.py commit `03df291110` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 23:32:36 2024 +1100 Update llama.py commit `dc2740416b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 22:26:47 2024 +1100 Update llama.py commit `b33c92d1bd` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 22:22:42 2024 +1100 Update llama.py commit `923c6bad14` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 22:20:45 2024 +1100 Update llama.py commit `b3703926ca` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 22:17:02 2024 +1100 Update llama.py commit `1b274212e1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 20:35:00 2024 +1100 Update llama.py commit `9df19aeb9f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 20:32:21 2024 +1100 past_key_values commit `82eea75cde` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 20:29:35 2024 +1100 torch compile commit `e748078095` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 20:24:44 2024 +1100 Update llama.py commit `9146fa443d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 20:24:28 2024 +1100 Update llama.py commit `a5a123db08` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 20:20:14 2024 +1100 Update llama.py commit `82ead808d5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: 
Fri Feb 2 20:19:15 2024 +1100 Update llama.py commit `1497521e5c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 20:06:52 2024 +1100 Update mistral.py commit `c3c454a729` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 20:06:31 2024 +1100 Update llama.py commit `f299c9cde1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 19:44:52 2024 +1100 Update llama.py commit `0e1d67d4e1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 19:07:41 2024 +1100 fast inference commit `168ded977e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 18:28:05 2024 +1100 Update llama.py commit `bf441055ca` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 18:14:31 2024 +1100 faster inference commit `7c2b04254c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 16:59:02 2024 +1100 Update llama.py commit `ca99d7c194` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 16:46:30 2024 +1100 Update mistral.py commit `40e8848c4e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 15:56:56 2024 +1100 Update llama.py commit `6cc9835021` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 13:52:12 2024 +1100 Update llama.py commit `7ad1a1fa62` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 04:18:34 2024 +1100 Update llama.py commit `0b661a23b9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Feb 2 03:55:36 2024 +1100 Update llama.py commit `da64d3403c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 23:54:01 2024 +1100 Update llama.py commit `abc47836dc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 23:41:32 2024 +1100 Update llama.py commit `f231c4f395` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 23:41:23 2024 +1100 Update llama.py commit `5f3c51b394` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 23:41:11 2024 +1100 Update 
llama.py commit `e0ea238256` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 23:40:36 2024 +1100 Update llama.py commit `73f63d6884` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 23:38:07 2024 +1100 Update llama.py commit `e4b5e38800` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 23:17:53 2024 +1100 inference commit `334c5ed1f0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 22:48:54 2024 +1100 Update llama.py commit `cd39f6108f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 22:34:33 2024 +1100 lm_head commit `24c4c37b7c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 22:21:35 2024 +1100 revert commit `e791db9b06` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 20:12:59 2024 +1100 Update llama.py commit `1793a16c8f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 20:03:54 2024 +1100 faster inference commit `19fb50e244` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 19:56:54 2024 +1100 Update utils.py commit `b83cea7cb4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 19:56:06 2024 +1100 Update llama.py commit `8920cafbe7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 19:46:38 2024 +1100 inference commit `38b59825b1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 19:36:36 2024 +1100 Update llama.py commit `e8ec80a4c2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 19:19:44 2024 +1100 Update llama.py commit `a2cb7a113b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 19:07:30 2024 +1100 Update llama.py commit `a5ee70b63f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 18:34:43 2024 +1100 Update llama.py commit `e2f0dd8683` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 18:16:46 2024 +1100 Update llama.py commit `3f0ddf0c7c` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Thu Feb 1 18:16:15 2024 +1100 Update llama.py commit `e90b3bf192` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 17:59:21 2024 +1100 Update llama.py commit `57044509ad` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 17:44:22 2024 +1100 Update llama.py commit `cf4b58eeb6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 17:20:02 2024 +1100 inference commit `648c79ec1b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 03:47:51 2024 +1100 faster inference commit `20f19391e6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 03:05:02 2024 +1100 Update mistral.py commit `e2f72fe52f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 03:04:46 2024 +1100 Revert commit `329f80ac4c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 02:47:57 2024 +1100 Update llama.py commit `713a95ca0e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Feb 1 02:46:26 2024 +1100 Inference commit `acbdef7ff5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 20:16:08 2024 +1100 padding commit `0dc26ed98a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 20:06:28 2024 +1100 Update llama.py commit `9f31254539` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 20:02:21 2024 +1100 Fix SDPA commit `7227de48cf` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 19:48:33 2024 +1100 Update llama.py commit `5edad55b5d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 19:44:19 2024 +1100 Update llama.py commit `c928c57aa9` Merge: `5da0555` `2f55935` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 19:38:27 2024 +1100 Merge branch 'main' into nightly commit `5da05558a0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 03:50:47 2024 +1100 past_key_value commit `d347db0944` Author: Daniel Han-Chen <danielhanchen@gmail.com> 
Date: Wed Jan 31 03:46:11 2024 +1100 Update loader.py commit `c0edaa46db` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 03:40:55 2024 +1100 revert inference commit `55fe6052ca` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 03:36:40 2024 +1100 if past_key_value is not None and q_len == 1: commit `248887b51c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 03:35:24 2024 +1100 LlamaAttention_fast_forward_inference commit `c68a3bc9ec` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 03:28:30 2024 +1100 Update loader.py commit `44168377f3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 31 01:32:56 2024 +1100 Update rope_embedding.py commit `270df81b60` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 30 17:33:08 2024 +1100 Remove fast path commit `b8f665bf22` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 30 17:26:45 2024 +1100 fast lm_head commit `ed5a653ecf` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 30 04:16:39 2024 +1100 Update mistral.py commit `e0bad0eec5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 30 04:10:14 2024 +1100 Fix inference commit `71725aeea5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 30 03:23:42 2024 +1100 Update __init__.py commit `3ddda6f492` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 30 03:08:46 2024 +1100 Update mistral.py commit `6f74c98fbc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 23:19:43 2024 +1100 Update utils.py commit `a7bfeec919` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 23:14:51 2024 +1100 Update utils.py commit `7c87d60bc1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 23:07:40 2024 +1100 Update utils.py commit `bb364204cc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 20:09:59 2024 +1100 Update llama.py commit `4700d51916` Author: Daniel 
Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 19:55:40 2024 +1100 Fast inference repatch commit `58cabcbc06` Merge: `01b8162` `a3a2ad9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 17:59:34 2024 +1100 Merge branch 'main' into nightly commit `01b8162244` Merge: `5cfea20` `90309ca` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 17:49:57 2024 +1100 Merge branch 'main' into nightly commit `5cfea20129` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 17:35:19 2024 +1100 Update llama.py commit `25a88ea003` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 16:52:04 2024 +1100 Update llama.py commit `03ca52dc94` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 03:43:39 2024 +1100 saving commit `01e5c305f9` Merge: `e10e488` `a16bc73` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 03:43:19 2024 +1100 Merge branch 'main' into nightly commit `e10e48893b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 03:43:17 2024 +1100 Update save.py commit `5bd916b91b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 03:43:05 2024 +1100 Update mistral.py commit `11ba2c520b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 03:34:27 2024 +1100 Mistral patch commit `498dfb8acc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 02:42:08 2024 +1100 print commit `a3d2b9b778` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 02:15:52 2024 +1100 Update save.py commit `9e00cc287a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 00:59:29 2024 +1100 Update save.py commit `460de24ea2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 00:57:11 2024 +1100 Update save.py commit `b060d7b621` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 00:56:44 2024 +1100 Update save.py commit `74b69775c2` Author: Daniel Han-Chen <danielhanchen@gmail.com> 
Date: Mon Jan 29 00:51:57 2024 +1100 Update save.py commit `fef0589d6e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 00:51:45 2024 +1100 Update save.py commit `9dec4b3f08` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 29 00:49:31 2024 +1100 Update save.py commit `ee0bf6fe66` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 20:07:17 2024 +1100 Update save.py commit `c69c166b82` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 18:12:11 2024 +1100 patch_saving_functions commit `ac02ba6c38` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 18:08:52 2024 +1100 Update save.py commit `20e524a596` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 18:04:35 2024 +1100 Update save.py commit `788e695180` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 17:48:34 2024 +1100 Patch saving commit `31222ced74` Merge: `893aab0` `af33224` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 17:48:28 2024 +1100 Merge branch 'main' into nightly commit `893aab0e57` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 17:20:49 2024 +1100 Update dpo.py commit `d5c852e711` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 16:47:33 2024 +1100 Update llama.py commit `ddb48efd33` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 16:43:10 2024 +1100 Update llama.py commit `5fe166d32e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 16:40:59 2024 +1100 Update llama.py commit `2d64e0a904` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 16:37:21 2024 +1100 Update mistral.py commit `2f73cb4049` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 16:31:01 2024 +1100 Update llama.py commit `6aa46ffff6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 15:13:54 2024 +1100 Update llama.py commit `ee6f5096ed` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Sun Jan 28 13:53:06 2024 +1100 attention mask commit `f1b0fd0848` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 13:37:38 2024 +1100 Update mistral.py commit `7663a32753` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 04:29:52 2024 +1100 Update save.py commit `a158003625` Merge: `36829b7` `e2bbd38` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 04:29:35 2024 +1100 Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly commit `36829b7ec4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 04:27:10 2024 +1100 Update save.py commit `917ce15861` Merge: `166f8c8` `a81aff2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 04:20:15 2024 +1100 Merge branch 'main' into nightly commit `166f8c812e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 04:18:19 2024 +1100 attention mask commit `6c7f0dbcb4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 04:08:19 2024 +1100 Update llama.py commit `c836ed7f3d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 04:08:03 2024 +1100 Update mistral.py commit `12d57e5308` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 03:59:23 2024 +1100 labels commit `4c5ebcc960` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 03:56:54 2024 +1100 Update llama.py commit `6a027a8292` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 03:55:23 2024 +1100 Update llama.py commit `9f9739cbac` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 03:49:54 2024 +1100 attention_mask commit `2bd77e7277` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 02:34:03 2024 +1100 Update fast_lora.py commit `a1e5aca7cc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 01:27:46 2024 +1100 Update fast_lora.py commit `a2f705d65b` Author: Daniel Han-Chen <danielhanchen@gmail.com> 
Date: Sun Jan 28 01:17:27 2024 +1100 Update fast_lora.py commit `d01ba458df` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 28 00:45:55 2024 +1100 Update fast_lora.py commit `6fa0635971` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 23:20:41 2024 +1100 Update fast_lora.py commit `e094af81a1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 20:39:19 2024 +1100 Update fast_lora.py commit `c74e1af85c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 20:37:43 2024 +1100 Update fast_lora.py commit `363ffba1c2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 20:17:08 2024 +1100 Update fast_lora.py commit `510c85f412` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 20:04:55 2024 +1100 Update swiglu.py commit `35daafdd6e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 19:53:57 2024 +1100 Update fast_lora.py commit `6201f7681f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 19:38:28 2024 +1100 Update fast_lora.py commit `86a1c9788b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 19:29:53 2024 +1100 Update fast_lora.py commit `8e0e4ccdc6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 19:18:59 2024 +1100 Update fast_lora.py commit `2e4c59deaf` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 19:05:02 2024 +1100 Update swiglu.py commit `85e87d9ba2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 19:03:04 2024 +1100 Swiglu commit `83b6937285` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 18:21:29 2024 +1100 Update fast_lora.py commit `3d3e7f5b42` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 18:15:34 2024 +1100 Update fast_lora.py commit `d3f3b6fc49` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 18:00:41 2024 +1100 Update fast_lora.py commit `e77d7c069f` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Sat Jan 27 18:00:22 2024 +1100 Update fast_lora.py commit `f7d11d10f8` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 17:47:54 2024 +1100 Update fast_lora.py commit `8ed03f5f45` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 17:38:30 2024 +1100 Update pyproject.toml commit `af65cb0d3d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:50:47 2024 +1100 Works? commit `fb5333726a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:49:36 2024 +1100 Update llama.py commit `a59ec7903c` Merge: `704e36a` `7da0c50` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:48:40 2024 +1100 Merge branch 'main' into nightly commit `704e36a64e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:48:34 2024 +1100 Revert "Update llama.py" This reverts commit `a208ec46e0`. commit `a208ec46e0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:48:03 2024 +1100 Update llama.py commit `bd2ff90817` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:47:42 2024 +1100 Update llama.py commit `a3d892a15b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:47:01 2024 +1100 Update llama.py commit `b89599a3f7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:46:47 2024 +1100 Update llama.py commit `baeea64917` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:20:58 2024 +1100 Update save.py commit `ecdbb28dcb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:19:59 2024 +1100 Update save.py commit `393341f211` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:18:25 2024 +1100 Update swiglu.py commit `47babc780a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:16:38 2024 +1100 Update fast_lora.py commit `9edc309f59` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:15:37 2024 
+1100 Update llama.py commit `a27ac6175e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 04:03:47 2024 +1100 Update utils.py commit `7fb64e0e31` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 03:58:21 2024 +1100 Update fast_lora.py commit `e3bd0bb74f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 03:51:53 2024 +1100 Update save.py commit `ba847f5411` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 03:38:33 2024 +1100 Update fast_lora.py commit `c57495df6f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 03:38:19 2024 +1100 Update fast_lora.py commit `d379bb8cf5` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 03:36:27 2024 +1100 Update fast_lora.py commit `421ed33b42` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 03:23:50 2024 +1100 Update fast_lora.py commit `83ceb11cc9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 03:12:53 2024 +1100 Update fast_lora.py commit `f3da0d2e4a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 02:52:54 2024 +1100 Update fast_lora.py commit `4ae3ad3bce` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 02:34:21 2024 +1100 Update fast_lora.py commit `c74d3dd3b9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 02:22:15 2024 +1100 Update fast_lora.py commit `0a1aa98d3c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 02:19:00 2024 +1100 Update fast_lora.py commit `f7760719f0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 01:28:03 2024 +1100 Update fast_lora.py commit `38bff800b6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 27 01:16:31 2024 +1100 Update fast_lora.py commit `485d54fd70` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 23:48:17 2024 +1100 Update fast_lora.py commit `ce08a15cee` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri 
Jan 26 23:33:17 2024 +1100 Update fast_lora.py commit `b4492eb29c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 23:14:44 2024 +1100 Update fast_lora.py commit `3865c8cd68` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 23:06:06 2024 +1100 Update fast_lora.py commit `e9e34f1d4d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 22:29:38 2024 +1100 Update fast_lora.py commit `d96f665510` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 22:21:54 2024 +1100 Update swiglu.py commit `f8aa20d2db` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 22:09:32 2024 +1100 Update fast_lora.py commit `ae0d9380c1` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 22:07:10 2024 +1100 Update swiglu.py commit `694da2dba3` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 20:06:52 2024 +1100 Update fast_lora.py commit `4f573494da` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 20:04:51 2024 +1100 Update fast_lora.py commit `cb06ce849b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 19:41:48 2024 +1100 Update fast_lora.py commit `e0a36b356b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 19:30:18 2024 +1100 Update fast_lora.py commit `c4f0de58a9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 19:21:00 2024 +1100 Update fast_lora.py commit `a0a409d5af` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 19:20:53 2024 +1100 Update fast_lora.py commit `97658f9a74` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 19:12:08 2024 +1100 Update fast_lora.py commit `979de52220` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 19:00:22 2024 +1100 Update fast_lora.py commit `39d251757f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 18:19:36 2024 +1100 Update fast_lora.py commit `77365cc740` Author: Daniel Han-Chen 
<danielhanchen@gmail.com> Date: Fri Jan 26 18:11:43 2024 +1100 Update llama.py commit `4bfbccec4b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 18:04:59 2024 +1100 Update fast_lora.py commit `56dffca23f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 17:44:10 2024 +1100 Update llama.py commit `259097bc61` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 17:36:40 2024 +1100 Update fast_lora.py commit `b8de6b6a9b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 17:25:36 2024 +1100 Update fast_lora.py commit `796aa4d0ef` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 17:13:09 2024 +1100 Update fast_lora.py commit `9ecb3bc869` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 17:04:33 2024 +1100 Update fast_lora.py commit `7bed11a8ba` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 16:43:50 2024 +1100 Update fast_lora.py commit `317bbc57fe` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 16:16:32 2024 +1100 Update fast_lora.py commit `c4ac728d4c` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 16:08:10 2024 +1100 Update fast_lora.py commit `ab44945336` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 14:06:25 2024 +1100 Update fast_lora.py commit `84772de40f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 13:59:57 2024 +1100 Update fast_lora.py commit `653f1036ae` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 03:49:48 2024 +1100 Update fast_lora.py commit `46ec8bbc3d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 03:15:23 2024 +1100 Repatch commit `99eeebf72a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 02:41:15 2024 +1100 Update swiglu.py commit `ae156d9154` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 02:07:35 2024 +1100 Update llama.py commit `29945bd8c1` Author: Daniel 
Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 26 01:33:23 2024 +1100 Update llama.py commit `194093e375` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Jan 25 23:35:56 2024 +1100 remove patching commit `4f60055f9e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Jan 25 23:25:17 2024 +1100 Update fast_lora.py commit `2d25facc7b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Jan 25 23:14:37 2024 +1100 Update fast_lora.py commit `380f2fd6e4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Jan 25 19:05:43 2024 +1100 Fix saving and bnb-4bit commit `0474451a6a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Jan 25 03:55:26 2024 +1100 Update pyproject.toml commit `5a5d34b8ae` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Jan 25 03:47:23 2024 +1100 Update mapper.py commit `bbf5ef65af` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Jan 25 03:46:23 2024 +1100 Graceful FA2 error + torch 2.1.1 commit `5ea888c578` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Thu Jan 25 02:29:37 2024 +1100 Update to transformers 4.37 commit `d2f1521a49` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 23:23:05 2024 +1100 incorrect inference commit `f186fe9bc9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 19:44:45 2024 +1100 Update mistral.py commit `e184b06c4b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 19:28:17 2024 +1100 Update mistral.py commit `b314500993` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 19:23:58 2024 +1100 q_len issue commit `d6e85d7731` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 19:17:42 2024 +1100 q_len == 1 commit `5d9e68181e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 19:11:17 2024 +1100 hidden_states commit `f311a8efcc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 18:04:20 2024 +1100 Update llama.py commit 
`eeee6333ec` Merge: `20d8f22` `04f8771` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 18:04:10 2024 +1100 Merge branch 'main' into nightly commit `20d8f223d7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 03:47:28 2024 +1100 Fast LoRA saving commit `1bb1c3c2b9` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 03:36:50 2024 +1100 LoRA commit `24c7a67556` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 02:44:52 2024 +1100 Update llama.py commit `d87ef86991` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 02:22:37 2024 +1100 Update llama.py commit `a77c448939` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 02:18:37 2024 +1100 Update llama.py commit `5260ec2a0a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 02:13:01 2024 +1100 Update llama.py commit `716e03fe1b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 02:09:35 2024 +1100 Update llama.py commit `00f50876f7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 02:03:25 2024 +1100 Update llama.py commit `e5b5333137` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 02:00:10 2024 +1100 Update llama.py commit `9a5062e6c4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 01:44:10 2024 +1100 Update llama.py commit `1289ae825b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 01:41:20 2024 +1100 RoPE commit `7e4140ebfc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 01:37:36 2024 +1100 Update llama.py commit `1ba28d8e26` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 01:33:51 2024 +1100 Update llama.py commit `3e1b244d5e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 01:29:03 2024 +1100 Fast inference RoPE commit `8647f0e86e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 01:06:34 2024 +1100 Update llama.py commit 
`085a8e944a` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 23 00:32:57 2024 +1100 inference commit `f41a437540` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 22:34:33 2024 +1100 Update utils.py commit `31f0c9c08d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 22:31:38 2024 +1100 Update utils.py commit `4220f6a6dc` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 22:26:55 2024 +1100 No print commit `e1cbc9e423` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 22:21:29 2024 +1100 Update utils.py commit `050c61bc0b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 20:04:27 2024 +1100 Update utils.py commit `7c3b647ba7` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 20:01:59 2024 +1100 fast_linear_forward commit `8c613e851f` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 19:36:17 2024 +1100 Apache 2 commit `a18f982e67` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 19:31:34 2024 +1100 Max sequence lengths commit `e38485feaa` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 18:18:23 2024 +1100 Mistral correct RoPE scaling commit `278de9f375` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 04:33:44 2024 +1100 Update llama.py commit `0666589ace` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 04:28:24 2024 +1100 Update save.py commit `e34393f975` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 04:25:37 2024 +1100 Update llama.py commit `2a3d4f3a8d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 04:18:42 2024 +1100 fast inference commit `a5ab4dc21a` Merge: `8828ece` `3a9b2de` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Mon Jan 22 01:58:00 2024 +1100 Merge branch 'main' into nightly commit `8828eceb5b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 22:18:53 2024 +1100 Update 
llama.py commit `fcde58859b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 20:06:14 2024 +1100 Update llama.py commit `2f80890578` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 19:21:56 2024 +1100 Update llama.py commit `7be801ff8d` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 19:10:48 2024 +1100 Update llama.py commit `92512e670e` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 19:01:45 2024 +1100 Update llama.py commit `5c927e4930` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 19:01:28 2024 +1100 Update mistral.py commit `fe2bc30987` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 18:52:25 2024 +1100 Update llama.py commit `ac99a47a45` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 18:52:15 2024 +1100 Update llama.py commit `5bf108ebda` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 18:15:55 2024 +1100 Update llama.py commit `b591f33a37` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 18:06:44 2024 +1100 Update llama.py commit `da7b4f59ee` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 17:32:37 2024 +1100 Update save.py commit `196ab974d6` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 16:53:58 2024 +1100 Update llama.py commit `cbc1c69e29` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 16:22:28 2024 +1100 faster saving & inference	2024-03-17 22:21:36 +11:00
Daniel Han	64d847bede	Bug fixes (#257 ) * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * 
Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * 
Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and 
Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md * Bugs * Update fast_lora.py * Update pyproject.toml * Update fast_lora.py	2024-03-17 22:09:50 +11:00
Daniel Han	c599ae0f27	Bug fixes (#249 ) * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update 
cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms 
* Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update rope_embedding.py * Update rope_embedding.py * Fix bugs * Update fast_lora.py * Update fast_lora.py * Update README.md * Update README.md * GGUF * Update save.py * Update save.py * Update save.py * Update save.py * Update README.md * Update README.md	2024-03-17 02:47:05 +11:00
Qubitium	39713e66ed	Fix single gpu limit code overriding the wrong cuda gpu id via env (#228)	2024-03-16 00:12:16 +11:00
HuyNguyen-hust	e29a630cd3	10% faster RoPE embedding from HuyNguyen-hust (#238)	2024-03-16 00:09:59 +11:00
Daniel Han	990c7a809c	Gemma GGUF chat templates work! (#246 ) * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * 
Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py	2024-03-15 05:09:45 +11:00
Daniel Han	2c5c5bb4bb	Fix Gemma GGUF (#234 ) * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py 
* Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update 
llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * 
Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small 
fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py * Update save.py * GGUF incorrect * Update save.py * Update pyproject.toml * kaggle new * Update pyproject.toml * Update pyproject.toml * upcasting * Fix Colab * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml	2024-03-14 20:32:04 +11:00
Daniel Han	32223779c4	Fix more bugs (#232 ) * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * 
Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * 
Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update chat_templates.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Account for DoRA * Update llama.py	2024-03-11 04:31:03 +11:00
Daniel Han	8bea94c137	Saving fixes (#231 ) * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * 
Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten * Update llama.py * Update llama.py * Update llama.py * Update save.py * Accuracy * Revert * Update save.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py	2024-03-10 20:09:34 +11:00
Daniel Han	1fcf9d4577	Fix bugs (#230 ) * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * 
saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py 
* Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py * Update chat_templates.py * Update save.py * Update save.py * Update save.py * Update chat_templates.py * Update llama.py * model_name * Update loader.py * Tokenizer overwritten	2024-03-10 04:54:23 +11:00
Daniel Han	70f271b1d3	Fix Gemma (#223 ) * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py 
* Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update 
README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Fix Gemma merging * Update rms_layernorm.py * Update gemma.py * Update pyproject.toml * Layernorms * Gemma precision * Update gemma.py * sqrt * Update gemma.py * Update save.py * RoPE and Gemma precision * Update rms_layernorm.py * Fix warning * Update chat_templates.py	2024-03-07 04:34:06 +11:00
Daniel Han	fedcafe281	Fix Gemma norm float32 (#217 ) * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * 
Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py * Update rms_layernorm.py * Update rms_layernorm.py * Update rms_layernorm.py * Update gemma.py	2024-03-04 16:19:55 +11:00
Daniel Han	7b7665d9d6	Fix Gemma fast inference (#215 ) * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos 
* Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py * Update gemma.py	2024-03-03 19:36:06 +11:00
Daniel Han	fa2a43baf3	Fix Gemma activation function (#214 ) * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update 
gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py 
* Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update pyproject.toml * Small fixes * Update pyproject.toml * Approx gelu * Update geglu.py * Approx gelu * Update llama.py * Update __init__.py * Update __init__.py * Update _utils.py * Update geglu.py	2024-03-03 18:21:44 +11:00
Daniel Han	307f2da353	Nightly (#204 ) * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * 
Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py * Update llama.py * Hotfix - fix DoRA, Gemma prompt template (#202) (#203) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update 
cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update 
gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py	2024-02-29 00:18:38 +11:00
Daniel Han	25619645dd	Hotfix - fix DoRA, Gemma prompt template (#202 ) * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md * Update llama.py * DoRA * Update _utils.py * Update chat_templates.py	2024-02-29 00:15:22 +11:00
Daniel Han	e7c53fb370	2.4x faster Gemma (#197 ) * Update save.py * Update save.py * linking * llama.cpp bugs * Update save.py * Update save.py * saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original * Gemma * Update pyproject.toml * Update mapper.py * Update fast_lora.py * FastGemmaModel * model_type * Update llama.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update fast_lora.py * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * gemma * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Fast CE Loss * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * CE * Update llama.py * Update llama.py * Update cross_entropy_loss.py * Update geglu.py * Update cross_entropy_loss.py * revert * Update llama.py * Update llama.py * norm * Update gemma.py * Update gemma.py * position_ids * Update gemma.py * Update gemma.py * pos * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * 
Update llama.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * revert * revert * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * rope * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * llama * Update llama.py * gemma * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update 
cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update save.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update gemma.py * correct_dtype * Update gemma.py * Update cross_entropy_loss.py * Update cross_entropy_loss.py * Chat Templates * Update README.md * Update README.md	2024-02-27 01:42:10 +11:00
Daniel Han	1b7bf718cc	Feb 2024 Release (#187 ) * Fast inference repatch * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update mistral.py * Update __init__.py * Fix inference * Update mistral.py * fast lm_head * Remove fast path * Update rope_embedding.py * Update loader.py * LlamaAttention_fast_forward_inference * if past_key_value is not None and q_len == 1: * revert inference * Update loader.py * past_key_value * Update llama.py * Update llama.py * Fix SDPA * Update llama.py * padding * Inference * Update llama.py * Revert * Update mistral.py * faster inference * inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * inference * Update llama.py * Update utils.py * faster inference * Update llama.py * revert * lm_head * Update llama.py * inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * faster inference * Update llama.py * fast inference * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * torch compile * past_key_values * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * fast inference + saving config.json * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * fast inference again * more temp matrices * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update mistral.py 
* Update llama.py * SDPA * attention_mask * New version * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update save.py * Update save.py * Torch 2.2.0 * Update save.py * mistral swa * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Fix SWA inference * Fix llm_int8_skip_modules * SWA inference * Update save.py * Update save.py * Update pyproject.toml * __version__ * __version__ * Update save.py * Update save.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Chat Templates * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer * Update chat_templates.py * Saving, LlamaRotaryEmbedding issues * Update llama.py * Update mistral.py * Update mapper.py * Fix RoPE precision issues * Bugs * saving bugs * Update llama.py * readme * spaces * spaces * globals * slash * slashes * spaces * apache * Update save.py * Update save.py * Update loader.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * trainer * Update save.py * Update pyproject.toml * install * Update save.py * Update save.py * Update save.py * Update save.py * PeftModel token + saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * linking * llama.cpp bugs * Update save.py * Update save.py * saving * Update save.py * Update save.py * Update save.py * 
Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update __init__.py * Update save.py * Update save.py * Update save.py * save * trainer * spaces * original	2024-02-21 03:58:59 +11:00
Daniel Han	0439b8508d	Prelim Feb release (#173 ) * Works? * Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print * Mistral patch * Update mistral.py * Update save.py * saving * Update llama.py * Update llama.py * Fast inference repatch * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update mistral.py * Update __init__.py * Fix inference * Update mistral.py * fast lm_head * Remove fast path * Update rope_embedding.py * Update loader.py * LlamaAttention_fast_forward_inference * if past_key_value is not None and q_len == 1: * revert inference * Update loader.py * past_key_value * Update llama.py * Update llama.py * Fix SDPA * Update llama.py * padding * Inference * Update llama.py * Revert * Update mistral.py * faster inference * inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * inference * Update llama.py * Update utils.py * faster inference * Update llama.py * revert * lm_head * Update llama.py * inference * Update llama.py * Update 
llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * faster inference * Update llama.py * fast inference * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * torch compile * past_key_values * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * fast inference + saving config.json * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * fast inference again * more temp matrices * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update mistral.py * Update llama.py * SDPA * attention_mask * New version * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update save.py * Update save.py * Torch 2.2.0 * Update save.py * mistral swa * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Fix SWA inference * Fix llm_int8_skip_modules * SWA inference * Update save.py * Update save.py * Update pyproject.toml * __version__ * __version__ * Update save.py * Update save.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * 
Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Chat Templates * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer * Update chat_templates.py * Saving, LlamaRotaryEmbedding issues * Update llama.py * Update mistral.py	2024-02-15 00:07:42 +11:00
Younes Belkada	474fd32f91	add HF tagging in unsloth (#170)	2024-02-13 18:28:42 +11:00
Daniel Han	3d5cf373bc	Update README.md (#165)	2024-02-09 15:59:17 +11:00
Daniel Han	2bc34566c4	Update README.md (#164)	2024-02-09 15:49:09 +11:00
Daniel Han-Chen	99b8d231ce	Update mapper.py	2024-02-09 03:51:59 +11:00
Daniel Han	b7f24e804c	Update README.md (#162)	2024-02-08 13:11:54 +11:00
Daniel Han	0b01dcb655	Nightly (#161 ) * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? * Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print * Mistral patch * Update mistral.py * Update save.py * saving * Update llama.py * Update llama.py * Fast inference repatch * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update mistral.py * Update __init__.py * Fix inference * Update mistral.py * fast lm_head * Remove fast path * Update rope_embedding.py * Update loader.py * LlamaAttention_fast_forward_inference * if past_key_value is not None and q_len == 1: * revert inference * Update loader.py * past_key_value * Update llama.py * Update llama.py * Fix SDPA * Update llama.py * padding * Inference * Update llama.py * Revert * Update mistral.py * faster inference * inference * Update llama.py * Update 
llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * inference * Update llama.py * Update utils.py * faster inference * Update llama.py * revert * lm_head * Update llama.py * inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * faster inference * Update llama.py * fast inference * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * torch compile * past_key_values * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * fast inference + saving config.json * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * fast inference again * more temp matrices * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update mistral.py * Update llama.py * SDPA * attention_mask * New version * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update save.py * Update save.py * Torch 2.2.0 * Update save.py * mistral swa * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Fix SWA inference 
* Fix llm_int8_skip_modules * SWA inference * Update save.py * Update save.py * Update pyproject.toml * __version__ * __version__ * Update save.py * Update save.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py	2024-02-08 03:40:28 +11:00
Daniel Han	25cfc7f590	Torch 2.2 (#157 ) * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? * Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print * Mistral patch * Update mistral.py * Update save.py * saving * Update llama.py * Update llama.py * Fast inference repatch * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update mistral.py * Update __init__.py * Fix inference * Update mistral.py * fast lm_head * Remove fast path * Update rope_embedding.py * Update loader.py * 
LlamaAttention_fast_forward_inference * if past_key_value is not None and q_len == 1: * revert inference * Update loader.py * past_key_value * Update llama.py * Update llama.py * Fix SDPA * Update llama.py * padding * Inference * Update llama.py * Revert * Update mistral.py * faster inference * inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * inference * Update llama.py * Update utils.py * faster inference * Update llama.py * revert * lm_head * Update llama.py * inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * faster inference * Update llama.py * fast inference * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * torch compile * past_key_values * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * fast inference + saving config.json * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * fast inference again * more temp matrices * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update mistral.py * Update llama.py * SDPA * attention_mask * New version * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update 
llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update save.py * Update save.py * Torch 2.2.0 * Update save.py * mistral swa * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Fix SWA inference * Fix llm_int8_skip_modules * SWA inference * Update save.py * Update save.py * Update pyproject.toml * __version__ * __version__ * Update save.py * Update save.py * Update mistral.py	2024-02-07 04:40:50 +11:00
Michael Han	bb66faaa33	ReadMe Revamp (#156 ) * HF Perf Button * Update README.md Adding new buttons cleanup * Update README.md * Delete images/Discord.png * Delete images/try live demo green.png * new transparent logos * Revamping page * Revamp mainpage * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * finetune button * Delete start free finetune button.png * free finetune button * Add files via upload * Update README.md * Update README.md * Add files via upload * Add files via upload * Update README.md * Add files via upload * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Squashed commit of the following: commit `efa0d2332e` Author: Daniel Han <danielhanchen@gmail.com> Date: Sun Feb 4 17:35:56 2024 +1100 2x faster inference (#151) * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * 
Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? * Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print * Mistral patch * Update mistral.py * Update save.py * saving * Update llama.py * Update llama.py * Fast inference repatch * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update mistral.py * Update __init__.py * Fix inference * Update mistral.py * fast lm_head * Remove fast path * Update rope_embedding.py * Update loader.py * LlamaAttention_fast_forward_inference * if past_key_value is not None and q_len == 1: * revert inference * Update loader.py * past_key_value * Update llama.py * Update llama.py * Fix SDPA * Update llama.py * padding * Inference * Update llama.py * Revert * Update mistral.py * faster inference * inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * inference * Update llama.py * Update utils.py * faster inference * Update llama.py * revert * lm_head * Update llama.py * 
inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * faster inference * Update llama.py * fast inference * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * torch compile * past_key_values * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * fast inference + saving config.json * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * fast inference again * more temp matrices * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update mistral.py * Update llama.py * SDPA * attention_mask * New version * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py commit `2f55935f94` Author: Daniel Han <danielhanchen@gmail.com> Date: Wed Jan 31 04:03:37 2024 +1100 Hotfix - fix inference (#146) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update 
save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts 
commit `a208ec46e0`. * Update llama.py * Works? * Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print * Mistral patch * Update mistral.py * Update save.py * saving * Update llama.py * Update llama.py * Fast inference repatch * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update mistral.py * Update __init__.py * Fix inference * Update mistral.py * fast lm_head * Remove fast path * Update rope_embedding.py * Update loader.py * LlamaAttention_fast_forward_inference * if past_key_value is not None and q_len == 1: * revert inference * Update loader.py * past_key_value commit `a3a2ad9382` Author: Daniel Han <danielhanchen@gmail.com> Date: Mon Jan 29 17:49:54 2024 +1100 Fix inference attention mask (#142) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE 
scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? 
* Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print * Mistral patch * Update mistral.py * Update save.py * saving * Update llama.py * Update llama.py commit `90309ca8dc` Author: Daniel Han <danielhanchen@gmail.com> Date: Mon Jan 29 03:45:07 2024 +1100 Nightly (#140) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len 
issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? 
* Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print * Mistral patch * Update mistral.py * Update save.py * saving commit `a16bc73e80` Author: Daniel Han <danielhanchen@gmail.com> Date: Mon Jan 29 02:52:39 2024 +1100 Fix saving issues (#139) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * 
Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? 
* Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print commit `af33224554` Author: Daniel Han <danielhanchen@gmail.com> Date: Sun Jan 28 04:30:29 2024 +1100 1 more bug (#138) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 
* Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? 
* Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py commit `e2bbd3819e` Author: Daniel Han <danielhanchen@gmail.com> Date: Sun Jan 28 04:20:06 2024 +1100 Fix bugs + more accurate Swiglu (#137) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update 
fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? 
* Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask commit `a81aff286f` Author: Daniel Han <danielhanchen@gmail.com> Date: Sat Jan 27 04:50:22 2024 +1100 Inference bug fix (#134) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py 
* Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py commit `7da0c50f75` Author: Daniel Han <danielhanchen@gmail.com> Date: Sat Jan 27 04:47:54 2024 +1100 More bug fixes (#133) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect 
inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py commit `62fae3aa74` Author: Daniel Han <danielhanchen@gmail.com> Date: Fri Jan 26 04:19:17 2024 +1100 Fix bugs (#129) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * 
Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py commit `04f8771821` Author: Daniel Han <danielhanchen@gmail.com> Date: Tue Jan 23 03:55:24 2024 +1100 2-4x faster native HF inference (#119) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving commit `3a9b2dee98` Author: Daniel Han <danielhanchen@gmail.com> Date: Sun Jan 21 22:20:22 2024 +1100 Hotfix (#118) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py commit `a6f4fb0075` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 05:00:37 2024 +1100 
Update save.py commit `705cac0357` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 04:21:54 2024 +1100 Update save.py commit `16edcb3be2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sun Jan 21 04:13:03 2024 +1100 Update save.py commit `3d05a74b12` Author: Daniel Han <danielhanchen@gmail.com> Date: Sun Jan 21 03:43:49 2024 +1100 Fixed saving! (#113) * Fix tokenizer, dropout, bias for LoRA * Update loader.py * Fix LoRA downcasting * Update _utils.py * Saving to GGUF * fix * colab_quantize_to_gguf * move save modules * save module * Update __init__.py * Update save.py * Temp downgrade due to TRL issue * Fix up bugs * Faster saving + other changes * Update llama.py * Saving modules * spelling * Update llama.py * Update save.py * Update save.py * Update loader.py * Update llama.py * patch saving * Update save.py * Update save.py * Update save.py * patch saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * original_model * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * saving to RAM leakage? 
* Update save.py * new_save_directory * Update save.py * Update save.py * Update save.py * Update save.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Quick fixes * Update llama.py * Update llama.py * Update dpo.py * Update dpo.py * Update llama.py * Update save.py * getattr * RSLoRA and LoftQ direct support * Update llama.py * Update llama.py * Update llama.py * Fix DPO + GGUF * Fix quantization_method * Fix quantization_config * patch model * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update save.py * tokenizer_save_settings * Update save.py * quantization and loftq * Update save.py * Update llama.py * Update save.py * upload_to_huggingface * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py commit `bb05d6b6e2` Author: Daniel Han <danielhanchen@gmail.com> Date: Sat Jan 20 23:23:00 2024 +1100 Hotfix for Jan 2024 Release (#110) * Fix tokenizer, dropout, bias for LoRA * Update loader.py * Fix LoRA downcasting * Update _utils.py * Saving to GGUF * fix * colab_quantize_to_gguf * move save modules * save module * Update __init__.py * Update save.py * Temp downgrade due to TRL issue * Fix up bugs * Faster saving + other changes * Update llama.py * Saving modules * spelling * Update llama.py * Update save.py * Update save.py * Update loader.py * Update llama.py * patch saving * Update save.py * Update save.py * Update save.py * patch saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * original_model * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update 
save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * saving to RAM leakage? * Update save.py * new_save_directory * Update save.py * Update save.py * Update save.py * Update save.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Quick fixes * Update llama.py * Update llama.py * Update dpo.py * Update dpo.py * Update llama.py * Update save.py * getattr * RSLoRA and LoftQ direct support * Update llama.py * Update llama.py * Update llama.py * Fix DPO + GGUF * Fix quantization_method * Fix quantization_config * patch model * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update save.py * tokenizer_save_settings * Update save.py * quantization and loftq * Update save.py * Update llama.py * Update save.py commit `12e75c93d0` Author: Daniel Han <danielhanchen@gmail.com> Date: Sat Jan 20 04:25:06 2024 +1100 Quick fixes (#106) * Fix tokenizer, dropout, bias for LoRA * Update loader.py * Fix LoRA downcasting * Update _utils.py * Saving to GGUF * fix * colab_quantize_to_gguf * move save modules * save module * Update __init__.py * Update save.py * Temp downgrade due to TRL issue * Fix up bugs * Faster saving + other changes * Update llama.py * Saving modules * spelling * Update llama.py * Update save.py * Update save.py * Update loader.py * Update llama.py * patch saving * Update save.py * Update save.py * Update save.py * patch saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * original_model * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update 
save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * saving to RAM leakage? * Update save.py * new_save_directory * Update save.py * Update save.py * Update save.py * Update save.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Quick fixes * Update llama.py * Update llama.py * Update dpo.py * Update dpo.py * Update llama.py * Update save.py * getattr * RSLoRA and LoftQ direct support * Update llama.py * Update llama.py * Update llama.py * Fix DPO + GGUF commit `52b5ef31e0` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Sat Jan 20 02:30:31 2024 +1100 Update _utils.py commit `1a19c38675` Merge: `0a52390` `0d6e52b` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 19 23:15:38 2024 +1100 Merge branch 'main' of https://github.com/unslothai/unsloth commit `0a52390ac2` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 19 23:15:20 2024 +1100 Revert quantization methods commit `0d6e52b5c7` Author: Daniel Han <danielhanchen@gmail.com> Date: Fri Jan 19 22:57:22 2024 +1100 getattr issues (#103) * Fix tokenizer, dropout, bias for LoRA * Update loader.py * Fix LoRA downcasting * Update _utils.py * Saving to GGUF * fix * colab_quantize_to_gguf * move save modules * save module * Update __init__.py * Update save.py * Temp downgrade due to TRL issue * Fix up bugs * Faster saving + other changes * Update llama.py * Saving modules * spelling * Update llama.py * Update save.py * Update save.py * Update loader.py * Update llama.py * patch saving * Update save.py * Update save.py * Update save.py * patch saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * original_model * Update save.py * Update save.py * Update save.py * 
Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * saving to RAM leakage? * Update save.py * new_save_directory * Update save.py * Update save.py * Update save.py * Update save.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Quick fixes * Update llama.py * Update llama.py * Update dpo.py * Update dpo.py * Update llama.py * Update save.py * getattr commit `b3fcea6421` Author: Daniel Han <danielhanchen@gmail.com> Date: Fri Jan 19 22:52:30 2024 +1100 Quick fixes (#101) * Fix tokenizer, dropout, bias for LoRA * Update loader.py * Fix LoRA downcasting * Update _utils.py * Saving to GGUF * fix * colab_quantize_to_gguf * move save modules * save module * Update __init__.py * Update save.py * Temp downgrade due to TRL issue * Fix up bugs * Faster saving + other changes * Update llama.py * Saving modules * spelling * Update llama.py * Update save.py * Update save.py * Update loader.py * Update llama.py * patch saving * Update save.py * Update save.py * Update save.py * patch saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * original_model * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * saving to RAM leakage? 
* Update save.py * new_save_directory * Update save.py * Update save.py * Update save.py * Update save.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Quick fixes * Update llama.py * Update llama.py * Update dpo.py * Update dpo.py * Update llama.py * Update save.py commit `d691516ab9` Author: Daniel Han <danielhanchen@gmail.com> Date: Fri Jan 19 04:51:19 2024 +1100 2024 Release (#96) * Fix tokenizer, dropout, bias for LoRA * Update loader.py * Fix LoRA downcasting * Update _utils.py * Saving to GGUF * fix * colab_quantize_to_gguf * move save modules * save module * Update __init__.py * Update save.py * Temp downgrade due to TRL issue * Fix up bugs * Faster saving + other changes * Update llama.py * Saving modules * spelling * Update llama.py * Update save.py * Update save.py * Update loader.py * Update llama.py * patch saving * Update save.py * Update save.py * Update save.py * patch saving * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * original_model * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * saving to RAM leakage? 
* Update save.py * new_save_directory * Update save.py * Update save.py * Update save.py * Update save.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml commit `9e2dec16fb` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 19 03:41:00 2024 +1100 Update pyproject.toml commit `396c7245dd` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Fri Jan 19 03:35:17 2024 +1100 Update pyproject.toml commit `738e91591f` Author: Daniel Han <danielhanchen@gmail.com> Date: Thu Jan 11 04:08:03 2024 +1100 Fix some bugs (#83) * Fix tokenizer, dropout, bias for LoRA * Update loader.py * Fix LoRA downcasting * Update _utils.py * Saving to GGUF * fix * colab_quantize_to_gguf * move save modules * save module * Update __init__.py * Update save.py * Temp downgrade due to TRL issue * Fix up bugs commit `a1da50b5ce` Author: Daniel Han <danielhanchen@gmail.com> Date: Wed Jan 10 23:10:48 2024 +1100 Update README.md (#81) commit `606e8a9284` Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com> Date: Wed Jan 10 23:10:23 2024 +1100 Discord button redo (#80) commit `0169294ffb` Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com> Date: Wed Jan 10 23:02:20 2024 +1100 Update logos (#79) * HF Perf Button * Update README.md Adding new buttons cleanup * Update README.md * Delete images/Discord.png * Delete images/try live demo green.png * new transparent logos * Revamping page * Revamp mainpage * Update README.md * Update README.md commit `b2a8c33430` Author: Daniel Han <danielhanchen@gmail.com> Date: Wed Jan 10 20:03:01 2024 +1100 Create FUNDING.yml (#78) commit `c9c1abf290` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Wed Jan 10 01:02:44 2024 +1100 fix_tokenizer commit `6efffb46e4` Author: Daniel Han-Chen <danielhanchen@gmail.com> Date: Tue Jan 9 23:40:43 2024 +1100 check_tokenizer --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-02-07 02:00:12 +11:00
Daniel Han	efa0d2332e	2x faster inference (#151 ) * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? 
* Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print * Mistral patch * Update mistral.py * Update save.py * saving * Update llama.py * Update llama.py * Fast inference repatch * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update mistral.py * Update __init__.py * Fix inference * Update mistral.py * fast lm_head * Remove fast path * Update rope_embedding.py * Update loader.py * LlamaAttention_fast_forward_inference * if past_key_value is not None and q_len == 1: * revert inference * Update loader.py * past_key_value * Update llama.py * Update llama.py * Fix SDPA * Update llama.py * padding * Inference * Update llama.py * Revert * Update mistral.py * faster inference * inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * inference * Update llama.py * Update utils.py * faster inference * Update llama.py * revert * lm_head * Update llama.py * inference * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update 
llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * faster inference * Update llama.py * fast inference * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * torch compile * past_key_values * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update llama.py * fast inference + saving config.json * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * fast inference again * more temp matrices * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update mistral.py * Update llama.py * SDPA * attention_mask * New version * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update utils.py * Update utils.py	2024-02-04 17:35:56 +11:00
Daniel Han	2f55935f94	Hotfix - fix inference (#146 ) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update 
fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? * Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print * Mistral patch * Update mistral.py * Update save.py * saving * Update llama.py * Update llama.py * Fast inference repatch * Update llama.py * Update utils.py * Update utils.py * Update utils.py * Update mistral.py * Update __init__.py * Fix inference * Update mistral.py * fast lm_head * Remove fast path * Update rope_embedding.py * Update loader.py * LlamaAttention_fast_forward_inference * if past_key_value is not None and q_len == 1: * revert inference * Update loader.py * past_key_value	2024-01-31 04:03:37 +11:00
Daniel Han	a3a2ad9382	Fix inference attention mask (#142 ) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * 
Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? * Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print * Mistral patch * Update mistral.py * Update save.py * saving * Update llama.py * Update llama.py	2024-01-29 17:49:54 +11:00
Daniel Han	90309ca8dc	Nightly (#140 ) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * 
Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? * Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print * Mistral patch * Update mistral.py * Update save.py * saving	2024-01-29 03:45:07 +11:00
Daniel Han	a16bc73e80	Fix saving issues (#139 ) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update 
fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? * Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py * Update mistral.py * attention mask * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update dpo.py * Patch saving * Update save.py * Update save.py * patch_saving_functions * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * Update save.py * print	2024-01-29 02:52:39 +11:00
Daniel Han	af33224554	1 more bug (#138 ) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * 
Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? * Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask * Update save.py * Update save.py	2024-01-28 04:30:29 +11:00
Daniel Han	e2bbd3819e	Fix bugs + more accurate Swiglu (#137 ) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * 
Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py * Works? * Update pyproject.toml * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Swiglu * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * attention_mask * Update llama.py * Update llama.py * labels * Update mistral.py * Update llama.py * attention mask	2024-01-28 04:20:06 +11:00
Daniel Han	a81aff286f	Inference bug fix (#134 ) * faster saving & inference * Update llama.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update mistral.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * fast inference * Update llama.py * Update save.py * Update llama.py * Mistral correct RoPE scaling * Max sequence lengths * Apache 2 * fast_linear_forward * Update utils.py * Update utils.py * No print * Update utils.py * Update utils.py * inference * Update llama.py * Fast inference RoPE * Update llama.py * Update llama.py * RoPE * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * LoRA * Fast LoRA saving * Update llama.py * hidden_states * q_len == 1 * q_len issue * Update mistral.py * Update mistral.py * incorrect inference * Update to transformers 4.37 * Graceful FA2 error + torch 2.1.1 * Update mapper.py * Update pyproject.toml * Fix saving and bnb-4bit * Update fast_lora.py * Update fast_lora.py * remove patching * Update llama.py * Update llama.py * Update swiglu.py * Repatch * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update llama.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update swiglu.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update 
fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update fast_lora.py * Update save.py * Update fast_lora.py * Update utils.py * Update llama.py * Update fast_lora.py * Update swiglu.py * Update save.py * Update save.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Revert "Update llama.py" This reverts commit `a208ec46e0`. * Update llama.py	2024-01-27 04:50:22 +11:00

994 changed files with 236633 additions and 3096 deletions

2

.gitattributes vendored Normal file

 @ -0,0 +1,2 @@
 # Normalize Python files to LF line endings
 *.py text eol=lf
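
The rule above asks Git to treat `*.py` files as text and check them out with LF line endings. As a rough illustration of what it guards against, here is a minimal Python sketch that lists `.py` files still containing CRLF sequences in a working tree. The helper name `files_with_crlf` is hypothetical and is not part of the repository.

```python
from pathlib import Path


def files_with_crlf(root):
    """Return sorted paths of .py files under `root` that contain CRLF line endings.

    Hypothetical helper for illustration only; it reads each file as raw bytes
    so that no newline translation hides the carriage returns.
    """
    return sorted(
        str(path)
        for path in Path(root).rglob("*.py")
        if b"\r\n" in path.read_bytes()
    )
```

Running it on a checkout where the attribute has taken effect would be expected to return an empty list, assuming no file was edited afterwards by a tool that writes CRLF.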

55

.github/CODEOWNERS vendored Normal file

 @ -0,0 +1,55 @@
 # Inspired from https://github.com/vllm-project/vllm/blob/main/.github/CODEOWNERS
 /unsloth/models/loader.py @danielhanchen @mmathew23
 /unsloth/models/llama.py @Datta0 @danielhanchen @mmathew23
 /unsloth/models/rl.py @Datta0 @pluesclues @danielhanchen
 /unsloth/models/rl_replacements.py @Datta0 @pluesclues @danielhanchen
 /unsloth/trainer.py @danielhanchen
 /unsloth/models/sentence_transformer.py @Etherll @danielhanchen
 /unsloth/save.py @rolandtannous @danielhanchen
 /unsloth/tokenizer_utils.py @mmathew23 @danielhanchen
 /unsloth/chat_templates.py @rolandtannous @danielhanchen
 /unsloth/ollama_template_mappers.py @rolandtannous @danielhanchen
 /unsloth/kernels/moe/*.py @Datta0
 /unsloth/import_fixes.py @danielhanchen
 /unsloth/device_type.py @danielhanchen
 /unsloth/_auto_install.py @danielhanchen
 /unsloth/dataprep/*.py @danielhanchen
 /unsloth/kernels/cross_entropy_loss.py @danielhanchen
 /unsloth/kernels/fast_lora.py @danielhanchen
 /unsloth/kernels/flex_attention.py @danielhanchen
 /unsloth/kernels/fp8.py @Datta0
 /unsloth/kernels/geglu.py @danielhanchen
 /unsloth/kernels/layernorm.py @danielhanchen
 /unsloth/kernels/rms_layernorm.py @danielhanchen
 /unsloth/kernels/rope_embedding.py @danielhanchen
 /unsloth/kernels/swiglu.py @danielhanchen
 /unsloth/kernels/utils.py @danielhanchen @Datta0
 /unsloth/models/_utils.py @danielhanchen @mmathew23
 /unsloth/models/cohere.py @danielhanchen
 /unsloth/models/dpo.py @danielhanchen
 /unsloth/models/falcon_h1.py @danielhanchen
 /unsloth/models/gemma.py @danielhanchen
 /unsloth/models/gemma2.py @danielhanchen
 /unsloth/models/glm4_moe.py @Datta0
 /unsloth/models/granite.py @danielhanchen
 /unsloth/models/llama4.py @danielhanchen
 /unsloth/models/loader_utils.py @Datta0 @danielhanchen
 /unsloth/models/mapper.py @danielhanchen
 /unsloth/models/mistral.py @danielhanchen
 /unsloth/models/qwen2.py @danielhanchen
 /unsloth/models/qwen3.py @Datta0
 /unsloth/models/qwen3_moe.py @Datta0
 /unsloth/models/vision.py @mmathew23 @danielhanchen
 /unsloth/utils/attention_dispatch.py @mmathew23
 /unsloth/utils/hf_hub.py @mmathew23
 /unsloth/utils/packing.py @mmathew23
 /cli/ @rolandtannous @Manan17
 /studio/frontend/ @Shine1i @rolandtannous @Manan17
 /studio/frontend/public/ @Shine1i
 /studio/backend/ @rolandtannous
 /studio/backend/core/data_recipe/ @rolandtannous
 /studio/backend/tests/ @rolandtannous @danielhanchen
 /tests/ @rolandtannous @danielhanchen
 /scripts/ @rolandtannous @danielhanchen
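
In a CODEOWNERS file the last matching pattern takes precedence, which is why the more specific `/studio/frontend/public/` entry is listed after `/studio/frontend/`. The following is a minimal Python sketch of that resolution order, using a handful of rules copied from the file above. The matcher is deliberately simplified (it handles only root-anchored directory and `*.py` patterns like the ones shown, not the full gitignore-style syntax), and the example file paths in the usage lines are hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the rules above, kept in file order.
RULES = [
    ("/unsloth/kernels/moe/*.py", ["@Datta0"]),
    ("/unsloth/kernels/utils.py", ["@danielhanchen", "@Datta0"]),
    ("/studio/frontend/", ["@Shine1i", "@rolandtannous", "@Manan17"]),
    ("/studio/frontend/public/", ["@Shine1i"]),
    ("/studio/backend/", ["@rolandtannous"]),
    ("/studio/backend/tests/", ["@rolandtannous", "@danielhanchen"]),
]


def matches(pattern, path):
    """Simplified matcher for root-anchored patterns (an assumption, not GitHub's code)."""
    path = "/" + path.lstrip("/")
    if pattern.endswith("/"):
        # A directory pattern covers everything beneath that directory.
        return path.startswith(pattern)
    # Require the same directory depth so '*' cannot cross a '/' boundary.
    if pattern.count("/") != path.count("/"):
        return False
    return fnmatchcase(path, pattern)


def owners_for(path):
    """Return the owners from the last rule that matches `path`, or [] if none match."""
    owners = []
    for pattern, names in RULES:
        if matches(pattern, path):
            owners = names  # later rules override earlier ones
    return owners


print(owners_for("studio/frontend/src/app.tsx"))
print(owners_for("studio/frontend/public/logo.png"))
```

With this subset of rules, a file under `studio/frontend/public/` resolves to `@Shine1i` alone, while other frontend files keep all three frontend owners.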

									
4

.github/FUNDING.yml vendored

 @ -1,9 +1,9 @@
 # These are supported funding model platforms
 github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
 github: unslothai
 patreon: # Replace with a single Patreon username
 open_collective: # Replace with a single Open Collective username
 ko_fi: unsloth
 ko_fi: # unsloth
 tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
 community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
 liberapay: # Replace with a single Liberapay username

									
22

.github/ISSUE_TEMPLATE/bug---issue.md vendored Normal file

 @ -0,0 +1,22 @@
 ---
 name: Bug / Issue
 about: Bug / Issue
 title: "[Bug] Please fill in your issue title here."
 labels: bug
 assignees: ''
 ---
 Note: Please do not remove the questions. Answer beside them.
 1. Did you update? `pip install --upgrade unsloth unsloth_zoo`
 2. `Colab` or `Kaggle` or local / cloud
 3. Number GPUs used, use `nvidia-smi`
 4. Which notebook? Please link!
 5. Which Unsloth version, TRL version, transformers version, PyTorch version?
 6. Which trainer? `SFTTrainer`, `GRPOTrainer` etc
 ```python
 Put Minimal code to reproduce error here ###Remove Hugging Face token###
 ###Please make sure to check formatting properly, edit if needed.###
 ```
 🦥 You can also ask via our Reddit page: https://reddit.com/r/unsloth/

21

.github/ISSUE_TEMPLATE/feature-request.md vendored Normal file

 @ -0,0 +1,21 @@
 ---
 name: Feature Request
 about: New features, model support, ideas
 title: "[Feature]"
 labels: feature request
 assignees: ''
 ---
 For new models, have you tried:
 ```python
 from unsloth import FastModel
 model, tokenizer = FastModel.from_pretrained(
     "microsoft/Phi-4-multimodal-instruct",
     trust_remote_code = True,
 )
 from transformers import AutoModelForSequenceClassification
 model, tokenizer = FastModel.from_pretrained(
     auto_model = AutoModelForSequenceClassification,
 )
 ```

									
27

.github/dependabot.yml vendored Normal file

 @ -0,0 +1,27 @@
 ---
 version: 2
 updates:
   - package-ecosystem: "github-actions"
     directory: "/"
     schedule:
       interval: "weekly"
     groups:
       actions:
         patterns: ["*"]
   - package-ecosystem: "bun"
     directory: "/studio/frontend"
     schedule:
       interval: "weekly"
     groups:
       bun-frontend:
         patterns: ["*"]
   - package-ecosystem: "npm"
     directory: "/studio/backend/core/data_recipe/oxc-validator"
     schedule:
       interval: "weekly"
     groups:
       npm-oxc-validator:
         patterns: ["*"]
 ...

									
37

.github/workflows/stale.yml vendored Normal file

 @ -0,0 +1,37 @@
 name: 'Inactive Issue Pinger'
 on:
   schedule:
     - cron: '30 5 * * *' # Runs at 5:30 UTC every day
 jobs:
   stale:
     runs-on: ubuntu-latest
     permissions:
       issues: write
     steps:
       - uses: actions/stale@v10
         with:
           # The message to post on stale issues.
           # This message will ping the issue author.
           # Note: The stale bot action does not currently support a direct placeholder for the last commenter.
           # As a workaround, this message encourages any participant to reply.
           stale-issue-message: >
             Is this issue still important to you?
             Apologies in advance we might have missed this issue as well.
             For faster response times, please post on our Reddit server - https://www.reddit.com/r/unsloth or our Discord - https://discord.com/invite/unsloth
           # The number of days of inactivity before an issue is considered stale.
           days-before-issue-stale: 9999
           # Set to -1 to never close stale issues.
           days-before-issue-close: -1
           # A label to apply to stale issues.
           stale-issue-label: 'inactive'
           # The number of operations to perform per run to avoid rate limiting.
           operations-per-run: 500
           enable-statistics: false

62

.gitignore vendored

 @ -1,7 +1,17 @@
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
 *$py.class
 *.class
 unsloth_compiled_cache/
 # ML artifacts (large files)
 feature/
 outputs/
 exports/
 /datasets/
 studio/backend/assets/datasets/
 unsloth_training_checkpoints/
 *.gguf
 *.safetensors
 # C extensions
 *.so
 @ -94,6 +104,12 @@ ipython_config.py
 #   install all needed dependencies.
 #Pipfile.lock
 # UV
 #   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
 #   This is especially recommended for binary packages to ensure reproducibility, and is more
 #   commonly ignored for libraries.
 #uv.lock
 # poetry
 #   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
 #   This is especially recommended for binary packages to ensure reproducibility, and is more
 @ -106,8 +122,10 @@ ipython_config.py
 #pdm.lock
 #   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
 #   in version control.
 #   https://pdm.fming.dev/#use-with-ide
 #   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
 .pdm.toml
 .pdm-python
 .pdm-build/
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
 __pypackages__/
 @ -127,6 +145,9 @@ venv/
 ENV/
 env.bak/
 venv.bak/
 .venv_overlay/
 .venv_t5/
 environment.yaml
 # Spyder project settings
 .spyderproject
 @ -158,3 +179,40 @@ cython_debug/
 #  and can be added to the global gitignore or merged into this file.  For a more nuclear
 #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
 # Ruff stuff:
 .ruff_cache/
 .pre-commit-cache/
 # PyPI configuration file and IDE/Editors
 .pypirc
 .vscode
 .idea/
 .claude/
 *.swp
 *.swo
 # oh-my-codex
 .omx/
 # Firebase
 firebase-debug.log
 # Other
 resources/
 tmp/
 **/node_modules/
 auth.db
 # Local working docs
 **/CLAUDE.md
 **/claude.md
 **/AGENT.md
 **/agent.md
 docs/canvas-lab-architecture.md
 log_rtx.txt
 log.txt
 setup_leo.sh
 server.pid
 *.log
 package-lock.json

									
6

.pre-commit-ci.yaml Normal file

 @ -0,0 +1,6 @@
 ci:
   autofix_prs: true
   autofix_prs_limit: 5
   autoupdate_schedule: monthly
   autoupdate_commit_msg: "chore: pre-commit autoupdate"
   skip: []

									
18

.pre-commit-config.yaml Normal file

 @ -0,0 +1,18 @@
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.15.10
     hooks:
       - id: ruff
         args:
           - --fix
           - --exit-non-zero-on-fix
         exclude: '\.ipynb$'
   - repo: local
     hooks:
       - id: ruff-format-with-kwargs
         name: Ruff format with kwarg spacing
         entry: scripts/run_ruff_format.py
         language: python
         types: [python]
         additional_dependencies:
           - ruff==0.6.9

132

CODE_OF_CONDUCT.md Normal file

 @ -0,0 +1,132 @@
 # Contributor Covenant Code of Conduct
 ## Our Pledge
 We as members, contributors, and leaders pledge to make participation in our
 community a harassment-free experience for everyone, regardless of age, body
 size, visible or invisible disability, ethnicity, sex characteristics, gender
 identity and expression, level of experience, education, socio-economic status,
 nationality, personal appearance, race, caste, color, religion, or sexual
 identity and orientation.
 We pledge to act and interact in ways that contribute to an open, welcoming,
 diverse, inclusive, and healthy community.
 ## Our Standards
 Examples of behavior that contributes to a positive environment for our
 community include:
 * Demonstrating empathy and kindness toward other people
 * Being respectful of differing opinions, viewpoints, and experiences
 * Giving and gracefully accepting constructive feedback
 * Accepting responsibility and apologizing to those affected by our mistakes,
   and learning from the experience
 * Focusing on what is best not just for us as individuals, but for the overall
   community
 Examples of unacceptable behavior include:
 * The use of sexualized language or imagery, and sexual attention or advances of
   any kind
 * Trolling, insulting or derogatory comments, and personal or political attacks
 * Public or private harassment
 * Publishing others' private information, such as a physical or email address,
   without their explicit permission
 * Other conduct which could reasonably be considered inappropriate in a
   professional setting
 ## Enforcement Responsibilities
 Community leaders are responsible for clarifying and enforcing our standards of
 acceptable behavior and will take appropriate and fair corrective action in
 response to any behavior that they deem inappropriate, threatening, offensive,
 or harmful.
 Community leaders have the right and responsibility to remove, edit, or reject
 comments, commits, code, wiki edits, issues, and other contributions that are
 not aligned to this Code of Conduct, and will communicate reasons for moderation
 decisions when appropriate.
 ## Scope
 This Code of Conduct applies within all community spaces, and also applies when
 an individual is officially representing the community in public spaces.
 Examples of representing our community include using an official e-mail address,
 posting via an official social media account, or acting as an appointed
 representative at an online or offline event.
 ## Enforcement
 Instances of abusive, harassing, or otherwise unacceptable behavior may be
 reported to the community leaders responsible for enforcement at support@unsloth.ai.
 All complaints will be reviewed and investigated promptly and fairly.
 All community leaders are obligated to respect the privacy and security of the
 reporter of any incident.
 ## Enforcement Guidelines
 Community leaders will follow these Community Impact Guidelines in determining
 the consequences for any action they deem in violation of this Code of Conduct:
 ### 1. Correction
 **Community Impact**: Use of inappropriate language or other behavior deemed
 unprofessional or unwelcome in the community.
 **Consequence**: A private, written warning from community leaders, providing
 clarity around the nature of the violation and an explanation of why the
 behavior was inappropriate. A public apology may be requested.
 ### 2. Warning
 **Community Impact**: A violation through a single incident or series of
 actions.
 **Consequence**: A warning with consequences for continued behavior. No
 interaction with the people involved, including unsolicited interaction with
 those enforcing the Code of Conduct, for a specified period of time. This
 includes avoiding interactions in community spaces as well as external channels
 like social media. Violating these terms may lead to a temporary or permanent
 ban.
 ### 3. Temporary Ban
 **Community Impact**: A serious violation of community standards, including
 sustained inappropriate behavior.
 **Consequence**: A temporary ban from any sort of interaction or public
 communication with the community for a specified period of time. No public or
 private interaction with the people involved, including unsolicited interaction
 with those enforcing the Code of Conduct, is allowed during this period.
 Violating these terms may lead to a permanent ban.
 ### 4. Permanent Ban
 **Community Impact**: Demonstrating a pattern of violation of community
 standards, including sustained inappropriate behavior, harassment of an
 individual, or aggression toward or disparagement of classes of individuals.
 **Consequence**: A permanent ban from any sort of public interaction within the
 community.
 ## Attribution
 This Code of Conduct is adapted from the [Contributor Covenant][homepage],
 version 2.1, available at
 [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
 Community Impact Guidelines were inspired by
 [Mozilla's code of conduct enforcement ladder][Mozilla CoC].
 For answers to common questions about this code of conduct, see the FAQ at
 [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
 [https://www.contributor-covenant.org/translations][translations].
 [homepage]: https://www.contributor-covenant.org
 [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
 [Mozilla CoC]: https://github.com/mozilla/diversity
 [FAQ]: https://www.contributor-covenant.org/faq
 [translations]: https://www.contributor-covenant.org/translations

									
										29

CONTRIBUTING.md
									
										Normal file
									
				@ -0,0 +1,29 @@

				# 🦥 Contributing to Unsloth

				Thank you for not only using Unsloth but also for being interested in helping out! We value all contributions, whether they come in the form of code, ideas, support for others or just by simply spreading the word of Unsloth! 💕

				- **[Support the Community](https://github.com/unslothai/unsloth/issues)**: Answer questions, review pull requests, or assist others in discussions.

				- **Fix Bugs**: Identify and resolve issues with the existing codebase.

				- **Submit Ideas**: Request new features or share enhancements you'd like to see.

				- **Develop Features**: Implement new functionality or improve existing tools which can be done via PRs.

				- **[Improve Documentation](https://docs.unsloth.ai/)**: Help by creating guides, FAQs, or enhancing clarity.

				One of the best ways to support us is by spreading the word about Unsloth! Share how it’s powering your amazing projects in blog posts or social media, and inspire others to explore its potential. Even a simple star on our repo goes a long way in showing your support and helping the community grow. 🌟

				## Submitting Issues

				If you find a bug or have a feature idea, we’d love to hear from you! Here’s how to make your submission stand out:

				### Reporting Bugs

				1. **Search First**: Check if the issue has already been reported using GitHub’s search bar under Issues.

				2. **Details Matter**: Is this on Google Colab, Kaggle, or on another platform service? Are you using Unsloth's official notebook? Include your OS, Python version, and other relevant details. For bugs, a concise code snippet that reproduces the issue is incredibly helpful.

				3. **Be Thorough**: Attach screenshots, traceback logs, or any additional information that might speed up resolution.

				## Spread the Word

				Your support extends beyond code:

				- Spread the word by writing about Unsloth in blogs or social media.

				- Share how Unsloth powers your projects.

				- Star our repository to show your appreciation.

				Finally, please be mindful of our [Code of Conduct](https://github.com/unslothai/unsloth/blob/main/CODE_OF_CONDUCT.md) to ensure a welcoming and inclusive environment for everyone.

				Thank you so much for reading and we hope you have lots of fun using Unsloth! 🦥

664

COPYING Normal file

 @ -0,0 +1,664 @@
                     GNU AFFERO GENERAL PUBLIC LICENSE
                        Version 3, 19 November 2007
  Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
  Everyone is permitted to copy and distribute verbatim copies
  of this license document, but changing it is not allowed.
                             Preamble
   The GNU Affero General Public License is a free, copyleft license for
 software and other kinds of works, specifically designed to ensure
 cooperation with the community in the case of network server software.
   The licenses for most software and other practical works are designed
 to take away your freedom to share and change the works.  By contrast,
 our General Public Licenses are intended to guarantee your freedom to
 share and change all versions of a program--to make sure it remains free
 software for all its users.
   When we speak of free software, we are referring to freedom, not
 price.  Our General Public Licenses are designed to make sure that you
 have the freedom to distribute copies of free software (and charge for
 them if you wish), that you receive source code or can get it if you
 want it, that you can change the software or use pieces of it in new
 free programs, and that you know you can do these things.
   Developers that use our General Public Licenses protect your rights
 with two steps: (1) assert copyright on the software, and (2) offer
 you this License which gives you legal permission to copy, distribute
 and/or modify the software.
   A secondary benefit of defending all users' freedom is that
 improvements made in alternate versions of the program, if they
 receive widespread use, become available for other developers to
 incorporate.  Many developers of free software are heartened and
 encouraged by the resulting cooperation.  However, in the case of
 software used on network servers, this result may fail to come about.
 The GNU General Public License permits making a modified version and
 letting the public access it on a server without ever releasing its
 source code to the public.
   The GNU Affero General Public License is designed specifically to
 ensure that, in such cases, the modified source code becomes available
 to the community.  It requires the operator of a network server to
 provide the source code of the modified version running there to the
 users of that server.  Therefore, public use of a modified version, on
 a publicly accessible server, gives the public access to the source
 code of the modified version.
   An older license, called the Affero General Public License and
 published by Affero, was designed to accomplish similar goals.  This is
 a different license, not a version of the Affero GPL, but Affero has
 released a new version of the Affero GPL which permits relicensing under
 this license.
   The precise terms and conditions for copying, distribution and
 modification follow.
                        TERMS AND CONDITIONS
   0. Definitions.
   "This License" refers to version 3 of the GNU Affero General Public License.
   "Copyright" also means copyright-like laws that apply to other kinds of
 works, such as semiconductor masks.
   "The Program" refers to any copyrightable work licensed under this
 License.  Each licensee is addressed as "you".  "Licensees" and
 "recipients" may be individuals or organizations.
   To "modify" a work means to copy from or adapt all or part of the work
 in a fashion requiring copyright permission, other than the making of an
 exact copy.  The resulting work is called a "modified version" of the
 earlier work or a work "based on" the earlier work.
   A "covered work" means either the unmodified Program or a work based
 on the Program.
   To "propagate" a work means to do anything with it that, without
 permission, would make you directly or secondarily liable for
 infringement under applicable copyright law, except executing it on a
 computer or modifying a private copy.  Propagation includes copying,
 distribution (with or without modification), making available to the
 public, and in some countries other activities as well.
   To "convey" a work means any kind of propagation that enables other
 parties to make or receive copies.  Mere interaction with a user through
 a computer network, with no transfer of a copy, is not conveying.
   An interactive user interface displays "Appropriate Legal Notices"
 to the extent that it includes a convenient and prominently visible
 feature that (1) displays an appropriate copyright notice, and (2)
 tells the user that there is no warranty for the work (except to the
 extent that warranties are provided), that licensees may convey the
 work under this License, and how to view a copy of this License.  If
 the interface presents a list of user commands or options, such as a
 menu, a prominent item in the list meets this criterion.
   1. Source Code.
   The "source code" for a work means the preferred form of the work
 for making modifications to it.  "Object code" means any non-source
 form of a work.
   A "Standard Interface" means an interface that either is an official
 standard defined by a recognized standards body, or, in the case of
 interfaces specified for a particular programming language, one that
 is widely used among developers working in that language.
   The "System Libraries" of an executable work include anything, other
 than the work as a whole, that (a) is included in the normal form of
 packaging a Major Component, but which is not part of that Major
 Component, and (b) serves only to enable use of the work with that
 Major Component, or to implement a Standard Interface for which an
 implementation is available to the public in source code form.  A
 "Major Component", in this context, means a major essential component
 (kernel, window system, and so on) of the specific operating system
 (if any) on which the executable work runs, or a compiler used to
 produce the work, or an object code interpreter used to run it.
   The "Corresponding Source" for a work in object code form means all
 the source code needed to generate, install, and (for an executable
 work) run the object code and to modify the work, including scripts to
 control those activities.  However, it does not include the work's
 System Libraries, or general-purpose tools or generally available free
 programs which are used unmodified in performing those activities but
 which are not part of the work.  For example, Corresponding Source
 includes interface definition files associated with source files for
 the work, and the source code for shared libraries and dynamically
 linked subprograms that the work is specifically designed to require,
 such as by intimate data communication or control flow between those
 subprograms and other parts of the work.
   The Corresponding Source need not include anything that users
 can regenerate automatically from other parts of the Corresponding
 Source.
   The Corresponding Source for a work in source code form is that
 same work.
   2. Basic Permissions.
   All rights granted under this License are granted for the term of
 copyright on the Program, and are irrevocable provided the stated
 conditions are met.  This License explicitly affirms your unlimited
 permission to run the unmodified Program.  The output from running a
 covered work is covered by this License only if the output, given its
 content, constitutes a covered work.  This License acknowledges your
 rights of fair use or other equivalent, as provided by copyright law.
   You may make, run and propagate covered works that you do not
 convey, without conditions so long as your license otherwise remains
 in force.  You may convey covered works to others for the sole purpose
 of having them make modifications exclusively for you, or provide you
 with facilities for running those works, provided that you comply with
 the terms of this License in conveying all material for which you do
 not control copyright.  Those thus making or running the covered works
 for you must do so exclusively on your behalf, under your direction
 and control, on terms that prohibit them from making any copies of
 your copyrighted material outside their relationship with you.
   Conveying under any other circumstances is permitted solely under
 the conditions stated below.  Sublicensing is not allowed; section 10
 makes it unnecessary.
   3. Protecting Users' Legal Rights From Anti-Circumvention Law.
   No covered work shall be deemed part of an effective technological
 measure under any applicable law fulfilling obligations under article
 11 of the WIPO copyright treaty adopted on 20 December 1996, or
 similar laws prohibiting or restricting circumvention of such
 measures.
   When you convey a covered work, you waive any legal power to forbid
 circumvention of technological measures to the extent such circumvention
 is effected by exercising rights under this License with respect to
 the covered work, and you disclaim any intention to limit operation or
 modification of the work as a means of enforcing, against the work's
 users, your or third parties' legal rights to forbid circumvention of
 technological measures.
   4. Conveying Verbatim Copies.
   You may convey verbatim copies of the Program's source code as you
 receive it, in any medium, provided that you conspicuously and
 appropriately publish on each copy an appropriate copyright notice;
 keep intact all notices stating that this License and any
 non-permissive terms added in accord with section 7 apply to the code;
 keep intact all notices of the absence of any warranty; and give all
 recipients a copy of this License along with the Program.
   You may charge any price or no price for each copy that you convey,
 and you may offer support or warranty protection for a fee.
   5. Conveying Modified Source Versions.
   You may convey a work based on the Program, or the modifications to
 produce it from the Program, in the form of source code under the
 terms of section 4, provided that you also meet all of these conditions:
     a) The work must carry prominent notices stating that you modified
     it, and giving a relevant date.
     b) The work must carry prominent notices stating that it is
     released under this License and any conditions added under section
     7.  This requirement modifies the requirement in section 4 to
     "keep intact all notices".
     c) You must license the entire work, as a whole, under this
     License to anyone who comes into possession of a copy.  This
     License will therefore apply, along with any applicable section 7
     additional terms, to the whole of the work, and all its parts,
     regardless of how they are packaged.  This License gives no
     permission to license the work in any other way, but it does not
     invalidate such permission if you have separately received it.
     d) If the work has interactive user interfaces, each must display
     Appropriate Legal Notices; however, if the Program has interactive
     interfaces that do not display Appropriate Legal Notices, your
     work need not make them do so.
   A compilation of a covered work with other separate and independent
 works, which are not by their nature extensions of the covered work,
 and which are not combined with it such as to form a larger program,
 in or on a volume of a storage or distribution medium, is called an
 "aggregate" if the compilation and its resulting copyright are not
 used to limit the access or legal rights of the compilation's users
 beyond what the individual works permit.  Inclusion of a covered work
 in an aggregate does not cause this License to apply to the other
 parts of the aggregate.
   6. Conveying Non-Source Forms.
   You may convey a covered work in object code form under the terms
 of sections 4 and 5, provided that you also convey the
 machine-readable Corresponding Source under the terms of this License,
 in one of these ways:
     a) Convey the object code in, or embodied in, a physical product
     (including a physical distribution medium), accompanied by the
     Corresponding Source fixed on a durable physical medium
     customarily used for software interchange.
     b) Convey the object code in, or embodied in, a physical product
     (including a physical distribution medium), accompanied by a
     written offer, valid for at least three years and valid for as
     long as you offer spare parts or customer support for that product
     model, to give anyone who possesses the object code either (1) a
     copy of the Corresponding Source for all the software in the
     product that is covered by this License, on a durable physical
     medium customarily used for software interchange, for a price no
     more than your reasonable cost of physically performing this
     conveying of source, or (2) access to copy the
     Corresponding Source from a network server at no charge.
     c) Convey individual copies of the object code with a copy of the
     written offer to provide the Corresponding Source.  This
     alternative is allowed only occasionally and noncommercially, and
     only if you received the object code with such an offer, in accord
     with subsection 6b.
     d) Convey the object code by offering access from a designated
     place (gratis or for a charge), and offer equivalent access to the
     Corresponding Source in the same way through the same place at no
     further charge.  You need not require recipients to copy the
     Corresponding Source along with the object code.  If the place to
     copy the object code is a network server, the Corresponding Source
     may be on a different server (operated by you or a third party)
     that supports equivalent copying facilities, provided you maintain
     clear directions next to the object code saying where to find the
     Corresponding Source.  Regardless of what server hosts the
     Corresponding Source, you remain obligated to ensure that it is
     available for as long as needed to satisfy these requirements.
     e) Convey the object code using peer-to-peer transmission, provided
     you inform other peers where the object code and Corresponding
     Source of the work are being offered to the general public at no
     charge under subsection 6d.
   A separable portion of the object code, whose source code is excluded
 from the Corresponding Source as a System Library, need not be
 included in conveying the object code work.
   A "User Product" is either (1) a "consumer product", which means any
 tangible personal property which is normally used for personal, family,
 or household purposes, or (2) anything designed or sold for incorporation
 into a dwelling.  In determining whether a product is a consumer product,
 doubtful cases shall be resolved in favor of coverage.  For a particular
 product received by a particular user, "normally used" refers to a
 typical or common use of that class of product, regardless of the status
 of the particular user or of the way in which the particular user
 actually uses, or expects or is expected to use, the product.  A product
 is a consumer product regardless of whether the product has substantial
 commercial, industrial or non-consumer uses, unless such uses represent
 the only significant mode of use of the product.
   "Installation Information" for a User Product means any methods,
 procedures, authorization keys, or other information required to install
 and execute modified versions of a covered work in that User Product from
 a modified version of its Corresponding Source.  The information must
 suffice to ensure that the continued functioning of the modified object
 code is in no case prevented or interfered with solely because
 modification has been made.
   If you convey an object code work under this section in, or with, or
 specifically for use in, a User Product, and the conveying occurs as
 part of a transaction in which the right of possession and use of the
 User Product is transferred to the recipient in perpetuity or for a
 fixed term (regardless of how the transaction is characterized), the
 Corresponding Source conveyed under this section must be accompanied
 by the Installation Information.  But this requirement does not apply
 if neither you nor any third party retains the ability to install
 modified object code on the User Product (for example, the work has
 been installed in ROM).
   The requirement to provide Installation Information does not include a
 requirement to continue to provide support service, warranty, or updates
 for a work that has been modified or installed by the recipient, or for
 the User Product in which it has been modified or installed.  Access to a
 network may be denied when the modification itself materially and
 adversely affects the operation of the network or violates the rules and
 protocols for communication across the network.
   Corresponding Source conveyed, and Installation Information provided,
 in accord with this section must be in a format that is publicly
 documented (and with an implementation available to the public in
 source code form), and must require no special password or key for
 unpacking, reading or copying.
   7. Additional Terms.
   "Additional permissions" are terms that supplement the terms of this
 License by making exceptions from one or more of its conditions.
 Additional permissions that are applicable to the entire Program shall
 be treated as though they were included in this License, to the extent
 that they are valid under applicable law.  If additional permissions
 apply only to part of the Program, that part may be used separately
 under those permissions, but the entire Program remains governed by
 this License without regard to the additional permissions.
   When you convey a copy of a covered work, you may at your option
 remove any additional permissions from that copy, or from any part of
 it.  (Additional permissions may be written to require their own
 removal in certain cases when you modify the work.)  You may place
 additional permissions on material, added by you to a covered work,
 for which you have or can give appropriate copyright permission.
   Notwithstanding any other provision of this License, for material you
 add to a covered work, you may (if authorized by the copyright holders of
 that material) supplement the terms of this License with terms:
     a) Disclaiming warranty or limiting liability differently from the
     terms of sections 15 and 16 of this License; or
     b) Requiring preservation of specified reasonable legal notices or
     author attributions in that material or in the Appropriate Legal
     Notices displayed by works containing it; or
     c) Prohibiting misrepresentation of the origin of that material, or
     requiring that modified versions of such material be marked in
     reasonable ways as different from the original version; or
     d) Limiting the use for publicity purposes of names of licensors or
     authors of the material; or
     e) Declining to grant rights under trademark law for use of some
     trade names, trademarks, or service marks; or
     f) Requiring indemnification of licensors and authors of that
     material by anyone who conveys the material (or modified versions of
     it) with contractual assumptions of liability to the recipient, for
     any liability that these contractual assumptions directly impose on
     those licensors and authors.
   All other non-permissive additional terms are considered "further
 restrictions" within the meaning of section 10.  If the Program as you
 received it, or any part of it, contains a notice stating that it is
 governed by this License along with a term that is a further
 restriction, you may remove that term.  If a license document contains
 a further restriction but permits relicensing or conveying under this
 License, you may add to a covered work material governed by the terms
 of that license document, provided that the further restriction does
 not survive such relicensing or conveying.
   If you add terms to a covered work in accord with this section, you
 must place, in the relevant source files, a statement of the
 additional terms that apply to those files, or a notice indicating
 where to find the applicable terms.
   Additional terms, permissive or non-permissive, may be stated in the
 form of a separately written license, or stated as exceptions;
 the above requirements apply either way.
   8. Termination.
   You may not propagate or modify a covered work except as expressly
 provided under this License.  Any attempt otherwise to propagate or
 modify it is void, and will automatically terminate your rights under
 this License (including any patent licenses granted under the third
 paragraph of section 11).
   However, if you cease all violation of this License, then your
 license from a particular copyright holder is reinstated (a)
 provisionally, unless and until the copyright holder explicitly and
 finally terminates your license, and (b) permanently, if the copyright
 holder fails to notify you of the violation by some reasonable means
 prior to 60 days after the cessation.
   Moreover, your license from a particular copyright holder is
 reinstated permanently if the copyright holder notifies you of the
 violation by some reasonable means, this is the first time you have
 received notice of violation of this License (for any work) from that
 copyright holder, and you cure the violation prior to 30 days after
 your receipt of the notice.
   Termination of your rights under this section does not terminate the
 licenses of parties who have received copies or rights from you under
 this License.  If your rights have been terminated and not permanently
 reinstated, you do not qualify to receive new licenses for the same
 material under section 10.
   9. Acceptance Not Required for Having Copies.
   You are not required to accept this License in order to receive or
 run a copy of the Program.  Ancillary propagation of a covered work
 occurring solely as a consequence of using peer-to-peer transmission
 to receive a copy likewise does not require acceptance.  However,
 nothing other than this License grants you permission to propagate or
 modify any covered work.  These actions infringe copyright if you do
 not accept this License.  Therefore, by modifying or propagating a
 covered work, you indicate your acceptance of this License to do so.
   10. Automatic Licensing of Downstream Recipients.
   Each time you convey a covered work, the recipient automatically
 receives a license from the original licensors, to run, modify and
 propagate that work, subject to this License.  You are not responsible
 for enforcing compliance by third parties with this License.
   An "entity transaction" is a transaction transferring control of an
 organization, or substantially all assets of one, or subdividing an
 organization, or merging organizations.  If propagation of a covered
 work results from an entity transaction, each party to that
 transaction who receives a copy of the work also receives whatever
 licenses to the work the party's predecessor in interest had or could
 give under the previous paragraph, plus a right to possession of the
 Corresponding Source of the work from the predecessor in interest, if
 the predecessor has it or can get it with reasonable efforts.
   You may not impose any further restrictions on the exercise of the
 rights granted or affirmed under this License.  For example, you may
 not impose a license fee, royalty, or other charge for exercise of
 rights granted under this License, and you may not initiate litigation
 (including a cross-claim or counterclaim in a lawsuit) alleging that
 any patent claim is infringed by making, using, selling, offering for
 sale, or importing the Program or any portion of it.
   11. Patents.
   A "contributor" is a copyright holder who authorizes use under this
 License of the Program or a work on which the Program is based.  The
 work thus licensed is called the contributor's "contributor version".
   A contributor's "essential patent claims" are all patent claims
 owned or controlled by the contributor, whether already acquired or
 hereafter acquired, that would be infringed by some manner, permitted
 by this License, of making, using, or selling its contributor version,
 but do not include claims that would be infringed only as a
 consequence of further modification of the contributor version.  For
 purposes of this definition, "control" includes the right to grant
 patent sublicenses in a manner consistent with the requirements of
 this License.
   Each contributor grants you a non-exclusive, worldwide, royalty-free
 patent license under the contributor's essential patent claims, to
 make, use, sell, offer for sale, import and otherwise run, modify and
 propagate the contents of its contributor version.
   In the following three paragraphs, a "patent license" is any express
 agreement or commitment, however denominated, not to enforce a patent
 (such as an express permission to practice a patent or covenant not to
 sue for patent infringement).  To "grant" such a patent license to a
 party means to make such an agreement or commitment not to enforce a
 patent against the party.
   If you convey a covered work, knowingly relying on a patent license,
 and the Corresponding Source of the work is not available for anyone
 to copy, free of charge and under the terms of this License, through a
 publicly available network server or other readily accessible means,
 then you must either (1) cause the Corresponding Source to be so
 available, or (2) arrange to deprive yourself of the benefit of the
 patent license for this particular work, or (3) arrange, in a manner
 consistent with the requirements of this License, to extend the patent
 license to downstream recipients.  "Knowingly relying" means you have
 actual knowledge that, but for the patent license, your conveying the
 covered work in a country, or your recipient's use of the covered work
 in a country, would infringe one or more identifiable patents in that
 country that you have reason to believe are valid.
   If, pursuant to or in connection with a single transaction or
 arrangement, you convey, or propagate by procuring conveyance of, a
 covered work, and grant a patent license to some of the parties
 receiving the covered work authorizing them to use, propagate, modify
 or convey a specific copy of the covered work, then the patent license
 you grant is automatically extended to all recipients of the covered
 work and works based on it.
   A patent license is "discriminatory" if it does not include within
 the scope of its coverage, prohibits the exercise of, or is
 conditioned on the non-exercise of one or more of the rights that are
 specifically granted under this License.  You may not convey a covered
 work if you are a party to an arrangement with a third party that is
 in the business of distributing software, under which you make payment
 to the third party based on the extent of your activity of conveying
 the work, and under which the third party grants, to any of the
 parties who would receive the covered work from you, a discriminatory
 patent license (a) in connection with copies of the covered work
 conveyed by you (or copies made from those copies), or (b) primarily
 for and in connection with specific products or compilations that
 contain the covered work, unless you entered into that arrangement,
 or that patent license was granted, prior to 28 March 2007.
   Nothing in this License shall be construed as excluding or limiting
 any implied license or other defenses to infringement that may
 otherwise be available to you under applicable patent law.
 12. No Surrender of Others' Freedom.
   If conditions are imposed on you (whether by court order, agreement or
 otherwise) that contradict the conditions of this License, they do not
 excuse you from the conditions of this License.  If you cannot convey a
 covered work so as to satisfy simultaneously your obligations under this
 License and any other pertinent obligations, then as a consequence you may
 not convey it at all.  For example, if you agree to terms that obligate you
 to collect a royalty for further conveying from those to whom you convey
 the Program, the only way you could satisfy both those terms and this
 License would be to refrain entirely from conveying the Program.
 13. Remote Network Interaction; Use with the GNU General Public License.
   Notwithstanding any other provision of this License, if you modify the
 Program, your modified version must prominently offer all users
 interacting with it remotely through a computer network (if your version
 supports such interaction) an opportunity to receive the Corresponding
 Source of your version by providing access to the Corresponding Source
 from a network server at no charge, through some standard or customary
 means of facilitating copying of software.  This Corresponding Source
 shall include the Corresponding Source for any work covered by version 3
 of the GNU General Public License that is incorporated pursuant to the
 following paragraph.
   Notwithstanding any other provision of this License, you have
 permission to link or combine any covered work with a work licensed
 under version 3 of the GNU General Public License into a single
 combined work, and to convey the resulting work.  The terms of this
 License will continue to apply to the part which is the covered work,
 but the work with which it is combined will remain governed by version
 3 of the GNU General Public License.
 14. Revised Versions of this License.
   The Free Software Foundation may publish revised and/or new versions of
 the GNU Affero General Public License from time to time.  Such new versions
 will be similar in spirit to the present version, but may differ in detail to
 address new problems or concerns.
   Each version is given a distinguishing version number.  If the
 Program specifies that a certain numbered version of the GNU Affero General
 Public License "or any later version" applies to it, you have the
 option of following the terms and conditions either of that numbered
 version or of any later version published by the Free Software
 Foundation.  If the Program does not specify a version number of the
 GNU Affero General Public License, you may choose any version ever published
 by the Free Software Foundation.
   If the Program specifies that a proxy can decide which future
 versions of the GNU Affero General Public License can be used, that proxy's
 public statement of acceptance of a version permanently authorizes you
 to choose that version for the Program.
   Later license versions may give you additional or different
 permissions.  However, no additional obligations are imposed on any
 author or copyright holder as a result of your choosing to follow a
 later version.
 15. Disclaimer of Warranty.
   THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
 APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
 HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
 OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
 THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
 IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
 ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
 16. Limitation of Liability.
   IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
 WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
 THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
 GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
 USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
 DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
 PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
 EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGES.
 17. Interpretation of Sections 15 and 16.
   If the disclaimer of warranty and limitation of liability provided
 above cannot be given local legal effect according to their terms,
 reviewing courts shall apply local law that most closely approximates
 an absolute waiver of all civil liability in connection with the
 Program, unless a warranty or assumption of liability accompanies a
 copy of the Program in return for a fee.
                      END OF TERMS AND CONDITIONS
             How to Apply These Terms to Your New Programs
   If you develop a new program, and you want it to be of the greatest
 possible use to the public, the best way to achieve this is to make it
 free software which everyone can redistribute and change under these terms.
   To do so, attach the following notices to the program.  It is safest
 to attach them to the start of each source file to most effectively
 state the exclusion of warranty; and each file should have at least
 the "copyright" line and a pointer to where the full notice is found.
     <one line to give the program's name and a brief idea of what it does.>
     Copyright (C) <year>  <name of author>
     This program is free software: you can redistribute it and/or modify
     it under the terms of the GNU Affero General Public License as published
     by the Free Software Foundation, either version 3 of the License, or
     (at your option) any later version.
     This program is distributed in the hope that it will be useful,
     but WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     GNU Affero General Public License for more details.
     You should have received a copy of the GNU Affero General Public License
     along with this program.  If not, see <https://www.gnu.org/licenses/>.
 Also add information on how to contact you by electronic and paper mail.
   If your software can interact with users remotely through a computer
 network, you should also make sure that it provides a way for users to
 get its source.  For example, if your program is a web application, its
 interface could display a "Source" link that leads users to an archive
 of the code.  There are many ways you could offer source, and different
 solutions will be better for different programs; see section 13 for the
 specific requirements.
   You should also get your employer (if you work as a programmer) or school,
 if any, to sign a "copyright disclaimer" for the program, if necessary.
 For more information on this, and how to apply and follow the GNU AGPL, see
 <https://www.gnu.org/licenses/>.
 Files under unsloth/*, tests/*, scripts/* are Apache 2.0 licensed.
 Files under studio/*, unsloth_cli/* which is optional to install are AGPLv3 licensed.

4

LICENSE


 @ -186,7 +186,9 @@
       same "printed page" as the copyright notice for easier
       identification within third-party archives.
    Copyright [yyyy] [name of copyright owner]
    Copyright [2024-] [Unsloth AI. Inc team, Daniel Han-Chen & Michael Han-Chen]
    Files under unsloth/*, tests/*, scripts/* are Apache 2.0 licensed.
    Files under studio/*, unsloth_cli/* which is optional to install are AGPLv3 licensed.
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.

									
										625

README.md
									
										
				@ -1,441 +1,266 @@

				<p align="center">

				  <picture>

				    <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/shimmyshimmer/unsloth/main/images/unsloth%20logo%20white%20text.png">

				    <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/shimmyshimmer/unsloth/main/images/unsloth%20logo%20black%20text.png">

				    <img alt="unsloth logo" src="./images/unsloth%20logo%20black%20text.png" height="120" style="max-width: 100%;">

				  </picture>

				</p>

				<p align="center">

				  <a href="https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing"><img src="./images/Free version button.png" height="50"></a>

				  <a href="https://discord.gg/u54VK8m8tk"><img src="./images/Discord button.png" height="50"></a>

				  <a href="https://ko-fi.com/unsloth"><img src="./images/Kofi button.png" height="50"></a>

				</p>

				<h1 align="center" style="margin:0;">

				  <a href="https://unsloth.ai/docs"><picture>

				    <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20logo%20white%20text.png">

				    <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20logo%20black%20text.png">

				    <img alt="Unsloth logo" src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20logo%20black%20text.png" height="80" style="max-width:100%;">

				  </picture></a>

				</h1>

				<h3 align="center" style="margin: 0; margin-top: 0;">

				Unsloth Studio lets you run and train models locally.

				</h3>

				<h2 align="center">

				    Finetune Mistral, Llama 2-5x faster with 50% less memory!

				</h2>

				<p align="center">

				  <a href="#-features">Features</a> •

				  <a href="#-install">Quickstart</a> •

				  <a href="#-free-notebooks">Notebooks</a> •

				  <a href="https://unsloth.ai/docs">Documentation</a>

				</p>

				<br>

				<a href="https://unsloth.ai/docs/new/studio">

				<img alt="unsloth studio ui homepage" src="https://github.com/user-attachments/assets/53ae17a9-d975-44ef-9686-efb4ebd0454d" style="max-width: 100%; margin-bottom: 0;"></a>

				| Llama 2 7b                    | Mistral 7b                  | CodeLlama 34b           | Llama 7b Kaggle 2x T4  |

				|-----------------------------|-----------------------------|-------------------------|------------------------|

				| **2.2x faster 43% less VRAM**     | **2.2x faster 62% less VRAM**     | **1.9x faster 27% less VRAM**  | **5.5x faster 44% less VRAM** |

				| [⭐Llama **free** Colab notebook](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing) | [⭐Mistral **free** Colab notebook](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing) | [CodeLlama A100 Colab notebook](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing) | [⭐Kaggle **free** Alpaca notebook](https://www.kaggle.com/danielhanchen/unsloth-alpaca-t4-ddp) |

				| [Llama A100 Colab notebook](https://colab.research.google.com/drive/1YIPY_18xm-K0iJDgvNkRoJsgkPMPAO3G?usp=sharing) | [Mistral A100 Colab notebook](https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing) | 50+ more examples below! | [⭐Kaggle **free** Slim Orca notebook](https://www.kaggle.com/danielhanchen/unsloth-slimorca-t4-ddp) |

				## ⚡ Get started

				* **NEW!** [DPO](https://arxiv.org/abs/2305.18290) support. ⭐**Free!** DPO Zephyr, Mistral example! <a href="https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing"><img src="./images/Colab.png" height="20"></a> [More info](#DPO) on DPO.

				* **NEW!** [TinyLlama 1.1b](https://github.com/jzhang38/TinyLlama) on 3T tokens! ⭐**Free!** example <a href="https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing"><img src="./images/Colab.png" height="20"></a>

				* **NEW!** We're in 🤗 Huggingface's official docs! We're on the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and the [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)!

				* Supports Llama, Yi, Mistral, CodeLlama, Qwen (llamafied), Deepseek and their derived models (Open Hermes etc).

				* All kernels written in [OpenAI's Triton](https://openai.com/research/triton) language. **Manual backprop engine**.

				* **0% loss in accuracy** - no approximation methods - all exact.

				* No change of hardware. Supports NVIDIA GPUs since 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc) [Check your GPU!](https://developer.nvidia.com/cuda-gpus) GTX 1070, 1080 works, but is slow.

				* Works on **Linux** and **Windows** via WSL.

				* **NEW!** Download 4 bit models 4x faster from 🤗 Huggingface! Eg: `unsloth/mistral-7b-bnb-4bit`

				* Supports 4bit and 16bit QLoRA / LoRA finetuning via [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).

				* **NEW!** Want a UI for finetuning? Try [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory) and use `--use_unsloth`!

				* Open source trains 5x faster - see [Unsloth Pro](https://unsloth.ai/) for **30x faster training**!
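The hardware bullet above gives a minimum CUDA compute capability of 7.0 and notes that GTX 1070 and 1080 cards still work, only slowly. Below is a minimal sketch of how that rule might be encoded. The function name `classify_gpu` and its three labels are hypothetical, and treating every 6.x device as "works, but slow" is an assumption drawn from the GTX 1070/1080 (compute capability 6.1) remark. With PyTorch and a CUDA device available, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple to pass in.

```python
def classify_gpu(capability):
    """Classify a CUDA compute capability tuple (major, minor).

    7.0 is the minimum stated in the README bullet; labelling 6.x as
    "works, but slow" is an assumption based on the GTX 1070/1080 note.
    """
    major, minor = capability
    if (major, minor) >= (7, 0):
        return "supported"
    if major == 6:
        return "works, but slow"
    return "unsupported"


if __name__ == "__main__":
    # Optional: query the real device when PyTorch with CUDA is installed.
    try:
        import torch

        if torch.cuda.is_available():
            print(classify_gpu(torch.cuda.get_device_capability()))
        else:
            print("No CUDA device visible to PyTorch.")
    except ImportError:
        print("PyTorch not installed; pass a capability tuple manually.")
```

For instance, a T4 reports `(7, 5)` and an A100 reports `(8, 0)`, so both fall in the first branch.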

				| 1 A100 40GB  | 🤗 Hugging Face | Flash Attention | 🦥 Unsloth Open Source | [🦥 Unsloth Pro](https://unsloth.ai/pricing) |

				|--------------|--------------|-----------------|---------------------|-----------------|

				| Alpaca       | 1x           | 1.04x           | 1.98x               | **15.64x**      |

				| LAION Chip2  | 1x           | 0.92x           | 1.61x               | **20.73x**      |

				| OASST        | 1x           | 1.19x           | 2.17x               | **14.83x**      |

				| Slim Orca    | 1x           | 1.18x           | 2.22x               | **14.82x**      |

				Join our [Discord](https://discord.gg/nsS4V5Z6ge)!

				<img src="./images/unsloth made with love.png" width="200" />

				If you trained a model with 🦥 Unsloth, we made a cool sticker if you want to use it!

				# Installation Instructions - Conda

				Select either `pytorch-cuda=11.8` for CUDA 11.8 or `pytorch-cuda=12.1` for CUDA 12.1.

				#### macOS, Linux, WSL:

				```bash

				conda install cudatoolkit xformers bitsandbytes pytorch pytorch-cuda=12.1 \

				  -c pytorch -c nvidia -c xformers -c conda-forge -y

				pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"

				curl -fsSL https://unsloth.ai/install.sh | sh

				```

				# Installation Instructions - Pip

				Do **NOT** use this if you have Anaconda. You must use the Conda install method, or else stuff will BREAK.

				1. Find your CUDA version via

				```python

				import torch; torch.version.cuda

				#### Windows:

				```powershell

				irm https://unsloth.ai/install.ps1 | iex

				```

				2. For Pytorch 2.1.0: You can update Pytorch via Pip (interchange `cu121` / `cu118`). Go to https://pytorch.org/ to learn more. Select either `cu118` for CUDA 11.8 or `cu121` for CUDA 12.1. If you have an RTX 3060 or higher (A100, H100 etc), use the `"ampere"` path. For Pytorch 2.1.1: go to step 3.

				#### Community:

				- [Discord](https://discord.gg/unsloth)

				- [𝕏 (Twitter)](https://x.com/UnslothAI)

				- [Reddit](https://reddit.com/r/unsloth)

				## ⭐ Features

				Unsloth Studio (Beta) lets you run and train text, [audio](https://unsloth.ai/docs/basics/text-to-speech-tts-fine-tuning), [embedding](https://unsloth.ai/docs/new/embedding-finetuning), and [vision](https://unsloth.ai/docs/basics/vision-fine-tuning) models on Windows, Linux and macOS.

				### Inference

				* **Search + download + run models** including GGUF, LoRA adapters, safetensors

				* **Export models**: [Save or export](https://unsloth.ai/docs/new/studio/export) models to GGUF, 16-bit safetensors and other formats.

				* **Tool calling**: Support for [self-healing tool calling](https://unsloth.ai/docs/new/studio/chat#auto-healing-tool-calling) and web search

				* **[Code execution](https://unsloth.ai/docs/new/studio/chat#code-execution)**: lets LLMs test code in Claude artifacts and sandbox environments

				* [Auto-tune inference parameters](https://unsloth.ai/docs/new/studio/chat#auto-parameter-tuning) and customize chat templates.

				* We work directly with teams behind [gpt-oss](https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune#unsloth-fixes-for-gpt-oss), [Qwen3](https://www.reddit.com/r/LocalLLaMA/comments/1kaodxu/qwen3_unsloth_dynamic_ggufs_128k_context_bug_fixes/), [Llama 4](https://github.com/ggml-org/llama.cpp/pull/12889), [Mistral](models/tutorials/devstral-how-to-run-and-fine-tune.md), [Gemma 1-3](https://news.ycombinator.com/item?id=39671146), and [Phi-4](https://unsloth.ai/blog/phi4), where we’ve fixed bugs that improve model accuracy.

				* Upload images, audio, PDFs, code, DOCX and more file types to chat with.

				### Training

				* Train and RL **500+ models** up to **2x faster** with up to **70% less VRAM**, with no accuracy loss.

				* Custom Triton and mathematical **kernels**. See some collabs we did with [PyTorch](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning) and [Hugging Face](https://unsloth.ai/docs/new/faster-moe).

				* **Data Recipes**: [Auto-create datasets](https://unsloth.ai/docs/new/studio/data-recipe) from **PDF, CSV, DOCX** etc. Edit data in a visual-node workflow.

				* **[Reinforcement Learning](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide)** (RL): The most efficient [RL](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide) library, using **80% less VRAM** for GRPO, [FP8](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning) etc.

				* Supports full fine-tuning, RL, pretraining, 4-bit, 16-bit, and FP8 training.

				* **Observability**: Monitor training live, track loss and GPU usage and customize graphs.

				* [Multi-GPU](https://unsloth.ai/docs/basics/multi-gpu-training-with-unsloth) training is supported, with major improvements coming soon.

				## 📥 Install

				Unsloth can be used in two ways: through **[Unsloth Studio](https://unsloth.ai/docs/new/studio/)**, the web UI, or through **Unsloth Core**, the code-based version. Each has different requirements.

				### Unsloth Studio (web UI)

				Unsloth Studio (Beta) works on **Windows, Linux, WSL** and **macOS**.

				* **CPU:** Supported for Chat and Data Recipes currently

				* **NVIDIA:** Training works on RTX 30/40/50, Blackwell, DGX Spark, Station and more

				* **macOS:** Currently supports chat and Data Recipes. **MLX training** is coming very soon

				* **AMD:** Chat + Data works. Train with [Unsloth Core](#unsloth-core-code-based). Studio support is out soon.

				* **Coming soon:** Training support for Apple MLX, AMD, and Intel.

				* **Multi-GPU:** Available now, with a major upgrade on the way

				#### macOS, Linux, WSL:

				```bash

				pip install --upgrade --force-reinstall --no-cache-dir torch==2.1.0 triton \

				  --index-url https://download.pytorch.org/whl/cu121

				curl -fsSL https://unsloth.ai/install.sh | sh

				```

				#### Windows:

				```powershell

				irm https://unsloth.ai/install.ps1 | iex

				```

				#### Launch

				```bash

				pip install "unsloth[cu118] @ git+https://github.com/unslothai/unsloth.git"

				pip install "unsloth[cu121] @ git+https://github.com/unslothai/unsloth.git"

				pip install "unsloth[cu118_ampere] @ git+https://github.com/unslothai/unsloth.git"

				pip install "unsloth[cu121_ampere] @ git+https://github.com/unslothai/unsloth.git"

				unsloth studio -H 0.0.0.0 -p 8888

				```
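After launching, one quick way to confirm that something is listening is to attempt a TCP connection. The sketch below assumes that `-H` and `-p` in the launch command set the bind host and port, as the values `0.0.0.0` and `8888` suggest; the helper name `is_port_open` is hypothetical. Because `0.0.0.0` is a bind address that covers all interfaces, the probe targets `127.0.0.1` on the same machine.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8888 matches the -p value in the launch command above.
    print(is_port_open("127.0.0.1", 8888))
```

A successful connection only shows that some process accepts connections on that port; it does not verify that the process is Unsloth Studio.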

				3. For Pytorch 2.1.1: Use the `"ampere"` path for newer RTX 30xx GPUs or higher.

				#### Update

				To update, use the same install commands as above. Or run (does not work on Windows):

				```bash

				pip install --upgrade --force-reinstall --no-cache-dir torch==2.1.1 triton \

				  --index-url https://download.pytorch.org/whl/cu121

				unsloth studio update

				```

				#### Docker

				Use our [Docker image](https://hub.docker.com/r/unsloth/unsloth) ```unsloth/unsloth``` container. Run:

				```bash

				pip install "unsloth[cu118_torch211] @ git+https://github.com/unslothai/unsloth.git"

				pip install "unsloth[cu121_torch211] @ git+https://github.com/unslothai/unsloth.git"

				pip install "unsloth[cu118_ampere_torch211] @ git+https://github.com/unslothai/unsloth.git"

				pip install "unsloth[cu121_ampere_torch211] @ git+https://github.com/unslothai/unsloth.git"

				```

				4. We're working on Pytorch 2.1.2 support.

				5. If you get errors, try the below first, then go back to step 1:

				docker run -d -e JUPYTER_PASSWORD="mypassword" \

				  -p 8888:8888 -p 8000:8000 -p 2222:22 \

				  -v $(pwd)/work:/workspace/work \

				  --gpus all \

				  unsloth/unsloth

				  ```

				#### Developer, Nightly, Uninstall

				For developer, nightly, and uninstall instructions, see [advanced installation](#-advanced-installation).

				### Unsloth Core (code-based)

				#### Linux, WSL:

				```bash

				pip install --upgrade pip

				curl -LsSf https://astral.sh/uv/install.sh | sh

				uv venv unsloth_env --python 3.13

				source unsloth_env/bin/activate

				uv pip install unsloth --torch-backend=auto

				```

				# Documentation

				We support Huggingface's TRL, Trainer, Seq2SeqTrainer or even Pytorch code!

				We're in 🤗 Huggingface's official docs! We're on the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and the [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)!

				```python

				from unsloth import FastLanguageModel

				import torch

				from trl import SFTTrainer

				from transformers import TrainingArguments

				from datasets import load_dataset

				max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!

				# Get LAION dataset

				url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"

				dataset = load_dataset("json", data_files = {"train" : url}, split = "train")

				# 4bit pre quantized models we support - 4x faster downloading!

				fourbit_models = [

				    "unsloth/mistral-7b-bnb-4bit",

				    "unsloth/llama-2-7b-bnb-4bit",

				    "unsloth/llama-2-13b-bnb-4bit",

				    "unsloth/codellama-34b-bnb-4bit",

				    "unsloth/tinyllama-bnb-4bit",

				]

				# Load Llama model

				model, tokenizer = FastLanguageModel.from_pretrained(

				    model_name = "unsloth/mistral-7b-bnb-4bit", # Supports Llama, Mistral - replace this!

				    max_seq_length = max_seq_length,

				    dtype = None,

				    load_in_4bit = True,

				)

				# Do model patching and add fast LoRA weights

				model = FastLanguageModel.get_peft_model(

				    model,

				    r = 16,

				    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",

				                      "gate_proj", "up_proj", "down_proj",],

				    lora_alpha = 16,

				    lora_dropout = 0, # Supports any, but = 0 is optimized

				    bias = "none",    # Supports any, but = "none" is optimized

				    use_gradient_checkpointing = True,

				    random_state = 3407,

				    max_seq_length = max_seq_length,

				)

				trainer = SFTTrainer(

				    model = model,

				    train_dataset = dataset,

				    dataset_text_field = "text",

				    max_seq_length = max_seq_length,

				    tokenizer = tokenizer,

				    args = TrainingArguments(

				        per_device_train_batch_size = 2,

				        gradient_accumulation_steps = 4,

				        warmup_steps = 10,

				        max_steps = 60,

				        fp16 = not torch.cuda.is_bf16_supported(),

				        bf16 = torch.cuda.is_bf16_supported(),

				        logging_steps = 1,

				        output_dir = "outputs",

				        optim = "adamw_8bit",

				        seed = 3407,

				    ),

				)

				trainer.train()

				#### Windows:

				```powershell

				winget install -e --id Python.Python.3.13

				winget install --id=astral-sh.uv  -e

				uv venv unsloth_env --python 3.13

				.\unsloth_env\Scripts\activate

				uv pip install unsloth --torch-backend=auto

				```

				For Windows, `pip install unsloth` works only if you have PyTorch installed. Read our [Windows Guide](https://unsloth.ai/docs/get-started/install/windows-installation).

				You can use the same Docker image as Unsloth Studio.

				<a name="DPO"></a>

				# DPO (Direct Preference Optimization) Support

				DPO, PPO, Reward Modelling all seem to work as per 3rd party independent testing from [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory). We have a preliminary Google Colab notebook for reproducing Zephyr on Tesla T4 here: [notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing).

				#### AMD, Intel:

				For RTX 50x, B200, 6000 GPUs: `uv pip install unsloth --torch-backend=auto`. Read our guides for: [Blackwell](https://unsloth.ai/docs/blog/fine-tuning-llms-with-blackwell-rtx-50-series-and-unsloth) and [DGX Spark](https://unsloth.ai/docs/blog/fine-tuning-llms-with-nvidia-dgx-spark-and-unsloth). <br>

				To install Unsloth on **AMD** and **Intel** GPUs, follow our [AMD Guide](https://unsloth.ai/docs/get-started/install/amd) and [Intel Guide](https://unsloth.ai/docs/get-started/install/intel).

				We're in 🤗 Huggingface's official docs! We're on the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and the [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)!

				## 📒 Free Notebooks

				```python

				from unsloth import FastLanguageModel, PatchDPOTrainer

				PatchDPOTrainer()

				import torch

				from transformers import TrainingArguments

				from trl import DPOTrainer

				Train for free with our notebooks. You can use our new [free Unsloth Studio notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) to run and train models for free in a web UI.

				Read our [guide](https://unsloth.ai/docs/get-started/fine-tuning-llms-guide). Add dataset, run, then deploy your trained model.

				model, tokenizer = FastLanguageModel.from_pretrained(

				    model_name = "unsloth/zephyr-sft-bnb-4bit",

				    max_seq_length = max_seq_length,

				    dtype = None,

				    load_in_4bit = True,

				)

				| Model | Free Notebooks | Performance | Memory use |

				|-----------|---------|--------|----------|

				| **Gemma 4 (E2B)**      | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma4_(E2B)-Vision.ipynb)               | 1.5x faster | 50% less |

				| **Qwen3.5 (4B)**      | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_(4B)_Vision.ipynb)               | 1.5x faster | 60% less |

				| **gpt-oss (20B)**      | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-Fine-tuning.ipynb)               | 2x faster | 70% less |

				| **Qwen3.5 GSPO**      | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_(4B)_Vision_GRPO.ipynb)               | 2x faster | 70% less |

				| **gpt-oss (20B): GRPO**      | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-GRPO.ipynb)               | 2x faster | 80% less |

				| **Qwen3: Advanced GRPO**      | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb)               | 2x faster | 70% less |

				| **embeddinggemma (300M)**    | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/EmbeddingGemma_(300M).ipynb)               | 2x faster | 20% less |

				| **Mistral Ministral 3 (3B)**      | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Ministral_3_VL_(3B)_Vision.ipynb)               | 1.5x faster | 60% less |

				| **Llama 3.1 (8B) Alpaca**      | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb)               | 2x faster | 70% less |

				| **Llama 3.2 Conversational**      | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb)               | 2x faster | 70% less |

				| **Orpheus-TTS (3B)**     | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Orpheus_(3B)-TTS.ipynb)               | 1.5x faster | 50% less |

				# Do model patching and add fast LoRA weights

				model = FastLanguageModel.get_peft_model(

				    model,

				    r = 64,

				    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",

				                      "gate_proj", "up_proj", "down_proj",],

				    lora_alpha = 64,

				    lora_dropout = 0, # Supports any, but = 0 is optimized

				    bias = "none",    # Supports any, but = "none" is optimized

				    use_gradient_checkpointing = True,

				    random_state = 3407,

				    max_seq_length = max_seq_length,

				)

				- See all our notebooks for: [Kaggle](https://github.com/unslothai/notebooks?tab=readme-ov-file#-kaggle-notebooks), [GRPO](https://unsloth.ai/docs/get-started/unsloth-notebooks#grpo-reasoning-rl-notebooks), [TTS](https://unsloth.ai/docs/get-started/unsloth-notebooks#text-to-speech-tts-notebooks), [embedding](https://unsloth.ai/docs/new/embedding-finetuning) & [Vision](https://unsloth.ai/docs/get-started/unsloth-notebooks#vision-multimodal-notebooks)

				- See [all our models](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [all our notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)

				- See detailed documentation for Unsloth [here](https://unsloth.ai/docs)

				dpo_trainer = DPOTrainer(

				    model = model,

				    ref_model = None,

				    args = TrainingArguments(

				        per_device_train_batch_size = 4,

				        gradient_accumulation_steps = 8,

				        warmup_ratio = 0.1,

				        num_train_epochs = 3,

				        fp16 = not torch.cuda.is_bf16_supported(),

				        bf16 = torch.cuda.is_bf16_supported(),

				        logging_steps = 1,

				        optim = "adamw_8bit",

				        seed = 42,

				        output_dir = "outputs",

				    ),

				    beta = 0.1,

				    train_dataset = YOUR_DATASET_HERE,

				    # eval_dataset = YOUR_DATASET_HERE,

				    tokenizer = tokenizer,

				    max_length = 1024,

				    max_prompt_length = 512,

				)

				dpo_trainer.train()

				```

				## 🦥 Unsloth News

				- **Qwen3.6**: Qwen3.6-35B-A3B can now be trained and run in Unsloth Studio. [Blog](https://unsloth.ai/docs/models/qwen3.6)

				- **Gemma 4**: Run and train Google’s new models directly in Unsloth. [Blog](https://unsloth.ai/docs/models/gemma-4)

				- **Introducing Unsloth Studio**: our new web UI for running and training LLMs. [Blog](https://unsloth.ai/docs/new/studio)

				- **Qwen3.5** - 0.8B, 2B, 4B, 9B, 27B, 35-A3B, 112B-A10B are now supported. [Guide + notebooks](https://unsloth.ai/docs/models/qwen3.5/fine-tune)

				- Train **MoE LLMs 12x faster** with 35% less VRAM - DeepSeek, GLM, Qwen and gpt-oss. [Blog](https://unsloth.ai/docs/new/faster-moe)

				- **Embedding models**: Unsloth now supports ~1.8-3.3x faster embedding fine-tuning. [Blog](https://unsloth.ai/docs/new/embedding-finetuning) • [Notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks#embedding-models)

				- New **7x longer context RL** vs. all other setups, via our new batching algorithms. [Blog](https://unsloth.ai/docs/new/grpo-long-context)

				- New RoPE & MLP **Triton Kernels** & **Padding Free + Packing**: 3x faster training & 30% less VRAM. [Blog](https://unsloth.ai/docs/new/3x-faster-training-packing)

				- **500K Context**: Training a 20B model with >500K context is now possible on an 80GB GPU. [Blog](https://unsloth.ai/docs/blog/500k-context-length-fine-tuning)

				- **FP8 & Vision RL**: You can now do FP8 & VLM GRPO on consumer GPUs. [FP8 Blog](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/vision-reinforcement-learning-vlm-rl)

				- **gpt-oss** by OpenAI: Read our [RL blog](https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune/gpt-oss-reinforcement-learning), [Flex Attention](https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune/long-context-gpt-oss-training) blog and [Guide](https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune).

				# Support us!

				We're currently 2 brothers trying to make LLMs for everyone! It'll be super cool if you can support our work!!

				<a href="https://ko-fi.com/unsloth"><img src="./images/Kofi button.png" height="50"></a>

				# Future Milestones and limitations

				1. Support Mixtral.

				2. Supports all Mistral, Llama type models, but some are unoptimized (Qwen with biases)

				3. Dropout, bias in LoRA matrices are supported, just not optimized.

				# Performance comparisons on 1 Tesla T4 GPU:

				**Time taken for 1 epoch**

				One Tesla T4 on Google Colab

				`bsz = 2, ga = 4, max_grad_norm = 0.3, num_train_epochs = 1, seed = 3047, lr = 2e-4, wd = 0.01, optim = "adamw_8bit", schedule = "linear", schedule_steps = 10`

				| System | GPU | Alpaca (52K) | LAION OIG (210K) | Open Assistant (10K) | SlimOrca (518K) |

				| --- | --- | --- | --- | --- | --- |

				| Huggingface | 1 T4 | 23h 15m | 56h 28m | 8h 38m | 391h 41m |

				| Unsloth Open | 1 T4 | 13h 7m (1.8x) | 31h 47m (1.8x) | 4h 27m (1.9x) | 240h 4m (1.6x) |

				| Unsloth Pro | 1 T4 | 3h 6m (7.5x) | 5h 17m (10.7x) | 1h 7m (7.7x) | 59h 53m (6.5x) |

				| Unsloth Max | 1 T4 | 2h 39m (8.8x) | 4h 31m (12.5x) | 0h 58m (8.9x) | 51h 30m (7.6x) |

				**Peak Memory Usage**

				| System | GPU | Alpaca (52K) | LAION OIG (210K) | Open Assistant (10K) | SlimOrca (518K) |

				| --- | --- | --- | --- | --- | --- |

				| Huggingface | 1 T4 | 7.3GB | 5.9GB | 14.0GB | 13.3GB |

				| Unsloth Open | 1 T4 | 6.8GB | 5.7GB | 7.8GB | 7.7GB |

				| Unsloth Pro | 1 T4 | 6.4GB | 6.4GB | 6.4GB | 6.4GB |

				| Unsloth Max | 1 T4 | 11.4GB | 12.4GB | 11.9GB | 14.4GB |

				# Performance comparisons on 2 Tesla T4 GPUs via DDP:

				**Time taken for 1 epoch**

				Two Tesla T4s on Kaggle

				`bsz = 2, ga = 4, max_grad_norm = 0.3, num_train_epochs = 1, seed = 3047, lr = 2e-4, wd = 0.01, optim = "adamw_8bit", schedule = "linear", schedule_steps = 10`

				| System | GPU | Alpaca (52K) | LAION OIG (210K) | Open Assistant (10K) | SlimOrca (518K) * |

				| --- | --- | --- | --- | --- | --- |

				| Huggingface | 2 T4 | 84h 47m | 163h 48m | 30h 51m | 1301h 24m * |

				| Unsloth Pro | 2 T4 | 3h 20m (25.4x) | 5h 43m (28.7x) | 1h 12m (25.7x) | 71h 40m (18.1x) * |

				| Unsloth Max | 2 T4 | 3h 4m (27.6x) | 5h 14m (31.3x) | 1h 6m (28.1x) | 54h 20m (23.9x) * |

				**Peak Memory Usage on a Multi GPU System (2 GPUs)**

				| System | GPU | Alpaca (52K) | LAION OIG (210K) | Open Assistant (10K) | SlimOrca (518K) * |

				| --- | --- | --- | --- | --- | --- |

				| Huggingface | 2 T4 | 8.4GB \| 6GB | 7.2GB \| 5.3GB | 14.3GB \| 6.6GB | 10.9GB \| 5.9GB * |

				| Unsloth Pro | 2 T4 | 7.7GB \| 4.9GB | 7.5GB \| 4.9GB | 8.5GB \| 4.9GB | 6.2GB \| 4.7GB * |

				| Unsloth Max | 2 T4 | 10.5GB \| 5GB | 10.6GB \| 5GB | 10.6GB \| 5GB | 10.5GB \| 5GB * |

				* Slim Orca uses `bsz=1` for all benchmarks because the Hugging Face baseline OOMs at `bsz=2`. Unsloth can handle `bsz=2`, but we benchmark with `bsz=1` for consistency.

				# Llama-Factory 3rd party benchmarking

				| Method | Bits | TGS | GRAM | Speed |

				| --- | --- | --- | --- | --- |

				| HF | 16 | 2392 | 18GB | 100% |

				| HF+FA2 | 16 | 2954 | 17GB | 123% |

				| Unsloth+FA2 | 16 | 4007 | 16GB | **168%** |

				| HF | 4 | 2415 | 9GB | 101% |

				| Unsloth+FA2 | 4 | 3726 | 7GB | **160%** |

				[Link](https://github.com/hiyouga/LLaMA-Factory/wiki/Performance-Comparison) to performance table. TGS: tokens per GPU per second. Model: LLaMA2-7B. GPU: NVIDIA A100 * 1. Batch size: 4. Gradient accumulation: 2. LoRA rank: 8. Max length: 1024.

				# How did we make it faster?

				Manual autograd, Triton kernels etc. See our [Benchmark Breakdown](https://unsloth.ai/blog/mistral-benchmark) for more info!

				# Troubleshooting

				1. Sometimes `bitsandbytes` or `xformers` does not link properly. Try running:

				## 📥 Advanced Installation

				The below advanced instructions are for Unsloth Studio. For Unsloth Core advanced installation, [view our docs](https://unsloth.ai/docs/get-started/install/pip-install#advanced-pip-installation).

				#### Developer installs: macOS, Linux, WSL:

				```bash

				!ldconfig /usr/lib64-nvidia

				git clone https://github.com/unslothai/unsloth

				cd unsloth

				./install.sh --local

				unsloth studio -H 0.0.0.0 -p 8888

				```

				Then to update:

				```bash

				unsloth studio update

				```

				2. Windows is not yet supported. We rely on Xformers and Triton, so Unsloth will support Windows once both packages officially do.

				3. If it doesn't install - maybe try updating `pip`.

				#### Developer installs: Windows PowerShell:

				```powershell

				git clone https://github.com/unslothai/unsloth.git

				cd unsloth

				Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass

				.\install.ps1 --local

				unsloth studio -H 0.0.0.0 -p 8888

				```

				Then to update:

				```bash

				unsloth studio update

				```

				#### Nightly: MacOS, Linux, WSL:

				```bash

				git clone https://github.com/unslothai/unsloth

				cd unsloth

				git checkout nightly

				./install.sh --local

				unsloth studio -H 0.0.0.0 -p 8888

				```

				Then to launch every time:

				```bash

				unsloth studio -H 0.0.0.0 -p 8888

				```

				# Full benchmarking tables

				Click "Code" for a fully reproducible example.

				"Unsloth Equal" is a preview of our PRO version, with code stripped out. All settings and the loss curve remain identical.

				| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|

				| Alpaca       | 1x          | 1.04x       | 1.98x           | 2.48x        | 5.32x         | **15.64x**      |

				| code | [Code](https://colab.research.google.com/drive/1u4dBeM-0vGNVmmO6X7cScAut-Hyt4KDF?usp=sharing) |    [Code](https://colab.research.google.com/drive/1fgTOxpMbVjloQBvZyz4lF4BacKSZOB2A?usp=sharing) |    [Code](https://colab.research.google.com/drive/1YIPY_18xm-K0iJDgvNkRoJsgkPMPAO3G?usp=sharing) |    [Code](https://colab.research.google.com/drive/1ANW8EFL3LVyTD7Gq4TkheC1Z7Rxw-rHp?usp=sharing) | | |

				| seconds| 1040 | 1001 | 525 | 419 | 196 | 67  |

				| memory MB| 18235 | 15365 | 9631 | 8525 | | |

				| % saved| | 15.74 | 47.18 | 53.25 | | | |
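The speedup and `% saved` rows appear to be derived from the raw `seconds` and `memory MB` rows, each relative to the Hugging Face column. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of Unsloth, and some published multipliers (for example Unsloth Pro and Max) differ slightly from what the rounded seconds give.

```python
def speedup(baseline_seconds: float, seconds: float) -> float:
    """How many times faster a run is than the baseline run."""
    return baseline_seconds / seconds


def percent_saved(baseline_mb: float, mb: float) -> float:
    """Peak-memory reduction relative to the baseline, in percent."""
    return (baseline_mb - mb) / baseline_mb * 100.0


# Alpaca on 1 A100 40GB: Hugging Face vs. Unsloth Open (values from the table above).
hf_seconds, open_seconds = 1040, 525
hf_mb, open_mb = 18235, 9631

print(f"{speedup(hf_seconds, open_seconds):.2f}x")      # 1.98x
print(f"{percent_saved(hf_mb, open_mb):.2f}% saved")    # 47.18% saved
```

A negative `% saved` value, as in some Flash Attention cells, simply means that run used more peak memory than the baseline.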

				#### Nightly: Windows:

				Run in Windows PowerShell:

				```powershell

				git clone https://github.com/unslothai/unsloth.git

				cd unsloth

				git checkout nightly

				Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass

				.\install.ps1 --local

				unsloth studio -H 0.0.0.0 -p 8888

				```

				Then to launch every time:

				```bash

				unsloth studio -H 0.0.0.0 -p 8888

				```

				#### Uninstall

				You can uninstall Unsloth Studio by deleting its install folder, usually located at `$HOME/.unsloth/studio` on Mac/Linux/WSL and `%USERPROFILE%\.unsloth\studio` on Windows. These commands **delete everything**, including your history and cache:

				| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|

				| LAION Chip2  | 1x          | 0.92x       | 1.61x           | 1.84x        | 7.05x         | **20.73x**      |

				| code |[Code](https://colab.research.google.com/drive/1gjL1TaKwc_xv2TcxJC8QWEWBG1msh3g2?usp=sharing) |    [Code](https://colab.research.google.com/drive/15vlPjMr8xDj5BFhGdqunGaOQSMqXPEXU?usp=sharing) |    [Code](https://colab.research.google.com/drive/1zPwvf-BmHyHlPMBxDsY8zS0BnQ-KKbCc?usp=sharing) |    [Code](https://colab.research.google.com/drive/1X2uHy-arRsZxqWHvKHwwW102JaMwChD2?usp=sharing) | | |

				| seconds| 581  | 631  | 361 | 315 | 82  | 28  |

				| memory MB| 7763  | 8047  | 7763 | 6441 | | |

				| % saved| | -3.66 | 0.00  | 17.03 | | | |

				* ​ **MacOS, WSL, Linux:** `rm -rf ~/.unsloth/studio`

				* ​ **Windows (PowerShell):** `Remove-Item -Recurse -Force "$HOME\.unsloth\studio"`

				For more info, [see our docs](https://unsloth.ai/docs/new/studio/install#uninstall).

				| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|

				| OASST        | 1x          | 1.19x       | 2.17x           | 2.66x        | 5.04x         | **14.83x**      |

				| code |[Code](https://colab.research.google.com/drive/10NzDreFbuWELGUuBv0MOoC7y3MBewaNx?usp=sharing) |    [Code](https://colab.research.google.com/drive/1TwdkJ1sHsuEH-kgeCPqSFeCpOnCfz6Ou?usp=sharing) |    [Code](https://colab.research.google.com/drive/1AkwjUkOF0XeRBMT_S8Uhh74kitEsZHla?usp=sharing) |    [Code](https://colab.research.google.com/drive/1roMkp2UjbeK2t3DkNz50cRs1MT92RPFT?usp=sharing) | | |

				| seconds| 1852 | 1558 | 852 | 696 | 367 | 125 |

				| memory MB| 26431 | 16565 | 12267| 11223| | |

				| % saved| | 37.33 | 53.59 | 57.54 | | |

				#### Deleting model files

				| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|

				| Slim Orca    | 1x          | 1.18x       | 2.22x           | 2.64x        | 5.04x         | **14.82x**      |

				| code |[Code](https://colab.research.google.com/drive/1UNo1xsMl8YH7xnWnIVjDFnCAPfc0RGgu?usp=sharing) |    [Code](https://colab.research.google.com/drive/1zbphER-SKhbSWGjHTfnBLPFyTgIVvaeH?usp=sharing) |    [Code](https://colab.research.google.com/drive/156si33585iv4Uh-VILFglUmIMrNCNuc2?usp=sharing) |    [Code](https://colab.research.google.com/drive/1_mhZy7dfl9jEnJRuJBZJ5y3OwW06jgQA?usp=sharing) | | |

				| seconds| 1824 | 1545 | 821 | 691 | 362 | 123 |

				| memory MB| 24557 | 15681 | 10595| 9007 | | |

				| % saved| | 36.14 | 56.86 | 63.32 | | |

				You can delete old model files either from the bin icon in model search or by removing the relevant cached model folder from the default Hugging Face cache directory. By default, HF uses:

				### Mistral 7b

				| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|

				| Mistral 7B Slim Orca  | 1x | 1.15x        | 2.15x        | 2.53x            | 4.61x         | **13.69x**         |

				| code | [Code](https://colab.research.google.com/drive/1mePk3KzwTD81hr5mcNcs_AX3Kbg_Ha0x?usp=sharing) | [Code](https://colab.research.google.com/drive/1dgHxjvTmX6hb0bPcLp26RXSE6_n9DKj7?usp=sharing) | [Code](https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing) | [Code](https://colab.research.google.com/drive/18yOiyX0T81mTwZqOALFSCX_tSAqju6aD?usp=sharing) | |

				| seconds      | 1813        | 1571        | 842             | 718          | 393           | 132         |

				| memory MB    | 32853       | 19385       | 12465           | 10271        |          |        |

				| % saved| | 40.99      | 62.06       | 68.74           |         |          |

				* ​ **MacOS, Linux, WSL:** `~/.cache/huggingface/hub/`

				* ​ **Windows:** `%USERPROFILE%\.cache\huggingface\hub\`
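To see what is taking up space before deleting anything, the cache can be inspected with a short script. This is a simplified sketch using only the standard library: it assumes the usual `models--<org>--<name>` folder layout and the `HF_HUB_CACHE` / `HF_HOME` environment overrides, and the helper names are hypothetical. The `huggingface_hub` library's own resolution handles more cases than this does.

```python
import os
from pathlib import Path


def hf_hub_cache_dir() -> Path:
    """Resolve the hub cache: HF_HUB_CACHE, then HF_HOME/hub, then the default."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]).expanduser() / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def list_cached_models(cache_dir: Path) -> list[tuple[str, int]]:
    """Return (repo_id, size_in_bytes) per cached model folder, largest first."""
    results: list[tuple[str, int]] = []
    if not cache_dir.is_dir():
        return results
    for folder in cache_dir.glob("models--*"):
        if not folder.is_dir():
            continue
        # "models--org--name" -> "org/name"; "models--name" -> "name"
        repo_id = folder.name[len("models--"):].replace("--", "/", 1)
        # Skip symlinks so blobs linked from snapshots/ are not counted twice.
        size = sum(
            f.stat().st_size
            for f in folder.rglob("*")
            if f.is_file() and not f.is_symlink()
        )
        results.append((repo_id, size))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for repo_id, size in list_cached_models(hf_hub_cache_dir()):
        print(f"{size / 1e9:8.2f} GB  {repo_id}")
```

The script only reads the cache; removing a model still means deleting its `models--...` folder by hand or using the bin icon in model search.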

				### CodeLlama 34b

				| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|

				| Code Llama 34B   | OOM ❌         | 0.99x        | 1.87x           | 2.61x        | 4.27x      | 12.82x      |

				| code | [Code](https://colab.research.google.com/drive/1ykfz3BqrtC_AUFegCzUQjjfUNlxp6Otc?usp=sharing) | [Code](https://colab.research.google.com/drive/12ZypxQh7OC6kBXvWZI-5d05I4m-B_hoR?usp=sharing) | [Code](https://colab.research.google.com/drive/1gdHyAx8XJsz2yNV-DHvbHjR1iCef5Qmh?usp=sharing) | [Code](https://colab.research.google.com/drive/1fm7wqx9MJ0kRrwKOfmLkK1Rmw-pySahB?usp=sharing) | |

				| seconds      | 1953  | 1982  | 1043  | 748   | 458   | 152   |

				| memory MB    | 40000 | 33217 | 27413 | 22161 |       | |

				| % saved|    | 16.96| 31.47 | 44.60 |       | | |

				## 💚 Community and Links

				| Type                                                                                                                                      | Links                                                                          |

				| ----------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ |

				| <img width="16" src="https://cdn.prod.website-files.com/6257adef93867e50d84d30e2/66e3d80db9971f10a9757c99_Symbol.svg" />  **Discord**                       | [Join Discord server](https://discord.com/invite/unsloth)                          |

				| <img width="15" src="https://redditinc.com/hs-fs/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png" />  **r/unsloth Reddit**                       | [Join Reddit community](https://reddit.com/r/unsloth)                          |

				| 📚 **Documentation & Wiki**                                                                                                               | [Read Our Docs](https://unsloth.ai/docs)                                       |

				| <img width="13" src="https://upload.wikimedia.org/wikipedia/commons/0/09/X_(formerly_Twitter)_logo_late_2025.svg" />  **Twitter (aka X)** | [Follow us on X](https://twitter.com/unslothai)                                |

				| 🔮 **Our Models**                                                                                                                         | [Unsloth Catalog](https://unsloth.ai/docs/get-started/unsloth-model-catalog)   |

				| ✍️ **Blog**                                                                                                                               | [Read our Blogs](https://unsloth.ai/blog)                                      |

				### 1 Tesla T4

				### Citation

				| 1 T4 16GB  | Hugging Face | Flash Attention | Unsloth Open    | Unsloth Pro Equal | Unsloth Pro   | Unsloth Max |

				|--------------|-------------|-----------------|-----------------|---------------|---------------|-------------|

				| Alpaca       | 1x          | 1.09x           | 1.69x           | 1.79x         | 2.93x          | **8.3x**        |

				| code | [Code](https://colab.research.google.com/drive/1XpLIV4s8Bj5uryB-X2gqM88oRGHEGdaB?usp=sharing) |    [Code](https://colab.research.google.com/drive/1LyXu6CjuymQg6ddHX8g1dpUvrMa1nn4L?usp=sharing) |    [Code](https://colab.research.google.com/drive/1gsv4LpY7C32otl1rgRo5wXTk4HIitXoM?usp=sharing) |    [Code](https://colab.research.google.com/drive/1VtULwRQwhEnVdNryjm27zXfdSM1tNfFK?usp=sharing) | | |

				| seconds       | 1599        | 1468        | 942             | 894          | 545           | 193         |

				| memory MB       | 7199        | 7059        | 6459            | 5443         |               |             |

				| % saved        |         | 1.94        | 10.28           | 24.39        |               | |

				You can cite the Unsloth repo as follows:

				```bibtex

				@software{unsloth,

				  author = {Daniel Han and Michael Han and {Unsloth team}},

				  title = {Unsloth},

				  url = {https://github.com/unslothai/unsloth},

				  year = {2023}

				}

				```

				If you trained a model with 🦥Unsloth, you can use this cool sticker!   <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/made with unsloth.png" width="200" align="center" />

				| 1 T4 16GB  | Hugging Face | Flash Attention | Unsloth Open    | Unsloth Pro Equal | Unsloth Pro   | Unsloth Max |

				|--------------|-------------|-----------------|-----------------|---------------|---------------|-------------|

				| LAION Chip2  | 1x          | 0.99x           | 1.80x           | 1.75x         | 4.15x         | **11.75x**      |

				| code | [Code](https://colab.research.google.com/drive/1EtdStADehE4FVJnU2Cu6O8p9jDYdqG2L?usp=sharing) |    [Code](https://colab.research.google.com/drive/1Ik4jO68odUiQIJ_szZ3xok5fk58WpA5Q?usp=sharing) |    [Code](https://colab.research.google.com/drive/1E2nR4V3bXIWBQIUE7uR39lYPr3UikzqH?usp=sharing) |    [Code](https://colab.research.google.com/drive/13jbj8D8FOt9KyXwZt9Yf2MsYkD8CyCVR?usp=sharing) | | |

				| seconds  | 952         | 955         | 529             | 543          | 229           | 81          | 

				| memory MB  | 6037        | 6033        | 5797            | 4855         |               | |

				| % saved   |         | 0.07        | 3.98            | 19.58        |               | |

				### License

				Unsloth uses a dual-licensing model of Apache 2.0 and AGPL-3.0. The core Unsloth package remains licensed under **[Apache 2.0](https://github.com/unslothai/unsloth?tab=Apache-2.0-1-ov-file)**, while certain optional components, such as the Unsloth Studio UI are licensed under the open-source license **[AGPL-3.0](https://github.com/unslothai/unsloth?tab=AGPL-3.0-2-ov-file)**.

				| 1 T4 16GB  | Hugging Face | Flash Attention | Unsloth Open    | Unsloth Pro Equal | Unsloth Pro   | Unsloth Max |

				|--------------|-------------|-----------------|-----------------|---------------|---------------|-------------|

				| OASST        | 1x          | 1.19x           | 1.95x           | 1.86x         | 2.58x         | **7.3x**        |

				| code | [Code](https://colab.research.google.com/drive/1aXzGgEM3yYB6SWy_XR81nQFWME40ksSy?usp=sharing) |    [Code](https://colab.research.google.com/drive/1-5MdIOp0cM0scC-CdRZhh8OYhnGHqct4?usp=sharing) |    [Code](https://colab.research.google.com/drive/1n-fgduZhRUsSjgpqNtVkXA3rSfE7iBdg?usp=sharing) |    [Code](https://colab.research.google.com/drive/1z_GlHr2M_bB4lQrPhdWC7dseZv23cBIy?usp=sharing) | | |

				| seconds        | 2640        | 2222        | 1355            | 1421         | 1024          | 362         |

				| memory MB        | 14827       | 10391       | 8413            | 7031         |               | |

				| % saved         |         | 29.92       | 43.26           | 52.58        |               | |

				This structure helps support ongoing Unsloth development while keeping the project open source and enabling the broader ecosystem to continue growing.

				| 1 T4 16GB  | Hugging Face | Flash Attention | Unsloth Open    | Unsloth Pro Equal | Unsloth Pro   | Unsloth Max |

				|--------------|-------------|-----------------|-----------------|---------------|---------------|-------------|

				| Slim Orca    | 1x          | 1.21x           | 1.77x           | 1.85x         | 2.71x         | **7.67x**       |

				| code | [Code](https://colab.research.google.com/drive/15yLlJx9IE84kzx7ikky45pRcarPyUtEs?usp=sharing) |    [Code](https://colab.research.google.com/drive/16IShIBmjKULWy87I-xURpj4nztTkAF13?usp=sharing) |    [Code](https://colab.research.google.com/drive/1CJG3XLg_OQpCz71eB7Uqx7wuK_n2b-a8?usp=sharing) |    [Code](https://colab.research.google.com/drive/1UmwuWHtlrC6MAfl9mX7A_TRfo5iSHDa-?usp=sharing) | | |

				| seconds    | 2735        | 2262        | 1545            | 1478         | 1009          | 356         |

				| memory MB    | 13933       | 10489       | 7661            | 6563         |               | |

				| % saved    |         | 24.72       | 45.02           | 52.90        |               | |

				### 2 Tesla T4s via DDP

				 | 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|----------|-------------|-----------------|--------------|---------------|-------------|

				| Alpaca       | 1x       | 0.99x       | 4.95x           | 4.44x        | 7.28x         | **20.61x**      |

				| code | [Code](https://www.kaggle.com/danielhanchen/hf-original-alpaca-t4-ddp) |   [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-alpaca-t4-ddp) |   [Code](https://www.kaggle.com/danielhanchen/unsloth-alpaca-t4-ddp) | | |

				| seconds       | 9882     | 9946        | 1996            | 2227         | 1357          | 480         |

				| memory MB| 9176 | 9128 | 6904 | 6782 |  | |

				| % saved |     | 0.52 | 24.76 | 26.09 |  | | |

				 | 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|----------|-------------|-----------------|--------------|---------------|-------------|

				| LAION Chip2  | 1x       | 1.12x       | 5.28x           | 4.21x        | 10.01x        | **28.32x**      |

				| code | [Code](https://www.kaggle.com/danielhanchen/hf-original-laion-t4-ddp) |    [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-laion-t4-ddp) |    [Code](https://www.kaggle.com/danielhanchen/unsloth-laion-t4-ddp) | | |

				| seconds  | 5418     | 4854        | 1027            | 1286         | 541           | 191         |

				| memory MB| 7316 | 7316 | 5732 | 5934 |  | |

				| % saved |     | 0.00 | 21.65 | 18.89 |  | |

				 | 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|----------|-------------|-----------------|--------------|---------------|-------------|

				| OASST (bsz=1)        | 1x       | 1.14x       | 5.56x           | 5.09x        | 5.64x         | **15.97x**      |

				| code | [Code](https://www.kaggle.com/danielhanchen/hf-original-oasst-bsz1-t4-ddp) |   [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-oasst-bsz1-t4-ddp) |   [Code](https://www.kaggle.com/danielhanchen/unsloth-oasst-bsz1-t4-ddp) | | | |

				| seconds        | 4503 | 3955 | 811 | 885 | 798 | 282 |

				| memory MB | 11896 | 11628 | 6616 | 7105 |  | |

				| % saved |     | 2.25 | 44.38 | 40.27 |  | |

				 | 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|----------|-------------|-----------------|--------------|---------------|-------------|

				| Slim Orca (bsz=1)    | 1x       | 0.97x       | 5.54x           | 4.68x        | 6.88x         | **19.46x**       |

				| code | [Code](https://www.kaggle.com/danielhanchen/hf-original-slimorca-bsz1-t4-ddp) |    [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-slimorca-bsz1-t4-ddp) |    [Code](https://www.kaggle.com/danielhanchen/unsloth-slimorca-bsz1-t4-ddp) | | |

				| seconds | 4042 | 4158 | 729 | 863 | 588 | 208 |

				| memory MB| 11010 | 11042 | 6492 | 7410 |  | |

				| % saved |     | -0.29| 41.04 | 32.70 |  | | |

				 | 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|----------|-------------|-----------------|--------------|---------------|-------------|

				| OASST (bsz=2)        | OOM ❌      | OOM ❌       |  ✓          | ✓         | ✓         | ✓ |

				| code | [Code](https://www.kaggle.com/danielhanchen/hf-original-oasst-t4-ddp) |    [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-oasst-t4-ddp) |    [Code](https://www.kaggle.com/danielhanchen/unsloth-oasst-t4-ddp) | | | |

				| seconds        | OOM      | OOM         | 2719            | 3391         | 2794          | 987         |

				| memory MB| OOM  | OOM  | 8134 | 9600 |  | |

				| % saved | OOM  | OOM  |       |       |  | |

				 | 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |

				|--------------|----------|-------------|-----------------|--------------|---------------|-------------|

				| Slim Orca (bsz=2)    | OOM ❌       | OOM ❌       |  ✓          | ✓        | ✓         |✓ |

				| code  | [Code](https://www.kaggle.com/danielhanchen/hf-original-slimorca-t4-ddp) |     [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-slimorca-t4-ddp) |     [Code](https://www.kaggle.com/danielhanchen/unsloth-slimorca-t4-ddp) | | |

				| seconds    | OOM      | OOM         | 2990            | 3444         | 2351          | 831         |

				| memory MB| OOM  | OOM  | 7594 | 8881 | | |

				| % saved | OOM  | OOM  |       |       |  | |

				# Credits

				1. [RandomInternetPreson](https://github.com/RandomInternetPreson) for confirming WSL support

				2. [152334H](https://github.com/152334H) for experimental DPO support

				3. [atgctg](https://github.com/atgctg) for syntax highlighting

				<img src="./images/unsloth loading page render.png" width="300" />

				### Thank You to

				- The [llama.cpp library](https://github.com/ggml-org/llama.cpp) that lets users run and save models with Unsloth

				- The Hugging Face team and their libraries: [transformers](https://github.com/huggingface/transformers) and [TRL](https://github.com/huggingface/trl)

				- The PyTorch and [Torch AO](https://github.com/unslothai/unsloth/pull/3391) teams for their contributions

				- NVIDIA for their [NeMo DataDesigner](https://github.com/NVIDIA-NeMo/DataDesigner) library and their contributions

				- And of course every single person who has contributed to or used Unsloth!

									
										79

build.sh
									
										Normal file
									
				@ -0,0 +1,79 @@

				#!/usr/bin/env bash

				set -euo pipefail

				# 1. Build frontend (Vite outputs to dist/)

				cd studio/frontend

				# Clean stale dist to force a full rebuild

				rm -rf dist

				# Tailwind v4's oxide scanner respects .gitignore in parent directories.

				# Python venvs create a .gitignore with "*" (ignore everything), which

				# prevents Tailwind from scanning .tsx source files for class names.

				# Temporarily hide any such .gitignore during the build, then restore it.

				_HIDDEN_GITIGNORES=()

				_dir="$(pwd)"

				while [ "$_dir" != "/" ]; do

				    _dir="$(dirname "$_dir")"

				    if [ -f "$_dir/.gitignore" ] && grep -qx '\*' "$_dir/.gitignore" 2>/dev/null; then

				        mv "$_dir/.gitignore" "$_dir/.gitignore._twbuild"

				        _HIDDEN_GITIGNORES+=("$_dir/.gitignore")

				    fi

				done

				_restore_gitignores() {

				    for _gi in "${_HIDDEN_GITIGNORES[@]+"${_HIDDEN_GITIGNORES[@]}"}"; do

				        mv "${_gi}._twbuild" "$_gi" 2>/dev/null || true

				    done

				}

				trap _restore_gitignores EXIT

				# Use bun for install if available (faster), fall back to npm.

				_install_ok=false

				if command -v bun &>/dev/null; then

				    if bun install; then

				        _install_ok=true

				    else

				        echo "⚠ bun install failed, falling back to npm"

				        rm -rf node_modules

				    fi

				fi

				if [ "$_install_ok" != "true" ]; then

				    if ! npm install; then

				        echo "❌ ERROR: package install failed" >&2

				        exit 1

				    fi

				fi

				npm run build       # outputs to studio/frontend/dist/

				_restore_gitignores

				trap - EXIT

				# Validate CSS output -- catch truncated Tailwind builds before packaging

				MAX_CSS_SIZE=$(find dist/assets -name '*.css' -exec wc -c {} + 2>/dev/null | sort -n | tail -1 | awk '{print $1}')

				if [ -z "$MAX_CSS_SIZE" ]; then

				    echo "❌ ERROR: No CSS files were emitted into dist/assets."

				    echo "   The frontend build may have failed silently."

				    exit 1

				fi

				if [ "$MAX_CSS_SIZE" -lt 100000 ]; then

				    echo "❌ ERROR: Largest CSS file is only $((MAX_CSS_SIZE / 1024))KB (expected >100KB)."

				    echo "   Tailwind may not have scanned all source files."

				    echo "   Check for .gitignore files blocking the Tailwind oxide scanner."

				    exit 1

				fi

				echo "✅ Frontend CSS validated (${MAX_CSS_SIZE} bytes)"

				cd ../..

				# 2. Clean old artifacts

				rm -rf build dist *.egg-info

				# 3. Build wheel

				python -m build

				# 4. Optionally publish

				if [ "${1:-}" = "publish" ]; then

				    python -m twine upload dist/*

				fi
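The CSS validation step above can be restated in Python. This is a minimal sketch, not part of the repository: the names `largest_css_size`, `css_looks_complete` and `MIN_CSS_BYTES` are hypothetical, and the demonstration files are throwaway. It takes the largest `*.css` file under an assets directory and compares it with the same 100000-byte threshold.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

MIN_CSS_BYTES = 100_000  # same threshold as the shell check (assumed constant name)


def largest_css_size(assets_dir: Path) -> int | None:
    """Return the size in bytes of the largest *.css file, or None if there is none."""
    sizes = [p.stat().st_size for p in assets_dir.rglob("*.css") if p.is_file()]
    return max(sizes) if sizes else None


def css_looks_complete(assets_dir: Path, minimum: int = MIN_CSS_BYTES) -> bool:
    """True when at least one CSS file exists and the largest one reaches the minimum."""
    size = largest_css_size(assets_dir)
    return size is not None and size >= minimum


# Demonstration on a temporary directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    assets = Path(tmp)
    empty_ok = css_looks_complete(assets)        # no CSS emitted at all
    (assets / "small.css").write_bytes(b"a" * 2_000)
    truncated_ok = css_looks_complete(assets)    # resembles a truncated build
    (assets / "index.css").write_bytes(b"a" * 150_000)
    full_ok = css_looks_complete(assets)
    largest = largest_css_size(assets)

print(empty_ok, truncated_ok, full_ok, largest)
```

Taking `max()` over per-file sizes keeps the check independent of how any external tool formats its output.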

									
cli.py (Normal file, 7 lines changed)

				@ -0,0 +1,7 @@

				# SPDX-License-Identifier: AGPL-3.0-only

				# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0

				from unsloth_cli import app

				if __name__ == "__main__":

				    app()

BIN images/Assistant.png (Normal file). Binary file not shown. Size: 81 KiB
BIN images/Documentation Button.png (Normal file). Binary file not shown. Size: 12 KiB
BIN images/Merge.png (Normal file). Binary file not shown. Size: 31 KiB
BIN images/Run.png (Normal file). Binary file not shown. Size: 11 KiB
BIN images/STUDIO BLACK LOGO.png (Normal file). Binary file not shown. Size: 162 KiB
BIN images/STUDIO WHITE LOGO.png (Normal file). Binary file not shown. Size: 159 KiB
BIN images/Terminal_Type.png (Normal file). Binary file not shown. Size: 68 KiB
BIN images/Where_Terminal.png (Normal file). Binary file not shown. Size: 175 KiB
BIN images/buy me a coffee button.png (Normal file). Binary file not shown. Size: 18 KiB
BIN images/documentation github button.png (Normal file). Binary file not shown. Size: 12 KiB
BIN images/documentation green button.png (Normal file). Binary file not shown. Size: 12 KiB
BIN images/documentation lighter.png (Normal file). Binary file not shown. Size: 12 KiB
BIN images/documentation white button.png (Normal file). Binary file not shown. Size: 11 KiB
BIN images/made with unsloth.png (Normal file). Binary file not shown. Size: 69 KiB
BIN images/ollama.png (Normal file). Binary file not shown. Size: 66 KiB
BIN images/start free finetune button.png (Normal file). Binary file not shown. Size: 11 KiB
BIN images/unsloth end.png (Normal file). Binary file not shown. Size: 871 KiB
BIN images/unsloth logo black text.png. Binary file not shown. Size before: 57 KiB, after: 354 KiB
BIN images/unsloth logo only.png. Binary file not shown. Size before: 56 KiB, after: 59 KiB
BIN images/unsloth logo white text.png. Binary file not shown. Size before: 58 KiB, after: 351 KiB
BIN images/unsloth new logo.png. Binary file not shown. Size before: 59 KiB, after: 59 KiB
BIN images/unsloth sticker.png (Normal file). Binary file not shown. Size: 1.2 MiB

install.ps1 (Normal file, 1125 lines changed)

File diff suppressed because it is too large.

install.sh (Executable file, 1671 lines changed)

File diff suppressed because it is too large.

pyproject.toml (1183 lines changed)

File diff suppressed because it is too large.

									
scripts/enforce_kwargs_spacing.py (Executable file, 179 lines changed)

				@ -0,0 +1,179 @@

				#!/usr/bin/env python3

				"""Ensure keyword arguments use spaces around '=' and prune redundant pass statements."""

				from __future__ import annotations

				import ast

				import argparse

				import io

				import sys

				import tokenize

				from collections import defaultdict

				from pathlib import Path

				def enforce_spacing(text: str) -> tuple[str, bool]:

				    """Return updated text with keyword '=' padded by spaces, plus change flag."""

				    lines = text.splitlines(keepends=True)

				    if not lines:

				        return text, False

				    offsets: dict[int, int] = defaultdict(int)

				    changed = False

				    reader = io.StringIO(text).readline

				    for token in tokenize.generate_tokens(reader):

				        if token.type != tokenize.OP or token.string != "=":

				            continue

				        line_index = token.start[0] - 1

				        col = token.start[1] + offsets[line_index]

				        if line_index < 0 or line_index >= len(lines):

				            continue

				        line = lines[line_index]

				        if col >= len(line) or line[col] != "=":

				            continue

				        line_changed = False

				        # Insert a space before '=' when missing and not preceded by whitespace.

				        if col > 0 and line[col - 1] not in {" ", "\t"}:

				            line = f"{line[:col]} {line[col:]}"

				            offsets[line_index] += 1

				            col += 1

				            line_changed = True

				            changed = True

				        # Insert a space after '=' when missing and not followed by whitespace or newline.

				        next_index = col + 1

				        if next_index < len(line) and line[next_index] not in {" ", "\t", "\n", "\r"}:

				            line = f"{line[:next_index]} {line[next_index:]}"

				            offsets[line_index] += 1

				            line_changed = True

				            changed = True

				        if line_changed:

				            lines[line_index] = line

				    if not changed:

				        return text, False

				    return "".join(lines), True

				def remove_redundant_passes(text: str) -> tuple[str, bool]:

				    """Drop pass statements that share a block with other executable code."""

				    try:

				        tree = ast.parse(text)

				    except SyntaxError:

				        return text, False

				    redundant: list[ast.Pass] = []

				    def visit(node: ast.AST) -> None:

				        for attr in ("body", "orelse", "finalbody"):

				            value = getattr(node, attr, None)

				            if not isinstance(value, list) or len(value) <= 1:

				                continue

				            for stmt in value:

				                if isinstance(stmt, ast.Pass):

				                    redundant.append(stmt)

				            for stmt in value:

				                if isinstance(stmt, ast.AST):

				                    visit(stmt)

				        handlers = getattr(node, "handlers", None)

				        if handlers:

				            for handler in handlers:

				                visit(handler)

				    visit(tree)

				    if not redundant:

				        return text, False

				    lines = text.splitlines(keepends=True)

				    changed = False

				    for node in sorted(

				        redundant, key=lambda item: (item.lineno, item.col_offset), reverse=True

				    ):

				        start = node.lineno - 1

				        end = (node.end_lineno or node.lineno) - 1

				        if start >= len(lines):

				            continue

				        changed = True

				        if start == end:

				            line = lines[start]

				            col_start = node.col_offset

				            col_end = node.end_col_offset or (col_start + 4)

				            segment = line[:col_start] + line[col_end:]

				            lines[start] = segment if segment.strip() else ""

				            continue

				        # Defensive fall-back for unexpected multi-line 'pass'.

				        prefix = lines[start][: node.col_offset]

				        lines[start] = prefix if prefix.strip() else ""

				        for idx in range(start + 1, end):

				            lines[idx] = ""

				        suffix = lines[end][(node.end_col_offset or 0) :]

				        lines[end] = suffix

				    # Normalise to ensure lines end with newlines except at EOF.

				    result_lines: list[str] = []

				    for index, line in enumerate(lines):

				        if not line:

				            continue

				        if index < len(lines) - 1 and not line.endswith("\n"):

				            result_lines.append(f"{line}\n")

				        else:

				            result_lines.append(line)

				    return "".join(result_lines), changed

				def process_file(path: Path) -> bool:

				    """Apply both fixers to one file; return True if the file was rewritten."""

				    try:

				        with tokenize.open(path) as handle:

				            original = handle.read()

				            encoding = handle.encoding

				    except (OSError, SyntaxError) as exc:  # SyntaxError from tokenize on invalid python

				        print(f"Failed to read {path}: {exc}", file=sys.stderr)

				        return False

				    updated, changed = enforce_spacing(original)

				    updated, removed = remove_redundant_passes(updated)

				    if changed or removed:

				        path.write_text(updated, encoding=encoding)

				        return True

				    return False

				def main(argv: list[str]) -> int:

				    """Fix each given file in place, report the files that changed, and return 0."""

				    parser = argparse.ArgumentParser(description=__doc__)

				    parser.add_argument("files", nargs="+", help="Python files to fix")

				    args = parser.parse_args(argv)

				    touched: list[Path] = []

				    self_path = Path(__file__).resolve()

				    for entry in args.files:

				        path = Path(entry)

				        # Skip modifying this script to avoid self-edit loops.

				        if path.resolve() == self_path:

				            continue

				        if not path.exists() or path.is_dir():

				            continue

				        if process_file(path):

				            touched.append(path)

				    if touched:

				        for path in touched:

				            print(f"Adjusted kwarg spacing in {path}")

				    return 0

				if __name__ == "__main__":

				    sys.exit(main(sys.argv[1:]))
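The token-based padding in `enforce_spacing` can be tried on a small input. The block below is a simplified sketch of the same idea, not the script itself: `pad_equals` is a hypothetical name, and it drops the bounds checks and change flag of the original. It relies on `tokenize` reporting `==` as its own operator token, so only a bare `=` is padded. Note that this pads every bare `=`, including plain assignments, not only keyword arguments.

```python
import io
import tokenize
from collections import defaultdict


def pad_equals(text: str) -> str:
    """Pad every bare '=' operator token with single spaces (simplified sketch)."""
    lines = text.splitlines(keepends=True)
    offsets = defaultdict(int)  # characters already inserted on each line
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        if tok.type != tokenize.OP or tok.string != "=":
            continue  # '==', '<=' and '+=' arrive as separate tokens and are skipped
        row = tok.start[0] - 1
        col = tok.start[1] + offsets[row]
        line = lines[row]
        # Space before '=' when the previous character is not whitespace.
        if col > 0 and line[col - 1] not in " \t":
            line = line[:col] + " " + line[col:]
            offsets[row] += 1
            col += 1
        # Space after '=' when the next character is not whitespace or a newline.
        if col + 1 < len(line) and line[col + 1] not in " \t\r\n":
            line = line[: col + 1] + " " + line[col + 1 :]
            offsets[row] += 1
        lines[row] = line
    return "".join(lines)


sample = "f(b==2, a=1)\nx=3\n"
result = pad_equals(sample)
print(result)
```

For this sample the comparison `b==2` is left alone, while `a=1` and `x=3` become `a = 1` and `x = 3`.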

									
scripts/install_gemma4_mlx.sh (Executable file, 169 lines changed)

				@ -0,0 +1,169 @@

				#!/bin/bash

				set -e

				# ============================================================

				# Gemma 4 MLX — One-command setup + inference

				#

				# Usage:

				#   bash install_gemma4_mlx.sh [--venv-dir DIR]

				#

				# This script:

				#   1. Creates a Python virtual environment

				#   2. Installs uv, mlx-vlm, transformers

				# ============================================================

				# ── Output style (inspired by unsloth/install.sh) ─────────────

				RULE=""

				_rule_i=0

				while [ "$_rule_i" -lt 52 ]; do

				    RULE="${RULE}─"

				    _rule_i=$((_rule_i + 1))

				done

				if [ -n "${NO_COLOR:-}" ]; then

				    C_TITLE= C_DIM= C_OK= C_WARN= C_ERR= C_RST=

				elif [ -t 1 ] || [ -n "${FORCE_COLOR:-}" ]; then

				    _ESC="$(printf '\033')"

				    C_TITLE="${_ESC}[38;5;117m"

				    C_DIM="${_ESC}[38;5;245m"

				    C_OK="${_ESC}[38;5;108m"

				    C_WARN="${_ESC}[38;5;136m"

				    C_ERR="${_ESC}[91m"

				    C_RST="${_ESC}[0m"

				else

				    C_TITLE= C_DIM= C_OK= C_WARN= C_ERR= C_RST=

				fi

				step()    { printf "  ${C_DIM}%-18.18s${C_RST}${3:-$C_OK}%s${C_RST}\n" "$1" "$2"; }

				substep() { printf "  ${C_DIM}%-18s${2:-$C_DIM}%s${C_RST}\n" "" "$1"; }

				fail()    { step "error" "$1" "$C_ERR"; exit 1; }

				# ── Parse flags ───────────────────────────────────────────────

				VENV_DIR=""

				_next_is_venv=false

				for arg in "$@"; do

				    if [ "$_next_is_venv" = true ]; then

				        VENV_DIR="$arg"

				        _next_is_venv=false

				        continue

				    fi

				    case "$arg" in

				        --venv-dir)  _next_is_venv=true ;;

				    esac

				done

				# Default venv location

				if [ -z "$VENV_DIR" ]; then

				    VENV_DIR="$HOME/.unsloth/unsloth_gemma4_mlx"

				fi

				# ── Banner ────────────────────────────────────────────────────

				echo ""

				printf "  ${C_TITLE}%s${C_RST}\n" "💎 Gemma 4 MLX Installer"

				printf "  ${C_DIM}%s${C_RST}\n" "$RULE"

				echo ""

				# ── Platform check ────────────────────────────────────────────

				if [ "$(uname)" != "Darwin" ]; then

				    fail "MLX requires macOS with Apple Silicon. Detected: $(uname)"

				fi

				_ARCH=$(uname -m)

				if [ "$_ARCH" != "arm64" ]; then

				    step "warning" "Apple Silicon recommended (detected: $_ARCH)" "$C_WARN"

				fi

				step "platform" "macOS ($_ARCH)"

				# ── Detect Python ─────────────────────────────────────────────

				PYTHON=""

				for _candidate in python3.12 python3.11 python3.13 python3; do

				    if command -v "$_candidate" >/dev/null 2>&1; then

				        PYTHON="$_candidate"

				        break

				    fi

				done

				if [ -z "$PYTHON" ]; then

				    fail "Python 3 not found. Install via: brew install python@3.12"

				fi

				_PY_VERSION=$("$PYTHON" -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}')")

				step "python" "$PYTHON ($_PY_VERSION)"

				# ── Create virtual environment ────────────────────────────────

				if [ -x "$VENV_DIR/bin/python" ]; then

				    step "venv" "using existing environment"

				    substep "$VENV_DIR"

				else

				    step "venv" "creating virtual environment"

				    substep "$VENV_DIR"

				    mkdir -p "$(dirname "$VENV_DIR")"

				    "$PYTHON" -m venv "$VENV_DIR"

				fi

				# ── Install uv ───────────────────────────────────────────────

				if ! command -v uv >/dev/null 2>&1; then

				    step "uv" "installing uv package manager..."

				    _uv_tmp=$(mktemp)

				    curl -LsSf "https://astral.sh/uv/install.sh" -o "$_uv_tmp"

				    sh "$_uv_tmp" </dev/null >/dev/null 2>&1

				    rm -f "$_uv_tmp"

				    if [ -f "$HOME/.local/bin/env" ]; then

				        . "$HOME/.local/bin/env"

				    fi

				    export PATH="$HOME/.local/bin:$PATH"

				    substep "done"

				else

				    step "uv" "found $(uv --version 2>/dev/null || echo 'uv')"

				fi

				_VENV_PY="$VENV_DIR/bin/python"

				# ── Install dependencies ──────────────────────────────────────

				step "install" "installing mlx-vlm..."

				uv pip install --python "$_VENV_PY" -q mlx-vlm

				substep "done"

				step "install" "installing transformers>=5.5.0..."

				if uv pip install --python "$_VENV_PY" -q "transformers>=5.5.0" 2>/dev/null; then

				    substep "installed from PyPI"

				else

				    substep "PyPI install failed (Python <3.10?), trying GitHub..."

				    if uv pip install --python "$_VENV_PY" -q "git+https://github.com/huggingface/transformers.git@v5.5-release" 2>/dev/null; then

				        substep "installed from huggingface/transformers v5.5-release"

				    else

				        step "warning" "could not install transformers>=5.5.0" "$C_WARN"

				        substep "tried: PyPI, huggingface/transformers v5.5-release"

				    fi

				fi

				# ── Verify installation ──────────────────────────────────────

				if "$_VENV_PY" -c "import mlx_vlm"; then

				    substep "mlx-vlm verified"

				else

				    fail "Installation verification failed."

				fi

				# ── Done ──────────────────────────────────────────────────────

				echo ""

				printf "  ${C_TITLE}%s${C_RST}\n" "Gemma 4 MLX installed!"

				printf "  ${C_DIM}%s${C_RST}\n" "$RULE"

				echo ""

				step "available models" "unsloth/gemma-4-E2B-it-UD-MLX-4bit"

				substep "unsloth/gemma-4-E4B-it-UD-MLX-4bit"

				substep "unsloth/gemma-4-26b-a4b-it-UD-MLX-4bit"

				substep "unsloth/gemma-4-31b-it-UD-MLX-4bit"

				echo ""

				step "venv activate" "source ${VENV_DIR}/bin/activate"

				echo ""

				step "text chat" "python -m mlx_vlm.chat --model unsloth/gemma-4-E2B-it-UD-MLX-4bit"

				echo ""

				step "vision chat" "python -m mlx_vlm.chat --model unsloth/gemma-4-31b-it-UD-MLX-4bit"

				substep "Use /image path/to/image.jpg to load an image"

				echo ""

				step "gradio UI" "python -m mlx_vlm.chat_ui --model unsloth/gemma-4-31b-it-UD-MLX-4bit"

				echo ""

				printf "  ${C_DIM}%s${C_RST}\n" "$RULE"

				echo ""
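The `--venv-dir` handling in the script above is a small state machine: one flag marks the next argument as the value, and everything else is ignored. A minimal Python sketch of that loop follows; `parse_venv_dir` is a hypothetical helper introduced only for illustration.

```python
from __future__ import annotations

from pathlib import Path


def parse_venv_dir(argv: list[str], default: str) -> str:
    """Return the value following --venv-dir, or the default when it is absent."""
    venv_dir = ""
    next_is_venv = False
    for arg in argv:
        if next_is_venv:
            venv_dir = arg
            next_is_venv = False
            continue
        if arg == "--venv-dir":
            next_is_venv = True
        # Any other argument is ignored, as in the shell case statement.
    return venv_dir or default


default_dir = str(Path.home() / ".unsloth" / "unsloth_gemma4_mlx")
print(parse_venv_dir(["--venv-dir", "/tmp/g4"], default_dir))
print(parse_venv_dir(["--verbose"], default_dir) == default_dir)
print(parse_venv_dir(["--venv-dir"], default_dir) == default_dir)
```

The last call shows one consequence of this design: a trailing `--venv-dir` with no value falls back to the default without any error.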

									
scripts/install_qwen3_6_mlx.sh (Normal file, 191 lines changed)

				@ -0,0 +1,191 @@

				#!/bin/bash

				set -e

				# ============================================================

				# Qwen3.6 MLX — One-command setup + inference

				#

				# Usage:

				#   bash install_qwen3_6_mlx.sh [--venv-dir DIR]

				#

				# This script:

				#   1. Creates a Python virtual environment

				#   2. Installs uv, mlx-vlm, transformers, torch, torchvision

				# ============================================================

				# ── Output style (inspired by unsloth/install.sh) ─────────────

				RULE=""

				_rule_i=0

				while [ "$_rule_i" -lt 52 ]; do

				    RULE="${RULE}─"

				    _rule_i=$((_rule_i + 1))

				done

				if [ -n "${NO_COLOR:-}" ]; then

				    C_TITLE= C_DIM= C_OK= C_WARN= C_ERR= C_RST=

				elif [ -t 1 ] || [ -n "${FORCE_COLOR:-}" ]; then

				    _ESC="$(printf '\033')"

				    C_TITLE="${_ESC}[38;5;117m"

				    C_DIM="${_ESC}[38;5;245m"

				    C_OK="${_ESC}[38;5;108m"

				    C_WARN="${_ESC}[38;5;136m"

				    C_ERR="${_ESC}[91m"

				    C_RST="${_ESC}[0m"

				else

				    C_TITLE= C_DIM= C_OK= C_WARN= C_ERR= C_RST=

				fi

				step()    { printf "  ${C_DIM}%-18.18s${C_RST}${3:-$C_OK}%s${C_RST}\n" "$1" "$2"; }

				substep() { printf "  ${C_DIM}%-18s${2:-$C_DIM}%s${C_RST}\n" "" "$1"; }

				fail()    { step "error" "$1" "$C_ERR"; exit 1; }

				# ── Parse flags ───────────────────────────────────────────────

				VENV_DIR=""

				_next_is_venv=false

				for arg in "$@"; do

				    if [ "$_next_is_venv" = true ]; then

				        VENV_DIR="$arg"

				        _next_is_venv=false

				        continue

				    fi

				    case "$arg" in

				        --venv-dir)  _next_is_venv=true ;;

				    esac

				done

				# Default venv location

				if [ -z "$VENV_DIR" ]; then

				    VENV_DIR="$HOME/.unsloth/unsloth_qwen3_6_mlx"

				fi

				# ── Banner ────────────────────────────────────────────────────

				echo ""

				printf "  ${C_TITLE}%s${C_RST}\n" "Qwen3.6 MLX Installer"

				printf "  ${C_DIM}%s${C_RST}\n" "$RULE"

				echo ""

				# ── Platform check ────────────────────────────────────────────

				if [ "$(uname)" != "Darwin" ]; then

				    fail "MLX requires macOS with Apple Silicon. Detected: $(uname)"

				fi

				_ARCH=$(uname -m)

				if [ "$_ARCH" != "arm64" ]; then

				    step "warning" "Apple Silicon recommended (detected: $_ARCH)" "$C_WARN"

				fi

				step "platform" "macOS ($_ARCH)"

				# ── Detect Python ─────────────────────────────────────────────

				PYTHON=""

				for _candidate in python3.12 python3.11 python3.13 python3; do

				    if command -v "$_candidate" >/dev/null 2>&1; then

				        PYTHON="$_candidate"

				        break

				    fi

				done

				if [ -z "$PYTHON" ]; then

				    fail "Python 3 not found. Install via: brew install python@3.12"

				fi

				_PY_VERSION=$("$PYTHON" -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}')")

				step "python" "$PYTHON ($_PY_VERSION)"

				# ── Create virtual environment ────────────────────────────────

				if [ -x "$VENV_DIR/bin/python" ]; then

				    step "venv" "using existing environment"

				    substep "$VENV_DIR"

				else

				    step "venv" "creating virtual environment"

				    substep "$VENV_DIR"

				    mkdir -p "$(dirname "$VENV_DIR")"

				    "$PYTHON" -m venv "$VENV_DIR"

				fi

				# ── Install uv ───────────────────────────────────────────────

				if ! command -v uv >/dev/null 2>&1; then

				    step "uv" "installing uv package manager..."

				    _uv_tmp=$(mktemp)

				    curl -LsSf "https://astral.sh/uv/install.sh" -o "$_uv_tmp"

				    sh "$_uv_tmp" </dev/null

				    rm -f "$_uv_tmp"

				    if [ -f "$HOME/.local/bin/env" ]; then

				        . "$HOME/.local/bin/env"

				    fi

				    export PATH="$HOME/.local/bin:$PATH"

				    substep "done"

				else

				    step "uv" "found $(uv --version 2>/dev/null || echo 'uv')"

				fi

				_VENV_PY="$VENV_DIR/bin/python"

				# ── Install dependencies ──────────────────────────────────────

				step "install" "installing mlx-vlm..."

				uv pip install --python "$_VENV_PY" -q mlx-vlm

				substep "done"

				step "install" "installing transformers>=5.2.0..."

				if uv pip install --python "$_VENV_PY" -q "transformers>=5.2.0"; then

				    substep "installed from PyPI"

				else

				    substep "PyPI install failed, trying GitHub..."

				    if uv pip install --python "$_VENV_PY" -q "git+https://github.com/huggingface/transformers.git"; then

				        substep "installed from huggingface/transformers main"

				    else

				        fail "Could not install transformers>=5.2.0 (required for Qwen3.5/3.6 model support). Please check your Python version (>=3.10 required) and network connection, then try again."

				    fi

				fi

				step "install" "installing torch + torchvision (needed for Qwen3 VL processor)..."

				uv pip install --python "$_VENV_PY" -q torch torchvision

				substep "done"

				# ── Verify installation ──────────────────────────────────────

				if "$_VENV_PY" -c "import mlx_vlm; import torch; import torchvision; import transformers"; then

				    substep "mlx-vlm + torch + transformers verified"

				else

				    fail "Installation verification failed. Please ensure Python >=3.10 and try again."

				fi

				# ── Apply patches for multi-turn image chat ──────────────────

				_PATCH_BASE="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/fix/ui-fix/unsloth/models/patches/mlx_vlm_qwen3_5"

				_SITE_PKGS=$("$_VENV_PY" -c "import site; print(site.getsitepackages()[0])")

				step "patch" "fixing multi-turn image chat..."

				if curl -sSLf "${_PATCH_BASE}/qwen3_5.py" -o "${_SITE_PKGS}/mlx_vlm/models/qwen3_5/qwen3_5.py"; then

				    substep "patched qwen3_5.py (MRoPE position reset)"

				else

				    step "warning" "failed to download qwen3_5.py patch — multi-turn image chat may not work" "$C_WARN"

				fi

				if curl -sSLf "${_PATCH_BASE}/generate.py" -o "${_SITE_PKGS}/mlx_vlm/generate.py"; then

				    substep "patched generate.py (mask trim on cache reuse)"

				else

				    step "warning" "failed to download generate.py patch — multi-turn image chat may not work" "$C_WARN"

				fi

				# Clear pycache so patches take effect

				find "${_SITE_PKGS}/mlx_vlm" -name "__pycache__" -type d -exec rm -rf {} + 2>/dev/null || true

				substep "cleared bytecode cache"

				# ── Done ──────────────────────────────────────────────────────

				echo ""

				printf "  ${C_TITLE}%s${C_RST}\n" "Qwen3.6 MLX installed!"

				printf "  ${C_DIM}%s${C_RST}\n" "$RULE"

				echo ""

				step "available models" "unsloth/Qwen3.6-35B-A3B-UD-MLX-3bit"

				substep "unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit"

				substep "unsloth/Qwen3.6-35B-A3B-MLX-8bit"

				echo ""

				step "venv activate" "source ${VENV_DIR}/bin/activate"

				echo ""

				step "vision chat" "python -m mlx_vlm.chat --model unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit"

				substep "Use /image path/to/image.jpg to load an image"

				echo ""

				step "gradio UI" "python -m mlx_vlm.chat_ui --model unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit"

				echo ""

				printf "  ${C_DIM}%s${C_RST}\n" "$RULE"

				echo ""
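The verification step in the script imports the required packages inside the venv. A lighter check, sketched below under the assumption that only top-level module names are passed, asks the import system whether each module can be located without executing it. `missing_modules` is a hypothetical helper; unlike the script's real `import` check, it will not catch packages that are present but fail at import time.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["mlx_vlm", "torch", "torchvision", "transformers"]
missing = missing_modules(required)
if missing:
    print("missing:", ", ".join(missing))
else:
    print("all required modules found")
```

Run with the venv's interpreter, this reports which of the four packages are absent instead of failing on the first one.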

									
scripts/run_ruff_format.py (Executable file, 30 lines changed)

				@ -0,0 +1,30 @@

				#!/usr/bin/env python3

				"""Run `ruff format` followed by kwarg spacing enforcement."""

				from __future__ import annotations

				import subprocess

				import sys

				from pathlib import Path

				HERE = Path(__file__).resolve().parent

				def main(argv: list[str]) -> int:

				    """Format the existing files with ruff, then enforce kwarg spacing; return the exit code."""

				    files = [arg for arg in argv if Path(arg).exists()]

				    if not files:

				        return 0

				    ruff_cmd = [sys.executable, "-m", "ruff", "format", *files]

				    ruff_proc = subprocess.run(ruff_cmd)

				    if ruff_proc.returncode != 0:

				        return ruff_proc.returncode

				    spacing_script = HERE / "enforce_kwargs_spacing.py"

				    spacing_cmd = [sys.executable, str(spacing_script), *files]

				    spacing_proc = subprocess.run(spacing_cmd)

				    return spacing_proc.returncode

				if __name__ == "__main__":

				    raise SystemExit(main(sys.argv[1:]))
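The wrapper above runs two tools in sequence and stops at the first failure. The same pattern, generalised to any list of commands, might look like the sketch below. `run_in_order` is a hypothetical name, and the demo commands are trivial Python one-liners rather than ruff.

```python
import subprocess
import sys


def run_in_order(commands):
    """Run each command in turn; stop at and return the first non-zero exit code."""
    for cmd in commands:
        code = subprocess.run(cmd).returncode
        if code != 0:
            return code
    return 0


ok = [sys.executable, "-c", "pass"]
bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_in_order([ok, ok]))
print(run_in_order([ok, bad, ok]))
```

Returning the failing tool's own exit code, as the wrapper does, lets a caller such as a pre-commit hook see which status was raised.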

studio/LICENSE.AGPL-3.0 (Normal file, 661 lines changed)

 @ -0,0 +1,661 @@
                     GNU AFFERO GENERAL PUBLIC LICENSE
                        Version 3, 19 November 2007
  Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
  Everyone is permitted to copy and distribute verbatim copies
  of this license document, but changing it is not allowed.
                             Preamble
   The GNU Affero General Public License is a free, copyleft license for
 software and other kinds of works, specifically designed to ensure
 cooperation with the community in the case of network server software.
   The licenses for most software and other practical works are designed
 to take away your freedom to share and change the works.  By contrast,
 our General Public Licenses are intended to guarantee your freedom to
 share and change all versions of a program--to make sure it remains free
 software for all its users.
   When we speak of free software, we are referring to freedom, not
 price.  Our General Public Licenses are designed to make sure that you
 have the freedom to distribute copies of free software (and charge for
 them if you wish), that you receive source code or can get it if you
 want it, that you can change the software or use pieces of it in new
 free programs, and that you know you can do these things.
   Developers that use our General Public Licenses protect your rights
 with two steps: (1) assert copyright on the software, and (2) offer
 you this License which gives you legal permission to copy, distribute
 and/or modify the software.
   A secondary benefit of defending all users' freedom is that
 improvements made in alternate versions of the program, if they
 receive widespread use, become available for other developers to
 incorporate.  Many developers of free software are heartened and
 encouraged by the resulting cooperation.  However, in the case of
 software used on network servers, this result may fail to come about.
 The GNU General Public License permits making a modified version and
 letting the public access it on a server without ever releasing its
 source code to the public.
   The GNU Affero General Public License is designed specifically to
 ensure that, in such cases, the modified source code becomes available
 to the community.  It requires the operator of a network server to
 provide the source code of the modified version running there to the
 users of that server.  Therefore, public use of a modified version, on
 a publicly accessible server, gives the public access to the source
 code of the modified version.
   An older license, called the Affero General Public License and
 published by Affero, was designed to accomplish similar goals.  This is
 a different license, not a version of the Affero GPL, but Affero has
 released a new version of the Affero GPL which permits relicensing under
 this license.
   The precise terms and conditions for copying, distribution and
 modification follow.
                        TERMS AND CONDITIONS
 0. Definitions.
   "This License" refers to version 3 of the GNU Affero General Public License.
   "Copyright" also means copyright-like laws that apply to other kinds of
 works, such as semiconductor masks.
   "The Program" refers to any copyrightable work licensed under this
 License.  Each licensee is addressed as "you".  "Licensees" and
 "recipients" may be individuals or organizations.
   To "modify" a work means to copy from or adapt all or part of the work
 in a fashion requiring copyright permission, other than the making of an
 exact copy.  The resulting work is called a "modified version" of the
 earlier work or a work "based on" the earlier work.
   A "covered work" means either the unmodified Program or a work based
 on the Program.
   To "propagate" a work means to do anything with it that, without
 permission, would make you directly or secondarily liable for
 infringement under applicable copyright law, except executing it on a
 computer or modifying a private copy.  Propagation includes copying,
 distribution (with or without modification), making available to the
 public, and in some countries other activities as well.
   To "convey" a work means any kind of propagation that enables other
 parties to make or receive copies.  Mere interaction with a user through
 a computer network, with no transfer of a copy, is not conveying.
   An interactive user interface displays "Appropriate Legal Notices"
 to the extent that it includes a convenient and prominently visible
 feature that (1) displays an appropriate copyright notice, and (2)
 tells the user that there is no warranty for the work (except to the
 extent that warranties are provided), that licensees may convey the
 work under this License, and how to view a copy of this License.  If
 the interface presents a list of user commands or options, such as a
 menu, a prominent item in the list meets this criterion.
 1. Source Code.
   The "source code" for a work means the preferred form of the work
 for making modifications to it.  "Object code" means any non-source
 form of a work.
   A "Standard Interface" means an interface that either is an official
 standard defined by a recognized standards body, or, in the case of
 interfaces specified for a particular programming language, one that
 is widely used among developers working in that language.
   The "System Libraries" of an executable work include anything, other
 than the work as a whole, that (a) is included in the normal form of
 packaging a Major Component, but which is not part of that Major
 Component, and (b) serves only to enable use of the work with that
 Major Component, or to implement a Standard Interface for which an
 implementation is available to the public in source code form.  A
 "Major Component", in this context, means a major essential component
 (kernel, window system, and so on) of the specific operating system
 (if any) on which the executable work runs, or a compiler used to
 produce the work, or an object code interpreter used to run it.
   The "Corresponding Source" for a work in object code form means all
 the source code needed to generate, install, and (for an executable
 work) run the object code and to modify the work, including scripts to
 control those activities.  However, it does not include the work's
 System Libraries, or general-purpose tools or generally available free
 programs which are used unmodified in performing those activities but
 which are not part of the work.  For example, Corresponding Source
 includes interface definition files associated with source files for
 the work, and the source code for shared libraries and dynamically
 linked subprograms that the work is specifically designed to require,
 such as by intimate data communication or control flow between those
 subprograms and other parts of the work.
   The Corresponding Source need not include anything that users
 can regenerate automatically from other parts of the Corresponding
 Source.
   The Corresponding Source for a work in source code form is that
 same work.
 2. Basic Permissions.
   All rights granted under this License are granted for the term of
 copyright on the Program, and are irrevocable provided the stated
 conditions are met.  This License explicitly affirms your unlimited
 permission to run the unmodified Program.  The output from running a
 covered work is covered by this License only if the output, given its
 content, constitutes a covered work.  This License acknowledges your
 rights of fair use or other equivalent, as provided by copyright law.
   You may make, run and propagate covered works that you do not
 convey, without conditions so long as your license otherwise remains
 in force.  You may convey covered works to others for the sole purpose
 of having them make modifications exclusively for you, or provide you
 with facilities for running those works, provided that you comply with
 the terms of this License in conveying all material for which you do
 not control copyright.  Those thus making or running the covered works
 for you must do so exclusively on your behalf, under your direction
 and control, on terms that prohibit them from making any copies of
 your copyrighted material outside their relationship with you.
   Conveying under any other circumstances is permitted solely under
 the conditions stated below.  Sublicensing is not allowed; section 10
 makes it unnecessary.
   3. Protecting Users' Legal Rights From Anti-Circumvention Law.
   No covered work shall be deemed part of an effective technological
 measure under any applicable law fulfilling obligations under article
 11 of the WIPO copyright treaty adopted on 20 December 1996, or
 similar laws prohibiting or restricting circumvention of such
 measures.
   When you convey a covered work, you waive any legal power to forbid
 circumvention of technological measures to the extent such circumvention
 is effected by exercising rights under this License with respect to
 the covered work, and you disclaim any intention to limit operation or
 modification of the work as a means of enforcing, against the work's
 users, your or third parties' legal rights to forbid circumvention of
 technological measures.
   4. Conveying Verbatim Copies.
   You may convey verbatim copies of the Program's source code as you
 receive it, in any medium, provided that you conspicuously and
 appropriately publish on each copy an appropriate copyright notice;
 keep intact all notices stating that this License and any
 non-permissive terms added in accord with section 7 apply to the code;
 keep intact all notices of the absence of any warranty; and give all
 recipients a copy of this License along with the Program.
   You may charge any price or no price for each copy that you convey,
 and you may offer support or warranty protection for a fee.
   5. Conveying Modified Source Versions.
   You may convey a work based on the Program, or the modifications to
 produce it from the Program, in the form of source code under the
 terms of section 4, provided that you also meet all of these conditions:
     a) The work must carry prominent notices stating that you modified
     it, and giving a relevant date.
     b) The work must carry prominent notices stating that it is
     released under this License and any conditions added under section
     7.  This requirement modifies the requirement in section 4 to
     "keep intact all notices".
     c) You must license the entire work, as a whole, under this
     License to anyone who comes into possession of a copy.  This
     License will therefore apply, along with any applicable section 7
     additional terms, to the whole of the work, and all its parts,
     regardless of how they are packaged.  This License gives no
     permission to license the work in any other way, but it does not
     invalidate such permission if you have separately received it.
     d) If the work has interactive user interfaces, each must display
     Appropriate Legal Notices; however, if the Program has interactive
     interfaces that do not display Appropriate Legal Notices, your
     work need not make them do so.
   A compilation of a covered work with other separate and independent
 works, which are not by their nature extensions of the covered work,
 and which are not combined with it such as to form a larger program,
 in or on a volume of a storage or distribution medium, is called an
 "aggregate" if the compilation and its resulting copyright are not
 used to limit the access or legal rights of the compilation's users
 beyond what the individual works permit.  Inclusion of a covered work
 in an aggregate does not cause this License to apply to the other
 parts of the aggregate.
   6. Conveying Non-Source Forms.
   You may convey a covered work in object code form under the terms
 of sections 4 and 5, provided that you also convey the
 machine-readable Corresponding Source under the terms of this License,
 in one of these ways:
     a) Convey the object code in, or embodied in, a physical product
     (including a physical distribution medium), accompanied by the
     Corresponding Source fixed on a durable physical medium
     customarily used for software interchange.
     b) Convey the object code in, or embodied in, a physical product
     (including a physical distribution medium), accompanied by a
     written offer, valid for at least three years and valid for as
     long as you offer spare parts or customer support for that product
     model, to give anyone who possesses the object code either (1) a
     copy of the Corresponding Source for all the software in the
     product that is covered by this License, on a durable physical
     medium customarily used for software interchange, for a price no
     more than your reasonable cost of physically performing this
     conveying of source, or (2) access to copy the
     Corresponding Source from a network server at no charge.
     c) Convey individual copies of the object code with a copy of the
     written offer to provide the Corresponding Source.  This
     alternative is allowed only occasionally and noncommercially, and
     only if you received the object code with such an offer, in accord
     with subsection 6b.
     d) Convey the object code by offering access from a designated
     place (gratis or for a charge), and offer equivalent access to the
     Corresponding Source in the same way through the same place at no
     further charge.  You need not require recipients to copy the
     Corresponding Source along with the object code.  If the place to
     copy the object code is a network server, the Corresponding Source
     may be on a different server (operated by you or a third party)
     that supports equivalent copying facilities, provided you maintain
     clear directions next to the object code saying where to find the
     Corresponding Source.  Regardless of what server hosts the
     Corresponding Source, you remain obligated to ensure that it is
     available for as long as needed to satisfy these requirements.
     e) Convey the object code using peer-to-peer transmission, provided
     you inform other peers where the object code and Corresponding
     Source of the work are being offered to the general public at no
     charge under subsection 6d.
   A separable portion of the object code, whose source code is excluded
 from the Corresponding Source as a System Library, need not be
 included in conveying the object code work.
   A "User Product" is either (1) a "consumer product", which means any
 tangible personal property which is normally used for personal, family,
 or household purposes, or (2) anything designed or sold for incorporation
 into a dwelling.  In determining whether a product is a consumer product,
 doubtful cases shall be resolved in favor of coverage.  For a particular
 product received by a particular user, "normally used" refers to a
 typical or common use of that class of product, regardless of the status
 of the particular user or of the way in which the particular user
 actually uses, or expects or is expected to use, the product.  A product
 is a consumer product regardless of whether the product has substantial
 commercial, industrial or non-consumer uses, unless such uses represent
 the only significant mode of use of the product.
   "Installation Information" for a User Product means any methods,
 procedures, authorization keys, or other information required to install
 and execute modified versions of a covered work in that User Product from
 a modified version of its Corresponding Source.  The information must
 suffice to ensure that the continued functioning of the modified object
 code is in no case prevented or interfered with solely because
 modification has been made.
   If you convey an object code work under this section in, or with, or
 specifically for use in, a User Product, and the conveying occurs as
 part of a transaction in which the right of possession and use of the
 User Product is transferred to the recipient in perpetuity or for a
 fixed term (regardless of how the transaction is characterized), the
 Corresponding Source conveyed under this section must be accompanied
 by the Installation Information.  But this requirement does not apply
 if neither you nor any third party retains the ability to install
 modified object code on the User Product (for example, the work has
 been installed in ROM).
   The requirement to provide Installation Information does not include a
 requirement to continue to provide support service, warranty, or updates
 for a work that has been modified or installed by the recipient, or for
 the User Product in which it has been modified or installed.  Access to a
 network may be denied when the modification itself materially and
 adversely affects the operation of the network or violates the rules and
 protocols for communication across the network.
   Corresponding Source conveyed, and Installation Information provided,
 in accord with this section must be in a format that is publicly
 documented (and with an implementation available to the public in
 source code form), and must require no special password or key for
 unpacking, reading or copying.
   7. Additional Terms.
   "Additional permissions" are terms that supplement the terms of this
 License by making exceptions from one or more of its conditions.
 Additional permissions that are applicable to the entire Program shall
 be treated as though they were included in this License, to the extent
 that they are valid under applicable law.  If additional permissions
 apply only to part of the Program, that part may be used separately
 under those permissions, but the entire Program remains governed by
 this License without regard to the additional permissions.
   When you convey a copy of a covered work, you may at your option
 remove any additional permissions from that copy, or from any part of
 it.  (Additional permissions may be written to require their own
 removal in certain cases when you modify the work.)  You may place
 additional permissions on material, added by you to a covered work,
 for which you have or can give appropriate copyright permission.
   Notwithstanding any other provision of this License, for material you
 add to a covered work, you may (if authorized by the copyright holders of
 that material) supplement the terms of this License with terms:
     a) Disclaiming warranty or limiting liability differently from the
     terms of sections 15 and 16 of this License; or
     b) Requiring preservation of specified reasonable legal notices or
     author attributions in that material or in the Appropriate Legal
     Notices displayed by works containing it; or
     c) Prohibiting misrepresentation of the origin of that material, or
     requiring that modified versions of such material be marked in
     reasonable ways as different from the original version; or
     d) Limiting the use for publicity purposes of names of licensors or
     authors of the material; or
     e) Declining to grant rights under trademark law for use of some
     trade names, trademarks, or service marks; or
     f) Requiring indemnification of licensors and authors of that
     material by anyone who conveys the material (or modified versions of
     it) with contractual assumptions of liability to the recipient, for
     any liability that these contractual assumptions directly impose on
     those licensors and authors.
   All other non-permissive additional terms are considered "further
 restrictions" within the meaning of section 10.  If the Program as you
 received it, or any part of it, contains a notice stating that it is
 governed by this License along with a term that is a further
 restriction, you may remove that term.  If a license document contains
 a further restriction but permits relicensing or conveying under this
 License, you may add to a covered work material governed by the terms
 of that license document, provided that the further restriction does
 not survive such relicensing or conveying.
   If you add terms to a covered work in accord with this section, you
 must place, in the relevant source files, a statement of the
 additional terms that apply to those files, or a notice indicating
 where to find the applicable terms.
   Additional terms, permissive or non-permissive, may be stated in the
 form of a separately written license, or stated as exceptions;
 the above requirements apply either way.
   8. Termination.
   You may not propagate or modify a covered work except as expressly
 provided under this License.  Any attempt otherwise to propagate or
 modify it is void, and will automatically terminate your rights under
 this License (including any patent licenses granted under the third
 paragraph of section 11).
   However, if you cease all violation of this License, then your
 license from a particular copyright holder is reinstated (a)
 provisionally, unless and until the copyright holder explicitly and
 finally terminates your license, and (b) permanently, if the copyright
 holder fails to notify you of the violation by some reasonable means
 prior to 60 days after the cessation.
   Moreover, your license from a particular copyright holder is
 reinstated permanently if the copyright holder notifies you of the
 violation by some reasonable means, this is the first time you have
 received notice of violation of this License (for any work) from that
 copyright holder, and you cure the violation prior to 30 days after
 your receipt of the notice.
   Termination of your rights under this section does not terminate the
 licenses of parties who have received copies or rights from you under
 this License.  If your rights have been terminated and not permanently
 reinstated, you do not qualify to receive new licenses for the same
 material under section 10.
   9. Acceptance Not Required for Having Copies.
   You are not required to accept this License in order to receive or
 run a copy of the Program.  Ancillary propagation of a covered work
 occurring solely as a consequence of using peer-to-peer transmission
 to receive a copy likewise does not require acceptance.  However,
 nothing other than this License grants you permission to propagate or
 modify any covered work.  These actions infringe copyright if you do
 not accept this License.  Therefore, by modifying or propagating a
 covered work, you indicate your acceptance of this License to do so.
   10. Automatic Licensing of Downstream Recipients.
   Each time you convey a covered work, the recipient automatically
 receives a license from the original licensors, to run, modify and
 propagate that work, subject to this License.  You are not responsible
 for enforcing compliance by third parties with this License.
   An "entity transaction" is a transaction transferring control of an
 organization, or substantially all assets of one, or subdividing an
 organization, or merging organizations.  If propagation of a covered
 work results from an entity transaction, each party to that
 transaction who receives a copy of the work also receives whatever
 licenses to the work the party's predecessor in interest had or could
 give under the previous paragraph, plus a right to possession of the
 Corresponding Source of the work from the predecessor in interest, if
 the predecessor has it or can get it with reasonable efforts.
   You may not impose any further restrictions on the exercise of the
 rights granted or affirmed under this License.  For example, you may
 not impose a license fee, royalty, or other charge for exercise of
 rights granted under this License, and you may not initiate litigation
 (including a cross-claim or counterclaim in a lawsuit) alleging that
 any patent claim is infringed by making, using, selling, offering for
 sale, or importing the Program or any portion of it.
   11. Patents.
   A "contributor" is a copyright holder who authorizes use under this
 License of the Program or a work on which the Program is based.  The
 work thus licensed is called the contributor's "contributor version".
   A contributor's "essential patent claims" are all patent claims
 owned or controlled by the contributor, whether already acquired or
 hereafter acquired, that would be infringed by some manner, permitted
 by this License, of making, using, or selling its contributor version,
 but do not include claims that would be infringed only as a
 consequence of further modification of the contributor version.  For
 purposes of this definition, "control" includes the right to grant
 patent sublicenses in a manner consistent with the requirements of
 this License.
   Each contributor grants you a non-exclusive, worldwide, royalty-free
 patent license under the contributor's essential patent claims, to
 make, use, sell, offer for sale, import and otherwise run, modify and
 propagate the contents of its contributor version.
   In the following three paragraphs, a "patent license" is any express
 agreement or commitment, however denominated, not to enforce a patent
 (such as an express permission to practice a patent or covenant not to
 sue for patent infringement).  To "grant" such a patent license to a
 party means to make such an agreement or commitment not to enforce a
 patent against the party.
   If you convey a covered work, knowingly relying on a patent license,
 and the Corresponding Source of the work is not available for anyone
 to copy, free of charge and under the terms of this License, through a
 publicly available network server or other readily accessible means,
 then you must either (1) cause the Corresponding Source to be so
 available, or (2) arrange to deprive yourself of the benefit of the
 patent license for this particular work, or (3) arrange, in a manner
 consistent with the requirements of this License, to extend the patent
 license to downstream recipients.  "Knowingly relying" means you have
 actual knowledge that, but for the patent license, your conveying the
 covered work in a country, or your recipient's use of the covered work
 in a country, would infringe one or more identifiable patents in that
 country that you have reason to believe are valid.
   If, pursuant to or in connection with a single transaction or
 arrangement, you convey, or propagate by procuring conveyance of, a
 covered work, and grant a patent license to some of the parties
 receiving the covered work authorizing them to use, propagate, modify
 or convey a specific copy of the covered work, then the patent license
 you grant is automatically extended to all recipients of the covered
 work and works based on it.
   A patent license is "discriminatory" if it does not include within
 the scope of its coverage, prohibits the exercise of, or is
 conditioned on the non-exercise of one or more of the rights that are
 specifically granted under this License.  You may not convey a covered
 work if you are a party to an arrangement with a third party that is
 in the business of distributing software, under which you make payment
 to the third party based on the extent of your activity of conveying
 the work, and under which the third party grants, to any of the
 parties who would receive the covered work from you, a discriminatory
 patent license (a) in connection with copies of the covered work
 conveyed by you (or copies made from those copies), or (b) primarily
 for and in connection with specific products or compilations that
 contain the covered work, unless you entered into that arrangement,
 or that patent license was granted, prior to 28 March 2007.
   Nothing in this License shall be construed as excluding or limiting
 any implied license or other defenses to infringement that may
 otherwise be available to you under applicable patent law.
   12. No Surrender of Others' Freedom.
   If conditions are imposed on you (whether by court order, agreement or
 otherwise) that contradict the conditions of this License, they do not
 excuse you from the conditions of this License.  If you cannot convey a
 covered work so as to satisfy simultaneously your obligations under this
 License and any other pertinent obligations, then as a consequence you may
 not convey it at all.  For example, if you agree to terms that obligate you
 to collect a royalty for further conveying from those to whom you convey
 the Program, the only way you could satisfy both those terms and this
 License would be to refrain entirely from conveying the Program.
   13. Remote Network Interaction; Use with the GNU General Public License.
   Notwithstanding any other provision of this License, if you modify the
 Program, your modified version must prominently offer all users
 interacting with it remotely through a computer network (if your version
 supports such interaction) an opportunity to receive the Corresponding
 Source of your version by providing access to the Corresponding Source
 from a network server at no charge, through some standard or customary
 means of facilitating copying of software.  This Corresponding Source
 shall include the Corresponding Source for any work covered by version 3
 of the GNU General Public License that is incorporated pursuant to the
 following paragraph.
   Notwithstanding any other provision of this License, you have
 permission to link or combine any covered work with a work licensed
 under version 3 of the GNU General Public License into a single
 combined work, and to convey the resulting work.  The terms of this
 License will continue to apply to the part which is the covered work,
 but the work with which it is combined will remain governed by version
 3 of the GNU General Public License.
   14. Revised Versions of this License.
   The Free Software Foundation may publish revised and/or new versions of
 the GNU Affero General Public License from time to time.  Such new versions
 will be similar in spirit to the present version, but may differ in detail to
 address new problems or concerns.
   Each version is given a distinguishing version number.  If the
 Program specifies that a certain numbered version of the GNU Affero General
 Public License "or any later version" applies to it, you have the
 option of following the terms and conditions either of that numbered
 version or of any later version published by the Free Software
 Foundation.  If the Program does not specify a version number of the
 GNU Affero General Public License, you may choose any version ever published
 by the Free Software Foundation.
   If the Program specifies that a proxy can decide which future
 versions of the GNU Affero General Public License can be used, that proxy's
 public statement of acceptance of a version permanently authorizes you
 to choose that version for the Program.
   Later license versions may give you additional or different
 permissions.  However, no additional obligations are imposed on any
 author or copyright holder as a result of your choosing to follow a
 later version.
   15. Disclaimer of Warranty.
   THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
 APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
 HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
 OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
 THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
 IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
 ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
   16. Limitation of Liability.
   IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
 WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
 THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
 GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
 USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
 DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
 PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
 EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGES.
   17. Interpretation of Sections 15 and 16.
   If the disclaimer of warranty and limitation of liability provided
 above cannot be given local legal effect according to their terms,
 reviewing courts shall apply local law that most closely approximates
 an absolute waiver of all civil liability in connection with the
 Program, unless a warranty or assumption of liability accompanies a
 copy of the Program in return for a fee.
                      END OF TERMS AND CONDITIONS
             How to Apply These Terms to Your New Programs
   If you develop a new program, and you want it to be of the greatest
 possible use to the public, the best way to achieve this is to make it
 free software which everyone can redistribute and change under these terms.
   To do so, attach the following notices to the program.  It is safest
 to attach them to the start of each source file to most effectively
 state the exclusion of warranty; and each file should have at least
 the "copyright" line and a pointer to where the full notice is found.
     <one line to give the program's name and a brief idea of what it does.>
     Copyright (C) <year>  <name of author>
     This program is free software: you can redistribute it and/or modify
     it under the terms of the GNU Affero General Public License as published by
     the Free Software Foundation, either version 3 of the License, or
     (at your option) any later version.
     This program is distributed in the hope that it will be useful,
     but WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     GNU Affero General Public License for more details.
     You should have received a copy of the GNU Affero General Public License
     along with this program.  If not, see <https://www.gnu.org/licenses/>.
 Also add information on how to contact you by electronic and paper mail.
   If your software can interact with users remotely through a computer
 network, you should also make sure that it provides a way for users to
 get its source.  For example, if your program is a web application, its
 interface could display a "Source" link that leads users to an archive
 of the code.  There are many ways you could offer source, and different
 solutions will be better for different programs; see section 13 for the
 specific requirements.
   You should also get your employer (if you work as a programmer) or school,
 if any, to sign a "copyright disclaimer" for the program, if necessary.
 For more information on this, and how to apply and follow the GNU AGPL, see
 <https://www.gnu.org/licenses/>.

153

studio/Unsloth_Studio_Colab.ipynb Normal file

 @@ -0,0 +1,153 @@
 {
  "cells": [
   {
    "cell_type": "markdown",
    "metadata": {
     "id": "view-in-github",
     "colab_type": "text"
    },
    "source": [
     "<a href=\"https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "6b87de59",
    "metadata": {
     "id": "6b87de59"
    },
    "source": [
     "To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n",
     "<div class=\"align-center\">\n",
     "<a href=\"https://unsloth.ai/\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"115\"></a>\n",
     "<a href=\"https://discord.gg/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord button.png\" width=\"145\"></a>\n",
     "<a href=\"https://unsloth.ai/docs/\"><img src=\"https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true\" width=\"125\"></a> Join Discord if you need help + ⭐ <i>Star us on <a href=\"https://github.com/unslothai/unsloth\">Github</a> </i> ⭐\n",
     "</div>\n",
     "\n",
     "To install Unsloth Studio on your local device, follow [our guide](https://unsloth.ai/docs/new/unsloth-studio/install). Unsloth Studio is licensed [AGPL-3.0](https://github.com/unslothai/unsloth/blob/main/studio/LICENSE.AGPL-3.0).\n",
     "\n",
     "### Unsloth Studio\n",
     "\n",
     "Train and run open models with [**Unsloth Studio**](https://unsloth.ai/docs/new/unsloth-studio/start). NEW! Installation should now only take 2 mins!\n",
     "\n",
     "\n",
     "We are actively working on making Unsloth Studio install on Colab T4 GPUs faster.\n",
     "\n",
     "[Features](https://unsloth.ai/docs/new/unsloth-studio#features) • [Quickstart](https://unsloth.ai/docs/new/unsloth-studio/start) • [Data Recipes](https://unsloth.ai/docs/new/unsloth-studio/data-recipe) • [Studio Chat](https://unsloth.ai/docs/new/unsloth-studio/chat) • [Export](https://unsloth.ai/docs/new/unsloth-studio/export)"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "e4206349",
    "metadata": {
     "id": "e4206349"
    },
    "source": [
     "<p align=\"left\"><img src=\"https://github.com/unslothai/unsloth/raw/main/studio/frontend/public/studio%20github%20landscape%20colab%20display.png\" width=\"600\"></p>"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "27da2957",
    "metadata": {
     "id": "27da2957"
    },
    "source": [
     "### Setup: Clone repo and run setup"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "27e68f91",
    "metadata": {
     "id": "27e68f91"
    },
    "outputs": [],
    "source": "!git clone --depth 1 --branch main https://github.com/unslothai/unsloth.git\n%cd /content/unsloth\n!chmod +x studio/setup.sh && ./studio/setup.sh"
   },
   {
    "cell_type": "markdown",
    "id": "3e1771a9",
    "metadata": {
     "id": "3e1771a9"
    },
    "source": [
     "### Start Unsloth Studio"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "277e431e",
    "metadata": {
     "id": "277e431e"
    },
    "outputs": [],
    "source": [
     "import sys, time\n",
     "sys.path.insert(0, \"/content/unsloth/studio/backend\")\n",
     "from colab import start\n",
     "start()"
    ]
   },
   {
    "cell_type": "code",
    "source": [
     "from google.colab import output\n",
     "output.serve_kernel_port_as_iframe(8888, height = 1200, width = \"100%\")\n",
     "for _ in range(10000): time.sleep(300), print(\"=\", end = \"\")"
    ],
    "metadata": {
     "id": "wb9UELh--XzX"
    },
    "id": "wb9UELh--XzX",
    "execution_count": null,
    "outputs": []
   },
   {
    "cell_type": "markdown",
    "id": "f2b0c6a1",
    "metadata": {
     "id": "f2b0c6a1"
    },
    "source": [
     "And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n",
     "\n",
     "Some other resources:\n",
     "1. Looking to use Unsloth locally? Read our [Installation Guide](https://unsloth.ai/docs/get-started/install) for details on installing Unsloth on Windows, Docker, AMD, Intel GPUs.\n",
     "2. Learn how to do Reinforcement Learning with our [RL Guide and notebooks](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide).\n",
     "3. Read our guides and notebooks for [Text-to-speech (TTS)](https://unsloth.ai/docs/basics/text-to-speech-tts-fine-tuning) and [vision](https://unsloth.ai/docs/basics/vision-fine-tuning) model support.\n",
     "4. Explore our [LLM Tutorials Directory](https://unsloth.ai/docs/models/tutorials-how-to-fine-tune-and-run-llms) to find dedicated guides for each model.\n",
     "5. Need help with Inference? Read our [Inference & Deployment page](https://unsloth.ai/docs/basics/inference-and-deployment) for details on using vLLM, llama.cpp, Ollama etc.\n",
     "\n",
     "<div class=\"align-center\">\n",
     "  <a href=\"https://unsloth.ai\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"115\"></a>\n",
     "  <a href=\"https://discord.gg/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord.png\" width=\"145\"></a>\n",
     "  <a href=\"https://unsloth.ai/docs/\"><img src=\"https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true\" width=\"125\"></a>\n",
     "\n",
     "  Join Discord if you need help + ⭐️ <i>Star us on <a href=\"https://github.com/unslothai/unsloth\">Github</a> </i> ⭐️\n",
     "\n",
     "  <b>This notebook is licensed <a href=\"https://github.com/unslothai/unsloth/blob/main/studio/LICENSE.AGPL-3.0\">AGPL-3.0</a></b>\n",
     "</div>"
    ]
   }
  ],
  "metadata": {
   "accelerator": "GPU",
   "colab": {
    "gpuType": "T4",
    "provenance": [],
    "include_colab_link": true
   },
   "kernelspec": {
    "display_name": "Python 3",
    "name": "python3"
   },
   "language_info": {
    "name": "python"
   }
  },
  "nbformat": 4,
  "nbformat_minor": 5
 }

2

studio/__init__.py Normal file

 @ -0,0 +1,2 @@
 # SPDX-License-Identifier: AGPL-3.0-only
 # Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0

2

studio/backend/__init__.py Normal file

 @ -0,0 +1,2 @@
 # SPDX-License-Identifier: AGPL-3.0-only
 # Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0

									
59

studio/backend/_platform_compat.py Normal file
										
				@ -0,0 +1,59 @@

				# SPDX-License-Identifier: AGPL-3.0-only

				# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0

				"""

				Compatibility shim for Anaconda/conda-forge Python builds.

				Anaconda modifies sys.version to include distributor metadata between pipe

				characters, e.g. '3.12.4 | packaged by Anaconda, Inc. | (main, ...) [MSC ...]'.

				Python's platform._sys_version() has a hardcoded regex that cannot parse this,

				raising ValueError. CPython closed this as "not planned" (cpython#102396).

				This module seeds platform._sys_version_cache so the stdlib parser never sees

				the problematic string, fixing the import chain:

				    structlog -> rich.pretty -> attrs._compat -> platform.python_implementation()

				Import this module before any library imports that may trigger the above chain.

				Safe to import multiple times (no-op if cache is already seeded or no pipes).

				"""

				import platform

				import re

				import sys

				def _seed_sys_version_cache() -> None:

				    """One-shot cache prime: parse a cleaned sys.version and seed the cache."""

				    raw = sys.version

				    # Strip paired |...| segments (Anaconda, conda-forge metadata)

				    cleaned = re.sub(r"\s*\|[^|]*\|\s*", " ", raw).strip()

				    # Format B: "ver (build) | label | (build_dup) \n[compiler]"

				    # After pipe-strip, two consecutive (...) groups remain; drop the second.

				    cleaned = re.sub(r"(\([^)]*\))\s+\([^)]*\)", r"\1", cleaned)

				    if "|" in cleaned:

				        # Unpaired pipe remaining -- keep version + everything from "(" onward

				        m = re.match(r"([\w.+]+)\s*", cleaned)

				        p = cleaned.find("(")

				        if m and p > 0:

				            cleaned = m.group(0) + cleaned[p:]

				    if cleaned == raw:

				        return  # Nothing to fix

				    # Parse the cleaned string through the real stdlib parser

				    try:

				        result = platform._sys_version(cleaned)

				    except ValueError:

				        return  # Cleaning didn't produce a parseable string; don't make things worse

				    # Seed the cache so future calls with the raw string skip parsing entirely

				    cache = getattr(platform, "_sys_version_cache", None)

				    if isinstance(cache, dict):

				        cache[raw] = result

				if "|" in sys.version:

				    _seed_sys_version_cache()
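
The cleaning step in `_seed_sys_version_cache` can be exercised on its own. The following is a minimal sketch, not part of the commit: `clean_sys_version` is a hypothetical helper that applies the same two regular expressions as the shim, and the sample string is an illustrative Anaconda-style value whose build date and compiler tag are invented.

```python
import re


def clean_sys_version(raw: str) -> str:
    """Hypothetical helper: apply the shim's two regex passes to a version string."""
    # Pass 1: drop paired |...| segments such as "| packaged by Anaconda, Inc. |".
    cleaned = re.sub(r"\s*\|[^|]*\|\s*", " ", raw).strip()
    # Pass 2: if two consecutive (...) groups remain, keep only the first.
    cleaned = re.sub(r"(\([^)]*\))\s+\([^)]*\)", r"\1", cleaned)
    return cleaned


# Illustrative input only; the build date and compiler tag are made up.
sample = (
    "3.12.4 | packaged by Anaconda, Inc. | "
    "(main, Jun 18 2024, 15:03:56) [MSC v.1929 64 bit (AMD64)]"
)
cleaned_sample = clean_sys_version(sample)
print(cleaned_sample)
```

The sketch stops at the cleaned string. The module in the diff additionally runs that string through the private `platform._sys_version` parser and stores the result under the raw `sys.version` key, so later lookups never parse the piped form.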

2

studio/backend/assets/__init__.py Normal file

 @ -0,0 +1,2 @@
 # SPDX-License-Identifier: AGPL-3.0-only
 # Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0

2

studio/backend/assets/configs/__init__.py Normal file

 @ -0,0 +1,2 @@
 # SPDX-License-Identifier: AGPL-3.0-only
 # Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0

									
42

studio/backend/assets/configs/full_finetune.yaml Normal file
										
				@ -0,0 +1,42 @@

				model: unsloth/Qwen2.5-0.5B

				data:

				  dataset: tatsu-lab/alpaca

				  format_type: auto

				training:

				  training_type: full

				  max_seq_length: 2048

				  load_in_4bit: false

				  output_dir: outputs

				  num_epochs: 1

				  learning_rate: 2e-5

				  batch_size: 1

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 0

				  save_steps: 0

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: false

				  gradient_checkpointing: "unsloth"

				lora:

				  lora_r: 64

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules: ""

				  vision_all_linear: false

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: unsloth-training

				  enable_tensorboard: false

				  tensorboard_dir: runs

									
398

studio/backend/assets/configs/inference_defaults.json Normal file
										
				@ -0,0 +1,398 @@

				{

				  "_comment": "Per-model-family inference parameter defaults. Sources: (1) Ollama params blobs, (2) Existing Unsloth Studio YAML configs. Patterns ordered longest-match-first.",

				  "families": {

				    "qwen3.6": {

				      "temperature": 0.7,

				      "top_p": 0.8,

				      "top_k": 20,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0,

				      "presence_penalty": 1.5

				    },

				    "qwen3.5": {

				      "temperature": 0.7,

				      "top_p": 0.8,

				      "top_k": 20,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0,

				      "presence_penalty": 1.5

				    },

				    "qwen3-coder": {

				      "temperature": 0.7,

				      "top_p": 0.8,

				      "top_k": 20,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "qwen3-next": {

				      "temperature": 0.7,

				      "top_p": 0.8,

				      "top_k": 20,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "qwen3-vl": {

				      "temperature": 0.7,

				      "top_p": 0.8,

				      "top_k": 20,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "qwen3": {

				      "temperature": 0.6,

				      "top_p": 0.95,

				      "top_k": 20,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "qwen2.5-coder": {

				      "temperature": 1.5,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.1,

				      "repetition_penalty": 1.0

				    },

				    "qwen2.5-vl": {

				      "temperature": 1.5,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.1,

				      "repetition_penalty": 1.0

				    },

				    "qwen2.5-omni": {

				      "temperature": 0.7,

				      "top_p": 0.8,

				      "top_k": 20,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "qwen2.5-math": {

				      "temperature": 0.7,

				      "top_p": 0.8,

				      "top_k": 20,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "qwen2.5": {

				      "temperature": 0.7,

				      "top_p": 0.8,

				      "top_k": 20,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "qwen2-vl": {

				      "temperature": 1.5,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.1,

				      "repetition_penalty": 1.0

				    },

				    "qwen2": {

				      "temperature": 0.7,

				      "top_p": 0.8,

				      "top_k": 20,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "qwq": {

				      "temperature": 0.6,

				      "top_p": 0.95,

				      "top_k": 40,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "gemma-4": {

				      "temperature": 1.0,

				      "top_p": 0.95,

				      "top_k": 64,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0,

				      "presence_penalty": 0.0

				    },

				    "gemma-3n": {

				      "temperature": 1.0,

				      "top_p": 0.95,

				      "top_k": 64,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "gemma-3": {

				      "temperature": 1.0,

				      "top_p": 0.95,

				      "top_k": 64,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "medgemma": {

				      "temperature": 1.0,

				      "top_p": 0.95,

				      "top_k": 64,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "gemma-2": {

				      "temperature": 1.0,

				      "top_p": 0.95,

				      "top_k": 64,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "llama-4": {

				      "temperature": 1.0,

				      "top_p": 0.9,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "llama-3.3": {

				      "temperature": 1.5,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.1,

				      "repetition_penalty": 1.0

				    },

				    "llama-3.2": {

				      "temperature": 1.5,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.1,

				      "repetition_penalty": 1.0

				    },

				    "llama-3.1": {

				      "temperature": 1.5,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.1,

				      "repetition_penalty": 1.0

				    },

				    "llama-3": {

				      "temperature": 1.5,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.1,

				      "repetition_penalty": 1.0

				    },

				    "phi-4": {

				      "temperature": 0.8,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.0,

				      "repetition_penalty": 1.0

				    },

				    "phi-3": {

				      "temperature": 0.7,

				      "top_p": 0.9,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "mistral-nemo": {

				      "temperature": 0.7,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "mistral-small": {

				      "temperature": 0.15,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "mistral-large": {

				      "temperature": 0.7,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "magistral": {

				      "temperature": 0.7,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "ministral": {

				      "temperature": 0.15,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "devstral": {

				      "temperature": 0.7,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "pixtral": {

				      "temperature": 1.5,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.1,

				      "repetition_penalty": 1.0

				    },

				    "deepseek-r1": {

				      "temperature": 0.6,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "deepseek-v3": {

				      "temperature": 0.6,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "deepseek-ocr": {

				      "temperature": 0.0,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "glm-5": {

				      "temperature": 1.0,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "glm-4": {

				      "temperature": 1.0,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "nemotron": {

				      "temperature": 1.0,

				      "top_p": 1.0,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "minimax-m2.5": {

				      "temperature": 1.0,

				      "top_p": 0.95,

				      "top_k": 40,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "minimax": {

				      "temperature": 1.0,

				      "top_p": 0.95,

				      "top_k": 40,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "gpt-oss": {

				      "temperature": 1.0,

				      "top_p": 1.0,

				      "top_k": 0,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "granite-4": {

				      "temperature": 0.0,

				      "top_p": 1.0,

				      "top_k": 0,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "kimi-k2": {

				      "temperature": 0.6,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "kimi": {

				      "temperature": 0.6,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "lfm2": {

				      "temperature": 0.1,

				      "top_p": 0.1,

				      "top_k": 50,

				      "min_p": 0.15,

				      "repetition_penalty": 1.05

				    },

				    "smollm": {

				      "temperature": 0.7,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "olmo": {

				      "temperature": 0.7,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "falcon": {

				      "temperature": 0.7,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "ernie": {

				      "temperature": 0.7,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "seed": {

				      "temperature": 0.7,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "grok": {

				      "temperature": 1.0,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    },

				    "mimo": {

				      "temperature": 0.7,

				      "top_p": 0.95,

				      "top_k": -1,

				      "min_p": 0.01,

				      "repetition_penalty": 1.0

				    }

				  },

				  "patterns": [

				    "qwen3.6", "qwen3.5",

				    "qwen3-coder", "qwen3-next", "qwen3-vl", "qwen3",

				    "qwen2.5-coder", "qwen2.5-vl", "qwen2.5-omni", "qwen2.5-math", "qwen2.5",

				    "qwen2-vl", "qwen2",

				    "qwq",

				    "gemma-4", "gemma-3n", "gemma-3", "medgemma", "gemma-2",

				    "llama-4", "llama-3.3", "llama-3.2", "llama-3.1", "llama-3",

				    "phi-4", "phi-3",

				    "mistral-nemo", "mistral-small", "mistral-large", "magistral", "ministral",

				    "devstral", "pixtral",

				    "deepseek-r1", "deepseek-v3", "deepseek-ocr",

				    "glm-5", "glm-4",

				    "nemotron",

				    "minimax-m2.5", "minimax",

				    "gpt-oss", "granite-4",

				    "kimi-k2", "kimi",

				    "lfm2", "smollm", "olmo", "falcon", "ernie", "seed", "grok", "mimo"

				  ]

				}
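
The `_comment` field says the patterns are ordered longest-match-first, but the code that consumes this file is not shown in this part of the diff. The sketch below is one plausible reading, assuming a case-insensitive substring match where the first hit in `patterns` wins. `resolve_family` and `inference_defaults` are hypothetical names, and `DEFAULTS` is a trimmed copy of three families from the JSON above.

```python
from typing import Optional

# Trimmed copy of the structure in inference_defaults.json (three families only).
DEFAULTS = {
    "families": {
        "qwen3-coder": {"temperature": 0.7, "top_p": 0.8, "top_k": 20},
        "qwen3": {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
        "llama-3": {"temperature": 1.5, "top_p": 0.95, "top_k": -1},
    },
    # More specific patterns come before the shorter prefixes they contain.
    "patterns": ["qwen3-coder", "qwen3", "llama-3"],
}


def resolve_family(model_name: str, defaults: dict = DEFAULTS) -> Optional[str]:
    """Return the first pattern found in the lowercased model name, else None."""
    name = model_name.lower()
    for pattern in defaults["patterns"]:
        if pattern in name:  # assumption: plain substring matching
            return pattern
    return None


def inference_defaults(model_name: str, defaults: dict = DEFAULTS) -> dict:
    """Return a copy of the matched family's parameters, or {} if nothing matches."""
    family = resolve_family(model_name, defaults)
    return dict(defaults["families"][family]) if family else {}


print(resolve_family("unsloth/Qwen3-Coder-30B-A3B-Instruct"))
print(inference_defaults("unsloth/Qwen3-8B"))
```

Under this reading the ordering matters: if `"qwen3"` came before `"qwen3-coder"`, every Qwen3-Coder model would pick up the generic Qwen3 parameters.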

									
42

studio/backend/assets/configs/lora_text.yaml Normal file
										
				@ -0,0 +1,42 @@

				model: unsloth/Qwen2.5-0.5B

				data:

				  dataset: tatsu-lab/alpaca

				  format_type: auto

				training:

				  training_type: lora

				  max_seq_length: 2048

				  load_in_4bit: true

				  output_dir: outputs

				  num_epochs: 1

				  learning_rate: 0.0002

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 0

				  save_steps: 0

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: false

				  gradient_checkpointing: "unsloth"

				lora:

				  lora_r: 64

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules: "q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj"

				  vision_all_linear: false

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: unsloth-training

				  enable_tensorboard: false

				  tensorboard_dir: runs
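
Two quantities follow directly from the values in `lora_text.yaml`: the number of samples per optimizer step and the LoRA scaling factor. The sketch below works them out, assuming a single device and the standard LoRA convention of `alpha / r` (or `alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The function names are hypothetical and are not taken from the Studio code.

```python
import math


def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    """Samples contributing to one optimizer step on a single device."""
    return batch_size * gradient_accumulation_steps


def lora_scale(lora_r: int, lora_alpha: float, use_rslora: bool = False) -> float:
    """Scaling applied to the low-rank update: alpha/r, or alpha/sqrt(r) for rsLoRA."""
    if use_rslora:
        return lora_alpha / math.sqrt(lora_r)
    return lora_alpha / lora_r


# Values from lora_text.yaml: batch_size 2, gradient_accumulation_steps 4,
# lora_r 64, lora_alpha 16, use_rslora false.
print(effective_batch_size(2, 4))  # 8
print(lora_scale(64, 16))          # 0.25
```

With `lora_r: 64` and `lora_alpha: 16` the adapter update is scaled by 0.25. The `default.yaml` values (`lora_r: 16`, `lora_alpha: 16`) give a scale of 1.0.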

									
56

studio/backend/assets/configs/model_defaults/default.yaml Normal file
										
				@ -0,0 +1,56 @@

				# Default model training parameters

				# Used for models without specific configurations

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_ratio: 0.1

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 0.7

				  top_p: 0.95

				  top_k: -1

				  min_p: 0.01
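
The header comment says `default.yaml` is used for models without a specific configuration, and the per-model files in this diff are named after the model ID with `/` replaced by `_`. The loader itself is not shown here, so the following is only a sketch of how such a fallback could work. `config_filename` and `pick_config` are hypothetical helpers, and the naming rule is inferred from the file names alone.

```python
from pathlib import Path
from typing import Iterable


def config_filename(model_name: str) -> str:
    """Map a model ID such as 'unsloth/bge-m3' to 'unsloth_bge-m3.yaml' (inferred rule)."""
    return model_name.replace("/", "_") + ".yaml"


def pick_config(model_name: str, available: Iterable[str]) -> str:
    """Return the relative path of the model-specific file, else 'default.yaml'."""
    wanted = config_filename(model_name)
    for rel_path in available:
        if Path(rel_path).name == wanted:
            return rel_path
    return "default.yaml"


# Relative paths under model_defaults/, taken from the file list in this diff.
available = [
    "default.yaml",
    "embedding/unsloth_bge-m3.yaml",
    "embedding/unsloth_Qwen3-Embedding-0.6B.yaml",
]
print(pick_config("unsloth/bge-m3", available))
print(pick_config("some-org/unlisted-model", available))
```

The "Also applies to" comments in several per-model files suggest the real lookup also handles aliases, which this sketch does not attempt.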

									
43

studio/backend/assets/configs/model_defaults/embedding/unsloth_Qwen3-Embedding-0.6B.yaml Normal file
										
				@ -0,0 +1,43 @@

				# Model defaults for unsloth/Qwen3-Embedding-0.6B

				# Based on Qwen3_Embedding_(0_6B).py embedding notebook

				# Also applies to: unsloth/Qwen3-Embedding-4B

				training:

				  max_seq_length: 512

				  # num_epochs: 2

				  num_epochs: 0

				  learning_rate: 3e-5

				  batch_size: 256

				  gradient_accumulation_steps: 1

				  warmup_ratio: 0.03

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: false

				  gradient_checkpointing: false

				  optim: "adamw_8bit"

				  lr_scheduler_type: "constant_with_warmup"

				lora:

				  lora_r: 32

				  lora_alpha: 32

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "embedding-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 50

									
39

studio/backend/assets/configs/model_defaults/embedding/unsloth_all-MiniLM-L6-v2.yaml Normal file
										
				@ -0,0 +1,39 @@

				# Model defaults for unsloth/all-MiniLM-L6-v2

				# Based on All_MiniLM_L6_v2.py embedding notebook

				training:

				  max_seq_length: 512

				  # num_epochs: 2

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 256

				  gradient_accumulation_steps: 1

				  warmup_ratio: 0.03

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: false

				  gradient_checkpointing: false

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 64

				  lora_alpha: 128

				  lora_dropout: 0.0

				  target_modules:

				    - "value"

				    - "key"

				    - "dense"

				    - "query"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "embedding-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 50

									
39

studio/backend/assets/configs/model_defaults/embedding/unsloth_bge-m3.yaml Normal file
										
				@ -0,0 +1,39 @@

				# Model defaults for unsloth/bge-m3

				# Based on BGE_M3.py embedding notebook

				training:

				  max_seq_length: 512

				  # num_epochs: 2

				  num_epochs: 0

				  learning_rate: 3e-5

				  batch_size: 256

				  gradient_accumulation_steps: 1

				  warmup_ratio: 0.03

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: false

				  gradient_checkpointing: false

				  optim: "adamw_8bit"

				  lr_scheduler_type: "constant_with_warmup"

				lora:

				  lora_r: 32

				  lora_alpha: 64

				  lora_dropout: 0.0

				  target_modules:

				    - "key"

				    - "query"

				    - "dense"

				    - "value"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "embedding-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 50

									
42

studio/backend/assets/configs/model_defaults/embedding/unsloth_embeddinggemma-300m.yaml Normal file
										
				@ -0,0 +1,42 @@

				# Model defaults for unsloth/embeddinggemma-300m

				# Based on EmbeddingGemma_(300M).py embedding notebook

				training:

				  max_seq_length: 1024

				  # num_epochs: 1

				  num_epochs: 0

				  learning_rate: 2e-5

				  batch_size: 64

				  gradient_accumulation_steps: 2

				  warmup_ratio: 0.03

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: false

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 32

				  lora_alpha: 64

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "embedding-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 5

									
38

studio/backend/assets/configs/model_defaults/embedding/unsloth_gte-modernbert-base.yaml Normal file
										
				@ -0,0 +1,38 @@

				# Model defaults for unsloth/gte-modernbert-base

				# Based on ModernBert.py embedding notebook

				training:

				  max_seq_length: 512

				  # num_epochs: 2

				  num_epochs: 0

				  learning_rate: 3e-5

				  batch_size: 256

				  gradient_accumulation_steps: 1

				  warmup_ratio: 0.03

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: false

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "constant_with_warmup"

				lora:

				  lora_r: 64

				  lora_alpha: 128

				  lora_dropout: 0.0

				  target_modules:

				    - "Wi"

				    - "Wo"

				    - "Wqkv"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "embedding-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 50

									
47

studio/backend/assets/configs/model_defaults/ernie/unsloth_ERNIE-4.5-21B-A3B-PT.yaml Normal file
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/ERNIE-4.5-21B-A3B-PT

				# Based on ERNIE_4_5_21B_A3B_PT-Conversational.ipynb

				# Also applies to: unsloth/ERNIE-4.5-21B-A3B-PT

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 4

				  gradient_accumulation_steps: 2

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

									
55

studio/backend/assets/configs/model_defaults/ernie/unsloth_ERNIE-4.5-VL-28B-A3B-PT.yaml Normal file
										
				@ -0,0 +1,55 @@

				# Model defaults for unsloth/ERNIE-4.5-VL-28B-A3B-PT

				# Based on ERNIE_4_5_VL_28B_A3B_PT_Vision.ipynb

				# Also applies to: unsloth/ERNIE-4.5-VL-28B-A3B-PT

				# Added inference parameters from the Unsloth notebook

				training:

				  trust_remote_code: true

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 2

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: true

				  temperature: 1.5

				  min_p: 0.1

									
47

studio/backend/assets/configs/model_defaults/falcon/tiiuae_Falcon-H1-0.5B-Instruct.yaml Normal file
										
				@ -0,0 +1,47 @@

				# Model defaults for tiiuae/Falcon-H1-0.5B-Instruct

				# Based on Falcon_H1_(0.5B)-Alpaca.ipynb

				# Also applies to: tiiuae/Falcon-H1-0.5B-Instruct, unsloth/Falcon-H1-0.5B-Instruct

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 8

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: false

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.1

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

									
50

studio/backend/assets/configs/model_defaults/gemma/unsloth_codegemma-7b-bnb-4bit.yaml Normal file
										
				@ -0,0 +1,50 @@

				# Model defaults for unsloth/codegemma-7b-bnb-4bit

				# Based on CodeGemma_(7B)-Conversational.ipynb

				# Also applies to: unsloth/codegemma-7b, google/codegemma-7b

				# Added inference parameters from Ollama

				training:

				  trust_remote_code: false

				  max_seq_length: 4096

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 1

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 0

				  top_p: 0.9

									
53

studio/backend/assets/configs/model_defaults/gemma/unsloth_functiongemma-270m-it.yaml Normal file
										
				@ -0,0 +1,53 @@

				# Model defaults for unsloth/functiongemma-270m-it

				# Based on FunctionGemma_(270M).ipynb

				# Also applies to: unsloth/functiongemma-270m-it-unsloth-bnb-4bit, google/functiongemma-270m-it

				# Added inference parameters from Unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 4096

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 4

				  gradient_accumulation_steps: 2

				  warmup_steps: 10

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 128

				  lora_alpha: 256

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_k: 64

				  top_p: 0.95

				  min_p: 0.0
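This file pairs `lora_r: 128` with `lora_alpha: 256`, whereas most neighbouring configs set the two equal. In standard LoRA the adapter update is scaled by `lora_alpha / r`, and rank-stabilized LoRA (`use_rslora`) uses `lora_alpha / sqrt(r)` instead. A small sketch of that relationship follows; the function name is hypothetical, and it is an assumption that Studio forwards these values unchanged to the LoRA implementation.

```python
import math


def lora_scaling(lora_r: int, lora_alpha: float, use_rslora: bool = False) -> float:
    """Multiplier applied to the low-rank update B @ A."""
    if lora_r <= 0:
        raise ValueError("lora_r must be positive")
    if use_rslora:
        # Rank-stabilized LoRA divides by sqrt(r) rather than r.
        return lora_alpha / math.sqrt(lora_r)
    return lora_alpha / lora_r


print(lora_scaling(128, 256))  # 2.0 (this file)
print(lora_scaling(16, 16))    # 1.0 (e.g. the CodeGemma defaults)
```

So the FunctionGemma defaults apply the adapter update at twice the weight of configs where `lora_alpha` equals `lora_r`.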

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-2-27b-bnb-4bit.yaml
									
										Normal file
									
										
				@ -0,0 +1,46 @@

				# Model defaults for unsloth/gemma-2-27b-bnb-4bit

				# Based on Gemma2_(9B)-Alpaca.ipynb (same defaults for larger models)

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-2-2b.yaml
									
										Normal file
									
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/gemma-2-2b

				# Based on Gemma2_(2B)-Alpaca.ipynb

				# Also applies to: unsloth/gemma-2-2b-bnb-4bit, google/gemma-2-2b

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3-270m-it.yaml
									
										Normal file
									
										
				@ -0,0 +1,53 @@

				# Model defaults for unsloth/gemma-3-270m-it

				# Based on Gemma3_(270M).ipynb

				# Also applies to: unsloth/gemma-3-270m-it-unsloth-bnb-4bit, google/gemma-3-270m-it, unsloth/gemma-3-270m-it-bnb-4bit

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 5e-5

				  batch_size: 4

				  gradient_accumulation_steps: 1

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 128

				  lora_alpha: 128

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_k: 64

				  top_p: 0.95

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3-27b-it.yaml
									
										Normal file
									
										
				@ -0,0 +1,51 @@

				# Model defaults for unsloth/gemma-3-27b-it

				# Based on Gemma3_(27B)_A100-Conversational.ipynb

				# Also applies to: unsloth/gemma-3-27b-it-unsloth-bnb-4bit, google/gemma-3-27b-it, unsloth/gemma-3-27b-it-bnb-4bit

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 8

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_k: 64

				  top_p: 0.95

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3-4b-it.yaml
									
										Normal file
									
										
				@ -0,0 +1,51 @@

				# Model defaults for unsloth/gemma-3-4b-it

				# Based on Gemma3_(4B).ipynb

				# Also applies to: unsloth/gemma-3-4b-it-unsloth-bnb-4bit, google/gemma-3-4b-it, unsloth/gemma-3-4b-it-bnb-4bit

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 8

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_k: 64

				  top_p: 0.95

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3-4b-pt.yaml
									
										Normal file
									
										
				@ -0,0 +1,51 @@

				# Model defaults for unsloth/gemma-3-4b-pt

				# Based on Gemma3_(4B)-Vision.ipynb

				# Also applies to: unsloth/gemma-3-4b-pt-unsloth-bnb-4bit, google/gemma-3-4b-pt, unsloth/gemma-3-4b-pt-bnb-4bit

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 2

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 1

				  gradient_accumulation_steps: 4

				  warmup_ratio: 0.03

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: true

				  optim: "adamw_torch_fused"

				  lr_scheduler_type: "cosine"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_k: 64

				  top_p: 0.95

				  min_p: 0.0
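Unlike most files in this commit, this config sets `warmup_ratio: 0.03` instead of `warmup_steps`. The sketch below shows how a ratio could be resolved into a step count, modeled on the rule long used by Hugging Face Transformers (an explicit positive `warmup_steps` wins, otherwise the ceiling of ratio times total steps). Whether Studio resolves it the same way is an assumption, and `resolve_warmup_steps` is a hypothetical helper.

```python
import math


def resolve_warmup_steps(max_steps: int, warmup_steps: int = 0, warmup_ratio: float = 0.0) -> int:
    """Return the number of warmup steps for a run of max_steps optimizer steps."""
    if warmup_steps > 0:
        # An explicit step count takes precedence over the ratio.
        return warmup_steps
    return math.ceil(max_steps * warmup_ratio)


print(resolve_warmup_steps(30, warmup_ratio=0.03))  # 1
print(resolve_warmup_steps(30, warmup_steps=5))     # 5
```

With `max_steps: 30`, the ratio therefore yields a single warmup step, compared with five in the configs that set `warmup_steps: 5`.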

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3n-E4B-it.yaml
									
										Normal file
									
										
				@ -0,0 +1,53 @@

				# Model defaults for unsloth/gemma-3n-E4B-it

				# Based on Gemma3N_(4B)-Conversational.ipynb

				# Also applies to: unsloth/gemma-3n-E4B-it-unsloth-bnb-4bit, google/gemma-3n-E4B-it

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 1024

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 1

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 8

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				audio_input: true

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_k: 64

				  top_p: 0.95

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3n-E4B.yaml
									
										Normal file
									
										
				@ -0,0 +1,53 @@

				# Model defaults for unsloth/gemma-3n-E4B

				# Based on Gemma3N_(4B)-Vision.ipynb

				# Also applies to: unsloth/gemma-3n-E4B-unsloth-bnb-4bit, google/gemma-3n-E4B

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 2

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 1

				  gradient_accumulation_steps: 4

				  warmup_ratio: 0.03

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: true

				  optim: "adamw_torch_fused"

				  lr_scheduler_type: "cosine"

				lora:

				  lora_r: 32

				  lora_alpha: 32

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				audio_input: true

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_k: 64

				  top_p: 0.95

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-26B-A4B-it.yaml
									
										Normal file
									
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/gemma-4-26B-A4B-it

				# Also applies to: google/gemma-4-26B-A4B-it, unsloth/gemma-4-26B-A4B-it-GGUF

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 8

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_p: 0.95

				  top_k: 64

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-26B-A4B.yaml
									
										Normal file
									
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/gemma-4-26B-A4B (base/pretrained)

				# Also applies to: google/gemma-4-26B-A4B

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 8

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_p: 0.95

				  top_k: 64

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-31B-it.yaml
									
										Normal file
									
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/gemma-4-31B-it

				# Also applies to: google/gemma-4-31B-it, unsloth/gemma-4-31B-it-GGUF

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 8

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_p: 0.95

				  top_k: 64

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-31B.yaml
									
										Normal file
									
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/gemma-4-31B (base/pretrained)

				# Also applies to: google/gemma-4-31B

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 8

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_p: 0.95

				  top_k: 64

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-E2B-it.yaml
									
										Normal file
									
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/gemma-4-E2B-it

				# Also applies to: google/gemma-4-E2B-it, unsloth/gemma-4-E2B-it-GGUF

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 8

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_p: 0.95

				  top_k: 64

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-E2B.yaml
									
										Normal file
									
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/gemma-4-E2B (base/pretrained)

				# Also applies to: google/gemma-4-E2B

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 8

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_p: 0.95

				  top_k: 64

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-E4B-it.yaml
									
										Normal file
									
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/gemma-4-E4B-it

				# Also applies to: google/gemma-4-E4B-it, unsloth/gemma-4-E4B-it-GGUF

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 8

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_p: 0.95

				  top_k: 64

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-E4B.yaml
									
										Normal file
									
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/gemma-4-E4B (base/pretrained)

				# Also applies to: google/gemma-4-E4B

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 8

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_p: 0.95

				  top_k: 64

				  min_p: 0.0

									

studio/backend/assets/configs/model_defaults/gpt-oss/unsloth_gpt-oss-120b.yaml
									
										Normal file
									
										
				@ -0,0 +1,52 @@

				# Model defaults for unsloth/gpt-oss-120b

				# Based on gpt-oss-(120B)_A100-Fine-tuning.ipynb

				# Also applies to: openai/gpt-oss-120b, unsloth/gpt-oss-120b-unsloth-bnb-4bit

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 4096

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 4

				  gradient_accumulation_steps: 1

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 32

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_p: 1.0

				  top_k: 0

									

studio/backend/assets/configs/model_defaults/gpt-oss/unsloth_gpt-oss-20b.yaml
									
										Normal file
									
										
				@ -0,0 +1,52 @@

				# Model defaults for unsloth/gpt-oss-20b

				# Based on gpt-oss-(20B)-Fine-tuning.ipynb

				# Also applies to: openai/gpt-oss-20b, unsloth/gpt-oss-20b-unsloth-bnb-4bit, unsloth/gpt-oss-20b-BF16

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 1024

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 1

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 8

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.0

				  top_p: 1.0

				  top_k: 0

									

studio/backend/assets/configs/model_defaults/granite/unsloth_granite-4.0-350m-unsloth-bnb-4bit.yaml
									
										Normal file
									
										
				@ -0,0 +1,54 @@

				# Model defaults for unsloth/granite-4.0-350m

				# Based on Granite4.0_350M.ipynb

				# Also applies to: ibm-granite/granite-4.0-350m, unsloth/granite-4.0-350m-bnb-4bit

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 32

				  lora_alpha: 32

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				    - "shared_mlp.input_linear"

				    - "shared_mlp.output_linear"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 0.0

				  top_p: 1.0

				  top_k: 0

									

studio/backend/assets/configs/model_defaults/granite/unsloth_granite-4.0-h-micro.yaml
									
										Normal file
									
										
				@ -0,0 +1,54 @@

				# Model defaults for unsloth/granite-4.0-h-micro

				# Based on Granite4.0.ipynb

				# Also applies to: ibm-granite/granite-4.0-h-micro, unsloth/granite-4.0-h-micro-bnb-4bit, unsloth/granite-4.0-h-micro-unsloth-bnb-4bit

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 32

				  lora_alpha: 32

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				    - "shared_mlp.input_linear"

				    - "shared_mlp.output_linear"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 0.0

				  top_p: 1.0

				  top_k: 0

									

studio/backend/assets/configs/model_defaults/llama/unsloth_Llama-3.2-11B-Vision-Instruct.yaml
									
										Normal file
									
										
				@ -0,0 +1,49 @@

				# Model defaults for unsloth/Llama-3.2-11B-Vision-Instruct

				# Based on Llama3.2_(11B)-Vision.ipynb

				# Also applies to: unsloth/Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit, meta-llama/Llama-3.2-11B-Vision-Instruct, unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit

				# added inference parameters from unsloth notebook

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "all-linear"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.5

				  min_p: 0.1
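The inference defaults here combine a high `temperature: 1.5` with `min_p: 0.1`. Min-p sampling keeps only tokens whose probability is at least `min_p` times that of the most likely token, which prunes the long tail that a high temperature would otherwise expose. The following is a minimal sketch of that filter on a toy distribution; the function and token names are hypothetical, and the exact order in which an inference backend applies temperature and min-p is not specified by this config.

```python
def min_p_filter(probs: dict[str, float], min_p: float) -> dict[str, float]:
    """Drop tokens below min_p * max(prob), then renormalize the survivors."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Toy next-token distribution (illustrative values only).
toy = {"a": 0.6, "b": 0.3, "c": 0.05, "d": 0.05}
filtered = min_p_filter(toy, min_p=0.1)
print(sorted(filtered))  # ['a', 'b']
```

With `min_p: 0.0`, as in the Gemma configs, the threshold is zero and no token is removed by this filter.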

									

studio/backend/assets/configs/model_defaults/llama/unsloth_Llama-3.2-1B-Instruct.yaml
									
										Normal file
									
										View file
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/Llama-3.2-1B-Instruct

				# Based on Llama3.2_(1B)-RAFT.ipynb

				# Also applies to: unsloth/Llama-3.2-1B-Instruct-unsloth-bnb-4bit, meta-llama/Llama-3.2-1B-Instruct, unsloth/Llama-3.2-1B-Instruct-bnb-4bit, RedHatAI/Llama-3.2-1B-Instruct-FP8, unsloth/Llama-3.2-1B-Instruct-FP8-Block, unsloth/Llama-3.2-1B-Instruct-FP8-Dynamic

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 5

				  num_epochs: 0

				  learning_rate: 2e-5

				  batch_size: 1

				  gradient_accumulation_steps: 8

				  warmup_steps: 0

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: true

				  optim: "adamw_torch"

				  lr_scheduler_type: "cosine"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

									
51 studio/backend/assets/configs/model_defaults/llama/unsloth_Llama-3.2-3B-Instruct.yaml Normal file
										
				@ -0,0 +1,51 @@

				# Model defaults for unsloth/Llama-3.2-3B-Instruct

				# Based on Llama3.2_(1B_and_3B)-Conversational.ipynb

				# Also applies to: unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit, meta-llama/Llama-3.2-3B-Instruct, unsloth/Llama-3.2-3B-Instruct-bnb-4bit, RedHatAI/Llama-3.2-3B-Instruct-FP8, unsloth/Llama-3.2-3B-Instruct-FP8-Block, unsloth/Llama-3.2-3B-Instruct-FP8-Dynamic

				# added inference parameters from unsloth notebook

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.5

				  min_p: 0.1

									
51 studio/backend/assets/configs/model_defaults/llama/unsloth_Llama-3.3-70B-Instruct.yaml Normal file
										
				@ -0,0 +1,51 @@

				# Model defaults for unsloth/Llama-3.3-70B-Instruct

				# Based on Llama3.3_(70B)_A100-Conversational.ipynb

				# Also applies to: unsloth/Llama-3.3-70B-Instruct-unsloth-bnb-4bit, meta-llama/Llama-3.3-70B-Instruct, unsloth/Llama-3.3-70B-Instruct-bnb-4bit, RedHatAI/Llama-3.3-70B-Instruct-FP8, unsloth/Llama-3.3-70B-Instruct-FP8-Block, unsloth/Llama-3.3-70B-Instruct-FP8-Dynamic

				# added inference parameters from unsloth notebook

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.5

				  min_p: 0.1

									
47 studio/backend/assets/configs/model_defaults/llama/unsloth_Meta-Llama-3.1-70B-bnb-4bit.yaml Normal file
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/Meta-Llama-3.1-70B-bnb-4bit

				# Based on Llama3.1_(8B)-Alpaca.ipynb

				# Also applies to: unsloth/Meta-Llama-3.1-8B-bnb-4bit, unsloth/Meta-Llama-3.1-8B-unsloth-bnb-4bit, meta-llama/Meta-Llama-3.1-8B, unsloth/Meta-Llama-3.1-8B, unsloth/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3.1-70B, unsloth/Meta-Llama-3.1-405B-bnb-4bit, meta-llama/Meta-Llama-3.1-405B

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

									
47 studio/backend/assets/configs/model_defaults/llama/unsloth_Meta-Llama-3.1-8B-Instruct-bnb-4bit.yaml Normal file
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit

				# Based on Llama3.1_(8B)-Inference.ipynb

				# Also applies to: unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit, meta-llama/Meta-Llama-3.1-8B-Instruct, unsloth/Meta-Llama-3.1-8B-Instruct, RedHatAI/Llama-3.1-8B-Instruct-FP8, unsloth/Llama-3.1-8B-Instruct-FP8-Block, unsloth/Llama-3.1-8B-Instruct-FP8-Dynamic

				training:

				  trust_remote_code: false

				  max_seq_length: 8192

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

									
47 studio/backend/assets/configs/model_defaults/llama/unsloth_llama-3-8b-Instruct-bnb-4bit.yaml Normal file
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/llama-3-8b-Instruct-bnb-4bit

				# Based on Llama3_(8B)-Conversational.ipynb

				# Also applies to: unsloth/llama-3-8b-Instruct, meta-llama/Meta-Llama-3-8B-Instruct

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

									
47 studio/backend/assets/configs/model_defaults/llama/unsloth_llama-3-8b-bnb-4bit.yaml Normal file
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/llama-3-8b-bnb-4bit

				# Based on Llama3_(8B)-Alpaca.ipynb

				# Also applies to: unsloth/llama-3-8b, meta-llama/Meta-Llama-3-8B

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

									
46 studio/backend/assets/configs/model_defaults/llasa/unsloth_Llasa-3B.yaml Normal file
										
				@ -0,0 +1,46 @@

				# Model defaults for unsloth/Llasa-3B

				# Based on Llasa_TTS_(3B).ipynb and Llasa_TTS_(1B).ipynb

				# Also applies to: HKUSTAudio/Llasa-1B

				# added inference parameters from unsloth notebook

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 5e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 128

				  lora_alpha: 128

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "v_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 1.2

				  top_p: 1.2

									
56 studio/backend/assets/configs/model_defaults/mistral/unsloth_Magistral-Small-2509-unsloth-bnb-4bit.yaml Normal file
										
				@ -0,0 +1,56 @@

				# Model defaults for unsloth/Magistral-Small-2509

				# Based on Magistral_(24B)-Reasoning-Conversational.ipynb

				# Also applies to: mistralai/Magistral-Small-2509, unsloth/Magistral-Small-2509-bnb-4bit

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 2

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 32

				  lora_alpha: 32

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 0.7

				  min_p: 0.01

				  top_p: 0.95

									
55 studio/backend/assets/configs/model_defaults/mistral/unsloth_Ministral-3-3B-Instruct-2512.yaml Normal file
										
				@ -0,0 +1,55 @@

				# Model defaults for unsloth/Ministral-3-3B-Instruct-2512

				# Based on Ministral_3_VL_(3B)_Vision.ipynb

				# Also applies to: unsloth/Ministral-3-3B-Instruct-2512

				# added inference parameters from unsloth guides

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 4

				  gradient_accumulation_steps: 2

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 32

				  lora_alpha: 32

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				  finetune_vision_layers: true

				  finetune_language_layers: true

				  finetune_attention_modules: true

				  finetune_mlp_modules: true

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

				  temperature: 0.15

				  top_p: 0.95

									
47 studio/backend/assets/configs/model_defaults/mistral/unsloth_Mistral-Nemo-Base-2407-bnb-4bit.yaml Normal file
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/Mistral-Nemo-Base-2407-bnb-4bit

				# Based on Mistral_Nemo_(12B)-Alpaca.ipynb

				# Also applies to: unsloth/Mistral-Nemo-Base-2407, mistralai/Mistral-Nemo-Base-2407, unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit, unsloth/Mistral-Nemo-Instruct-2407, mistralai/Mistral-Nemo-Instruct-2407

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 2

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

									
47 studio/backend/assets/configs/model_defaults/mistral/unsloth_Mistral-Small-Instruct-2409.yaml Normal file
										
				@ -0,0 +1,47 @@

				# Model defaults for unsloth/Mistral-Small-Instruct-2409

				# Based on Mistral_Small_(22B)-Alpaca.ipynb 

				# Also applies to: unsloth/Mistral-Small-Instruct-2409-bnb-4bit, mistralai/Mistral-Small-Instruct-2409

				training:

				  trust_remote_code: false

				  max_seq_length: 2048

				  # num_epochs: 4

				  num_epochs: 0

				  learning_rate: 2e-4

				  batch_size: 1

				  gradient_accumulation_steps: 4

				  warmup_steps: 5

				  max_steps: 30

				  save_steps: 30

				  weight_decay: 0.001

				  random_seed: 3407

				  packing: false

				  train_on_completions: true

				  gradient_checkpointing: "unsloth"

				  optim: "adamw_8bit"

				  lr_scheduler_type: "linear"

				lora:

				  lora_r: 16

				  lora_alpha: 16

				  lora_dropout: 0.0

				  target_modules:

				    - "q_proj"

				    - "k_proj"

				    - "v_proj"

				    - "o_proj"

				    - "gate_proj"

				    - "up_proj"

				    - "down_proj"

				  use_rslora: false

				  use_loftq: false

				logging:

				  enable_wandb: false

				  wandb_project: "llm-finetuning"

				  enable_tensorboard: false

				  tensorboard_dir: "runs"

				  log_frequency: 10

				inference:

				  trust_remote_code: false

Compare commits

4947 commits February-2 ... main

2 .gitattributes vendored Normal file
55 .github/CODEOWNERS vendored Normal file
4 .github/FUNDING.yml vendored
22 .github/ISSUE_TEMPLATE/bug---issue.md vendored Normal file
21 .github/ISSUE_TEMPLATE/feature-request.md vendored Normal file
27 .github/dependabot.yml vendored Normal file
37 .github/workflows/stale.yml vendored Normal file
62 .gitignore vendored
6 .pre-commit-ci.yaml Normal file
18 .pre-commit-config.yaml Normal file
132 CODE_OF_CONDUCT.md Normal file
29 CONTRIBUTING.md Normal file
664 COPYING Normal file
4 LICENSE
625 README.md
79 build.sh Normal file
7 cli.py Normal file
BIN images/Assistant.png Normal file
BIN images/Documentation Button.png Normal file
BIN images/Merge.png Normal file
BIN images/Run.png Normal file
BIN images/STUDIO BLACK LOGO.png Normal file
BIN images/STUDIO WHITE LOGO.png Normal file
BIN images/Terminal_Type.png Normal file
BIN images/Where_Terminal.png Normal file
BIN images/buy me a coffee button.png Normal file
BIN images/documentation github button.png Normal file
BIN images/documentation green button.png Normal file
BIN images/documentation lighter.png Normal file
BIN images/documentation white button.png Normal file
BIN images/made with unsloth.png Normal file
BIN images/ollama.png Normal file
BIN images/start free finetune button.png Normal file
BIN images/unsloth end.png Normal file
BIN images/unsloth logo black text.png
BIN images/unsloth logo only.png
BIN images/unsloth logo white text.png
BIN images/unsloth new logo.png
BIN images/unsloth sticker.png Normal file
1125 install.ps1 Normal file
1671 install.sh Executable file
1183 pyproject.toml
179 scripts/enforce_kwargs_spacing.py Executable file
169 scripts/install_gemma4_mlx.sh Executable file
191 scripts/install_qwen3_6_mlx.sh Normal file
30 scripts/run_ruff_format.py Executable file
661 studio/LICENSE.AGPL-3.0 Normal file
153 studio/Unsloth_Studio_Colab.ipynb Normal file
2 studio/__init__.py Normal file
2 studio/backend/__init__.py Normal file
59 studio/backend/_platform_compat.py Normal file
2 studio/backend/assets/__init__.py Normal file
2 studio/backend/assets/configs/__init__.py Normal file
42 studio/backend/assets/configs/full_finetune.yaml Normal file
398 studio/backend/assets/configs/inference_defaults.json Normal file
42 studio/backend/assets/configs/lora_text.yaml Normal file
56 studio/backend/assets/configs/model_defaults/default.yaml Normal file
43 studio/backend/assets/configs/model_defaults/embedding/unsloth_Qwen3-Embedding-0.6B.yaml Normal file
39 studio/backend/assets/configs/model_defaults/embedding/unsloth_all-MiniLM-L6-v2.yaml Normal file
39 studio/backend/assets/configs/model_defaults/embedding/unsloth_bge-m3.yaml Normal file
42 studio/backend/assets/configs/model_defaults/embedding/unsloth_embeddinggemma-300m.yaml Normal file
38 studio/backend/assets/configs/model_defaults/embedding/unsloth_gte-modernbert-base.yaml Normal file
47 studio/backend/assets/configs/model_defaults/ernie/unsloth_ERNIE-4.5-21B-A3B-PT.yaml Normal file
55 studio/backend/assets/configs/model_defaults/ernie/unsloth_ERNIE-4.5-VL-28B-A3B-PT.yaml Normal file
47 studio/backend/assets/configs/model_defaults/falcon/tiiuae_Falcon-H1-0.5B-Instruct.yaml Normal file
50 studio/backend/assets/configs/model_defaults/gemma/unsloth_codegemma-7b-bnb-4bit.yaml Normal file
53 studio/backend/assets/configs/model_defaults/gemma/unsloth_functiongemma-270m-it.yaml Normal file
46 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-2-27b-bnb-4bit.yaml Normal file
47 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-2-2b.yaml Normal file
53 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3-270m-it.yaml Normal file
51 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3-27b-it.yaml Normal file
51 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3-4b-it.yaml Normal file
51 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3-4b-pt.yaml Normal file
53 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3n-E4B-it.yaml Normal file
53 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-3n-E4B.yaml Normal file
47 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-26B-A4B-it.yaml Normal file
47 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-26B-A4B.yaml Normal file
47 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-31B-it.yaml Normal file

47 studio/backend/assets/configs/model_defaults/gemma/unsloth_gemma-4-31B.yaml Normal file