Compare commits

4947 commits

Author SHA1 Message Date
Roland Tannous
21e9a91a57
Studio: forward standard OpenAI tools / tool_choice on /v1/responses (Codex compat) (#5122)
* Studio: forward standard OpenAI tools / tool_choice on /v1/responses

Mirrors the /v1/chat/completions client-side tool pass-through from #5099
so clients (OpenAI Codex CLI, OpenAI Python SDK, ...) that target the
Responses API receive structured function_call output items instead of
plain text with tool-call tokens leaking into content.

- ResponsesRequest: type tools/tool_choice properly, add parallel_tool_calls;
  accept function_call and function_call_output input items for multi-turn
- Translate flat Responses tool / tool_choice shape to the nested Chat
  Completions shape before forwarding to llama-server
- _normalise_responses_input: map function_call_output -> role="tool",
  function_call -> assistant tool_calls (preserving call_id)
- Non-streaming: map returned tool_calls -> top-level function_call
  output items keyed by call_id
- Streaming: emit response.output_item.added (function_call),
  response.function_call_arguments.delta/.done, and response.output_item.done
  per tool call while keeping the text message at output_index 0
- Pytest coverage: tools/tool_choice translation, multi-turn input mapping,
  non-streaming tool_calls mapping, response round-trip
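
As an illustration of the flat-to-nested translation described above, here is a minimal sketch. The helper names are hypothetical and are not the functions added in this PR; skipping non-function tools is an assumption.

```python
from typing import Any


def responses_tools_to_chat(tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Nest flat Responses-style function tools under a "function" key."""
    chat_tools = []
    for tool in tools:
        if tool.get("type") != "function":
            continue  # assumption: non-function tools are not forwarded
        fn = {
            key: tool[key]
            for key in ("name", "description", "parameters", "strict")
            if key in tool
        }
        chat_tools.append({"type": "function", "function": fn})
    return chat_tools


def responses_tool_choice_to_chat(choice: Any) -> Any:
    """Translate a flat {"type": "function", "name": ...} choice; pass strings through."""
    if isinstance(choice, dict) and choice.get("type") == "function" and "name" in choice:
        return {"type": "function", "function": {"name": choice["name"]}}
    return choice  # "auto", "none", "required" share the same spelling
```

String choices such as "auto" need no translation, so only the object form is rewritten.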

* Studio: merge system messages and close inner stream on /v1/responses

Fixes two issues surfacing when OpenAI Codex CLI drives /v1/responses
against a GGUF with a strict chat template (gpt-oss harmony, Qwen3, ...).

1. "System message must be at the beginning" upstream errors
   Codex sends `instructions` AND a `role:"developer"` message in `input`,
   producing two separate system-role messages. Strict templates raise
   when a second system message exists or when one appears after a user
   turn. _normalise_responses_input now hoists all instructions / system /
   developer content into a single merged system message at the top of
   the Chat Completions message list.

2. "async generator ignored GeneratorExit" / "Attempted to exit cancel
   scope in a different task"
   _responses_stream consumed the inner chat-completions body_iterator
   without an explicit aclose() in a finally block. On client disconnect
   (Codex frequently cancels mid-stream), Python 3.13 finalized the inner
   async generator on a different task, tripping anyio's cancel-scope
   check. Mirrored the same try/finally + aclose pattern used by the
   /v1/messages, /v1/chat/completions, and /v1/completions passthroughs.

Tests: hoisting of instructions + developer, developer mid-conversation,
multiple system messages in input, no-system passthrough.
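
The hoisting step could look roughly like the sketch below. It assumes string content and a hypothetical helper name; the real _normalise_responses_input handles more shapes.

```python
from typing import Any, Optional


def hoist_system_messages(
    instructions: Optional[str], messages: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Merge instructions plus every system/developer message into one leading system message."""
    system_parts = [instructions] if instructions else []
    rest = []
    for msg in messages:
        if msg.get("role") in ("system", "developer"):
            if msg.get("content"):
                system_parts.append(msg["content"])
        else:
            rest.append(msg)  # non-system turns keep their relative order
    if not system_parts:
        return rest  # no-system passthrough
    merged = {"role": "system", "content": "\n\n".join(system_parts)}
    return [merged] + rest
```

Joining with a blank line is an assumption; any separator works as long as only one system message reaches the template.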

* Studio: accept Codex multi-turn shapes and fix cross-task stream close on /v1/responses

Two issues observed driving /v1/responses from OpenAI Codex CLI against a
GGUF backend.

1. 422 on every turn after the first
   Codex replays prior assistant turns with
   `content:[{"type":"output_text","text":...,"annotations":[],"logprobs":[]}]`
   and carries forward `reasoning` items (o-series / gpt-5) between turns.
   Our `ResponsesContentPart` union only accepted input_text / input_image,
   and `ResponsesInputItem` only message / function_call / function_call_output,
   so Pydantic failed the whole list and FastAPI returned
   `"Input should be a valid string"` against the `str` branch of the
   outer union.

   - Add `ResponsesOutputTextPart` for assistant-replay content.
   - Add `ResponsesUnknownContentPart` and `ResponsesUnknownInputItem`
     as permissive catch-alls (drop during normalisation).
   - Wire an explicit `Discriminator` so dispatch is deterministic and
     the fallthrough reaches the catch-all instead of misreporting via
     the outer `Union[str, list[...]]`.
   - `_normalise_responses_input` now accepts output_text parts, flattens
     single-part assistant text to a plain string (keeps legacy chat
     templates happy), and silently drops reasoning / unknown items.

2. "async generator ignored GeneratorExit" / cross-task cancel scope
   `_responses_stream` awaited `openai_chat_completions` in the parent
   route-handler task, which opens the httpx client for the inner
   passthrough on *that* task. The outer `StreamingResponse` then iterates
   in a child task, so the asyncgen GC finalises the inner httpcore byte
   stream on the child task, tripping anyio's "Attempted to exit cancel
   scope in a different task". Move the `await` inside `event_generator`
   so the httpx lifecycle stays within the single streaming child task,
   and surface any HTTPException as a `response.failed` SSE frame.

Tests: assistant output_text replay, reasoning-item tolerance, unknown
content-part tolerance, end-to-end Codex-shape payload (developer + user +
reasoning + function_call + function_call_output + assistant output_text +
user), and single-part assistant flattening to plain string.
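
A minimal sketch of the item mapping described in this commit and the first one (function_call, function_call_output, output_text replay, dropped reasoning items). The function name and the exact output shapes are assumptions, not the Studio code.

```python
from typing import Any


def normalise_items(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Map Responses input items to Chat Completions messages."""
    messages: list[dict[str, Any]] = []
    for item in items:
        kind = item.get("type", "message")
        if kind == "function_call":
            messages.append({
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "id": item["call_id"],  # call_id is preserved as the tool call id
                    "type": "function",
                    "function": {"name": item["name"], "arguments": item.get("arguments", "")},
                }],
            })
        elif kind == "function_call_output":
            messages.append({"role": "tool", "tool_call_id": item["call_id"], "content": item["output"]})
        elif kind == "message":
            content = item.get("content")
            if isinstance(content, list):
                texts = [p["text"] for p in content if p.get("type") in ("input_text", "output_text")]
                # a single text part is flattened to a plain string
                content = texts[0] if len(texts) == 1 else [{"type": "text", "text": t} for t in texts]
            messages.append({"role": item["role"], "content": content})
        # reasoning and unknown item types are silently dropped
    return messages
```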

* Studio: call llama-server directly from streaming /v1/responses

The previous fix (running the inner await inside event_generator) was not
enough. Wrapping the existing `openai_chat_completions` pass-through still
stacks two async generators: when the outer generator is closed, the
innermost `HTTP11ConnectionByteStream.__aiter__` in httpcore doesn't
receive GeneratorExit before Python's asyncgen GC finalises it in a
sibling task, tripping "Attempted to exit cancel scope in a different
task" and "async generator ignored GeneratorExit" — the same Python 3.13
+ httpcore 1.0.x interaction already seen in PRs #4956, #4981, #5099.

The cure both pass-throughs already use: a single same-task httpx lifecycle
with explicit `aiter_lines().aclose()` BEFORE `resp.aclose()` / `client.aclose()`
in the generator's finally block.

Apply it at the Responses layer by dropping the wrapper entirely for GGUF:
open httpx, consume `resp.aiter_lines()`, parse `chat.completion.chunk`,
emit Responses SSE events, close everything in finally — all in the
single StreamingResponse child task. Non-GGUF streaming is rejected with
a 400 (wrapping the transformers backend would re-introduce the
double-layer pattern and isn't a Codex-compatible path today anyway).

Also surfaces upstream httpx.RequestError / non-200 as a
`response.failed` SSE frame rather than a dropped stream now that the
request is dispatched after SSE headers have gone out.

* Studio: silence benign httpcore asyncgen GC warnings on Python 3.13

The streaming pass-throughs (/v1/chat/completions, /v1/messages,
/v1/responses, /v1/completions) all use the proven #4981 / #5099 pattern
— single-task httpx lifecycle with explicit aiter_lines().aclose() ahead
of resp.aclose() / client.aclose() in the generator's finally block.
That handles our own iterators correctly.

The residual noise ("async generator ignored GeneratorExit" /
"Attempted to exit cancel scope in a different task") comes from an
innermost HTTP11ConnectionByteStream.__aiter__ that httpcore creates
internally inside its pool. We hold no reference to it, so we cannot
aclose it ourselves. Python 3.13's asyncgen GC hook finalises it on the
finaliser task, its aclose path enters an anyio CancelScope shield, and
Python flags the cross-task exit. The response has already been
delivered with a 200 by then — it is purely log noise, not a functional
failure. Same interaction seen in modelcontextprotocol/python-sdk #831,
agno #3556, chainlit #2361, langchain-mcp-adapters #254.

Install a targeted sys.unraisablehook that swallows this specific tuple
— RuntimeError mentioning "cancel scope" or "GeneratorExit" plus an
object repr referencing HTTP11ConnectionByteStream — and defers to the
default hook for every other unraisable. Idempotent; guarded by a
sentinel attribute so repeated imports don't stack filters.
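
A sketch of such a filter using only the standard library. The sentinel attribute name and function names are hypothetical; the matching rules follow the description above.

```python
import sys

_SENTINEL = "_httpcore_asyncgen_filter"  # hypothetical sentinel attribute name


def _is_benign(unraisable) -> bool:
    """True for the RuntimeError + HTTP11ConnectionByteStream combination only."""
    exc = unraisable.exc_value
    if not isinstance(exc, RuntimeError):
        return False
    message = str(exc)
    if "cancel scope" not in message and "GeneratorExit" not in message:
        return False
    return "HTTP11ConnectionByteStream" in repr(unraisable.object)


def install_filter() -> None:
    """Wrap sys.unraisablehook once; repeated calls are no-ops."""
    previous = sys.unraisablehook
    if getattr(previous, _SENTINEL, False):
        return  # already installed, do not stack filters

    def hook(unraisable) -> None:
        if _is_benign(unraisable):
            return  # swallow the known log noise
        previous(unraisable)  # defer everything else to the prior hook

    setattr(hook, _SENTINEL, True)
    sys.unraisablehook = hook
```

Because the filter checks both the exception text and the object repr, unrelated unraisable errors still reach the previous hook.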
2026-04-21 13:17:20 +04:00
Lee Jackson
c20959dbf4
Studio: Improve chat composition, fix scroll behaviour, and refine sidebar UX (#5089)
* Chatbox, scroll, and menu fixes

- Fixed chatbox auto-expand height for multi-line text on the compare page
- Fixed chatbox UI to be consistent across compare and new chat
- Fixed scrolling being enabled on pages with no content, which also triggered the scroll-to-bottom button
- Fixed scroll-to-bottom button to only appear after scrolling up a reasonable amount instead of instantly
- Added shutdown studio button to the menu for easier access
- Fixed pop-up menu width to match the user button width

(cherry picked from commit cd4e390dfa84fe311fae79a781b96cc0ef5970a9)

* fix: correct compare scroll viewport and clean up chat composer UI polish

* Dark theme refactor and sidebar/chat UI refinements

- Complete refactoring of dark theme
- Replaced square rounded-corner user profile image with a circular bordered one
- Replaced user profile icon with 'U' initial and renamed label from 'Studio' to 'User'
- Chat bubbles now have a pointy top-right edge
- Sidebar menu tab line color selection is now consistent across all menus
- Tab-selection color animation now also applies to recent chats
- Removed 'Compare' menu autoselect when a compare chat conversation is selected
- Fixed UI consistency in Compare to match New Chat
- Removed sidebar animation and tab line, replaced with rounded selection for consistency
- Further adjustments to sidebar UI
- Further adjustments to compare chat UI

* Fixed sidebar collapse/expand for recent chats and recent runs not being clickable

* Sidebar, fonts, and chat UI refinements

- Replaced logo PNG with real font text for 'unsloth' and 'BETA' label
- Added Hellix font and applied it across menus and UI elements
- Lighter scrollbar in the sidebar compared to other areas of the app
- Adjusted chat font and chat bubble styling
- Adjusted app menu design to stay consistent with the sidebar
- Adjusted text style for 'New Chat' and repositioned content/chatbox
- Adjusted model selector and top area UI
- Fixed footer text from 'LLM's' to 'LLMs'
- Fixed active selection border color incorrectly appearing on page refresh and during general navigation
- Logo now defaults to 'New Chat' when clicked

* Sidebar, model selector, and mobile UI fixes

- Further adjustments to sidebar UI and logo
- Changed right bar icon
- Model selector adjustments
- Collapsed sidebar now matches the content area background
- Adjusted Hellix font spacing across pages
- Fixed sidebar icon overlap on mobile screens

* Adjust sidebar icons

* Adjust sidebar icons

* Fixed compare chat UI and scrolling issues

* Fixed inference settings icon behavior and context info positioning

- Fixed top right inference settings icon to move into sidepanel during expand/collapse, matching left sidebar behavior
- Adjusted context information element positioning

* Fix: textarea overflow in system prompt editor

* Code block redesign, font, and chat bubble adjustments

- Redesigned code block colors and theme
- Changed code block font to Fira Code
- Fixed scrollbar disappearing when expanding/collapsing tool calls in chats
- Adjusted chat bubble background color

* Fix chat bubble background color in dark theme

* fix: restore textarea auto-sizing and scope prompt editor sizing

* fix: add explicit textarea field sizing for prompt editor overflow

* fix: generate chat nonce on click instead of render

* fix: respect training lock on logo navigation

* Refactor compare page dual chat scrolling behavior

* Revert "Refactor compare page dual chat scrolling behavior"

This reverts commit d056ec09f2.

---------

Co-authored-by: sneakr <hauzin@hotmail.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-21 02:20:45 +04:00
Konstantin Azizov
0a5c61ffcc
fix: prefer mainstream clipboard copy over deprecated one (#5109)
Fixes #5097

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-20 23:18:18 +04:00
Lee Jackson
d3215ce113
Studio: Show LoRA live logs and update GGUF quant options (#5058)
* export: update GGUF quant list and ordering

* gguf: add Q2_K_L quantize flags for output and embeddings

* export: add live console logs for LoRA export flow

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: stream q2_k_l quantize logs and include subprocess error details

* fix: route Q2_K_L preset to q2_k ftype with q8_0 output+embeddings

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-20 23:14:49 +04:00
Lee Jackson
9c8a079d97
Studio: Local profile customization in settings and sync sidebar identity (#5088)
* studio: add local profile customization in settings

* studio: add local profile settings and sync sidebar identity

* fix: adjust profile card margin

* fix: move helper modules to utils and use single-letter avatar fallback

* fix: keep profile icon visible on sidebar collapse

* fix: sidebar account trigger labeling and profile reset prefs
2026-04-20 22:28:02 +04:00
Roland Tannous
9954781d30
fix(studio/chat): cancel in-flight run when trashing a thread from sidebar (#5067)
Trashing a thread mid-stream used to delete the Dexie rows while the
model kept generating, because the sidebar has no access to the
@assistant-ui aui context. Expose per-thread cancelRun() through the
chat runtime store and call it from deleteChatItem so trash behaves
like Stop → Trash. Covers compare pairs by cancelling each paired
thread.

Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-20 21:06:59 +04:00
Michael Han
b24f3f61b8
Update README.md
2026-04-20 00:37:40 -07:00
Michael Han
f5eec8a6f2
Qwen3.6 and ReadMe revamp.md
2026-04-19 23:16:36 -07:00
Roland Tannous
ac2daf8b7a
Studio: forward standard OpenAI tools / tool_choice to llama-server (#5099)
* fix(studio): forward OpenAI tools/tool_choice to llama-server (#4999)

Studio's /v1/chat/completions silently stripped standard OpenAI `tools`
and `tool_choice` fields, so clients using standard function calling
(opencode, Claude Code, Cursor, Continue, ...) never got structured
tool_calls back. Adds a client-side pass-through path mirroring the
existing Anthropic /v1/messages flow: when `tools` is present without
Studio's `enable_tools` shorthand, the request is forwarded to
llama-server verbatim so the client sees native id, finish_reason
("tool_calls"), delta.tool_calls, and accurate usage tokens.

Also wires Anthropic tool_choice forwarding: /v1/messages previously
accepted tool_choice on the request model but silently dropped it with
a warning. Translate the four Anthropic shapes to OpenAI format and
forward them so agentic clients can actually enforce tool use.

- ChatCompletionRequest: add tools, tool_choice, stop; extra="allow"
- ChatMessage: accept role="tool", optional tool_call_id / tool_calls /
  name; content is now optional (assistant with only tool_calls)
- routes/inference.py: _openai_passthrough_stream /
  _openai_passthrough_non_streaming helpers, routing branch in
  openai_chat_completions, vision+tools via content-parts injection
- _build_passthrough_payload: tool_choice parameter (default "auto")
- anthropic_compat: anthropic_tool_choice_to_openai() translator
- tests/test_openai_tool_passthrough.py: Pydantic + translator unit tests
- tests/test_studio_api.py: 5 new E2E tests (non-stream, stream,
  multi-turn, OpenAI SDK, Anthropic tool_choice=any regression)
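
For reference, the four Anthropic tool_choice shapes map onto OpenAI values roughly as sketched below. This is not the body of anthropic_tool_choice_to_openai(); the function name here is hypothetical and the fallback to "auto" is an assumption.

```python
from typing import Any, Optional, Union


def translate_tool_choice(tool_choice: Optional[dict[str, Any]]) -> Union[str, dict[str, Any]]:
    """Translate an Anthropic tool_choice object to the OpenAI equivalent."""
    if not tool_choice:
        return "auto"
    kind = tool_choice.get("type")
    if kind == "auto":
        return "auto"
    if kind == "any":
        return "required"  # the model must call some tool
    if kind == "none":
        return "none"
    if kind == "tool":
        return {"type": "function", "function": {"name": tool_choice["name"]}}
    return "auto"  # assumption: unknown shapes fall back to the default
```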

* fix(studio): surface httpx transport errors from OpenAI passthrough

When the managed llama-server subprocess crashes mid-request, the
async pass-through helpers in routes/inference.py used to return a
bare 500 (non-streaming) or an "An internal error occurred" SSE chunk
(streaming) because _friendly_error only recognized the sync path's
"Lost connection to llama-server" substring -- httpx transport
failures (ConnectError / ReadError / RemoteProtocolError /
ReadTimeout) stringify differently and fell through to the generic
case.

- _friendly_error: map any httpx.RequestError subclass to the same
  "Lost connection to the model server" message the sync chat path
  emits. Placed before the substring heuristics so the streaming path
  automatically picks it up via its existing except Exception catch.
- _openai_passthrough_non_streaming: wrap the httpx.AsyncClient.post
  in a try/except httpx.RequestError and re-raise as HTTPException
  502 with the friendly detail.
- tests/test_openai_tool_passthrough.py: new TestFriendlyErrorHttpx
  class pinning the mapping for ConnectError, ReadError,
  RemoteProtocolError, ReadTimeout, and confirming non-httpx paths
  (context-size heuristic, generic fallback) are unchanged.

* fix(studio): close aiter_bytes/aiter_lines explicitly in passthroughs

The httpcore asyncgen cleanup fix in 5cedd9a5 is incomplete on Python
3.13 + httpcore 1.0.x: it switched to manual client/response lifecycle
but still used anonymous `async for raw_line in resp.aiter_lines():`
patterns in all three streaming paths. Python's async for does NOT
auto-close the iterator on break/return, so the aiter_lines /
aiter_bytes async generator remains alive, reachable only from the
surrounding coroutine frame. Once `_stream()` returns the frame is
GC'd and the orphaned asyncgen is finalized on a LATER GC pass in a
DIFFERENT asyncio task, where httpcore's
HTTP11ConnectionByteStream.aclose() enters anyio.CancelScope.__exit__
with a mismatched task and prints "Exception ignored in: <async
generator>" / "async generator ignored GeneratorExit" / "Attempted
to exit cancel scope in a different task" to the server log.

A user observed this on /v1/messages after successful (status 200)
requests, with the traceback pointing at
HTTP11ConnectionByteStream.__aiter__ / .aclose inside httpcore.

Fix: save resp.aiter_lines() / resp.aiter_bytes() as a variable and
explicitly `await iter.aclose()` in the finally block BEFORE
resp.aclose() / client.aclose(). This closes the asyncgen inside the
current task's event loop, so the internal httpcore byte stream is
cleaned up before Python's asyncgen GC hook has anything orphaned to
finalize. Each aclose is wrapped in try/except Exception so nested
anyio cleanup noise can't bubble out.

Applied to all three streaming passthrough paths:
- _anthropic_passthrough_stream (/v1/messages client-side tool path)
- _openai_passthrough_stream (/v1/chat/completions client-side tool
  path, new in this PR)
- openai_completions (/v1/completions bytes proxy from PR #4956)
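
The pattern can be illustrated without httpx. In the sketch below, fake_lines() is a stand-in for resp.aiter_lines() (an assumption for illustration), and the outer generator keeps a named reference so it can close the iterator in its own finally block.

```python
import asyncio

closed_in_task = []


async def fake_lines():
    """Stand-in for resp.aiter_lines(); records which task runs its cleanup."""
    try:
        for i in range(100):
            yield f"data: {i}"
            await asyncio.sleep(0)
    finally:
        closed_in_task.append(asyncio.current_task())


async def stream():
    lines = fake_lines()  # named reference instead of an anonymous async for target
    try:
        async for line in lines:
            yield line
    finally:
        try:
            await lines.aclose()  # closed here, in the current task
        except Exception:
            pass  # cleanup noise must not escape the generator
        # the real code would call resp.aclose() / client.aclose() after this point


async def main():
    closed_in_task.clear()
    gen = stream()
    first = await gen.__anext__()
    await gen.aclose()  # simulates a client disconnect mid-stream
    return first, closed_in_task[0] is asyncio.current_task()
```

Running asyncio.run(main()) shows the inner generator's cleanup executing in the same task that closed the outer one, which is the property the fix relies on.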

* fix(studio): default ChatCompletionRequest.stream to false per OpenAI spec

OpenAI's /v1/chat/completions spec defaults `stream` to false, so
clients that omit the field (naive curl, minimal integrations) expect
a single JSON response back. Studio was defaulting to true, silently
switching those clients into SSE and breaking any parser that didn't
also handle streaming. ResponsesRequest and AnthropicMessagesRequest
already default to false correctly; only ChatCompletionRequest was
wrong.

Studio's own frontend always sets `stream` explicitly on every
chat-adapter / chat-api / runtime-provider call site, so the flip has
no UI impact. SDK users (OpenAI Python/JS SDK, opencode, Claude Code,
Cursor, Continue) also always pass `stream` explicitly, so they're
unaffected. The only clients feeling the change are raw-curl users
who were relying on the wrong default -- those get the correct OpenAI
behavior now.

Added a regression test pinning the default so it can't silently
flip back.

* fix(studio): reject images in OpenAI tool passthrough for text-only GGUFs

The new tool passthrough branch runs before _extract_content_parts,
skipping the existing not is_vision guard. Requests combining tools
with an image on a text-only tool-capable GGUF were forwarded to
llama-server, producing opaque upstream errors instead of the
pre-existing clear 400. Restore the guard inline at the dispatch
point, checking both legacy image_base64 and inline image_url parts.

* fix(studio): require tool_call_id on role=tool chat messages

Enforce the OpenAI spec rule that role="tool" messages must carry a
tool_call_id. Without it, upstream backends cannot associate a tool
result with the assistant's prior tool_calls entry and the request
fails in non-obvious ways through the passthrough path. Reject at the
request boundary with a 422 instead.

* fix(studio): harden OpenAI tool passthrough validation and error surfacing

Three related fixes called out by the PR review:

1. Preserve upstream status codes in the streaming passthrough. The
   httpx request is now dispatched before the StreamingResponse is
   constructed. Non-200 upstream responses and httpx RequestError
   transport failures raise HTTPException with the real status
   instead of being buried inside a 200 SSE error frame, so OpenAI
   SDK clients see APIError/BadRequestError/... as expected.

2. Require non-empty content on user/system/tool messages. Per the
   OpenAI spec, content may only be omitted on assistant messages
   that carry tool_calls; enforce that at the request boundary so
   malformed messages never reach the passthrough path.

3. Role-constrain tool-call metadata. tool_calls is only valid on
   role=assistant, tool_call_id and name only on role=tool. Without
   this, a user/system message with tool_calls would flip the
   passthrough branch on and be forwarded to llama-server, surfacing
   as an opaque upstream error.
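
The three validation rules from this and the preceding commit can be sketched as a plain function over message dicts. The real checks live on the Pydantic ChatMessage model; the function name and error strings here are hypothetical.

```python
from typing import Any


def validate_chat_message(msg: dict[str, Any]) -> list[str]:
    """Return a list of rule violations; an empty list means the message is accepted."""
    errors = []
    role = msg.get("role")
    has_tool_calls = bool(msg.get("tool_calls"))

    if has_tool_calls and role != "assistant":
        errors.append("tool_calls is only valid on role=assistant")
    if role != "tool":
        for field in ("tool_call_id", "name"):
            if msg.get(field) is not None:
                errors.append(f"{field} is only valid on role=tool")
    if role == "tool" and not msg.get("tool_call_id"):
        errors.append("role=tool requires tool_call_id")
    # content may be omitted only on assistant messages
    if role in ("user", "system", "tool") and msg.get("content") in (None, "", []):
        errors.append(f"role={role} requires non-empty content")
    return errors
```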

* fix(studio): normalize image mode and passthrough JSON verbatim

Two Gemini-code-assist review findings on PR #5099:

1. Unconditionally convert decoded images to RGB before PNG encoding.
   The prior code only handled RGBA, letting CMYK/I/F images crash
   at img.save(format="PNG") and surface as opaque 400s. Applied to
   both the passthrough helper and the non-passthrough GGUF path
   that originally carried this pattern, keeping the two sites in
   sync.

2. Return the upstream JSON body as raw bytes via Response rather
   than parse-then-re-serialize with JSONResponse. Matches the
   passthrough helper's "verbatim" contract and drops a redundant
   round-trip.

---------

Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-18 12:53:23 +04:00
Manan Shah
7d0d2f256c
Add qwen3.6 script (#5084)
* unsloth gemma4 support files

* some fixes

* Fixing cache.empty() calls (#4813)

* Fixing cache.empty() calls

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix/gemma4 mlx (#4816)

* Fixing cache.empty() calls

* fixing for mlx versions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* removed bidirectional check for 31b (#4839)

Co-authored-by: Manan17 <shahmanan170602@gmail.coml>

* Add Gemma 4 26B MoE support (MLX) (#4844)

* removed bidirectional check for 31b

* Change gemma4_text for moe

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(gemma4): cast RoPE offset to int before mx.arange() (#4901)

* fix(gemma4): cast RoPE offset to int before mx.arange()

* fix(gemma4): use zero-based arange + offset to avoid CPU-GPU sync

* qwen3.6 patches for multi-turn chat

* qwen3.6 script

* removing unnecessary scripts

* displaying errors for not installed packages

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Manan17 <shahmanan170602@gmail.coml>
Co-authored-by: Théophile Lafargue <138336683+eauchs@users.noreply.github.com>
2026-04-17 01:21:30 -07:00
Daniel Han
d20b306755
Versioning
2026-04-16 12:06:10 -07:00
Daniel Han
0b57884120
Add Qwen3.6 inference defaults for Studio (#5065)
* Add Qwen3.6 inference defaults for Studio

Add qwen3.6 family entry to inference_defaults.json with the
recommended sampling parameters from Qwen's documentation:
temperature=0.7, top_p=0.8, top_k=20, min_p=0.0,
presence_penalty=1.5, repetition_penalty=1.0.

Without this, Qwen3.6 models fall through to the generic qwen3
pattern which uses different defaults (temperature=0.6,
top_p=0.95, no presence_penalty).
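
To illustrate the fall-through, here is a sketch of a family lookup where a more specific pattern must win over the generic one. The dict layout and the longest-match rule are assumptions; the actual schema of inference_defaults.json and Studio's matching logic are not shown in this commit. Only the parameter values come from the message above.

```python
from typing import Any

# Hypothetical in-memory shape; values taken from the commit message.
DEFAULTS: dict[str, dict[str, Any]] = {
    "qwen3.6": {
        "temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0,
        "presence_penalty": 1.5, "repetition_penalty": 1.0,
    },
    "qwen3": {"temperature": 0.6, "top_p": 0.95},
}


def resolve_defaults(model_name: str) -> dict[str, Any]:
    """Pick the longest family pattern contained in the model name."""
    name = model_name.lower()
    for pattern in sorted(DEFAULTS, key=len, reverse=True):
        if pattern in name:
            return DEFAULTS[pattern]
    return {}
```

Without the "qwen3.6" entry, a name such as Qwen3.6-35B-A3B-GGUF would still contain "qwen3" and pick up the generic values.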

* Add Qwen3.6-35B-A3B-GGUF to default model lists

* Add Qwen3.5/3.6 presence_penalty to thinking toggle and small-model disable logic

- Thinking toggle (on-load + button click) now sets presencePenalty: 1.5 for
  Qwen3.5 and Qwen3.6 models (both thinking-ON and thinking-OFF states)
- Small-model thinking-disable check (<9B defaults to no-thinking) extended
  from Qwen3.5-only to also cover Qwen3.6, in all 3 locations:
  frontend on-load, frontend refresh, backend llama_cpp.py
2026-04-16 11:42:42 -07:00
Daniel Han
d56f980452
fix: multi-GPU inference crash for bnb 4-bit/8-bit models (#5068)
* fix: multi-GPU inference crash for bnb 4-bit/8-bit models

When load_in_4bit or load_in_8bit is used with device_map="sequential"
and max_memory constraints that place weights across multiple GPUs (or
entirely on a non-default GPU like cuda:1), the bitsandbytes loading
path in transformers never calls dispatch_model. No AlignDevicesHook is
installed, and the first forward/generate call crashes with:

  RuntimeError: Expected all tensors to be on the same device

This adds _attach_bnb_multidevice_hooks() which is called after
from_pretrained returns. It infers a device map from actual parameter
placements and calls dispatch_model(force_hooks=True) to install the
missing hooks. The function is a complete no-op for the common
single-GPU cuda:0 case.

Call sites: FastBaseModel.from_pretrained (vision.py) and
FastLlamaModel.from_pretrained (llama.py).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: align with PR #5053 final review improvements

- Add hook call to the bnb quantized loading branch in llama.py (the
  primary load_in_4bit path), not just the non-fast-inference fallback
- Expand bnb detection: also check model.is_loaded_in_4bit,
  model.is_loaded_in_8bit, model.quantization_method
- Pass explicit main_device and skip_keys to dispatch_model
- Use logger.info instead of print for the success message
- Use kwargs.get("load_in_8bit", False) at llama.py call sites

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 11:35:02 -07:00
Lee Jackson
ee86530e55
chore: switch helper and no-cache fallback to Gemma (#5066)
2026-04-16 22:27:30 +04:00
Wasim Yousef Said
bc9ddb3af6
Fix onboarding followups (#5064)
* Fix onboarding followups

* Rename sidebar studio to train
2026-04-16 10:11:35 -07:00
Wasim Yousef Said
7ef65bd2e5
Chat first onboarding (#5063)
* auth: default to chat

* settings: relaunch onboarding

* onboarding: return to launch page

* studio: stop auto guided tour

* ui: soften global radius

* cleanup: rename onboarding exit prop

* fix onboarding redirect safety

* Show real Unsloth version in settings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 09:58:10 -07:00
हिमांशु
f4422b0a62
change torchcodec version to 0.10.0 in extra-no-deps (#5043)
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-16 19:50:57 +04:00
Wasim Yousef Said
b01e9af124
feat(studio): replace navbar with collapsible sidebar (#4936)
* feat(studio): replace navbar navigation with collapsible sidebar

Add an app-wide sidebar with hover-expand and pin-to-dock behavior.
Navigation items (Studio, Recipes, Export, Chat) move from the center
pill navbar to the sidebar. Chat threads and recipes render as
collapsible sub-lists. Navbar simplified to logo + update + close.

- Extend SidebarProvider with pinned/hovered state model
- New AppSidebar with animated active indicator, sloth profile menu,
  theme toggle, guided tour, back/forward navigation
- Chat page refactored to URL-driven view state via search params
- Extract reusable hooks for chat thread and recipe sidebar data
- Guard startViewTransition for browser compatibility
- Wrap chat deletions in Dexie transaction for data integrity

* feat(studio): move logo to sidebar and make navbar overlay

- Sidebar is now full-height with logo in SidebarHeader
- Collapsed sidebar shows sticker.png, expanded shows full logo
- Navbar is absolute-positioned overlay (no layout space)
- Main content extends to top, aligning with navbar controls

* feat(studio): full-height sidebar with recents, edge-to-edge nav buttons

- Sidebar outside max-w-7xl, pinned to left edge
- Remove sidebar rounding, menu buttons rounded-md
- Nav buttons flush to sidebar edges with no left rounding
- Replace collapsible recipes/chat with flat nav items
- Add Recents section with chat history (1 item when not on chat, full on chat)
- New Chat as first nav item with PencilEdit02Icon
- Cursor pointer on all sidebar buttons
- Navbar temporarily hidden for screenshots

* fix(studio): fix chat scroll, action bar hover, collapsible recents

- Fix sticky composer by removing `relative` override on viewport footer
- Action bar buttons only show on hover (autohide=always)
- Remove floating border/shadow from action bar
- Add scroll space above composer for last message actions
- Back/forward buttons use router history (stay in-app)
- Recents section collapsible with chevron on chat route
- Set html/body/#root height for proper h-full chain

* fix(studio): address review feedback, clean up unused code

- Unhide navbar (was left hidden from screenshot)
- Remove unused imports: SidebarMenuSub*, BubbleChatIcon, ColumnInsertIcon
- Remove unused vars: recipeItems, activeRecipeId, canCompare, recipesOpen
- Include compare query id in active sidebar selection
- Use store type for contextUsage instead of inline type
- Simplify noop in sidebar.tsx
- Remove empty className prop

* feat(studio): add mobile sidebar, recent runs section, and misc UX fixes

* feat(studio): scaffold settings feature module with dialog store

* feat(studio): add tri-state theme store for settings

* feat(chat): add clear-all-chats and export-chat-history utils

* feat(studio): add settings dialog shell with tab rail

* feat(studio): add appearance tab with theme and sidebar pin

* feat(studio): add settings general tab with hf token, auto-title, reset prefs

* feat(studio): add settings chat tab with export and clear

* feat(studio): add api keys tab with list and revoke flow

* feat(studio): add create-key form and reveal dialog

* feat(studio): add usage examples panel to api keys tab

* feat(studio): add settings about tab with update and shutdown

* feat(studio): add settings dropdown item and cmd-comma shortcut

* feat(studio): remove legacy api-keys route and chat-sheet preference rows

* fix(studio): settings dialog a11y + polish pass

* feat(studio): inline api key reveal card replacing nested dialog

* fix(studio): hide revoked keys from settings list

* refactor(studio): strip navbar and hoist training unload guard

* feat(studio): explicit sidebar toggle, remove hover-open and pin icons

* fix(studio): use SidebarRight01Icon for collapsed sidebar open toggle

* fix(studio): address code review findings for settings dialog

* feat(studio): collapsible navigate group with standalone new-chat and compare

* fix(studio): chat-only standalone actions, use ColumnInsertIcon for compare

* fix(studio): sidebar new-chat/compare state reset and icon-mode collapsible

* feat(studio): add compact logo assets for sidebar header

* Fixed sidebar design

* fix(studio): sidebar delete icon hover contrast and sizing

* feat(studio): route-gate sidebar recents (chats off /studio, runs on /studio)

* feat(studio): add chat search store

* feat(studio): add chat search index hook with snapshot-on-open

* feat(studio): add chat search command dialog with global shortcut

* feat(studio): wire chat search into sidebar

* fix(studio): trim hf token on save, add show/hide toggle, commit on close

* revert(studio): restore original sidebar/border colors, brighten sidebar

* feat(studio): forward overlayClassName through CommandDialog

* fix(studio): wrap search dialog in Command context, redesign as flat 635px card

* fix(studio): reserve right padding on recent items so delete icon stops overlapping title

* fix(studio): skip hf token unmount-commit during reset-prefs reload

* chore(studio): drop unused icon import and unreachable runs navigate branch

* fix(studio): chat search index filters archived before limit, batches message query, picks up reasoning text

* fix(studio): keep CommandEmpty in tree so empty state renders correctly

* fix(studio): cap system prompt and chat template textareas so they scroll instead of growing

* fix(studio): attach chat-compare tour anchor to sidebar compare button

* fix(studio): persist system theme explicitly so next-themes does not clobber on reload

* fix(studio): auto-switch to history tab when selecting a recent run from sidebar

* UI overhaul: chatbox, scrollbar, sidebar, and compare view

UI Changes:
- Redesigned the Compare UI with general cleanup
- Redesigned the Chatbox UI
- Reduced the width of the user chat bubble for improved readability
- Narrowed the user chat box across the content page
- Adjusted thinking-box text color to be slightly darker
- Removed faded text effect from chat messages
- Removed faded text effect from the thinking box
- Added a small LLM chat safety note at the bottom of the chatbox
- Restyled the scrollbar

Layout & Behavior:
- Reworked the scrollbar to span the full height of the page (no top/bottom padding) and remain persistently visible when content is scrollable, rather than only on hover
- Reworked the Configuration sidebar to span full height — removed rounded corners and borders, with the scrollbar adjusted to match the full top-to-bottom layout
- Adjusted the top menu and bottom chatbox content areas to work correctly with the new full-page scroll behavior
- Made chat content match the chatbox width, with content sliding slightly behind the chatbox when scrolling
- Aligned chat text width with the chatbox for visual consistency, including how far the text extends behind the chatbox

Fixes:
- Fixed the chatbox not auto-expanding when typing multi-line input while bottom-positioned during an active chat (previously only worked before a chat had started)
- Fixed positioning and design of the user chat hover menu buttons to match the assistant chat box — now displayed below the chat bubble instead of on the left side

* Fix user message layout in thread component

* swap code icon

* fix compare layout

* fix compare pane flex

* Sidebar improvements and fixes

- Added scrolling support to the sidebar so menus and recent chats no longer get hidden
- Recent chats are now always visible in the sidebar, not hidden when in Studio, Recipes, or Export
- The active recent chat is now deselected when another navigation item is selected
- Fixed sidebar glitch where browser resize could make the sidebar and expand button disappear completely
- Fixed glitch where the open-sidebar hover tooltip appeared above the logo when clicking expand sidebar
- Reduced sidebar width on mobile to around 2/3 of the screen (was too wide)
- Made the close-sidebar hover tooltip consistent with the rest of the design
- Removed sidebar collapse/expand animation
- Small adjustment to chat width

* Fix route scrolling, polling, and theme sync issues

* Fix Studio page scrolling

---------

Co-authored-by: sneakr <hauzin@hotmail.com>
2026-04-16 08:46:16 -07:00
Daniel Han
05ec0f110b
Studio: Ollama support, recommended folders, Custom Folders UX polish (#5050)
* Studio: Ollama support, recommended folders, Custom Folders UX polish

Backend:
- Add _scan_ollama_dir that reads manifests/registry.ollama.ai/library/*
  and creates .gguf symlinks under <ollama_dir>/.studio_links/ pointing
  at the content-addressable blobs, so detect_gguf_model and llama-server
  -m work unchanged for Ollama models
- Filter entries under .studio_links from the generic models/hf/lmstudio
  scanners to avoid duplicate rows and leaked internal paths in the UI
- New GET /api/models/recommended-folders endpoint returning LM Studio
  and Ollama model directories that currently exist on the machine
  (OLLAMA_MODELS env var + standard paths, ~/.lmstudio/models, legacy
  LM Studio cache), used by the Custom Folders quick-add chips
- detect_gguf_model now uses os.path.abspath instead of Path.resolve so
  the readable symlink name is preserved as display_name (e.g.
  qwen2.5-0.5b-Q4_K_M.gguf instead of sha256-abc...)
- llama-server failure with a path under .studio_links or .cache/ollama
  surfaces a friendlier message ("Some Ollama models do not work with
  llama.cpp. Try a different model, or use this model directly through
  Ollama instead.") instead of the generic validation error

Frontend:
- ListLabel supports an optional leading icon and collapse toggle; used
  for Downloaded (download icon), Custom Folders (folder icon), and
  Recommended (star icon)
- Custom Folders header gets folder icon on the left, and +, search,
  and chevron buttons on the right; chevron uses ml-auto so it aligns
  with the Downloaded and Recommended chevrons
- New recommended folder chips render below the registered scan folders
  when there are unregistered well-known paths; one click adds them as
  a scan folder
- Custom folder rows that are direct .gguf files (Ollama symlinks) load
  immediately via onSelect instead of opening the GGUF variant expander
  (which is for repos containing multiple quants, not single files)
- When loading a direct .gguf file path, send max_seq_length = 0 so the
  backend uses the model's native context instead of the 4096 chat
  default (qwen2.5:0.5b now loads at 32768 instead of 4096)
- New listRecommendedFolders() helper on the chat API

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: log silent exceptions and support read-only Ollama dirs

Replace silent except blocks in _scan_ollama_dir and the
recommended-folders endpoint with narrower exception types plus debug
or warning logs, so failures are diagnosable without hiding signal.

Add _ollama_links_dir helper that falls back to a per-ollama-dir hashed
namespace under Studio's own cache (~/.unsloth/studio/cache/ollama_links)
when the Ollama models directory is read-only. Common for system installs
at /usr/share/ollama/.ollama/models and /var/lib/ollama/.ollama/models
where the Studio process has read but not write access. Previously the
scanner returned an empty list in that case and Ollama models would
silently not appear.

The fallback preserves the .gguf suffix on symlink names so
detect_gguf_model keeps recognising them. The prior "raw sha256 blob
path" fallback would have missed the suffix check and failed to load.
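
The read-only fallback described above can be sketched roughly as follows. This is a minimal illustration, not the actual `_ollama_links_dir`: the function signature, the `cache_root` parameter, and the 16-character digest length are assumptions made for the example.

```python
import hashlib
import os
from pathlib import Path


def ollama_links_dir(ollama_dir: Path, cache_root: Path) -> Path:
    """Pick where the .gguf links for one Ollama models directory should live."""
    # Preferred location: next to the Ollama data, when we may write there.
    if os.access(ollama_dir, os.W_OK | os.X_OK):
        return ollama_dir / ".studio_links"
    # Fallback: a namespace keyed by a hash of the Ollama directory path, so
    # two read-only installs never share (or overwrite) each other's links.
    digest = hashlib.sha256(str(ollama_dir.absolute()).encode("utf-8")).hexdigest()
    return cache_root / "ollama_links" / digest[:16]
```

Keying the fallback directory on the source path keeps links from separate system installs apart while staying stable across rescans.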

* Address review: detect mmproj next to symlink target for vision GGUFs

Codex P1 on model_config.py:1012: when detect_gguf_model returns the
symlink path (to preserve readable display names), detect_mmproj_file
searched the symlink's parent directory instead of the target's. For
vision GGUFs surfaced via Ollama's .studio_links/ -- where the weight
file is symlinked but any mmproj sidecar lives next to the real blob
-- mmproj was no longer detected, so the model was misclassified as
text-only and llama-server would start without --mmproj.

detect_mmproj_file now adds the resolved target's parent to the scan
order when path is a symlink. Direct (non-symlink) .gguf paths are
unchanged, so LM Studio and HF cache layouts keep working exactly as
before. Verified with a fake layout reproducing the bug plus a
regression check on a non-symlink LM Studio model.

* Address review: support all Ollama namespaces and vision projector layers

- Iterate over all directories under registry.ollama.ai/ instead of
  hardcoding the "library" namespace. Custom namespaces like
  "mradermacher/llama3" now get scanned and include the namespace
  prefix in display names, model IDs, and symlink names to avoid
  collisions.

- Create companion -mmproj.gguf symlinks for Ollama vision models
  that have an "application/vnd.ollama.image.projector" layer, so
  detect_mmproj_file can find the projector alongside the model.

- Extract symlink creation into _make_symlink helper to reduce
  duplication between model and projector paths.

* Address review: move imports to top level and add scan limit

- Move hashlib and json imports to the top of the file (PEP 8).
- Remove inline `import json as _json` and `import hashlib` from
  function bodies, use the top-level imports directly.
- Add `limit` parameter to `_scan_ollama_dir()` with early exit
  when the threshold is reached.
- Pass `_MAX_MODELS_PER_FOLDER` into the scanner so it stops
  traversing once enough models are found.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: Windows fallback, all registry hosts, collision safety

_make_link (formerly _make_symlink):
- Falls back to os.link() hardlink when symlink_to() fails (Windows
  without Developer Mode), then to shutil.copy2 as last resort
- Uses atomic os.replace via tmp file to avoid race window where the
  .gguf path is missing during rescan

Scanner now handles all Ollama registry layouts:
- Uses rglob over manifests/ instead of hardcoding registry.ollama.ai
- Discovers hf.co/org/repo:tag and any other host, not just library/
- Filenames include a stable sha1 hash of the manifest path to prevent
  collisions between models that normalize to the same stem

Per-model subdirectories under .studio_links/:
- Each model's links live in their own hash-keyed subdirectory
- detect_mmproj_file only sees the projector for that specific model,
  not siblings from other Ollama models

Friendly Ollama error detection:
- Now also matches ollama_links/ (the read-only fallback cache path)
  and model_identifier starting with "ollama/"

Recommended folders:
- Added os.access(R_OK | X_OK) check so unreadable system directories
  like /var/lib/ollama/.ollama/models are not advertised as chips

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: filter ollama_links from generic scanners

The generic scanners (models_dir, hf_cache, lmstudio) already filter
out .studio_links to avoid duplicate Ollama entries, but missed the
ollama_links fallback cache directory used for read-only Ollama
installs. Add it to the filter.

* Address review: idempotent link creation and path-component filter

_make_link:
- Skip recreation when a valid link/copy already exists (samefile or
  matching size check). Prevents blocking the model-list API with
  multi-GB copies on repeated scans.
- Use uuid4 instead of os.getpid() for tmp file names to avoid race
  conditions from concurrent scans.
- Log cleanup errors instead of silently swallowing them.

Path filter:
- Use os.sep-bounded checks instead of bare substring match to avoid
  false positives on paths like "my.studio_links.backup/model.gguf".

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: drop copy fallback, targeted glob, robust path filter

_make_link:
- Drop shutil.copy2 fallback -- copying multi-GB GGUFs inside a sync
  API request would block the backend. Log a warning and skip the
  model when both symlink and hardlink fail.

Scanner:
- Replace rglob("*") with targeted glob patterns (*/*/* and */*/*/*)
  to avoid traversing unrelated subdirectories in large custom folders.

Path filter:
- Use Path.parts membership check instead of os.sep substring matching
  for robustness across platforms.
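
A minimal sketch of the path-component filter, assuming the two internal directory names mentioned in these commits. The helper name and the constant are hypothetical, and `PurePosixPath` is used only so the example behaves the same on every platform (the commit itself refers to `Path.parts`).

```python
from pathlib import PurePosixPath

# Hypothetical constant: directory names that hold Studio-generated Ollama links.
_INTERNAL_LINK_DIRS = {".studio_links", "ollama_links"}


def is_internal_link_path(path: str) -> bool:
    """True only when a whole path component equals an internal link dir name."""
    return any(part in _INTERNAL_LINK_DIRS for part in PurePosixPath(path).parts)


print(is_internal_link_path("/models/.studio_links/abc/qwen.gguf"))        # True
print(is_internal_link_path("/models/my.studio_links.backup/model.gguf"))  # False
```

Comparing whole components avoids the false positive that a bare substring match would produce on the second path.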

Scan limit:
- Skip _scan_ollama_dir when _generic already fills the per-folder cap.

* Address review: sha256, top-level uuid import, Path.absolute()

- Switch hashlib.sha1 to hashlib.sha256 for path hashing consistency.
- Move uuid import to the top of the file instead of inside _make_link.
- Replace os.path.abspath with Path.absolute() in detect_gguf_model
  to match the pathlib style used throughout the codebase.

* Address review: fix stale comments (sha1, rglob, copy fallback)

Update three docstrings/comments that still referenced the old
implementation after recent changes:
- sha1 comment now says "not a security boundary" (no hash name)
- "rglob" -> "targeted glob patterns"
- "file copies as a last resort" -> removed (copy fallback was dropped)

* Address review: fix stale links, support all manifest depths, scope error

_make_link:
- Drop size-based idempotency shortcut that kept stale links after
  ollama pull updates a tag to a same-sized blob. Only samefile()
  is used now -- if the link doesn't point at the exact same inode,
  it gets replaced.
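
Taken together, the `_make_link` changes in this series (samefile-only idempotency, uuid-named temp file, atomic `os.replace`, hardlink fallback, no copy fallback) might look roughly like the sketch below. The function name, signature, and return convention are assumptions for illustration, not the real helper.

```python
import os
import tempfile
import uuid
from pathlib import Path


def make_link(link: Path, target: Path) -> bool:
    """Point `link` at `target`; return False when no link could be created."""
    # Idempotency: keep the link only if it already resolves to the same file.
    try:
        if link.exists() and link.samefile(target):
            return True
    except OSError:
        pass  # broken or unreadable link: fall through and recreate it

    link.parent.mkdir(parents=True, exist_ok=True)
    tmp = link.with_name(f".{link.name}.{uuid.uuid4().hex}.tmp")
    try:
        try:
            tmp.symlink_to(target)
        except OSError:
            os.link(target, tmp)  # hardlink fallback when symlinks are unavailable
        os.replace(tmp, link)     # atomic swap: `link` is never missing mid-rescan
        return True
    except OSError:
        tmp.unlink(missing_ok=True)
        return False              # caller logs a warning and skips the model


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    blob_a = root / "sha256-aaa"
    blob_a.write_bytes(b"A" * 8)
    blob_b = root / "sha256-bbb"
    blob_b.write_bytes(b"B" * 8)  # same size as blob_a, different content
    link = root / ".studio_links" / "model.gguf"
    make_link(link, blob_a)
    make_link(link, blob_b)       # tag now points at a same-sized blob
    print(link.read_bytes())
```

The demo covers the stale-link case from this commit: a size check would have kept the old link, while `samefile()` forces it to be replaced.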

Scanner:
- Revert targeted glob back to rglob so deeper OCI-style repo names
  (5+ path segments) are not silently skipped.

Ollama error:
- Only show "Some Ollama models do not work with llama.cpp" when the
  server output contains GGUF compatibility hints (key not found,
  unknown architecture, failed to load). Unrelated failures like
  OOM or missing binaries now show the generic error instead of
  being misdiagnosed.
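
A rough sketch of the scoped error selection, assuming the hint strings quoted in this commit. The exact llama-server wording, the function name, and its arguments are hypothetical, and the `.cache/ollama` path check from the earlier commit is left out for brevity.

```python
from pathlib import PurePosixPath

# Hint strings taken from the commit text; the real server wording may differ.
_GGUF_HINTS = ("key not found", "unknown architecture", "failed to load")
_OLLAMA_DIRS = {".studio_links", "ollama_links"}

OLLAMA_MESSAGE = (
    "Some Ollama models do not work with llama.cpp. Try a different model, "
    "or use this model directly through Ollama instead."
)


def friendly_ollama_error(model_path: str, model_identifier: str, server_output: str):
    """Return the Ollama-specific message, or None to keep the generic error."""
    from_ollama = model_identifier.startswith("ollama/") or any(
        part in _OLLAMA_DIRS for part in PurePosixPath(model_path).parts
    )
    lowered = server_output.lower()
    if from_ollama and any(hint in lowered for hint in _GGUF_HINTS):
        return OLLAMA_MESSAGE
    return None
```

Requiring both conditions means an out-of-memory failure on an Ollama model still surfaces as the generic error.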

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <michaelhan2050@gmail.com>
2026-04-16 08:24:08 -07:00
Daniel Han
ff23ce40b4
Fix review findings for chat-template repair (#5049) (#5056)
* Fix review findings for PR #49

1. Sandbox fallback Jinja env in _VariantTokenizerProxy.apply_chat_template
   (use SandboxedEnvironment, matching _derive_assistant_prefix_by_render)
2. Unwrap benign outer-If guards in _template_ends_with_toplevel_for so
   templates like {% if messages %}{% for ... %}{% endfor %}{% endif %}
   are still repairable (preserves Qwen3-Guard rejection via else-branch
   and add_generation_prompt-name checks)
3. Preserve raw name_or_path in _VariantTokenizerProxy._source_path so
   local-path detection works for dict/list variant tokenizers
4. Context-aware strict-mode messages: omit "will still load" and
   "Set UNSLOTH_STRICT_CHAT_TEMPLATE=1" when already raising

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 08:02:05 -07:00
Daniel Han
b42e3a120d
Remove legacy venv Scripts entry from User PATH on upgrade (#5060)
Older installers persisted the venv Scripts directory directly in the
User PATH registry. The shim approach from #4961 no longer writes that
entry, but on upgrade the old one survived and python.exe / pip.exe
from the unsloth venv continued winning resolution in every new shell.

Before creating the shim, read the current User PATH, filter out any
entry matching $VenvDir\Scripts (using the same symmetric raw+expanded
comparison as Add-ToUserPath), and write back if changed. No-op on
fresh installs where the legacy entry was never written.

Confirmed on a real Windows machine: `where.exe python` was returning
the venv interpreter first even after the shim PR merged.
2026-04-16 07:36:59 -07:00
Daniel Han
5b8643969e Revert "Remove legacy venv Scripts entry from User PATH on upgrade"
This reverts commit cae4a74297.
2026-04-16 14:20:43 +00:00
Daniel Han
cae4a74297 Remove legacy venv Scripts entry from User PATH on upgrade
Older installers persisted the venv Scripts directory directly in the
User PATH registry. The shim approach (added in this PR) no longer writes
that entry, but it also did not remove the old one. On upgrade, the
legacy entry survived and python.exe / pip.exe from the unsloth venv
continued winning resolution in every new shell, which is exactly the
hijack the shim was designed to prevent.

Before creating the shim, read the current User PATH, filter out any
entry matching $VenvDir\Scripts (using the same symmetric raw+expanded
comparison as Add-ToUserPath), and write back if changed. This runs
once per install and is a no-op on fresh installs where the legacy
entry was never written.
2026-04-16 14:19:04 +00:00
Datta Nimmaturi
6764cb9b90
Restrict flash attn to <=256 head dim. Consolidate attn impl checks (#5051)
* Restrict flash attn to <=256 head dim. Consolidate attn impl checks

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Consolidate the changes into single function

* safeguard for dict instead of object

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 09:00:17 -05:00
Daniel Han
c5be8b1cd2
Chat-template repair: warn-by-default, AST classification, dict support (#5049)
* Chat-template repair: warn-by-default, AST classification, dict support

Follow-up hardening on top of PR #4426 (which fixed the #4150
RuntimeError for ChatML LoRA reloads).

Behavior changes:

- Warn-by-default instead of RuntimeError. When fix_chat_template cannot
  repair a broken template, emit a warning and return the original.
  Set UNSLOTH_STRICT_CHAT_TEMPLATE=1 to restore the pre-warn hard fail.
  Fixes the UX where a missing `{% if add_generation_prompt %}` block on
  a saved LoRA (typical after LlamaFactory / Axolotl re-serialize) would

  block model loading entirely.

- Local path vs HF hub distinguished in the warning message. For local
  paths the message points at the likely downstream tool; for HF IDs it
  points at the upstream model maintainers. Previously both said "file a
  bug report to the maintainers of <path>" even when <path> was the
  user's own saves/ directory.

- Dict / list chat_template now handled. Hermes-3 ships with
  {default, tool_use} and the previous code crashed with
  AttributeError: 'dict' object has no attribute 'find' when entering
  _fix_chat_template with a dict. Each variant is now fixed
  independently; structure is preserved.

Internals:

- _find_end_position now matches all four Jinja whitespace-control
  variants ({% %}, {%- %}, {% -%}, {%- -%}) and returns the rightmost
  endfor/endif so multi-for templates aren't locked onto the first loop.
  Previously {%- endfor -%} (both-side dash, used by Qwen3-Guard) was
  silently bypassed.

- _has_add_generation_prompt_block uses Jinja AST via
  jinja2.nodes.If/Name walks instead of substring matching, so
  templates that hide the block behind comments or dash-style variants
  are classified correctly.

- _template_ends_with_toplevel_for gates the GH#4150 ChatML repair on
  the AST: only fires when the last structural top-level node is a For
  (standard ChatML shape), ignoring trailing pure-whitespace output
  nodes. Templates wrapped in an outer If (Qwen3-Guard) are now
  explicitly skipped at the _fix_chat_template level as well, not just
  at load_correct_tokenizer's name-based exemption.

- _validate_patched_template renders the patched template with and
  without add_generation_prompt and confirms the patched output
  responds to the flag by appending (not replacing) content. If
  validation fails, the patch is discarded and we fall through to the
  warn path.

Verified with an expanded regression suite in tests/:
- test_fix_chat_template_pr4426.py: 42/42 template-matrix cells
- test_load_correct_tokenizer_pr4426.py: 5/5 tokenizer loads
- test_chat_template_followups.py: 10/10 new follow-up tests
- test_mistral_pr4426.py: 5 Mistral variants byte-identical
- test_qwen_pr4426.py: 14 Qwen variants byte-identical
  (Qwen1.5, Qwen2, Qwen2.5-Instruct/Coder/Math/VL, Qwen3,
  Qwen3-Coder, QwQ, Qwen3-Guard-Gen)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard _validate_patched_template against read-only chat_template

If tokenizer.chat_template is a property or otherwise read-only, the
validation helper would crash with AttributeError when trying to
temporarily set the patched template. Catch the assignment failure and
return False (skip validation), and best-effort restore in the finally
block.

* Replace regex separator inference with render-diff; broaden repair to non-ChatML templates

The previous `_infer_assistant_separator` was a four-tier regex heuristic that
only worked on ChatML-shaped templates and forced a hard `<|im_start|>` /
`<|im_end|>` presence gate on Case 2 repair. This meant a Llama-3, Gemma, or
Phi-3 template stripped of its generation-prompt block by a downstream tool
(LlamaFactory, Axolotl, etc.) would still warn-and-return even though the
structural shape is identical to the ChatML case the PR already handles.

This replaces the regex with `_derive_assistant_prefix_by_render`: render the
template with two dialogs that differ only in assistant content, then
`os.path.commonprefix` on the tails captures the exact assistant-turn prefix
the template emits. The template itself is ground truth, so non-ChatML shapes
work as long as the assistant block is a literal the template emits once per
message.

Three guards keep the derivation safe:
  A. both assistant renders extend the base render (no reordering);
  B. the divergence point is exactly the content-insertion site (sentinel
     follows the common prefix);
  C. a user-role cross-check: if a render with a user sentinel also emits
     the same prefix, role has no effect on output and we reject. A render
     failure on [user, user] (e.g. Gemma's `raise_exception` alternation
     check) is evidence that role matters; we accept.

Sentinels differ at character 0 so `commonprefix` cannot absorb them, and
trailing whitespace/comments after the last `{% endfor %}` are stripped
before probing (they would appear in base but not after the appended
assistant turn and break Guard A).
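
The render-diff idea and its three guards can be sketched as below. This is a simplified illustration, not `_derive_assistant_prefix_by_render` itself: a plain Python callable stands in for the sandboxed Jinja render, the sentinel strings and message keys are assumptions, and the trailing-whitespace stripping step is omitted.

```python
import os
from typing import Callable, List, Optional


def derive_assistant_prefix(render: Callable[[List[dict]], str]) -> Optional[str]:
    """Derive the text a template emits just before assistant content."""
    base_msgs = [{"role": "user", "content": "hi"}]
    sent_a, sent_b = "AAA_SENTINEL", "BBB_SENTINEL"  # differ at character 0
    try:
        base = render(base_msgs)
        out_a = render(base_msgs + [{"role": "assistant", "content": sent_a}])
        out_b = render(base_msgs + [{"role": "assistant", "content": sent_b}])
    except Exception:
        return None
    # Guard A: both assistant renders must extend the base render.
    if not (out_a.startswith(base) and out_b.startswith(base)):
        return None
    tail_a, tail_b = out_a[len(base):], out_b[len(base):]
    prefix = os.path.commonprefix([tail_a, tail_b])
    # Guard B: the sentinel must sit directly after the common prefix.
    if not prefix:
        return None
    if not tail_a[len(prefix):].startswith(sent_a):
        return None
    if not tail_b[len(prefix):].startswith(sent_b):
        return None
    # Guard C: a user turn must not produce the same prefix.
    try:
        out_u = render(base_msgs + [{"role": "user", "content": sent_a}])
    except Exception:
        return prefix  # failure on [user, user] is evidence that role matters
    if out_u.startswith(base) and out_u[len(base):].startswith(prefix + sent_a):
        return None
    return prefix


def chatml(messages: List[dict]) -> str:
    """Stand-in for a ChatML template without a generation-prompt block."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


print(repr(derive_assistant_prefix(chatml)))  # '<|im_start|>assistant\n'
```

Because the prefix comes from what the template actually renders, the same probe should apply to other shapes as long as the assistant header is a literal emitted once per message.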

`_fix_chat_template` and `_repair_string_template` now thread an
`is_sharegpt` kwarg; `_fix_chat_template` retries once with
`is_sharegpt=True` if the first probe returns None (dual-probe fallback
for dict/list callers).

The ChatML `<|im_start|>` / `<|im_end|>` hard gate in Case 2 is dropped.
`_infer_assistant_separator` is deleted.

Verified via:
  - tests/test_fix_chat_template_pr4426.py: 51/51 cells (new Llama-3,
    Gemma, Phi-3 broken-template rows all repair FIX-OK)
  - tests/test_load_correct_tokenizer_pr4426.py: 5/5
  - tests/test_chat_template_followups.py: 18/18 (T11-T18 cover
    non-ChatML repair + probe failure modes)
  - tests/test_mistral_pr4426.py: 5/5 byte-identical
  - tests/test_qwen_pr4426.py: 14/14 byte-identical (Qwen3-Guard AST
    gate still rejects)
  - tests/hermes3_lora_pr4426.py reload: patched template ends with
    `<|im_start|>assistant\n`, inference returns sensible output.
  - temp/sim/battery.py: 79/79 followup; vs baseline: 0 regressions,
    9 improvements.
  - Spot-check probe on real stripped tokenizers (Hermes-3, Phi-4,
    Llama-3.2-1B, Gemma-3-1B): all derive the expected prefix.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address reviewer findings: variant routing, positive-gate detection, comment-safe end scan

Resolves three reviewer findings on PR #5049 (`fix/chat-template-followups`):

Finding #1 [10/10]: dict/list variants now route through
`_fix_chat_template_for_tokenizer` via a new `_VariantTokenizerProxy`
adapter. Previously the dict/list branches called `_fix_chat_template`
directly, silently bypassing the warn/strict (`UNSLOTH_STRICT_CHAT_TEMPLATE`)
contract, the `no == yes` diagnostic, broken-existing-block detection,
and `_validate_patched_template` guard. The proxy swaps
`base.chat_template` to the variant string before each
`apply_chat_template` call so tokenizer globals (`bos_token`, custom
filters, `raise_exception`) remain available; if the base is read-only
it falls back to isolated Jinja rendering.

Finding #2 [1/10]: `_has_add_generation_prompt_block` now requires the
`If` body to contain at least one `Output` node (a new
`_if_body_emits_content` helper walks descendants). This distinguishes a
real generation-prompt block from a header guard like
`{% if not add_generation_prompt is defined %}{% set ... %}{% endif %}`
(body contains only `Assign`) which references the name but emits
nothing. Also dropped a now-redundant `"add_generation_prompt" not in
scrubbed` guard in `_fix_chat_template` Case 2 so header-guarded
templates still get repaired.

Finding #4 [1/10]: `_find_end_position` now replaces Jinja comments with
equal-length whitespace before scanning for `{% endfor %}` / `{% endif %}`
tokens. This prevents a trailing comment containing those tokens from
being picked as the real end tag. Positions in the padded string map 1:1
to positions in the original template.
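
A minimal sketch of the comment-safe end scan, assuming a simple regex for Jinja comments and end tags. The function name mirrors `_find_end_position` only loosely; its return value here (index just past the rightmost end tag, or -1) is an assumption.

```python
import re

_COMMENT = re.compile(r"\{#.*?#\}", re.DOTALL)
# Matches endfor/endif with any of the four whitespace-control variants.
_END_TAG = re.compile(r"\{%-?\s*end(?:for|if)\s*-?%\}")


def find_end_position(template: str) -> int:
    """Return the index just past the rightmost endfor/endif, or -1 if none."""
    # Blank out comments with equal-length padding so indices in the padded
    # string are also valid indices into the original template.
    padded = _COMMENT.sub(lambda m: " " * len(m.group(0)), template)
    last = -1
    for match in _END_TAG.finditer(padded):
        last = match.end()
    return last


tpl = "{% for m in messages %}{{ m }}{%- endfor -%}{# old: {% endfor %} #}"
end = find_end_position(tpl)
print(tpl[:end])  # ends at the real {%- endfor -%}, not the one in the comment
```

Equal-length padding is what lets the scan result be used directly as a slice index on the unmodified template.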

Tests:
  - tests/test_chat_template_followups.py: 21/21 (T19 strict-mode
    dict variant, T20 header-guard repair, T21 comment-endfor trap
    added; T4/T5 stubs updated with a working apply_chat_template
    that routes through Jinja).
  - tests/test_fix_chat_template_pr4426.py: 51/51 cells unchanged.
  - tests/test_load_correct_tokenizer_pr4426.py: 5/5.
  - tests/test_mistral_pr4426.py: 5/5 byte-identical.
  - tests/test_qwen_pr4426.py: 14/14 byte-identical.
  - temp/sim/battery.py: 79/79 followup; 0 regressions vs baseline.
  - Phase 3 Hermes-3 broken-LoRA reload: inference still returns
    `'The answer to the equation 2+2 is 4.'`.
  - Spot-checks on Hermes-3 / Phi-4 / Llama-3.2-1B / Gemma-3-1B real
    stripped templates: probe still derives the expected prefix.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Tighten comments in chat-template helpers

Pure comment minimization across `_find_end_position`,
`_has_add_generation_prompt_block`, `_if_body_emits_content`,
`_derive_assistant_prefix_by_render`, `_fix_chat_template` Case 2,
and `_VariantTokenizerProxy`. No behavior change; same intent,
fewer lines. All 21 follow-up tests and the 51-cell Phase 1 matrix
still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sandbox probe, fix is_sharegpt validator mismatch, reject negated gates

Three real bugs from the 10-agent Opus review:

1. Probe now uses `jinja2.sandbox.SandboxedEnvironment` instead of bare
   `jinja2.Environment`. The probe renders at model-load time (before
   the user calls `apply_chat_template`), so it was a new eager
   code-execution surface that the base HF tokenizer loading does not
   have. SandboxedEnvironment blocks attribute-chain exploits at
   negligible cost.

2. `_repair_string_template` now tries validation with both
   `is_sharegpt=False` and `is_sharegpt=True`. Previously, when
   `_fix_chat_template` internally fell back to the other schema via
   its dual-probe, the outer validation still used the caller's
   original `is_sharegpt` -- rendering with the wrong message keys and
   spuriously dropping a valid repair.

3. `_has_add_generation_prompt_block` now skips `If` nodes whose test
   is a `Not` expression. A negated gate like
   `{% if not add_generation_prompt %}{{ x }}{% endif %}` fires when
   agp=False, so its emitting body is not a generation block -- but the
   old code counted any Name reference regardless of polarity.

Cleanup: removed unused `self._label`, added `\r` escape in
generation-block literal, switched variant labels to `!r` formatting,
removed redundant `import os as _os`.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jinja2.sandbox import and sandbox proxy fallback

Two critical findings from the 20-reviewer pass:

1. [20/20] The proxy read-only fallback used bare `jinja2.Environment`,
   not sandboxed. All 20 reviewers independently reproduced marker-file
   creation via `cycler.__init__.__globals__['os'].system(...)` during
   `fix_chat_template()`. Fixed: fallback now uses
   `from jinja2.sandbox import SandboxedEnvironment`.

2. [14/20] The render-diff probe did `import jinja2` then referenced
   `jinja2.sandbox.SandboxedEnvironment`. `jinja2.sandbox` is a
   submodule that is NOT auto-imported by `import jinja2` on Jinja 3.1.6.
   This caused `AttributeError` (swallowed by `except Exception`),
   making the entire Case 2 repair path silently return None in a clean
   process. The 6 reviewers who saw it work had `jinja2.sandbox`
   pre-imported by an earlier module in their process. Fixed: both the
   probe and the proxy fallback now use
   `from jinja2.sandbox import SandboxedEnvironment`.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 05:52:33 -07:00
Daniel Han
6e87bade25 Trim verbose comments in PATH helpers
Reduce inline comments from ~160 lines to ~25 across both files.
Keep one-line summaries of the "why"; drop multi-paragraph rationale
blocks that repeated information already captured in commit messages
and PR discussion.
2026-04-16 12:01:01 +00:00
Etherll
ec32ce2e82
fix: use direct registry API for PATH writes instead of SetEnvironmentVariable (#4961)
* fix: replacing SetEnvironmentVariable with direct registry API

* apply reviews

* Use CreateSubKey for HKCU\Environment

* Store PATH backup under HKCU\Software\Unsloth

* Fix $backupKey registry handle leak in PATH backup block

Wrap $backupKey operations in try/finally so the handle is closed even
if GetValue or SetValue throws. The Add-ToUserPath helper already uses
this pattern for its registry key -- the backup block was the only
place missing it.

* Isolate WM_SETTINGCHANGE broadcast from PATH write error handling

Wrap the broadcast dummy-variable calls in their own try/catch so a
broadcast failure does not mask a successful registry PATH write.
Previously, if SetEnvironmentVariable threw after SetValue already
committed the new PATH, Add-ToUserPath would return $false and the
caller would skip Refresh-SessionPath.

* PATH helper polish: venv precedence, quoted entries, raw/expanded dedup

Three small follow-ups surfaced by a 10-reviewer pass against the rebased
PR head. None fix a regression vs main; each strictly improves the new
helpers.

Refresh-SessionPath / Refresh-Environment:
- Move $env:Path to the front of the merge so an activated venv keeps
  precedence over machine/user PATH after a refresh. Before this PR,
  process-only entries were dropped entirely; the PR initially kept them
  but placed them at the back.
- Dedup on both raw and expanded forms so %USERPROFILE%\foo and the
  already-expanded C:\Users\me\foo do not both survive.

Add-ToUserPath:
- Trim whitespace and surrounding double-quotes from each compared entry
  so quoted PATH entries like "C:\Program Files\CMake\bin" deduplicate
  against an unquoted directory of the same path.

* Back up User PATH inside Add-ToUserPath, before first mutation

Previously only studio/setup.ps1 took a one-time PATH backup, at script
top (line ~547). install.ps1 (the irm | iex entry point) had no backup,
so users who installed via that path had no recovery surface if anything
clobbered their PATH. The PR description's "one-time backup before any
modifications" promise only held for the studio installer flow.

Move the backup into Add-ToUserPath itself: just before the first actual
SetValue mutation, write the pristine raw PATH to
HKCU\Software\Unsloth\PathBackup if no backup already exists. This:

- Covers both entry points (install.ps1 and studio/setup.ps1).
- Captures the TRUE pristine PATH even when install.ps1 runs first and
  studio/setup.ps1 runs afterwards (the script-top backup in setup.ps1
  would otherwise see an already-modified PATH).
- Is idempotent: once a backup exists, subsequent calls preserve it.
- Skips when nothing would mutate (dedup match) or PATH is empty.

The script-top backup in studio/setup.ps1 is kept for defense in depth.

* Refresh PATH: venv-aware merge order

Reconcile two competing concerns about Refresh-SessionPath /
Refresh-Environment surfaced by separate review rounds:

  - venv at the back -> activated venv loses precedence to system Python
  - process at the front -> stale shims (old node, old python, etc.)
    still on $env:Path can beat a freshly installed tool

New merge order:
  1. Activated venv Scripts dir, only if $env:VIRTUAL_ENV is set
  2. Machine PATH freshly read from registry
  3. User PATH freshly read from registry
  4. Current $env:Path as fallback

This way an explicitly-activated venv keeps priority while a tool the
script just installed wins over any stale entry that was already on
the inherited shell PATH. When no venv is active, fresh registry
entries take precedence as expected.
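
The merge order above can be sketched in Python. This is only an illustration under stated assumptions: the real helpers are PowerShell, `merge_session_path` and `_norm` are hypothetical names, and the registry reads are modelled as plain `;`-separated strings.

```python
import ntpath


def _norm(entry: str) -> str:
    """Comparison key: trim whitespace, quotes and trailing separators; ignore case."""
    return entry.strip().strip('"').rstrip("\\/").lower()


def merge_session_path(venv_dir, machine_path, user_path, process_path):
    """Hypothetical model of the venv-aware merge order (not the PowerShell helper).

    venv_dir stands in for $env:VIRTUAL_ENV (or None); the other three
    arguments are ';'-separated PATH strings.
    """
    ordered = []
    if venv_dir:
        # 1. Activated venv Scripts dir, only when a venv is active.
        ordered.append(ntpath.join(venv_dir, "Scripts"))
    # 2. Machine PATH, 3. User PATH, 4. current process PATH as fallback.
    for source in (machine_path, user_path, process_path):
        ordered.extend(p for p in source.split(";") if p.strip())

    seen, merged = set(), []
    for entry in ordered:
        key = _norm(entry)
        if key and key not in seen:  # first occurrence wins
            seen.add(key)
            merged.append(entry)
    return ";".join(merged)
```

Because the first occurrence wins, a stale entry that only exists on the inherited process PATH ends up behind every fresh registry entry, while the venv Scripts dir stays at position 0.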

* Append to User PATH by default, close $envKey in finally

Add-ToUserPath gains a -Position Append|Prepend parameter defaulting to
Append so installing unsloth no longer prepends the bundled venv Scripts
directory ahead of the user's existing python / pip on new shells. The
four current call sites (install.ps1 launcher, studio/setup.ps1 CMake,
nvcc, Python user Scripts) all take the Append default because each one
that needs in-session precedence already does an inline $env:Path prepend
independently. This matches rustup / cargo / nvm / pyenv / uv behavior.

Also wrap the script-top $envKey.GetValue in a try/finally so the
registry handle is released even if the read throws. Matches the pattern
already used for $backupKey five lines below.

* Prepend cmake, nvcc, Python Scripts; keep venv Scripts appended

The previous commit switched Add-ToUserPath to append by default so that
installing unsloth would not silently hijack the user's system python /
pip. That was correct for the venv Scripts dir (which contains python.exe
and pip.exe alongside unsloth.exe), but wrong for the three studio/setup
call sites. Those persist cmake, the driver-compatible nvcc, and the
Python user Scripts dir for future shells, and in all three cases an
older tool already earlier in the user PATH would keep winning after the
install finished. The nvcc case is especially load-bearing: setup selects
a driver-compatible CUDA toolkit, then llama.cpp builds against whatever
wins PATH resolution, so a stale older nvcc produces broken builds.

Pass -Position 'Prepend' explicitly at the three setup.ps1 call sites
(cmake at line 754, nvcc bin at line 1025, Python user Scripts at line
1191). None of those directories holds python.exe, so prepending them
does not re-introduce the original hijack problem. Leave the install.ps1
venv Scripts call on the default Append with a comment explaining why.

* Symmetric dedup, Prepend reorders duplicates, unsloth shim dir

Address three separate findings surfaced by review:

1. Dedup asymmetry (Gemini high-priority): the existing dedup expanded
   registry entries via ExpandEnvironmentVariables but did NOT expand the
   new directory. Passing "%USERPROFILE%\foo" when "C:\Users\me\foo" was
   already in PATH produced a duplicate. Expand both sides so the check
   is symmetric.

2. -Position Prepend no-op on existing duplicates: the dedup loop
   returned $false as soon as it saw a match, regardless of position.
   That left a late-position duplicate in place instead of moving it to
   the front, so "prepend the newly selected cmake/nvcc" did not always
   beat an older copy earlier in PATH. Partition entries into kept and
   dropped lists, then reinsert a single copy at the requested position.
   Append still returns $false on any match so user-curated orderings
   are not reshuffled. Prepend also returns $false when the only copy
   is already at position 0 so we preserve the user's casing.

3. Stop adding the venv Scripts dir to User PATH entirely. That dir
   holds python.exe and pip.exe alongside unsloth.exe, so neither
   Prepend nor Append worked: prepend hijacked the user's system python
   and pip, append made the freshly-installed unsloth.exe lose to any
   older unsloth.exe earlier on PATH. Replace the Scripts-dir PATH add
   with a dedicated shim directory that contains only unsloth.cmd, and
   prepend that dir. The shim calls the venv's unsloth.exe by absolute
   path so future pip upgrades inside the venv propagate automatically.
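
Findings 1 and 2 can be modelled with a short Python sketch. It is an assumption-laden illustration, not the PowerShell helper: `add_to_user_path`, `expand` and `key` are hypothetical names, `env` is a plain dict standing in for ExpandEnvironmentVariables, and the sketch reinserts the caller's spelling of the directory, which the commit does not specify.

```python
import re


def expand(entry, env):
    """Expand %VAR% references from env; unknown variables are left untouched."""
    return re.sub(r"%([^%]+)%", lambda m: env.get(m.group(1).upper(), m.group(0)), entry)


def key(entry, env):
    """Symmetric comparison key: trimmed, unquoted, expanded, case-insensitive."""
    return expand(entry.strip().strip('"'), env).rstrip("\\").lower()


def add_to_user_path(raw_path, new_dir, env, position="Append"):
    """Return (new_path, changed) for a ';'-separated PATH string."""
    entries = [e for e in raw_path.split(";") if e.strip()]
    target = key(new_dir, env)
    kept = [e for e in entries if key(e, env) != target]
    dropped = [e for e in entries if key(e, env) == target]

    if dropped:
        if position == "Append":
            return raw_path, False  # never reshuffle a user-curated order
        if len(dropped) == 1 and key(entries[0], env) == target:
            return raw_path, False  # sole copy already at position 0
    # Reinsert exactly one copy at the requested position.
    new_entries = [new_dir] + kept if position == "Prepend" else kept + [new_dir]
    return ";".join(new_entries), True
```

Expanding both sides before comparing is what makes `%USERPROFILE%\foo` and `C:\Users\me\foo` count as the same entry.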

* Shim via hardlink, Append user Scripts, drop venv sysconfig fallback

Three follow-ups to the c0ab1ab shim commit, targeting concerns raised in
the second 20-reviewer pass:

1. Shim uses unsloth.exe (hardlink, copy fallback) instead of unsloth.cmd.
   The batch-file approach had three distinct regressions:
   - cmd.exe expanded %...% sequences inside user arguments, so prompts
     like "What does 50% mean?" got mangled before reaching the CLI
   - Git Bash / MSYS2 / POSIX-style shells on Windows do not resolve
     bare-name lookups to .cmd files, so `unsloth` stopped working there
   - Set-Content -Encoding ASCII replaced non-ASCII profile characters
     with '?', so installs under C:\Users\Jörg\... wrote a broken shim
   A hardlink (fallback: copy) of unsloth.exe is a native Windows
   executable with no shell indirection. PATHEXT picks .exe before .cmd
   in cmd.exe and PowerShell, Git Bash honors .exe natively, subprocess
   callers hit it directly, and a hardlink stays in sync with the venv
   on pip upgrades because both names point at the same inode.

2. studio/setup.ps1 Python user Scripts dir is added with default Append
   instead of -Position Prepend. That directory holds every pip-installed
   user console script (pip, pytest, huggingface-cli, and so on), not
   just unsloth, so reordering it silently changed resolution order for
   unrelated tools. The new install.ps1 shim at PATH position 0 already
   guarantees `unsloth` resolves to the freshly installed copy, so the
   Python user Scripts entry only needs to be present, not at the front.

3. The sysconfig lookup in studio/setup.ps1 no longer falls back to
   sysconfig.get_path('scripts') when the nt_user scheme dir does not
   exist. When setup.ps1 is invoked from an activated venv (a flow the
   linked issue actually hits) that fallback returns the venv's Scripts
   directory, which would then be added to the persisted User PATH and
   re-introduce the python / pip hijack the shim dir is meant to avoid.
   Stick strictly to the nt_user scheme; skip the block if it does not
   exist on disk.

* Do not crash installer when unsloth.exe shim is locked

The shim update sequence at install.ps1:1095 did a bare Remove-Item /
New-Item HardLink / Copy-Item. Under the script's $ErrorActionPreference
a locked target (most commonly 'unsloth studio' still running while the
user re-invokes the installer) turns the Remove-Item failure into a
terminating error that aborts the install with no actionable message.

The existing shim is perfectly usable in that state, so there is no
reason to abort. Wrap the whole remove/link/copy sequence in a try/catch
that logs the probable cause (Studio still running), points at the fix
(close Studio and re-run), and lets the installer finish with the old
launcher still serving the command.

Also only emit the "added unsloth launcher to PATH" step line when the
launcher was actually (re)created AND the PATH entry was newly added --
previously the message fired even when the shim refresh silently failed,
which was confusing.

* Guard shim PATH entry on existence, use NullString for broadcast delete

Two follow-ups surfaced by the latest review pass:

1. Do not add the shim directory to User PATH when the launcher was not
   actually created. Antivirus blocking unsloth.exe, a disk-full volume,
   or restrictive filesystem permissions can make both the hardlink and
   the copy fallback fail on a fresh install. In that case the existing
   sequence would report "added unsloth launcher to PATH" warnings but
   still prepend the empty $ShimDir to User PATH -- the user sees an
   install that claims success but then cannot resolve `unsloth` in a
   new shell. Gate Add-ToUserPath on Test-Path $ShimExe so the PATH
   entry is only persisted when the launcher is really there.

2. Pass [NullString]::Value instead of $null to the broadcast-delete
   call in Add-ToUserPath. On PowerShell 7.5 and later (running on .NET
   9), a bare $null going into [Environment]::SetEnvironmentVariable
   can be coerced to an empty string rather than a true .NET null,
   which sets the dummy UnslothPathRefresh_XXXXXXXX variable to "" in
   HKCU\Environment instead of deleting it. The leaked variable is
   visible in System Properties and accumulates one entry per install
   run. [NullString]::Value is a PowerShell-specific sentinel that
   crosses the interop boundary as a real null and works on both PS 5.1
   and PS 7.x. See PowerShell/PowerShell#24637 for the underlying issue.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-16 04:49:51 -07:00
Imgyu Kim
14ab6fbfae
BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4426)
Fixes #4150.

Pre-PR, `_fix_chat_template` only patched templates where a trailing `{{ ... }}` expression followed the last `{% endfor %}`. ChatML templates (Hermes, Magnum, Phi-4, etc.) that end cleanly at `{% endfor %}` with no generation-prompt block were left unchanged, so the outer `fix_chat_template` raised:

```
RuntimeError: Unsloth: The tokenizer `...` does not have a
{% if add_generation_prompt %} for generation purposes.
```

This commonly shows up when a downstream tool (LlamaFactory, Axolotl) re-serializes the tokenizer during LoRA save and strips the generation-prompt block.

This PR adds a second branch to `_fix_chat_template` that fires when:

- the content after the last `{% endfor %}` is empty modulo Jinja `{# ... #}` comments,
- the scrubbed template contains `<|im_start|>` and `<|im_end|>`,
- and the scrubbed template does not already mention `add_generation_prompt`.

The assistant-turn separator is inferred from the template itself, in order of preference:

- an explicit `'<|im_start|>assistant<sep>'` literal,
- the unique `message['role'] + '<sep>'` from role concatenations,
- `<|im_sep|>` for Phi-4-mini mixed-separator templates,
- `\n` as the final fallback.

This way Phi-4-style templates are not silently corrupted with the wrong separator.
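
The three conditions that gate the new branch (listed earlier in this message) can be approximated as follows. This is a rough sketch, not the code in `_fix_chat_template`: the function name and the regexes are assumptions made for illustration.

```python
import re


def needs_chatml_generation_prompt(template: str) -> bool:
    """Hypothetical check mirroring the three conditions for the new branch."""
    # Scrub Jinja comments so tokens inside {# ... #} are ignored.
    scrubbed = re.sub(r"\{#.*?#\}", "", template, flags=re.DOTALL)
    ends = [m.end() for m in re.finditer(r"\{%-?\s*endfor\s*-?%\}", scrubbed)]
    if not ends:
        return False
    tail = scrubbed[ends[-1]:]  # content after the last endfor
    return (
        tail.strip() == ""
        and "<|im_start|>" in scrubbed
        and "<|im_end|>" in scrubbed
        and "add_generation_prompt" not in scrubbed
    )
```

Scrubbing comments first is what keeps a "trap" template, where `<|im_start|>` appears only inside a Jinja comment, from being rewritten.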

Verified against the existing chat-template corpus:

- Hermes-3, Magnum-v2, Phi-4-mini, Phi-4 multi-sep, ChatML with trailing whitespace, ChatML with trailing Jinja comment, dot-access `message.role`, split-literal `'<|im_start|>assistant'`: all repaired with the correct assistant prefix.
- Already-fixed ChatML templates: idempotent NOP.
- Trap templates with `<|im_start|>` only inside a Jinja comment: correctly not rewritten.
- Llama-3, Gemma-3, Qwen2.5 (non-ChatML): byte-identical.
- Mistral family (5 models including Mistral-Nemo, Mistral-Small-24B, Mixtral): byte-identical, protected both by the structural guard (no ChatML tokens) and the existing name-based exemption in `load_correct_tokenizer`.
- Qwen family (14 models including Qwen2.5, Qwen3, Qwen3-Coder, QwQ, VL, Math, Qwen3-Guard): byte-identical.

End-to-end reproduction: Hermes-3 LoRA SFT, save with stripped chat_template, reload. Pre-PR code path raises the RuntimeError above. Post-PR reload loads cleanly, patches the template at load time, and `apply_chat_template(add_generation_prompt=True)` produces the correct `<|im_start|>assistant\n` prefix.
2026-04-16 00:21:29 -07:00
DoubleMathew
a4d4dfe4ac
fix Gemma4 flash attn disable (#5045)
* fix pass attn implementation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 17:50:48 -05:00
Daniel Han
3869fbe1cc
Bump installer minimum to 2026.4.5 (#5041) 2026-04-15 08:23:41 -07:00
Daniel Han
cdb3e752ec Update _utils.py 2026-04-15 08:06:43 -07:00
Daniel Han
ba387e2c8f Update pyproject.toml 2026-04-15 08:06:30 -07:00
Daniel Han
f0d03655e8
Studio: add folder browser modal for Custom Folders (#5035)
* Studio: add folder browser modal for Custom Folders

The Custom Folders row in the model picker currently only accepts a
typed path. On a remote-served Studio (Colab, shared workstation) that
means the user has to guess or paste the exact server-side absolute
path. A native browser folder picker can't solve this: HTML
`<input type="file" webkitdirectory>` hides the absolute path for
security, and the File System Access API (Chrome/Edge only) returns
handles rather than strings, neither of which the server can act on.

This PR adds a small in-app directory browser that lists paths on the
server and hands the chosen string back to the existing
`POST /api/models/scan-folders` flow.

## Backend

* New endpoint `GET /api/models/browse-folders`:
  * `path` query param (expands `~`, accepts relative or absolute; empty
    defaults to the user's home directory).
  * `show_hidden` boolean to include dotfiles/dotdirs.
  * Returns `{current, parent, entries[], suggestions[]}`. `parent` is
    null at the filesystem root.
  * Immediate subdirectories only (no recursion); files are never
    returned.
  * `entries[].has_models` is a cheap hint: the directory looks like it
    holds models if it is named `models--*` (HF hub cache layout) or
    one of the first 64 children is a .gguf/.safetensors/config.json/
    adapter_config.json or another `models--*` subfolder.
  * Sort order: model-bearing dirs, then plain, then hidden; case-
    insensitive alphabetical within each bucket.
  * Suggestions auto-populate from HOME, the HF cache root, and any
    already-registered scan folders, deduplicated.
  * Error surface: 404 for missing path, 400 for non-directory, 403 on
    permission errors. Auth-required like the other models routes.

* New Pydantic schemas `BrowseEntry` and `BrowseFoldersResponse` in
  `studio/backend/models/models.py`.
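
As an illustration of the sort order described for the endpoint, here is a small Python sketch. `sort_browse_entries` is a hypothetical name, entries are modelled as plain dicts rather than the `BrowseEntry` schema, and treating a hidden model-bearing directory as "hidden" is an assumption the description does not settle.

```python
def sort_browse_entries(entries):
    """Order entries: model-bearing, then plain, then hidden; A-Z within each bucket."""

    def bucket(entry):
        if entry["name"].startswith("."):
            return 2  # assumption: a hidden dir sorts last even if it has models
        return 0 if entry["has_models"] else 1

    # Case-insensitive alphabetical order inside each bucket.
    return sorted(entries, key=lambda e: (bucket(e), e["name"].lower()))
```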

## Frontend

* New `FolderBrowser` component
  (`studio/frontend/src/components/assistant-ui/model-selector/folder-browser.tsx`)
  using the existing `Dialog` primitive. Features:
  * Clickable breadcrumb with a `..` row for parent navigation.
  * Quick-pick chips for the server-provided suggestions.
  * `Show hidden` checkbox.
  * In-flight fetch cancellation via AbortController so rapid
    navigation doesn't flash stale results.
  * Badges model-bearing directories inline.

* `chat-api.ts` gains `browseFolders(path?, showHidden?)` and matching
  types.

* `pickers.tsx` adds a folder-magnifier icon next to the existing `Add`
  button. Opening the browser seeds it with whatever the user has
  already typed; confirming fills the text input, leaving the existing
  validation and save flow unchanged.

## What it does NOT change

* The existing text-input flow still works; the browser is additive.
* No new permissions or escalation; the endpoint reads only directories
  the server process is already allowed to read.
* No model scanning or filesystem mutation happens from the browser
  itself -- it only returns directory basenames for rendering.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: cap folder-browser entries and expose truncated flag

Pointing the folder browser at a huge directory (``/usr/lib``,
``/proc``, or a synthetic tree with thousands of subfolders) previously
walked the whole listing and stat-probed every child via
``_looks_like_model_dir``. That is both a DoS shape for the server
process and a large-payload surprise for the client.

Introduce a hard cap of 2000 subdirectory entries and a
``truncated: bool`` field on the response. The frontend renders a small
hint below the list when it fires, prompting the user to narrow the
path. Below-cap directories are unchanged.

Verified end-to-end against the live backend with a synthetic tree of
2050 directories: response lands at 2000 entries, ``truncated=true``,
listing finishes in sub-second time (versus tens of seconds if we were
stat-storming).

* Studio: suggest LM Studio / Ollama dirs + 2-level model probe

Three improvements to the folder-browser, driven by actually dropping
an LM Studio-style install (publisher/model/weights.gguf) into the
sandbox and walking the UX:

## 1. Quick-pick chips for other local-LLM tools

`well_known_model_dirs()` (new) returns paths commonly used by
adjacent tools. Only paths that exist are returned so the UI never
shows dead chips.

* LM Studio current + legacy roots + user-configured
  `downloadsFolder` from its `settings.json` (reuses the existing
  `lmstudio_model_dirs()` helper).
* Ollama: `$OLLAMA_MODELS` env override, then `~/.ollama/models`,
  `/usr/share/ollama/.ollama/models`, and `/var/lib/ollama/.ollama/models`
  (the systemd-service install path surfaced in the upstream "where is
  everything?" issue).
* Generic user-choice locations: `~/models`, `~/Models`.

Dedup is stable across all sources.

## 2. Two-level model-bearing probe

LM Studio and Ollama both use `root/publisher/model/weights.gguf`.
The previous `has_models` heuristic only probed one level, so the
publisher dir (whose immediate children are model dirs, not weight
files) was always marked as non-model-bearing. Pulled the direct-
signal logic into `_has_direct_model_signal` and added a grandchild
probe so the classic layout is now recognised.

Still O(PROBE^2) worst-case, still returns immediately for
`models--*` names (HF cache layout) and for any direct weight file.

## 3. model_files_here hint on response body

A leaf model dir (just GGUFs, no subdirs) previously rendered as
`(empty directory)` in the modal, confusing users into thinking the
folder wasn't scannable. Added a `model_files_here` count on the
response (capped at 200) and a small hint row in the modal: `N model
files in this folder. Click "Use this folder" to scan it.`

## Verification

Simulated an LM Studio install by downloading the real 84 MB
`unsloth/SmolLM2-135M-Instruct-Q2_K.gguf` into
`~/.lmstudio/models/unsloth/SmolLM2-135M-Instruct-GGUF/`. Confirmed
end-to-end:

* Home listing suggests `~/.lmstudio/models` as a chip.
* Browsing `~/.lmstudio/models` flags `unsloth` (publisher) as
  `has_models=true` via the 2-level probe.
* Browsing the publisher flags `SmolLM2-135M-Instruct-GGUF` (model
  dir) as `has_models=true`.
* Browsing the model dir returns empty entries but
  `model_files_here=1`, and the frontend renders a hint telling the
  user it is a valid target.

* Studio: one-click scan-folder add + prominent remove + plain search icon

Three small Custom Folders UX fixes after real-use walkthrough:

* **One-click add from the folder browser**. Confirming `Use this
  folder` now submits the path directly to
  `POST /api/models/scan-folders` instead of just populating the text
  input. `handleAddFolder` takes an optional explicit path so the
  submit lands in the same tick as `setFolderInput`, avoiding a
  state-flush race. The typed-path + `Add` button flow is unchanged.

* **Prominent remove X on scan folders**. The per-folder delete
  button was `text-muted-foreground/40` and hidden entirely on
  desktop until hovered (`md:opacity-0 md:group-hover:opacity-100`).
  Dropped the hover-only cloak, bumped color to `text-foreground/70`,
  added a red hover/focus background, and sized the icon up from
  `size-2.5` to `size-3`. Always visible on every viewport.

* **Plain search icon for the Browse button**. `FolderSearchIcon`
  replaced with `Search01Icon` so it reads as a simple "find a
  folder" action alongside the existing `Add01Icon`.

* Studio: align Custom Folders + and X buttons on the same right edge

The Custom Folders header used `px-2.5` with a `p-0.5` icon button,
while each folder row used `px-3` with a `p-1` button. That put the
X icon 4px further from the right edge than the +. Normalised both
rows to `px-2.5` with `p-1` so the two icons share a column.

* Studio: empty-state button opens the folder browser directly

The first-run empty state for Custom Folders was a text link reading
"+ Add a folder to scan for local models" whose click toggled the
text input. That's the wrong default: a user hitting the empty state
usually doesn't know what absolute path to type, which is exactly
what the folder browser is for.

* Reword to "Browse for a models folder" with a search-icon
  affordance so the label matches what the click does.
* Click opens the folder browser modal directly. The typed-path +
  Add button flow is still available via the + icon in the
  section header, so users who know their path keep that option.
* Slightly bump the muted foreground opacity (70 -> hover:foreground)
  so the button reads as a primary empty-state action rather than a
  throwaway hint.

* Studio: Custom Folders header gets a dedicated search + add button pair

The Custom Folders section header had a single toggle button that
flipped between + and X. That put the folder-browser entry point
behind the separate empty-state link. Cleaner layout: two buttons in
the header, search first, then add.

* Search icon (left) opens the folder browser modal directly.
* Plus icon (right) toggles the text-path input (unchanged).
* The first-run empty-state link is removed -- the two header icons
  cover both flows on every state.

Both buttons share the same padding / icon size so they line up with
each other and with the per-folder remove X.

* Studio: sandbox folder browser + bound caps + UX recoveries

PR review fixes for the Custom Folders folder browser. Closes the
high-severity CodeQL path-traversal alert and addresses the codex /
gemini P2 findings.

Backend (studio/backend/routes/models.py):

* New _build_browse_allowlist + _is_path_inside_allowlist sandbox.
  browse_folders now refuses any target that doesn't resolve under
  HOME, HF cache, Studio dirs, registered scan folders, or the
  well-known third-party model dirs. realpath() is used so symlink
  traversal cannot escape the sandbox. Also gates the parent crumb
  so the up-row hides instead of 403'ing.
* _BROWSE_ENTRY_CAP now bounds *visited* iterdir entries, not
  *appended* entries. Dirs full of files (or hidden subdirs when
  show_hidden is False) used to defeat the cap.
* _count_model_files gets the same visited-count fix.
* PermissionError no longer swallowed silently inside the
  enumeration / counter loops -- now logged at debug.
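
The difference between capping visited entries and capping appended entries can be shown with a small sketch. `list_subdirs` is a hypothetical stand-in that takes `(name, is_dir)` pairs instead of touching the filesystem.

```python
def list_subdirs(children, cap, show_hidden=False):
    """Return (names, truncated) while visiting at most `cap` children.

    children is any iterable of (name, is_dir) pairs.
    """
    names, truncated = [], False
    for visited, (name, is_dir) in enumerate(children):
        if visited >= cap:
            truncated = True  # there was more to see than we were willing to visit
            break
        if not is_dir:
            continue  # files count against the cap but are never returned
        if name.startswith(".") and not show_hidden:
            continue  # hidden dirs also count against the cap
        names.append(name)
    return names, truncated
```

Counting every visited child bounds the work even when a directory contains only files, which is the case the old appended-entries cap missed.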

Frontend (folder-browser.tsx, pickers.tsx, chat-api.ts):

* splitBreadcrumb stops mangling literal backslashes inside POSIX
  filenames; only Windows-style absolute paths trigger separator
  normalization. The Windows drive crumb value is now C:/ (drive
  root) instead of C: (drive-relative CWD-on-C).
* browseFolders accepts and forwards an AbortSignal so cancelled
  navigations actually cancel the in-flight backend enumeration.
* On initial-path fetch error, FolderBrowser now falls back to HOME
  instead of leaving the modal as an empty dead end.
* When the auto-add path (one-click "Use this folder") fails, the
  failure now surfaces via toast in addition to the inline
  paragraph (which is hidden when the typed-input panel is closed).

* Studio: rebuild browse target from trusted root for CodeQL clean dataflow

CodeQL's py/path-injection rule kept flagging the post-validation
filesystem operations because the sandbox check lived inside a
helper function (_is_path_inside_allowlist) and CodeQL only does
intra-procedural taint tracking by default. The user-derived
``target`` was still flowing into ``target.exists`` /
``target.is_dir`` / ``target.iterdir``.

The fix: after resolving the user-supplied ``candidate_path``,
locate the matching trusted root from the allowlist and rebuild
``target`` by appending each individually-validated segment to
that trusted root. Each segment is rejected if it isn't a single
safe path component (no separators, no ``..``, no empty/dot).
The downstream filesystem ops now operate on a Path constructed
entirely from ``allowed_roots`` (trusted) plus those validated
segments, so CodeQL's dataflow no longer sees a tainted source.

Behavior is unchanged for all valid inputs -- only the
construction of ``target`` is restructured. Live + unit tests
all pass (58 selected, 7 deselected for Playwright env).

* Studio: walk browse paths from trusted roots for CodeQL

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@h100-8-cheapest.us-east5-a.c.unsloth.internal>
2026-04-15 08:04:33 -07:00
Roland Tannous
800ddc95f8
Re-apply #4939: updated models template mappers (#4950)
* Reapply "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945)

This reverts commit 33503ea248.

* Add missing gemma-4-31B-it bnb-4bit mapper entry and LFM2.5 upstream namespace for PR #4950

- Add unsloth/gemma-4-31B-it-unsloth-bnb-4bit to __INT_TO_FLOAT_MAPPER so
  the int-to-float resolution works for this model (already listed in
  TEMPLATE_TO_MODEL_MAPPER but had no mapper entry).
- Add LiquidAI/LFM2.5-1.2B-Instruct to lfm-2.5 TEMPLATE_TO_MODEL_MAPPER
  entry so the canonical upstream namespace is mapped consistently with lfm-2.

* Add missing gemma-4-31B-it bnb-4bit Ollama mapping and lfm-2.5 chat template alias

- Add unsloth/gemma-4-31B-it-unsloth-bnb-4bit to OLLAMA_TEMPLATE_TO_MODEL_MAPPER
  so Ollama export works for this model (E2B-it and E4B-it bnb-4bit variants were
  already present, 31B-it was inconsistently omitted)
- Register CHAT_TEMPLATES["lfm-2.5"] as alias of the lfm-2 template to prevent
  KeyError when Studio resolves LFM2.5 models through MODEL_TO_TEMPLATE_MAPPER

* Add missing LFM2 bnb-4bit INT_TO_FLOAT_MAPPER entry

unsloth/LFM2-1.2B-unsloth-bnb-4bit is referenced in model_mappings.py
but had no mapper.py entry, so model resolution would fail when users
load that variant with load_in_4bit=False or when the float name is
used with load_in_4bit=True.

* Fix review findings for PR #16

1. ollama_template_mappers.py: Restore dropped Gemma-4 base model IDs
   (E2B, E4B, 31B, 26B-A4B) and add missing google/ upstream IDs to
   the gemma4 Ollama mapper for consistency with other gemma entries.

2. mapper.py: Remove self-mapping non-bnb-4bit entries from
   __INT_TO_FLOAT_MAPPER that were polluting FLOAT_TO_INT_MAPPER with
   lowercase 16-bit names, causing load_in_4bit=True to return bad
   model names. Add direct MAP_TO_UNSLOTH_16bit entries to preserve
   the google->unsloth 16-bit redirects.

3. mapper.py: Add LFM2.5 MAP_TO_UNSLOTH_16bit redirect so
   LiquidAI/LFM2.5-1.2B-Instruct resolves to its unsloth mirror.

* Add review tests for PR #4950

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove top-level test files

These test_*.py files were added at the repo root rather than under tests/.
Removing them from this PR; the production mapper changes remain.

* Add gemma-4-26B-A4B-it mapping

Adds unsloth/gemma-4-26B-A4B-it to __INT_TO_FLOAT_MAPPER as a 2-tuple so
google/gemma-4-26B-A4B-it routes to unsloth/gemma-4-26B-A4B-it across
INT_TO_FLOAT_MAPPER, FLOAT_TO_INT_MAPPER, and MAP_TO_UNSLOTH_16bit.

The 26B-A4B (MoE) model has no bnb-4bit variant, so the key uses the
plain unsloth name rather than the -unsloth-bnb-4bit suffix.

Removes the now-redundant standalone _add_with_lower call for the -it
variant; the 16bit mapping is registered via the dict loop.

* Add unsloth-bnb-4bit mappings for gemma-4 base (non-it) models

Adds E2B, E4B, 31B base unsloth-bnb-4bit entries to __INT_TO_FLOAT_MAPPER.
The 26B-A4B (MoE) base has no bnb-4bit variant on HF, so it stays on the
standalone _add_with_lower line for the 16bit-only routing.

Removes the redundant _add_with_lower lines for E2B, E4B, 31B base since
the dict loop now registers the same google->unsloth route through the
2-tuple entries, plus full FLOAT_TO_INT and INT_TO_FLOAT coverage.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 07:52:12 -07:00
Avaya Aggarwal
7c5464ad71
feat: Add cactus QAT scheme support (#4679)
* feat: Add cactus QAT scheme support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test(qat): add tests for cactus QAT scheme and fix missing import

* Fix cactus QAT scheme: correct MappingType import, tighten PerGroup filter

- Drop the broken `from torchao.dtypes import MappingType` import. `MappingType`
  lives in `torchao.quantization` (and `torchao.quantization.quant_primitives`);
  it is not exported from `torchao.dtypes` in any supported torchao release
  (verified on 0.14, 0.16, 0.17). The previous code raised `ImportError` on
  every cactus call and was masked as a misleading 'torchao not found' error.
- Since `IntxWeightOnlyConfig` already defaults `mapping_type` to
  `MappingType.SYMMETRIC`, drop the explicit kwarg entirely and remove the
  import. Behavior is unchanged.
- Introduce a named `group_size = 32` constant (matches the int4 / fp8-int4
  pattern in the surrounding branches) and add a `% group_size == 0`
  divisibility guard to the filter. `PerGroup(32)` requires
  `in_features % 32 == 0` at `quantize_()` time, otherwise torchao raises
  `ValueError: in_features (N) % group_size (32) must be == 0`. The old
  `in_features >= 32` filter would admit non-aligned widths (e.g. 33, 48, 65,
  127) and crash `_prepare_model_for_qat` for those shapes.

* Warn when cactus QAT skips non-divisible Linear layers

Multiple reviewers flagged that the divisibility guard added in the
previous commit can silently leave Linear layers in full precision when
their in_features is not a multiple of 32. For currently supported
Unsloth models (Qwen, Llama, Gemma, Mistral, Phi) every Linear width is
already a multiple of 32/64/128 so this never triggers, but surfacing
the coverage gap is cheap and avoids users assuming 100% QAT coverage
when they bring a custom model with unusual shapes.

Emit a UserWarning listing up to the first 8 skipped layers whenever
the cactus filter excludes any Linear due to the modulo guard. This
keeps the lenient skip behavior (consistent with int4 / fp8-int4) but
no longer skips silently.
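
The filter and warning can be sketched without torch or torchao. This is an illustration only: `select_cactus_layers` is a hypothetical name, layers are modelled as a name-to-`in_features` mapping, and the warning text is invented.

```python
import warnings

GROUP_SIZE = 32  # matches the PerGroup(32) granularity described above


def select_cactus_layers(layers, group_size=GROUP_SIZE, max_listed=8):
    """Return names of layers whose in_features is a positive multiple of group_size.

    layers maps layer name -> in_features. Layers that fail the check are
    left in full precision and reported in a single UserWarning.
    """
    kept, skipped = [], []
    for name, in_features in layers.items():
        if in_features >= group_size and in_features % group_size == 0:
            kept.append(name)
        else:
            skipped.append(name)
    if skipped:
        shown = ", ".join(skipped[:max_listed])  # list at most the first 8
        warnings.warn(
            f"cactus QAT skipped {len(skipped)} Linear layer(s) whose in_features "
            f"is not a multiple of {group_size}: {shown}",
            UserWarning,
        )
    return kept
```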

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-15 07:40:03 -07:00
Avaya Aggarwal
f18e9dddf0
feat: Add support for OLMo-3 model (#4678)
* feat: Add support for OLMo-3 model in mapping and tests

* Update unsloth/models/mapper.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update tests/test_get_model_name.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Fix casing, add Think variants, and align version gate for OLMo-3 PR 4678

Mapper: switch slugs from OLMo-3 to canonical Olmo-3 mixed case, drop the
non-existent unsloth/Olmo-3-7B-Instruct-bnb-4bit dead alias, and add the
already-published Olmo-3-7B-Think and Olmo-3-32B-Think Unsloth mirrors.

Loader: change the olmo3 transformers version gate from Version("4.57.0")
to Version("4.57.0.dev0") so nightly/source builds that already contain
olmo3 are not blocked, matching the OLMo-2, Gemma 3 and Cohere patterns.

* Use canonical Olmo-3 casing and cover Think variants in OLMo-3 tests

Mirrors the mapper.py fixes on pr-4678-code: HuggingFace canonical slugs
for the OLMo-3 family use mixed-case Olmo-3 (not OLMo-3 like OLMo-2), and
Unsloth already hosts Olmo-3-7B-Think and Olmo-3-32B-Think mirrors, so
the resolution matrix now covers all three published Olmo-3 families.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-15 07:39:11 -07:00
Daniel Han
c3cd890357
Studio: refresh Downloaded GGUF list and recurse into variant subdirs (#5032)
* Studio: refresh Downloaded GGUF list and recurse into variant subdirs

Two fixes for the model picker's "Downloaded" section.

Frontend (`pickers.tsx`):
* `HubModelPicker`'s mount effect short-circuited the cached-gguf and
  cached-models refetch whenever the module-level cache already had
  entries (`if (alreadyCached) return;`). After downloading a new repo
  in the same session, reopening the picker rendered the stale cache
  and the new repo never appeared in "Downloaded" until a full page
  reload. The early return is removed so the lists are always refreshed
  on mount; the module cache still drives the initial render so there
  is no spinner flash when we already had data.

Backend (`utils/models/model_config.py`):
* `list_local_gguf_variants` and `_find_local_gguf_by_variant` used a
  non-recursive `Path.glob("*.gguf")`. Some HF GGUF repos (e.g.
  `unsloth/gemma-4-26B-A4B-it-GGUF`) place the largest quants under a
  variant-named subdirectory such as `BF16/...gguf`, which the
  top-level glob missed. Both helpers now use `rglob` and the variant
  filename is stored as a path relative to the scan root so the
  locator can still find the file.

The flat-layout case (variants directly in the snapshot root) is
unchanged: verified against `unsloth/gemma-4-E2B-it-GGUF` which still
returns its UD-Q4_K_XL variant correctly.

* Studio: emit posix-style relative filenames for local GGUF subdirs

`list_local_gguf_variants` was doing `str(f.relative_to(p))`, which on
Windows produces backslash-separated paths like `BF16\foo.gguf`. The
remote `list_gguf_variants` (HF API path) always returns forward-slash
filenames such as `BF16/foo.gguf`, so the two would diverge on Windows.

Switch to `.as_posix()` so the local and remote variant filenames stay
identical across Linux, macOS, and Windows. Verified by simulating with
`PureWindowsPath` in the test suite.
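
The two changes above (recursive scan, posix-style relative names) can
be sketched with the standard library alone. The helper name
`list_gguf_relative` and the file names are made up for illustration;
the real helpers also group files by quant, which is omitted here.

```python
import tempfile
from pathlib import Path, PureWindowsPath


def list_gguf_relative(scan_root: Path) -> list[str]:
    """Recursively list .gguf files as forward-slash paths relative to scan_root."""
    return sorted(
        f.relative_to(scan_root).as_posix() for f in scan_root.rglob("*.gguf")
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "BF16").mkdir()
    (root / "BF16" / "model-BF16.gguf").touch()  # nested variant
    (root / "model-Q4_K_M.gguf").touch()         # flat variant

    flat_only = sorted(f.name for f in root.glob("*.gguf"))  # old behavior
    nested = list_gguf_relative(root)                        # new behavior

# str() on a Windows path keeps backslashes; as_posix() does not.
win = PureWindowsPath("BF16") / "foo.gguf"
win_str, win_posix = str(win), win.as_posix()
```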

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: detect mmproj at snapshot root for nested-variant layouts

When _find_local_gguf_by_variant returns a weight file inside a
quant-named subdir (e.g. snapshot/BF16/foo.gguf), detect_mmproj_file
was scanning only the immediate parent and missing the mmproj file
sitting at the snapshot root. The model was then loaded without
--mmproj, silently breaking vision support for repos that ship
nested variants.

detect_mmproj_file now takes an optional search_root and walks up
from the weight file to that root, in order, so the mmproj at the
snapshot root is picked up. Sibling quant subdirs are not scanned,
so an unrelated variant's mmproj does not leak in.
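
A rough sketch of the walk-up search described above. `find_mmproj` is
a hypothetical stand-in for `detect_mmproj_file`; the "mmproj in the
file name" match and the fallback when `search_root` is not an ancestor
are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def find_mmproj(weight_file: Path, search_root: Optional[Path] = None) -> Optional[Path]:
    """Check the weight file's directory, then each parent up to search_root."""
    start = weight_file.parent.resolve()
    root = search_root.resolve() if search_root is not None else start
    if start != root and root not in start.parents:
        root = start  # not an ancestor: only scan the immediate parent

    directory = start
    while True:
        # Only this directory is scanned, never sibling quant subdirs.
        for candidate in sorted(directory.glob("*.gguf")):
            if "mmproj" in candidate.name.lower():
                return candidate
        if directory == root:
            return None
        directory = directory.parent
```

Because the loop only ever moves to `directory.parent`, a sibling such
as `snapshot/Q8_0/` is never visited when the weight lives in
`snapshot/BF16/`.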

Also apply the suggested micro-optimization on relative_to in
list_local_gguf_variants -- only build the posix path when storing
the first file for a quant.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 07:34:42 -07:00
Daniel Han
156f3fc4b0
Gate trl disable_gradient_checkpointing patch warning on UNSLOTH_ENABLE_LOGGING (#5038)
The "Patched trl.models.utils.disable_gradient_checkpointing with a no-op"
warning fires once on every Unsloth import, including from notebooks where
the user did not opt into verbose logging. It is a routine integration
patch, not an anomaly the user needs to know about. Gate it on
UNSLOTH_ENABLE_LOGGING=1 like other diagnostic notices.
2026-04-15 07:33:48 -07:00
jonahsamost
777e1bd0ac
fix (#4887)
2026-04-15 07:21:03 -07:00

Daniel Han
1a4ca5eca8
Fix grad-accum accepts_loss_kwargs detection for vision wrappers (#5036)
* Fix grad-accum model_accepts_loss_kwargs detection for vision wrappers

Replace the source-string rewrite of Trainer.__init__ with an instance-level
accepts_loss_kwargs shadow applied on the loaded model. Covers:

  1. Unsloth-compiled forward -> True, so HF Trainer does not double-scale
     on top of unsloth_fixed_cross_entropy's num_items_in_batch division.
  2. Stock forward on a conditional-generation wrapper (Gemma3n, Gemma3
     pre-4.57, Qwen-VL family, etc.) where the outer class has no
     accepts_loss_kwargs but the inner .model declares False -> False.
     This is the case that reproduces issue #4982 under trust_remote_code
     or UNSLOTH_COMPILE_DISABLE, where the previous fix's outer-attr
     check walked past the inner model and fell through to signature
     inspection.
  3. Text LMs without any explicit accepts_loss_kwargs -> leave HF default.

The previous .replace()-based patch silently no-ops on transformers 4.48
through 4.52 (variable named model, not unwrapped_model) and is fragile
against any upstream reformat. The new helper walks the PEFT / HF wrapper
chain, finds the first class that declares accepts_loss_kwargs on its own
class dict (type(m).__dict__, not hasattr, to avoid PEFT __getattr__
forwarding), and setattr-shadows that value at every wrapper level so
HF Trainer's hasattr(unwrapped_model, ...) check picks it up at whichever
level accelerate.unwrap_model returns.
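
The wrapper-chain walk can be illustrated with plain Python objects.
This is a simplified sketch: the classes below are stand-ins, and real
`nn.Module` wrappers keep children in `_modules` rather than the
instance `__dict__`, so the actual helper has to look there instead.

```python
class Inner:
    """Stands in for the inner .model that declares the flag itself."""
    accepts_loss_kwargs = False


class Outer:
    """Conditional-generation wrapper with no declaration of its own."""
    def __init__(self):
        self.model = Inner()


class PeftLike:
    """Forwards unknown attributes to the base model, like PEFT's __getattr__."""
    def __init__(self, base):
        self.base_model = base

    def __getattr__(self, name):
        return getattr(self.__dict__["base_model"], name)


def wrapper_chain(model):
    """Collect model, then its wrapped child, and so on down the chain."""
    chain, seen = [], set()
    while model is not None and id(model) not in seen:
        seen.add(id(model))
        chain.append(model)
        # vars() reads the instance dict only, so forwarding is bypassed.
        model = next(
            (vars(model)[a] for a in ("base_model", "model") if a in vars(model)),
            None,
        )
    return chain


def shadow_accepts_loss_kwargs(model):
    """Copy the first class-level declaration onto every wrapper level."""
    chain = wrapper_chain(model)
    for m in chain:
        # type(m).__dict__ sees only what this class declares itself.
        if "accepts_loss_kwargs" in type(m).__dict__:
            value = type(m).__dict__["accepts_loss_kwargs"]
            for level in chain:
                setattr(level, "accepts_loss_kwargs", value)
            return value
    return None  # nothing declared: leave the trainer default alone
```

Before the call, `hasattr(PeftLike(Outer()), "accepts_loss_kwargs")` is
False because forwarding stops at `Outer`; afterwards every level
carries the inner model's `False`.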

Also adds an unconditional post-init clamp of
accelerator.gradient_accumulation_steps = 1 to work around the
transformers 5.0 through 5.5 GradientAccumulationPlugin regression that
makes accelerator.backward divide loss by GA on top of training_step's
own /GA division. Fixed upstream in 5.6.0.dev0; no-op on 4.x and 5.6+.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Trim comments

* Address review: cover PEFT-after-load and custom compile location

Two review findings from 3/20 reviewers:

1. [3 of 20 reviewers] apply_accepts_loss_kwargs_fix was called from the
   loaders before get_peft_model wraps the base model, so on transformers
   4.48-4.52 (which does hasattr on the outer model) the instance shadow
   on the base model was lost after PEFT wrapping. Fix: also call it from
   the wrapped Trainer.__init__ so it runs on whatever model the user
   actually hands to Trainer, which is always the final wrapped form.

2. [1 of 20 reviewers] _forward_is_unsloth_compiled hard-coded the
   substrings "unsloth_compiled" / "unsloth_cache" in the co_filename
   check, which misclassifies compiled forwards when
   UNSLOTH_COMPILE_LOCATION is set to a custom directory. Fix: new
   _unsloth_compile_cache_leaves helper that reads the env var and
   matches the basename against path components, honoring both the
   default and any user override.
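
   A sketch of the basename-versus-path-components match, for
   illustration only. The function names are hypothetical and the
   default leaf name `unsloth_compiled_cache` is an assumption; only
   the `UNSLOTH_COMPILE_LOCATION` variable comes from the text above.

```python
import os
from pathlib import PurePath

DEFAULT_LEAVES = ("unsloth_compiled_cache",)  # assumed default directory name


def compile_cache_leaves(environ=os.environ) -> set:
    """Directory basenames that mark a file as an Unsloth-compiled module."""
    leaves = set(DEFAULT_LEAVES)
    override = environ.get("UNSLOTH_COMPILE_LOCATION", "")
    if override:
        leaf = PurePath(override).name  # basename of the custom directory
        if leaf:
            leaves.add(leaf)
    return leaves


def forward_is_compiled(co_filename: str, environ=os.environ) -> bool:
    """True when any path component of co_filename is a known cache leaf."""
    return bool(set(PurePath(co_filename).parts) & compile_cache_leaves(environ))
```

   Matching whole path components avoids the false positives and false
   negatives that a substring test on the full path can produce.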

Verified locally:
- PEFT-after-load simulation: HF's hasattr(peft, "accepts_loss_kwargs")
  now returns True after our init wrapper runs, and value resolves to
  False on Gemma3n-style inner wrappers.
- Custom UNSLOTH_COMPILE_LOCATION simulation: compiled detection returns
  True for /tmp/my_custom_cache/compiled.py when the env var is set.
- End-to-end Gemma-3 270m + LoRA SFT unchanged: loss 4.9626, grad-norm
  matches prior run, all 4 wrapper levels now carry the shadowed attr.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 06:59:36 -07:00
Daniel Han
1ccfd2e0a5
fix(rocm): tighten gfx regex to ignore generic ISA lines (#5033)
* fix(rocm): tighten gfx regex to ignore generic ISA lines

ROCm 6.1+ rocminfo emits generic ISA names such as
"amdgcn-amd-amdhsa--gfx11-generic" and "amdgcn-amd-amdhsa--gfx9-4-generic"
alongside the real GPU name. The previous `gfx[1-9]` regex used in
`_has_rocm_gpu` matched both, so a host with only a generic ISA entry
would be reported as having a usable AMD GPU.

Tighten the pattern to `gfx[1-9][0-9a-z]{2,3}` so only real gfx ids
match. This covers every documented target from GFX6 (gfx600) through
GFX12 (gfx1201), including letter-suffixed ids like gfx90a (MI250 /
MI250X) and gfx90c. Documented generic ISA names always have 1 or 2
digits before the dash and no longer match.
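
The difference between the two patterns is easy to check in isolation.
The sample lines below are illustrative rather than captured
`rocminfo` output, and the sketch assumes the pattern is applied with
`re.search`.

```python
import re

OLD_GFX = re.compile(r"gfx[1-9]")
NEW_GFX = re.compile(r"gfx[1-9][0-9a-z]{2,3}")

generic_lines = [
    "amdgcn-amd-amdhsa--gfx11-generic",
    "amdgcn-amd-amdhsa--gfx9-4-generic",
]
real_lines = [
    "  Name:                    gfx90a",
    "  Name:                    gfx600",
    "amdgcn-amd-amdhsa--gfx1201",
]

# The old pattern accepts everything, including generic ISA entries.
old_hits = [bool(OLD_GFX.search(line)) for line in generic_lines + real_lines]

# The new pattern needs 2-3 alphanumerics after the first digit, so the
# dash in "gfx11-generic" / "gfx9-4-generic" stops the match.
new_generic_hits = [bool(NEW_GFX.search(line)) for line in generic_lines]
new_real_ids = [NEW_GFX.search(line).group(0) for line in real_lines]
```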

Applied to both `studio/install_python_stack.py` and
`studio/install_llama_prebuilt.py` so the two detection paths agree.

Co-authored-by: Martin Hoyer <mhoyer@redhat.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Martin Hoyer <mhoyer@redhat.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 05:24:41 -07:00
Daniel Han
b7a8ff2833
Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027) (#5034)
* Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027)

FastLanguageModel.from_pretrained(..., num_labels=N) crashed with
"NotImplementedError: normal_kernel_cuda not implemented for 'Byte'" on
pre-quantized bnb 4-bit checkpoints (e.g. unsloth/Qwen3-4B-bnb-4bit)
when running on transformers 5.x.

Two pieces were needed to close this out:

1. unsloth_zoo PR: add "score", "classifier", "qa_outputs" to
   SKIP_QUANTIZATION_MODULES so replace_with_bnb_linear leaves task
   heads in the compute dtype.

2. This commit: for pre-quantized checkpoints, transformers reads
   llm_int8_skip_modules from the quantization_config baked into
   config.json and ignores the runtime BitsAndBytesConfig we pass via
   kwargs. Unsloth must merge its skip list into
   model_config.quantization_config.llm_int8_skip_modules before the
   from_pretrained call, or the checkpoint's frozen list
   (e.g. ["lm_head", "multi_modal_projector", "merger",
   "modality_projection"]) wins and the `score` head gets converted to
   Linear4bit with uint8 storage, then _init_weights calls normal_ on
   uint8 and crashes.
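
   As a sketch of the merge step in item 2, assuming the
   quantization config is available as a plain dict (in practice it
   may be a config object) and using a hypothetical helper name:

```python
TASK_HEAD_MODULES = ["score", "classifier", "qa_outputs"]


def merge_skip_modules(quantization_config: dict, extra=TASK_HEAD_MODULES) -> dict:
    """Return a copy whose llm_int8_skip_modules also lists the task heads."""
    existing = quantization_config.get("llm_int8_skip_modules") or []
    merged = list(existing)
    for name in extra:
        if name not in merged:  # keep the checkpoint's order, avoid duplicates
            merged.append(name)
    return {**quantization_config, "llm_int8_skip_modules": merged}


# The frozen list baked into a pre-quantized checkpoint's config.json.
frozen = {
    "load_in_4bit": True,
    "llm_int8_skip_modules": [
        "lm_head", "multi_modal_projector", "merger", "modality_projection",
    ],
}
patched = merge_skip_modules(frozen)
```

   The merge has to happen before `from_pretrained`, because the
   checkpoint's own list is what transformers consults at load time.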

Also add a defensive post-load cast on the task head to guard against
any residual path that ends up with a non-floating head dtype.

Verified on transformers 4.57.6 and 5.5.0 with:
- unsloth/Qwen3-4B-bnb-4bit + num_labels=3
- unsloth/Qwen3-4B (non-bnb repo, load_in_4bit=True)
- unsloth/Llama-3.2-1B-Instruct + num_labels=3
- unsloth/ModernBERT-large classifier head (bert_classification notebook)
- Regression: causal LM path unchanged, backbone still 4-bit
- 3-step SFT on num_labels=3 confirms gradient flow and weight updates
  on score.weight

Fixes unslothai/unsloth#5027

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 05:16:33 -07:00
David Solanas Sanz
1fcb2502cf
fix: prevent offline freeze by fixing stats retry and forwarding local_files_only (#5016)
Fixes #2393.

- `_utils.py`: `has_internet()` now respects `HF_HUB_OFFLINE` with truthy variant parsing in addition to `TRANSFORMERS_OFFLINE`.
- `_utils.py`: replace uncontrolled `except Exception: stats_check()` retry (which had no time limit and could freeze on Kaggle offline mode) with a logged skip.
- `loader.py`: forward `local_files_only` from kwargs into all `AutoConfig.from_pretrained` and `PeftConfig.from_pretrained` probes in `FastLanguageModel.from_pretrained` and `FastModel.from_pretrained`, including the PEFT base-model reload paths.
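
A minimal sketch of the offline check from the first bullet. The helper
name `offline_requested` and the exact set of truthy spellings are
assumptions; the real `has_internet()` does more than this.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}  # assumed set of accepted spellings


def offline_requested(environ=os.environ) -> bool:
    """True when either offline variable is set to a truthy value."""
    for var in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE"):
        if environ.get(var, "").strip().lower() in _TRUTHY:
            return True
    return False


def has_internet_sketch(environ=os.environ) -> bool:
    """Short-circuit before any network probe when offline mode is requested."""
    if offline_requested(environ):
        return False
    return True  # a real implementation would probe the network here
```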
2026-04-15 04:51:31 -07:00
Lee Jackson
f9ef639dde
Studio: support GGUF variant selection for non-suffixed repos (#5023)
* fix: support GGUF variant selection for non-suffixed repos

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: harden GGUF detection across cached models and picker flows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* chore: use shared GGUF picker helper for search rows

* fix: avoid mixed cache duplication and preserve GGUF fallback detection

* fix: unify GGUF cache matching and merge picker hints

* fix: normalize local GGUF matching across picker and model config

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: robust cached-gguf classification + hint-aware click routing

- _repo_gguf_size_bytes: treat size_on_disk=None as 0 and dedupe fallback
  by commit_hash so partial/interrupted downloads don't TypeError out of
  sum() and wipe the entire cached list.
- list_cached_gguf / list_cached_models: narrow per-repo try/except so
  one malformed repo no longer poisons the whole response.
- handleModelClick: route through isKnownGgufRepo instead of the
  suffix-only isGgufRepo, so non-suffixed GGUF repos still open the
  variant expander from every call site.
- Replace the modelIsGgufById/resultIsGgufById Maps with Sets of known
  GGUF ids to stop conflating "no hint" with "known not-GGUF".
- Make HfModelResult.isGguf required (it is always set in makeMapModel).
- Add regression tests for the None size case, mixed-repo inclusion in
  cached-gguf, and per-repo error isolation.

* fix: exclude mmproj from GGUF classification and case-normalize hint lookups

- _repo_gguf_size_bytes now filters mmproj vision-adapter files so
  safetensors+mmproj.gguf repos stay on the cached-models path and
  non-GGUF rows no longer show zero pickable variants. A vision-capable
  GGUF repo (main weight + mmproj adapter) still classifies as GGUF and
  reports the main weight size.
- modelGgufIds / resultGgufIds now key on lowercased ids and
  isKnownGgufRepo lowercases its lookup, so store and HF-search ids
  that differ only by casing still match the same GGUF hint.
- New regression tests: mmproj-only repo excluded from cached-gguf,
  same repo included in cached-models, vision-capable repo still
  classified as GGUF with correct size.
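
The classification rules from the last two commits can be sketched
over `(file_name, size_on_disk)` pairs. The function names and input
shape are hypothetical; the real code reads these values from the HF
cache scan.

```python
def classify_gguf(files):
    """Return (has_main_weight, total_bytes) for a cached repo.

    mmproj adapters are ignored, and a size of None (partial or
    interrupted download) counts as 0 instead of raising in the sum.
    """
    has_main_weight, total = False, 0
    for name, size in files:
        lower = name.lower()
        if not lower.endswith(".gguf") or "mmproj" in lower:
            continue
        has_main_weight = True
        total += size or 0
    return has_main_weight, total


def is_known_gguf_repo(repo_id: str, known_ids_lower: set) -> bool:
    """Case-insensitive lookup against a set of lowercased GGUF repo ids."""
    return repo_id.lower() in known_ids_lower
```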

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-15 15:32:01 +04:00
Roland Tannous
13928b5f0e
Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024)
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var

When set, UNSLOTH_PYTORCH_MIRROR overrides the default
https://download.pytorch.org/whl base URL in all four install scripts
(install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py).
When unset or empty, the official URL is used. This lets users behind
corporate proxies or in regions with poor connectivity to pytorch.org
point at a local mirror without patching scripts.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py

Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back
to the official URL when unset or empty, and preserves the value as-is
(including trailing slashes).

* Remove stale test assertions for missing install.sh messages

* Fix GPU mocking in test_get_torch_index_url.sh

Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside
get_torch_index_url so the GPU-presence checks work in tests.
Add -L flag handling to mock nvidia-smi so it passes the GPU listing
check. All 26 tests now pass on CPU-only machines.

* Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs
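
A sketch of the resulting lookup, assuming hypothetical helper names
and an example mirror host; only the variable name and the official
URL come from the commit text.

```python
import os

OFFICIAL_WHL_BASE = "https://download.pytorch.org/whl"


def pytorch_whl_base(environ=os.environ) -> str:
    """Use the mirror when set and non-empty, without a trailing slash."""
    mirror = environ.get("UNSLOTH_PYTORCH_MIRROR", "")
    return mirror.rstrip("/") if mirror else OFFICIAL_WHL_BASE


def torch_index_url(tag: str, environ=os.environ) -> str:
    """Join the base and a wheel tag such as 'cpu' with exactly one slash."""
    return f"{pytorch_whl_base(environ)}/{tag}"
```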

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 11:39:11 +04:00
Datta Nimmaturi
826c98f3c0
[moe][gemma4] Target MoE for gemma4 (#4913)
* Target MoE for gemma4

* refactor attention impl determine

* Revert "refactor attention impl determine"

This reverts commit 888fca08110a9a74278dc1ebc14d0da043bbd11d.

* Remove attention policy changes from gemma4 MoE fix
2026-04-14 16:53:07 -05:00
Daniel Han
5aa8c15246
Studio: hard-stop at n_ctx with a 'Context limit reached' toast (#5021)
* Studio: hard-stop at n_ctx with a dedicated 'Context limit reached' toast

llama-server's default behavior when the KV cache fills is to silently
drop the oldest non-``n_keep`` tokens and keep generating. The UI has
no way to tell the user that earlier turns were evicted -- they just
see degraded continuity and a confusing ``5,361 / 4,096`` on the
context usage bar.

Launch llama-server with ``--no-context-shift`` so it returns a clean
error once the request would exceed ``n_ctx``. In the chat adapter,
catch the error, identify it as a context-limit error via
``isContextLimitError()``, and surface a dedicated toast that names
the exact control to adjust: the ``Context Length`` field in the chat
Settings panel.

Also add a lightweight tooltip hint on ``ContextUsageBar`` when usage
crosses 85%, so users see the "raise Context Length in Settings"
suggestion before they hit the hard stop.

Tests:

  * ``test_llama_cpp_no_context_shift.py`` pins the ``--no-context-shift``
    flag in the static launch-command template, and pins it inside the
    unconditional ``cmd = [ ... ]`` block so a future refactor can't
    hide it behind a branch.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Shorten --no-context-shift comment to 1 line

* Match backend _friendly_error rewrite in isContextLimitError

Codex review on PR caught that ``backend/routes/inference.py::_friendly_error``
rewrites the raw llama-server text
  "request (X tokens) exceeds the available context size (Y tokens)"
into
  "Message too long: X tokens exceeds the Y-token context window. ..."
on the main streaming GGUF path. The heuristic only looked for
"context size" / "exceeds the available context" / "context shift",
none of which survive the rewrite, so the new "Context limit reached"
toast would never fire for the most common case. Add matches for
"message too long" and "context window" so both wordings hit.

Also addresses Gemini feedback on the launch-flag test:
  * Use ``inspect.getsource(LlamaCppBackend.load_model)`` instead of
    reading ``__file__`` directly; scopes the assertions to the
    function that actually launches llama-server.
  * Replace the hardcoded ``"            ]"`` indent search with a
    line-at-a-time scan for a line that is just ``]``, so the test
    survives reformatting.
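
For illustration, the widened `isContextLimitError` heuristic can be
written as a Python port. The real function lives in the TypeScript
chat adapter; the marker list is taken from the description above and
the sample messages use placeholder token counts.

```python
_CONTEXT_LIMIT_MARKERS = (
    "context size",
    "exceeds the available context",
    "context shift",
    "message too long",   # wording after the backend rewrite
    "context window",     # wording after the backend rewrite
)


def is_context_limit_error(message: str) -> bool:
    """Case-insensitive substring match against both error wordings."""
    text = message.lower()
    return any(marker in text for marker in _CONTEXT_LIMIT_MARKERS)


raw = "request (5361 tokens) exceeds the available context size (4096 tokens)"
friendly = "Message too long: 5361 tokens exceeds the 4096-token context window."
```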

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 10:58:20 -07:00
Daniel Han
5861a7ce15
Studio: split model-load progress label across two rows (#5020)
* Studio: split model-load progress label across two rows

The chat flow and training overlay both compose a progress label like
"112.6 of 122.3 GB • 331.0 MB/s • 30s left" and render it next to the
percent badge in a single flex row. Once the rate + ETA part shows up,
the label outgrows the row width and wraps mid-phrase, orphaning the
percent ("19 left %") onto a second ragged line.

Fix in model-load-status.tsx: split the label on the first " • " into
a primary (size) chunk that stays on row 1 with the percent, and a
secondary (rate/ETA) chunk that renders on its own muted row below.
Labels without a bullet (e.g. "22.8 GB downloaded") collapse cleanly
to one row. The inline-status variant keeps only the primary and
surfaces the full label via the tooltip.
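
The split rule is small enough to show directly. This is a Python
rendering of the idea, not the TypeScript `splitProgressLabel` itself.

```python
from typing import Optional, Tuple


def split_progress_label(label: str) -> Tuple[str, Optional[str]]:
    """Split on the first ' • ': size stays on row 1, rate/ETA goes to row 2."""
    primary, sep, secondary = label.partition(" • ")
    return primary, (secondary if sep else None)
```

A label without a bullet yields `None` for the second row, which is the
"collapse to one row" case.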

Also extracts the rate/ETA math out of useTransferStats into a pure
``transfer-stats.ts`` module (appendSample + computeTransferStats) so
it can be reasoned about and tested without React. The hook is now a
thin wrapper that feeds sample history through the pure functions.

Backend: adds two companion test files for load_progress():

  * test_llama_cpp_load_progress_matrix.py (21 tests) -- platform
    matrix (Linux /proc, macOS/Windows absence), VmRSS parsing
    variants (tab/space/missing/malformed), filesystem edges (HF-cache
    symlinks, broken symlinks, nonexistent paths, relative paths),
    shard aggregation (partial multi-shard, two series in same dir,
    mmproj-* exclusion, single-file), lifecycle races, concurrent
    sampling (10 threads x 50 iters against real /proc), fraction
    bounds.
  * test_llama_cpp_load_progress_live.py (5 tests) -- no-mock live
    integration: real subprocess allocating 100 MB to match VmRSS,
    real ready phase, real dead-pid degradation, real shard
    aggregation, repeated polling. Skipped on non-Linux.

Both complement the existing test_llama_cpp_load_progress.py.
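
A sketch of the VmRSS parsing and fraction clamping that these tests
exercise. The function names are hypothetical, and returning `None`
for a missing or malformed field is an assumption about how the real
`load_progress()` degrades.

```python
import re
from typing import Optional

# /proc/<pid>/status reports resident memory as "VmRSS:   <n> kB".
_VMRSS = re.compile(r"^VmRSS:[ \t]+(\d+)[ \t]*kB[ \t]*$", re.MULTILINE)


def parse_vmrss_bytes(status_text: str) -> Optional[int]:
    """Return resident set size in bytes, or None if the field is unusable."""
    match = _VMRSS.search(status_text)
    return int(match.group(1)) * 1024 if match else None


def load_fraction(rss_bytes: Optional[int], total_bytes: int) -> float:
    """Resident bytes over total shard bytes, clamped to [0.0, 1.0]."""
    if not rss_bytes or total_bytes <= 0:
        return 0.0
    return min(1.0, rss_bytes / total_bytes)
```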

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Hoist splitProgressLabel out of JSX IIFE (review feedback)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 10:58:16 -07:00
Eda Z
5b8dbdc3c2
Fix bitsandbytes ROCm install by using pip instead of uv (#4966)
* Fix bitsandbytes ROCm install by using pip instead of uv

* Also use pip for PyPI fallback path in _install_bnb_rocm

The original fix correctly switched the pre-release wheel install from
uv to pip, but left the PyPI fallback path on uv. If uv breaks bnb
on ROCm, the fallback would hit the same issue. Move pip bootstrap
before the branch so both paths use pip consistently.

* Harden pip bootstrap: try ensurepip first, warn on failure

- Try ensurepip --upgrade before falling back to uv pip install pip.
  ensurepip works offline and does not need PyPI, making the bootstrap
  robust when the network or index is unavailable.
- If both ensurepip and uv fail, emit a visible warning instead of
  silently swallowing the error (which previously led to a cryptic
  "No module named pip" downstream).
- Use run_maybe_quiet so --verbose users see bootstrap output.
- Update comment to document the actual root cause: uv rejects the
  wheel because filename version and metadata version disagree.

* Add --isolated to pip install calls in _install_bnb_rocm

uv pip install ignores pip.conf and PIP_* env vars, but python -m pip
reads them. Without --isolated, users with PIP_INDEX_URL pointing to a
private mirror that does not carry bitsandbytes would see the PyPI
fallback fail where it previously worked under uv. --isolated restores
parity with the old uv behavior.

* Drop --isolated from PyPI fallback in _install_bnb_rocm

--isolated suppresses PIP_INDEX_URL, PIP_EXTRA_INDEX_URL, and pip.conf.
This is correct for the pre-release path (hardcoded GitHub URL, no index
consulted), but breaks the PyPI fallback for users in corporate or
air-gapped environments whose only route to bitsandbytes is a private
mirror configured via those mechanisms. Keep --isolated on the direct-URL
pre-release install; drop it from the index-dependent fallback.

* Drop --isolated from pre-release pip install, fix warning wording

--isolated suppresses pip.conf cert/proxy/CA settings in addition to
index config. For the direct GitHub URL, index config is irrelevant but
cert/proxy settings matter in corporate SSL-inspection environments.
Without this fix, users with pip.conf-based CA bundles get a TLS error
on the pre-release download and silently fall back to the broken PyPI
version -- the exact outcome the PR is trying to prevent.

Also fix the fallback warning: "unreachable" is too specific since the
pre-release install can fail for reasons other than network reachability.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-14 10:23:40 -07:00
pre-commit-ci[bot]
a0b9d14081
[pre-commit.ci] pre-commit autoupdate (#5004)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.9 → v0.15.10](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.9...v0.15.10)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:49:18 -07:00
Daniel Han
bb14ab144a
Studio: live model-load progress + rate/ETA on download and load (#5017)
* Studio: live model-load progress + rate/ETA on download and load

Two UX fixes for the opaque multi-minute wait between clicking Load
and being able to chat, visible most clearly on large MoE GGUFs like
MiniMax-M2.7 (131 GB of weights on a 97 GB GPU):

1. **Model-load phase is now observable.** The existing chat flow
   transitions the toast to "Starting model..." as soon as the
   download hits 100%, then shows a spinner with no other feedback
   until llama-server reports healthy. For a 130 GB model that spinner
   freezes for five-plus minutes while the kernel pages shards into
   the page cache. A new `GET /api/inference/load-progress` endpoint
   samples `/proc/<pid>/status VmRSS` on the llama-server subprocess
   against the sum of shard file sizes on disk, so the UI can render
   a real bar plus rate / ETA during that window.

2. **Rate and ETA on downloads and loads.** Both the chat toast and
   the training-start overlay used to show a static pair of numbers
   (for example "15.4 of 140.8 GB"). A rolling 15-second window over
   the existing byte-series now surfaces "85.3 MB/s, 24m 23s left"
   beside that pair. The estimator is shared between the download
   and load phases so the numbers don't reset when the phase flips.
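
   The rolling-window estimator can be sketched as two pure functions.
   This is a Python approximation of the idea with hypothetical names;
   the frontend implementation is TypeScript and may differ in how it
   trims samples or smooths the rate.

```python
from collections import deque

WINDOW_SECONDS = 15.0


def append_sample(samples: deque, t: float, done_bytes: int,
                  window: float = WINDOW_SECONDS) -> deque:
    """Add (time, bytes) and drop samples older than the window."""
    samples.append((t, done_bytes))
    while len(samples) > 1 and t - samples[0][0] > window:
        samples.popleft()
    return samples


def compute_transfer_stats(samples: deque, total_bytes: int):
    """Return (bytes_per_second, seconds_left), or (None, None) if unknown."""
    if len(samples) < 2:
        return None, None
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0 or b1 <= b0:
        return None, None  # stalled or not enough history yet
    rate = (b1 - b0) / elapsed
    return rate, max(0.0, (total_bytes - b1) / rate)
```

   Because the sample history is just a byte series, the same two
   functions can serve both the download phase and the load phase.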

Also fixes a pre-existing assignment bug uncovered while wiring this
up: `load_model` was storing the caller's `gguf_path` kwarg into
`self._gguf_path`, which is `None` on the HF-download code path. The
resolved on-disk path (`model_path`) is what llama-server actually
mmaps; downstream consumers need that. No existing reader used
`_gguf_path`, so this is a correctness fix for the new endpoint.

- Backend: `LlamaCppBackend.load_progress()`, `GET /api/inference/load-progress`, `LoadProgressResponse` Pydantic model.
- Frontend: `useTransferStats` hook, `formatRate` / `formatEta` helpers, `getLoadProgress` client, rewired chat toast and `DownloadRow` in the training overlay.
- Tests: `studio/backend/tests/test_llama_cpp_load_progress.py` covers empty states, mmap phase, ready phase, sharded total aggregation, missing gguf_path, and unreadable /proc (7 cases). `tsc -b` and `vite build` on the frontend both clean.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:46:22 -07:00
Roland Tannous
514bb3a20e
studio: pin peft to 0.18.1 to fix export subprocess issues (#5015)
* studio: pin peft to 0.18.1 to fix export subprocess issues

peft 0.19.0 causes export subprocess shutdown failures in Studio.
Reverting to 0.18.1 resolves the issue.

* studio: move peft pin to extras-no-deps to prevent torch upgrade

Installing peft via overrides.txt would resolve its deps and pull in
torch>=1.13.0, breaking other pinned packages. Moving the pin to
extras-no-deps.txt ensures --no-deps is used during install.
2026-04-14 20:16:30 +04:00
Datta Nimmaturi
4328d0b4f6
Fix num_items_in_batch GA for Gemma4 (#4998)
* Fix num_items_in_batch GA for Gemma4

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:01:10 -07:00
Daniel Han
7252410ccc
studio: stream export worker output into the export dialog (#4897)
* studio: stream export worker output into the export dialog

The Export Model dialog only showed a spinner on the "Exporting..."
button while the worker subprocess was doing the actual heavy lifting.
For Merged to 16bit and GGUF / Llama.cpp exports this meant several
minutes (or more, for large models) of opaque silence, with no way to
tell whether save_pretrained_merged, convert_hf_to_gguf.py, or
llama-quantize was making progress.

This adds a live terminal-style output panel inside the export dialog,
rendered just above the Cancel / Start Export buttons and scrollable
with auto-follow-tail. It shows stdout and stderr from both the worker
process itself and any child process it spawns (GGUF converter,
llama-quantize), coloured by stream.

Backend

- core/export/worker.py: new _setup_log_capture(resp_queue) installed
  before LogConfig.setup_logging. It saves the original stdout/stderr
  fds, creates pipes, os.dup2's the write ends onto fds 1 and 2 (so
  every child process inherits the redirected fds), and spins up two
  daemon reader threads. Each thread reads bytes from a pipe, echoes
  them back to the original fd (so the server console keeps working),
  splits on \n and \r, and forwards each line to the resp queue as
  {"type":"log","stream":"stdout|stderr","line":...,"ts":...}.
  PYTHONUNBUFFERED=1 is set so nested Python converters flush
  immediately.

- core/export/orchestrator.py:
  - Thread-safe ring buffer (collections.deque, maxlen 4000) with a
    monotonically increasing seq counter. clear_logs(),
    get_logs_since(cursor), get_current_log_seq(), is_export_active().
  - _wait_response handles rtype == "log" by appending to the buffer
    and continuing the wait loop. Status messages are also surfaced as
    a "status" stream so users see high level progress alongside raw
    subprocess output.
  - load_checkpoint, _run_export, and cleanup_memory now wrap their
    bodies with the existing self._lock (previously unused), clear the
    log buffer at the start of each op, and flip _export_active in a
    try/finally so the SSE endpoint can detect idle.
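
  A compact sketch of the ring buffer with a monotonically increasing
  seq, assuming a hypothetical `LogBuffer` class; in the orchestrator
  these are methods on the export orchestrator itself.

```python
import threading
from collections import deque


class LogBuffer:
    """Bounded, thread-safe log buffer addressed by a monotonically increasing seq."""

    def __init__(self, maxlen: int = 4000):
        self._lock = threading.Lock()
        self._entries = deque(maxlen=maxlen)  # oldest lines fall off the front
        self._seq = 0

    def append(self, stream: str, line: str) -> int:
        with self._lock:
            self._seq += 1
            self._entries.append({"seq": self._seq, "stream": stream, "line": line})
            return self._seq

    def clear(self) -> None:
        with self._lock:
            self._entries.clear()  # seq is kept so client cursors stay valid

    def current_seq(self) -> int:
        with self._lock:
            return self._seq

    def since(self, cursor=None) -> list:
        """Entries with seq greater than cursor, or everything when cursor is None."""
        with self._lock:
            if cursor is None:
                return list(self._entries)
            return [e for e in self._entries if e["seq"] > cursor]
```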

- routes/export.py:
  - Wrapped every sync orchestrator call (load_checkpoint,
    cleanup_memory, export_merged_model, export_base_model,
    export_gguf, export_lora_adapter) in asyncio.to_thread so the
    FastAPI event loop stays free during long exports. Without this
    the new SSE endpoint could not be served concurrently with the
    blocking export POST.
  - New GET /api/export/logs/stream SSE endpoint. Honors
    Last-Event-ID and a since query param for reconnect, emits log /
    heartbeat / complete / error events, uses the id field to carry
    the log seq so clients can resume cleanly. On first connect
    without an explicit cursor it starts from the current seq so old
    lines from a previous run are not replayed.
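
  The event framing and cursor selection can be sketched as follows.
  Both helper names are hypothetical, and the precedence of
  `Last-Event-ID` over `since` is an assumption; the text above only
  says that both are honored.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[int] = None) -> str:
    """Serialize one Server-Sent Event; the id field carries the log seq."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # blank line terminates the event


def resolve_cursor(last_event_id: Optional[str], since: Optional[str],
                   current_seq: int) -> int:
    """Use an explicit cursor when one parses; otherwise start at the current seq."""
    for raw in (last_event_id, since):
        if raw is None:
            continue
        try:
            return int(raw)
        except ValueError:
            continue
    return current_seq  # first connect: do not replay lines from an earlier run
```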

Frontend

- features/export/api/export-api.ts: streamExportLogs() helper that
  authFetches the SSE endpoint and parses id / event / data fields
  manually (same pattern as streamTrainingProgress in train-api.ts).

- features/export/components/export-dialog.tsx:
  - Local useExportLogs(exporting) hook that opens the SSE stream on
    exporting transitions to true, accumulates up to 4000 lines in
    component state, and aborts on cleanup.
  - New scrollable output panel rendered above DialogFooter, only
    shown for Merged to 16bit and GGUF / Llama.cpp (LoRA adapter is
    a fast disk write with nothing to show). Dark terminal styling
    (bg-black/85, emerald text, rose for stderr, sky for status),
    max-height 14rem, auto-scrolls to the bottom on new output but
    stops following if the user scrolls up. A small streaming / idle
    indicator is shown next to the panel title.
  - DialogContent widens from sm:max-w-lg to sm:max-w-2xl when the
    output panel is visible so the logs have room to breathe.

Verified

- Python smoke test (tests/smoke_export_log_capture.py): spawns a
  real mp.get_context("spawn") process, installs _setup_log_capture,
  confirms that parent stdout prints, parent stderr prints, AND a
  child subprocess invoked via subprocess.run (both its stdout and
  stderr) are all captured in the resp queue. Passes.
- Orchestrator log helpers tested in isolation: _append_log,
  get_logs_since (with and without a cursor), clear_logs not
  resetting seq so reconnecting clients still progress. Passes.
- routes.export imports cleanly in the studio venv and /logs/stream
  shows up in router.routes.
- bun run build: tsc -b plus vite build, no TypeScript errors.

No existing export behavior is changed. If the subprocess, the SSE
endpoint, or the frontend hook fails, the export itself still runs to
completion the same way it did before, with or without logs visible.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* export dialog: trim bootstrap noise, scope logs per screen, show realpath

Several follow-ups to the live export log work:

1. Worker bootstrap noise (transformers venv activation, Unsloth banner,
   "Top GGUF/hub models" lists, vision detection, 2k-step weight load
   bar) is dropped from the export-dialog stream. A threading.Event
   gate in worker.py defaults closed and only opens once _handle_export
   actually starts; until then the reader thread still echoes lines to
   the saved console fd for debugging but does not push them onto the
   resp_queue. The orchestrator already spawns a fresh subprocess for
   every checkpoint load, so the gate is naturally reset between runs.

2. tqdm in non-tty mode defaults to a 10s mininterval, which makes
   multi-step bars look frozen in the panel. Set TQDM_MININTERVAL=0.5
   in the worker env so any tqdm-driven progress emits more often.

3. The dialog's useExportLogs hook now also clears its line buffer
   when exportMethod or open changes, so re-opening the dialog into a
   different action's screen no longer shows the previous action's
   saved output. A useElapsedSeconds tick + "Working Xs" badge in the
   log header gives users a visible sign that long single-step phases
   (cache copies, GGUF conversion) are still running when no new lines
   are arriving.

4. ExportBackend.export_{merged,base,gguf,lora} now return
   (success, message, output_path); the worker forwards output_path on
   each export_*_done response, the orchestrator's _run_export passes
   it to routes/export.py, which surfaces it via
   ExportOperationResponse.details.output_path. The dialog's Export
   Complete screen renders the resolved on-disk realpath under "Saved
   to" so users can find their exported model directly.
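
The bootstrap gate from item 1 can be pictured with a small sketch. It assumes a reader thread that hands each captured line to a hypothetical `forward_line` helper; the message shape on the queue is also an assumption.

```python
import queue
import threading


def forward_line(line, gate, resp_queue, console):
    """Echo every line to the console, but only stream it once the gate is open."""
    console.append(line)  # always kept for debugging
    if gate.is_set():
        resp_queue.put({"type": "log", "line": line})


# The gate starts closed; the export handler would call gate.set() when it begins.
gate = threading.Event()
resp_queue = queue.Queue()
console = []

forward_line("Unsloth banner", gate, resp_queue, console)   # dropped from the stream
gate.set()
forward_line("Merging weights", gate, resp_queue, console)  # streamed
```

Because a fresh subprocess starts with a fresh `threading.Event`, nothing has to reset the gate between runs.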

* fix(cli): unpack 3-tuple return from export backend

ExportOrchestrator.export_{merged,base,gguf,lora} now return
(success, message, output_path) so the studio dialog can show
the on-disk realpath. The CLI still unpacked 2 values, so every
`unsloth export --format ...` crashed with ValueError before
reporting completion. Update the four call sites and surface
output_path via a "Saved to:" echo.

* fix(studio): anchor export log SSE cursor at run start

The export dialog SSE defaulted its cursor to get_current_log_seq()
at connect time, so any line emitted between the POST that kicks
off the export and the client opening the stream was buffered with
seqs 1..k and then skipped (seq <= cursor). Long-running exports
looked silent during their first seconds.

Snapshot _log_seq into _run_start_seq inside clear_logs() and
expose it via get_run_start_seq(). The SSE default cursor now uses
that snapshot, so every line emitted since the current run began
is reachable regardless of when the client connects. Old runs
still can't leak in because their seqs are <= the snapshot.
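
A minimal sketch of the run-start snapshot, assuming a simple in-memory buffer. The `LogBuffer` class and its `maxlen` default are hypothetical; only the method names `clear_logs`, `get_run_start_seq`, and `get_logs_since` come from the commit messages.

```python
import threading
from collections import deque


class LogBuffer:
    def __init__(self, maxlen=4000):
        self._lock = threading.Lock()
        self._lines = deque(maxlen=maxlen)  # (seq, text) pairs
        self._log_seq = 0                   # monotonically increasing, never reset
        self._run_start_seq = 0

    def append(self, text):
        with self._lock:
            self._log_seq += 1
            self._lines.append((self._log_seq, text))
            return self._log_seq

    def clear_logs(self):
        """Drop old lines and remember where the new run starts."""
        with self._lock:
            self._lines.clear()
            self._run_start_seq = self._log_seq

    def get_run_start_seq(self):
        with self._lock:
            return self._run_start_seq

    def get_logs_since(self, cursor):
        """Return every buffered line with seq strictly greater than cursor."""
        with self._lock:
            return [(seq, text) for seq, text in self._lines if seq > cursor]
```

A client that connects late and defaults its cursor to `get_run_start_seq()` still receives the lines emitted before it connected, because their seqs are greater than the snapshot.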

* fix(studio): reconnect export log SSE on stream drop

useExportLogs launched streamExportLogs once per exporting
transition and recorded any drop in .catch(). Long GGUF exports
behind a proxy with an idle kill-timeout would silently lose the
stream for the rest of the run even though the backend already
supports Last-Event-ID resume. The "retry: 3000" directive emitted
by the backend is only meaningful to native EventSource; this
hook uses a manual fetch + ReadableStream parse so it had no
effect.

Wrap streamExportLogs in a retry loop that tracks lastSeq from
ExportLogEvent.id and passes it as since on reconnect. Backoff is
exponential with jitter, capped at 5s, reset on successful open.
The loop stops on an explicit backend `complete` event or on effect
cleanup.
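
The retry loop lives in a TypeScript hook; the same idea is sketched here in Python for illustration only. The base delay, the full-jitter formula, the `max_attempts` bound, and the `open_stream` callable are all assumptions; the source only states exponential backoff with jitter, a 5s cap, and resuming from the last seen seq.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=5.0, rng=random.random):
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt))."""
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()


def follow_logs(open_stream, max_attempts=5, sleep=time.sleep):
    """Consume (seq, event, data) tuples, reconnecting from the last seq on drop."""
    last_seq = None
    attempt = 0
    lines = []
    while attempt < max_attempts:
        try:
            for seq, event, data in open_stream(last_seq):
                attempt = 0        # stream is healthy again, reset the backoff
                last_seq = seq     # remember where to resume
                if event == "complete":
                    return lines
                lines.append(data)
        except ConnectionError:
            pass                   # fall through to the backoff below
        sleep(backoff_delay(attempt))
        attempt += 1
    return lines
```

Passing `last_seq` back into `open_stream` mirrors sending it as `since` on reconnect, so a dropped connection does not replay or skip lines.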

* fix(studio): register a second command so Typer keeps `export` as a subcommand

The CLI export unpacking tests wrap `unsloth_cli.commands.export.export`
in a fresh Typer app with a single registered command. Typer flattens a
single-command app into that command, so the test's
`runner.invoke(cli_app, ["export", ckpt, out, ...])` treats the leading
`"export"` token as an unexpected extra positional argument -- every
parametrized case failed with:

    Got unexpected extra argument (.../out)

Register a harmless `noop` second command so Typer preserves subcommand
routing and the tests actually exercise the 3-tuple unpack path they
were written to guard.

Before: 4 failed
After:  4 passed

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: studio-install <studio@local.install>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 08:55:43 -07:00
Daniel Han
eca592effe
studio: show HF model download progress in training start overlay (#4894)
* studio: show HF model download progress in training start overlay

During the training setup phase, the overlay only displayed a static
"Loading model..." line while model weights were being downloaded from
Hugging Face. On slow connections this looked like the app had frozen.

This adds a small self-contained progress block inside the existing
TrainingStartOverlay that polls the existing
GET /api/models/download-progress endpoint and renders a Progress bar
with bytes downloaded, total bytes, and percent complete.

Notes:

- Frontend only change. No backend, worker, SSE, or runtime store edits.
- Reuses the existing getDownloadProgress client wrapper and the
  existing /api/models/download-progress endpoint that already scans
  the HF blob cache for completed and .incomplete files.
- selectedModel is read directly from useTrainingConfigStore inside the
  overlay, so no prop drilling and live-training-view.tsx is unchanged.
- Polling runs at 1500 ms and is gated on the HF repo regex
  (^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$), the same regex the backend uses,
  so local paths and empty form state never hit the endpoint.
- Polling stops once progress reaches 1.0 so the bar can stay at 100
  until the overlay hides on the first training step.
- Network errors are silently swallowed, matching the chat side flow
  (the bar simply freezes at the last value).
- When downloadedBytes is 0 the block is hidden entirely, so cached
  models do not flash a progress bar.
- When the HF API cannot determine the total size, the block falls
  back to "X downloaded" with no percent and no bar.

Verified with bun run build (tsc -b plus vite build, no TypeScript
errors).

* training overlay: track dataset download + show on-disk realpath

Adds a dedicated "Downloading dataset..." section to the training-start
overlay alongside the existing model-weights one, so an HF dataset that
is downloading mid-startup is no longer mislabeled as model weights or
hidden entirely. The new GET /api/datasets/download-progress endpoint
mirrors /api/models/download-progress against the datasets-- prefix in
HF_HUB_CACHE.

Both endpoints now also return cache_path, the resolved on-disk
realpath of the snapshot directory (or the cache repo root if no
snapshot is materialized yet). The overlay surfaces this under each
download row so users can immediately see where the model and dataset
landed without digging through server logs.

The frontend's existing useModelDownloadProgress hook is generalized
to a single useHfDownloadProgress(repoId, fetcher) hook that the
model and dataset variants both delegate to, keeping polling, gating,
and completion semantics in one place.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: Polish training start overlay download progress UI (#4957)

* studio: polish training start overlay download progress visuals

* Fix formatCachePath cross-platform support and redundant sizeLabel

- Extend formatCachePath regex to also shorten macOS /Users/<user> paths to ~
- Suppress sizeLabel when no byte info is available (cachePath-only state),
  since the "Preparing" badge already conveys the status

* Fix misleading status badge when download total is unknown

- Hide badge when totalBytes is 0 but downloadedBytes > 0, since we cannot
  determine if the download is still in progress or already complete (happens
  when HF size metadata lookup fails for gated/private repos)
- Keep "Preparing" badge for the zero-bytes cachePath-only state
- Add Windows native path shortening to formatCachePath (C:\Users\<name>)
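
As a rough illustration of the path shortening, here is a Python sketch. The real formatCachePath is a TypeScript helper and its regex is not shown in the log, so the pattern below is a guess at equivalent behavior for the three home-directory layouts mentioned.

```python
import re

# Hypothetical pattern: Linux /home/<user>, macOS /Users/<user>,
# and Windows <drive>:\Users\<name> prefixes.
_HOME_RE = re.compile(r"^(?:/home/[^/]+|/Users/[^/]+|[A-Za-z]:\\Users\\[^\\]+)")


def format_cache_path(path):
    """Replace a leading home directory with ~ and leave other paths alone."""
    return _HOME_RE.sub("~", path, count=1)
```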

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

---------

Co-authored-by: studio-install <studio@local.install>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-14 08:54:01 -07:00
Daniel Han
44082cf88e
Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM (#5014)
* Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM

The chat settings sheet's ctx slider reads `max_context_length` from
`/api/inference/status` and renders

    Exceeds estimated VRAM capacity (N tokens). The model may use
    system RAM.

when the user drags the slider above that value. For models whose
weights fit on some GPU subset, `_max_context_length` was already set
to the binary-search cap and the warning fired correctly.

For models whose weights exceed 90% of every GPU subset's free memory
(e.g. MiniMax-M2.7-GGUF at 131 GB on a 97 GB GPU), the ceiling-probe
loop never matched a subset, so `max_available_ctx` stayed at the
native context (e.g. 196608). The slider ran all the way to native
with no indication that any value above the 4096 spec default would
trigger `--fit on` and degrade performance.

Anchor `max_available_ctx` at `min(4096, native_context_length)` when
no subset fits, so the warning fires at the right threshold and the
user sees the correct safe-zone / warning-zone split:

    Before (MiniMax-M2.7 on 97 GB GPU):
      slider 0 .. 196608, warning threshold = 196608  (never fires)

    After:
      slider 0 .. 196608, warning threshold = 4096    (fires correctly)

No frontend changes required: `chat-settings-sheet.tsx` already
consumes `ggufMaxContextLength` (= status.max_context_length) as the
warning threshold and `ggufNativeContextLength` as the slider max.
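
The anchoring rule reduces to a few lines. In this sketch `fitted_ctx` stands for the binary-search cap of the best GPU subset and is `None` when no subset can hold the weights; the function name and signature are hypothetical.

```python
def warning_threshold(native_ctx, fitted_ctx):
    """Context length above which the slider warning should fire."""
    if fitted_ctx is not None:
        return fitted_ctx           # weights fit: use the computed cap
    return min(4096, native_ctx)    # weights exceed VRAM: anchor at 4096


# MiniMax-M2.7 example from the message: native 196608, no subset fits.
threshold = warning_threshold(196608, None)
```

The `min` keeps the threshold honest for models whose native context is below 4096.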

Adds tests/test_llama_cpp_max_context_threshold.py covering
weights-exceed-VRAM (single and multi-GPU), a native context below the
4096 fallback (the threshold must not exceed what the model supports),
fittable-model regressions (small / multi-GPU / tiny on huge GPU), and
the `max_context_length` property's fallback semantics.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:49 -07:00
Daniel Han
b2f80f210e
Studio: make GGUF disk-space preflight cache-aware (#5012)
* Studio: make GGUF disk-space preflight cache-aware

The pre-download disk check in LlamaCppBackend.load_model compared the
repo's total GGUF size against free disk without crediting bytes
already present in the Hugging Face cache. Re-loading a large cached
model (e.g. MiniMax-M2.7-GGUF at 131 GB) then failed cold with
"Not enough disk space to download any variant" whenever free disk
was below the full weight footprint, even though nothing actually
needed to be downloaded.

Subtract bytes already on disk via try_to_load_from_cache before
comparing against free space. A partial blob (interrupted download) is
not credited, so a second attempt still allocates room to finish the
download. The log line now also surfaces how much is already cached.
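
The arithmetic behind the cache-aware check can be sketched as below. The real code consults try_to_load_from_cache; here a plain set of fully cached filenames stands in for it, and both helper names are hypothetical.

```python
def bytes_still_needed(file_sizes, fully_cached):
    """Sum the sizes of files that are not completely present in the cache."""
    # Partial (interrupted) blobs are not in fully_cached, so they earn no credit.
    return sum(size for name, size in file_sizes.items() if name not in fully_cached)


def has_room(file_sizes, fully_cached, free_bytes):
    """True when free disk covers only what still has to be downloaded."""
    return bytes_still_needed(file_sizes, fully_cached) <= free_bytes


GB = 1024 ** 3
sizes = {"part-1.gguf": 70 * GB, "part-2.gguf": 61 * GB}  # 131 GB in total
```

With both parts cached, `has_room(sizes, {"part-1.gguf", "part-2.gguf"}, 10 * GB)` passes even though free disk is far below the 131 GB footprint.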

Adds tests/test_llama_cpp_cache_aware_disk_check.py covering the
fully-cached, partial-cache-insufficient-disk, partial-cache-enough-disk,
cold-cache, incomplete-blob, and zero-size-path-info cases. Sparse
tempfiles keep the GB-scale scenarios cheap to simulate.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:37 -07:00
Daniel Han
767fa8cade
Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM (#5011)
* Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM

The load-time auto-fit in LlamaCppBackend.load_model had two issues for
models whose weights do not fit on any GPU subset (the common case for
large MoE GGUFs such as MiniMax-M2.7, Qwen3.5-397B-A17B, etc.):

1. Auto mode (max_seq_length=0) left effective_ctx at the model's native
   context when no subset passed the 90% fit check. The UI slider then
   landed on e.g. 196608 for MiniMax-M2.7, far above anything usable.
   Default the auto-pick to 4096 so the UI starts at a sane value; the
   slider ceiling stays at the native context so the user can still
   opt in to longer contexts and receive the "might be slower" warning.

2. Explicit ctx was silently shrunk when weights fit but the requested
   KV overflowed the 90% budget. The shrink loop emitted -c <capped>
   -ngl -1 without informing the caller, so a user who had opted into
   a longer context via the UI never actually got it. Drop the shrink
   loop on the explicit path and emit -c <user_ctx> --fit on instead,
   letting llama-server flex -ngl (CPU layer offload).
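
The two rules can be sketched as a pair of small functions. This is an approximation under stated assumptions: the function names are hypothetical, `max_fitting_ctx` stands for the largest context whose KV cache fits the 90% budget, and the `min(4096, native_ctx)` guard and the flags emitted when everything fits are guesses rather than confirmed behavior.

```python
def effective_ctx(requested_ctx, native_ctx, weights_fit, max_fitting_ctx):
    """Resolve the context length; requested_ctx == 0 means auto mode."""
    if requested_ctx > 0:
        return requested_ctx            # explicit: never silently shrunk
    if not weights_fit:
        return min(4096, native_ctx)    # auto, weights exceed VRAM
    return min(native_ctx, max_fitting_ctx)


def server_args(ctx, weights_fit, max_fitting_ctx):
    """Choose between full GPU offload and letting llama-server fit layers."""
    if weights_fit and ctx <= max_fitting_ctx:
        return ["-c", str(ctx), "-ngl", "-1"]
    return ["-c", str(ctx), "--fit", "on"]
```

For a user who asks for 65536 tokens when only 32768 fit, `server_args(65536, True, 32768)` keeps the requested context and switches to `--fit on` instead of shrinking it.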

Adds tests/test_llama_cpp_context_fit.py covering both paths, the
file-size-only fallback when KV metadata is missing, non-regression on
fittable auto-pick, and platform-agnostic input shape.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:25 -07:00
TF-MTGE
a31c82a640
fix(studio): remove 300s cap on load_checkpoint (inherits 3600s default) (#4922)
* fix: increase wait response timeout to 900 sec instead of 300 sec. #4845

* Apply suggestion from @gemini-code-assist[bot]

good catch

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-14 08:53:14 -07:00
Datta Nimmaturi
da78c6be71
[Studio] Install flash attn at setup time for linux (#4979)
* [Studio] Install flash attn at setup time for linux

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup changes

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Test cases

* wheel_utils: narrow url_exists exceptions and log at debug level

---------

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 16:40:17 +04:00
Datta Nimmaturi
dccc0ebada
[Studio] Show non-exported models in chat UI (#4892)
* Show non-exported models in chat UI

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Distinguish between LoRA and full fine-tune saves. Cleanup

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-14 15:03:58 +04:00
Bharath Kumar Adinarayan
a50f61009b
fix(studio): default chart view to full training history (#5007)
* fix(studio): default chart view to full training history instead of last 80 steps

Fixes #5003

* chore: windowsize as null code comment

---------

Co-authored-by: imagineer99 <samleejackson0@gmail.com>
Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>
2026-04-14 03:29:27 -07:00
Lee Jackson
bfa17330bd
Studio: Polish API key copy button and harden async clipboard fallback (#5006)
* fix: polish clipboard style and fix async clipboard path

* Use copyToClipboardAsync in CopyButton for Safari fallback

CopyButton was calling navigator.clipboard.writeText directly,
bypassing the execCommand fallback added in this same PR. Switch
to copyToClipboardAsync which tries execCommand first (Safari
user-gesture requirement) then falls back to the async clipboard API.

* Fix copyToClipboard sync contract regression and improve async path

- Restore copyToClipboard() to return only the execCommand result,
  preserving the boolean contract that 7 existing callers depend on
  to gate their "Copied!" UI state. The fire-and-forget async fallback
  was returning true before the promise resolved, causing false success.

- Add document.body null guard to copyWithExecCommand for SSR safety.

- Reorder copyToClipboardAsync to try the async Clipboard API first,
  avoiding unnecessary DOM/focus overhead in Radix focus-trapped dialogs
  where execCommand always fails anyway.

* Restore queryCommandSupported guard and fix async catch path

- Restore the queryCommandSupported("copy") guard in copyToClipboard()
  to match the original contract exactly: when execCommand is entirely
  unsupported, fall through to fire-and-forget async clipboard write.

- Fix copyToClipboardAsync catch block: after navigator.clipboard.writeText
  rejects, the user-gesture frame is gone, so execCommand will also fail.
  Return false from catch instead of falling through. The execCommand
  fallback at the bottom only runs when the Clipboard API is absent
  (still in user-gesture frame).

* Restore execCommand fallback in copyToClipboardAsync catch path

The catch block was returning false after clipboard API rejection,
based on the incorrect premise that the user-gesture frame is lost
after an await. Per the HTML spec, transient user activation IS
preserved through promise microtask chains. The real reason
execCommand fails in the Radix dialog is the focus trap intercepting
textarea.focus(), not gesture loss.

For non-dialog callers, execCommand can still succeed after a
clipboard rejection. Inside a Radix modal, execCommand returns
false harmlessly (focus trap blocks it).

* Harden textarea fallback for mobile and continue to async path on failure

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 14:22:14 +04:00
Wasim Yousef Said
97eafd999e
studio: fix api-keys access + refresh (#5005)
* studio: fix api-keys access + refresh

* studio: guard v1 in spa fallback
2026-04-13 23:48:51 +04:00
AdamPlatin123
d2fc582840
studio: skip training status/metrics polling when idle (#4988)
* fix(studio): skip training status/metrics polling when idle

Add an early return in the status and metrics setInterval callbacks when
the runtime store reports phase === "idle" and hasHydrated is true.
Previously these polls fired unconditionally every 3s/5s, generating
unnecessary network traffic and console errors when no training was
running.

* fix(studio): reduce idle polling to 30s instead of stopping entirely

Review feedback (PR #4988): completely stopping polling when idle risks
permanent UI desync if hydration fails, and misses out-of-band state
changes from other clients. Add a 30s background poll that only fires
when idle to recover gracefully.

* fix: harden idle status polling around hydration and runtime reset

---------

Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: imagineer99 <samleejackson0@gmail.com>
2026-04-13 12:02:12 -07:00
Daniel Han
9a261aec5f
Studio: Expose OpenAI- and Anthropic-compatible external API endpoints (#4956)
* Studio: add API key authentication for programmatic access

External users want to hit the Studio API (chat completions with tool
calling, training, export, etc.) without going through the browser
login flow. This adds sk-unsloth- prefixed API keys that work as a
drop-in replacement for JWTs in the Authorization: Bearer header.

Backend:
- New api_keys table in SQLite (storage.py)
- create/list/revoke/validate functions with SHA-256 hashed storage
- API key detection in _get_current_subject before the JWT path
- POST/GET/DELETE /api/auth/api-keys endpoints on the auth router

Frontend:
- /api-keys page with create form, one-time key reveal, keys table
- API Keys link in desktop and mobile navbar
- Route registered with requireAuth guard

Zero changes to any existing route handler -- every endpoint that uses
Depends(get_current_subject) automatically works with API keys.
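
A minimal sketch of prefixed keys with hashed storage, assuming an in-memory set in place of the SQLite table. The helper names and the token length are hypothetical; only the `sk-unsloth-` prefix and SHA-256 hashing come from this commit message.

```python
import hashlib
import hmac
import secrets

PREFIX = "sk-unsloth-"


def create_api_key():
    """Return (raw_key, digest). Only the digest is stored; the raw key is shown once."""
    raw = PREFIX + secrets.token_urlsafe(32)
    return raw, hashlib.sha256(raw.encode("utf-8")).hexdigest()


def validate_api_key(presented, stored_digests):
    """Check a bearer token against stored digests; non-prefixed tokens go to the JWT path."""
    if not presented.startswith(PREFIX):
        return False
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking match position through timing.
    return any(hmac.compare_digest(digest, stored) for stored in stored_digests)
```

Checking the prefix first is what lets the same Authorization: Bearer header carry either an API key or a JWT.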

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use actual origin in API key usage examples

The examples on /api-keys were hardcoded to localhost:8888 which is
wrong for remote users. Use window.location.origin so the examples
show the correct URL regardless of where the user is connecting from.

* Add `unsloth studio run` CLI command for one-liner model serving

Adds a `run` subcommand that starts Studio, loads a model, creates an
API key, and prints a ready-to-use curl command -- similar to
`ollama run` or `vllm serve`.

Usage: unsloth studio run -m unsloth/Qwen3-1.7B-GGUF --gguf-variant UD-Q4_K_XL

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add end-to-end tests for `unsloth studio run` and API key usage

Tests the 4 usage examples from the API Keys page:
1. curl basic (non-streaming) chat completions
2. curl streaming (SSE) chat completions
3. OpenAI Python SDK streaming completions
4. curl with tools (web_search + python)

Also tests --help output, invalid key rejection, and no-key rejection.
All 7 tests pass against Qwen3-1.7B-GGUF.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add /v1/completions, /v1/embeddings, /v1/responses endpoints and --parallel support

- llama_cpp.py: accept n_parallel param, pass to llama-server --parallel
- run.py: plumb llama_parallel_slots through to app.state
- inference.py: add /completions and /embeddings as transparent proxies to
  llama-server, add /responses as application-level endpoint that converts
  to ChatCompletionRequest; thread n_parallel through load_model
- studio.py: set llama_parallel_slots=4 for `unsloth studio run` path

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Make /v1/responses endpoint match OpenAI Responses API format

The existing /v1/responses shim returned Chat Completions format, which
broke OpenAI SDK clients using openai.responses.create(). This commit
replaces the endpoint with a proper implementation that:

- Returns `output` array with `output_text` content parts instead of
  `choices` with `message`
- Uses `input_tokens`/`output_tokens` instead of `prompt_tokens`/
  `completion_tokens` in usage
- Sets `object: "response"` and `id: "resp_..."`
- Emits named SSE events for streaming (response.created,
  response.output_text.delta, response.completed, etc.)
- Accepts all OpenAI Responses API fields (tools, store, metadata,
  previous_response_id) without erroring -- silently ignored
- Maps `developer` role to `system` and `input_text`/`input_image`
  content parts to the internal Chat format

Adds Pydantic schemas for request/response models and 23 unit tests
covering schema validation, input normalisation, and response format.
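
The shape change between the two formats can be illustrated with a small converter. This is a sketch, not the endpoint's code: both function names are hypothetical, and fields beyond those listed in the message (such as `status` and `total_tokens`) are assumptions.

```python
import uuid


def normalise_message(msg):
    """Map a Responses-style input message to the internal Chat format."""
    role = "system" if msg["role"] == "developer" else msg["role"]
    content = msg["content"]
    if isinstance(content, str):
        return {"role": role, "content": content}
    parts = []
    for part in content:
        if part["type"] == "input_text":
            parts.append({"type": "text", "text": part["text"]})
        elif part["type"] == "input_image":
            parts.append({"type": "image_url", "image_url": {"url": part["image_url"]}})
        else:
            parts.append(part)  # pass unknown part types through unchanged
    return {"role": role, "content": parts}


def chat_to_responses(chat):
    """Wrap a Chat Completions result in a Responses-style envelope."""
    message = chat["choices"][0]["message"]
    usage = chat.get("usage", {})
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    return {
        "id": "resp_" + uuid.uuid4().hex,
        "object": "response",
        "status": "completed",
        "model": chat.get("model"),
        "output": [{
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": message.get("content") or ""}],
        }],
        "usage": {
            "input_tokens": prompt,
            "output_tokens": completion,
            "total_tokens": prompt + completion,
        },
    }
```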

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: add Anthropic-compatible /v1/messages endpoint (#4981)

* Add Anthropic-compatible /v1/messages endpoint with tool support

Translate Anthropic Messages API format to/from internal OpenAI format
and reuse the existing server-side agentic tool loop. Supports streaming
SSE (message_start, content_block_delta, etc.) and non-streaming JSON.
Includes offline unit tests and e2e tests in test_studio_run.py.

* Add enable_tools, enabled_tools, session_id to /v1/messages endpoint

Support the same shorthand as /v1/chat/completions: enable_tools=true
with an optional enabled_tools list uses built-in server tools without
requiring full Anthropic tool definitions. session_id is passed through
for sandbox isolation. max_tokens is now optional.

* Strip leaked tool-call XML from Anthropic endpoint content

Apply _TOOL_XML_RE to content events in both streaming and
non-streaming tool paths, matching the OpenAI endpoint behavior.

* Emit custom tool_result SSE event in Anthropic stream

Adds a non-standard tool_result event between the tool_use block close
and the next text block, so clients can see server-side tool execution
results. Anthropic SDKs ignore unknown event types.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Split /v1/messages into server-side and client-side tool paths

enable_tools=true runs the existing server-side agentic loop with
built-in tools (web_search/python/terminal). A bare tools=[...] field
now triggers a client-side pass-through: client-provided tools are
forwarded to llama-server and any tool_use output is returned to the
caller with stop_reason=tool_use for client execution.

This fixes Claude Code (and any Anthropic SDK client) which sends
tools=[...] expecting client-side execution but was previously routed
through execute_tool() and failing with 'Unknown tool'.
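
The routing decision described above amounts to a three-way switch. A hedged sketch, with a hypothetical `pick_tool_path` helper; the precedence when both `enable_tools` and `tools` are set is an assumption.

```python
def pick_tool_path(enable_tools, tools):
    """Decide which /v1/messages code path handles a request."""
    if enable_tools:
        return "server_side"          # built-in tools run in the agentic loop
    if tools:
        return "client_passthrough"   # forward tools, return tool_use to the caller
    return "plain"                    # no tools at all


# A Claude Code style request: tools supplied, enable_tools not set.
path = pick_tool_path(False, [{"name": "read_file", "input_schema": {"type": "object"}}])
```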

Adds AnthropicPassthroughEmitter to convert llama-server OpenAI SSE
chunks into Anthropic SSE events, plus unit tests covering text
blocks, tool_use blocks, mixed, stop reasons, and usage.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix httpcore GeneratorExit in /v1/messages passthrough stream

Explicitly aclose aiter_lines() before the surrounding async with
blocks unwind, mirroring the prior fix in external_provider.py
(a41160d3) and cc757b78's RuntimeError suppression.

* Wire stop_sequences through /v1/messages; warn on tool_choice

Plumb payload.stop_sequences to all three code paths (server-side
tool loop, no-tool plain, client-side passthrough) so Anthropic SDK
clients setting stop_sequences get the behavior they expect. The
llama_cpp backend already accepted `stop` on both generate_chat_
completion and generate_chat_completion_with_tools; the Anthropic
handler simply wasn't passing it.

tool_choice remains declared on the request model for Anthropic SDK
compatibility (the SDK often sets it by default) but is not yet
honored. Log a structured warning on each request carrying a non-
null tool_choice so the silent drop is visible to operators.

* Wire min_p / repetition_penalty / presence_penalty through /v1/messages

Align the Anthropic endpoint's sampling surface with /v1/chat/completions.
Adds the three fields as x-unsloth extensions on AnthropicMessagesRequest
and threads them through all three code paths: server-side tool loop,
no-tool plain, and client-side passthrough.

The passthrough builder emits "repeat_penalty" (not "repetition_penalty")
because that is llama-server's field name; the backend methods already
apply the same rename internally.
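
The field rename in the passthrough builder might look like the sketch below. `build_passthrough_body` is a hypothetical name and the surrounding body fields are assumptions; the only detail taken from the message is that `repetition_penalty` is sent to llama-server as `repeat_penalty`.

```python
def build_passthrough_body(messages, min_p=None, repetition_penalty=None,
                           presence_penalty=None, stop_sequences=None):
    """Build the JSON body forwarded to llama-server, omitting unset options."""
    body = {"messages": messages, "stream": True}
    if min_p is not None:
        body["min_p"] = min_p
    if repetition_penalty is not None:
        body["repeat_penalty"] = repetition_penalty  # llama-server's field name
    if presence_penalty is not None:
        body["presence_penalty"] = presence_penalty
    if stop_sequences:
        body["stop"] = list(stop_sequences)          # Anthropic stop_sequences
    return body
```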

* Fix block ordering and prev_text reset in non-streaming tool path

_anthropic_tool_non_streaming was building the response by appending
all tool_use blocks first, then a single concatenated text block at
the end — losing generation order and merging pre-tool and post-tool
text into one block. It also never reset prev_text between synthesis
turns, so the first N characters of each post-tool turn were dropped
(where N = length of the prior turn's final cumulative text).

Rewrite to build content_blocks incrementally in generation order,
matching the streaming emitter's behavior: deltas within a turn are
merged into the trailing text block, tool_use blocks interrupt the
text sequence, and prev_text is reset on tool_end so turn N+1 diffs
against an empty baseline.
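
The rewritten block-building logic can be approximated as follows. The event tuples are an assumed shape (cumulative text per turn, plus tool start and end markers), and `build_content_blocks` is a hypothetical name.

```python
def build_content_blocks(events):
    """Turn a stream of (kind, payload) events into ordered Anthropic content blocks."""
    blocks = []
    prev_text = ""
    for kind, payload in events:
        if kind == "content":
            # payload is the cumulative text of the current turn; diff against prev_text.
            delta = payload[len(prev_text):]
            prev_text = payload
            if not delta:
                continue
            if blocks and blocks[-1]["type"] == "text":
                blocks[-1]["text"] += delta          # merge into the trailing text block
            else:
                blocks.append({"type": "text", "text": delta})
        elif kind == "tool_start":
            # A tool_use block interrupts the text sequence.
            blocks.append({"type": "tool_use", "id": payload["id"],
                           "name": payload["name"], "input": payload["input"]})
        elif kind == "tool_end":
            prev_text = ""                           # next turn diffs from empty
    return blocks
```

Without the reset on `tool_end`, the second turn's text would be sliced at the previous turn's length and lose its first characters.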

Caught by gemini-code-assist[bot] review on #4981.

* Make test_studio_run.py e2e tests pytest-compatible

Add a hybrid session-scoped studio_server fixture in conftest.py that
feeds base_url / api_key into the existing e2e test functions. Three
invocation modes are now supported:

1. Script mode (unchanged) — python tests/test_studio_run.py
2. Pytest + external server — point at a running instance via
   UNSLOTH_E2E_BASE_URL / UNSLOTH_E2E_API_KEY env vars, no per-run
   GGUF load cost
3. Pytest + fixture-managed server — pytest drives _start_server /
   _kill_server itself via --unsloth-model / --unsloth-gguf-variant,
   CI-friendly

The existing _start_server / _kill_server helpers and main() stay
untouched so the script entry point keeps working exactly as before.
Test function signatures are unchanged — the (base_url, api_key)
parameters now resolve via the new fixtures when running under
pytest.

* Rename test_studio_run.py -> test_studio_api.py

The file is entirely about HTTP API endpoint testing (OpenAI-compatible
/v1/chat/completions, Anthropic-compatible /v1/messages, API key auth,
plus a CLI --help sanity check on the command that runs the API). None
of its tests cover training, export, chat-UI, or internal-Python-API
concerns.

The old name misleadingly suggested "tests for the unsloth studio run
CLI subcommand" — the new name reflects the actual scope.

Updates:
- git mv the file (rename tracked, history preserved)
- Rewrite opening docstring to state the API surface focus and call
  out what is explicitly out of scope
- Update all 4 Usage-block path references to the new filename
- LOG_FILE renamed to test_studio_api.log
- conftest.py fixture import rewritten from test_studio_run to
  test_studio_api, plus 7 docstring/comment references updated

No functional changes to test logic, signatures, or main().

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix httpcore asyncgen cleanup in /v1/messages and /v1/completions

The earlier fix in 985e92a9 was incomplete: it closed aiter_lines()
explicitly but still used `async with httpx.AsyncClient()` /
`async with client.stream()` inside the generator. When the generator
is orphaned (e.g. client disconnects mid-stream and Starlette drops
the StreamingResponse iterator without explicitly calling aclose()),
Python's asyncgen finalizer runs the cleanup in a DIFFERENT task than
the one that originally entered the httpx context managers. The
`async with` exits then trigger httpcore's HTTP11ConnectionByteStream
.aclose(), which enters anyio.CancelScope.__exit__ with a mismatched
task and raises RuntimeError("Attempted to exit cancel scope in a
different task"). That error escapes any user-owned try/except
because it happens during GC finalization.

Replace `async with` with manual client/response lifecycle in both
/v1/messages passthrough and /v1/completions proxy. Close the
response and client in a finally block wrapped in
`try: ... except Exception: pass`. This suppresses RuntimeError (and
other Exception subclasses) from the anyio cleanup noise while
letting GeneratorExit (a BaseException, not Exception) propagate
cleanly so the generator terminates as Python expects.

Traceback observed in user report:
  File ".../httpcore/_async/connection_pool.py", line 404, in __aiter__
      yield part
  RuntimeError: async generator ignored GeneratorExit
...
  File ".../anyio/_backends/_asyncio.py", line 455, in __exit__
      raise RuntimeError(
  RuntimeError: Attempted to exit cancel scope in a different task
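
The manual lifecycle described above can be sketched with the standard
library alone. FakeClient and FakeResponse are hypothetical stand-ins
(not httpx classes); the fake response raises on close to imitate the
anyio error. This is a minimal illustration of the pattern, not the
code from the PR.

```python
import asyncio


class FakeResponse:
    """Hypothetical stand-in for a streaming HTTP response (not httpx)."""

    def __init__(self):
        self.closed = False

    async def aiter_lines(self):
        for i in range(3):
            yield f"data: chunk-{i}"

    async def aclose(self):
        self.closed = True
        # Imitate the cleanup noise raised when closing from another task.
        raise RuntimeError("Attempted to exit cancel scope in a different task")


class FakeClient:
    """Hypothetical stand-in for an HTTP client (not httpx)."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True


async def stream_lines(client, response):
    """Relay lines with manual cleanup instead of nested `async with`."""
    lines = response.aiter_lines()
    try:
        async for line in lines:
            yield line
    finally:
        # Close each resource separately so one failure does not skip the rest.
        for closer in (lines.aclose, response.aclose, client.aclose):
            try:
                await closer()
            except Exception:
                # RuntimeError is swallowed here. GeneratorExit derives from
                # BaseException, so this clause never catches it.
                pass


async def demo():
    client, response = FakeClient(), FakeResponse()
    gen = stream_lines(client, response)
    first = await gen.__anext__()  # the consumer reads one line...
    await gen.aclose()             # ...then disconnects mid-stream
    return first, response.closed, client.closed


result = asyncio.run(demo())
print(result)
```

The finally block only awaits and never yields, so closing the
generator does not trigger "async generator ignored GeneratorExit".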

* Expand unsloth studio run banner with SDK base URL and more curl examples

Add an explicit "OpenAI / Anthropic SDK base URL" line inside the info
box so SDK users don't accidentally copy the bare server URL (without
/v1) into their OpenAI/Anthropic SDK constructors and hit 404s.

Replace the single /v1/chat/completions curl example with three
labeled blocks: chat/completions, Anthropic /messages, and OpenAI
Responses. The Anthropic example includes max_tokens (Anthropic SDKs
require it even though Studio accepts None).

All examples are derived from a computed sdk_base_url so the /v1 prefix
stays in sync if the public path ever changes.

* Hash API keys with HMAC-SHA256 + persistent server secret

Stores the HMAC secret in a new app_secrets singleton table. Fixes
CodeQL py/weak-sensitive-data-hashing alert on storage.py:74-76,
394-395. Refresh tokens stay on plain SHA-256 (unchanged _hash_token)
so existing user sessions survive upgrade — API keys are new on this
branch so there is no migration.

* Use PBKDF2 for API key hashing per CodeQL recommendation

HMAC-SHA256 was still flagged by py/weak-sensitive-data-hashing.
Switch to hashlib.pbkdf2_hmac, which is in CodeQL's recommended
allowlist (Argon2/scrypt/bcrypt/PBKDF2). Persistent server-side
salt stays in app_secrets for defense-in-depth. 100k iterations to
match auth/hashing.py's password hasher.
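
A minimal sketch of PBKDF2-based key hashing with a persistent salt,
using only hashlib and hmac. The function names, the "sha256" digest
choice, the salt length, and the example key format are assumptions for
illustration; only hashlib.pbkdf2_hmac and the 100k iteration count come
from the commit message.

```python
import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 100_000  # iteration count stated in the commit message


def hash_api_key(api_key: str, server_salt: bytes) -> str:
    """Derive a hex digest for an API key (digest name is an assumption)."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", api_key.encode("utf-8"), server_salt, PBKDF2_ITERATIONS
    )
    return digest.hex()


def verify_api_key(candidate: str, stored_hash: str, server_salt: bytes) -> bool:
    """Compare in constant time to avoid leaking match position via timing."""
    return hmac.compare_digest(hash_api_key(candidate, server_salt), stored_hash)


# In the real application the salt would be loaded from persistent storage;
# here it is generated on the spot purely for the demonstration.
server_salt = secrets.token_bytes(32)
api_key = "sk-example-" + secrets.token_urlsafe(24)  # hypothetical key format
stored = hash_api_key(api_key, server_salt)

print(verify_api_key(api_key, stored, server_salt))
print(verify_api_key("sk-example-wrong", stored, server_salt))
```

Because the salt is shared by all keys, a lookup can hash the presented
key once and query by the resulting digest.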

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-13 21:08:11 +04:00
Roland Tannous
3bb72a557f
Pin kernels==0.12.1 to avoid huggingface_hub dataclass conflict (#5000) 2026-04-13 20:42:02 +04:00
Lee Jackson
21a7895959
Studio: Prompt manager, message deletion, and chat UI improvements (#4938)
* feat(chat): code block styling, delete with Dexie sync, settings sheet polish

* style: config save/delete padding fix

* fix(studio): centralize dark code-block surface and optimize message sync writes

* style: config padding/alignment polish

* fix(studio): upsert custom presets without implicit rename-delete

* fix settings sheet save state polish

* fix settings sheet button widths

* fix chat settings presets

* fix chat delete sync

* fix chat trust remote code flow

---------

Co-authored-by: shine1i <wasimysdev@gmail.com>
2026-04-13 16:42:33 +02:00
AdamPlatin123
3b092bcd46
fix(studio): prevent route transition DOM duplication via AnimatePresence (#4987)
Add mode="wait" and exit={{ opacity: 0 }} to the root AnimatePresence
wrapper so outgoing routes fully unmount before incoming routes render.
Without this, rapid navigation between Studio/Export/Recipes/Chat caused
pages to stack (2x–3x duplication).

Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com>
Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>
2026-04-13 01:38:00 -07:00
Manan Shah
80c12ff1a6
Move gemma4 script (#4994)
* updating gemma4 script

* moving gemma4 script to scripts folder
2026-04-12 23:41:15 -07:00
Manan Shah
db3b3a4d9b
updating gemma4 script (#4992)
* updating gemma4 script

* show errors
2026-04-12 23:11:32 -07:00
Daniel Han
93a24f6698
Add ROCm test suite for PR #4720 (#4824)
95 Python tests and 23 shell tests covering ROCm detection,
torch index URL selection, hardware flags, prebuilt asset selection,
and install pathway logic. All tests use mocks -- no AMD hardware required.

Companion to #4720 (AMD ROCm/HIP support).
2026-04-11 04:44:13 -07:00
Daniel Han
53af4a1b3e
Fix Gemma-4 GRPO catastrophic KL divergence with TRL 1.0.0+ (#4934)
* Fix Gemma-4 GRPO catastrophic KL divergence with TRL 1.0.0+

Two compounding bugs caused Gemma-4 GRPO training to diverge with KL ~10^12
at step 1 against TRL 1.0.0+. Both fixes are runtime patches in the existing
TRL/model patch flow and are no-ops for models and TRL versions that are not
affected.

Fix 1 (rl.py): replace trl.models.utils.disable_gradient_checkpointing with
a no-op context manager. TRL 1.0.0+ wraps generation in
`with torch.no_grad(), disable_gradient_checkpointing(self.model, ...):`
purely to suppress a cosmetic PyTorch warning ("None of the inputs have
requires_grad=True"). Inside torch.no_grad() the gradient checkpointing
state has no functional effect on the forward pass. On context exit, TRL
calls model.gradient_checkpointing_enable() which dispatches to HF's
generic implementation and overwrites Unsloth's custom
`use_gradient_checkpointing="unsloth"` wrapper, corrupting Gemma-4 forward
numerics. Replacing the toggle with a no-op preserves Unsloth's custom GC
wrapper across generation passes. The patch walks sys.modules dynamically
to also rebind the symbol on every trl.* module that already imported it
(grpo_trainer, dpo_trainer, rloo_trainer, dppo_trainer, gfpo_trainer,
grpo_with_replay_buffer_trainer, and any future trainer module).
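
The sys.modules walk in Fix 1 can be sketched as follows. The modules
registered under the "fake_trl" prefix are hypothetical stand-ins so the
example runs without TRL installed, and rebind_everywhere is an
illustrative name, not the function in rl.py.

```python
import sys
import types
from contextlib import contextmanager


@contextmanager
def noop_disable_gradient_checkpointing(model, *args, **kwargs):
    """Replacement that leaves the model's checkpointing state untouched."""
    yield


def rebind_everywhere(symbol, replacement, package_prefix):
    """Rebind `symbol` on every loaded module under `package_prefix`."""
    patched = []
    # Copy the items first: sys.modules can change while we iterate.
    for name, module in list(sys.modules.items()):
        if module is None:
            continue
        if name != package_prefix and not name.startswith(package_prefix + "."):
            continue
        try:
            if hasattr(module, symbol):
                setattr(module, symbol, replacement)
                patched.append(name)
        except (AttributeError, TypeError):
            continue
    return sorted(patched)


# Hypothetical stand-ins for the defining module and one importing module.
@contextmanager
def original_toggle(model, *args, **kwargs):
    yield


utils = types.ModuleType("fake_trl.models.utils")
utils.disable_gradient_checkpointing = original_toggle
trainer = types.ModuleType("fake_trl.trainer.grpo_trainer")
trainer.disable_gradient_checkpointing = original_toggle  # `from ... import`
sys.modules[utils.__name__] = utils
sys.modules[trainer.__name__] = trainer

patched = rebind_everywhere(
    "disable_gradient_checkpointing",
    noop_disable_gradient_checkpointing,
    "fake_trl",
)
print(patched)
```

Patching only the defining module would miss the trainer, because a
`from ... import` copies the reference into the importing module's
namespace at import time.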

Fix 2 (vision.py): inject `final_logit_softcapping` from `config.text_config`
into the top-level `model.config` for multimodal models. Unsloth's GRPO
trainer reads `getattr(model.config, "final_logit_softcapping", 0)` but
for Gemma-4 the attribute lives only on the nested `Gemma4TextConfig`,
so the lookup silently defaults to 0 instead of 30.

Backwards compatibility:
- trl 0.22.2: no `disable_gradient_checkpointing` symbol exists, the patch
  early-returns via `hasattr` guard.
- trl 0.27.1: same broken pattern as 1.0.0, the noop replacement is correct.
- trl 1.0.0+: end-to-end verified on `unsloth/gemma-4-E2B-it` GRPO with TRL
  1.0.0 and transformers 5.5.0. Step 1 loss=2.46e-08, kl=2.92e-05 (machine
  zero) vs broken baseline loss=1.37e+06, kl=1.76e+09.
- Llama / non-VLM text models: Fix 2 is a no-op (no `text_config`); Fix 1
  is functionally identical (Unsloth's GC wrapper is preserved).
- Qwen3-VL and other VLMs without final_logit_softcapping: Fix 2 is a no-op
  (text_config.final_logit_softcapping is None).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply loop 1 review fixes for PR #4934

- Move Fix 2 from vision.py to rl_replacements.py:858 and :1110 at the
  actual consumer sites. This avoids mutating model.config (which could
  leak into save_pretrained output) and covers text-only Gemma-4 paths
  that do not flow through FastBaseModel.from_pretrained.
- Revert the vision.py injection block entirely.
- Narrow the broad except blocks in patch_trl_disable_gradient_checkpointing
  from `except Exception:` to `(AttributeError, ImportError)` and
  `(AttributeError, TypeError)` to avoid masking unrelated bugs.
- Add logger.warning_once when the noop patch is installed, matching
  patch_trl_openenv and patch_trl_vllm_generation convention.
- Remove the dead per-module `_unsloth_noop_patched` sentinel check inside
  the sys.modules walk. The function-level early return already covers
  this case.
- Move `import sys` and `from contextlib import contextmanager` to the
  module-level imports instead of inside the function body.
- Rewrite the ordering comment in PatchFastRL to accurately describe
  why patch_trl_disable_gradient_checkpointing must run before
  patch_trl_rl_trainers.
- Fix keyword default spacing to match surrounding rl.py style.

End-to-end verified: Gemma-4-E2B GRPO on TRL 1.0.0 + transformers 5.5.0
step 1 loss=2.464e-08 kl=2.921e-05, all 5 steps succeed.

* Apply loop 2 review fix for PR #4934

Extract the final_logit_softcapping fallback logic into a shared helper
`_unsloth_get_final_logit_softcapping(config)` defined in rl_replacements.py
and injected into the compiled cache via RL_PRE_ITEMS["grpo_trainer"]. Both
call sites (`grpo_trainer__generate_and_score_completions` and
`grpo_trainer_compute_loss`) now use the helper instead of inlining the
same text_config fallback block twice.

Verified: compiled cache file lists the helper at module scope and both
consumer sites call it. Gemma-4-E2B GRPO step 1 loss=2.464e-08 kl=2.921e-05
(unchanged), all 5 steps pass.

* Apply loop 3 review fix for PR #4934

Extend _unsloth_get_final_logit_softcapping to also fall back to
config.get_text_config() for composite configs such as T5GemmaConfig
where the text sub-config is not exposed via the text_config attribute
but only via the get_text_config() method. Guard against (TypeError,
ValueError) raised by ambiguous composite configs, and skip the
self-referential case where get_text_config() returns self.
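
The fallback order described across the review loops can be sketched as
below. This is an illustrative reimplementation under the assumptions
stated in the comments, not the helper from rl_replacements.py; the
config classes are hypothetical stand-ins for real model configs.

```python
from types import SimpleNamespace


def get_final_logit_softcapping(config):
    """Return the softcap from the config or its text sub-config, else 0."""
    value = getattr(config, "final_logit_softcapping", None)
    if value is not None:
        return value

    # First fallback: a nested text_config attribute.
    text_config = getattr(config, "text_config", None)

    # Second fallback: composite configs exposing only get_text_config().
    if text_config is None:
        getter = getattr(config, "get_text_config", None)
        if callable(getter):
            try:
                text_config = getter()
            except (TypeError, ValueError):
                text_config = None  # ambiguous composite config

    # Skip a missing sub-config and the self-referential case.
    if text_config is None or text_config is config:
        return 0
    value = getattr(text_config, "final_logit_softcapping", None)
    return 0 if value is None else value


class CompositeConfig:
    """Hypothetical config exposing the text part only through a method."""

    def get_text_config(self):
        return SimpleNamespace(final_logit_softcapping=30.0)


class AmbiguousConfig:
    def get_text_config(self):
        raise ValueError("multiple text sub-configs")


class SelfReferentialConfig:
    def get_text_config(self):
        return self


nested = SimpleNamespace(text_config=SimpleNamespace(final_logit_softcapping=30.0))
print(get_final_logit_softcapping(nested))
print(get_final_logit_softcapping(AmbiguousConfig()))
```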

This addresses the 6/7 reviewer consensus from the third review loop.

Verified:
- Helper returns 30.0 for Gemma-4, T5Gemma, and Gemma 1/2 configs.
- Helper returns 0 for Llama, Qwen, Mistral, Cohere, Granite, and
  ambiguous configs raising ValueError.
- Gemma-4-E2B GRPO step 1 loss=2.464e-08 kl=2.921e-05 (unchanged).
- Llama-3.2-1B GRPO all 5 steps loss=0 kl=0 (no regression).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-10 07:58:15 -07:00
Daniel Han
65b4028560
Pin bitsandbytes to continuous-release_main on ROCm (4-bit decode fix) (#4954)
* Pin bitsandbytes to continuous-release_main on ROCm for 4-bit decode fix

bitsandbytes 0.49.2 on PyPI ships with a broken 4-bit GEMV kernel on
every ROCm target:

  - CDNA (gfx90a / gfx942 / gfx950 = MI210 / MI300X / MI350) via a
    broken blocksize=32/64 warp64 GEMV kernel whose tests were
    explicitly skipped with ROCM_WARP_SIZE_64 guards because the
    code was known broken.
  - RDNA3 / RDNA3.5 (gfx1100-1103 / gfx1150-1152) via a compile-time
    BNB_WARP_SIZE macro in the host-side dispatch that resolves to
    64 when the multi-arch wheel is compiled with CDNA as the
    primary target, so num_blocks is wrong on RDNA and half the GEMV
    output is never written.

At decode shape (1, 1, hidden) both bugs produce NaN. Training is
unaffected because training shapes are (batch, seq_len > 1, hidden)
and never touch the GEMV path. The crash during autoregressive
inference surfaces as _assert_async_cuda_kernel in torch.multinomial
which on HIP becomes a hard HSA_STATUS_ERROR_EXCEPTION instead of
a clean Python error.

Both bugs are fixed by bitsandbytes commit 713a3b8 ("[ROCm] Enable
blocksize 32 4-bit quantization and GEMV kernels on AMD CDNA",
PR #1887, merged 2026-03-09) which replaces BNB_WARP_SIZE with a
runtime hipDeviceGetAttribute query and ships a working CDNA warp64
kernel. That commit has not shipped to PyPI yet, but
continuous-release_main wheels are published on every push to bnb
main via GitHub Releases.

Point the ROCm install path at the continuous-release_main x86_64 and
aarch64 wheels and fall back to PyPI >=0.49.1 when the pre-release is
unreachable (offline installs, firewalled hosts, or architectures not
covered by the pre-release wheels). Drop the pin once bnb cuts a
0.50+ tag on PyPI.

Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1): direct
bnb GEMV shape test now returns 0.0078 max abs error at seq_len=1
(no NaN) vs NaN on 0.49.2, and full Unsloth + for_inference + 4-bit
sampling generation works end-to-end.

NVIDIA / CPU / Mac / Windows paths are unaffected -- the helper is
gated on the ROCm torch index and platform.machine() respectively.

* Drop Studio ROCm 16-bit fallback now that bnb 0.50+ fixes 4-bit decode

The 16-bit fallback in studio/backend/core/inference/inference.py was
added as a workaround for a bug that this PR already fixes at the
install layer: bitsandbytes <= 0.49.2 has a broken 4-bit GEMV kernel
on every ROCm target, which NaNs at decode shape (seq_len=1) and
crashes autoregressive inference. bnb PR #1887 (commit 713a3b8, in
0.50.0.dev0+, pinned by install.sh / install_python_stack.py in this
PR) restores correct 4-bit decode on MI300X and has been verified
working end-to-end with full Unsloth + for_inference + sampling.

Revert the dual code path so ROCm and NVIDIA both go through the
normal FastLanguageModel.from_pretrained + for_inference flow:

  - Remove the conditional `from unsloth import` that skipped the
    import on ROCm. The monkey-patches it was trying to avoid were
    never the cause of the crash; bnb 4-bit GEMV was.
  - Remove the `if _hw_module.IS_ROCM:` branch in load_model that
    loaded with plain transformers + PEFT + bfloat16, and the
    `_resolve_fp16_base` helper it relied on.
  - Remove the `get_chat_template is not None` fallback in
    _load_chat_template_info -- get_chat_template is now always
    imported.
  - Refactor the audio/vision ROCm guard to check _hw_module.IS_ROCM
    directly instead of the removed _IS_ROCM_ENV global. Audio and
    vision on ROCm still need separate validation (FastVisionModel
    and the CSM audio codecs were never tested on HIP) so the guard
    stays for now.

Add _bnb_rocm_4bit_ok() as a runtime safety net for users who
install from this PR before the install.sh bnb pin kicks in, or
whose installer fell back to the PyPI pin because the continuous-
release wheel was unreachable. When the installed bnb is < 0.50 on
ROCm, force load_in_4bit=False and strip any -unsloth-bnb-4bit /
-bnb-4bit suffix from the model path so a pre-quantized repo
resolves to its FP16 sibling instead of pulling bnb back in via
the repo's quantization_config. LoRA adapters whose base is a
pre-quantized repo on old bnb will still fail inside Unsloth's
loader -- the only real fix there is `unsloth studio update`.

Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1):

  - HAPPY path (bnb 0.50.0.dev0, load_in_4bit=True, pre-quantized
    repo): loads in 4-bit via the fixed GEMV, generation returns
    "Paris." for greedy and sampling.
  - SAFETY-NET path (simulated old bnb, suffix-stripped to the
    FP16 sibling, load_in_4bit=False): loads in bf16, generation
    returns "Paris." for greedy and sampling.

Net diff is ~45 lines smaller than the pre-revert state because
the entire plain-transformers 16-bit branch is gone.

* Cache _bnb_rocm_4bit_ok() with functools.cache

load_model() can be called many times in a single session but the bnb
version and hardware state cannot change at runtime, so memoise the
check. First call is ~1.9 ms (dominated by the lazy `import bitsandbytes`
inside the try block), subsequent calls drop to sub-microsecond dict
lookups. Zero behavioral change.
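
A small sketch of the memoisation, assuming a hypothetical version
probe in place of the lazy bitsandbytes import. The function names and
the counter are illustrative only.

```python
import functools

probe_calls = 0


def installed_bnb_version():
    """Hypothetical probe; real code would import bitsandbytes lazily."""
    global probe_calls
    probe_calls += 1
    return (0, 50, 0)


@functools.cache
def bnb_rocm_4bit_ok():
    """Evaluate the version check once and reuse the result afterwards."""
    return installed_bnb_version() >= (0, 50)


results = [bnb_rocm_4bit_ok() for _ in range(1000)]
print(all(results), probe_calls)
```

functools.cache requires Python 3.9 or newer; functools.lru_cache with
maxsize=None is the equivalent on older interpreters.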

* Shorten verbose bnb/ROCm comments

Comment-only cleanup across install.sh, studio/install_python_stack.py,
and studio/backend/core/inference/inference.py. No behavioral change.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove _bnb_rocm_4bit_ok safety net from inference.py

Studio's ROCm support is brand new (PR #4720, merged today) and every
fresh install pulls the bnb continuous-release_main wheel via
install.sh / install_python_stack.py in this same PR. There are no
existing ROCm Studio installs carrying bnb < 0.50, so the defensive
version-check fallback is guarding against a scenario that cannot
actually occur. Delete the helper, the functools import, and the
safety-net block -- inference.py now calls FastLanguageModel.from_pretrained
directly with no ROCm branching.

* Drop audio/vision ROCm guard in inference.py — verified unblocked by bnb fix

Vision inference was blocked by the same bnb 4-bit GEMV bug that affected
text inference (vision models use bnb 4-bit for the LM backbone). With
bnb 0.50+ pinned in install.sh / install_python_stack.py, vision works
end-to-end on MI300X: Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit
loaded in 4-bit via FastVisionModel + for_inference returns a correct
answer to a multimodal prompt.

Audio (CSM) was never actually blocked by HIP — on this hardware CSM
loads and runs its backbone forward pass fine with bnb 0.50, then fails
during generate() with a transformers-level kwarg validation mismatch
in generation_csm.py (`backbone_last_hidden_state` rejected). That's a
pre-existing transformers/CSM integration bug that reproduces identically
on NVIDIA, so the ROCm-gated guard was never actually protecting users
from anything HIP-specific.

Remove the combined audio/vision guard and the now-unused _hw_module
import. Also restore the one-word "Can be" in an inline comment that
drifted during the earlier comment-shortening pass, so the inference.py
delta vs pre-#4720 is exactly the max_seq_length<=0 crash fix and
nothing else.

* Shorten max_seq_length=0 guard comment to one line

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-10 06:25:39 -07:00
Daniel Han
cad8c6ad05
Add AMD ROCm/HIP support across installer and hardware detection (#4720)
* Add ROCm detection to install.sh and expand shell tests

Add AMD ROCm GPU detection to get_torch_index_url() in install.sh.
When nvidia-smi is not found, probe for ROCm via amd-smi, /opt/rocm
version file, hipconfig, dpkg-query, and rpm.

Includes validation guard for malformed _rocm_tag, Debian epoch prefix
stripping, ROCm 7.2+ cap to rocm7.1 index, bitsandbytes AMD install,
and status messaging. Shell tests expanded to 23 cases.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add ROCm torch reinstall support to install_python_stack.py

Add _detect_rocm_version() and _ensure_rocm_torch() to detect when a
Linux host has ROCm but the venv received CPU-only torch, and reinstall
with the correct ROCm wheels. Covers ROCm 6.0 through 7.1 with a
30-second timeout on the torch GPU probe subprocess.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add ROCm support to llama.cpp prebuilt installer

Add has_rocm field to HostInfo, extend detect_host() to probe for ROCm
via hipcc/amd-smi/rocm-smi/ROCM_PATH, and route ROCm hosts to upstream
prebuilts (Linux ROCm 7.2 prebuilt with source fallback, Windows HIP
prebuilt with CPU fallback). Add linux-rocm and windows-hip install
kinds to runtime_patterns_for_choice().

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add IS_ROCM hardware flag and fix AMD error message

Add IS_ROCM flag to hardware.py detect_hardware() (set when
torch.version.hip is present, DeviceType stays CUDA). Export IS_ROCM
from __init__.py. Add "rocm" key to get_package_versions().

Replace "We do not support AMD" error in tokenizer_utils.py with a
helpful message pointing to ROCm installation docs.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add comprehensive ROCm support test suite (68 tests)

Add tests/studio/install/test_rocm_support.py covering all ROCm code
paths across install_llama_prebuilt.py, install_python_stack.py,
hardware.py, tokenizer_utils.py, and install.sh. All tests use mocks
and run without AMD hardware.

Covers: asset selection (11), runtime patterns (5), HostInfo (4),
ROCm version detection (9), torch reinstall (9), index mapping (8),
hardware flag (8), tokenizer message (2), install.sh structure (10),
and live regression (1).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden ROCm support: probe error handling, version cap, validation

Address review findings from 8 independent reviewers:

- Wrap _ensure_rocm_torch() torch probe in try/except for
  TimeoutExpired and OSError so a hung or broken torch import does not
  crash the installer (8/8 reviewers flagged this)
- Add torch>=2.4,<2.11.0 version cap to the ROCm reinstall path to
  prevent installing unsupported torch 2.11.0 from the rocm7.1 index
- Use with-statement for file reads in _detect_rocm_version() to avoid
  resource leaks
- Handle ROCM_PATH="" correctly (use `or "/opt/rocm"` instead of
  default parameter to avoid relative path resolution)
- Strengthen shell validation guard from rocm[0-9] to rocm[1-9] to
  reject rocm0.x tags that would produce nonexistent PyTorch index URLs
- Switch shell version cap from blocklist to allowlist (rocm6.*|rocm7.0*
  |rocm7.1* pass through, everything else caps to rocm7.1) so future
  ROCm 10+ does not fall through to a nonexistent index
- Add sorted() to _ROCM_TORCH_INDEX lookup for defensive ordering
- Fix test_probe_timeout_handled: replace zero-assertion test with
  proper assertions verifying reinstall proceeds after timeout
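
The validation guard and allowlist cap from this commit, translated
into Python for illustration. The real logic lives in shell inside
install.sh; the function name and the assumption that a tag has the
form rocm<major>.<minor> are mine. Minor versions are compared
numerically here rather than by glob, and no minimum-version guard is
modelled.

```python
import re

# rocm[1-9]...: rejects malformed tags and rocm0.x.
_TAG_RE = re.compile(r"rocm([1-9][0-9]*)\.([0-9]+)")


def cap_rocm_tag(tag):
    """Return the index tag to use, or None when the tag is invalid."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        return None
    major, minor = int(match.group(1)), int(match.group(2))
    # Allowlist: rocm6.*, rocm7.0 and rocm7.1 pass through unchanged.
    if major == 6 or (major == 7 and minor in (0, 1)):
        return tag
    # Everything else, including future major versions, caps to rocm7.1.
    return "rocm7.1"


for tag in ("rocm6.3", "rocm7.2", "rocm10.0", "rocm0.9", "not-a-tag"):
    print(tag, "->", cap_rocm_tag(tag))
```

An allowlist fails closed: an unrecognised future version lands on a
known index instead of producing a URL that does not exist.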

* Clean up rocm_paths list construction in detect_host()

Filter None from the ROCM_PATH env var lookup at list construction time
instead of relying on the inline `if p` guard in the any() call.

* Require actual AMD GPU presence before selecting ROCm paths

All 8 reviewers across 2 cycles independently flagged that ROCm
detection used toolkit/filesystem hints (hipcc, /opt/rocm, rocm-core)
as a proxy for GPU presence, which would misroute CPU-only or NVIDIA
hosts that happen to have ROCm tools installed.

Now all 3 detection points (install.sh, install_python_stack.py,
install_llama_prebuilt.py) probe for an actual AMD GPU before
entering the ROCm path:

- install.sh: check rocminfo for gfx* GPU names, or amd-smi list
  for device rows, before version detection
- install_python_stack.py: new _has_rocm_gpu() function probes
  rocminfo and amd-smi list before _ensure_rocm_torch() proceeds
- install_llama_prebuilt.py: detect_host() probes rocminfo/amd-smi
  list instead of just checking tool existence or directory paths

Also:
- Shell test mock amd-smi now handles "list" subcommand
- Python tests updated to mock _has_rocm_gpu where needed
- Added test_no_gpu_with_rocm_tools_skips to verify the new guard
- Test index lookups now use sorted() to match production code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden hipconfig version parsing and torch probe compatibility

- Add parts[1].isdigit() check in hipconfig version parsing to handle
  versions like "6.3-HIP" where the minor component carries a
  non-numeric suffix (keep only the part before "-" for int() conversion)
- Use getattr() in torch probe subprocess to safely handle old or
  custom torch builds that may lack torch.version.hip/cuda attributes
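
A sketch of tolerant major.minor parsing for hipconfig-style version
strings. The function name and the sample inputs other than "6.3-HIP"
are assumptions; the installer's actual parsing may differ in detail.

```python
def parse_major_minor(raw):
    """Return (major, minor) as ints, or None if the string is unusable."""
    parts = raw.strip().split(".")
    if len(parts) < 2 or not parts[0].isdigit():
        return None
    # Drop a non-numeric suffix such as "-HIP" from the minor component.
    minor = parts[1].split("-")[0]
    if not minor.isdigit():
        return None
    return int(parts[0]), int(minor)


for sample in ("6.3-HIP", "6.3.42134-a9a80e791", "7.1", "HIP", "6.x"):
    print(repr(sample), "->", parse_major_minor(sample))
```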

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Strengthen AMD GPU detection and add NVIDIA precedence guard

- Change amd-smi list detection from any-non-empty-output to requiring
  "gpu" marker in output, matching the shell-side NR>1 check. Prevents
  false positives from header-only amd-smi list output.
- Add nvidia-smi check at the top of _ensure_rocm_torch() so mixed
  AMD+NVIDIA hosts preserve NVIDIA precedence (matching install.sh and
  install_llama_prebuilt.py behavior).
- Apply the same amd-smi marker fix to install_llama_prebuilt.py
  detect_host() for consistency.

* Add Windows-specific ROCm/HIP detection in detect_host()

The previous detect_host() ROCm check used rocminfo and amd-smi list
which are Linux-only tools. On Windows, has_rocm would always be False,
making the Windows HIP prebuilt path at line 1794 unreachable.

Now detect_host() uses platform-specific detection:
- Linux: rocminfo (check for gfx GPU names) or amd-smi list
- Windows: hipinfo.exe, amd-smi, or amdhip64.dll on PATH

This allows Windows AMD users to get the HIP prebuilt binary instead
of silently falling through to the CPU prebuilt.

* Add AMD ROCm gaps: Mamba/SSM source builds, GPU monitoring, Windows messaging, RDNA expansion

- worker.py: Add HIP detection to causal-conv1d/mamba-ssm probe, check
  for hipcc before ROCm source builds, improve status messages and error
  reporting, add timeout and uv support for the source build fallback
- amd.py: New AMD GPU monitoring module via amd-smi metric --json,
  mirroring nvidia.py structure (utilization, temperature, power, VRAM)
- hardware.py: Branch to amd.py when IS_ROCM is True for GPU utilization,
  visible GPU queries, and physical GPU count
- install_python_stack.py: Detect AMD GPUs on Windows and warn that
  ROCm-enabled PyTorch must be installed manually
- kernels/utils.py: Expand is_rdna() to cover RDNA2 (gfx1030-1032),
  RDNA3 (gfx1102-1103), RDNA3.5 (gfx1150-1152) alongside existing entries
- tests: Add 32 new tests covering all changes (95/95 pass)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden ROCm detection, fix VRAM heuristic, and expand RDNA2 coverage

- Windows ROCm detection: validate actual GPU presence via hipinfo/amd-smi
  output markers instead of just checking tool existence on PATH
- _ensure_rocm_torch: validate nvidia-smi actually reports a GPU before
  giving NVIDIA precedence (fixes AMD-only hosts with stale NVIDIA tools)
- amd.py _parse_numeric: handle dict-shaped metric objects from newer
  amd-smi versions ({"value": 10, "unit": "W"}) and strip MiB/GiB units
- amd.py VRAM heuristic: raise threshold from 100k to 10M to correctly
  handle MI300X (192 GB = 196608 MB) and other high-VRAM GPUs
- amd.py visible GPU: use AMD-reported GPU IDs instead of enumerate index
  so non-dense sets like CUDA_VISIBLE_DEVICES=1,3 report correctly
- install.sh: add ROCm <6.0 minimum version guard (no PyTorch wheels
  exist for older versions); fix rocm7.1* glob to not match rocm7.10+
- is_rdna: add gfx1033-1036 for RDNA2 mobile GPUs (RX 6600M etc.)
- worker.py: increase ROCm source build timeout from 600s to 1800s;
  fix success log message for ROCm source builds
- Tests: update mocks for _has_usable_nvidia_gpu, add RDNA2 target asserts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add HIP_VISIBLE_DEVICES support, unit-aware VRAM parsing, Windows GPU validation

- hardware.py: check HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES on ROCm
  before falling back to CUDA_VISIBLE_DEVICES, so multi-GPU AMD setups with
  HIP-specific env vars report the correct visible device set
- amd.py: add _parse_memory_mb() that reads "unit" from dict-shaped amd-smi
  JSON (e.g. {"value": 192, "unit": "GiB"}) and converts to MB correctly;
  fixes MI300X VRAM misreported as 0.19 GB instead of 192 GB
- install_python_stack.py: Windows AMD warning now validates actual GPU
  presence via hipinfo/amd-smi output markers before printing
- install_llama_prebuilt.py: restore amdhip64.dll fallback for Windows HIP
  detection after tool-based checks, so Windows HIP installs without CLI
  tools on PATH are still detected
- hardware.py: fix IS_ROCM comment to accurately describe its role

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix HIP_VISIBLE_DEVICES empty-string handling in GPU visibility spec

Use explicit None checks instead of Python `or` operator when reading
HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES, so that an empty string
("") is correctly honored as "no visible GPUs" rather than silently
falling through to CUDA_VISIBLE_DEVICES on mixed ROCm+CUDA systems.
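
The difference between `or` chaining and explicit None checks can be
shown with a plain dict standing in for os.environ. Both function names
are hypothetical.

```python
def visible_gpu_spec(env, is_rocm):
    """Pick the visibility spec, treating "" as a real value."""
    if is_rocm:
        for name in ("HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES"):
            value = env.get(name)
            if value is not None:  # "" means "no visible GPUs"
                return value
    return env.get("CUDA_VISIBLE_DEVICES")


def visible_gpu_spec_with_or(env):
    """The pattern being replaced: "" is falsy, so it is skipped."""
    return (
        env.get("HIP_VISIBLE_DEVICES")
        or env.get("ROCR_VISIBLE_DEVICES")
        or env.get("CUDA_VISIBLE_DEVICES")
    )


env = {"HIP_VISIBLE_DEVICES": "", "CUDA_VISIBLE_DEVICES": "0,1"}
print(repr(visible_gpu_spec(env, is_rocm=True)))
print(repr(visible_gpu_spec_with_or(env)))
```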

* Fix IS_ROCM test assertion for multi-line formatting

* Cap torchvision/torchaudio versions, remove amdhip64.dll fallback, fix visible GPU count

- Cap torchvision<0.26.0 and torchaudio<2.11.0 alongside torch<2.11.0 in
  both install.sh and install_python_stack.py to prevent resolver from
  selecting incompatible companion packages from ROCm wheel index
- Remove amdhip64.dll fallback in Windows ROCm detection (DLL presence
  without hipinfo/amd-smi is not proof of GPU existence)
- Fix get_visible_gpu_count() to use _get_parent_visible_gpu_spec() which
  respects HIP_VISIBLE_DEVICES/ROCR_VISIBLE_DEVICES on ROCm hosts

* Attribute is_rdna() RDNA2/3/3.5/4 expansion to PR #4428

The is_rdna() expansion to cover RDNA2 (gfx1030-1036), RDNA3
(gfx1100-1103), RDNA3.5 (gfx1150-1152), and RDNA4 (gfx1200-1201)
architectures is based on the original work from PR #4428.

Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com>
Co-authored-by: billishyahao <bill.he@amd.com>

* Support AMD Radeon for studio (#4770)

Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>

* Remove ROCm test files from main PR

Move test_rocm_support.py and shell test additions to a separate PR
to keep the main ROCm support PR focused on implementation changes.

* Fix installer and hardware detection issues for PR #4720

- Fix empty _tri_arg passed to uv pip install in Radeon path (causes
  "Empty field is not allowed for PEP508" error)
- Fix Radeon fallback: use ROCm index instead of CPU-only when
  repo.radeon.com is unreachable (TORCH_INDEX_URL already has ROCm)
- Use $TORCH_CONSTRAINT in fallback paths instead of hardcoded strings
- Fix _pick_radeon_wheel: relax suffix to match manylinux_2_28_x86_64
  wheels (AMD Radeon repo does not use bare linux_x86_64 platform tag)
- Fix IS_ROCM export: use __getattr__ so callers always see the live
  value after detect_hardware() runs
- Fix apply_gpu_ids: set HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES
  on ROCm so _get_parent_visible_gpu_spec picks up narrowed GPU set
- Fix _parse_memory_mb: distinguish GB (1000 MB) from GiB (1024 MiB)
- Add amd-smi version as a fallback in _detect_rocm_version
- Fix trailing whitespace and missing newline at EOF in install.sh
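
The IS_ROCM export fix relies on module-level __getattr__ (PEP 562). A
minimal sketch, using in-memory modules with hypothetical names so it
runs without the Studio package:

```python
import types

# Stand-in for hardware.py: the flag starts False and is set by detection.
hardware = types.ModuleType("fake_hardware")
hardware.IS_ROCM = False


def detect_hardware():
    hardware.IS_ROCM = True  # pretend a ROCm build of torch was found


# Stand-in for the package __init__.py.
package = types.ModuleType("fake_package")

# Equivalent of `from .hardware import IS_ROCM`: a one-time copy.
package.SNAPSHOT_IS_ROCM = hardware.IS_ROCM


def _package_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    if name == "IS_ROCM":
        return hardware.IS_ROCM
    raise AttributeError(f"module 'fake_package' has no attribute {name!r}")


package.__getattr__ = _package_getattr

before = package.IS_ROCM
detect_hardware()
after = package.IS_ROCM
print(before, after, package.SNAPSHOT_IS_ROCM)
```

The snapshot stays False after detection, while the __getattr__ path
reads the current value on every access.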

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix GPU detection false positives and add missing health groups

- Fix _has_rocm_gpu() false positive: require "GPU: <number>" data rows
  from amd-smi list, not just header containing "gpu"
- Apply same fix in detect_host() in install_llama_prebuilt.py
- Add runtime_payload_health_groups for linux-rocm and windows-hip so
  partial/corrupt ROCm/HIP prebuilt installs are properly detected
- Add bitsandbytes install to Radeon fallback paths (was only in the
  success path, skipped when repo.radeon.com was unreachable)
- Keep DEVICE/CHAT_ONLY as direct imports in __init__.py (matching main)
  and only use __getattr__ for IS_ROCM

* Fix _ensure_rocm_torch and Windows AMD warning false positives

- _ensure_rocm_torch: only skip when HIP is already present, not for
  CUDA builds (which are unusable on AMD-only hosts). Fixes the case
  where a venv has a stale CUDA wheel and the repair step is skipped.
- Windows AMD warning: use GPU data row check (same as Linux fix) to
  avoid false positives from amd-smi list header-only output.

* Fix amd-smi GPU detection for GPU[N] output format

Older amd-smi versions output "GPU[0] : Card series: ..." instead of
"GPU: 0". The regex now matches both "GPU: <digit>" and "GPU[<digit>"
formats to detect actual GPU data rows.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden AMD GPU detection against false positives

- install.sh: replace weak amd-smi list check (awk 'NR>1 && NF') with
  strict pattern matching GPU data rows (/^GPU[[:space:]]*[:\[]/)
- All files: reject rocminfo gfx000 (CPU HSA agent) by requiring
  gfx[1-9] instead of gfx[0-9] in the rocminfo GPU probe
- Fixes false positives on hosts with ROCm tools but no AMD GPU
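
The stricter probes can be sketched as two regular expressions. The
patterns below are a Python approximation of the checks described in
the commits above, and the sample tool outputs are simplified
assumptions rather than captured output.

```python
import re

# Matches data rows such as "GPU: 0" or "GPU[0] : ...", not a header row.
AMD_SMI_GPU_ROW = re.compile(r"^GPU\s*(?::\s*\d|\[\d)", re.MULTILINE)

# gfx000 is the CPU HSA agent, so require a non-zero first digit.
ROCMINFO_GPU = re.compile(r"gfx[1-9]")


def has_amd_smi_gpu_row(output):
    return AMD_SMI_GPU_ROW.search(output) is not None


def has_rocminfo_gpu(output):
    return ROCMINFO_GPU.search(output) is not None


newer_format = "GPU: 0\n    BDF: 0000:03:00.0\n"
older_format = "GPU[0] : Card series: example\n"
header_only = "GPU  BDF  UUID\n"
cpu_only_rocminfo = "  Name:                    gfx000\n"
gpu_rocminfo = "  Name:                    gfx942\n"

print(has_amd_smi_gpu_row(newer_format), has_amd_smi_gpu_row(header_only))
print(has_rocminfo_gpu(gpu_rocminfo), has_rocminfo_gpu(cpu_only_rocminfo))
```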

* Remove duplicate comment from pre-commit merge

* Refactor: deduplicate AMD detection, consolidate bitsandbytes, clean up imports

- Extract _has_amd_rocm_gpu() shell function to avoid duplicating the
  rocminfo/amd-smi GPU detection logic in get_torch_index_url and
  the Radeon auto-detect block
- Consolidate bitsandbytes install into a single case block after torch
  install (was duplicated 4 times across Radeon success/fallback paths)
- Move math and re imports to top of amd.py (were inline in functions)
- Add _smi_query() helper in hardware.py to centralize IS_ROCM backend
  selection for get_gpu_utilization and get_visible_gpu_utilization

Addresses Gemini code review suggestions.

* Fix VRAM parsing for string values and GB/GiB consistency

- Extract unit from string-valued VRAM fields (e.g. "192 GiB") so
  _parse_memory_mb correctly applies the unit multiplier instead of
  treating the value as bare MB
- Treat GB and GiB identically (both as binary x1024) since GPU tools
  including amd-smi use binary units even when labeling them "GB"
- Fixes incorrect VRAM reporting on MI300-class cards (was showing
  ~0.19 GB instead of 192 GB for string-valued outputs)
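
A hedged sketch of the unit handling described above. The name
`parse_memory_mb`, the accepted unit list, and the `{"value", "unit"}`
dict shape are assumptions for illustration; the real
`_parse_memory_mb` may differ in its details.

```python
import re

# GB and GiB are both treated as binary (x1024), as this commit describes.
_UNIT_TO_MB = {"mb": 1, "mib": 1, "gb": 1024, "gib": 1024}


def parse_memory_mb(value, unit="MB"):
    """Convert an int/float, a "192 GiB" string, or a value/unit dict to MB."""
    if isinstance(value, dict):
        unit = value.get("unit", unit)
        value = value.get("value")
    if isinstance(value, str):
        match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]*)\s*", value)
        if match is None:
            return None
        value = float(match.group(1))
        # A unit embedded in the string wins over the default.
        unit = match.group(2) or unit
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    factor = _UNIT_TO_MB.get(str(unit).lower())
    if factor is None:
        return None
    return float(value) * factor
```

With this shape, "192 GiB" yields 196608 MB instead of being read as
192 MB, which is the ~0.19 GB symptom mentioned above.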

* Add --no-cache to uv for ROCm HIP source builds

Avoid stale cache artifacts from partial HIP source builds when
uv is used for causal-conv1d/mamba-ssm compilation on ROCm.
The pip path already uses --no-cache-dir; this adds the uv equivalent
(--no-cache) only when is_hip is True.

* Fix critical: initialize _amd_gpu_radeon before case block

_amd_gpu_radeon was only set inside the */rocm*) case arm, so on
NVIDIA/CPU/macOS paths where TORCH_INDEX_URL does not contain "rocm",
the variable was unbound. With set -u (nounset) enabled, this crashes
the installer for every non-AMD user.

Move initialization to before the case block so it is always defined.

* Fix Windows AMD: route has_rocm hosts to HIP prebuilt path

resolve_release_asset_choice was selecting windows-cpu for all Windows
x86_64 hosts including those with has_rocm=True. Windows AMD users
should fall through to resolve_upstream_asset_choice which tries the
HIP prebuilt first. Add "not host.has_rocm" guard to the published
windows-cpu selection.

* Harden ROCm detection, Radeon wheel fallback, and HIP visibility

Addresses review findings from parallel reviewers on PR #4720:

- install.sh: add _has_usable_nvidia_gpu() helper requiring nvidia-smi -L
  to actually list a GPU before treating the host as NVIDIA. Fixes the
  stale-nvidia-smi-on-PATH regression where AMD-only hosts fell into the
  CUDA branch.
- install.sh: fix hipconfig awk blocks to propagate a non-zero exit code
  when the output is not a recognisable version string, so the ||-chain
  continues to dpkg-query / rpm instead of terminating early.
- install.sh: fail-closed on Radeon wheel fallback. When torch,
  torchvision or torchaudio is missing from the Radeon repo for the
  active Python tag, fall back to the standard ROCm index instead of
  silently mixing Radeon wheels with PyPI defaults. Quote all wheel
  arguments individually so wheel filenames cannot be word-split or
  glob-expanded.
- install_llama_prebuilt.py: detect_host() now requires nvidia-smi -L to
  list a GPU before setting has_physical_nvidia. Routes AMD ROCm hosts
  with a broken leftover nvidia-smi to the ROCm path instead of
  misclassifying them as NVIDIA.
- install_llama_prebuilt.py: scan upstream assets for any rocm-<version>
  prebuilt instead of hard-coding rocm-7.2, so ROCm 6.x / 7.0 / 7.1 / 7.3+
  users pick up a matching upstream prebuilt when one exists.
- install_llama_prebuilt.py: validate_server() adds --n-gpu-layers 1 for
  linux-rocm and windows-hip hosts, so new HIP prebuilts are preflighted
  on the GPU path instead of passing validation on CPU only.
- install_llama_prebuilt.py: restore the published windows-cpu fallback
  for AMD Windows hosts without a HIP prebuilt so hash-approved bundles
  are still preferred over the raw upstream CPU asset.
- install_python_stack.py: drop the /opt/rocm / hipcc gate in
  _ensure_rocm_torch() and rely on _has_rocm_gpu(). Runtime-only ROCm
  installs (package-managed minimal installs, Radeon software) that ship
  amd-smi / rocminfo without hipcc can now repair a CPU-only venv via
  "unsloth studio update". Adds an explicit IS_WINDOWS / IS_MACOS guard.
- studio/backend/utils/hardware/amd.py: honour HIP_VISIBLE_DEVICES /
  ROCR_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES in
  get_primary_gpu_utilization(). A process restricted to GPU 2 now
  reports metrics for GPU 2 instead of physical GPU 0. Tighten the plain
  bytes unit detection to an explicit allowlist.
- studio/backend/utils/hardware/hardware.py: route
  get_backend_visible_gpu_info()'s backend_cuda_visible_devices field
  through a helper that reads HIP_VISIBLE_DEVICES on ROCm. Drop the
  unconditional "(rocm=False)" suffix in apply_gpu_ids() logs.

* Fix round 2 regressions: ROCm validate_server and Windows HIP routing

Follow-up to 810b833b addressing review findings on the first round of
hardening commits:

- install_llama_prebuilt.py validate_server: gate --n-gpu-layers on the
  resolved install_kind instead of host.has_rocm. AMD Windows hosts
  without a HIP prebuilt fall back to windows-cpu and must not be
  validated with GPU layers; thread install_kind through from the
  caller.
- install_llama_prebuilt.py resolve_release_asset_choice: reinstate the
  "not has_rocm" guard on the published windows-cpu bundle so AMD
  Windows hosts reach resolve_upstream_asset_choice() where the new
  HIP prebuilt path lives. Prefer a published windows-hip bundle first
  when one exists, fall through to upstream HIP + upstream CPU
  otherwise.
- install_llama_prebuilt.py detect_host: also set has_physical_nvidia
  when the secondary --query-gpu block confirms a working NVIDIA GPU,
  so older nvidia-smi versions without -L support do not silently skip
  the Linux diagnostics that key off has_physical_nvidia.
- install_llama_prebuilt.py: drop redundant "import re as _re" /
  "import re as _re_rocm" local aliases in favour of the existing
  top-level "import re".
- install_python_stack.py _ensure_rocm_torch: run the AMD
  bitsandbytes install unconditionally after the HIP-torch probe so
  "unsloth studio update" on venvs that already have ROCm torch still
  gains the AMD bitsandbytes build.
- install.sh: add a non-x86_64 early-exit to get_torch_index_url() so
  aarch64 / arm64 Linux hosts do not hit the ROCm wheel index
  (PyTorch only publishes ROCm wheels for linux_x86_64).
- install.sh: add bitsandbytes install to the migrated-environment
  branch so upgrades pick it up for ROCm hosts instead of only the
  fresh-install path.
- install.sh: in the Radeon wheel path, pass version constraints +
  --no-index --find-links to uv instead of explicit wheel URLs so a
  version-compatible torch / torchvision / torchaudio triple is
  resolved, rather than picking the highest-version wheel for each
  package independently.
- studio/backend/utils/hardware/amd.py _first_visible_amd_gpu_id: fall
  through to lower-priority visibility env vars when the first entry
  is malformed (leading comma, all-whitespace first token) instead of
  silently returning GPU 0.

* Fix round 3 findings: x86_64 guard, ROCm version clip, Radeon deps

Address issues surfaced by the round 3 reviewers on top of 8636fa63:

- install_python_stack.py _ensure_rocm_torch: add the same `x86_64`
  guard that install.sh already has. Linux aarch64 / arm64 ROCm hosts
  must skip the repair path entirely; PyTorch only publishes ROCm
  wheels for linux_x86_64, and without this guard
  `unsloth studio update` aborts with a missing-wheel error on
  non-x86_64 hosts.
- install_llama_prebuilt.py resolve_upstream_asset_choice: add a
  best-effort _detect_host_rocm_version() helper (reading
  /opt/rocm/.info/version, amd-smi version, hipconfig --version) and
  filter rocm_candidates to entries whose major.minor is <= host
  version. Falls back to the newest candidate only when no compatible
  one exists, so a ROCm 6.4 host downloads rocm-6.4 instead of being
  handed the numerically newest rocm-7.2 bundle (which fails preflight
  and forces a source build).
- install.sh: remove the round 2 --no-index switch from the Radeon
  wheel branch. --no-index forced uv to ignore PyPI entirely, which
  broke transitive dependency resolution (filelock, sympy, networkx,
  jinja2, fsspec, setuptools, typing-extensions, ...) on a fresh venv.
  Restore the round 1 explicit wheel URL invocation but add a
  torch / torchvision / torchaudio version-pair sanity check so a
  mismatched trio (e.g. torch 2.9.1 + torchvision 0.23.0 + torchaudio
  2.9.0) falls back to the standard ROCm index instead of installing a
  broken combination.
- install_python_stack.py _ensure_rocm_torch: restructure the
  "tag is None" path so it no longer short-circuits the bitsandbytes
  install. On a ROCm runtime older than anything in
  _ROCM_TORCH_INDEX, print the "no wheel" warning but still run the
  AMD bitsandbytes install.
- studio/backend/core/training/worker.py: restore the pre-PR
  "no timeout" behaviour for non-HIP causal-conv1d / mamba-ssm source
  builds. The round 2 "timeout = 1800 if is_hip else 300" cap aborts
  slow non-HIP builds (Linux aarch64, unsupported torch/CUDA combos)
  after 5 minutes; omit timeout for the non-HIP branch so the cap
  only applies to ROCm source builds.
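
The ROCm candidate clipping described in this round can be sketched as
below. Versions are modelled as (major, minor) tuples; the function
name `pick_rocm_candidate` and the choice to return the newest
candidate when the host version is unknown are assumptions.

```python
def pick_rocm_candidate(candidates, host_version):
    """Pick the newest candidate that does not exceed the host ROCm version.

    candidates: list of (major, minor) tuples parsed from asset names.
    host_version: (major, minor) tuple, or None when detection failed.
    """
    if not candidates:
        return None
    if host_version is not None:
        compatible = [c for c in candidates if c <= host_version]
        if compatible:
            return max(compatible)
    # No compatible entry (or unknown host version): newest candidate.
    return max(candidates)
```

Tuple comparison gives major.minor ordering directly, so 6.4 sorts
below 7.2 without any string handling.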

* Fix round 4 findings: apply_gpu_ids env inheritance, Radeon X.Y, bitsandbytes gate

Address remaining issues surfaced by the round 4 reviewers:

- studio/backend/utils/hardware/hardware.py apply_gpu_ids: mirror the
  selection into HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES whenever
  the caller already had a ROCm visibility env var set, not only when
  IS_ROCM has already been set by detect_hardware(). Training and
  inference workers call apply_gpu_ids() before detect_hardware()
  runs, so the old guard would leave a forked ROCm worker with a
  stale HIP_VISIBLE_DEVICES mask that no longer matched the
  narrowed CUDA_VISIBLE_DEVICES selection.
- install.sh get_radeon_wheel_url: accept X.Y ROCm versions in
  addition to X.Y.Z. The `/opt/rocm/.info/version` file and some
  hipconfig versions report only two components, and the Radeon
  repository publishes both rocm-rel-X.Y.Z/ and rocm-rel-X.Y/
  directories, so treating X.Y as invalid caused Radeon hosts to fall
  back to the generic ROCm index even when a matching AMD wheel set
  existed.
- install_python_stack.py _ensure_rocm_torch: only install the AMD
  bitsandbytes build when the venv actually has a ROCm-compatible
  torch (either already present or just installed by this function).
  Previously the bitsandbytes install ran unconditionally, which
  could leave an AMD bitsandbytes layered on top of a CPU/CUDA torch
  on hosts where the ROCm runtime is older than any entry in
  _ROCM_TORCH_INDEX. Also add --force-reinstall so an existing
  CPU/CUDA bitsandbytes is replaced by the AMD build during upgrades.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix gemini findings: amd-smi metric envelope validation and dict-wrapped GPU id

Two medium-severity defensive fixes from the gemini-code-assist review on
the AMD monitoring backend:

1. _extract_gpu_metrics may return a dict where every value is None when
   amd-smi succeeds (zero exit) but the JSON envelope contains no usable
   fields (error response, unsupported card). The new _has_real_metrics
   helper lets get_primary_gpu_utilization surface available:False and
   lets get_visible_gpu_utilization skip ghost device rows so the UI
   does not render placeholder cards with empty numbers.

2. Newer amd-smi versions wrap scalar fields as {"value": 0, "unit":
   "none"}, including the per-GPU id. The previous int(raw_id) call
   silently fell back to the enumeration index in that case, losing the
   real GPU id. Routing raw_id through the existing _parse_numeric
   helper handles bare ints, floats, strings, and the dict shape
   uniformly, with a debug log on parse failure.
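
A rough sketch of the id handling in item 2, assuming the dict shape
quoted above. `parse_numeric` and `gpu_id_or_index` are illustrative
stand-ins, not the actual `_parse_numeric` helper, and the debug log
is omitted.

```python
import math


def parse_numeric(raw):
    """Return a finite float for ints, floats, numeric strings, or
    {"value": ...} dicts; return None for anything else."""
    if isinstance(raw, dict):
        raw = raw.get("value")
    if raw is None or isinstance(raw, bool):
        return None
    if isinstance(raw, str):
        try:
            raw = float(raw.strip())
        except ValueError:
            return None
    if isinstance(raw, (int, float)) and math.isfinite(raw):
        return float(raw)
    return None


def gpu_id_or_index(raw_id, index):
    """Use the reported GPU id when parseable, else the enumeration index."""
    parsed = parse_numeric(raw_id)
    return int(parsed) if parsed is not None else index
```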

* Fix gemini round 2 findings: explicit length guard on ROCm version file parser

Both _detect_rocm_version (install_python_stack.py) and
_detect_host_rocm_version (install_llama_prebuilt.py) read /opt/rocm/.info/version
or $ROCM_PATH/lib/rocm_version, split on "." and unconditionally accessed
parts[1]. The surrounding broad `except Exception: pass` already swallowed
the resulting IndexError, so a one-component file like "6\n" did fall
through to the next detection source -- but the control flow relied on
exception handling instead of an explicit check.

Add `if len(parts) >= 2:` guards in both helpers so the loop falls through
on its own without raising. Behaviour is unchanged for the common multi-
component case; the previously-silent IndexError path becomes an explicit
no-op.
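
The explicit guard can be sketched like this. The function name
`parse_rocm_version_text` and the handling of a "-build" suffix are
assumptions; only the `len(parts) >= 2` check comes from this commit.

```python
def parse_rocm_version_text(text):
    """Return (major, minor) from text such as "6.4.0-123", else None."""
    parts = text.strip().split("-")[0].split(".")
    # Explicit length guard: a one-component file like "6" falls through
    # without relying on an IndexError being swallowed.
    if len(parts) >= 2 and parts[0].isdecimal() and parts[1].isdecimal():
        return int(parts[0]), int(parts[1])
    return None
```

Returning None lets the caller continue to the next detection source.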

* Fix gemini round 3: include has_rocm in validate_server fallback path

When validate_server is called without an explicit install_kind (older
call sites that have not been updated), the fallback was only enabling
--n-gpu-layers for NVIDIA and macOS arm64 hosts. AMD ROCm Linux hosts
fell through to the CPU validation path even though the prebuilt being
exercised was a HIP binary.

Add host.has_rocm to the fallback expression so the GPU offload flag is
applied consistently with the install_kind=='linux-rocm' / 'windows-hip'
branches above.

* Fix gemini round 4: remove risky bytes-vs-MB heuristic in _parse_memory_mb

The previous heuristic divided any bare number above 10_000_000 by
1024*1024 on the assumption that large unit-less values were bytes.
This misclassified small VRAM allocations: 5 MB of used VRAM reported
as 5_242_880 bytes without a unit would be taken at face value and
render as 5_242_880 MB (~5 TB) in the monitoring UI.

Modern amd-smi always provides explicit units (MiB/GiB dict form),
and legacy amd-smi returns bare numbers in MB -- the heuristic never
had a real workload to handle. Drop it and default to MB for bare
numeric input, keeping the existing unit-aware branches for dict /
string inputs unchanged.

The unrelated gemini suggestion to "default minor to 0" in the
amd-smi version awk parser was intentionally NOT applied: rocm7.0
and rocm7.1 ship different wheel sets, so silently substituting 0
for a missing minor could install the wrong wheels. The existing
reject-and-fall-through behaviour is safer.

* Fix gemini round 5: POSIX compliance and leading-comma visibility parsing

Three medium findings from gemini-code-assist were triaged for this
commit; the two below are fixed:

1. _pick_radeon_wheel used grep -o and sort -V, both GNU extensions
   that are not in POSIX and break on BSD/BusyBox coreutils. install.sh
   has a #!/bin/sh shebang so the whole pipeline was rewritten as a
   single awk script that extracts all href="..." hits on each line,
   filters to wheels matching the package prefix and python tag, and
   picks the newest version via zero-padded lexical comparison. No
   external sort or grep is needed.

2. _first_visible_amd_gpu_id in the AMD monitoring backend treated a
   leading comma (e.g. HIP_VISIBLE_DEVICES=",1") as "fall through to
   the next env var", which is surprising given the clear intent to
   narrow to device 1. Filter empty tokens after the split and return
   the first real one. An all-commas value ("," / ",,,") still falls
   through because no real tokens exist; the empty-string and "-1"
   explicit-zero cases are unchanged.
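
A simplified sketch of the token filtering in item 2. The lookup order
of the three variables and the name `first_visible_gpu_token` are
assumptions, and the empty-string and "-1" special cases mentioned
above are deliberately not modelled here.

```python
import os

# Assumed priority order; the real helper may consult these differently.
_VISIBILITY_VARS = (
    "HIP_VISIBLE_DEVICES",
    "ROCR_VISIBLE_DEVICES",
    "CUDA_VISIBLE_DEVICES",
)


def first_visible_gpu_token(env=None):
    """Return the first non-empty device token, or None if none is set."""
    env = os.environ if env is None else env
    for name in _VISIBILITY_VARS:
        raw = env.get(name)
        if raw is None:
            continue
        # Drop empty tokens so ",1" narrows to device 1.
        tokens = [t.strip() for t in raw.split(",") if t.strip()]
        if tokens:
            return tokens[0]
        # All-commas value: no real token, fall through to the next var.
    return None
```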

The unrelated amd-smi version awk parser suggestion was not applied
(see round 4 commit message for rationale: defaulting a missing minor
to 0 could silently install the wrong ROCm wheel set).

* Fix 20-reviewer.py findings: base drift, Radeon %2B, dpkg/rpm fallback, bnb, backend label

Consolidated fix batch from a 20-parallel reviewer.py run on the current
head. Each fix is drawn from a high-consensus finding and addresses a
real bug or feature gap, not a stylistic preference.

1. install.sh: bump `unsloth>=2026.4.2` -> `unsloth>=2026.4.4` at five
   call sites so this branch no longer regresses main's version floor
   (main bumped to 2026.4.4 in #4876). Without this, merging 4720 would
   silently downgrade the minimum version pin for fresh installs.

2. install.sh: URL-decode Radeon wheel names before extracting the
   torch / torchvision / torchaudio version strings. Real wheel URLs
   from repo.radeon.com are percent-encoded ("torch-2.10.0%2Brocm7.2.0...")
   so the previous `[+-]` terminator in the sed regex never matched,
   `_torch_ver` stayed empty, `_radeon_versions_match` stayed false,
   and every Radeon consumer install silently fell back to the generic
   ROCm index. Now decode %2B -> + first, then extract, then validate.

3. install.sh: the two AMD bitsandbytes install lines were running
   `uv pip install "bitsandbytes>=0.49.1"` without `--force-reinstall`,
   so upgrades where the venv already has a CPU/CUDA bitsandbytes
   satisfying the constraint would keep the stale non-AMD wheel. Add
   `--force-reinstall --no-cache-dir` to both call sites, matching the
   pattern already used in install_python_stack.py::_ensure_rocm_torch.

4. install_python_stack.py and install_llama_prebuilt.py: add
   `dpkg-query -W rocm-core` and `rpm -q rocm-core` fallbacks to the
   Python-side ROCm version detectors so they match the chain in
   install.sh::get_torch_index_url. Package-managed ROCm installs
   (Debian/Ubuntu/RHEL/Fedora distro packages) can expose GPUs via
   rocminfo/amd-smi but still lack /opt/rocm/.info/version, hipconfig,
   or amd-smi `version` output -- without these fallbacks, `unsloth
   studio update` on such hosts returned None and skipped the ROCm
   torch repair. Also strip the dpkg epoch prefix ("1:6.3.0-1") before
   parsing so epoch-annotated packages parse correctly.

5. hardware.py: add a `_backend_label(device)` helper that returns
   "rocm" when IS_ROCM is set and the device is DeviceType.CUDA, and
   use it for every `"backend": ...` emission in JSON responses served
   to the Studio frontend. Internally we still represent ROCm hosts as
   DeviceType.CUDA (ROCm torch reuses the whole torch.cuda.* API
   surface), but the user-facing API now correctly reports "rocm" on
   AMD boxes instead of labeling them as "cuda".
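
The decode-then-extract order from item 2 can be sketched in Python
(install.sh itself does this with shell tools). The function name
`wheel_version` is hypothetical, and so are the host and the wheel
filename suffix in the usage line below.

```python
import re
from urllib.parse import unquote


def wheel_version(wheel_url, package):
    """Extract the bare version from a possibly percent-encoded wheel URL."""
    # Decode first: "%2B" becomes "+", which the terminator relies on.
    name = unquote(wheel_url.rsplit("/", 1)[-1])
    pattern = rf"{re.escape(package)}-([0-9][0-9A-Za-z.]*)(?:[+-]|$)"
    match = re.match(pattern, name)
    return match.group(1) if match else None
```

For example, `wheel_version("https://example.invalid/torch-2.10.0%2Brocm7.2.0-cp312-cp312-linux_x86_64.whl", "torch")`
gives "2.10.0". Without the `unquote` step the terminator never
matches, which is the empty-version failure described above.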

All 250 simulation scenarios pass, up from 233 before this batch. The 17
new regression tests cover the version pin, %2B decoding, bnb
force-reinstall flags, dpkg/rpm fallback presence, and the
_backend_label helper's four-way truth table.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix gemini round 6 + URL audit: amd.py defensive checks, rocm6.5+ clip to 6.4

Two rounds of fixes in one commit, plus a full URL audit of every PyPI /
download.pytorch.org / repo.radeon.com reference the PR introduces.

amd.py (4 medium gemini findings on commit b3627bc2):

1. _extract_gpu_metrics used `and vram_total_mb` as part of the vram_util
   gate. The follow-up `vram_total_mb > 0` already guards the division,
   so the truthiness check was redundant and slightly surprising for a
   valid 0.0 value. Replace it with an explicit `is not None and > 0`
   check for both vram_util and power_util.

2. get_physical_gpu_count called `data.get("gpu", ...)` without guarding
   for non-dict envelopes. A scalar / string JSON response from amd-smi
   would raise AttributeError. Add an isinstance(data, dict) check and
   return None for unexpected shapes.

3. get_visible_gpu_utilization had the same .get() exposure on the outer
   envelope. Rewrite the gpu_list extraction as an explicit
   list/dict/else cascade so a malformed scalar envelope produces
   gpu_list=[data] and continues without raising.

4. The same function's per-entry loop also called gpu_data.get() on
   whatever was inside gpu_list. If a scalar ever leaks into the list
   (directly or via the previous fix's fallback), _extract_gpu_metrics
   would raise on the first .get() inside the helper. Skip non-dict
   entries in the loop before extracting metrics.
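
A sketch of the list/dict/else cascade and the non-dict skip from
items 3 and 4. `extract_gpu_entries` is a hypothetical name, and
treating a dict without a "gpu" key as a single GPU entry is an
assumption.

```python
def extract_gpu_entries(data):
    """Normalise an amd-smi JSON envelope into a list of per-GPU dicts."""
    if isinstance(data, list):
        gpu_list = data
    elif isinstance(data, dict):
        inner = data.get("gpu", data)
        gpu_list = inner if isinstance(inner, list) else [inner]
    else:
        # Malformed scalar envelope: wrap it so the loop still runs.
        gpu_list = [data]
    # Skip anything that is not a dict before metrics are extracted.
    return [entry for entry in gpu_list if isinstance(entry, dict)]
```

A scalar or string response therefore produces an empty list and no
AttributeError.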

install.sh (URL audit finding, previously flagged by 20-reviewer as #13):

5. get_torch_index_url used `rocm6.*` in the rocm tag case statement,
   which matched rocm6.5 and rocm6.6 and emitted
   download.pytorch.org/whl/rocm6.5 -- which returns HTTP 403 because
   PyTorch only publishes rocm 5.7, 6.0-6.4, 7.0-7.2. Enumerate the
   supported 6.x minors explicitly and add a rocm6.* fallback branch
   that clips to rocm6.4 (the last supported 6.x wheel set).
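
The clipping rule, sketched in Python for illustration (the real logic
is a shell case statement in install.sh). The supported set is copied
from the list above; the name `torch_rocm_tag` and the None return for
other unsupported versions are assumptions.

```python
_SUPPORTED = {
    "5.7",
    "6.0", "6.1", "6.2", "6.3", "6.4",
    "7.0", "7.1", "7.2",
}


def torch_rocm_tag(major, minor):
    """Map a detected ROCm version to a published PyTorch index tag."""
    version = f"{major}.{minor}"
    if version in _SUPPORTED:
        return f"rocm{version}"
    if major == 6 and minor > 4:
        # Unsupported 6.x minors clip to the last published 6.x wheel set.
        return "rocm6.4"
    return None
```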

URL audit results (all URLs PR 4720 references):
- 14/14 download.pytorch.org/whl/{cpu,cu118,cu124,cu126,cu128,cu130,
  rocm6.0..6.4,rocm7.0..7.2} return HTTP 200.
- 9/9 repo.radeon.com/rocm/manylinux/rocm-rel-{5.7,6.0,6.1,6.2,6.3,
  6.4,7.0,7.1,7.2}/ return HTTP 200.
- X.Y.Z patch directories exist for 7.0.2, 7.1.1, 7.2.1 but NOT for
  6.3.0, 6.4.0, 6.2.1 -- install.sh already handles this via the X.Y.Z
  -> X.Y fallback sed in the Radeon wheel install block.
- Docs links (rocm.docs.amd.com, docs.unsloth.ai AMD guide) and the
  llama.cpp GitHub releases API endpoint all return 200.

Test suite: 255 -> 258. New regression coverage:
- U17: get_physical_gpu_count tolerates scalar amd-smi envelope
- U18: get_visible_gpu_utilization tolerates scalar envelope
- U19a-c: vram_util / power_util return None on zero total, but
  vram_total_gb still echoes 0.0 (not None)
- A_rocm{6.5,6.6,6.9}_clips_to_rocm64: install.sh clips unsupported
  6.x minors to rocm6.4 instead of producing a 403 index URL

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix reviewer.py round 2: tokenizer AMD multi-GPU, --no-torch bnb, main.py backend label

Three high-confidence findings from a second 20-parallel reviewer.py run
on commit 7effb3ae. Triaged 15 total findings and applied the three that
were confirmed as real bugs; the rest were either false positives (e.g.
"migrated AMD venv not repaired" -- _ensure_rocm_torch runs downstream
via setup.sh regardless), design decisions (e.g. visibility mask env
vars not consulted in installer detection), or edge cases the existing
fallback logic already handles.

1. unsloth/tokenizer_utils.py [6/20]: the multi-GPU guard's shell probe
   runs `nvidia-smi --query-gpu=memory.used`, catches the failure, then
   only raises if `torch.cuda.is_available()` is False. On ROCm torch,
   torch.cuda.is_available() returns True (ROCm reuses the torch.cuda.*
   API), so the guard becomes dead code on AMD hosts and multi-GPU AMD
   setups slip through even though unsloth does not support them yet.
   Add a torch.cuda.device_count() > 1 fallback inside the except so
   AMD multi-visible-device setups are flagged consistently with the
   original CUDA memory check.

2. install.sh [1/20]: the fresh-install bitsandbytes block for AMD ROCm
   ran unconditionally when TORCH_INDEX_URL matched `*/rocm*`, even when
   SKIP_TORCH=true (from --no-torch or Intel Mac auto-detect). A user
   running `install.sh --no-torch` on an AMD host would still pull in
   bitsandbytes despite explicitly asking for GGUF-only mode. Wrap the
   case block in an outer `[ "$SKIP_TORCH" = false ]` guard.

3. studio/backend/main.py [3/20]: the /api/system endpoint returned
   `"device_backend": get_device().value`, which is "cuda" on ROCm
   hosts (because ROCm torch piggybacks on torch.cuda). Other endpoints
   (hardware.py) already use the _backend_label helper which swaps
   "cuda" -> "rocm" when IS_ROCM. Route /api/system through the same
   helper so the Studio UI reports the backend consistently across all
   endpoints.

4. studio/backend/tests/test_utils.py: update test_backend_matches_device
   to call _backend_label(get_device()) instead of raw get_device().value
   so the test matches the new contract and still passes on CUDA hosts.
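
The labelling rule shared by hardware.py and /api/system can be
sketched as below. The `DeviceType` enum here is a two-member stand-in
and `is_rocm` is passed explicitly for testability; both are
assumptions and do not reflect the real module layout.

```python
from enum import Enum


class DeviceType(Enum):  # stand-in for the Studio backend enum
    CUDA = "cuda"
    CPU = "cpu"


def backend_label(device, is_rocm):
    """Report "rocm" for CUDA-typed devices on ROCm hosts, else the raw value."""
    if is_rocm and device is DeviceType.CUDA:
        return "rocm"
    return device.value
```

Only the user-facing label changes; the internal device type stays
CUDA because ROCm torch reuses the torch.cuda API.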

Tests: 258 -> 261. New regression coverage:
- X08 main.py /api/system uses _backend_label
- X09 tokenizer multi-GPU guard has device_count() fallback
- X10 fresh-install bnb case block gated on SKIP_TORCH=false

* fix: prevent bitsandbytes from overwriting ROCm torch with CUDA wheels

During install, bitsandbytes was installed without --no-deps, causing
uv to resolve torch from PyPI (CUDA build) and silently overwrite the
ROCm wheels that were just installed in the previous step.

This happened in three places:
- install.sh: bitsandbytes install in both migrated and fresh paths
- install_python_stack.py: bitsandbytes install inside _ensure_rocm_torch()

Additionally, multiple install steps in install_python_stack.py (extras,
overrides, studio deps) can pull in CUDA torch via transitive
dependencies. A final _ensure_rocm_torch() call at the end of the
install sequence ensures ROCm torch is always in place at runtime.

All changes are gated behind ROCm-specific conditions and do not affect
NVIDIA, CPU-only, macOS, or Windows install paths.

Tested on AMD Instinct MI300X VF with ROCm 7.2.0 -- confirms
torch==2.10.0+rocm7.1 with HIP 7.1.25424 after install.

* fix: ROCm inference fallback -- skip Unsloth patching and bnb 4-bit on HIP

On AMD ROCm (HIP), two issues prevent the normal Unsloth inference path:

1. Unsloth's global monkey-patching of transformers model classes
   (LlamaRotaryEmbedding, attention modules) triggers
   _assert_async_cuda_kernel crashes on HIP during generation.
   Training uses different code paths and works fine.

2. bitsandbytes 4-bit matmul kernels also trigger HIP assertion
   failures on MI300X (CDNA3 / gfx942), even without Unsloth patching.

This commit adds a ROCm-specific inference fallback that:
- Skips importing Unsloth at module level (prevents global patching)
- Loads models in 16-bit with plain transformers + PEFT instead
- Resolves pre-quantized model names (e.g. "xxx-bnb-4bit" -> "xxx")
  since pre-quantized HF repos still trigger bnb codepaths
- Guards get_chat_template calls (unavailable without Unsloth import)
- Fixes max_seq_length=0 being passed to from_pretrained (GGUF
  semantics don't apply to transformers path)

The NVIDIA path is completely unchanged -- Unsloth import and
for_inference() optimization remain active. GGUF inference (via
llama-server/HIP) is unaffected since it never imports Python model
classes. AMD GPUs typically have large VRAM (e.g. 192GB on MI300X)
so 16-bit loading is practical for inference.

Tested on AMD Instinct MI300X VF (ROCm 7.2, HIP 7.1.25424):
- Simple generation: PASS
- Compare mode (base vs finetuned): PASS
- GGUF inference + tool calling: PASS (unaffected by this change)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: guard audio/vision inference on ROCm, remove unused import

- Add clear RuntimeError for audio/vision model inference on ROCm
  (these paths use Unsloth's FastModel/FastVisionModel which would
  crash on HIP; GGUF inference is the supported path on AMD)
- Remove unused `import os as _os` from the ROCm changes

* fix: amd-smi parsing for newer output format (gpu_data wrapper, mem_usage, temperature)

amd-smi on recent ROCm versions (7.x) wraps metric output in a
{"gpu_data": [...]} envelope instead of returning a raw list. This
caused get_primary_gpu_utilization() and get_visible_gpu_utilization()
to fail silently (returning available=False) because the GPU data
dict was never unwrapped.

Additionally:
- VRAM data moved from "vram" to "mem_usage" with "total_vram" /
  "used_vram" keys. Added fallback key lookup.
- Temperature "edge" sensor returns "N/A" on MI300X VF; the previous
  dict.get() chain returned the "N/A" string instead of falling
  through to "hotspot". Changed to a loop that checks each key until
  a parseable value is found.
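
A sketch of the sensor loop described above, assuming readings are
either bare values or `{"value": ...}` dicts. The helper name
`first_parseable_temperature` is hypothetical.

```python
def first_parseable_temperature(temps, keys=("edge", "hotspot")):
    """Return the first sensor reading that parses as a number, else None."""
    for key in keys:
        raw = temps.get(key)
        if isinstance(raw, dict):
            raw = raw.get("value")
        try:
            return float(raw)
        except (TypeError, ValueError):
            # "N/A" or a missing sensor: try the next key.
            continue
    return None
```

Unlike a chained dict.get(), a present but unparseable "edge" value no
longer blocks the fallback to "hotspot".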

Tested on AMD Instinct MI300X VF (ROCm 7.2, amd-smi 24.x):
- GPU utilization: 0% (idle), up to 100% during training
- Temperature: 40-44C (from hotspot sensor)
- VRAM: 0.28/191.69 GB (idle)
- Power: 158-211W draw

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Bug fix detecting radeon (#4940)

* Bug fix detecting radeon

* Expanding GPU target for gfx1100*

* Generalize gfx family-prefix filter to cover gfx10/gfx12 as well

rocminfo on ROCm 6.1+ emits LLVM generic-family ISA lines alongside the
specific GPU (e.g. gfx11-generic next to gfx1100). The outer grep captures
the bare family prefix from the generic line, and passing that to
-DGPU_TARGETS breaks the HIP build because clang only accepts specific
gfxNNN ids.

The previous filter only special-cased gfx11. Generalize it so any bare
2-digit family prefix (gfx10, gfx11, gfx12, ...) is dropped whenever a
specific sibling target is present in the same list. No real AMD GPU has
a 2-digit gfx id, so the filter can only ever drop family prefixes and
never a real target.

Covers the existing gfx11 cases unchanged, and extends the same fix to
gfx10-1-generic / gfx10-3-generic (RDNA1/2) and gfx12-generic (RDNA4),
which would otherwise hit the same build failure on newer rocminfo.
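
The generalized filter can be sketched in Python as follows (the real
filter lives in the shell build path). `drop_family_prefixes` is a
hypothetical name.

```python
import re

# A bare family prefix is "gfx" followed by exactly two digits.
_FAMILY = re.compile(r"gfx\d{2}")


def drop_family_prefixes(targets):
    """Drop bare 2-digit family prefixes that have a specific sibling."""
    specific = [t for t in targets if not _FAMILY.fullmatch(t)]
    kept = []
    for target in targets:
        is_family = _FAMILY.fullmatch(target) is not None
        if is_family and any(s.startswith(target) for s in specific):
            continue
        kept.append(target)
    return kept
```

A lone family prefix with no specific sibling is left in place, which
matches the "whenever a specific sibling target is present" condition.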

---------

Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>

---------

Co-authored-by: Eda Z <eda.zhou@amd.com>
Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: billishyahao <bill.he@amd.com>
Co-authored-by: Iswarya Alex <47045679+iswaryaalex@users.noreply.github.com>
Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-10 01:56:12 -07:00
Roland Tannous
33503ea248
Revert "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945)
This reverts commit bcf4fd6bd3.
2026-04-09 23:14:57 -07:00
Roland Tannous
bcf4fd6bd3
updated models template mappers. added lfm2.5vl450m to transformers 5… (#4939)
* updated models template mappers. added lfm2.5vl450m to transformers 5.3.0 whitelist

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-09 23:36:42 +04:00
Ricardo-M-L
d5525e8bbb
fix: check find() return value before adding offset in try_fix_tokenizer (#4923)
* fix: check find() return value before adding offset in try_fix_tokenizer

The `str.find()` result was checked for -1 only after adding
`len(find_text)`, turning the guard into dead code. When the substring
is absent, `start` becomes `len(find_text) - 1` (a positive number),
so the `if start == -1: continue` never triggers and the subsequent
slice extracts garbage from the tokenizer string.

Split the find and offset into two steps so the -1 check works correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add defensive guards for token_id None and end find() returning -1

- Skip loop iteration early when token_id is None to avoid constructing
  a find_text that can never match valid JSON
- Guard end = tokenizer_string.find('",', start) against -1 to prevent
  silent garbage extraction from malformed tokenizer strings
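
Both guards follow the same pattern, sketched here on a standalone
helper. `extract_after` and its arguments are illustrative and are not
the code in try_fix_tokenizer.

```python
def extract_after(text, find_text, terminator='",'):
    """Return the text between find_text and terminator, or None."""
    start = text.find(find_text)
    # Check for -1 before adding the offset; afterwards it is too late.
    if start == -1:
        return None
    start += len(find_text)
    end = text.find(terminator, start)
    if end == -1:
        return None
    return text[start:end]
```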

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-09 06:15:46 -07:00
Lee Jackson
dc16e0c65b
Studio: keep chat input visible and fix compare pane clipping (#4924)
* fix(chat): sticky composer bar in thread

* fix(chat): fix compare pane clipping

* fix(chat): tighten scroll-to-bottom placement and compare footer spacing

* Fix TypeScript build break and clean up ViewportFooter classes

- Remove unused `compact` prop from ThreadScrollToBottom call site
  (component is FC with no props, passing it caused TS2322)
- Extract shared classes (sticky, bottom-0, z-20, bg-transparent) from
  ternary branches into the unconditional className string
- Restore `relative` on normal-mode footer so the inner absolute
  bg-background strip has a positioning context
- Remove redundant md:pb-3 / md:pb-4 (same value as base pb-3 / pb-4)
- Remove no-op `sticky bottom-0` from SharedComposer wrapper in both
  LoraCompareContent and GeneralCompareContent (flex layout with
  shrink-0 already pins it at the bottom; parent has no scrollable
  overflow for sticky to bind to)
- Fix truncated comment on pointer-events rationale

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-09 06:00:56 -07:00
kiankyars
ad5972492d
Fix raw text paragraph break normalization (#4884)
* Fix raw text paragraph break normalization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalize horizontal whitespace before stripping non-ASCII and collapse leftover doubles

Run the [^\S\n]+ horizontal-whitespace collapse before the non-ASCII strip
so that Unicode whitespace (\u00A0, \u202F, \u2009, \u3000, \v, \f, etc.)
becomes a single ASCII space instead of being deleted outright. The prior
ordering silently merged adjacent words on HTML/PDF/OCR-sourced text:
"hello\u00a0world" used to produce "helloworld" after this PR; it now
produces "hello world".

Also drop \t from the allow-list since the horizontal-whitespace collapse
already normalizes tabs to a single space, and add a targeted [ ]{2,} pass
right after the non-ASCII strip so that a non-whitespace non-ASCII character
sitting between two spaces ("word1 \u00a9 word2") does not leave an interior
double space. Without this extra pass, clean_text was not idempotent on
such inputs: the first call produced "word1  word2" and only the second
call collapsed it to "word1 word2". Fuzz testing over 10000 random inputs
now satisfies the idempotence invariant in every case.
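
The ordering described above can be sketched as three regex passes. This is an illustration only: `clean_text_sketch` is a hypothetical name, and the non-ASCII pattern is an assumption rather than the allow-list the real `clean_text` uses.

```python
import re


def clean_text_sketch(text: str) -> str:
    # 1. Collapse every run of non-newline whitespace (including NBSP,
    #    thin space, tab, form feed) into one ASCII space.
    text = re.sub(r"[^\S\n]+", " ", text)
    # 2. Strip the remaining non-ASCII characters (assumed pattern).
    text = re.sub(r"[^\x00-\x7F]", "", text)
    # 3. Collapse double spaces left behind by step 2.
    text = re.sub(r"[ ]{2,}", " ", text)
    return text


once = clean_text_sketch("word1 \u00a9 word2")
twice = clean_text_sketch(once)
```

Running step 1 before step 2 is what keeps "hello\u00a0world" from merging into one word, and step 3 is what makes a second call a no-op.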

* Add regression tests for Unicode/control whitespace and non-ASCII edge cases

Cover:
- Unicode horizontal whitespace separators (NBSP, narrow NBSP, thin space,
  en/em space, ideographic space, vertical tab, form feed) normalizing to
  a single ASCII space instead of being deleted.
- Mixed paragraph + Unicode whitespace realistic input ("Section\u00a01\r\n\r\nBody\ftext\u202Fhere").
- Tab collapsing and space trimming around newlines.
- Non-whitespace non-ASCII characters (copyright, accented letters, emoji)
  sitting between spaces: must not leave an interior double space, and
  clean_text must be idempotent on these inputs.
- Non-ASCII characters adjacent to a newline: stripping must not leave
  stray leading or trailing spaces on the neighbouring line, and must not
  swallow an adjacent paragraph break.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-09 04:45:43 -07:00
cheehook
7aa442289b
Fix Mistral DPO/preference training crash on non-xformers platforms (e.g. Intel XPU) (#4889)
* Fix Mistral training crash when xformers is unavailable

* Adjust Mistral DPO training crash fix for PR #4889

- Clarify comment in MistralForCausalLM_fast_forward: the DPO embed-masking
  block runs BEFORE attention_mask is nulled out, and it is the consumer that
  requires a 2D mask.
- Add defensive attention_mask.ndim == 2 guard to the LlamaModel_fast_forward
  DPO embed-masking block so it self-protects if a 4D mask ever reaches it.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-09 04:38:44 -07:00
Daniel Han
da2ef6dce6
Only run ldconfig CUDA-linking recovery when we have permission (#4930)
* Only run ldconfig CUDA-linking recovery when we have permission

When `import unsloth` runs on a non-root environment (shared HPC,
locked-down container, CI runner, etc.) the CUDA-linking recovery path
shells out to `os.system("ldconfig /usr/lib64-nvidia")`, which fails
loudly with "Permission denied". It's especially noisy for users who
don't even have bitsandbytes installed - they're doing 16bit or full
finetuning and the line immediately above told them "16bit and full
finetuning works!". The reason the recovery runs at all in that case
is that `bnb.functional.lib.cdequantize_blockwise_fp32` raises
AttributeError on `bnb is None`, the bare `except:` swallows it, and
the code drops into the recovery unconditionally.

Fix: gate the recovery body on `os.geteuid() == 0`. When we don't
have permission to run ldconfig, silently skip the recovery. When we
do, the recovery runs UNCHANGED - same `os.system()` calls, same
reload + retry, same warnings. `libcuda_dirs()` is used by both triton
and bitsandbytes, so we still want to run the recovery whenever we
have permission, regardless of whether bnb is installed.

For non-root users who DO have bitsandbytes installed and broken,
emit a single remediation warning telling them how to fix it manually
(`sudo ldconfig /usr/lib64-nvidia`). This preserves the diagnostic
guidance from the original code without the Permission denied noise.

Scope:
- Only the `DEVICE_TYPE == "cuda"` branch is touched.
- The `hip` (AMD ROCm) and `xpu` (Intel) branches are unchanged.
- On a real CUDA box running as root, behavior is byte-identical to
  main: same os.system() calls, same reload, same retry, same warnings.
  AST-verified by /tmp/verify_minimal/verify.py.
- `hasattr(os, "geteuid")` guards against Windows where `os.geteuid`
  doesn't exist.
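
A minimal sketch of the permission gate described above. `can_run_ldconfig` is a hypothetical helper name; the commit only states that the recovery body is gated on `os.geteuid() == 0` behind a `hasattr` check.

```python
import os


def can_run_ldconfig() -> bool:
    # os.geteuid does not exist on Windows, so probe for it first.
    if not hasattr(os, "geteuid"):
        return False
    # Effective UID 0 means root, which ldconfig needs to update the cache.
    return os.geteuid() == 0


if can_run_ldconfig():
    print("would run the ldconfig recovery")
else:
    print("skipping ldconfig recovery (no permission)")
```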

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-09 00:07:25 -07:00
dependabot[bot]
5fa8683b27
build(deps): bump the bun-frontend group across 1 directory with 16 updates (#4586)
* build(deps): bump the bun-frontend group across 1 directory with 16 updates

Bumps the bun-frontend group with 16 updates in the /studio/frontend directory:

| Package | From | To |
| --- | --- | --- |
| [@dagrejs/dagre](https://github.com/dagrejs/dagre) | `2.0.4` | `3.0.0` |
| [@dagrejs/graphlib](https://github.com/dagrejs/graphlib) | `3.0.4` | `4.0.1` |
| @hugeicons/core-free-icons | `3.3.0` | `4.0.0` |
| [@streamdown/cjk](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-cjk) | `1.0.2` | `1.0.3` |
| [@streamdown/code](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-code) | `1.0.2` | `1.1.1` |
| [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) | `0.577.0` | `1.6.0` |
| [recharts](https://github.com/recharts/recharts) | `3.7.0` | `3.8.0` |
| [shadcn](https://github.com/shadcn-ui/ui/tree/HEAD/packages/shadcn) | `3.8.5` | `4.1.0` |
| [streamdown](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown) | `2.3.0` | `2.5.0` |
| [@biomejs/biome](https://github.com/biomejs/biome/tree/HEAD/packages/@biomejs/biome) | `1.9.4` | `2.4.8` |
| [@eslint/js](https://github.com/eslint/eslint/tree/HEAD/packages/js) | `9.39.4` | `10.0.1` |
| [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) | `24.12.0` | `25.5.0` |
| [eslint](https://github.com/eslint/eslint) | `9.39.4` | `10.1.0` |
| [eslint-plugin-react-refresh](https://github.com/ArnaudBarre/eslint-plugin-react-refresh) | `0.4.26` | `0.5.2` |
| [globals](https://github.com/sindresorhus/globals) | `16.5.0` | `17.4.0` |
| [typescript](https://github.com/microsoft/TypeScript) | `5.9.3` | `6.0.2` |



Updates `@dagrejs/dagre` from 2.0.4 to 3.0.0
- [Release notes](https://github.com/dagrejs/dagre/releases)
- [Changelog](https://github.com/dagrejs/dagre/blob/master/changelog.md)
- [Commits](https://github.com/dagrejs/dagre/compare/v2.0.4...v3.0.0)

Updates `@dagrejs/graphlib` from 3.0.4 to 4.0.1
- [Release notes](https://github.com/dagrejs/graphlib/releases)
- [Changelog](https://github.com/dagrejs/graphlib/blob/master/changelog.md)
- [Commits](https://github.com/dagrejs/graphlib/compare/v3.0.4...v4.0.1)

Updates `@hugeicons/core-free-icons` from 3.3.0 to 4.0.0

Updates `@streamdown/cjk` from 1.0.2 to 1.0.3
- [Release notes](https://github.com/vercel/streamdown/releases)
- [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-cjk/CHANGELOG.md)
- [Commits](https://github.com/vercel/streamdown/commits/@streamdown/cjk@1.0.3/packages/streamdown-cjk)

Updates `@streamdown/code` from 1.0.2 to 1.1.1
- [Release notes](https://github.com/vercel/streamdown/releases)
- [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-code/CHANGELOG.md)
- [Commits](https://github.com/vercel/streamdown/commits/@streamdown/code@1.1.1/packages/streamdown-code)

Updates `lucide-react` from 0.577.0 to 1.6.0
- [Release notes](https://github.com/lucide-icons/lucide/releases)
- [Commits](https://github.com/lucide-icons/lucide/commits/1.6.0/packages/lucide-react)

Updates `recharts` from 3.7.0 to 3.8.0
- [Release notes](https://github.com/recharts/recharts/releases)
- [Changelog](https://github.com/recharts/recharts/blob/main/CHANGELOG.md)
- [Commits](https://github.com/recharts/recharts/compare/v3.7.0...v3.8.0)

Updates `shadcn` from 3.8.5 to 4.1.0
- [Release notes](https://github.com/shadcn-ui/ui/releases)
- [Changelog](https://github.com/shadcn-ui/ui/blob/main/packages/shadcn/CHANGELOG.md)
- [Commits](https://github.com/shadcn-ui/ui/commits/shadcn@4.1.0/packages/shadcn)

Updates `streamdown` from 2.3.0 to 2.5.0
- [Release notes](https://github.com/vercel/streamdown/releases)
- [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown/CHANGELOG.md)
- [Commits](https://github.com/vercel/streamdown/commits/streamdown@2.5.0/packages/streamdown)

Updates `@biomejs/biome` from 1.9.4 to 2.4.8
- [Release notes](https://github.com/biomejs/biome/releases)
- [Changelog](https://github.com/biomejs/biome/blob/main/packages/@biomejs/biome/CHANGELOG.md)
- [Commits](https://github.com/biomejs/biome/commits/@biomejs/biome@2.4.8/packages/@biomejs/biome)

Updates `@eslint/js` from 9.39.4 to 10.0.1
- [Release notes](https://github.com/eslint/eslint/releases)
- [Commits](https://github.com/eslint/eslint/commits/v10.0.1/packages/js)

Updates `@types/node` from 24.12.0 to 25.5.0
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node)

Updates `eslint` from 9.39.4 to 10.1.0
- [Release notes](https://github.com/eslint/eslint/releases)
- [Commits](https://github.com/eslint/eslint/compare/v9.39.4...v10.1.0)

Updates `eslint-plugin-react-refresh` from 0.4.26 to 0.5.2
- [Release notes](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/releases)
- [Changelog](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/blob/main/CHANGELOG.md)
- [Commits](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/compare/v0.4.26...v0.5.2)

Updates `globals` from 16.5.0 to 17.4.0
- [Release notes](https://github.com/sindresorhus/globals/releases)
- [Commits](https://github.com/sindresorhus/globals/compare/v16.5.0...v17.4.0)

Updates `typescript` from 5.9.3 to 6.0.2
- [Release notes](https://github.com/microsoft/TypeScript/releases)
- [Commits](https://github.com/microsoft/TypeScript/compare/v5.9.3...v6.0.2)

---
updated-dependencies:
- dependency-name: "@dagrejs/dagre"
  dependency-version: 3.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@dagrejs/graphlib"
  dependency-version: 4.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@hugeicons/core-free-icons"
  dependency-version: 4.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@streamdown/cjk"
  dependency-version: 1.0.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: bun-frontend
- dependency-name: "@streamdown/code"
  dependency-version: 1.1.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: lucide-react
  dependency-version: 1.6.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: recharts
  dependency-version: 3.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: shadcn
  dependency-version: 4.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: streamdown
  dependency-version: 2.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: "@biomejs/biome"
  dependency-version: 2.4.8
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@eslint/js"
  dependency-version: 10.0.1
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@types/node"
  dependency-version: 25.5.0
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: eslint
  dependency-version: 10.1.0
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: eslint-plugin-react-refresh
  dependency-version: 0.5.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: globals
  dependency-version: 17.4.0
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: typescript
  dependency-version: 6.0.2
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
...

Signed-off-by: dependabot[bot] <support@github.com>

* Revert dagrejs upgrades

Keep @dagrejs/dagre at ^2.0.4 and @dagrejs/graphlib at ^3.0.4.

* Revert biome, eslint, typescript, and recharts upgrades

These upgrades break studio/frontend locally:

- @biomejs/biome 2.4.10 fails to parse the existing biome.json
  (files.ignore and organizeImports keys removed in v2; schema
  version mismatch).
- typescript 6.0.2 emits TS5101 on tsconfig.app.json baseUrl
  ("Option 'baseUrl' is deprecated and will stop functioning in
  TypeScript 7.0"), so tsc -b exits 2.
- eslint 10.2.0 conflicts with eslint-plugin-react-hooks@7.0.1,
  which peers on eslint ^9; npm install fails with ERESOLVE.
- recharts 3.8.1 widened LegendPayload.dataKey to include a
  function type, which breaks the React key={item.dataKey} usage
  in src/components/ui/chart.tsx (TS2322).

Hold these at their current pinned versions until the upstream
peer deps and config migrations are ready.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-08 04:34:33 -07:00
Wasim Yousef Said
8e977445d4
Let recipes use the model loaded in Chat (#4840)
* feat: inject local model provider into recipe jobs via JWT

* feat: auto-generate JWT for local model providers in recipes

* feat: add is_local flag to model provider config types and utils

* fix(studio): skip endpoint validation for local providers

* feat(studio): add local/external model source toggle to provider dialog

* feat(studio): thread localProviderNames through model config dialog chain

* feat(studio): show 'Local model (Chat)' label for local model_provider configs

* fix: hardcode loopback for local endpoint, clear stale creds on toggle

* fix: document TOCTOU/JWT rotation, add deferred import comments, fix is_local serialization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(studio): clear stale local model state on provider toggle and validation

* fix(studio): override empty local endpoint in validation and skip model gate for unused providers

* fix(studio): resolve loopback port from app.state, clear stale local provider fields, sync model id on toggle

Address review feedback on the local-model-provider flow:

- Backend (jobs.py): _resolve_local_v1_endpoint now reads the actual bound
  port from app.state.server_port (set in run.py after binding) instead of
  parsing it out of request.base_url, which is wrong behind any reverse
  proxy or non-default port. The two duplicated urlparse blocks are gone.
- Backend (jobs.py): defensively pop api_key_env, extra_headers, extra_body
  from local providers so a previously external provider that flipped to
  local cannot leak invalid JSON or rogue auth headers into the local /v1
  call. Also dedupe the post-loop assignment and tighten the local-name
  intersection so empty names cannot match.
- Backend (jobs.py): hoist datetime and urllib.parse imports to the top
  import block for consistency with the rest of the file.
- Backend (run.py): expose the bound port on app.state.server_port after
  the uvicorn server is constructed.
- Frontend (model-provider-dialog.tsx): clear extra_headers and extra_body
  when toggling to local mode. Hidden inputs would otherwise keep stale
  JSON blocking validate/run.
- Frontend (model-config-dialog.tsx): factor the local-aware provider
  selection logic into applyProviderChange and call it from both
  onValueChange and onBlur, so manually typing a provider name and tabbing
  away keeps the model field consistent.
- Frontend (recipe-studio.ts store): handle both directions of the
  is_local toggle in the cascade. external -> local now backfills
  model: "local" on already-linked model_configs so they pass validation
  immediately, mirroring the existing local -> external clear path.
- Frontend (validate.ts + build-payload.ts): thread localProviderNames
  into validateModelConfigProviders and skip the "model is required"
  check for local-linked configs. Local providers do not need a real
  model id since the inference endpoint uses the loaded Chat model.
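
The defensive cleanup of local providers might look roughly like the sketch below. The function name and the dict shape are assumptions for illustration; only the three popped keys and the `is_local` flag come from the commit text.

```python
def sanitize_local_provider(provider: dict) -> dict:
    # Work on a copy so the caller's payload is left untouched.
    cleaned = dict(provider)
    if cleaned.get("is_local") is True:
        # Drop fields that only make sense for external endpoints, so stale
        # values from a previously external provider cannot leak through.
        for key in ("api_key_env", "extra_headers", "extra_body"):
            cleaned.pop(key, None)
    return cleaned


stale = {
    "name": "chat",
    "is_local": True,
    "api_key_env": "OLD_KEY",
    "extra_headers": "{not json",
}
cleaned = sanitize_local_provider(stale)
```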

* fix(studio): narrow store cascade types, sync model placeholder on graph relink and node removal, harden ephemeral port path

Loop 2 review fixes:

- recipe-studio.ts: type-narrow next.is_local by also checking
  next.kind === "model_provider". TS otherwise raised TS2339 because
  next was typed as the union NodeConfig after the spread. The behavior
  is unchanged but the code now compiles cleanly.
- model-config-dialog.tsx: convert the lastProviderRef / providerInputRef
  ref-during-render pattern (pre-existing react-hooks/refs lint error)
  to a useEffect that syncs providerInputRef from config.provider. The
  combobox blur path still uses applyProviderChange and remains stable.
- recipe-graph-connection.ts: when a graph drag links a model_provider
  to a model_config, mirror the dialog applyProviderChange behavior:
  fill model: "local" if the new provider is local and the model field
  is blank, clear model when relinking from a local placeholder to an
  external provider, otherwise leave the model alone.
- reference-sync.ts: when a referenced provider node is removed, clear
  the synthetic model: "local" placeholder along with the provider
  field, so a future relink to an external provider does not pass
  validation with a stale value that fails at runtime.
- run.py: only publish app.state.server_port when the bound port is a
  real positive integer; for ephemeral binds (port==0) leave it unset
  and let request handlers fall back to request.base_url.
- jobs.py: _resolve_local_v1_endpoint also falls back when
  app.state.server_port is non-positive, and uses `is None` instead of
  the truthy fallback so a literal 0 is handled correctly.
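
A rough sketch of the port-resolution order described in these two bullets. `resolve_local_port` and its parameters are hypothetical; the real `_resolve_local_v1_endpoint` reads from `app.state` and the request object, which are replaced here by plain arguments.

```python
from typing import Optional
from urllib.parse import urlparse


def resolve_local_port(state_port: Optional[int], base_url: str) -> Optional[int]:
    # Explicit `is None` check plus a positivity check, so that a literal 0
    # (ephemeral bind) is treated as "not published" on purpose.
    if state_port is not None and state_port > 0:
        return state_port
    # Fall back to the port carried by the request URL (may be None).
    return urlparse(base_url).port


behind_proxy = resolve_local_port(8888, "http://proxy.example/")
ephemeral = resolve_local_port(0, "http://127.0.0.1:9000/")
```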

* fix(studio): strict is_local check, narrow loaded-model gate to LLM-reachable configs, add scope-server port fallback

Loop 3 review fixes:

- jobs.py, validate.py: require `is_local is True` instead of truthy
  check. Malformed payloads such as is_local: "false" or is_local: 1
  would otherwise be treated as local and silently rewritten to the
  loopback endpoint.
- jobs.py: _resolve_local_v1_endpoint now tries request.scope["server"]
  (the actual uvicorn-assigned (host, port) tuple) as a second
  resolution step before falling back to parsing request.base_url.
  This covers direct-uvicorn startup paths and ephemeral binds that
  never publish app.state.server_port.
- jobs.py: new _used_llm_model_aliases helper collects the set of
  model_aliases that an LLM column actually references, and the
  "Chat model loaded" gate is now only triggered when a local
  provider is reachable from that set. Orphan model_config nodes on
  the canvas no longer block unrelated recipe runs.
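
To illustrate why the strict `is_local is True` check matters, here is a small sketch. The helper names and payload shape are assumptions; only the identity check itself is taken from the commit.

```python
def is_local_truthy(provider: dict) -> bool:
    # Old behaviour: any truthy value counts, including the string "false".
    return bool(provider.get("is_local"))


def is_local_strict(provider: dict) -> bool:
    # New behaviour: only the boolean True is accepted.
    return provider.get("is_local") is True


malformed = [{"is_local": "false"}, {"is_local": 1}]
truthy_results = [is_local_truthy(p) for p in malformed]
strict_results = [is_local_strict(p) for p in malformed]
```

Both malformed payloads pass the truthy check but are rejected by the strict one.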

* fix(studio): force skip_health_check on local-linked configs, skip JSON parsing for local providers, local-aware inline editor

Loop 4 review fixes:

- jobs.py: after rewriting local providers, also force
  skip_health_check: true on any model_config linked to a local
  provider. The /v1/models endpoint only advertises the real loaded
  model id, so data_designer's default model-availability health check
  would otherwise fail against the placeholder "local" id before the
  first chat completion call. The inference route already ignores the
  model id in chat completions, so skipping the check is safe.
- builders-model.ts: buildModelProvider now short-circuits for local
  providers and emits only { name, endpoint: "", provider_type, is_local }
  without running parseJsonObject on the hidden extra_headers/extra_body
  inputs. Imported or hydrated recipes with stale invalid JSON in those
  fields no longer block client-side validate/run.
- inline-model.tsx: the model_config branch now accepts an optional
  localProviderNames prop and mirrors the dialog applyProviderChange
  behavior. Changing provider to/from a local one auto-fills or clears
  the "local" placeholder consistently with the other edit paths.
- recipe-graph-node.tsx: derive localProviderNames from the store via
  useMemo (stable identity) and pass it through renderNodeBody to
  <InlineModel>. Hooks order is preserved by declaring them above the
  early return for markdown_note nodes.
- run.py: minor comment tweak. Loop 3 already added the scope-server
  fallback path, so the comment now notes it.
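
The `skip_health_check` rewrite from the first bullet could be sketched as follows. The function name and the `provider` key are assumptions about the payload shape; the commit only states that configs linked to a local provider get `skip_health_check: true`.

```python
def force_skip_health_check(model_configs, local_provider_names):
    # Ignore empty names so a blank provider field can never match.
    local = {name for name in local_provider_names if name}
    updated = []
    for cfg in model_configs:
        cfg = dict(cfg)  # copy; do not mutate the caller's payload
        if cfg.get("provider") in local:
            cfg["skip_health_check"] = True
        updated.append(cfg)
    return updated


configs = [
    {"alias": "a", "provider": "chat", "model": "local"},
    {"alias": "b", "provider": "openai", "model": "some-model"},
]
result = force_skip_health_check(configs, ["chat", ""])
```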

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <info@unsloth.ai>
2026-04-08 03:48:22 -07:00
Daniel Han
c3d2d58046
Update dependabot.yml (#4915)
2026-04-08 03:39:50 -07:00
dependabot[bot]
0087515d5c
build(deps): bump oxc-parser (#4776)
Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser).


Updates `oxc-parser` from 0.121.0 to 0.123.0
- [Release notes](https://github.com/oxc-project/oxc/releases)
- [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md)
- [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.123.0/napi/parser)

---
updated-dependencies:
- dependency-name: oxc-parser
  dependency-version: 0.123.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: npm-oxc-validator
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 03:35:40 -07:00
dependabot[bot]
67e9db4921
build(deps): bump oxc-parser (#4776)
Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser).


Updates `oxc-parser` from 0.121.0 to 0.123.0
- [Release notes](https://github.com/oxc-project/oxc/releases)
- [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md)
- [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.123.0/napi/parser)

---
updated-dependencies:
- dependency-name: oxc-parser
  dependency-version: 0.123.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: npm-oxc-validator
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 03:35:33 -07:00
pre-commit-ci[bot]
c2184af079
[pre-commit.ci] pre-commit autoupdate (#4879)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.8 → v0.15.9](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.8...v0.15.9)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-07 22:50:48 -07:00
Roland Tannous
f801e59c29
split venv_t5 into tiered 5.3.0/5.5.0 and fix trust_remote_code (#4878)
* split venv_t5 into venv_t5_530 and venv_t5_550 for tiered transformers 5.x support

* fix bfloat16 crash on T4 for FORCE_FLOAT32 models and disable trust_remote_code auto-enable for native t5 models

* revert FORCE_FLOAT32 dtype change

* restrict trust_remote_code auto-enable to Nemotron models only

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use config.json model_type for tier detection, add unsloth/nvidia namespace guard

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit fb43d468e2.

* Revert "use config.json model_type for tier detection, add unsloth/nvidia namespace guard"

This reverts commit fc49ae2453.

* add unsloth/nvidia namespace guard to Nemotron trust_remote_code auto-enable

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reorder tier checks: all substring matches before config.json fetches

* extract shared activate_transformers_for_subprocess into transformers_version.py

* narrow Nemotron trust_remote_code to nemotron_h/nemotron-3-nano, add to export worker

* clean venv_t5 dirs before re-install in setup.sh, clarify version alias comment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* run venv_t5 migration outside deps fast-path gate in both setup scripts

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-07 20:05:01 +04:00
Daniel Han
1d8160376e
Bump minimum unsloth version to 2026.4.4 in install scripts (#4876)
2026-04-06 09:46:35 -07:00
Daniel Han
b295daf932
Update _utils.py
2026-04-06 09:39:06 -07:00
Lee Jackson
8c89b84bb6
Studio: Fix empty chat threads on navigation and stabilize new chat flow (#4872)
* fix(chat): prevent implicit empty thread creation and stabilize new-chat flow

* fix(chat): harden compare thread sync and simplify sidebar thread query

* fix(chat): harden new-thread state sync and isolate compare active thread updates

* fix(chat): stabilize new-thread state sync and prevent compare/session bleed

* Fix thread restoration, handleNewThread guard, sidebar filter, and delete flow

- Remove __LOCALID_ filter from getInitialSingleChatView: in this
  Dexie-backed adapter, AUI's __LOCALID_ prefixed IDs ARE the real
  persistent thread IDs stored by initialize(). Filtering them out
  breaks thread restoration on navigation.

- Simplify handleNewThread to synchronous: the async Dexie message
  check is redundant (persistence is already deferred to first append)
  and strands users on legacy empty threads. Use a simple guard that
  checks the store's activeThreadId to detect unsent drafts.

- Add message-count filter to sidebar: filter threads to only show
  those with at least one message, hiding legacy empty threads.

- Add store-based sidebar highlighting fallback: use activeThreadId
  from the store when view.threadId is not set (nonce-backed chats).

- Fix handleDelete to call onNewThread() instead of onSelect(), and
  clear activeThreadId, so the runtime properly resets after deleting
  the active thread.

* Fix handleDelete nonce path and restore __LOCALID_ filter

handleDelete was calling onNewThread() after clearing activeThreadId,
but the handleNewThread guard sees !view.threadId && !activeThreadId
and returns early, leaving the UI stuck on the deleted thread.
Fix by directly calling onSelect with a new nonce instead.

Restore __LOCALID_ filter in getInitialSingleChatView to prevent
restoring unpersisted AUI local thread IDs on navigation. Without
this filter, navigating away from /chat before sending a message
would restore a non-existent thread that Dexie cannot fetch.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-06 09:32:54 -07:00
Daniel Han
4c83e3540e
Update
2026-04-06 09:20:17 -07:00
Daniel Han
723bfb2363
Add unit tests for HfFileSystem glob skip guard (#4854)
Tests verifying that HfFileSystem().glob() is correctly skipped when
is_model or is_peft is False, matching the guard added in PR #4852.
2026-04-06 08:54:36 -07:00
JYYYYYT
aa4c6010e1
fix(studio): custom folder scan fails to find GGUF variants when pointing directly at a model directory (#4860)
Fix custom folder scanning when pointing directly at a model directory.

When a user adds a custom scan folder that points directly at a model
directory (e.g. /path/to/gemma-4-e2b-it-gguf/ containing config.json
and gemma-4-E2B-it-BF16.gguf), the model list previously showed
individual .gguf files as separate entries instead of recognizing the
directory as a single model. Clicking any entry showed "No GGUF
variants found" because list_local_gguf_variants received a file path
and immediately returned empty.

Changes:
- Add _is_model_directory() helper that detects directories with both
  config metadata and actual model weight files (excludes mmproj GGUFs
  and non-weight .bin files like tokenizer.bin)
- _scan_models_dir: detect self-model and return single directory entry
- _scan_lmstudio_dir: surface model directories directly instead of
  descending into them as publisher folders; handle both root and child
  model directories
- Add _resolve_gguf_dir() helper for GGUF path resolution that only
  falls back to parent directory when parent has model metadata
- list_local_gguf_variants / _find_local_gguf_by_variant: use resolver
  so .gguf file paths inside model directories work correctly
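
A rough sketch of what a model-directory check along these lines could look like. The function name, the `config.json` test, and the set of weight suffixes are assumptions; the actual `_is_model_directory()` may use different metadata files and rules.

```python
from pathlib import Path

# Assumed set of weight-file suffixes, for illustration only.
WEIGHT_SUFFIXES = {".gguf", ".safetensors"}


def is_model_directory(path: Path) -> bool:
    if not path.is_dir():
        return False
    # Config metadata must be present alongside the weights.
    has_config = (path / "config.json").is_file()
    # mmproj GGUFs are projector files, not model weights, so skip them.
    has_weights = any(
        f.is_file()
        and f.suffix.lower() in WEIGHT_SUFFIXES
        and "mmproj" not in f.name.lower()
        for f in path.iterdir()
    )
    return has_config and has_weights
```

Requiring both conditions keeps a folder of loose `.gguf` files, or a folder with only tokenizer assets, from being reported as a single model.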
2026-04-06 08:31:07 -07:00
Roland Tannous
0835f0a61b
fix: skip redundant HfFileSystem().glob() calls in loader.py (#4852)
* fix: skip redundant HfFileSystem().glob() calls in loader.py

Guard the SUPPORTS_LLAMA32 glob blocks with `is_model and is_peft` so
the HfFileSystem HTTP call is only made when both configs could actually
exist. This prevents indefinite hangs on slow/unreliable networks since
the glob result is redundant when either AutoConfig or PeftConfig
already failed to load.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove test file from main PR - moved to separate PR

Tests for the glob skip guard belong in their own PR to keep
the loader change minimal and reviewable.

* Harden HfFileSystem glob: fix Windows path splitting, add try/except

- Use str.rsplit("/", 1) instead of os.path.split to extract filenames
  from HfFileSystem paths. HfFileSystem always returns POSIX-style paths,
  but os.path.split uses the OS separator, so on Windows the entire path
  was returned as the "filename" and the config name comparison always
  failed.
- Wrap the HfFileSystem().glob() call in try/except to gracefully handle
  network failures (offline mode, timeouts, unreachable Hub). On failure
  both_exist stays False, which is the safe default.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove redundant HfFileSystem().glob() call for remote repos

When is_model and is_peft are both True, AutoConfig and PeftConfig
have already loaded successfully, proving both config.json and
adapter_config.json exist. The HfFileSystem network call to re-verify
this was redundant and could cause hangs on slow networks.

Replace the glob + try/except block with a direct both_exist = True
assignment.

* Remove unused HfFileSystem import

HfFileSystem was only used for the glob() calls that were replaced
with direct both_exist = True assignments in the previous commit.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-06 07:46:39 -07:00
Daniel Han
07b6fcc344
Remove Gemma-4 from FORCE_FLOAT32 (#4875)
Gemma-4 does not need FORCE_FLOAT32. Testing shows that both float16 and
bfloat16 work correctly without the forced float32 override:

- Inference: identical outputs for float16 and bfloat16 (greedy decoding)
- Training (100 steps, 4-bit LoRA, SFT on FineTome-100k):
  - float16 final loss: 3.048
  - bfloat16 final loss: 3.065
  - Losses converge to within 0.02 by step 60
  - Grad norms healthy and comparable for both dtypes

The FORCE_FLOAT32 path was actually causing training divergence. With
it enabled, the compiled float32 run diverged at step ~28 with grad norms
collapsing to near zero and loss plateauing at ~12.4. Without it, both
dtypes train normally.

This enables float16 on Tesla T4 and other GPUs without bfloat16 support.
2026-04-06 07:33:28 -07:00
Daniel Han
ab65b47c73
Add tests for is_vision_model() caching behaviour (#4855)
* Add tests for is_vision_model() caching behaviour

* Fix review feedback: remove dead helper, fix exception test

- Remove unused _make_config() helper function (dead code)
- Fix test_exception_result_cached to actually exercise the exception path
  by mocking load_model_config to raise OSError instead of using
  side_effect=[False] which only tested normal False returns

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use strict mock specs so tests exercise intended detection paths

Use MagicMock(spec=[]) for all config mocks so hasattr() only returns
True for explicitly set attributes. Without this, MagicMock defaults
make all hasattr checks truthy, allowing tests to pass via unintended
detection paths (e.g. img_processor instead of vision_config).
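
The effect of the strict spec can be illustrated with a small sketch. The helper name `present_attrs` is hypothetical and only exists for this illustration; the attribute names come from the message above.

```python
from unittest.mock import MagicMock


def present_attrs(config, names=("vision_config", "img_processor")):
    """Return the attribute names that hasattr() reports on the mock."""
    return [name for name in names if hasattr(config, name)]


# A default MagicMock fabricates attributes on access, so every check is truthy.
loose = MagicMock()

# spec=[] declares no attributes; only explicitly assigned ones exist.
strict = MagicMock(spec=[])
strict.vision_config = object()
```

With the loose mock both names are reported, so a test could pass through either detection path; with the strict mock only `vision_config` is visible.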

---------

Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-06 06:41:40 -07:00
Roland Tannous
278f462996
[Studio][Optimization] Add vision detection cache to is_vision_model() (#4853)
* Add vision detection cache to is_vision_model() to avoid redundant subprocess spawns

is_vision_model() is called 4-5 times per training run for the same model
with zero caching. For transformers 5.x models, each call spawns a full
subprocess (~6s each). This adds a module-level _vision_detection_cache dict
following the same pattern as the existing _audio_detection_cache used by
detect_audio_type(). The function is refactored into a thin cache wrapper
around _is_vision_model_uncached(), saving ~12s per training run.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Include hf_token in vision cache key for gated model correctness

Cache key is now (model_name, hf_token) instead of just model_name.
This prevents stale False results when an unauthenticated probe for a
gated model is followed by an authenticated call.

* Remove test file from main PR - will be submitted separately

* Fix vision cache: normalize model names and skip caching transient failures

- Normalize model names in cache key using resolve_cached_repo_id_case()
  to avoid duplicate entries for different casings of the same HF repo
  (aligns with case normalization from #4822)
- Return None instead of False on transient failures (network errors,
  subprocess timeouts, HF API issues) so the cache layer can distinguish
  "definitely not a vision model" from "failed to check"
- Only cache definitive True/False results; transient failures are retried
  on the next call instead of being permanently locked in as False

* Refine failure handling: cache deterministic failures, guard normalization

- Subprocess non-zero exit, JSON errors, and general exceptions return
  False (deterministic, cached) instead of None (retryable). Only
  subprocess.TimeoutExpired returns None since timeouts are transient.
- Wrap cache key normalization in try/except so resolve_cached_repo_id_case
  or normalize_path failures fall back to raw model_name instead of
  crashing callers.

* Harden vision detection cache: fix transient failure handling, thread safety, token security

- All subprocess failure paths now return None (transient) instead of False,
  preventing permanent misclassification of VLMs after temporary HF/auth/network errors
- Use SHA256 fingerprint for hf_token in cache key instead of raw bearer token
- Add threading.Lock with double-checked locking to prevent thundering herd
  of concurrent subprocess spawns for the same uncached model
- Distinguish permanent failures (RepositoryNotFoundError, GatedRepoError,
  ValueError) from transient ones in _is_vision_model_uncached
- Pass resolved/normalized model name to detection (not just cache key)
- Log normalization fallback at debug level instead of silent swallow
- Thread hf_token through callers in routes/models.py and trainer.py
  that previously omitted it

* Refine lock strategy and token fingerprint

- Move detection computation outside the lock to avoid serializing
  long-running subprocess spawns (60s timeout) and HF API calls across
  all concurrent model checks. Lock is now only held for cache writes.
- Use full SHA256 digest for token fingerprint instead of truncated
  16-char prefix to eliminate collision risk.

* Fix huggingface_hub import fallback and use atomic cache read

- Add fallback import path for RepositoryNotFoundError/GatedRepoError
  from huggingface_hub.utils (older hub versions) when .errors is
  not available
- Use sentinel-based dict.get() for single atomic cache read instead
  of two-step in/[] pattern (future-proof for no-GIL runtimes)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-06 06:41:20 -07:00
Leo Borcherding
68965988cf
Fix/studio colab button message: Add fallback message for Colab Studio button when proxy URL fails (#4866)
* Add fallback message for Colab Studio button when localhost link doesn't work

* Make fallback message darker grey for better readability

* Make fallback message bold for better visibility

---------

Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>
2026-04-05 21:57:45 -07:00
Daniel Han
6100867447
Bump minimum unsloth version to 2026.4.2 in install scripts (#4842) 2026-04-03 15:14:28 -07:00
Daniel Han
170c4b9b99 Update _utils.py 2026-04-03 15:02:14 -07:00
Daniel Han
4020a70a93
Add tests for cache case resolution (from PR #4822) (#4823)
Tests for resolve_cached_repo_id_case and get_model_config case
resolution, separated from the runtime changes in PR #4822.
2026-04-03 13:58:26 -07:00
Daniel Han
4f65cc94bc
Add Gemma 4 model sampling defaults (#4838)
Add per-model YAML configs and MODEL_NAME_MAPPING entries for all 8
Gemma 4 models (4 instruct + 4 base):
- gemma-4-31B-it / gemma-4-31B
- gemma-4-26B-A4B-it / gemma-4-26B-A4B
- gemma-4-E2B-it / gemma-4-E2B
- gemma-4-E4B-it / gemma-4-E4B

GGUF variants (only for -it models) resolve via the gemma-4 family
entry in inference_defaults.json.

Sampling defaults: temperature=1.0, top_p=0.95, top_k=64, min_p=0.0,
no repetition or presence penalty. Matches gemma-3n and gemma-3.
2026-04-03 13:57:15 -07:00
Daniel Han
a32b871f0e
studio: add speculative decoding support (ngram-mod, on by default) (#4836)
* studio: add speculative decoding support (ngram-mod, on by default)

Enable n-gram speculative decoding for GGUF models in Unsloth Studio.
Uses llama.cpp's ngram-mod mode which gives 10-40% faster generation
with zero VRAM cost via a 4MB fixed hash table that auto-resets on
low acceptance rates.

Backend:
- Add speculative_type field to LoadRequest, LoadResponse, and
  InferenceStatusResponse pydantic models
- Add speculative_type parameter to LlamaCppBackend.load_model()
  with allowlist validation (ngram-simple, ngram-mod)
- Pass --spec-type, --spec-ngram-size-n 16, --draft-max 24 flags
  to llama-server when ngram-mod is active
- Default to ngram-mod for non-vision GGUF models server-side
- Silently skip speculative decoding for vision models (unsupported
  in llama.cpp server-context.cpp)

Frontend:
- Add speculative_type to TS API types
- Add speculativeType/loadedSpeculativeType to chat runtime store
  with default value of "ngram-mod"
- Add On/Off toggle in Model settings section (GGUF only, hidden
  for vision models), included in dirty check for Apply/Reset
- Wire speculative_type through model load request and response
- Restore speculative type state on page refresh/reconnect

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: remove server-side speculative decoding override

The backend was overriding speculative_type=None to "ngram-mod" for
non-vision GGUF models, which prevented users from disabling spec
decoding via the UI toggle. The frontend store already defaults to
"ngram-mod", so the backend fallback was redundant and blocked the
explicit "Off" setting.

* fix: use recommended ngram-mod params from llama.cpp docs

Update speculative decoding params to match the recommended values
from llama.cpp docs (docs/speculative.md):
  --spec-ngram-size-n 24 (was 16, docs say small n not recommended)
  --draft-min 48 (was 0)
  --draft-max 64 (was 24, docs note MoEs need long drafts)

Also fix comment: ngram-mod uses ~16 MB (4M entries * 4 bytes),
not 4 MB.
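
The flag handling described across these commits could be sketched as follows. The function name `build_spec_args` is hypothetical; the flag names and values are the ones quoted in the messages above, and reading "4M entries" as 4 * 2**20 is an assumption.

```python
ALLOWED_SPEC_TYPES = frozenset({"ngram-simple", "ngram-mod"})

# Assumption: "4M entries * 4 bytes" read as 4 * 2**20 entries of 4 bytes each.
NGRAM_TABLE_BYTES = 4 * 1024 * 1024 * 4


def build_spec_args(speculative_type, is_vision):
    """Return extra llama-server arguments for speculative decoding."""
    if speculative_type is None or is_vision:
        return []  # "Off", or a vision model where it is skipped
    if speculative_type not in ALLOWED_SPEC_TYPES:
        raise ValueError(f"unsupported speculative_type: {speculative_type!r}")
    args = ["--spec-type", speculative_type]
    if speculative_type == "ngram-mod":
        args += [
            "--spec-ngram-size-n", "24",
            "--draft-min", "48",
            "--draft-max", "64",
        ]
    return args
```

Returning an empty list for `None` keeps the explicit "Off" setting from the UI meaningful, which is the behaviour the server-side override removal was meant to restore.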

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add benchmark table and references to speculative decoding comment

Include speedup numbers from llama.cpp PRs #18471 and #19164 as an
inline comment so future readers understand the expected gains.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-03 13:56:59 -07:00
Daniel Han
2c73ab7871
fix(studio): harden sandbox security for terminal and python tools (#4827)
* fix(studio): harden sandbox security for terminal and python tools

The existing command blocklist used naive str.split() which is trivially
bypassable via quoting, full paths, nested shells, variable expansion,
and cross-tool pivoting through Python os.system/subprocess. Fixes #4818.

Changes:
- Replace str.split() blocklist with shlex.split() + os.path.basename()
  tokenization and regex scanning at shell command boundaries
- Add sanitized subprocess environment (_build_safe_env) that strips
  credentials (HF_TOKEN, WANDB_API_KEY, GH_TOKEN, AWS_*, etc.) and
  restricts PATH to /usr/local/bin:/usr/bin:/bin
- Add PR_SET_NO_NEW_PRIVS via prctl on Linux so sudo/su/pkexec fail
  at the kernel level regardless of how they are invoked
- Add RLIMIT_NPROC (256) and RLIMIT_FSIZE (100MB) to prevent fork
  bombs and disk filling attacks
- Extend AST safety checker to detect os.system(), os.popen(),
  subprocess.run/Popen/call/check_output, os.exec*, os.spawn* calls
  containing blocked commands or dynamic (non-literal) arguments
- Add cross-platform support: cmd.exe on Windows, bash on Unix;
  CREATE_NO_WINDOW flag on Windows, preexec_fn on Unix
- Expand blocklist from 7 to 14 commands: add su, chown, passwd,
  mount, umount, fdisk, kill, killall, pkill
- Apply all layers to both _bash_exec and _python_exec

Zero measurable performance overhead -- shlex parsing and a single
prctl syscall per subprocess fork.
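
To illustrate why the tokenization change matters, here is a rough sketch comparing the two approaches. The blocklist below is an illustrative subset, and `naive_blocked` / `find_blocked` are hypothetical names, not the functions in the Studio code.

```python
import os
import shlex

# Illustrative subset only; the real blocklist is longer.
BLOCKED = frozenset({"sudo", "su", "rm", "kill", "mount"})


def naive_blocked(command):
    """Old style: whitespace split, exact token match."""
    return {tok for tok in command.split() if tok in BLOCKED}


def find_blocked(command):
    """New style: shell-aware split, then compare basenames."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: treat as unsafe rather than guessing.
        return {"<unparseable>"}
    return {os.path.basename(tok) for tok in tokens} & BLOCKED
```

`naive_blocked` misses both `/usr/bin/sudo whoami` and `'sudo' whoami`, while `find_blocked` reports `sudo` for each. A quoted argument such as `echo "kill this process"` stays one token and is not flagged.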

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix review findings: exception_catching dead code, false positives, process substitution

- Include exception_catching reasons in _check_code_safety so bare
  except-in-loop timeout evasion is actually blocked (was computed in
  _check_signal_escape_patterns but never read by the caller)
- Remove base.split() inner loop that caused false positives on quoted
  text arguments containing blocked words (e.g. echo "kill this process")
- Add targeted nested shell detection for bash/sh/zsh -c arguments
  instead, which catches bash -c 'sudo whoami' without false positives
- Add <() process substitution to the regex character class so
  diff <(rm -rf /path) is also caught
- Fix error message to say "unsafe patterns" instead of specifically
  mentioning signal manipulation when other categories trigger

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review feedback: regex paths, keyword args, list element scanning

- Regex now matches blocked commands after optional path prefix at shell
  boundaries (catches ls; /usr/bin/sudo and similar)
- Nested shell detection uses os.path.basename so bash -c "/bin/rm" is
  caught
- AST checker now inspects keyword arguments (not just positional) so
  subprocess.run(args="sudo ...", shell=True) is detected
- List elements in subprocess calls are now checked via
  _find_blocked_commands for consistency (catches subprocess.run(["bash",
  "-c", "rm -rf /"]))
- Dynamic argument check uses _is_safe_literal that validates list
  contents are all string literals

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix nested shell scan to only check the script body, not positional args

bash -c 'script' arg0 arg1 -- only tokens[i+1] is the script body;
subsequent tokens are $0, $1 positional parameters passed to the script
and are not executed as shell commands. Scanning all remaining tokens
caused false positives.

* Add subshell parentheses to regex command boundary detection

(sudo whoami) was not caught because ( was not in the regex character
class for shell command boundaries. Add ( to the set alongside ;, &,
|, backtick, newline.
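
A boundary-anchored pattern of the kind discussed in these review rounds might look like the sketch below. The exact regex in the Studio code is not shown in the log, so this pattern, the word list, and the name `find_boundary_commands` are assumptions for illustration.

```python
import re

BLOCKED = ("sudo", "su", "rm", "kill")  # illustrative subset

# A blocked word counts only when it starts a command: at the start of the
# string or right after ; & | backtick newline or "(". An optional path
# prefix such as /usr/bin/ is allowed before the word.
_BOUNDARY_RE = re.compile(
    r"(?:^|[;&|`\n(])\s*(?:\S*/)?("
    + "|".join(re.escape(word) for word in BLOCKED)
    + r")(?=\s|$|[;&|`)])"
)


def find_boundary_commands(command):
    """Return blocked command names found at shell command boundaries."""
    return _BOUNDARY_RE.findall(command)
```

Because `(` is in the boundary class, both subshells like `(sudo whoami)` and process substitution like `diff <(rm -rf /path)` are matched, while a blocked word inside a quoted argument is not.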

* Address high-priority review findings from 7 parallel reviewers

- Track from-imports of dangerous functions (from os import system,
  from subprocess import run as r, etc.) via shell_exec_aliases dict
  so bare-name calls are detected by the AST checker
- Include the active Python interpreter and virtualenv directories
  in the sanitized PATH so pip, uv, and Studio packages remain
  accessible in the sandbox
- Add Windows-specific blocked commands (rmdir, takeown, icacls,
  runas, powershell, pwsh) only on win32 platform
- Add os.posix_spawn and os.posix_spawnp to _SHELL_EXEC_FUNCS
- Handle tuple literals same as list literals in AST argument
  inspection (both _extract_strings_from_list and _is_safe_literal)

* Fix false positive on check=True kwargs and recursive nested shell scanning

- Only inspect command-carrying keyword arguments (args, command,
  executable, path, file) in the AST checker, not control flags like
  check=True, text=True, capture_output=True which are booleans and
  were incorrectly flagged as non-literal dynamic arguments
- Replace split() in nested shell detection with recursive call to
  _find_blocked_commands so that quoted commands (bash -c '"sudo"
  whoami') and semicolons (bash -c "sudo;ls") within nested shells
  are properly detected through the full shlex + regex pipeline

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Move preexec_fn imports to module level and use find_library for libc

Addresses two Gemini review findings:

1. preexec_fn thread safety: _sandbox_preexec previously imported ctypes
   and resource inside the function body, which runs between fork() and
   exec() in the child process. In a multi-threaded server, this could
   deadlock if the import machinery locks were held by another thread at
   fork time. Now all imports and the libc handle are resolved once at
   module load time, so _sandbox_preexec only calls C-level functions
   (prctl, setrlimit) with no Python import activity.

2. Hardcoded libc.so.6 path: replaced with ctypes.util.find_library("c")
   which works on glibc (libc.so.6), musl (libc.musl-*.so.1), and other
   Linux distributions where libc has a different soname.

* Apply Gemini style suggestions: combined regex, dict.fromkeys, constant hoisting

- Combine per-word regex loop into a single re.findall with alternation
  pattern, avoiding repeated regex compilation and searching
- Replace manual dedup loop with dict.fromkeys for PATH entries
- Hoist _CMD_KWARGS frozenset out of visit_Call to avoid recreating it
  on every AST node visit

* Add cmd /c nested shell detection for Windows parity

The nested shell scan only checked for Unix shells (bash -c, sh -c, etc).
Add cmd /c and cmd.exe /c detection so that Windows nested shell
invocations are also recursively scanned for blocked commands. The token
scan already catches blocked commands at any position, so this is
defense-in-depth for consistency across platforms.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Handle combined shell flags (-lc, -xc) and interleaved flags (--login -c)

The nested shell scan only matched token == "-c" with the immediately
preceding token being a shell name. This missed:
- Combined flags: bash -lc 'rm ...' (-lc ends with c, is a valid
  combined flag meaning -l -c)
- Interleaved flags: bash --login -c 'sudo ...' (--login sits between
  bash and -c)

Now matches any short flag ending in 'c' (e.g. -lc, -xc, -ic) and
walks backwards past intermediate flags to find the shell binary.
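
A sketch of the flag matching described here, assuming a POSIX command line. The helper name `nested_shell_scripts` and the shell set are hypothetical; only tokens that start with `-` are skipped when walking backwards, so a path such as `/bin/bash` is still recognised as the shell.

```python
import os
import shlex

SHELLS = frozenset({"bash", "sh", "zsh"})


def nested_shell_scripts(command):
    """Return the script bodies passed to a nested shell via -c."""
    tokens = shlex.split(command)
    scripts = []
    for i, tok in enumerate(tokens):
        # Short flag ending in "c": -c, -lc, -xc, -ic (but not --long flags).
        is_c_flag = tok.startswith("-") and not tok.startswith("--") and tok.endswith("c")
        if not is_c_flag or i + 1 >= len(tokens):
            continue
        # Walk backwards past intermediate flags to find the shell binary.
        j = i - 1
        while j >= 0 and tokens[j].startswith("-"):
            j -= 1
        if j >= 0 and os.path.basename(tokens[j]) in SHELLS:
            # Only the token right after the flag is the script body.
            scripts.append(tokens[i + 1])
    return scripts
```

Each returned script body would then be fed back through the same blocked-command scan, so quoting and separators inside the nested script get the full treatment.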

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix /bin/bash bypass, remove RLIMIT_NPROC, reduce AST false positives

Addresses three high-consensus findings from 20-reviewer pass:

1. /bin/bash -c 'sudo whoami' bypassed nested shell scan because the
   backwards flag-skip logic treated paths starting with / as flags.
   Now only skips tokens starting with - as Unix flags; on Windows
   only skips short /X flags (not /bin/bash style paths). [9/20]

2. RLIMIT_NPROC=256 caused subprocess.run to fail with EAGAIN because
   Linux enforces NPROC per real UID, not per process tree. Removed
   RLIMIT_NPROC entirely; RLIMIT_FSIZE and PR_SET_NO_NEW_PRIVS remain
   as the primary resource and privilege controls. [5/20]

3. AST checker rejected safe dynamic subprocess usage like
   cmd=["git","status"]; subprocess.run(cmd) as shell_escape_dynamic.
   Now only flags dynamic args for shell-string functions (os.system,
   os.popen, subprocess.getoutput, etc.) or when shell=True is
   explicitly set. List-based subprocess calls with shell=False (the
   default) do not pass through a shell and are not flagged. [12/20]
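
The narrowed rule in item 3 can be approximated with a short AST walk. This is a simplified sketch, not the Studio checker: `find_dynamic_shell_calls` is a hypothetical name, only `module.func(...)` call shapes are handled, and any `shell=` value other than a literal `False` is conservatively treated as enabling the shell.

```python
import ast

SHELL_STRING_FUNCS = {("os", "system"), ("os", "popen"), ("subprocess", "getoutput")}


def _call_name(node):
    func = node.func
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return (func.value.id, func.attr)
    return None


def _shell_enabled(node):
    for kw in node.keywords:
        if kw.arg == "shell":
            is_literal_false = isinstance(kw.value, ast.Constant) and kw.value.value is False
            return not is_literal_false
    return False  # shell=False is the subprocess default


def find_dynamic_shell_calls(source):
    """Flag shell-executing calls whose command is not a string literal."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _call_name(node)
        if name is None or not node.args:
            continue
        uses_shell = name in SHELL_STRING_FUNCS or (
            name[0] == "subprocess" and _shell_enabled(node)
        )
        first = node.args[0]
        is_literal = isinstance(first, ast.Constant) and isinstance(first.value, str)
        if uses_shell and not is_literal:
            flagged.append(f"{name[0]}.{name[1]}")
    return flagged
```

Under this rule `subprocess.run(cmd)` with a list variable passes, while `os.system(cmd)` and `subprocess.run(cmd, shell=True)` are flagged. Literal command strings would still go through the separate blocked-command scan.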

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Handle Windows drive letter paths and .exe extensions in command detection

Gemini review found that Windows absolute paths (C:\Windows\System32\
shutdown.exe) and executable extensions (.exe, .com, .bat, .cmd) were
not handled:

- Token scan now strips .exe/.com/.bat/.cmd extensions before checking
  the blocklist, so sudo.exe matches sudo, shutdown.bat matches shutdown
- Regex pattern now includes optional Windows drive letter prefix
  ([a-zA-Z]:[/\\]) and optional executable extension suffix, so commands
  after shell metacharacters with full Windows paths are also caught

* Handle **kwargs dict expansion, non-literal shell=, and except Exception false positive

Addresses three findings from second 20-reviewer pass:

1. **kwargs dict expansion (9/20): subprocess.run(**{"args": "rm ...",
   "shell": True}) bypassed the AST checker because **kwargs were
   treated as opaque. Now expands literal dict **kwargs to inspect
   their keys, and flags opaque **kwargs (variable dicts) as unsafe.

2. Non-literal shell= values (7/20): shell=variable was treated as
   shell=False (safe). Now any shell= value that is not literally
   False is treated as potentially True (conservative default).

3. except Exception false positive (1/20): except Exception in a loop
   was flagged as timeout evasion, but Exception does not catch
   SystemExit or KeyboardInterrupt which are used for timeout
   enforcement. Narrowed to only flag except BaseException and
   except TimeoutError in loops.
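
The reasoning in item 3 follows from Python's built-in exception hierarchy, which a small check makes concrete. The helper name is hypothetical.

```python
def caught_by_except_exception(exc_type):
    """True if a bare `except Exception:` clause would catch exc_type."""
    return issubclass(exc_type, Exception)


# SystemExit and KeyboardInterrupt derive from BaseException directly,
# so `except Exception` lets them propagate.
escapes = [t.__name__ for t in (SystemExit, KeyboardInterrupt)
           if not caught_by_except_exception(t)]

# TimeoutError is an ordinary Exception subclass (via OSError).
timeout_is_caught = caught_by_except_exception(TimeoutError)
```

This is why `except BaseException` in a loop can swallow the signals used for timeout enforcement, while `except Exception` cannot.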

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-03 13:33:42 -07:00
Neodon
c027ec192e
fix(studio): ensure first chat tool call starts in session sandbox (#4810)
Fixes #4809

On a new Studio chat, the first tool call could start before the frontend
initializes the thread ID. That meant the first request could go out without
a session_id, so the backend started the tool in the shared sandbox root
instead of the chat's session sandbox.

Frontend:
- Eagerly initialize the thread when switching to a new chat
- Resolve the thread ID once at request time and keep it stable through
  async model-load waits
- Disable ActiveThreadSync during new-chat initialization to prevent
  stale thread IDs from being written back
- Add error handling for thread initialization failures
- Clear activeThreadId on all compare-mode entry paths to prevent
  cross-session leakage
- Fix exitCompare to restore context usage from the saved view
- Coerce falsy thread IDs to undefined for consistent backend/frontend
  fallback behavior
- Use _default as the image sessionId fallback to match the backend

Backend:
- Use ~/studio_sandbox/_default when a request arrives without a session_id
2026-04-03 11:44:22 -07:00
Lee Jackson
a29b4e23fd
studio: reuse HF cached repo casing to prevent duplicate downloads (#4822)
* fix(studio): reuse HF cached repo casing to prevent duplicate downloads

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Move cache case resolution tests to separate PR

Tests for resolve_cached_repo_id_case and get_model_config case resolution
belong in their own PR to keep this change focused on the runtime fix.

* fix(studio): debug-log HF_HUB_CACHE fallback in path_utils

* Fix stale memoization in resolve_cached_repo_id_case

- Check exact-case path before memo to ensure a newly-appeared exact
  match always wins over a previously memoized variant
- Validate memoized entries still exist on disk before returning them
  to prevent stale results when cache dirs are deleted/recreated

* Minor cleanups for cache case resolution

- Use .is_dir() instead of .exists() for exact-case cache check
  (cache entries are always directories)
- Remove redundant fallback in _detect_audio_from_tokenizer since
  get_cache_path already handles case resolution and returns None
  when the model is not cached
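
The lookup order described in these commits (exact case first, then a case-insensitive scan) could be sketched as follows. `resolve_cached_repo_dir` is a hypothetical helper that returns the cache directory rather than a repo id, memoization is left out, and the `models--org--name` directory naming follows the Hugging Face hub cache layout.

```python
from pathlib import Path


def resolve_cached_repo_dir(cache_root, repo_id):
    """Find the cache directory for repo_id, ignoring case if needed."""
    cache_root = Path(cache_root)
    wanted = "models--" + repo_id.replace("/", "--")
    exact = cache_root / wanted
    if exact.is_dir():  # an exact-case match always wins
        return exact
    if not cache_root.is_dir():
        return None
    for entry in sorted(cache_root.iterdir()):
        if entry.is_dir() and entry.name.lower() == wanted.lower():
            return entry  # reuse the casing that is already on disk
    return None
```

Reusing the on-disk casing means a request for `org/model` resolves to an existing `Org/Model` download instead of triggering a second one.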

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-03 05:48:24 -07:00
Wasim Yousef Said
50dede11cc
Allow non-LLM recipes to run and move Data tab first in executions (#4805)
* feat: allow non-LLM recipes to run without provider block

* feat: reorder execution tabs and add generation-aware data tab empty state

* fix: add accessibility attrs to data tab spinner and use literal ellipsis

* fix(studio): use shared spinner, stub provider, and hide unused LLM metrics

Backend: inject stub model provider for sampler-only recipes so
DataDesigner init does not reject empty provider lists.

Frontend: use shared Spinner component, hide LLM columns metric
and model usage card when recipe has no LLM columns.

* Fix tab reset and terminal auto-scroll regressions for PR #4805

Reset detailTab to "data" when switching between executions so
the Data tab default is applied consistently, not only on first
mount. Also add detailTab to the terminal scroll effect deps so
auto-scroll-to-bottom fires when the user opens the Overview tab
after landing on Data.

* Guard terminal scroll reset to only fire on Overview tab

The previous scroll effect ran on every tab switch, which could
reset the user's manual scroll position if they scrolled up in
the terminal and briefly switched tabs. Now the scroll-to-bottom
and sticky-bottom reset only fires when navigating to the
Overview tab.

* Use None for stub provider api_key instead of literal string

The stub ModelProvider that satisfies the DataDesigner registry
for non-LLM recipes should not carry a fake credential string.
Using None avoids sending an Authorization header if the provider
is ever inadvertently invoked.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-03 05:37:26 -07:00
Wasim Yousef Said
5b7c0615f3
feat(studio): differentiate web search and URL fetch in chat tool UI (#4802)
Differentiate web_search query searches from URL fetches in the Studio chat UI.

Backend (llama_cpp.py):
- Emit "Reading: hostname" for URL fetches and "Searching: query" for query searches in SSE status events
- Only show hostname for valid http/https URLs; schemeless/non-http URLs get "Reading page..." generic fallback
- Strip www. prefix for consistency with the frontend

Frontend (tool-ui-web-search.tsx):
- Tool card shows "Read hostname" / "Reading hostname..." for URL fetches
- Shows "Searched query" / "Searching for query..." for query searches
- Uses new URL() with protocol check; falls back to "Read page" / "Reading page..." for non-http URLs
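
A minimal sketch of the backend labelling rules listed above, using only the standard library. The function name `status_label` and its signature are hypothetical; the label strings are the ones quoted in the message.

```python
from urllib.parse import urlparse


def status_label(url=None, query=None):
    """Build the status text for a web_search tool call."""
    if url is None:
        return f"Searching: {query}"
    parsed = urlparse(url)
    # Only trust the hostname of a real http/https URL.
    host = parsed.hostname if parsed.scheme in ("http", "https") else None
    if not host:
        return "Reading page..."
    if host.startswith("www."):
        host = host[len("www."):]
    return f"Reading: {host}"
```

A schemeless input such as `example.com/a` parses with an empty scheme, so it takes the generic fallback rather than showing a guessed hostname.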
2026-04-03 05:03:27 -07:00
Daniel Han
8981e6c804
Update test_pr4562_bugfixes.py for simplified install policy (#4817)
- Add TestFetchJsonRetries for JSON retry logic and max_pages
- Update TestSourceCodePatterns for simplified --simple-policy flow
- Add tests for installed prebuilt release reporting
- Add test for CUDA toolkit version-sorted nvcc discovery
- Remove assertions for removed --resolve-install-tag / --resolve-source-build paths
2026-04-03 04:06:14 -07:00
DoubleMathew
ac562bac66
Fix/llama.cppbuilding (#4804)
* Simplify llama.cpp install logic

* print release tag

* Retry failed json decode

* don't pull all ggml releases

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove test file changes from main PR

Test changes for test_pr4562_bugfixes.py will be submitted in a separate PR to keep this PR focused on the install path simplification.

* Fix setup.sh executable bit and direct tag lookup for pinned releases

- Restore setup.sh file mode to 100755 (was accidentally changed to 100644)
- Add direct GitHub API tag lookup in iter_release_payloads_by_time for
  non-latest requested tags (e.g. b7879) instead of relying on paginated
  release scans that may miss older releases beyond the 5-page limit
- Update stale DEFAULT_PUBLISHED_REPO comment to match new value

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix force-compile default ref and remove dead code in setup.ps1

- Change FORCE_COMPILE_DEFAULT_REF from "main" to "master" in all three
  files (install_llama_prebuilt.py, setup.sh, setup.ps1) since
  ggml-org/llama.cpp uses "master" as its default branch, not "main".
  Using "main" would cause git clone --branch to fail when
  UNSLOTH_LLAMA_FORCE_COMPILE=1 with UNSLOTH_LLAMA_TAG=latest.
- Remove dead if ($SkipPrebuiltInstall) block inside the else branch of
  setup.ps1 that could never be reached (the outer elseif already
  handles $SkipPrebuiltInstall=true).
- Maintain setup.sh executable bit (100755).

* Improve iter_release_payloads_by_time error handling for direct tag lookup

When a pinned release tag is not found (HTTP 404), fall through to the
paginated release scan instead of silently returning empty results.
Non-404 errors (network failures, rate limits) are propagated to the
caller so users get actionable error messages.
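
The lookup order and error handling described for pinned tags might be sketched like this. It is not the code in install_llama_prebuilt.py: the function below only shares the general shape of iter_release_payloads_by_time, the injectable `fetch` parameter is an addition for testability, and the URLs follow the public GitHub REST API for releases.

```python
import json
import urllib.error
import urllib.request

API = "https://api.github.com"


def fetch_json(url):
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_release_payloads(repo, tag, fetch=fetch_json, max_pages=5):
    """Yield release payloads, trying a direct tag lookup first."""
    if tag != "latest":
        try:
            release = fetch(f"{API}/repos/{repo}/releases/tags/{tag}")
        except urllib.error.HTTPError as exc:
            if exc.code != 404:
                raise  # rate limits and server errors reach the caller
            # 404: tag not found directly, fall through to the paginated scan
        else:
            yield release
            return
    for page in range(1, max_pages + 1):
        releases = fetch(f"{API}/repos/{repo}/releases?per_page=100&page={page}")
        if not releases:
            return
        yield from releases
```

Propagating non-404 errors keeps a rate-limit response from looking like "no releases found", which would otherwise send the installer down the wrong path without explanation.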

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-03 00:34:20 -07:00
Michael Han
c1685b9459
Gemma 4 update.md 2026-04-02 22:54:03 -07:00
Manan Shah
a7e6964117
Fix/gemma4 install script (#4815)
* transformers 5.5.0 has now been released

* fallback for python < 3.10
2026-04-02 22:03:35 -07:00
Roland Tannous
6644a771b4
fix: patch PEFT for Gemma4ClippableLinear in loader checkpoint path (fixes export) (#4807)
* fix: patch PEFT for Gemma4ClippableLinear in loader checkpoint path

The same Gemma4ClippableLinear monkey-patch that exists in vision.py
for training is needed in loader.py for loading existing checkpoints
(used by export and inference).

Gemma4ClippableLinear wraps nn.Linear but does not subclass it, so
PEFT's LoRA injection fails with "Target module not supported".
The patch redirects PEFT to target the inner .linear child instead.

Applied only to the vision model PeftModel.from_pretrained path.
Temporary fix until PEFT adds native support (peft#3129).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: wrap ClippableLinear patch in try/finally to always restore

Ensures _create_and_replace is restored even if PeftModel.from_pretrained
raises, preventing leaked global state across subsequent model loads.
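
The restore-on-failure pattern can be shown in isolation. This is a generic sketch, not the loader code: `temporarily_patched` is a hypothetical helper, and in the real change the patched attribute is PEFT's `_create_and_replace`.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_patched(obj, attr, replacement):
    """Swap obj.attr for replacement and always put the original back."""
    original = getattr(obj, attr)
    setattr(obj, attr, replacement)
    try:
        yield
    finally:
        # Runs on success and on exceptions alike, so no patch leaks.
        setattr(obj, attr, original)
```

Any exception raised inside the `with` block still propagates to the caller; the `finally` clause only guarantees that later model loads see the unpatched attribute.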

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-03 04:03:54 +04:00
Roland Tannous
f91ef8f9b0
fix(studio): lazy-import transformers in model_config to fix 5.x version switch (#4806)
* fix(studio): lazy-import AutoConfig in model_config.py to fix transformers 5.x version switch

Move `from transformers import AutoConfig` from module level to inside
load_model_config() where it is actually used.

model_config.py is transitively imported at module load time via:
  core/inference/__init__ → llama_cpp → utils.models → model_config

In inference subprocesses (mp.spawn), this chain runs before
_activate_transformers_version() can prepend .venv_t5/ to sys.path.
The eager import caches transformers 4.57.6 in sys.modules, and the
subsequent sys.path change has no effect — Python always checks
sys.modules before sys.path.

Making the import lazy ensures transformers is not loaded until after
version activation, so the subprocess picks up the correct version.
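
The sys.modules behaviour behind this fix can be reproduced with a throwaway module. The module name `fakelib_demo` and the function `demo_cached_import` are made up for this sketch and stand in for transformers and the two virtualenv paths.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_cached_import():
    """Show that an early import pins the version despite later sys.path edits."""
    with tempfile.TemporaryDirectory() as old_dir, tempfile.TemporaryDirectory() as new_dir:
        (Path(old_dir) / "fakelib_demo.py").write_text("VERSION = 'old'\n")
        (Path(new_dir) / "fakelib_demo.py").write_text("VERSION = 'new'\n")
        sys.path.insert(0, old_dir)
        try:
            importlib.invalidate_caches()
            eager = importlib.import_module("fakelib_demo").VERSION
            # "Version activation" happens too late: the module is cached.
            sys.path.insert(0, new_dir)
            after_switch = importlib.import_module("fakelib_demo").VERSION
            # Only a fresh import (nothing cached yet) sees the new path.
            del sys.modules["fakelib_demo"]
            importlib.invalidate_caches()
            fresh = importlib.import_module("fakelib_demo").VERSION
            return eager, after_switch, fresh
        finally:
            sys.modules.pop("fakelib_demo", None)
            for path in (old_dir, new_dir):
                if path in sys.path:
                    sys.path.remove(path)
```

Deferring the import into the function that needs it has the same effect as the "fresh" case here: nothing is cached before the path switch, so the first import resolves against the updated sys.path.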

* fix(studio): also lazy-import extract_model_size_b in llama_cpp.py

Belt-and-suspenders: make the import that originally triggered the
chain lazy as well, so future module-level AutoConfig additions in
utils.models cannot reintroduce the problem.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-03 02:56:01 +04:00
Daniel Han
e553a8ad0b
fix(studio): suppress fatal error when prebuilt manifest is missing (#4799)
When DEFAULT_PUBLISHED_REPO is ggml-org/llama.cpp, the prebuilt
resolver raises PrebuiltFallback because ggml-org releases do not
include a llama-prebuilt-manifest.json asset. This was caught by the
generic Exception handler and printed as "fatal helper error" to
stderr, which triggers NativeCommandError on PowerShell.

Catch PrebuiltFallback separately in the top-level __main__ handler
and exit with EXIT_FALLBACK (code 2) instead of EXIT_ERROR (code 1).
The message is still logged but without the "fatal helper error"
prefix. The shell scripts already handle non-zero exits and fall
back to source builds.
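
A minimal sketch of the handler split described above. `PrebuiltFallback`, `EXIT_FALLBACK`, and `EXIT_ERROR` are the names from this commit message; `run_helper` and its structure are hypothetical and only illustrate the ordering of the `except` clauses.

```python
import sys

EXIT_OK = 0
EXIT_ERROR = 1
EXIT_FALLBACK = 2


class PrebuiltFallback(Exception):
    """Raised when no usable prebuilt exists and a source build should run."""


def run_helper(main) -> int:
    """Run `main` and translate exceptions into process exit codes."""
    try:
        main()
    except PrebuiltFallback as exc:
        # Expected condition: log it, but without the "fatal helper error" prefix.
        print(str(exc), file=sys.stderr)
        return EXIT_FALLBACK
    except Exception as exc:
        # Anything else is a genuine failure.
        print(f"fatal helper error: {exc}", file=sys.stderr)
        return EXIT_ERROR
    return EXIT_OK
```

The specific handler has to come before the generic `Exception` handler, otherwise the fallback case is reported as fatal again.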

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-02 12:18:11 -07:00
Daniel Han
8ffd5826f2 Gemma-4 2026-04-02 11:59:37 -07:00
Daniel Han
934478ae31
fix(studio): revert llama.cpp default tag to latest (#4797)
* fix(studio): revert llama.cpp default tag to latest

The latest ggml-org/llama.cpp release (b8637) now includes Gemma 4
support. Revert the temporary "b8637" pin from #4796 to "latest" so
the prebuilt resolver always picks the newest release automatically
without needing manual tag bumps.

* docs: add comment explaining latest vs master for llama.cpp tag

Document in all three files why "latest" is preferred over "master"
and when "master" should be used as a temporary override.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-02 11:52:37 -07:00
Daniel Han
401621618b
fix(studio): don't set trust_remote_code for Gemma 4 training (#4795)
Gemma 4 is a native transformers 5.5 model and does not need
trust_remote_code=True. The auto-enable logic (added for NemotronH)
was catching all transformers 5.x models, including Gemma 4.

When trust_remote_code=True, unsloth_compile_transformers() returns
early without running the compiler. This disables the fused cross
entropy patch, causing logged training loss to be inflated by the
gradient_accumulation_steps factor.

Exclude models matching "gemma-4" or "gemma4" from the auto-enable
so the compiler runs and applies fused cross entropy correctly.
2026-04-02 11:44:26 -07:00
Daniel Han
8d1712b4ea
fix(studio): pin llama.cpp to b8637 release (Gemma 4 support) (#4796)
ggml-org/llama.cpp b8637 includes Gemma 4 support (ggml-org/llama.cpp#21309).
Revert the temporary "master" default back to a pinned release tag.

This eliminates the HTTP 422 errors from the prebuilt resolver (which
could not find a release matching "master"), avoids unnecessary source
builds, and restores prebuilt binary downloads on all platforms.

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-02 11:43:53 -07:00
DoubleMathew
7ae9b7f45f
fix windows llama.cpp compile from source issue (#4793)
* fix windows llama.cpp compile from source issue

* undo local repo usage

* fix llama.cpp install

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix windows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: route resolve-source-build call through Invoke-LlamaHelper

The --resolve-source-build call at the source-build resolution path
was still calling install_llama_prebuilt.py directly instead of going
through Invoke-LlamaHelper. On PS7+ with ErrorActionPreference=Stop,
stderr from the 422 response (when tag is "master") would trigger a
terminating NativeCommandError and crash setup.

* fix: suppress stderr error records from Invoke-LlamaHelper

ErrorActionPreference=Continue prevents termination but PowerShell
still displays stderr lines as visible ErrorRecord objects. Capture
all output via 2>&1 and split stdout from stderr manually so that
stderr lines never appear on the console. When StderrPath is given
the stderr content is written to that file for diagnostics.

* fix: always rebuild llama.cpp on Windows when tag is master

When the requested llama.cpp tag is "master" (a moving target), skip
the "already built" early exit so the build path runs and syncs to
the latest commit. Without this, existing llama-server binaries from
an older build (e.g. b8635 which lacks Gemma 4 support) are reused
and model loading fails.

Pinned tags (e.g. b8635) still skip the rebuild when the binary
already exists, since the tag is immutable.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-02 11:43:46 -07:00
Daniel Han
7023e2a4ff
fix(studio): prioritize curated defaults over HF download ranking in Recommended (#4792)
The model list merge order was `top_gguf + top_hub + static_models`,
which meant the HF download-ranked models always came first. New models
like Gemma 4 have low download counts and were not in the HF top-40,
so they got buried after 80 other models despite being at the top of
the curated static defaults in defaults.py.

Flip the merge to `static_models + top_gguf + top_hub` so editorial
picks (new model launches, promoted models) always appear first in the
Recommended section, with HF popularity backfilling after.
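
The effect of the flipped merge order can be sketched as follows. The function name `merge_recommended` and the first-occurrence de-duplication are assumptions made for illustration; the commit message only states the concatenation order.

```python
def merge_recommended(
    static_models: list[str],
    top_gguf: list[str],
    top_hub: list[str],
) -> list[str]:
    """Curated defaults first, then download-ranked lists as backfill."""
    merged: list[str] = []
    seen: set[str] = set()
    for model_id in [*static_models, *top_gguf, *top_hub]:
        if model_id not in seen:  # keep the first (highest-priority) position
            seen.add(model_id)
            merged.append(model_id)
    return merged
```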

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-02 10:46:53 -07:00
Roland Tannous
0446d46689
fixed name (#4791) 2026-04-02 21:04:42 +04:00
Daniel Han
1ce83c40aa
fix(studio): build llama.cpp from master instead of latest release tag (#4790)
The latest ggml-org/llama.cpp release (b8635) does not include Gemma 4
support (ggml-org/llama.cpp#21309 merged after the release was cut).
This causes `llama-server` to fail with "unknown model architecture:
gemma4" when loading Gemma 4 GGUFs.

Temporarily default _DEFAULT_LLAMA_TAG to "master" so all new installs
build from the llama.cpp master branch which includes Gemma 4 support.
Once a new upstream release is cut with Gemma 4, this can be reverted
back to "latest".

Changes:
- setup.sh: add _DEFAULT_LLAMA_TAG="master" maintainer default
- setup.ps1: add $DefaultLlamaTag="master" maintainer default
- install_llama_prebuilt.py: change DEFAULT_LLAMA_TAG fallback to "master"

Users can still override via UNSLOTH_LLAMA_TAG env var.
2026-04-02 09:45:56 -07:00
Daniel Han
2af53bf9a6
Pin transformers and huggingface-hub in main Studio venv (#4788)
Revert the >= loosening from f9c4b08 back to exact pins.
Using transformers>=4.57.6 allows pip to install 5.x into the main
Studio venv, which breaks huggingface_hub imports
(is_offline_mode removed in newer hub versions).

The main venv must stay on transformers==4.57.6 and
huggingface-hub==0.36.2. The 5.x version lives only in .venv_t5/
and is dynamically switched via sys.path at runtime.
2026-04-02 09:21:30 -07:00
Daniel Han
a241c58d84
Use transformers v5.5-release branch and pin to 5.5.0 (#4786)
The v5.5-release branch now exists on huggingface/transformers.
Use transformers==5.5.0 for all install paths and
git+transformers.git@v5.5-release for the MLX installer.

Also bumps huggingface_hub from 1.7.1 to 1.8.0 in setup.sh and
setup.ps1 to stay consistent.
2026-04-02 09:10:02 -07:00
Daniel Han
a353557249
Force llama.cpp to always use mainline ggml-org (#4785)
Hardcode the release repo to ggml-org/llama.cpp and remove the
UNSLOTH_LLAMA_RELEASE_REPO and UNSLOTH_LLAMA_SOURCE env var overrides
so that all users always build/download from mainline llama.cpp.
2026-04-02 09:03:00 -07:00
Daniel Han
f1c3b9caa9
Pin Gemma-4 transformers requirement to 5.5.0 stable (#4784)
Gemma-4 support landed in transformers main
(huggingface/transformers#45192). Update the version pin from
5.5.0.dev0 to 5.5.0 across loader, Studio version switcher,
and the MLX installer. Also fix install_gemma4_mlx.sh which
referenced a non-existent v5.5-release branch -- pin it to
the correct commit (91b1ab1) instead.
2026-04-02 08:59:21 -07:00
Daniel Han
4f9986ecb9
fix(studio): improve tool-calling re-prompt for small models (#4783)
Small GGUF models (<9B) frequently generate full code or lengthy
explanations instead of calling tools, bypassing the existing
plan-without-action re-prompt mechanism. Four issues:

1. _REPROMPT_MAX_CHARS=500 was too low -- models that output full
   HTML/code responses (often 1000+ chars) never triggered the
   re-prompt at all, since it only fires on short responses.

2. _MAX_REPROMPTS=1 gave the model only one chance to comply.
   Small models often need 2-3 nudges before switching from
   text generation to tool calling.

3. The re-prompt text ("Please use the available tools...") was
   too polite for small models to follow reliably.

4. Tool-calling detection missed chat templates using Jinja
   whitespace-trimming syntax ({%- if tools -%}) since only
   ({%- if tools %}) and ({% if tools %}) were checked.

Changes:
- Raise _REPROMPT_MAX_CHARS from 500 to 2000 so longer responses
  (code blocks, multi-paragraph plans) still trigger re-prompts
- Raise _MAX_REPROMPTS from 1 to 3 for more retry budget
- Use direct, imperative re-prompt language that small models
  follow more reliably ("STOP. You MUST call a tool NOW.")
- Strengthen the system prompt tool nudge to explicitly forbid
  outputting code blocks (redirect to the python tool instead)
- Add Jinja whitespace-trimmed variants to the tool_markers
  list so all template styles are detected correctly
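
A minimal sketch of marker-based tool detection covering the Jinja whitespace-trimming variants mentioned above. The exact contents of the real `tool_markers` list are not shown in this log, so the tuple below is an assumption, and `template_supports_tools` is a hypothetical helper name.

```python
# Assumed marker set: plain, left-trimmed, right-trimmed, and fully trimmed.
TOOL_MARKERS = (
    "{% if tools %}",
    "{%- if tools %}",
    "{% if tools -%}",
    "{%- if tools -%}",
)


def template_supports_tools(chat_template: str) -> bool:
    """Return True if the chat template branches on a `tools` variable."""
    return any(marker in chat_template for marker in TOOL_MARKERS)
```

A substring list keeps the check cheap and easy to audit; a regular expression would cover more spacing variants at the cost of readability.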
2026-04-02 08:59:02 -07:00
Daniel Han
f9c4b08726
UI Changes (#4782)
* UI Changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unrelated test file

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-02 08:05:55 -07:00
Roland Tannous
3b613eb1e8
ui improvement (#4781)
* ui

* ui

* ui
2026-04-02 07:57:47 -07:00
Daniel Han
c8d311a053
feat(studio): display images from Python tool execution in chat UI (#4778)
* feat(studio): display images from Python tool execution in chat UI

When the model calls the Python tool to create a matplotlib plot or
other image file, the image now displays inline in the chat output
instead of being invisible to the user.

Backend:
- Detect new image files (png/jpg/gif/webp/bmp) after Python subprocess
  completes by diffing os.listdir before/after execution
- Append __IMAGES__ sentinel to tool result for frontend consumption
- Strip sentinel before injecting result into LLM context (role: tool)
  so the model never sees file paths
- Add GET /sandbox/{session_id}/{filename} endpoint with JWT auth
  (header or query param), path traversal protection, extension
  allowlist, realpath containment check, and nosniff header

Frontend:
- Parse __IMAGES__ sentinel in tool_end SSE events, create structured
  result with text/images/sessionId
- Render <img> tags in Python tool UI pointing at the sandbox endpoint

Also fixes a bug where SyntaxError in user code was misreported as
"unsafe code detected" instead of showing the actual Python traceback.
The _check_code_safety function now lets SyntaxError pass through to
the subprocess for a proper error message.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(studio): improve SVG detection and strip XML preamble

Handle <?xml ...?> declarations before <svg> tags in code fences,
strip XML declaration from SVGs before data URI rendering, and
update the sloth suggestion prompt to request showing code.

* fix(studio): persist parentId so retries survive reload

The append() handler was destructuring only { message } from
ExportedMessageRepositoryItem and discarding parentId. When loading
a saved thread, load() used ExportedMessageRepository.fromArray()
which chains all messages sequentially, flattening retry branches
into a linear list.

Now append() writes parentId to the MessageRecord, and load()
reconstructs the tree when parentIds are present. Old threads
without parentId fall back to the existing fromArray() behavior.

* fix(studio): address review findings for image display and retry persistence

Image detection:
- Use mtime comparison instead of filename-only diff so overwritten
  files (e.g. plt.savefig("chart.png") called twice) are detected
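
A minimal sketch of mtime-based detection, assuming the sandbox directory is snapshotted before and after the subprocess runs. `snapshot` and `new_or_changed_images` are hypothetical helper names.

```python
import os

IMAGE_EXTENSIONS = {".png", ".jpg", ".gif", ".webp", ".bmp"}


def snapshot(directory: str) -> dict[str, int]:
    """Map each regular file in `directory` to its modification time (ns)."""
    result: dict[str, int] = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            result[name] = os.stat(path).st_mtime_ns
    return result


def new_or_changed_images(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Images that are new, or whose mtime differs from the earlier snapshot."""
    return sorted(
        name
        for name, mtime in after.items()
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        and before.get(name) != mtime
    )
```

Comparing mtimes rather than names means a file that is overwritten in place still counts as changed.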

Sentinel parsing:
- Use rsplit/lastIndexOf instead of split/indexOf so user code that
  prints __IMAGES__: does not collide with the backend sentinel
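
Splitting on the last occurrence can be sketched with `str.rpartition`. The payload format is an assumption here (a JSON list of filenames after the sentinel, on a trailing line); the commit message does not specify it, and `split_tool_result` is a hypothetical name.

```python
import json

SENTINEL = "__IMAGES__:"


def split_tool_result(result: str) -> tuple[str, list[str]]:
    """Separate tool text from the trailing image list appended by the backend."""
    # rpartition splits on the LAST match, so sentinel-like text printed
    # by user code earlier in the output is left untouched.
    text, sep, tail = result.rpartition("\n" + SENTINEL)
    if not sep:
        return result, []
    try:
        images = json.loads(tail)
    except json.JSONDecodeError:
        return result, []
    if not isinstance(images, list):
        return result, []
    return text, [str(item) for item in images]
```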

Mixed legacy/new threads:
- For old messages without a stored parentId, infer sequential parent
  from the previous message instead of null, preventing multiple roots

Sandbox endpoint:
- Change Cache-Control from "public, max-age=3600" to "private,
  no-store" since these are authenticated responses

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-02 05:08:16 -07:00
Lee Jackson
5a5f1a4f34
studio: fix chat font changes leaking outside chat page (#4775)
* fix(frontend): scope sans font overrides to chat thread only

* fix(frontend): use font-sans fallback for heading stack and simplify chat font rules

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-02 05:04:23 -07:00
DoubleMathew
1ce8a8e7cd
Feat/custom llama prebuilt (#4771)
* update logic to incorporate custom prebuilt installs

* bug fixes

* update for review comments

* fix tags

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate test changes from main PR

Move test file changes out of this PR to keep the diff focused on
the install_llama_prebuilt.py and setup script changes. Test updates
will be submitted in a follow-up PR.

* Fix branch ref normalization and harden JSON parsing

- Add checkout_friendly_ref() to strip refs/heads/ prefix from branch
  refs before emitting them in SourceBuildPlan. git clone --branch does
  not accept fully qualified refs like refs/heads/main.
- Apply normalization in source_build_plan_for_release() and the
  direct-ref fallback in resolve_source_build_plan().
- Allow validated_checksums_for_bundle() to accept releases that carry
  only an exact-commit source archive without the legacy upstream-tag
  source tarball.
- Add 2>/dev/null || true guards to all inline python -c JSON parsing
  in setup.sh so a malformed payload does not abort the script under
  set -e.

* Fix Windows CUDA asset ordering and tag ref normalization

- Reorder windows_cuda_upstream_asset_names to prefer the main binary
  archive (llama-{tag}-bin-win-cuda-*) over the cudart sidecar archive
  (cudart-llama-bin-win-cuda-*). The cudart ZIP only contains CUDA
  runtime DLLs, not llama-server or llama-quantize binaries.
- Extend checkout_friendly_ref to also strip refs/tags/ prefix for tag
  refs, matching the refs/heads/ handling for branch refs.
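
The normalization described above amounts to stripping a known prefix. `checkout_friendly_ref` is the name used in this commit message; the body below is a sketch of the described behaviour, not the actual implementation.

```python
def checkout_friendly_ref(ref: str) -> str:
    """Strip refs/heads/ or refs/tags/ so the ref works with `git clone --branch`."""
    for prefix in ("refs/heads/", "refs/tags/"):
        if ref.startswith(prefix):
            return ref[len(prefix):]
    return ref  # already short (branch name, tag name, or commit hash)
```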

* Simplify JSON parsing consistency in setup.sh

Use json.load(sys.stdin) consistently for all inline JSON parsing
in setup.sh, instead of the more complex json.loads(raw) pattern
on the install-tag resolution path. The 2>/dev/null || true guard
already handles empty/malformed input gracefully.

* Fix source build plan fallback for commit ref kind in PR #4771

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <daniel@unsloth.ai>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-02 04:52:26 -07:00
Daniel Han
b20efc370a
Add regression tests for custom llama prebuilt installer (#4772)
Expand test coverage for install_llama_prebuilt.py:
- Add tests for source build plan resolution with custom repos
- Add tests for branch/commit/PR ref matching and normalization
- Add tests for manifest checksum validation
- Add tests for Windows CUDA upstream asset name patterns
- Update capsys checks to capture stderr after log() redirect
2026-04-02 04:45:09 -07:00
Michael Han
e2fd946fe1
Add files via upload 2026-04-02 03:00:10 -07:00
Michael Han
31d6aeb197
Unsloth new logo 2026-04-02 02:58:21 -07:00
Daniel Han
e4d1499230
fix(studio): prevent small models from stalling on tool-calling tasks (#4769)
* fix(studio): prevent small models from stalling on tool-calling tasks

Small GGUF models (< 9B params) in "Think, Search, Code" mode would
often describe what they planned to do ("Let me create this dashboard")
and then stop generating without ever calling a tool.

Three changes:

1. Simplify web_tips for small models: remove the "fetch its full content
   by calling web_search with the url parameter" guidance for models < 9B.
   This multi-step instruction causes small models to plan elaborate
   search-then-fetch-then-code sequences they cannot reliably execute.

2. Add "always call tools directly" imperative to the system prompt nudge
   so models act immediately instead of narrating their intentions.

3. Add plan-without-action re-prompt in the agentic loop: when the model
   emits planning text (matching patterns like "let me", "I'll", etc.)
   without calling any tool, inject a nudge asking it to call the tool
   and continue the loop. Capped at 2 re-prompts per request.
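
The plan-without-action check in item 3 can be sketched as a small predicate. The phrase list, the function name `should_reprompt`, and the shape of `tool_calls` are assumptions for illustration; only the "let me" / "I'll" patterns and the cap of 2 come from this commit message.

```python
import re

# Assumed phrase list; the commit message names "let me" and "I'll" as examples.
PLAN_PATTERN = re.compile(r"\b(?:let me|i'll|i will|i'm going to)\b", re.IGNORECASE)


def should_reprompt(
    text: str,
    tool_calls: list,
    reprompts_so_far: int,
    max_reprompts: int = 2,
) -> bool:
    """True when the model narrated a plan but called no tool, within the cap."""
    if tool_calls:
        return False  # the model acted, so no nudge is needed
    if reprompts_so_far >= max_reprompts:
        return False  # retry budget exhausted for this request
    return PLAN_PATTERN.search(text) is not None
```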

Benchmarked with Qwen3.5-4B-GGUF (N=5 trials per variant):
- Baseline: 40% of requests had any tool call
- Combined fix: 100% of requests had at least one tool call

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-02 02:11:07 -07:00
Daniel Han
dc0729aadf
Add regression test for shell injection fix in GGML conversion (#4773)
AST-based test ensures subprocess.Popen calls in GGML conversion functions
use argv lists instead of shell=True. Companion to PR #4768.
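
A minimal sketch of an AST-based check of this kind. It flags any call that passes a literal `shell=True`, which is broader than the test described above (that one targets `subprocess.Popen` in the GGML conversion functions); `find_shell_true_calls` is a hypothetical helper name.

```python
import ast


def find_shell_true_calls(source: str) -> list[int]:
    """Return line numbers of calls that pass a literal shell=True."""
    offenders: list[int] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            if (
                keyword.arg == "shell"
                and isinstance(keyword.value, ast.Constant)
                and keyword.value.value is True
            ):
                offenders.append(node.lineno)
    return sorted(offenders)
```

Working on the syntax tree avoids false positives from comments or strings that merely mention `shell=True`.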
2026-04-02 00:10:47 -07:00
mateeaaaaaaa
752cef3299
fix(security): shell injection in GGML export conversion (#4768)
* Fix shell injection in GGML conversion paths

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove test file from security fix PR

Move test_save_shell_injection.py to a separate PR to keep this PR focused on the security fix itself.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-02 00:10:43 -07:00
AdamPlatin123
ba8081fc96
fix(chat): correct loading text for cached models during inference (#4764)
Distinguish between actual network downloads and GPU memory loading for cached LoRA adapters in Studio chat.

- Add isCachedLora detection for local LoRA adapter paths using comprehensive cross-platform regex (Unix, Windows, UNC, WSL, tilde)
- Thread isCachedLora through loadInfo to chat-page inline status for proper 3-way distinction (cached / local LoRA / downloading)
- Skip download progress polling for cached LoRA models (no useless /download-progress API calls)
- Fix initial toast state to use isCachedLoad consistently instead of only checking isDownloaded
- Fix cancelLoading toast to not mention background downloads for cached/local loads
- Keep download-specific text ("Downloading model..." / "Download complete") inside the download-only polling block
2026-04-01 20:24:48 -07:00
Lee Jackson
ca4ea8b9fb
studio: align composer/code, unify fonts, and remove tool collapse jitter (#4763)
- Add min-w-0 guards to thread/message/markdown containers to prevent
  content overflow past the composer width
- Unify chat typography from Hellix/Space Grotesk to the sans stack,
  keeping monospace for code blocks and inline code
- Restructure desktop navbar right-side controls with shrink-0 wrappers
  for consistent spacing across HoverCard roots
- Soften tool-call label styling (font-medium + text-foreground/85
  instead of bold)
- Add responsive code block sizing via @container queries
- Add horizontal scrolling for wide code blocks within the thread column
- Scope list-item code block alignment CSS to .aui-thread-root
- Preserve useScrollLock in tool-fallback and tool-group collapsibles
- Fall back to bg-background on ViewportFooter when hideComposer is true
- Widen inline code monospace selector to cover th, blockquote, and
  heading elements
- Remove unused @fontsource-variable/space-grotesk import
2026-04-01 19:57:10 -07:00
DoubleMathew
71b934ef9d
Fix custom llama.cpp source builds and macos metal source builds (#4762)
* Fix script unbound variable error

* remove stale test script, add llama.cpp metal source builds, update tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix Metal precedence, test sync, and add behavioral tests

- Move macOS arm64 Metal check before CUDA/ROCm in GPU backend
  decision chain so Metal is not bypassed when nvcc is in PATH
- Remove RPATH flags from CPU fallback CMAKE_ARGS (only needed
  for Metal library linking)
- Update test_llama_pr_force_and_source.py to match _CLONE_ARGS
  rename from _CLONE_BRANCH_ARGS in setup.sh
- Add confirm_install_tree guard test for
  existing_install_matches_choice
- Add TestMacOSMetalBuildLogic bash subprocess tests verifying
  Metal flag selection, nvcc precedence, and CPU fallback behavior

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix Metal CPU fallback to also cover cmake build failures and update tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. _GPU_BACKEND_FRAGMENT synced -- removed dead CPU_FALLBACK_CMAKE_ARGS= init (6/8)
2. RPATH assertion replaced -- new test_macos_arm64_cpu_fallback_args_exclude_rpath checks the actual runtime CPU_FALLBACK_CMAKE_ARGS output for @loader_path and -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON (6/8)
3. _TRY_METAL_CPU_FALLBACK=false reset after both configure-failure and build-failure fallback branches in setup.sh (4/8)
4. macOS test now removes libmtmd.0.dylib instead of the platform-agnostic convert_hf_to_gguf.py (3/8)
5. Empty-string tag test added -- test_empty_tag_omits_branch_flag for resolved_tag= (2/8)
6. RPATH checks on cmake call logs -- both fallback tests now assert @loader_path and -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON are absent from CPU fallback cmake calls, plus baseline flag preservation (multiple)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tests clean up

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-01 14:06:39 -05:00
Daniel Han
39fe23ded8
Tests for architecture-aware KV cache estimation (#4760)
* test: add 66 tests for architecture-aware KV cache estimation

Covers all 5 estimation paths (MLA, Hybrid Mamba, Sliding Window,
Standard GQA, Legacy), GGUF parser for 8 new metadata fields,
_can_estimate_kv gate conditions, quantization scaling, edge cases,
path priority ordering, and lifecycle (init/unload/reparse).

Zero external dependencies beyond pytest. No GPU or network required.
Cross-platform (Linux, macOS, Windows, WSL).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-01 06:13:37 -07:00
Daniel Han
653eb3819a
fix(studio): allow context length slider to reach model's native limit (#4746)
* fix(studio): allow context length slider to reach model's native limit

The context length slider was hard-capped to the VRAM-estimated maximum,
preventing users from requesting higher context even though the backend
already handles it safely (multi-GPU selection, --fit fallback). Expose
the model's native context length from GGUF metadata as a separate API
field and use it as the slider ceiling instead. Add an amber warning
when the selected context exceeds the estimated VRAM capacity.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Raise VRAM budget to 90% and add native_context_length tests

Increase the GPU memory utilization threshold from 70% to 90% across
_select_gpus and _fit_context_to_vram, allowing longer context lengths
before VRAM capping kicks in.

Add 33 tests for the native_context_length feature covering the backend
property, context value separation invariants, Pydantic models, route
completeness, edge cases, and cross-platform binary I/O.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-01 06:12:52 -07:00
Daniel Han
d22b2a18f9
fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748)
* fix: add tokenizers to no-torch runtime deps and add TORCH_CONSTRAINT for arm64 macOS py313+

Two installer fixes:

1. Add `tokenizers` to `no-torch-runtime.txt` before `transformers`.
   Without it, `from transformers import AutoConfig` crashes on startup
   because `--no-deps` skips transitive dependencies.

2. Add `TORCH_CONSTRAINT` variable to `install.sh`. On arm64 macOS with
   Python 3.13+, tighten the torch requirement to `>=2.6` since torch
   <2.6 has no cp313 arm64 wheels. The variable replaces the previously
   hard-coded constraint in the uv pip install line.

Includes 66 tests (42 pytest + 24 bash) covering:
- Structural checks on install.sh, install.ps1, no-torch-runtime.txt
- Shell snippet tests with mocked python for 13 platform/version combos
- Mock uv integration verifying correct constraint string
- E2E venv tests on Python 3.12 and 3.13 confirming AutoConfig works
- Negative control proving AutoConfig fails without tokenizers
- Full no-torch sandbox regression guards (safetensors, huggingface_hub)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix incomplete no-torch manifest and align E2E tests with real --no-deps path

- Add missing transitive deps to no-torch-runtime.txt that are required
  under --no-deps: regex, typing_extensions, filelock, httpx, httpcore,
  certifi, idna, anyio, sniffio, h11. Without these, `from transformers
  import AutoConfig` still fails after install.sh --no-torch.

- Change all E2E tests to use --no-deps (matching what install.sh does)
  instead of normal dep resolution. Previous tests passed even with an
  incomplete manifest because uv backfilled transitive deps.

- Rewrite negative control to derive from the real no-torch-runtime.txt
  with tokenizers stripped, proving the specific fix matters.

- Replace GNU-only sed -i with heredoc in shell test for macOS compat.

- Remove unused os/sys imports from Python test file.

- Quote SKIP_TORCH and mock uv paths in bash -c strings.

* Assert install succeeds before checking import results in E2E tests

Address review feedback: test_torch_not_importable and
test_tokenizers_directly_importable in Group 3 now assert that
uv pip install returns 0 before checking import behavior. This
prevents false positives when the install itself fails silently.

* Assert install succeeds in negative control and tighten error check

- Add missing install-success assertion in test_negative_control_no_tokenizers
  to prevent false positives from network/install failures.

- Tighten error message check to look for "tokenizers" in stderr or
  ModuleNotFoundError, rather than the generic "No module" substring
  which could match unrelated import failures.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-01 06:12:17 -07:00
Daniel Han
76cb48be0b
fix: studio web search SSL failures and empty page content (#4754)
- Fix SSL handshake failures (SSLV3_ALERT_HANDSHAKE_FAILURE, CERTIFICATE_VERIFY_FAILED) when fetching HTTPS pages by introducing _PinnedHTTPSConnection that separates TCP connect (to pinned IP) from TLS handshake (with real hostname for SNI/cert verification)
- Fix SSRF DNS-rebinding vulnerability: previous impl swapped conn.host before connect(), causing fresh DNS resolution; new subclass keeps TCP pinned to validated IP
- Fix SPA/JS-rendered doc sites returning empty content by rotating real browser User-Agents (Chrome/Firefox/Safari)
- Strip nav/footer from HTML-to-Markdown output so article content is not buried under navigation chrome
- Increase raw fetch cap from 64KB to 512KB so SSR article content is reached on GitBook/Docusaurus/Next.js pages
- Fix IPv6 address bracketing in URL netloc construction
- Hoist SSL context, handler classes, and stdlib imports to module level (created once, not per-call)
- Use consistent UA across redirect hops to avoid breaking session-aware bot detection
2026-04-01 06:12:02 -07:00
Daniel Han
f84c2d03d3
Add installer test coverage for prebuilt llama.cpp changes (#4756)
Split out from #4741 to keep the main PR focused on installer logic.

- New test_install_llama_prebuilt_logic.py: tests for resolve logic,
  fallback behavior, env_int, busy/lock handling
- New test_validate_llama_prebuilt.py: validator tests for staged
  release_tag/upstream_tag handling
- New test_llama_pr_force_and_source.py: tests for PR_FORCE and
  LLAMA_SOURCE maintainer defaults
- Updated test_selection_logic.py: expanded selection/fallback coverage
- Updated test_pr4562_bugfixes.py: updated bugfix tests for new logic
- Updated smoke_test_llama_prebuilt.py: minor update
2026-04-01 06:06:29 -07:00
DoubleMathew
428efc7d95
Resolve latest usable published llama.cpp release instead of fixed pinned tag (#4741)
Replaces the fixed prebuilt llama.cpp tag with dynamic published-release
resolution, adds bounded fallback across older published releases, and
introduces maintainer-editable defaults for PR/source overrides.

Changes:
- Resolve "latest" to the latest usable published release in unslothai/llama.cpp
- Use the selected release upstream_tag as the authoritative llama.cpp version
- Prefer Unsloth-published platform assets when available
- Fall back to same-tag upstream ggml-org/llama.cpp assets where allowed
- Keep Linux CUDA anchored to Unsloth-published CUDA bundles only
- Add bounded fallback across older Unsloth published releases
- Add separate busy/in-use install handling (exit code 3)
- Skip reinstall when the installed bundle already matches the selected candidate
- Add maintainer-editable _DEFAULT_LLAMA_PR_FORCE and _DEFAULT_LLAMA_SOURCE
- Harden env parsing so malformed installer env vars do not crash import-time fallback logic
- Honor UNSLOTH_LLAMA_RELEASE_TAG in all resolve steps
- Always sync git remote URL in existing-checkout path
2026-04-01 06:06:17 -07:00
Daniel Han
5d7d882ce6
Fix save_pretrained_merged for full-finetuned models (#4755)
* Fix save_pretrained_merged for full-finetuned models

save_pretrained_merged and push_to_hub_merged silently do nothing when
the model is not a PeftModel (i.e. full finetuning without LoRA).
merge_and_overwrite_lora returns None immediately for non-PeftModel,
and unsloth_generic_save does not check the return value.

Add a non-PeftModel branch in unsloth_generic_save that falls back to
model.save_pretrained / model.push_to_hub. When save_method contains
"16bit", cast weights to bfloat16 (or float16) via a state_dict copy
to honor the user's intent without mutating the live model.

The existing PeftModel (LoRA) code path is unchanged.

* Forward create_pr and revision to tokenizer.push_to_hub

The tokenizer push_to_hub call was missing create_pr and revision,
which could cause the tokenizer to push to the wrong branch or
bypass PR creation when the model push uses them.

* Honor merged_16bit dtype contract for full-finetuned models

Cast state_dict to bfloat16/float16 when save_method contains "16bit"
to match the documented behavior of save_pretrained_merged. Also pass
state_dict and save kwargs consistently to both save_pretrained and
push_to_hub paths.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review feedback for PR #4755

- Simplify PeftModel isinstance check (PeftModelForCausalLM inherits
  from PeftModel)
- Add is_main_process guard for distributed training
- Forward variant to save_pretrained
- Set tokenizer padding_side to "left" before saving (matches other
  save paths)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-01 06:05:37 -07:00
Daniel Han
77e1a9edc9
feat(studio): architecture-aware KV cache VRAM estimation (#4757)
* feat(studio): architecture-aware KV cache VRAM estimation

Replace the single legacy formula (2 * n_kv_heads * head_dim * n_layers
* n_ctx * bpe) with 5-path estimation that reads 8 additional GGUF
metadata fields:

  1. MLA (DeepSeek-V2/V3, GLM-4.7, GLM-5, Kimi-K2.5) -- K-only cache
     using compressed KV latent + RoPE; no separate V allocation
  2. Hybrid Mamba (Qwen3.5-27B, Qwen3.5-35B-A3B) -- only attention
     layers (1 in N) carry KV; Mamba layers have none
  3. Sliding Window (Gemma-3, gpt-oss) -- SWA layers cache
     min(ctx, window) tokens instead of the full context
  4. Standard GQA -- uses explicit key_length/value_length from GGUF
     instead of embed // n_heads (which is wrong for many models)
  5. Legacy fallback -- identical to old formula for old GGUFs

New GGUF fields parsed: attention.key_length, attention.value_length,
attention.sliding_window, full_attention_interval,
attention.kv_lora_rank, attention.key_length_mla, ssm.inner_size,
ssm.state_size.

Validated against 9 real GGUF files (72/72 field checks pass).
The legacy formula was off by +682% for Gemma-3 and -81% for
DeepSeek-V3.1.
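
For reference, the legacy formula quoted at the top of this message can be written out as a minimal sketch. The function name and the default of 2 bytes per element are illustrative assumptions, not names taken from the Studio code:

```python
def legacy_kv_cache_bytes(n_kv_heads, head_dim, n_layers, n_ctx, bytes_per_element=2):
    """Legacy estimate: one K and one V tensor per layer over the full context.

    Hypothetical helper; it mirrors only the formula quoted in the commit message.
    """
    return 2 * n_kv_heads * head_dim * n_layers * n_ctx * bytes_per_element


# Illustrative numbers only (not taken from any real GGUF file).
estimate = legacy_kv_cache_bytes(n_kv_heads=8, head_dim=128, n_layers=32, n_ctx=4096)
print(f"{estimate / 2**20:.0f} MiB")
```

Because this formula charges every layer for the full context with separate K and V tensors, it cannot reflect MLA, hybrid, or sliding-window layouts, which is the gap the new paths address.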

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix MLA fallback and SWA global/local ratio heuristic

Two fixes based on review findings:

1. MLA fallback now uses key_length_mla from GGUF metadata instead of
   hardcoded rope_dim=64. Falls back to 64 only when key_length_mla is
   absent. This ensures correct estimates for MLA variants that use
   rope dimensions other than 64.

2. SWA global/local layer ratio changed from 50/50 to 1/4 (25% global,
   75% SWA). Most sliding window architectures have predominantly local
   layers (Gemma-3 uses ~17% global, gpt-oss uses ~50%). The 1/4
   heuristic is closer to the common case and still a large improvement
   over the legacy formula which ignores SWA entirely.

* Tighten _can_estimate_kv gate and treat sliding_window=0 as disabled

Two additional fixes from review round 1 (5/8 and 4/8 reviewer consensus):

1. _can_estimate_kv now requires BOTH key_length AND value_length for
   the explicit-dims path. Previously key_length alone was enough,
   which could cause silent fallthrough to the legacy formula with
   fabricated defaults (n_kv=1, head_dim=128) when value_length was
   absent from the GGUF.

2. SWA path now requires sliding_window > 0. Some GGUFs use 0 as a
   disabled sentinel. Without this guard, min(ctx, 0) would zero out
   all SWA layer contributions, severely underestimating KV cache.
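
The sliding-window handling described above could look roughly like the following sketch. The function name, the argument names, and the rounding of the global layer count are assumptions made for illustration; only the min(ctx, window) rule, the 1/4 global heuristic, and the sliding_window > 0 guard come from the commit message:

```python
def swa_kv_cache_bytes(n_kv_heads, key_length, value_length, n_layers, n_ctx,
                       sliding_window, bytes_per_element=2):
    """Rough KV estimate for a sliding-window model (hypothetical helper)."""
    per_token = n_kv_heads * (key_length + value_length) * bytes_per_element
    # sliding_window == 0 (or missing) is treated as "SWA disabled":
    # every layer then caches the full context.
    if not sliding_window or sliding_window <= 0:
        return per_token * n_layers * n_ctx
    # Heuristic from the commit message: about 1 in 4 layers is global.
    # The exact rounding used here is an assumption.
    n_global = max(1, n_layers // 4)
    n_local = n_layers - n_global
    return per_token * (n_global * n_ctx + n_local * min(n_ctx, sliding_window))
```

Without the guard, a window of 0 would make min(n_ctx, 0) zero out every local layer, which is the underestimate the second fix prevents.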

* Fix MLA n_kv safety and use ceiling division for hybrid path

Addresses Gemini Code Assist review findings:

1. MLA path now uses n_kv_mla = n_kv_heads or 1 (not n_heads). This
   prevents a 128x overestimate for DeepSeek-V3 if head_count_kv is
   absent from the GGUF (n_heads=128 would have been used instead).

2. Hybrid path now uses ceiling division for attention layer count.
   This prevents undercounting by 1 when n_layers is not perfectly
   divisible by full_attention_interval.
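
The two adjustments can be illustrated with a short sketch. The function names are hypothetical; the logic follows the description above (fall back to 1 KV head for MLA, and round the attention layer count up):

```python
def mla_kv_head_count(n_kv_heads):
    """Use the GGUF head_count_kv value when present, otherwise 1 (never n_heads)."""
    return n_kv_heads or 1


def attention_layer_count(n_layers, full_attention_interval):
    """Ceiling division: number of attention layers when 1 layer in N carries KV."""
    return -(-n_layers // full_attention_interval)


# 30 layers with an interval of 4: floor division gives 7, ceiling gives 8.
print(30 // 4, attention_layer_count(30, 4))
```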

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-01 06:04:12 -07:00
Daniel Han
3f3757b143
Fix forward compatibility with transformers 5.x (#4752)
* Fix forward compatibility with transformers 5.x

Tested on transformers 4.57.6, 5.3.0, and 5.4.0. All changes are no-ops
on transformers 4.x.

1. Skip exec-based config patching for transformers >= 5.0

   Config classes in v5 use @strict, @auto_docstring, and interval()
   which break exec(inspect.getsource(...)). Those configs already use
   rope_parameters (the v5 replacement for rope_scaling).

2. Slice position_ids to last token in fast_forward_inference

   Transformers 5.x generate() accumulates position_ids as
   [batch, full_seq_len] across decode steps instead of [batch, 1].
   cos[position_ids] then produces the wrong shape for rotary
   embeddings. Fixed in llama, qwen3, falcon_h1, gemma2, cohere,
   granite. No-op on 4.x since position_ids is already [batch, 1].

3. Handle @strict config kwargs for sequence classification

   num_labels, max_position_embeddings, id2label etc. are set on the
   config object and passed via config= instead of as kwargs.
   AutoModelForSequenceClassification routing added to FastModel loader.

4. Exclude modernbert from flex_attention

   ModernBERT with flex_attention hits CUDA illegal memory access in
   create_block_mask. Falls back to eager attention safely.

5. Propagate token_type_ids and mm_token_type_ids through GRPO VLM path

   Gemma3 Vision requires token_type_ids during training. Qwen3VL
   requires mm_token_type_ids for M-RoPE. Extract from inputs in
   compute_loss, pass to grpo_accumulated_loss, and extend
   mm_token_type_ids for completion tokens in
   _generate_and_score_completions.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add try/except safety net around config exec for pre-release transformers versions

* Pop config-level kwargs in seqclass path and use except Exception

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-01 06:04:03 -07:00
Roland Tannous
41df4ec437
feat(studio): strip org prefix in model search to surface unsloth variants (#4749)
When searching for a specific publisher model (e.g. `openai/gpt-oss-20b`), the
unsloth search used the full `openai/gpt-oss-20b` string with `author=unsloth`,
which returned zero results because no unsloth model contains the publisher
prefix in its name. Users never discovered unsloth variants.

This PR strips the org prefix for publisher-qualified queries so unsloth variants
surface, then pins the original publisher model after a small batch of unsloth
results. Plain queries (no slash) and unsloth-prefixed queries are unchanged.

- Strict regex (`/^([^/\s]+)\/([^/\s]+)$/`) only triggers on valid `owner/repo`
  identifiers; incomplete typeahead, multi-slash, and URL-like inputs are rejected
- Queries for `unsloth/...` models (case-insensitive) keep the full 20-result
  prefetch and secondary sort
- Pinned model lookup fires in parallel with the unsloth prefetch
- Canonical-name dedup prevents duplicates when HF normalizes casing
- Publisher detection extracted into a single `useMemo` block
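
The prefix-stripping rule can be sketched in Python as follows. The Studio frontend implements this in TypeScript; the function name here is hypothetical and the pattern is a direct port of the regex quoted above:

```python
import re

# Port of the strict owner/repo pattern: exactly one slash, no whitespace.
_OWNER_REPO_RE = re.compile(r"^([^/\s]+)/([^/\s]+)$")


def unsloth_search_term(query):
    """Return the term to search under author=unsloth (hypothetical helper)."""
    match = _OWNER_REPO_RE.fullmatch(query)
    if match is None:
        return query  # plain query, incomplete typeahead, multi-slash, or URL
    owner, repo = match.groups()
    if owner.lower() == "unsloth":
        return query  # unsloth-prefixed queries are left unchanged
    return repo  # strip the publisher prefix


print(unsloth_search_term("openai/gpt-oss-20b"))
```

Inputs such as `openai/` or `a/b/c` do not match the pattern, so they fall through unchanged, in line with the strictness described in the first bullet.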
2026-04-01 04:37:28 -07:00
Leo Borcherding
63ad6dbd6d
Fix OOM model styling in Studio model selectors (#4738)
Replace strikethrough + opacity-50 OOM styling with gray text and red pill badge across all Studio model selectors (chat, training, onboarding).

- Use gray-500/gray-400 for OOM model names (better contrast than strikethrough)
- Red pill badge for OOM indicator with light/dark mode support
- Scope GGUF gray override to quant name only so downloaded/recommended labels keep colors
- Add !important on TIGHT/OOM badges to resist ComboboxItem hover overrides
2026-04-01 02:06:49 -07:00
Daniel Han
6c0826a9e4
Fix Windows local GGUF model loading crash (#4730)
* Fix Windows "Non-relative patterns are unsupported" when loading local GGUF models

When a user loads a GGUF model from a local Windows path (e.g.
C:\Users\danie\.lmstudio\models\unsloth\functiongemma-270m-it-GGUF),
the model identifier contains backslashes and a drive letter. Both
load_model_defaults() and _has_specific_yaml() constructed a YAML
filename from the full absolute path and passed it to Path.rglob(),
which rejects non-relative patterns on Windows.

Fixed by detecting Windows-style paths (drive letters, UNC paths,
backslashes) in addition to Unix-style paths, and using only the
directory basename for the YAML filename lookup when the identifier
is a local filesystem path.
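
A minimal sketch of the idea, assuming hypothetical helper names (the actual detection logic in the repository may differ): detect path-like identifiers, then keep only the final directory name so that the lookup pattern stays relative.

```python
import re
from pathlib import PureWindowsPath

_DRIVE_RE = re.compile(r"^[A-Za-z]:[\\/]")


def looks_like_local_path(identifier):
    """Heuristic check for Unix, drive-letter, UNC, and backslash paths."""
    return (
        identifier.startswith(("/", "./", "../", "~", "\\\\"))
        or bool(_DRIVE_RE.match(identifier))
        or "\\" in identifier
    )


def yaml_lookup_name(identifier):
    """Return the directory basename for local paths, else the identifier itself."""
    if looks_like_local_path(identifier):
        # PureWindowsPath splits on both "\\" and "/", on any host OS.
        return PureWindowsPath(identifier).name
    return identifier
```

Using `PureWindowsPath` rather than `Path` keeps the split independent of the host platform, which matters when a Windows-style string is handled on a POSIX system.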

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor: reuse is_local_path helper, fix case-sensitive suffix lookup

- Replace inline local-path detection in model_config.py and
  inference_config.py with the existing is_local_path() from utils.paths,
  which already handles Unix, Windows drive-letter, UNC, and backslash paths
- Fix case-sensitive suffix lookup in load_model_defaults(): the
  _REVERSE_MODEL_MAPPING is lowercase-keyed, so suffix comparisons must use
  .lower() to match paths like /path/to/Spark-TTS-0.5B/LLM

* Fix WSL path parsing and _has_specific_yaml suffix lookup

- Use normalize_path() before Path() operations so backslash Windows
  paths (e.g. C:\Users\...\model) are correctly split on POSIX/WSL hosts
  where pathlib treats backslashes as literal characters
- Add suffix-based (2-component and 1-component) lookup to
  _has_specific_yaml() so it matches the same resolution rules as
  load_model_defaults(), fixing wrong inference params for local
  suffix-mapped models like Spark-TTS-0.5B/LLM

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-01 01:38:09 -07:00
Datta Nimmaturi
256c6e4884
Refactor flex attn to prefer flash if possible (#4734)
Replaces prefer_flex_attn_if_supported (which only returned flex_attention or None) with determine_attention_implementation, a centralized hierarchy: FA2 > Flex > SDPA > Eager.

Changes:
- New determine_attention_implementation function in _utils.py with clear priority chain
- _set_attn_impl helper to stamp config consistently
- _FLEX_EXCLUDED_MODELS / _FLEX_EXCLUDED_PREFIXES for model-specific exclusions
- Gemma3N explicit eager override in vision.py (timm vision towers)
- Preserved sdpa fallback for unmapped/remote-code vision configs
- Config re-stamped to eager when supports_sdpa guard fires
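
The priority chain can be pictured with a small sketch. The function name, its boolean arguments, and the exclusion set passed in are illustrative assumptions and do not reproduce the signature of determine_attention_implementation:

```python
def pick_attention_implementation(has_flash_attn, supports_flex, supports_sdpa,
                                  model_type="", flex_excluded=frozenset()):
    """Walk the FA2 > Flex > SDPA > Eager hierarchy (hypothetical helper)."""
    if has_flash_attn:
        return "flash_attention_2"
    if supports_flex and model_type not in flex_excluded:
        return "flex_attention"
    if supports_sdpa:
        return "sdpa"
    return "eager"


# A model excluded from flex attention falls through to SDPA.
print(pick_attention_implementation(False, True, True,
                                    model_type="modernbert",
                                    flex_excluded={"modernbert"}))
```

Keeping the whole decision in one function means each model-specific exclusion is a data entry rather than a new branch in every loader.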

Co-authored-by: Datta Nimmaturi <Datta0@users.noreply.github.com>
2026-04-01 00:30:21 -07:00
Wasim Yousef Said
d63cc57e1e
fix: clear tool status badge immediately after tool execution (#4733)
* fix: clear tool status badge immediately after tool execution

The tool status timer badge (Searching 1s, 2s...) persisted after
tool calls finished because the status clear event was only sent
at the start of the next generation iteration, not after tool
execution completed.

Backend: yield status clear after all tools finish in the agentic
loop iteration, before continue starts the next generation pass.

Frontend: debounce badge visibility by 300ms so sub-second tool
calls don't flash the badge.

* Fix debounce regression for consecutive tool calls

Only apply the 300ms show-delay when transitioning from idle to
tool-active. When switching between consecutive tools in the same
turn (e.g. web_search -> python), keep the badge visible immediately
so it does not flicker or disappear during multi-tool runs.

* Delay wasActiveRef reset to bridge inter-iteration tool gaps

The backend emits a status-clear event between tool iterations,
which was resetting wasActiveRef immediately and causing the next
tool to be re-debounced (300ms hidden gap between consecutive tools
in the same turn). Now the ref reset is delayed by 500ms so a
follow-up tool within the same agentic turn shows the badge
immediately, while a genuinely new turn still gets the debounce.

* Use thread lifecycle to track tool-run boundaries

Replace the 500ms wall-clock timeout with the actual thread.isRunning
state to determine when wasActiveRef should reset. This properly
handles all cases:
- Consecutive tools within the same run stay visible without flicker
- The badge hides only when the thread run actually ends
- New turns always get a fresh 300ms debounce on the first tool
- No heuristic timeout that can misfire on slow or fast inference

* Consolidate wasActiveRef reset into single effect

Removes the separate isThreadRunning effect to avoid a race where
the ref resets before the tool-status effect reads it (when
isThreadRunning flips to false before setToolStatus(null) from
the adapter's finally block). Now wasActiveRef resets only when
both toolStatus is null AND the thread run has ended, eliminating
any flicker on the last tool of a run.

* Simplify debounce: use visible state instead of ref tracking

Drop wasActiveRef entirely and use the visible state as the
debounce gate. When the badge is not yet on screen, debounce
for 300ms before showing. When already visible from a prior tool,
keep showing immediately. This correctly handles all cases:
- All fast tools (<300ms) are suppressed, not just the first
- Consecutive tools after the badge is shown stay visible
- Badge persists across inter-iteration clears while thread runs
- New turns get a fresh debounce after visible resets

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-01 00:28:38 -07:00
Wasim Yousef Said
4fb9778988
feat: move folder management into model selector dropdown (#4731)
* refactor: move folder management from sidebar into model selector

* Fix folder management: restore LoRA picker sync, error handling, caching

- Restore onFoldersChange callback to keep LoRA adapter picker in sync
  when scan folders are added/removed (fixes regression from sidebar move)
- Thread onFoldersChange through ModelSelector -> HubModelPicker prop chain
- Add module-level _scanFoldersCache to prevent folder list flash on re-open
- Surface error toast on folder removal failure instead of silently ignoring
- Guard handleAddFolder against concurrent double-submit via folderLoading
- Clear folderInput on Escape key dismiss to prevent stale input on re-open
- Add refreshLocalModelsList and refreshScanFolders to useEffect dep array

* Fix compare-mode folder sync, Escape key propagation, cancel toggle state

- Wire onFoldersChange through CompareContent/GeneralCompareContent so
  compare-mode selectors also refresh local models after folder changes
- Add e.stopPropagation() on Escape key in folder input to prevent
  Radix Popover from closing the entire model selector dropdown
- Add e.preventDefault() on Enter key to prevent form submission
- Clear folderInput and folderError when cancel toggle hides the input,
  matching the Escape key behavior for consistency

* Fix folder mutation state ordering and touch accessibility

- Use optimistic updates for add/remove so the folder list reflects
  changes immediately instead of waiting on a second listScanFolders
  round-trip that could silently fail.
- Move refreshScanFolders out of the finally block in handleRemoveFolder
  so it runs after the cache update, not after onFoldersChange.
- Make the remove button visible on touch/mobile devices and reachable
  via keyboard focus (opacity-100 on small screens, focus-visible).
- Add aria-label to the remove button for screen readers.

* Deduplicate optimistic folder add to match backend behavior

The backend returns the existing ScanFolderInfo row when adding a
path that is already registered. The optimistic update was blindly
appending the returned row, producing duplicate entries and React
key warnings. Now checks by id before appending.

* Add aria-label to folder toggle button and strengthen dedup check

- Add aria-label to the +/cancel icon button for screen readers.
- Extend optimistic dedup check to also compare by path, not just id,
  to handle edge cases where the cache is stale.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-31 23:15:50 -07:00
Lee Jackson
2cac3e8e4d
studio: Polish Windows installer/setup logs (#4736)
* style(windows): clean installer/setup log output and remove seeded credential banner

* Keep startup credential hint without exposing plaintext password

Print the username and .bootstrap_password file path on first-run
admin creation instead of the raw password. Headless / Docker / SSH
operators still get a startup-time hint for initial sign-in, and the
plaintext credential no longer appears in terminal output or logs.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-31 23:12:42 -07:00
Daniel Han
6984e118eb
Bump installer minimum version pin to 2026.3.18 (#4729)
Matches the latest PyPI release.
2026-03-31 07:00:51 -07:00
Daniel Han
cfeb8c3245
Versioning
2026-03-31 06:51:34 -07:00
Wasim Yousef Said
1e8875584d
feat: custom scan folders for GGUF model discovery (#4723)
* feat: add scan_folders table and CRUD functions to studio_db

* feat: add scan folders API endpoints and integrate into model scan

* feat: add scan folders API client and update source types

* feat: add custom source to model filters and selector

* feat: add Model Folders section to chat settings sidebar

* style: fix biome formatting in ModelFoldersSection

* fix: address review findings for custom scan folders

empty string bypass, concurrent delete crash guard,
Windows case normalization, response_model on endpoints,
logging, deduplicated filter/map, module level cache for
custom folder models, consistent source labels, handleRemove
error surfacing, per folder scan cap

* fix: show custom folders section regardless of chatOnly mode

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor: extract shared refreshLocalModelsList in pickers

* Harden custom scan folder validation and scanning

- Validate path exists, is a directory, and is readable before persisting
- Apply per-folder model cap during traversal instead of after (avoids
  scanning millions of inodes in large directories)
- Wrap per-folder scan in try/except so one unreadable folder does not
  break the entire /api/models/local endpoint for all callers
- Normalize case on Windows before storing so C:\Models and c:\models
  dedup correctly
- Extend macOS denylist to cover /private/etc and /private/tmp (realpath
  resolves /etc -> /private/etc, bypassing the original denylist)
- Add /boot and /run to Linux denylist

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Improve scan robustness and preserve Windows path casing

- Preserve original Windows path casing in DB instead of lowercasing
  (normcase used only for dedup comparison, not storage)
- Catch PermissionError per child directory so one unreadable subdirectory
  does not skip the entire custom folder scan
- Wrap list_scan_folders() DB call in try/except so a DB issue does not
  break the entire /api/models/local endpoint

* fix: scan custom folders for both flat and HF cache layouts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix Windows case-insensitive path dedup with COLLATE NOCASE

Use COLLATE NOCASE on the scan_folders.path column so that the UNIQUE
constraint correctly deduplicates C:\Models and c:\models on Windows
without lowercasing the stored path. Also use COLLATE NOCASE in the
pre-insert lookup query on Windows to catch existing rows with
different casing.
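
The effect of the collation can be reproduced with the standard sqlite3 module. This is a sketch under assumptions: the table layout and helper names below are hypothetical, and only the COLLATE NOCASE behavior itself is taken from the commit message.

```python
import sqlite3


def create_scan_folders_table(conn, case_insensitive):
    """Create a minimal scan_folders table (hypothetical schema)."""
    collation = " COLLATE NOCASE" if case_insensitive else ""
    conn.execute(
        "CREATE TABLE scan_folders ("
        "id INTEGER PRIMARY KEY, "
        f"path TEXT NOT NULL{collation} UNIQUE)"
    )


def add_folder(conn, path):
    """Insert a folder; on a duplicate, return the path already stored."""
    try:
        conn.execute("INSERT INTO scan_folders (path) VALUES (?)", (path,))
    except sqlite3.IntegrityError:
        pass  # an equal path (under the column collation) already exists
    row = conn.execute(
        "SELECT path FROM scan_folders WHERE path = ?", (path,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
create_scan_folders_table(conn, case_insensitive=True)
print(add_folder(conn, r"C:\Models"))
print(add_folder(conn, r"c:\models"))  # same row, original casing preserved
```

The stored value keeps the casing of the first insert, while both the UNIQUE constraint and the equality lookup compare case-insensitively.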

* Restore early-exit limit in _scan_models_dir for custom folders

Keep the limit parameter so _scan_models_dir stops iterating once
enough models are found, avoiding unbounded traversal of large
directories. The post-traversal slice is still applied after combining
with _scan_hf_cache results.

* feat: scan custom folders with LM Studio layout too

* Fix custom folder models being hidden by dedup

Custom folder entries were appended after HF cache and models_dir
entries.  The dedup loop kept the first occurrence of each model id,
so custom models with the same id as an existing HF cache entry were
silently dropped -- they never appeared in the "Custom Folders" UI
section.

Use a separate dedup key for custom-source entries so they always
survive deduplication.  This way a model can appear under both
"Downloaded" (from HF cache) and "Custom Folders" (from the
user-registered directory) at the same time.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden LM Studio scan and fix COLLATE NOCASE on Linux

- Add per-child and per-publisher OSError handling in _scan_lmstudio_dir
  so one unreadable subdirectory does not discard the entire custom
  folder's results
- Only apply COLLATE NOCASE on the scan_folders schema on Windows where
  paths are case-insensitive; keep default BINARY collation on Linux
  and macOS where /Models and /models are distinct directories

* Use COLLATE NOCASE in post-IntegrityError fallback SELECT on Windows

The fallback SELECT after an IntegrityError race now uses the same
case-insensitive collation as the pre-insert check, so a concurrent
writer that stored the path with different casing does not cause a
false "Folder was concurrently removed" error.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-31 06:40:31 -07:00
Daniel Han
9a8b622306
Studio: simplify tool-call dedup and replace html2text with builtin converter (#4722)
* Simplify tool-call dedup: drop hashlib, inline helpers

The duplicate tool-call detector only compares calls within a single
request from the same JSON parser, so dict key order is guaranteed
identical for identical calls (Python 3.7+ insertion-ordered dicts).

- Replace hashlib.md5(json.dumps(...)) with name + str(args)
- Inline _tool_call_key, _is_duplicate_call, _record_tool_call
  since each was a one-liner used once
- Remove unused hashlib import

* Remove tool_calling_benchmark_results.md from repo

* Replace html2text with builtin HTML-to-Markdown converter

Drop the external html2text (GPL-3.0) dependency and its regex
fallback. Add _html_to_md.py (~190 lines, stdlib only) using
html.parser.HTMLParser that handles headings, links, bold/italic,
lists, tables, blockquotes, code blocks, and entity decoding.
Strips script/style/head tags entirely.
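
To give a feel for the approach, here is a much smaller sketch built on html.parser.HTMLParser. It is not the contents of _html_to_md.py; the class name is hypothetical and it covers only headings, paragraphs, bold text, links, and script/style/head stripping:

```python
import re
from html.parser import HTMLParser

_HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
_SKIPPED = {"script", "style", "head"}


class MiniHtmlToMarkdown(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes entities such as &amp;
        self._out = []
        self._skip_depth = 0
        self._hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in _SKIPPED:
            self._skip_depth += 1
        elif self._skip_depth:
            return
        elif tag in _HEADINGS:
            self._out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self._out.append("\n\n")
        elif tag in ("b", "strong"):
            self._out.append("**")
        elif tag == "a":
            self._hrefs.append(dict(attrs).get("href") or "")
            self._out.append("[")

    def handle_endtag(self, tag):
        if tag in _SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._skip_depth:
            return
        elif tag in _HEADINGS or tag == "p":
            self._out.append("\n\n")
        elif tag in ("b", "strong"):
            self._out.append("**")
        elif tag == "a" and self._hrefs:
            self._out.append(f"]({self._hrefs.pop()})")

    def handle_data(self, data):
        if not self._skip_depth:
            self._out.append(data)

    def markdown(self):
        # Limit blank runs to a single blank line and trim the ends.
        return re.sub(r"\n{3,}", "\n\n", "".join(self._out)).strip()


def html_to_md(html):
    parser = MiniHtmlToMarkdown()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.markdown()
```

Lists, tables, blockquotes, and code blocks are deliberately left out here; the commit message lists them as handled by the real converter.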

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use json.dumps(sort_keys=True) for tool-call dedup key

str(dict) is sensitive to insertion order, so semantically identical
calls with different key ordering would bypass duplicate detection.
Switch to json.dumps with sort_keys=True for a canonical representation.
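
The difference between the two key styles is easy to demonstrate. The helper name below is hypothetical and the argument values are made up for illustration:

```python
import json


def canonical_tool_call_key(name, arguments):
    """Key that is stable under dict key reordering (hypothetical helper)."""
    return name + ":" + json.dumps(arguments, sort_keys=True)


a = {"query": "unsloth", "max_results": 5}
b = {"max_results": 5, "query": "unsloth"}

print(a == b)              # the dicts compare equal
print(str(a) == str(b))    # but their str() forms differ by insertion order
print(canonical_tool_call_key("web_search", a)
      == canonical_tool_call_key("web_search", b))
```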

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert dedup key to str(arguments)

json.dumps(sort_keys=True) is unnecessary here -- the arguments dict
always comes from the same JSON parser within a single request, so
key insertion order is deterministic (Python 3.7+).  str() is faster
and sufficient for consecutive-call dedup.

* Address review comments on _html_to_md.py

- Remove "hr" from _BLOCK_TAGS so the dedicated hr handler is reachable
- Prefix all newlines with ">" inside blockquotes (multi-line support)
- Emit full ![alt](url) for images instead of alt text only
- Replace newlines with spaces inside table cells
- Track header cells per-row (_row_has_th) instead of last-cell-only
- Strip trailing tabs in addition to spaces in cleanup regex

* Fix blockquote rendering, truncated-HTML buffer flush, and dedup key canonicalization

_html_to_md.py:
- Rewrite blockquote handling with stack-based buffer approach so nested
  blockquotes, pre blocks inside blockquotes, and multi-paragraph quotes
  all render correctly with proper "> " prefix on every line.
- Add flush_pending() to recover content from truncated HTML where closing
  tags are missing (common when _fetch_page_text caps the download size).
  Flushes open <a>, <td>, <pre>, and blockquote buffers.
- Skip <img> tags to match prior html2text ignore_images=True behavior
  and avoid data-URI amplification consuming the output budget.
- Collapse all whitespace (including newlines) in non-pre content per
  standard HTML whitespace rules: \s+ -> single space.
- Escape pipe characters in table cell content to prevent column breakage.
- Emit separator row after the first row for tables without <th> headers.
- Guard against IndexError on _ol_counter for orphan <li> elements.
- Normalize CRLF line endings before parsing.

llama_cpp.py:
- Restore canonical dedup key with json.dumps(sort_keys=True) so that
  semantically identical tool calls with different JSON key order are
  correctly detected as duplicates.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix table optional end tags, inline code whitespace, and link text normalization

_html_to_md.py:
- Extract _finish_cell() and _finish_row() helpers to handle HTML tables
  that omit optional </td>, </th>, or </tr> end tags. This is valid HTML
  and common on real web pages -- previously the parser would silently
  drop earlier cells and entire rows.
- Call _finish_cell()/_finish_row() from handle_starttag for <tr>/<td>/<th>,
  handle_endtag for </tr>/</td>/</th>/</table>, and flush_pending() so all
  three paths (normal close, implicit close, truncated HTML) use the same
  row-finalization logic including header separator emission.
- Add _in_inline_code flag so handle_data() preserves literal whitespace
  inside <code> spans instead of collapsing it. Source like
  <code>pip  install   unsloth</code> now correctly renders as
  `pip  install   unsloth` rather than `pip install unsloth`.
- Extract _finish_link() helper that normalizes accumulated link text with
  \s+ -> single space before building the Markdown link. Prevents block-
  level content inside <a> tags (e.g. <a><div>one</div><div>two</div></a>)
  from producing multiline [one\n\ntwo](href) link labels.
- Empty blockquotes now produce no output instead of a stray ">".
- Remove unused _bq_depth field (all routing uses _bq_stack).
- Flush open cells and rows in handle_endtag("table") for robustness.

* Support <ol start=N>, <dl>/<dt>/<dd>, and preserve code block whitespace

_html_to_md.py:
- Honor <ol start="N"> attribute so ordered lists preserve their original
  numbering instead of always restarting from 1. Important for docs/tutorials
  that continue numbering across sections.
- Add dl, dt, dd to _BLOCK_TAGS so definition lists (common on MDN, Python
  docs, Django docs) produce separated text instead of concatenated blobs.
- Rewrite _cleanup() to be fence-aware: content inside fenced code blocks
  is now preserved verbatim (intentional blank lines in <pre> content are
  no longer collapsed). Outside code blocks, blank runs are limited to one
  and trailing whitespace is stripped.
- Fix _prefix_blockquote() to strip trailing whitespace before collapsing
  blank lines, preventing the "\n\n \n\n" pattern from sneaking through.

* Suppress whitespace-only text nodes between table structural elements

Indented HTML tables (nearly all real-world pages) produce whitespace
text nodes between <table>, <tr>, </tr> etc. that land in the output
as leading spaces before table rows, breaking Markdown table alignment.

Skip whitespace-only text nodes when inside a table but not inside a
cell, so indentation from source HTML does not leak into the output.

* Revert dedup key to str(arguments) with explanatory comment

json.dumps(sort_keys=True) is unnecessary overhead here: arguments
always comes from json.loads on model output within a single request,
so dict insertion order is deterministic in Python 3.7+. A repeated
call from the model produces the same JSON, which parses to the same
dict repr. str() avoids re-serialization on every tool call.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-31 06:15:18 -07:00
Lee Jackson
9451bb1bac
fix(export): preserve selected/manual model on enter and blur (#4726)
2026-03-31 17:05:55 +04:00
Daniel Han
e159b93b97
studio: improve GGUF tool calling accuracy and reliability (#4700)
* studio: improve GGUF tool calling accuracy and reliability

- Add URL fetching to web_search tool so models can read full page
  content instead of only getting search snippets. Uses html2text for
  clean markdown conversion with regex fallback.
- Inject current date and behavioral guidance (URL fetch workflow,
  no repeated queries, use code for data processing) into the
  tool-use system prompt.
- Append error recovery nudge to tool results that indicate failure,
  helping small models avoid looping on the same broken call.
- Strip leaked <tool_call> XML from assistant messages in conversation
  history and from the outgoing SSE stream.
- Raise default max tool iterations from 10 to 25 across backend,
  model schema, and frontend defaults.
- Increase _MAX_PAGE_CHARS from 4k to 16k so fetched pages contain
  enough content for the model to extract useful information.
- Add "IMPORTANT: These are only short snippets" hint to search
  results so models know to fetch full pages when needed.

Tested with Qwen3.5-4B-GGUF (UD-Q4_K_XL), 10 runs before/after:
- XML leaks in responses: 10/10 -> 0/10
- URL fetch usage: 0 -> 4/10 runs
- Runs producing actual correct answers: 0/10 -> 2/10
- Average tool calls per query: 5.5 -> 3.8 (more efficient)
- Average response time: 12.3s -> 9.8s

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add tool calling benchmark results across model sizes and quants

Tested 16 configurations (4 models x 2 quants x 2 KV cache types)
with 10 runs each on NVIDIA B200.

Best config: 27B UD-Q4_K_XL + bf16 KV -- 6/10 runs found all 4
correct songs, 0 XML leaks, 131s average response time.

* Add duplicate tool-call detection and final-answer synthesis

When the model repeats the exact same tool call (same name + arguments)
twice in a row, skip execution and return a redirect message telling it
to try a different approach. This prevents the 8x-repeated-query loops
observed on 27B and 35B models.

When the tool iteration cap (25) is reached, inject a "provide your
final answer now" message before the final streaming pass. This lets
the model synthesize a useful answer from everything it gathered
instead of being silently cut off.

Tested on Qwen3.5-27B UD-Q4_K_XL (10 runs):
- Repeated query runs: 4/10 -> 2/10
- Cap hits: 1/10 -> 0/10
- All 4/4 accuracy: 5/10 -> 7/10

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix CodeQL alert: handle whitespace in script/style closing tags

The regex fallback for HTML stripping did not match closing tags
with whitespace before the angle bracket (e.g. </script >).
Use \s* before > in both script and style patterns.

* Address reviewer findings: SSRF, timeout crash, XML regex, dedup

- SSRF: resolve hostname via getaddrinfo and reject private, loopback,
  link-local, multicast, and reserved addresses before fetching
- Timeout: handle timeout=None (unlimited mode) in URL fetch path
  by defaulting to 60s instead of crashing on min(None, 60)
- Download cap: read at most max_chars*4+1 bytes instead of the
  full response body before truncating
- XML regex: match both <tool_call> and <function=...> markup in
  the history/stream cleanup (inference.py)
- CodeQL: use [^>]* in closing script/style tags to handle any
  whitespace or attributes before >
- Dedup: track whether each tool call failed so retries after
  transient errors are allowed; only block consecutive identical
  calls that both succeeded
- Final-answer synthesis: guard on max_tool_iterations > 0 so
  callers who disable tools do not get a false "used all calls" turn
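
The SSRF check in the first bullet can be sketched with the standard socket and ipaddress modules. The function names are hypothetical and the fail-closed behavior on resolution errors is an assumption, not something the commit message states:

```python
import ipaddress
import socket


def is_blocked_address(address):
    """True for private, loopback, link-local, multicast, or reserved IPs."""
    ip = ipaddress.ip_address(address.split("%", 1)[0])  # drop any IPv6 scope id
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
    )


def host_is_blocked(hostname):
    """Resolve the hostname and reject it if any resolved address is blocked."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return True  # assumption: treat unresolvable hosts as blocked
    if not infos:
        return True
    return any(is_blocked_address(info[4][0]) for info in infos)
```

Checking every resolved address, rather than only the first, avoids a hostname that resolves to a mix of public and internal addresses slipping through.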

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix redirect SSRF, SSE streaming regression, dedup off-by-one

- SSRF redirect bypass: disable auto-redirect in urllib, manually
  follow up to 5 hops with host validation at each step. Prevents
  public URLs from redirecting to loopback/private targets.
- SSE streaming: track prev_text on the raw cumulative and strip
  XML from the delta only, so completed tool_call tags do not cause
  the cumulative to shrink and drop trailing real text.
- Dedup off-by-one: check the immediately previous call (window=1)
  instead of requiring 2 matching history entries, so the second
  identical successful call is blocked rather than the third.
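
A minimal sketch of the window=1 duplicate check described above, assuming calls are keyed by a hash of name plus arguments. The class and method names are hypothetical; only the behaviour (block the second identical call when the previous one succeeded) comes from this log.

```python
import hashlib
import json


def call_key(name: str, arguments: dict) -> str:
    """Stable fingerprint of a tool call (same name + same arguments)."""
    payload = json.dumps({"name": name, "arguments": arguments}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DuplicateCallGuard:
    """Remembers only the immediately previous call (window=1)."""

    def __init__(self):
        self._last_key = None
        self._last_failed = False

    def is_duplicate(self, name: str, arguments: dict) -> bool:
        # A retry after a failed call is allowed; a repeat after a success is not.
        return self._last_key == call_key(name, arguments) and not self._last_failed

    def record(self, name: str, arguments: dict, failed: bool) -> None:
        self._last_key = call_key(name, arguments)
        self._last_failed = failed
```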

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix redirect HTTPError handling and tighten error prefixes

- Redirect fix: urllib raises HTTPError (not a normal response) when
  the redirect handler returns None. Catch HTTPError for 3xx codes
  and extract the Location header from the exception object.
- Error prefixes: remove overly broad "No " prefix that matched
  "No results found." (a valid empty-search outcome, not an error).
  Replace with specific prefixes like "Blocked:", "No query provided",
  "Failed to resolve". This ensures empty search results are correctly
  classified as non-errors for duplicate-call tracking.
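
The prefix classification can be sketched as follows. The three prefixes are the ones named above; the full list in the codebase may be longer, and the function name is hypothetical.

```python
# Prefixes quoted in the commit message; treat this tuple as a partial list.
ERROR_PREFIXES = ("Blocked:", "No query provided", "Failed to resolve")


def is_error_result(result: str) -> bool:
    """True when a tool result string represents a failure rather than an answer."""
    return result.startswith(ERROR_PREFIXES)
```

With a bare "No " prefix, "No results found." would have counted as a failure and the next identical search would not have been treated as a duplicate.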

* Fix SSE cross-chunk XML leaks, cleanup review findings

- SSE streaming: sanitize the full cumulative text before diffing
  against the previous sanitized snapshot, so XML tags that span
  chunk boundaries are stripped correctly. The previous delta-based
  approach leaked split tags.
- DRAINING fallback: use _strip_tool_markup() helper instead of a
  manual regex that only handled <tool_call> but not <function=...>.
- Move hashlib import, _TOOL_XML_RE compile, and datetime import to
  module level per style guide.
- Remove unused _hit_tool_cap variable.
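
The cumulative-sanitize approach in the first bullet can be sketched as below. This is an assumption-laden illustration: it handles only <tool_call> markup, the function names are hypothetical, and holding back a trailing partial tag is one possible way to avoid emitting a fragment that later turns into markup.

```python
import re

OPEN_TAG = "<tool_call>"
# Complete blocks, plus an unterminated block running to the end of the text.
_TOOL_XML_RE = re.compile(r"<tool_call>.*?(?:</tool_call>|\Z)", re.DOTALL)


def sanitize(cumulative: str) -> str:
    """Strip tool-call markup from the full text received so far."""
    text = _TOOL_XML_RE.sub("", cumulative)
    # Hold back a trailing fragment that could still grow into an opening tag.
    for n in range(min(len(OPEN_TAG) - 1, len(text)), 0, -1):
        if OPEN_TAG.startswith(text[-n:]):
            return text[:-n]
    return text


def stream_visible_deltas(chunks):
    """Yield only new visible text, diffing sanitized snapshots."""
    raw = ""
    prev_clean = ""
    for chunk in chunks:
        raw += chunk
        clean = sanitize(raw)
        if len(clean) > len(prev_clean) and clean.startswith(prev_clean):
            yield clean[len(prev_clean):]
            prev_clean = clean
```

Because the diff is taken between two sanitized snapshots, a tag split across chunk boundaries never reaches the client, which is the failure mode of sanitizing each delta on its own.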

* Fix DNS rebinding, charset detection, HTTPError handling, dedup double-record

- DNS rebinding: resolve hostname once via getaddrinfo, pin the
  returned IP, rewrite the URL to connect to the pinned IP with
  a Host header. Each redirect hop re-resolves and re-validates.
  Closes the TOCTOU window between validation and connection.
- Charset: use resp.headers.get_content_charset() instead of
  hardcoding utf-8, so pages with other encodings decode correctly.
- HTTPError: return descriptive "HTTP {code} {reason}" instead of
  re-raising into a generic "Search failed" message.
- Dedup: remove redundant _record_tool_call in the duplicate branch;
  the single call at the end of the loop handles all cases.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-31 03:06:44 -07:00
Lee Jackson
815619d972
feat: add update instructions card with OS toggle and mobile expand flow (#4721)
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-03-31 14:05:05 +04:00
Roland Tannous
cc5e4fbf17
fix: auto-retry stalled HF downloads with HF_HUB_DISABLE_XET=1 (#4712)
* fix: auto-retry stalled HF downloads with HF_HUB_DISABLE_XET=1

The heartbeat thread now monitors the HF Hub cache directory for
file-size growth. If no bytes are written for 3 minutes, it sends a
"stall" message to the orchestrator, which kills the subprocess and
retries with HF_HUB_DISABLE_XET=1 (falling back from Xet to standard
HTTPS). If the retry also stalls, it errors out with a clear message.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: include transport type (xet/https) in heartbeat and stall log messages

Makes it clear in backend logs whether the download is using xet or
https transport, and which transport stalled, which helps with debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: monitor HF Hub .tmp dir to avoid false stall detections

huggingface_hub downloads into .tmp/ before atomically moving to
blobs/. Without monitoring .tmp, a large shard actively downloading
for several minutes would show zero blob growth and trigger a false
stall.

* fix: scope HF cache size check to specific model being loaded

Instead of scanning every models--*/blobs directory (O(N) with cached
models), only check the specific model's blobs dir plus the global
.tmp dir. Much faster on systems with many cached models.

* Fix false stall detection on cached/local models and cleanup issues

- Only fire stall if download activity was observed (cache size changed
  at least once). Previously, any model load taking >180s would trigger
  a false stall, even for already-cached or local models where no
  download is happening.
- Return -1 from _get_hf_cache_size on exception to distinguish
  "unable to measure" from "genuinely zero bytes". Skip stall logic
  when measurement fails.
- Add _shutdown_subprocess before raising on terminal stall path to
  prevent leaking a stuck subprocess.
- Detect pre-existing HF_HUB_DISABLE_XET=1 in the parent environment
  to avoid a redundant retry cycle when Xet is already disabled.
- Remove global .tmp directory scanning (not used by modern
  huggingface_hub; in-progress downloads use .incomplete files in
  blobs/ which are already captured by iterdir).
- Add f.is_file() guard in cache size calculation.
- Replace em dashes with ASCII dashes for Windows terminal compat.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden stall detection edge cases

- Guard -1 to valid value transition: when initial _get_hf_cache_size
  returns -1 (error) and later recovers to a real value, do not count
  that as download activity. Only set saw_download_activity when the
  previous measurement was also valid (>= 0).
- Move os import to top-level in orchestrator.py instead of inline
  import os as _os.
- Fix misleading comment about post-download protection.

* Use .incomplete files to detect active downloads for stall detection

Replace the saw_download_activity heuristic with direct .incomplete file
detection. huggingface_hub creates *.incomplete files in blobs/ during
active downloads and removes them on completion. This gives a reliable
signal for whether a download is actually in progress.

Benefits:
- Cached models: no .incomplete files -> no stall fired even after 180s
- Post-download init (quantization, GPU loading): .incomplete files gone
  so stall timer resets, long init phases are not killed
- Pre-download hangs (XET handshake stall): .incomplete files are
  created at download start, so zero-byte stalls are now detected
- No more false positives from -1 to valid measurement transitions

The _get_hf_download_state function now returns (total_bytes,
has_incomplete) tuple or None on error, replacing _get_hf_cache_size.
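
A sketch of what such a function might look like, assuming the standard Hub cache layout (models--{org}--{name}/blobs under the cache root). The signature here is an assumption, and it takes several model names, which matches a later commit in this log rather than this one.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple


def get_hf_download_state(
    cache_root: Path, model_names: Iterable[str]
) -> Optional[Tuple[int, bool]]:
    """Return (total_bytes, has_incomplete) for the given repos, or None on error."""
    total_bytes = 0
    has_incomplete = False
    try:
        for name in model_names:
            blobs = cache_root / ("models--" + name.replace("/", "--")) / "blobs"
            if not blobs.is_dir():
                continue  # nothing cached for this repo yet
            for entry in blobs.iterdir():
                if not entry.is_file():
                    continue
                total_bytes += entry.stat().st_size
                if entry.name.endswith(".incomplete"):
                    has_incomplete = True  # a download is in progress
    except OSError:
        return None  # "unable to measure", distinct from zero bytes
    return total_bytes, has_incomplete
```

A caller would fire the stall only when has_incomplete is true and total_bytes has not changed for the stall window.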

* Add debug logging to download state exception handler

Log the exception at debug level when _get_hf_download_state fails,
instead of silently returning None. Helps with troubleshooting cache
measurement issues.

* Watch both adapter and base model repos for LoRA stall detection

When loading a LoRA adapter, the actual download bottleneck is often
the base model, not the adapter itself. Update the heartbeat to watch
both mc.identifier and mc.base_model cache directories so stall
detection works for LoRA loads where the base model stalls on Xet.

Also update _get_hf_download_state to accept multiple model names and
skip names without "/" (local paths) since those do not have HF cache
directories.

* Fix model name filtering for official HF models without org prefix

Models like gpt2 and bert-base-uncased do not contain a slash but are
still valid HF Hub models with cache directories. Replace the "/" check
with a proper local-path detection that checks for path separators and
path-like prefixes instead.

Also fix the base_model watch list to not require "/" in the base model
name, so official models used as LoRA bases are also monitored.

* Fix local path detection that broke all org/model names on Linux

The os.path.sep check matched "/" in HF model IDs like "org/model" on
Linux, causing the stall detector to skip ALL standard HF models.

Replace with a check that only skips names starting with "/" (absolute
paths), "." (relative paths), "~" (home-relative), or containing "\"
(Windows paths). HF model IDs like "org/model" or "gpt2" pass through
correctly on all platforms.
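
The rule above fits in a few lines. The function name is hypothetical; the four conditions are the ones listed in this commit message.

```python
def is_local_path(name: str) -> bool:
    """True for names that look like filesystem paths rather than HF model IDs."""
    # "/" absolute, "." relative, "~" home-relative, "\" anywhere means Windows path.
    return name.startswith(("/", ".", "~")) or "\\" in name
```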

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-31 03:00:46 -07:00
Daniel Han
e164c930ff
fix(studio): correct default weight_decay and learning rate (#4695)
* fix(studio): change default weight_decay from 0.01 to 0.001

The default weight decay across Studio was 0.01 but should be 0.001.
Updated the default in all backend fallbacks, the Pydantic model, the
frontend config, and every YAML preset/model-default config.

* fix(studio): auto-set learning rate based on training method

Default LR should be 2e-4 for LoRA/QLoRA and 2e-5 for full fine-tuning.

Frontend: track whether the user has manually edited the LR field via a
_learningRateManuallySet flag (same pattern as trainOnCompletions).
When switching training method and the user has not touched the LR,
auto-set it to the appropriate default. Reset the flag on model load.

Backend: change trainer.py start_training default from 5e-5 to 2e-4,
update default.yaml fallback from 5e-5 to 2e-4, and fix
full_finetune.yaml from 0.0002 (2e-4) to 2e-5.

* refactor(studio): centralize weight_decay and learning rate defaults

Create studio/backend/core/training/constants.py as the single source of
truth for DEFAULT_WEIGHT_DECAY (0.001), DEFAULT_LEARNING_RATE (2e-4),
DEFAULT_LEARNING_RATE_FULL (2e-5), and DEFAULT_LEARNING_RATE_STR ("2e-4").

All backend modules (trainer.py, training.py, worker.py, models/training.py)
now import from constants.py instead of hardcoding values.

On the frontend, add LR_DEFAULT_LORA and LR_DEFAULT_FULL to
config/training.ts and use them in the store instead of magic numbers.
A comment cross-references the backend constants file.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix model-specific LR override, persist migration, and flag resets

- Preserve model-specific learning rates from YAML configs when the
  async autoSelectTrainingMethod callback fires (fixes Qwen2.5-1.5B
  getting 2e-4 instead of its configured 1e-5, etc.)
- Bump zustand persist version to 9 with migration so existing users
  with weightDecay=0.01 get updated to 0.001
- Clear _learningRateManuallySet in reset() and applyConfigPatch()
  for consistency with trainOnCompletions flag behavior
- Add DEFAULT_LEARNING_RATE_FULL_STR to constants.py

* Refine applyConfigPatch to only clear LR flag when patch includes LR

Only reset _learningRateManuallySet when the applied config patch
actually provides a learningRate value. This prevents unrelated config
patches from silently disarming the manual-edit guard, which would
cause a subsequent setTrainingMethod call to overwrite the user's
custom LR.

* Preserve model-specific LR when switching between qlora and lora

Only auto-switch the learning rate when the training category changes
(adapter <-> full fine-tuning). Switching between qlora and lora keeps
the current LR since both methods share the same learning rate range.
This preserves curated per-model defaults (e.g. 1e-5 for
Qwen2.5-1.5B-Instruct) when the user toggles between adapter methods.

* Remove constants.py, use YAML configs as the source of truth

The YAML config files (model-specific + default.yaml) are the intended
config layer for training defaults. The Python backend fallbacks now use
inline values that match the YAML configs, rather than importing from a
separate constants module. This keeps the config architecture simple:
YAML files are the single source of truth, and the inline Python
fallbacks are just safety nets that mirror them.

* fix(studio): preserve model-specific LR when switching training method

Stash YAML-provided learning rate and use it to restore the correct
value when switching between adapter and full fine-tune modes.

- qlora <-> lora no longer overwrites the model's LR
- full -> adapter restores the YAML LR instead of a hardcoded constant
- selecting a model while on full fine-tune uses LR_DEFAULT_FULL
  instead of applying the YAML adapter LR

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-03-31 13:50:25 +04:00
Wasim Yousef Said
28aaf849bf
fix: throttle and cache HuggingFace modelInfo API calls (#4696)
* fix: throttle and cache HuggingFace modelInfo API calls

The frontend was firing 40 to 60 parallel modelInfo requests on app
startup with zero caching or deduplication, causing HF rate limits.

Adds a caching layer (hf-cache.ts) with TTL cache, inflight request
dedup, and a concurrency limiter. Also debounces the HF token input
so typing a token no longer re-fires all model searches per keystroke.
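
The three mechanisms (TTL cache, inflight dedup, concurrency limit) can be sketched together as below. This is a Python rendering of the idea, not the hf-cache.ts code: the class name, the TTL value, and the fetch callback are assumptions; the limit of 3 matches the MAX_CONCURRENT value mentioned later in this log.

```python
import asyncio
import time

MAX_CONCURRENT = 3
TTL_SECONDS = 300.0  # assumed value, not taken from the source


class CachedFetcher:
    """TTL cache + inflight request dedup + concurrency limiter around fetch(key)."""

    def __init__(self, fetch, ttl=TTL_SECONDS, max_concurrent=MAX_CONCURRENT):
        self._fetch = fetch
        self._ttl = ttl
        self._cache = {}     # key -> (expires_at, value)
        self._inflight = {}  # key -> task shared by concurrent callers
        self._sem = asyncio.Semaphore(max_concurrent)

    async def get(self, key):
        hit = self._cache.get(key)
        if hit is not None and hit[0] > time.monotonic():
            return hit[1]  # fresh cache entry
        task = self._inflight.get(key)
        if task is None:
            task = asyncio.ensure_future(self._load(key))
            self._inflight[key] = task
        return await task

    async def _load(self, key):
        try:
            async with self._sem:  # at most max_concurrent fetches at once
                value = await self._fetch(key)
            self._cache[key] = (time.monotonic() + self._ttl, value)
            return value
        finally:
            self._inflight.pop(key, None)
```

Create the fetcher inside a running event loop (for example at the top of the main coroutine) so the semaphore binds to the right loop on older Python versions.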

* fix: only fetch VRAM info for visible models in chat selector

* Fix cache key isolation and VRAM badge stability for PR #4696

- Cache key now includes a token fingerprint (last 8 chars) instead of a
  boolean, so switching HF tokens gives separate cache entries instead of
  serving stale data from the previous token.
- Extract token via credentials?.accessToken to match the @huggingface/hub
  API surface.
- Extend CachedResult type with safetensors/tags fields so downstream
  consumers no longer need unsafe `as` casts.
- Merge VRAM param map with previous state on scroll instead of replacing
  it, preventing a brief flash of missing VRAM badges when new models
  become visible.

* Fix VRAM badges missing for search-filtered recommended models

When a user types a search query, filteredRecommendedIds can include
models beyond the currently visible page. These models had no VRAM data
because useRecommendedModelVram only received visibleRecommendedIds.

Now we pass the union of visibleRecommendedIds and filteredRecommendedIds
to the VRAM hook, so recommended models surfaced by search also show
their VRAM badges. The hf-cache layer ensures no duplicate network calls.

* Apply biome formatting to hf-cache.ts and use-recommended-model-vram.ts

Auto-formatted with biome check --write to match project lint rules:
- Block statements for single-line if/for bodies
- Import sorting (type imports first)
- Consistent line wrapping

* Fix extractToken to handle both current and deprecated HF auth forms

The @huggingface/hub CredentialsParams type is a union:
  - { accessToken: "hf_..." }               (current preferred form)
  - { credentials: { accessToken: "..." } }  (deprecated form)

Previously only checked params.credentials?.accessToken (deprecated path).
Now checks both forms so the cache key is correct regardless of which
calling convention is used.

* Simplify extractToken, map merge, and set construction

- extractToken: remove type assertions, use direct property access with
  truthiness checks for cleaner union type handling
- VRAM map merge: use Map spread constructor instead of manual for loop
- idsForVram: use Set spread construction for more concise dedup

* Add rationale comment for MAX_CONCURRENT=3 in hf-cache.ts

* Skip GGUF repos in VRAM fetch and pre-populate cache from listModels

Two changes to reduce redundant HF API calls:

1. Filter GGUF repos from idsForVram before passing to useRecommendedModelVram.
   GGUF repos have no safetensors metadata and the render layer already shows
   a static "GGUF" badge -- fetching modelInfo for them is a no-op that wastes
   a semaphore slot and a network round-trip.

2. Add primeCacheFromListing() to hf-cache.ts and call it from listModels
   yield sites in mergedModelIterator and priorityThenListingIterator.
   listModels returns the same type (ModelEntry & Pick<ApiModelInfo, T>) as
   modelInfo with the same additionalFields, so the data is interchangeable.
   Priming only writes if the key is not already fresh, so it never overwrites
   a recent modelInfo response.

   This means models discovered via listModels are already in cache when
   useRecommendedModelVram later calls cachedModelInfo for them, eliminating
   duplicate network requests.

* Fix cache key mismatch: prime both token and anonymous slots

The VRAM hook calls cachedModelInfo without credentials (anonymous key),
but listModels results were primed only under the authenticated key.
For authenticated users the priming was a no-op -- cache miss every time.

Fix: prime both the token-specific slot and the anonymous slot when an
access token is present. Public model metadata (safetensors, tags) is
identical regardless of auth so this is safe.

Also add a defensive guard in primeCacheFromListing for empty name.

* Auto-prime anonymous cache slot from authenticated modelInfo fetches

When cachedModelInfo is called with a token, the result was only stored
under the token-specific key (e.g. model::abc12345). The VRAM hook
calls cachedModelInfo without credentials and reads the anonymous slot
(model::anon), causing a cache miss and duplicate fetch for every
priority model.

Now cachedModelInfo also writes to the anonymous slot on success when
a token is present. Public model metadata (safetensors, tags) is
identical regardless of auth, so this is safe and eliminates ~10
duplicate API calls on first page load.

* Guard anonymous cache priming against gated/private models

Only prime the anonymous cache slot for non-gated, non-private models.
Previously, authenticated modelInfo responses and listing results were
unconditionally copied into the anonymous slot, which could briefly
expose gated/private model metadata after clearing the HF token.

Now checks result.gated and result.private before writing the anon slot.
Public unsloth/ models (the common case) still benefit from the
optimization; gated models like meta-llama/* require a fresh fetch
per auth context.

* Extract primeFromListing helper to deduplicate cache priming logic

The cache priming pattern (prime token slot + conditionally prime anon
slot for non-gated models) was duplicated in three places. Extracted
into a single primeFromListing() function for maintainability.

* Export CachedResult type, add isStale helper, simplify primeFromListing

- Export CachedResult so consumers can use it directly instead of
  the indirect Parameters<typeof ...> pattern.
- Extract isStale(key) helper to deduplicate the cache freshness
  check that was repeated in primeCacheFromListing, cachedModelInfo,
  and the anonymous-slot priming logic.
- Simplify primeFromListing to use CachedResult directly for both
  the data parameter and the gated/private guard, eliminating the
  double cast.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-31 02:21:17 -07:00
Datta Nimmaturi
3b5a49776b
[studio] multi gpu: revert to balanced for inference. (#4698)
* Revert to balanced for inference

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused for_inference parameter from get_device_map

Since inference and training both use "balanced" now, the for_inference
flag is dead code. Remove it from the function signature, the call site
in inference.py, and simplify the tests accordingly.

* Remove redundant TestDeviceMapForInference test class

TestGpuAutoSelection already covers the same multi-gpu and single-gpu
device_map assertions. The TestDeviceMapForInference class was left
over from when for_inference had distinct behavior.

* Remove redundant test_get_device_map_multi_gpu_uses_balanced

Its assertions ([0,1] -> balanced, [0] -> sequential) are already
covered by test_get_device_map_uses_explicit_gpu_selection.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-31 01:24:41 -07:00
Daniel Han
fe6609a624
fix(studio): open tour ReadMore links in new tab (#4694)
* fix(studio): open tour ReadMore links in new tab

The quick tour "Read more" links navigate away from Studio instead of
opening in a separate tab. Add target="_blank" and rel="noopener
noreferrer" to the ReadMore component so external doc links open in a
new browser tab.

* fix(studio): only open external ReadMore links in new tab

Apply target="_blank" conditionally based on whether the href starts
with "http", so internal links still navigate in the same tab.

* Tighten external-link detection in ReadMore component

Use regex /^https?:\/\// instead of startsWith("http") so the check
requires the full protocol prefix and does not match non-URL strings
that happen to begin with "http".

* Hoist regex to module scope for ReadMore

Move EXTERNAL_URL_RE to top-level constant to satisfy the biome
useTopLevelRegex lint rule and avoid re-creating the RegExp on
every render.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-30 23:41:14 -07:00
Lee Jackson
308bb948d1
studio: prevent false multimodal warning during model loading (#4704)
* studio: gate multimodal incompatibility warning on settled model capabilities

* Also disable Start button during isCheckingVision fallback

When getModelConfig fails and the fallback checkVisionModel is still
in-flight, isLoadingModelDefaults clears before isCheckingVision does.
Without also gating on isCheckingVision the Start button briefly
re-enables with stale capability flags.

Add isCheckingVision to the disabled condition and show "Loading
model..." text while either flag is active.

* Show correct error message for audio dataset incompatibility

The incompatibility warning always said "switch to a vision model"
even when the actual issue was an audio dataset on a non-audio model.
Now shows an audio-specific message when the mismatch is audio.

* Extract isLoadingModel constant for clarity

Pull the combined model-loading condition into a single constant
reused by the settled check, the disabled prop, and the button label.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-30 23:11:20 -07:00
pre-commit-ci[bot]
66f250a614
[pre-commit.ci] pre-commit autoupdate (#4705)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.7 → v0.15.8](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.7...v0.15.8)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-30 21:58:16 -07:00
Roland Tannous
d6d3f59984
fix: replace hard timeout with inactivity timeout for model loading (#4707)
The 180s wall-clock timeout would kill model loads on slow connections
even when the download was actively progressing. Now the worker sends
heartbeat status messages every 30s during loading, and the orchestrator
resets its 300s deadline on each one, so it only times out when the
subprocess goes truly silent.
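
A minimal sketch of the inactivity timeout, assuming the worker and orchestrator communicate over a queue. The message shape ({"type": "heartbeat"}) and the function name are hypothetical; the 300s window is the value quoted above.

```python
import queue
import time


def wait_for_result(messages, inactivity_timeout=300.0):
    """Block until a non-heartbeat message arrives; time out only on silence."""
    deadline = time.monotonic() + inactivity_timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("worker sent nothing within the inactivity window")
        try:
            msg = messages.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError("worker sent nothing within the inactivity window")
        if msg.get("type") == "heartbeat":
            # Any heartbeat pushes the deadline out again.
            deadline = time.monotonic() + inactivity_timeout
            continue
        return msg
```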
2026-03-31 07:35:04 +04:00
Roland Tannous
7f353acfd4
fix: skip download progress polling for exported GGUF models (#4709)
* fix: skip download progress polling for exported GGUF models

* fix: revert isLocalGgufDir change - exported GGUFs are file paths, not dirs

* fix: set isDownloaded true for all adapters in LoraModelPicker
2026-03-31 07:21:23 +04:00
Etherll
34272a796f
Fix/bun windows bin detection (#4703)
* fix(studio): detect bun .exe shims in Windows binary check

* Update setup.sh

* add .bunx checking
2026-03-30 21:58:33 +04:00
Daniel Han
6d83ad9a28
fix(studio): avoid UnicodeEncodeError on Windows cp1252 consoles (#4699)
* fix(studio): replace unicode emoji in print() to avoid cp1252 crash on Windows

On Windows the default console encoding is cp1252 which cannot encode
Unicode emoji like U+2705 or U+26A0. Bare print() calls with these
characters cause a UnicodeEncodeError at runtime.

- run.py: replace emoji with ASCII status prefixes [OK] and [WARNING]
- format_conversion.py: remove duplicate print() that mirrors the
  logger.info() call on the next line, and drop the emoji from the
  log message since loggers handle encoding separately
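
The failure is easy to reproduce without a Windows console by encoding to cp1252 directly. The helper name below is hypothetical.

```python
def can_encode(text: str, encoding: str = "cp1252") -> bool:
    """True if text can be written to a stream using the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# U+2705 and U+26A0 have no cp1252 mapping; the ASCII prefixes do.
emoji_ok = can_encode("\u2705 Server started")
ascii_ok = can_encode("[OK] Server started")
```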

* fix(studio): apply same emoji/print cleanup to parallel VLM conversion path

The parallel URL-based conversion logic has the same duplicate print()
with emoji that was fixed in the sequential path. Remove the bare
print() and drop the emoji from the logger.info() call.

* Treat install_python_stack.py failure as fatal in setup.ps1

On Linux/Mac, setup.sh runs under set -euo pipefail so a non-zero
exit from install_python_stack.py aborts the installer. On Windows,
setup.ps1 had no exit code check -- if the Python script crashed
(e.g. from the cp1252 UnicodeEncodeError), the installer silently
continued past the dependency loop and reported success. Studio
would then fail at launch with ModuleNotFoundError for structlog,
fastapi, and other deps that were never installed.

Capture $LASTEXITCODE and exit 1 if the dependency installer fails,
matching the error handling pattern already used for PyTorch install.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 06:40:47 -07:00
Daniel Han
a0bca759f3
Fix editable install scanning 6,500+ node_modules dirs (#4697)
* fix: scope packages.find to prevent node_modules namespace scanning

The packages.find section had no include filter, so setuptools'
find_namespace_packages discovered all directories as potential Python
packages -- including the 6,557 directories inside
studio/frontend/node_modules/ after the frontend build step.

This caused the editable install overlay step to run 20,000+ glob
operations across 6,619 "packages", which on fast NVMe takes ~5s but
on slower disks can take 7+ minutes.

Adding an explicit include filter scopes discovery to only the packages
we actually ship (unsloth, unsloth_cli, studio, studio.backend), dropping
from 6,619 to 58 discovered packages and the editable build time from
5.4s to 1.2s.

Also removes the broken kernels/moe exclude (used "/" instead of "."
notation so it never matched) and adds a node_modules exclude as a
safety net.

* fix: use precise node_modules exclude patterns

Use "*.node_modules" and "*.node_modules.*" instead of "*.node_modules*"
to avoid accidentally excluding valid packages that might contain
"node_modules" as a substring in their name.
2026-03-30 02:40:29 -07:00
Datta Nimmaturi
9311df2b29
[Studio] multi gpu finetuning/inference via "balanced_low0/sequential" device_map (#4602)
* [WIP] balanced device map for studio

* gpus as a request parameter

* API for multi GPU stuff

* return multi gpu util in new API

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use balanced_low0 instead of balanced

* Use balanced_low0 instead of balanced

* Fix device_map typo, UUID parsing crash, set() filter bug, and broken tests

- balanced_low0 -> balanced_low_0 (transformers/accelerate rejects the old string)
- get_parent_visible_gpu_ids() now handles UUID/MIG CUDA_VISIBLE_DEVICES
  gracefully instead of crashing on int() parse
- _get_backend_visible_gpu_info() set() or None bug: empty set is falsy so
  CUDA_VISIBLE_DEVICES=-1 would disable filtering and report all GPUs
- test_gpu_selection.py: add missing get_visible_gpu_utilization import and
  add required job_id arg to start_training() calls

* Smart GPU determinism using estimates

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disallow gpu selection for gguf for now

* cleanup

* Slightly larger baseline

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Treat empty list as auto

* Verbose logging/debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup and revert unnecessary deletions

* Cleanup excessive logs and guard against disk/cpu offload

* auth for visibility API. cleanup redundant imports. Adjust QLoRA estimate

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support for non cuda gpus

* Fix multi-GPU auto-selection memory accounting

The multi_gpu_factor was applied uniformly to all GPUs including the
first one, which unfairly penalizes single-GPU capacity when
transitioning to multi-GPU. This created a discontinuity where a model
that barely fits 1 GPU would suddenly require 2 GPUs because the first
GPU's free memory was discounted by 20%.

Now the first GPU keeps its full free memory, and only additional GPUs
have an overhead factor (0.85) applied to account for inter-GPU
communication and sharding overhead. This gives more accurate
auto-selection and avoids unnecessary multi-GPU for models that
comfortably fit on one device.
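
The accounting can be sketched as below, assuming GPUs are ranked by free memory and added one at a time. The function names and the greedy selection are assumptions; the 0.85 factor and the "first GPU at full capacity" rule come from this commit message.

```python
ADDITIONAL_GPU_FACTOR = 0.85  # overhead factor for every GPU after the first


def usable_memory_gb(free_gb):
    """First GPU counts in full; each additional GPU is discounted."""
    if not free_gb:
        return 0.0
    return free_gb[0] + ADDITIONAL_GPU_FACTOR * sum(free_gb[1:])


def auto_select_gpus(free_by_gpu, required_gb):
    """Pick the fewest GPUs (largest free memory first) that fit required_gb.

    Returns None when even all GPUs together are not enough (an assumption;
    the real fallback behaviour is not described here).
    """
    ranked = sorted(free_by_gpu.items(), key=lambda kv: kv[1], reverse=True)
    chosen = []
    for gpu_id, free in ranked:
        chosen.append((gpu_id, free))
        if usable_memory_gb([f for _, f in chosen]) >= required_gb:
            return [g for g, _ in chosen]
    return None
```

With two GPUs of 24 GB free each, a 23 GB requirement stays on one device, while a 40 GB requirement uses both (24 + 0.85 * 24 = 44.4 GB usable).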

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add sandbox tests for multi-GPU selection logic

24 tests covering model size estimation, memory requirements, automatic
GPU selection, device map generation, GPU ID validation, and multi-GPU
overhead accounting. All tests use mocks so they run without GPUs on
Linux, macOS, and Windows.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix reviewer findings: 4bit inference estimate, fallback, GGUF gpu_ids, retry

1. 4-bit inference now uses reduced memory estimate (model_size/3 + buffer)
   instead of the FP16 1.3x multiplier. This prevents over-sharding
   quantized models across unnecessary GPUs.

2. When model size estimation fails, auto_select_gpu_ids now falls back to
   all visible GPUs instead of returning None (which could default to
   single-GPU loading for an unknown-size model).

3. GGUF inference route now treats gpu_ids=[] as auto-selection (same as
   None) instead of rejecting it as an unsupported explicit request.

4. Training retry path for "could not get source code" now preserves the
   gpu_ids parameter so the retry lands on the same GPUs.

5. Updated sandbox tests to cover the new 4-bit inference estimate branch.

* Remove accidentally added unsloth-zoo submodule

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix UUID/MIG visibility and update test expectations

1. nvidia.py: When CUDA_VISIBLE_DEVICES uses UUID/MIG tokens, the
   visibility APIs now return "unresolved" with empty device lists instead
   of exposing all physical GPUs. This prevents the UI from showing GPUs
   that the backend process cannot actually use.

2. test_gpu_selection.py: Updated test expectations to match the new
   multi-GPU overhead accounting (first GPU at full capacity, 0.85x for
   additional GPUs) and 4-bit inference memory estimation formula.
   All 60 tests now pass.

* Add CPU/disk offload guard to audio inference path

The audio model loading branch returned before the common
get_offloaded_device_map_entries() check, so audio models loaded with a
multi-GPU device_map that spilled layers to CPU/disk would be accepted
instead of rejected. Now audio loads also verify no modules are offloaded.

* Improve VRAM requirement estimates

* Replace balanced_low_0 with balanced

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine calculations for slightly easier nums

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adjust estimates

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use nums instead of obj to avoid serialisation error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden nvidia-smi parsing and fix fallback GPU list

1. nvidia.py: Wrap int() casts for GPU index and memory in try/except
   so MIG slices, N/A values, or unexpected nvidia-smi output skip the
   unparseable row instead of aborting the entire GPU list.

2. nvidia.py: Handle GPU names containing commas by using the last
   field as memory instead of a fixed positional index.

3. hardware.py: fallback_all now uses gpu_candidates (GPUs with verified
   VRAM data) instead of raw devices list, which could include GPUs
   with null VRAM that were excluded from the ranking.
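
The parsing rules in items 1 and 2 can be sketched as follows. The row layout (`index, name, memory`) and the function name are assumptions made for illustration; the real nvidia.py query fields may differ.

```python
from __future__ import annotations


def parse_gpu_rows(csv_text: str) -> list[dict]:
    """Hypothetical parser for 'index, name, memory' rows of CSV output."""
    gpus = []
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # blank or truncated row
        try:
            index = int(fields[0])
            # Memory is read from the LAST field, so commas inside the
            # GPU name do not shift it.
            memory_mib = int(fields[-1])
        except ValueError:
            # N/A values or other unparseable cells: skip this row only.
            continue
        name = ", ".join(fields[1:-1])
        gpus.append({"index": index, "name": name, "memory_mib": memory_mib})
    return gpus


sample = "0, NVIDIA A100, 40960\n1, Vendor, Model X, 24576\n2, MIG slice, [N/A]\n"
print(parse_gpu_rows(sample))
```

The third row is dropped rather than aborting the whole list, which is the behaviour the commit describes.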

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* consolidate raise_if_offload

* Improve MoE support. Guard against nvidia-smi failures

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix shared-expert LoRA undercount, torch VRAM fallback, and apply_gpu_ids edge case

1. vram_estimation.py: compute_lora_params now includes shared experts
   (n_shared_experts) alongside routed experts when computing MoE LoRA
   adapter parameters. Previously only n_experts were counted, causing
   the estimator to undercount adapter, optimizer, and gradient memory
   for DeepSeek/GLM-style models with shared experts.

2. hardware.py: _torch_get_per_device_info now uses mem_get_info (which
   reports system-wide VRAM usage) instead of memory_allocated (which
   only reports this process's PyTorch allocations). This prevents
   auto-selection from treating a GPU as mostly free when another
   process is consuming VRAM. Falls back to memory_allocated when
   mem_get_info is unavailable.

3. hardware.py: apply_gpu_ids([]) now returns early instead of setting
   CUDA_VISIBLE_DEVICES="" which would disable CUDA entirely. Empty
   list inherits the parent visibility, same as None.

4. hardware.py: Upgraded fallback_all GPU selection log from debug to
   warning so operators are notified when the model likely will not fit
   in available VRAM.
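
The early return in item 3 might look like the sketch below. The `_sketch` suffix and the injectable `env` mapping are illustrative additions for testability, not the actual hardware.py signature.

```python
import os


def apply_gpu_ids_sketch(gpu_ids, env=None):
    """Hypothetical version of the empty-list guard described above."""
    env = os.environ if env is None else env
    if not gpu_ids:
        # None or []: inherit the parent's visibility. Writing an empty
        # CUDA_VISIBLE_DEVICES here would hide every GPU from CUDA.
        return
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)


env = {"CUDA_VISIBLE_DEVICES": "2,3"}
apply_gpu_ids_sketch([], env)      # leaves "2,3" untouched
apply_gpu_ids_sketch([3], env)     # narrows visibility to GPU 3
print(env)
```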

* Guard nvidia-smi subprocess calls against OSError and TimeoutExpired

get_visible_gpu_utilization and get_backend_visible_gpu_info now catch
OSError (nvidia-smi not found) and TimeoutExpired internally instead
of relying on callers to wrap every invocation. Returns the standard
available=False sentinel on failure so the torch-based fallback in
hardware.py can take over.
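
A minimal sketch of this self-contained guard, assuming a sentinel dictionary with an `available` key; the helper name, the timeout value, and the `output` field are hypothetical.

```python
import subprocess


def query_gpu_tool(command, timeout_s=5.0):
    """Run a GPU query command; never raise for a missing or hung binary."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary not found (FileNotFoundError is an OSError) or it hung:
        # report "unavailable" so the caller can try another backend.
        return {"available": False, "output": ""}
    if result.returncode != 0:
        return {"available": False, "output": ""}
    return {"available": True, "output": result.stdout}


print(query_gpu_tool(["definitely-not-a-real-binary-xyz"]))
```

Because the function returns a sentinel instead of raising, callers have to check `available` explicitly before trusting the result.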

* Guard get_primary_gpu_utilization and reset GPU caches between tests

1. nvidia.py: get_primary_gpu_utilization now catches OSError and
   TimeoutExpired internally, matching the pattern already used in
   get_visible_gpu_utilization and get_backend_visible_gpu_info. All
   three nvidia-smi callers are now self-contained.

2. test_gpu_selection.py: Added _GpuCacheResetMixin that resets the
   module-level _physical_gpu_count and _visible_gpu_count caches in
   tearDown. Applied to all test classes that exercise GPU selection,
   device map, or visibility functions. This prevents stale cache
   values from leaking between tests and causing flaky results on
   machines with real GPUs.

* Fix nvidia-smi fallback regression and physical GPU count validation

1. hardware.py: get_gpu_utilization, get_visible_gpu_utilization, and
   get_backend_visible_gpu_info now check result.get("available") before
   returning the nvidia-smi result. When nvidia-smi is unavailable or
   returns no data (e.g., containers without nvidia-smi, UUID/MIG masks),
   the functions fall through to the torch-based fallback instead of
   returning an empty result. This fixes a regression where the internal
   exception handling in nvidia.py prevented the caller's except block
   from triggering the fallback.

2. hardware.py: resolve_requested_gpu_ids now separates negative-ID
   validation from physical upper-bound validation. The physical count
   check is only enforced when it is plausibly a true physical count
   (i.e., higher than the largest parent-visible ID), since
   torch.cuda.device_count() under CUDA_VISIBLE_DEVICES returns the
   visible count, not the physical total. The parent-visible-set check
   remains authoritative in all cases. This prevents valid physical IDs
   like [2, 3] from being rejected as "out of range" when nvidia-smi is
   unavailable and CUDA_VISIBLE_DEVICES="2,3" makes torch report only
   2 devices.
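
The validation order in item 2 can be illustrated with the sketch below. Names and error messages are hypothetical, and `parent_visible` is assumed to be a non-empty list of numeric physical IDs.

```python
def validate_gpu_ids(requested, parent_visible, reported_count=None):
    """Hypothetical validation mirroring the three checks described above."""
    if any(i < 0 for i in requested):
        raise ValueError("GPU IDs must be non-negative")

    # Trust reported_count as a physical total only when it exceeds the
    # largest parent-visible ID; otherwise it is probably the visible count.
    if (reported_count is not None and parent_visible
            and reported_count > max(parent_visible)):
        if any(i >= reported_count for i in requested):
            raise ValueError("GPU ID out of range")

    # The parent-visible set stays authoritative in all cases.
    missing = [i for i in requested if i not in parent_visible]
    if missing:
        raise ValueError(f"GPU IDs not visible to parent: {missing}")
    return list(requested)


# CUDA_VISIBLE_DEVICES="2,3" with a reported device count of 2:
print(validate_gpu_ids([2, 3], parent_visible=[2, 3], reported_count=2))
```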

* Fix UUID/MIG torch fallback to enumerate devices by ordinal

When CUDA_VISIBLE_DEVICES uses UUID or MIG identifiers,
get_parent_visible_gpu_ids() returns [] because the tokens are
non-numeric. The torch fallback in get_visible_gpu_utilization() and
get_backend_visible_gpu_info() previously passed that empty list to
_torch_get_per_device_info(), getting nothing back.

Now both functions detect the empty-list case and fall back to
enumerating torch-visible ordinals (0..device_count-1) with
index_kind="relative". This means the UI and auto-selection still
see real device data in Kubernetes, MIG, and Slurm-style UUID
environments where nvidia-smi output cannot be mapped to physical
indices.

Updated test_uuid_parent_visibility to verify the new torch fallback
path returns available=True with relative ordinals.
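
The fallback decision can be sketched as below. Only `index_kind="relative"` comes from the commit text; the `"physical"` label and the function name are assumptions for illustration.

```python
def devices_to_query(parent_visible_ids, torch_device_count):
    """Pick which device indices to probe, and how to label them."""
    if parent_visible_ids:
        # Numeric CUDA_VISIBLE_DEVICES: indices map to physical GPUs.
        return list(parent_visible_ids), "physical"
    # UUID/MIG tokens produced an empty list: enumerate the ordinals that
    # torch can see (0..device_count-1) and mark them as relative.
    return list(range(torch_device_count)), "relative"


print(devices_to_query([], 2))
print(devices_to_query([2, 3], 2))
```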

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add type hint for gpu_ids parameter in InferenceOrchestrator.load_model

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-30 02:33:15 -07:00
Michael Han
fbfcbc69f2
Update README.md 2026-03-30 01:34:36 -07:00
Michael Han
d2b8ed8def
Update install.md 2026-03-30 01:33:33 -07:00
Lee Jackson
2f0a5baa87
fix(studio): preserve GGUF context max after apply and refresh (#4691)
Fixes #4670

Separates the GGUF context slider ceiling from the currently active context length so lowering context via Chat Settings no longer locks the slider max to the reduced value.

- Backend: adds `max_context_length` to GGUF load/status responses, computed from the largest VRAM/KV-fit cap across all usable GPU subsets
- Frontend: stores `ggufMaxContextLength` and uses it for Context Length slider/input bounds; hydrates from both `/api/inference/load` and `/api/inference/status`
- Defaults UI ceiling to native context for CPU-only and fallback paths
- Seeds `effective_ctx` and `max_available_ctx` before GPU probing to prevent `UnboundLocalError` on probe failure
- Property fallback uses native `_context_length`, not effective `context_length`
2026-03-30 01:33:16 -07:00
Lee Jackson
5557e1fd27
studio: unify Windows installer/setup logging style, verbosity controls, and startup messaging (#4651)
* refactor(studio): unify setup terminal output style and add verbose setup mode

* studio(windows): align setup.ps1 banner/steps with setup.sh (ANSI, verbose)

* studio(setup): revert nvcc path reordering to match main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio(setup): restore fail-fast llama.cpp setup flow

* studio(banner): use IPv6 loopback URL when binding :: or ::1

* Fix IPv6 URL bracketing, try_quiet stderr, _step label clamp

- Bracket IPv6 display_host in external_url to produce clickable URLs
- Redirect try_quiet failure log to stderr instead of stdout
- Clamp _step label to column width to prevent negative padding
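
For the first bullet, one way to bracket IPv6 hosts is sketched below using the standard `ipaddress` module. The function name and URL layout are illustrative, not the actual startup_banner.py code.

```python
import ipaddress


def format_url(host, port, scheme="http"):
    """Build a clickable URL, bracketing the host when it is IPv6."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_ipv6 = False  # hostnames such as "localhost" are not IP literals
    display_host = f"[{host}]" if is_ipv6 else host
    return f"{scheme}://{display_host}:{port}"


print(format_url("::1", 8000))
print(format_url("127.0.0.1", 8000))
```

Without the brackets, the colons in an IPv6 address are ambiguous with the port separator, so terminals and browsers cannot parse the URL.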

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add sandbox integration tests for PR #4494 UX fixes

Simulation harness (tests/simulate_pr4494.py) creates an isolated uv
venv, copies the real source files into it, and runs subprocess tests
for all three fixes with visual before/after demos and edge cases.

Standalone bash test (tests/test_try_quiet.sh) validates try_quiet
stderr redirect across 8 scenarios including broken-version contrast.

39 integration tests total (14 IPv6 + 15 try_quiet + 10 _step), all
existing 75 unit tests still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Truncate step() labels in setup.sh to match PS1 and Python

The %-15s printf format pads short labels but does not truncate long
ones.  Change to %-15.15s so labels wider than 15 chars are clipped,
matching the PowerShell .Substring(0,15) and Python label[:15] logic.
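
The pad-and-truncate behaviour can be checked in Python, where `%`-formatting accepts the same `%-15.15s` width and precision as printf. The `format_step` helper is a hypothetical stand-in for the real step functions.

```python
def format_step(label, message, width=15):
    """Pad short labels and clip long ones to exactly `width` columns."""
    return f"{label[:width]:<{width}} {message}"


long_label = "a-very-long-step-label"
# Width alone only pads; adding the .15 precision also truncates.
print(repr("%-15s" % long_label))
print(repr("%-15.15s" % long_label))
print(repr(format_step("ok", "done")))
```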

* Remove sandbox integration tests from PR

These test files are not part of the styling fix and should not
ship with this PR.

* Show error output on failure instead of suppressing it

- install_python_stack.py: restore _red for patch_package_file
  warnings (was downgraded to _dim)
- setup.ps1: capture winget output and show on failure for CUDA,
  Node, Python, and OpenSSL installs (was piped to Out-Null)
- setup.ps1: always show git pull failure warning, not just in
  verbose mode

* Show winget error output for Git and CMake installs on failure

Same capture-and-print-on-failure pattern already used for
Node, Python, CUDA, and OpenSSL winget installs.

* fix: preserve stderr for _run_quiet error messages in setup.sh

The step() helper writes to stdout, but _run_quiet's error header
was originally sent to stderr (>&2). Without the redirect, callers
that separate stdout/stderr would miss the failure headline while
still seeing the log body on stderr. Add >&2 to both step calls
inside _run_quiet to match main's behavior.

* feat: add --verbose flag to setup and update commands

Wire UNSLOTH_VERBOSE=1 through _run_setup_script() so that
'unsloth studio update --verbose' (and the deprecated 'setup')
passes the flag to setup.sh / setup.ps1 / install_python_stack.py.

* fix(studio): honor verbose logging and keep llama.cpp failures non-blocking

* fix(studio): switch installer to 'studio update' and normalize Windows setup logs

* chore(studio): refine localhost tip and remove skip-base setup noise

* fix(studio): align Windows setup logs with Linux style and improve startup tips

* fix(studio): align Windows setup logs with Linux style

* refactor(windows-installer): align install/setup logs with Linux style and silence auto-launch output

* refactor(windows): align installer/setup output with Linux style and reduce default verbosity

* refactor(windows): match install.ps1 output style/colors to setup and quiet default logs

* fix(studio-banner): update personal-computer localhost tip

* fix(setup.sh): restore verbose llama.cpp build output while keeping default quiet mode

* fix(install.sh): align installer logging with setup style and restore POSIX-safe color output

* fix(install.sh): preserve installer reliability and launch visibility

Export verbose mode for child setup processes, harden install command handling under set -e, and keep first-run studio launch non-silent so users can always see URL and port fallback output.

* fix(windows installer): keep exit semantics and degrade status accurate

Use quiet command redirection that preserves native exit codes, keep startup output visible on first launch, and report limited install status when llama.cpp is unavailable.

* fix(setup.sh): improve log clarity and enforce GGUF degraded signaling

Restore clean default setup output, add verbose-only diagnostics, fail fast on Colab dependency install errors, and return non-zero when GGUF prerequisites or llama.cpp artifacts are unavailable.

* fix(installer): harden bash preflight and PowerShell GPU checks

Fail fast when bash is unavailable before invoking setup.sh, and replace remaining nvidia-smi pipeline checks with stream redirection patterns that preserve reliable native exit-code handling.

* fix(windows): keep verbose output visible while preserving exit codes

Ensure PowerShell wrapper helpers in install/update stream native command output to host without returning it as function output, so npm logs no longer corrupt exit-code checks in verbose mode.

* fix(windows): avoid sticky UNSLOTH_VERBOSE and gate studio update verbosity

* Fix degraded llama.cpp exit code, PS verbose stderr, banner URLs, npm verbose

- setup.sh: Do not exit non-zero when llama.cpp is unavailable; the footer
  already reports the limitation, and install.sh runs under set -e so a
  non-zero exit aborts the entire install including PATH/shortcuts/launch.
- setup.ps1: Remove $? check in Invoke-SetupCommand verbose path; PS 5.1
  sets $? = $false when native commands write to stderr even with exit 0.
  Merge stderr into stdout with 2>&1 and rely solely on $LASTEXITCODE.
- startup_banner.py: Show the actual bound address when Studio is bound to
  a non-loopback interface instead of always showing 127.0.0.1/localhost.
- setup.sh: Use run_quiet_no_exit instead of run_quiet_no_exit_always for
  npm install steps so --verbose correctly surfaces npm output.

* Fix install.ps1 verbose stderr, propagate UNSLOTH_VERBOSE, fix git clone verbose

- install.ps1: Apply same Invoke-InstallCommand fix as setup.ps1 -- merge
  stderr into stdout with 2>&1 and drop the $? check that misclassifies
  successful native commands on PS 5.1.
- install.ps1 + setup.ps1: Export UNSLOTH_VERBOSE=1 to the process env
  when --verbose is passed so child processes like install_python_stack.py
  also run in verbose mode.
- setup.sh: Use run_quiet_no_exit for git clone llama.cpp so --verbose
  correctly surfaces clone diagnostics during source-build fallback.

* Surface prebuilt llama.cpp output in verbose mode, remove dead code, fix banner

- setup.sh: Use tee in verbose mode for prebuilt llama.cpp installer so
  users can see download/validation progress while still capturing the log
  for structured error reporting on failure.
- setup.ps1: Same fix for Windows -- use Tee-Object in verbose mode.
- setup.sh: Remove run_quiet_no_exit_always() which has no remaining callers.
- startup_banner.py: Avoid printing the same URL twice when Studio is
  bound to a specific non-loopback address that matches the display host.

* Fix run_install_cmd exit code after failed if-statement

The previous pattern 'if "$@"; then return 0; fi; _rc=$?' always captured
$? = 0 because $? reflects the if-statement result, not the command's exit
code. Switch to '"$@" && return 0; _rc=$?' which preserves the actual
command exit code on failure. Applies to both verbose and quiet branches.

* Fix _run_quiet exit code, double uv install, missing --local flag

- setup.sh: Fix _run_quiet verbose path that always captured exit code 0
  due to $? resetting after if-then-fi with no else. Switch to the same
  '"$@" && return 0; exit_code=$?' pattern used in install.sh.
- setup.sh: Consolidate the two uv install branches (verbose + quiet)
  into a single attempt with conditional output. Previously, when verbose
  mode was on and the install failed, a second silent attempt was made.
- install.ps1: Pass --local flag to 'unsloth studio update' when
  $StudioLocalInstall is true. Without this, studio.py's update() command
  overwrites STUDIO_LOCAL_INSTALL to "0", which could cause issues if
  setup.ps1 or install_python_stack.py later checks that variable.

* Revert SKIP_STUDIO_BASE change for --no-torch, restore install banners

- Revert SKIP_STUDIO_BASE from 0 to 1 for --no-torch. install.sh already
  installs unsloth+unsloth-zoo and no-torch-runtime.txt before calling
  setup.sh, so letting install_python_stack.py redo it was redundant and
  slowed down --no-torch installs for no benefit.
- Restore the "Unsloth Studio installed!" success banner and "starting
  Unsloth Studio..." launch message so users get clear install completion
  feedback before the server starts.

* Make llama.cpp build failure a hard error with proper cleanup

- setup.sh: Restore exit 1 when _LLAMA_CPP_DEGRADED is true. GGUF
  inference requires a working llama.cpp build, so this should be a
  hard failure, not a silent degradation.
- install.sh: Catch setup.sh's non-zero exit with '|| _SETUP_EXIT=$?'
  instead of letting set -e abort immediately. This ensures PATH setup,
  symlinks, and shortcuts still get created so the user can fix the
  build deps and retry with 'unsloth studio update'. After post-install
  steps, propagate the failure with a clear error message.

* Revert install.ps1 to 'studio setup' to preserve SKIP_STUDIO_BASE

'studio update' pops SKIP_STUDIO_BASE from the environment, which
defeats the fast-path version check added in PR #4667. When called
from install.ps1 (which already installed packages), SKIP_STUDIO_BASE=1
must survive into setup.ps1 so it skips the redundant PyPI check and
package reinstallation. 'studio setup' does not modify env vars.

* Remove deprecation message from 'studio setup' command

install.ps1 uses 'studio setup' (not 'studio update') to preserve
SKIP_STUDIO_BASE. The deprecation message was confusing during first
install since the user never typed the command.

* Fix stale env vars, scope degraded exit, generic error message for PR #4651

- install.ps1: Always set STUDIO_LOCAL_INSTALL and clear STUDIO_LOCAL_REPO
  when not using --local, to prevent stale values from a previous --local
  run in the same PowerShell session. Fix log messages to say 'setup' not
  'update' since we call 'studio setup'.
- setup.sh: Only exit non-zero for degraded llama.cpp when called from the
  installer (SKIP_STUDIO_BASE=1). Direct 'unsloth studio update' keeps
  degraded installs successful since Studio is still usable for non-GGUF
  workflows and the footer already reports the limitation.
- install.sh: Make the setup failure error message generic instead of
  GGUF-specific, so unrelated failures (npm, Python deps) do not show
  misleading cmake/git recovery advice.

* Show captured output on failure in quiet mode for PR #4651

Both Invoke-InstallCommand (install.ps1) and Invoke-SetupCommand
(setup.ps1) now capture command output in quiet mode and display it
in red when the command fails. This matches the behavior of
run_install_cmd in install.sh where failure output is surfaced even
in quiet mode, making cross-platform error debugging consistent.

* Match degraded llama.cpp exit on Windows, fix --local recovery hint for PR #4651

- setup.ps1: Exit non-zero for degraded llama.cpp when called from
  install.ps1 (SKIP_STUDIO_BASE=1), matching setup.sh behavior. Direct
  'unsloth studio update' keeps degraded installs successful.
- install.sh: Show 'unsloth studio update --local' in the recovery
  message when the install was run with --local, so users retry with
  the correct flag instead of losing local checkout context.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-30 00:53:23 -07:00
Roland Tannous
5bbfabb151
fix: [Studio] setup.ps1 update-flow for windows (#4667)
* fix: add PyPI version check to setup.ps1 for fast update path

Port the update-flow logic from setup.sh to setup.ps1 so that
`unsloth studio update` on Windows skips Python dependency reinstall
when the installed version already matches PyPI latest.

* fix: clear SKIP_STUDIO_BASE in update command

install.ps1 sets SKIP_STUDIO_BASE=1 which persists in the PowerShell
session. If the user runs `unsloth studio update` in the same terminal,
the env var causes the version check to be skipped. Clear it explicitly
in the update command.

* fix: harden version check and clear stale env vars in update flow

- Normalize $InstalledVer with Out-String + Trim() to avoid array/whitespace
  comparison issues in PowerShell 5.1 (python output can be captured as
  string[] instead of scalar string)
- Move Fast-Install --upgrade pip inside if (-not $SkipPythonDeps) so the
  fast path avoids unnecessary network round-trips
- Clear STUDIO_LOCAL_REPO when --local is not passed to prevent a previous
  --local session from leaking into a plain update

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-29 21:14:36 -07:00
Roland Tannous
a6c1f893fc
Fix blank page on Windows due to broken .js MIME type (#4674)
* Fix blank page on Windows due to broken .js MIME type in registry

* Update studio/backend/main.py

Add the defensive suggestion from Gemini: apply the mimetypes override only on Windows platforms

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-28 22:26:49 +04:00
Lee Jackson
5d2dca801c
studio: add HF/local model selection UI for GGUF export (#4365)
* feat(studio): add HF/local model selection UI for GGUF export

* fix(studio): fix selector ring clipping

* fix(studio): export page trust_remote_code control and label styling

* fix(studio): accept hf_token in load_checkpoint orchestrator method

The route was passing hf_token to load_checkpoint() but the method
didn't accept it, causing a TypeError on every /api/export/load-checkpoint
request.

* fix(studio): clear HF model selection when input is edited

Previously selectedSourceModel was only cleared when the input became
empty, so editing to a different repo ID after selecting a model would
silently keep the old selection.

---------

Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-03-28 22:18:25 +04:00
Daniel Han
362ad3606b Update _utils.py 2026-03-27 08:42:00 -07:00
Daniel Han
82d14b44d3
fix: preserve Windows drive-letter paths on native Windows (#4665)
normalize_path() unconditionally converted Windows paths like
C:\Users\... to WSL format /mnt/c/Users/..., which breaks path
resolution on native Windows. This caused LM Studio GGUF models
to fail detection (detect_gguf_model returned None for the invalid
path), falling through to the Unsloth import path which requires
a GPU.

Now only performs the /mnt/ mapping when actually running under WSL.
On native Windows, drive letters are preserved and backslashes are
normalized to forward slashes.
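
A minimal sketch of the described behaviour, assuming WSL detection is done elsewhere and passed in as a flag. The `_sketch` name and the `running_under_wsl` parameter are illustrative and not the real signature.

```python
import re

_DRIVE_RE = re.compile(r"^([A-Za-z]):[\\/](.*)$")


def normalize_path_sketch(path, running_under_wsl):
    """Map drive-letter paths to /mnt/<drive>/ only when under WSL."""
    match = _DRIVE_RE.match(path)
    if match is None:
        return path  # not a drive-letter path: leave it alone
    drive, rest = match.groups()
    rest = rest.replace("\\", "/")
    if running_under_wsl:
        return f"/mnt/{drive.lower()}/{rest}"
    # Native Windows: keep the drive letter, normalize the separators.
    return f"{drive}:/{rest}"


print(normalize_path_sketch(r"C:\Users\me\model.gguf", running_under_wsl=True))
print(normalize_path_sketch(r"C:\Users\me\model.gguf", running_under_wsl=False))
```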
2026-03-27 08:19:41 -07:00
Daniel Han
9477e7c43f
Bump minimum unsloth version to 2026.3.16 in install scripts (#4663)
Update install.sh and install.ps1 to require unsloth>=2026.3.16,
matching the latest PyPI release.
2026-03-27 07:47:08 -07:00
Daniel Han
df3b18c579 Update _utils.py 2026-03-27 07:24:39 -07:00
Daniel Han
844a816ed0 Update pyproject.toml 2026-03-27 07:14:03 -07:00
Roland Tannous
562e54fc6e
Fix HF cache default and show LM Studio models in chat/inference (#4653)
* fix: default HF cache to standard platform path instead of legacy Unsloth cache

* feat: show LM Studio and local models in chat Fine-tuned tab

* feat: show LM Studio models in Hub models tab

* fix: fetch local models after auth refresh completes

* Revert "fix: fetch local models after auth refresh completes"

This reverts commit cfd61f0ac7.

* fix: increase llama-server health check timeout to 600s for large models

* feat: expandable GGUF variant picker for LM Studio local models

* fix: show GGUF variant label for locally loaded LM Studio models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: show publisher name in LM Studio model labels

* fix: set model_id for loose GGUF files in LM Studio publisher dirs

* fix: show publisher prefix in Fine-tuned tab LM Studio models

* fix: only use model_id for lmstudio source models

* fix: only show LM Studio models in Hub tab on Mac/chat-only mode

* fix: respect XDG_CACHE_HOME, handle Windows paths in isLocalPath, refresh LM Studio on remount

- _setup_cache_env now reads XDG_CACHE_HOME (falls back to ~/.cache)
  instead of hard-coding ~/.cache/huggingface. This follows the standard
  HF cache resolution chain and respects distro/container overrides.

- isLocalPath in GgufVariantExpander uses a regex that covers Windows
  drive letters (C:\, D:/), UNC paths (\\server\share), relative paths
  (./, ../), and tilde (~/) -- not just startsWith("/").

- HubModelPicker.useEffect now calls listLocalModels() before the
  alreadyCached early-return gate so LM Studio models are always
  refreshed on remount. Also seeds useState from _lmStudioCache for
  instant display on re-open.
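
The cache resolution in the first bullet can be sketched as below. The function name and the injectable `env` mapping are assumptions, and the sketch deliberately ignores `HF_HOME`, which takes precedence in the full Hugging Face resolution chain.

```python
import os
from pathlib import Path


def default_hf_cache_dir(env=None):
    """Resolve the HF cache base from XDG_CACHE_HOME, else ~/.cache."""
    env = os.environ if env is None else env
    xdg_cache_home = env.get("XDG_CACHE_HOME")
    base = Path(xdg_cache_home) if xdg_cache_home else Path.home() / ".cache"
    return base / "huggingface"


print(default_hf_cache_dir({"XDG_CACHE_HOME": "/tmp/xdg"}))
print(default_hf_cache_dir({}))
```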

* fix: add comment explaining isLocalPath regex for Windows/cross-platform paths

* fix: prioritize unsloth publisher in LM Studio model list

* fix: scope unsloth-first sort to LM Studio models on all platforms

* fix: add missing _lmStudioCache module-level declaration

* fix: prioritize unsloth publisher before timestamp sort in LM Studio group

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-27 06:59:27 -07:00
Wasim Yousef Said
73969a1e4f
fix: disable OCR in pymupdf4llm PDF extraction (#4659) 2026-03-27 06:53:33 -07:00
Daniel Han
c4e34c88c8
Fall back to parsing model name when HF API has no param count (#4656)
Some models like unsloth/Qwen3-0.6B have no safetensors metadata
on Hugging Face, so the training model selector showed no parameter
size badge. The chat model picker already had extractParamLabel()
as a fallback that parses sizes like "0.6B" from the model name.

Add the same fallback to the training model selector and the
onboarding model selection step.
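
The name-parsing fallback could work roughly as sketched here. The real `extractParamLabel()` lives in the frontend and its pattern is not shown in this log, so the regex below is an assumption written in Python for illustration.

```python
import re

# A number with an optional fraction, followed by B or M, that is not
# embedded inside a longer word or a longer number.
_PARAM_RE = re.compile(
    r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)([BbMm])(?![A-Za-z0-9])"
)


def extract_param_label(model_name):
    """Return a size label such as '0.6B', or None when no size is present."""
    match = _PARAM_RE.search(model_name)
    if match is None:
        return None
    return f"{match.group(1)}{match.group(2).upper()}"


print(extract_param_label("unsloth/Qwen3-0.6B"))
print(extract_param_label("unsloth/some-model"))
```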

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-27 05:57:49 -07:00
Wasim Yousef Said
4ab7fb1f7b
fix: replace navbar shutdown text button with icon-only button (#4655) 2026-03-27 05:44:59 -07:00
Daniel Han
e36f72c685
Detect always-on reasoning models and show Think button as locked-on (#4654)
* Detect always-on reasoning models and show Think button as locked-on

Models with hardcoded <think>/</think> tags or reasoning_content in
their chat template (e.g. distilled reasoning models) always produce
thinking output regardless of any toggle. Previously these models
were not detected as reasoning-capable at all, so the Think button
was grayed out even though the model was actively reasoning.

Backend:
- Detect <think>/</think> and reasoning_content in GGUF chat templates
  as a fallback when enable_thinking is not present
- Add reasoning_always_on flag to LoadResponse and InferenceStatusResponse
- Pass the flag through all GGUF load and status response paths
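
The backend detection order can be sketched as below. The function name and the `supports_reasoning` key are hypothetical; only `enable_thinking`, `reasoning_content`, the think tags, and `reasoning_always_on` come from the commit text.

```python
def detect_reasoning(chat_template):
    """Classify a chat template as toggleable, always-on, or non-reasoning."""
    if "enable_thinking" in chat_template:
        # The template exposes a switch, so reasoning can be toggled.
        return {"supports_reasoning": True, "reasoning_always_on": False}
    # Fallback: hardcoded think tags or reasoning_content mean the model
    # reasons regardless of any toggle.
    always_on = "<think>" in chat_template or "reasoning_content" in chat_template
    return {"supports_reasoning": always_on, "reasoning_always_on": always_on}


print(detect_reasoning("{{ '<think>' }}{{ message.content }}"))
```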

Frontend:
- Add reasoningAlwaysOn to the chat runtime store and API types
- When reasoning_always_on is true, show the Think button as lit
  (active) but not clickable, with a tooltip explaining the model
  always uses thinking
- Force reasoningEnabled=true when the model always reasons

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use pointer-events-none instead of disabled for always-on Think button

The HTML disabled attribute was not fully blocking clicks on the Think
button for always-on reasoning models. Switch to pointer-events-none
CSS class which prevents all mouse interaction at the CSS level.

* Use a static span instead of disabled button for always-on Think

Replace the button element with a plain span when reasoning is
always on. This makes it physically impossible to toggle since
there is no clickable element at all, avoiding any CSS or
disabled-attribute edge cases.

* Simplify always-on Think button to stay lit and remain toggleable

Keep the Think button as a normal toggleable button but ensure it
shows as lit when reasoning_always_on is true. The model always
reasons regardless of the toggle state so there is no need to
block interaction.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-27 05:42:26 -07:00
Daniel Han
eacaf6827c
fix: no-torch install deps without pulling torch transitively (#4650)
Use --no-deps for ALL packages (unsloth, unsloth-zoo, and runtime deps)
since the current PyPI metadata for unsloth still declares torch as a
hard dependency. Runtime deps (typer, pydantic, safetensors,
transformers, etc.) are installed from no-torch-runtime.txt with
--no-deps to prevent transitive torch resolution from accelerate, peft,
trl, and sentence-transformers.

no-torch-runtime.txt now includes unsloth's own direct deps (typer,
pydantic, pyyaml, nest-asyncio) since --no-deps skips those too.

install.sh installs no-torch-runtime.txt directly (via helper function
_find_no_torch_runtime). install.ps1 does the same via
Find-NoTorchRuntimeFile. SKIP_STUDIO_BASE stays at 1 to avoid setup.sh
fast-path issues.

install_python_stack.py NO_TORCH branch does the same for unsloth
studio update, using package_name instead of hardcoded "unsloth".
2026-03-27 05:19:26 -07:00
Daniel Han
a7c43bc46d
Fix inference failing for transformers 5.x models (trust_remote_code) (#4652)
* Fix inference failing for transformers 5.x models (trust_remote_code)

The training worker in core/training/worker.py auto-enables
trust_remote_code for unsloth/* models that need transformers 5.x
(e.g. NVIDIA-Nemotron-3-Nano-4B). The inference worker did not have
the same logic, so loading these models for chat would fail with
"No config file found" while training worked fine.

Add the same auto-detection to the inference worker so
trust_remote_code is set automatically when needed.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-27 04:51:30 -07:00
Wasim Yousef Said
887b8cb1c2
fix: add auth + UX improvements to shutdown button (#4642)
* Studio shutdown button

* fix: add auth to shutdown endpoint and improve UX

- Add JWT auth (Depends(get_current_subject)) to POST /api/shutdown
- Use authFetch instead of bare fetch in shutdown dialog
- Only show beforeunload prompt when training is running
- Remove Ctrl+W/Cmd+W interception (browsers don't allow it)
- Store shutdown task on app.state to prevent GC

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-27 04:36:08 -07:00
Daniel Han
1fb9fe3304
Fix orphan server cleanup killing user's own llama-server (#4622)
* fix: only kill studio-managed llama-server processes, not user's own servers

_kill_orphaned_servers() checked for "unsloth" anywhere in the process
cmdline, which matched the user's own llama-server when serving models
from unsloth/ HF repos (the model path in -m contains "unsloth"). This
caused the user's server to get SIGKILLed on Studio startup, destroying
their prompt cache and forcing full model re-loads.

Narrow the check to only match processes whose binary path lives under
~/.unsloth/llama.cpp/ (the Studio install directory).

* Address review: cover env var paths, move Path.home() inside try block

- Also check LLAMA_SERVER_PATH and UNSLOTH_LLAMA_CPP_PATH so orphans
  from custom install locations are still cleaned up.
- Move studio_dirs construction inside the try/except so a Path.home()
  failure (containers without HOME) does not crash the constructor.

* Address reviewer feedback: proper path ancestry, /proc/pid/exe, legacy paths

Changes based on 10-reviewer consensus:

- Use Path.is_relative_to() instead of substring matching to prevent
  false positives on sibling paths like ~/.unsloth/llama.cpp-backup/.
- Use /proc/<pid>/exe (symlink to real binary) instead of parsing the
  first cmdline token, which breaks on paths with spaces. Falls back
  to cmdline parsing on non-Linux or when /proc is unavailable.
- Add legacy in-tree install paths (project_root/llama.cpp/ and
  project_root/bin/) so orphans from older setup.sh are still cleaned.
- Treat LLAMA_SERVER_PATH as an exact binary match rather than widening
  it to its parent directory, which could match unrelated servers in
  shared locations like /usr/local/bin/.
- Keep everything inside the try/except so Path.home() failures in
  containers do not crash the constructor.
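
Illustration only, not the code from this PR: a minimal sketch of the ancestry check described above, assuming an allowlist of install roots plus a set of exact binary paths. The function name `is_studio_managed` and the example paths are hypothetical. A real implementation would probably resolve symlinks first; this sketch compares pure paths so that it stays deterministic.

```python
from pathlib import Path


def is_studio_managed(exe_path, install_roots, exact_binaries=()):
    """Return True if exe_path is an allowlisted binary or lives under an install root.

    Hypothetical helper: install_roots are matched by path ancestry,
    exact_binaries (e.g. a LLAMA_SERVER_PATH value) by exact equality only.
    """
    exe = Path(exe_path)
    if any(exe == Path(binary) for binary in exact_binaries):
        return True
    # is_relative_to compares whole path components (Python 3.9+), so a sibling
    # directory such as "llama.cpp-backup" does not match "llama.cpp".
    return any(exe.is_relative_to(Path(root)) for root in install_roots)


roots = ["/home/user/.unsloth/llama.cpp"]
print(is_studio_managed("/home/user/.unsloth/llama.cpp/build/bin/llama-server", roots))
print(is_studio_managed("/home/user/.unsloth/llama.cpp-backup/bin/llama-server", roots))
```

Matching an exact binary separately from the roots mirrors the point above about not widening LLAMA_SERVER_PATH to its parent directory.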

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: add Linux platform guard and log cleanup errors

- Guard pgrep fallback with sys.platform check so it does not crash
  on Windows/macOS when psutil is unavailable.
- Replace silent except-pass with logger.warning for observability.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-27 04:33:04 -07:00
Daniel Han
b1c3a1e857
fix: replace [huggingfacenotorch] with no-torch-runtime.txt requirements (#4649)
The [huggingfacenotorch] extras only exist in pyproject.toml but are
NOT published on PyPI, so uv pip install "unsloth[huggingfacenotorch]"
fails on fresh installs from the registry.

Fix: add studio/backend/requirements/no-torch-runtime.txt with the
runtime deps (safetensors, transformers, datasets, accelerate, etc.)
that mirror [huggingfacenotorch] from pyproject.toml. In no-torch mode:
1. install.sh/ps1 install unsloth + unsloth-zoo with --no-deps
2. SKIP_STUDIO_BASE=0 so install_python_stack.py's NO_TORCH branch runs
3. install_python_stack.py installs no-torch-runtime.txt
2026-03-27 03:58:51 -07:00
Daniel Han
9d68621614
Streaming tool detection: guard late tool_calls, filter incomplete fragments (#4648)
* Guard against late tool_calls after visible content, filter incomplete fragments

1. If visible content was already emitted (_last_emitted is non-empty)
   when delta.tool_calls arrives, ignore the tool_calls instead of
   reclassifying the turn as a tool call. llama-server never
   interleaves content and tool_calls (they are mutually exclusive),
   but this guard is defensive for other OpenAI-compatible backends.

2. Filter out incomplete structured tool_calls fragments before
   execution. Entries with empty function.name (from truncation by
   max_tokens, disconnect, or interruption) are skipped instead of
   being passed to execute_tool().
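
Illustration only: a minimal sketch of the fragment filter from item 2, assuming tool calls are plain dicts in the OpenAI Chat Completions shape. The helper name `filter_complete_tool_calls` is hypothetical and is not the function used in this PR.

```python
def filter_complete_tool_calls(tool_calls):
    """Drop tool_call entries whose function.name is missing or empty."""
    complete = []
    for call in tool_calls:
        function = call.get("function") or {}
        name = function.get("name") or ""
        if name.strip():
            complete.append(call)
    return complete


calls = [
    {"id": "call_0", "function": {"name": "web_search", "arguments": "{\"q\": \"x\"}"}},
    {"id": "call_1", "function": {"name": "", "arguments": "{\"co"}},  # truncated fragment
]
print([c["id"] for c in filter_complete_tool_calls(calls)])
```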

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-27 03:40:14 -07:00
Wasim Yousef Said
5c7c3883cb
feat: update app icons to rounded logo (#4640)
Replace favicon.png, unsloth-gem.png, and unsloth.ico with rounded.png.
Update install.sh to source rounded.png for Linux/macOS shortcuts.
2026-03-27 03:18:20 -07:00
Daniel Han
79d9bf0c9a
Fix GGUF GPU fit check to account for KV cache VRAM (#4623)
* fix: account for KV cache in GGUF GPU fit check and auto-cap context length

The GPU fit check only compared GGUF file size against free VRAM,
ignoring KV cache memory. Models with large native context lengths
(e.g. Qwen3.5-9B at 262k) would pass the fit check since the GGUF
is only 5.6 GB, but the KV cache at 262k context needs ~40 GB at
f16. This caused llama-server to silently fall back to CPU inference.

Changes:
- Parse block_count, head_count_kv, head_count, and embedding_length
  from GGUF metadata alongside context_length
- Add KV cache VRAM estimation based on architecture params and the
  selected cache quantization type (f16, q8_0, q4_0, etc.)
- Auto-reduce context length to the maximum that fits in available
  GPU VRAM when the native context would exceed it
- Include estimated KV cache size in the _select_gpus total so the
  fit decision reflects actual runtime memory, not just file size

For the reported scenario (Qwen3.5-9B on RTX 3090 with 22415 MiB
free), context is auto-reduced from 262144 to ~63k with f16 KV cache,
keeping the model fully on GPU. With q4_0 KV cache quantization the
context can reach ~226k.
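
Illustration only: a rough sketch of a KV cache estimate and context cap, assuming a plain transformer where every layer stores one K and one V tensor per token. The function names, the example architecture values, and the per-element byte table are assumptions for illustration. The real estimator in this PR may treat hybrid or sliding-window layers differently, so the numbers here are not expected to reproduce the figures quoted above.

```python
# Bytes per cached element for a few llama.cpp cache types
# (block size in bytes divided by the 32 elements per block).
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def estimate_kv_cache_bytes(n_ctx, block_count, head_count_kv, head_count,
                            embedding_length, cache_type="f16"):
    """Estimate KV cache size: K and V, per layer, per token, per KV head."""
    head_dim = embedding_length // head_count
    elements = 2 * block_count * n_ctx * head_count_kv * head_dim
    return int(elements * BYTES_PER_ELEMENT[cache_type])


def max_context_for_budget(budget_bytes, weights_bytes, requested_ctx, **arch):
    """Cap requested_ctx to what fits next to the weights.

    The estimate is linear in n_ctx, so this sketch divides instead of searching.
    If the weights alone exceed the budget, the request is returned unchanged.
    """
    per_token = estimate_kv_cache_bytes(1, **arch)
    free = budget_bytes - weights_bytes
    if per_token <= 0 or free <= 0:
        return requested_ctx
    return min(requested_ctx, free // per_token)


# Hypothetical architecture values, not taken from any specific model.
arch = dict(block_count=32, head_count_kv=8, head_count=32, embedding_length=4096)
GIB = 1 << 30
print(estimate_kv_cache_bytes(8192, **arch) / GIB)
print(max_context_for_budget(3 * GIB, 1 * GIB, 262144, **arch))
```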

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: resolve 6 bugs in KV cache VRAM estimation and add test harness

- Fix q8_0 BPE constant: 1.125 -> 34/32 (1.0625) to match llama.cpp block size
- Fix _fit_context_to_vram returning min_ctx when weights exceed budget
  (should return requested_ctx unchanged, let --fit handle it)
- Fix binary search inflating below-2048 requests (lo=min_ctx=2048 > hi)
- Fix n_ctx=0 regressing to 4096 when metadata unavailable (preserve sentinel)
- Fix multi-GPU auto-cap using single-GPU budget instead of aggregate
- Fix _context_length being overwritten with capped effective value

Add tests/test_gguf_kv_vram.py: 43 cross-platform pytest tests covering
pure logic, integration (monkeypatched load_model), and real GGUF parsing.
Runs in an isolated uv venv with only pytest -- no GPU/torch/structlog needed.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: complete _effective_context_length lifecycle

- Initialize _effective_context_length in __init__ (prevents AttributeError)
- Reset _effective_context_length in unload_model (prevents stale values)
- Update context_length property to return effective (capped) value for
  the UI/API, falling back to native _context_length if not set

* fix: multi-GPU selection tries smallest subset first

The previous approach summed all GPUs' memory to cap context, then
selected GPUs afterward. This was overly optimistic for heterogeneous
setups (e.g., 48 GiB + 4 GiB): the context was inflated by the tiny
GPU's contribution, then both GPUs were dragged in.

Now we try GPU subsets from smallest (1 GPU) to largest, capping
context for each. We pick the smallest subset where the model+KV
fits. This prefers single-GPU when possible (simpler, no tensor
split overhead) and avoids pulling in GPUs that barely help.

Add tests: test_multi_gpu_prefers_fewer_gpus,
test_multi_gpu_heterogeneous.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: prefer fewer GPUs over higher context in GPU selection

Multi-GPU inference is slower due to tensor-split overhead, so we
should prefer fewer GPUs with reduced context over more GPUs with
full context. Now the loop stops at the first GPU subset where the
model fits, rather than continuing to find subsets that allow higher
context. Only if the model can't fit on N GPUs do we try N+1.

This preserves the original behavior: use multi-GPU only when the
model doesn't fit on a single GPU.
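
Illustration only: a minimal sketch of the "fewest GPUs first" rule, assuming a fixed memory requirement and a dict of free bytes per GPU index. The name `pick_smallest_gpu_subset` is hypothetical. The per-subset context capping described above is deliberately left out.

```python
def pick_smallest_gpu_subset(free_bytes_by_gpu, required_bytes):
    """Return the fewest GPU indices whose combined free memory covers required_bytes.

    Taking GPUs in order of descending free memory means the first prefix
    that fits is also a smallest-count subset. Returns None if nothing fits.
    """
    ranked = sorted(free_bytes_by_gpu.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0
    for index, free in ranked:
        chosen.append(index)
        total += free
        if total >= required_bytes:
            return chosen
    return None


GIB = 1 << 30
gpus = {0: 48 * GIB, 1: 4 * GIB}  # heterogeneous example from the message above
print(pick_smallest_gpu_subset(gpus, 40 * GIB))
print(pick_smallest_gpu_subset(gpus, 50 * GIB))
```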

* fix: make _kill_orphaned_servers cross-platform via psutil

Replace pgrep + os.kill(SIGKILL) with psutil.process_iter() and
proc.kill(), which work on Linux, macOS, and Windows. Build an
allowlist of install roots matching _find_llama_server_binary so
only studio-managed servers are killed.

* fix: skip KV estimation loop when effective context is unknown

When n_ctx=0 and GGUF metadata lacks context_length, effective_ctx
stays 0. _estimate_kv_cache_bytes(0) returns 0, so a GPU could be
selected with no KV headroom. Guard the loop with effective_ctx > 0
to fall back to file-size-only GPU selection in this case.

* chore: temporarily remove test harness (will add back separately)

* refactor: deduplicate UINT32/UINT64 handling in GGUF parser

Replace duplicated if/elif chains for vtype 4 and 10 with a single
block using setattr. No behavioral change.

* fix: honor explicit n_ctx by using multi-GPU before capping

When the user explicitly sets n_ctx, try to fit the full requested
context using _select_gpus (which adds GPUs as needed). Only cap
context if it doesn't fit on any GPU combination.

When n_ctx=0 (auto/native context), keep the existing behavior:
prefer fewer GPUs with reduced context, since multi-GPU is slower
and the user didn't ask for a specific context length.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: context_length property returns native value for frontend slider

The frontend uses context_length as the slider max. Returning the
capped effective value prevented users from requesting higher context
on reload (e.g., after switching to q4_0 KV cache). Revert to
returning the native GGUF metadata value -- the backend auto-caps
at load time regardless.

* revert: context_length returns effective (capped) value

The UI slider should show what the server is actually running at,
not the theoretical maximum. Revert to returning the effective
context length.

* fix: raise minimum context floor from 2048 to 4096

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-27 03:14:42 -07:00
Daniel Han
e318da21a7
Fix ~1.2s TTFT penalty when tools are enabled in Studio (#4639)
* Fix ~1.2s TTFT penalty when tools are enabled in Studio

When users enable web search, Python execution, or terminal tools,
every message gets a ~1.2s delay before any text appears -- even when
the model does not call any tool. This happens because
generate_chat_completion_with_tools() does a non-streaming detection
pass (stream: False) first, waits for the complete response, then
checks for tool calls. For the ~90% of messages that don't trigger a
tool call, this blocking wait is entirely wasted.

Root cause: the detection pass payload uses stream: False, forcing
llama-server to generate the entire response before returning any
tokens.

Fix: replace the non-streaming detection pass with a streaming pass
(stream: True) and a speculative buffer state machine that detects
tool signals in the first 1-2 SSE chunks:

- BUFFERING: accumulate content tokens, check first chars for tool
  signal prefixes (<tool_call>, <function=)
- STREAMING: no tool detected, yield tokens to caller immediately
- DRAINING: tool signal found, silently accumulate rest of stream

Three detection paths:
1. Structured delta.tool_calls -- detected instantly, transition to
   DRAINING, accumulate fragments, assemble at stream end.
2. XML tool markup in content -- buffer holds up to 32 chars checking
   for <tool_call> or <function= prefix, then transitions to DRAINING.
3. No tool signal -- first non-whitespace, non-XML char triggers
   immediate transition to STREAMING (fast path, ~90% of requests).

Safety net: after any stream ends in STREAMING state, check accumulated
content for XML tool signals. Handles rare "content before tool call"
edge case.
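
Illustration only: a simplified sketch of the content-side buffer decision (paths 2 and 3 above), assuming the two XML prefixes and the 32-character limit named in this message. `classify_buffer` and the constant names are hypothetical. The structured delta.tool_calls path and the end-of-stream safety net are not modelled.

```python
TOOL_SIGNALS = ("<tool_call>", "<function=")
MAX_BUFFER = 32  # characters held back while the prefix is still ambiguous


def classify_buffer(buffer):
    """Decide the next state from the content accumulated so far."""
    text = buffer.lstrip()
    if not text:
        return "BUFFERING"  # nothing but whitespace yet
    if any(text.startswith(signal) for signal in TOOL_SIGNALS):
        return "DRAINING"  # tool markup confirmed
    still_ambiguous = any(signal.startswith(text) for signal in TOOL_SIGNALS)
    if still_ambiguous and len(buffer) < MAX_BUFFER:
        return "BUFFERING"  # e.g. "<tool" could still become "<tool_call>"
    return "STREAMING"  # fast path: ordinary content


for sample in ["Hello", "  <tool", "<tool_call>{", "<tool_tip>", "<html>"]:
    print(repr(sample), classify_buffer(sample))
```

Checking whether a signal starts with the buffered text, rather than the reverse, is what lets "<tool_tip>" and "<html>" fall through to STREAMING as soon as they diverge from both prefixes.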

Additional supporting changes:
- Add headers parameter to _stream_with_retry for auth forwarding
- Share _strip_tool_markup and regex patterns between the detection
  pass and the final streaming pass (removes duplication)
- Remove the iteration==0 non-streaming content shortcut (no longer
  needed since all iterations stream directly)
- Keep the final streaming pass as fallback for max_tool_iterations
  exhaustion

Benchmarked on Qwen3.5-4B Q4_K_XL:
- No tools:              TTFT ~112ms (unchanged)
- Tools enabled, no call: TTFT ~112ms (was ~1207ms)
- Decode TPS:            226 (unchanged in all cases)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add unit tests for streaming tool detection state machine

16 tests covering every tool call parsing path:
- Plain text (no tool call) streaming
- Structured delta.tool_calls detection and fragment assembly
- XML <tool_call>JSON</tool_call> detection via buffer
- XML <function=name> tag detection via buffer
- Whitespace before tool XML
- Safety net (content then tool XML)
- Parallel multi-tool calls
- Reasoning token bypass (thinking models)
- Reasoning then tool call
- Empty response handling
- Buffer prefix timeout (HTML not mistaken for tool)
- Non-XML first char instant streaming
- False positive rejection (<tool_tip> vs <tool_call>)
- Arguments split across multiple chunks
- auto_heal_tool_calls=False respects the flag
- Metrics accumulation across tool iterations

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix reasoning-only BUFFERING, pre-tool content emission, and code duplication

Addresses review feedback on the streaming tool detection:

1. Reasoning tokens are no longer yielded during BUFFERING/DRAINING
   states. The consumer in routes/inference.py tracks prev_text across
   tool iterations without resetting it, so yielding reasoning during
   a detection pass that resolves to a tool call would corrupt the
   delta computation for subsequent iterations. Reasoning is now
   silently accumulated during detection (matching the old non-streaming
   behavior) and flushed together with content when the buffer resolves
   to STREAMING.

2. Handle reasoning-only responses in the BUFFERING resolver. When a
   thinking model emits only reasoning_content with no content tokens,
   the stream ends while still in BUFFERING state. The resolver now
   detects this case and yields reasoning as plain text (without
   <think> wrapper), matching the final streaming pass behavior for
   models like Qwen3 in always-think mode.

3. Replace duplicated re.sub calls for stripping tool markup with
   the existing _strip_tool_markup(content_text, final=True) helper,
   removing ~40 lines of redundant regex code.

4. Update tests: adjust reasoning test expectations to match the new
   behavior (reasoning batched with content, not streamed individually
   during BUFFERING). Add test_reasoning_only_no_content for the
   reasoning-only edge case. 17/17 tests pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address remaining reviewer findings: late tool_call IDs and XML speculation

1. Late-arriving tool_calls.id: when a provider sends the real ID on a
   later delta chunk (after the initial one with index and function
   name), the accumulator now updates the ID instead of keeping the
   synthetic "call_{idx}" placeholder. (P2, 2/10 reviewers)

2. XML speculation respects auto_heal_tool_calls: when auto_heal is
   explicitly disabled, _TOOL_XML_SIGNALS is empty so the BUFFERING
   state never speculatively holds content for XML prefix detection.
   Content starting with literal "<tool_call>" or "<function=" text
   flows straight through without delay. (P2, 1/10 reviewers)
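
Illustration only: a minimal sketch of the accumulator behaviour from item 1, assuming streamed fragments are dicts with `index`, an optional `id`, and an optional `function` holding `name` / `arguments` pieces. The helper name `accumulate_tool_call_deltas` is hypothetical.

```python
def accumulate_tool_call_deltas(deltas):
    """Merge streamed tool_call fragments into complete calls, keyed by index."""
    calls = {}
    for delta in deltas:
        idx = delta.get("index", 0)
        entry = calls.setdefault(idx, {
            "id": f"call_{idx}",  # synthetic placeholder until a real ID arrives
            "type": "function",
            "function": {"name": "", "arguments": ""},
        })
        if delta.get("id"):
            entry["id"] = delta["id"]  # a late-arriving real ID replaces the placeholder
        function = delta.get("function") or {}
        if function.get("name"):
            entry["function"]["name"] = function["name"]
        if function.get("arguments"):
            entry["function"]["arguments"] += function["arguments"]
    return [calls[i] for i in sorted(calls)]


fragments = [
    {"index": 0, "function": {"name": "web_search", "arguments": "{\"q\":"}},
    {"index": 0, "id": "call_abc123", "function": {"arguments": " \"llama\"}"}},
]
print(accumulate_tool_call_deltas(fragments))
```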

Skipped: finish_reason="tool_calls" without delta.tool_calls fallback
(P1, 1/10 reviewers). llama-server always sends delta.tool_calls
fragments in streaming mode. A non-streaming fallback for this edge
case would add complexity for a scenario that does not occur in
practice with the supported backend.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Check request.is_disconnected() every 20 tokens instead of every token

The disconnect check is an async round-trip that adds overhead on every
loop iteration. Since the cancel watcher in llama_cpp.py already
handles connection teardown (closes the streaming response on cancel),
this route-layer check is a secondary safety net that does not need to
run on every single token.

Check every 20 tokens across all 4 streaming paths:
- gguf_tool_stream (tool-enabled GGUF)
- gguf_stream_chunks (standard GGUF)
- audio_input_generate (audio/whisper input)
- generic backend stream (non-GGUF fallback)

* Fix safety net, DRAINING metadata, and test import path

1. Safety net no longer retroactively executes tools after visible
   content was already emitted to the user. Once _last_emitted is
   non-empty, the stream is committed to normal content mode.
   Retroactive tool execution after visible output would violate the
   streaming contract and corrupt the route-layer cumulative delta
   tracker (prev_text). The tool XML is still stripped by
   _strip_tool_markup so the user sees clean content.

2. DRAINING false-positive path now merges accumulated metrics from
   prior tool iterations instead of dropping them. Uses the same
   merge formula as the STREAMING path.

3. Test import path fixed to use repo root instead of hardcoded
   sibling directory. Works in clean checkouts and CI.

4. Renamed test_content_then_tool_xml_safety_net to
   test_content_then_tool_xml_no_retroactive_execution to reflect
   the corrected behavior.

17/17 tests pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Redact --api-key value from llama-server startup log

When UNSLOTH_DIRECT_STREAM=1, the generated bearer token was logged
verbatim in the startup command. Replace the secret with <redacted>
before logging.
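
Illustration only: a minimal sketch of redacting the key before logging, assuming the command is an argv-style list. The helper name `redact_api_key` is hypothetical, and handling the `--api-key=value` form is an extra assumption not mentioned in this commit.

```python
def redact_api_key(cmd):
    """Return a copy of cmd with the value of --api-key replaced by <redacted>."""
    redacted = list(cmd)
    for i, arg in enumerate(redacted):
        if arg == "--api-key" and i + 1 < len(redacted):
            redacted[i + 1] = "<redacted>"
        elif arg.startswith("--api-key="):
            redacted[i] = "--api-key=<redacted>"
    return redacted


cmd = ["llama-server", "-m", "model.gguf", "--api-key", "s3cr3t-token", "--port", "8080"]
print(" ".join(redact_api_key(cmd)))
```

Working on a copy keeps the original list intact for the actual process launch.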

* Remove test file temporarily

* Revert disconnect throttle, reset prev_text on tool_start, restore XML safety net

Addresses all P1 findings from reviewer round 3 (10 reviewers):

1. Revert disconnect check to every iteration (was every 20th).
   All 10 reviewers flagged this as a correctness regression for
   short streams and sparse tool event loops. The cancel watcher in
   llama_cpp.py is the primary mechanism but the route-layer check
   must remain per-iteration for completeness. [10/10]

2. Reset prev_text on tool_start in gguf_tool_stream. When a tool
   cycle begins after visible content was already streamed, the
   route-layer cumulative delta tracker (prev_text) must be reset
   so the post-tool synthesis response is not truncated or dropped.
   [9/10]

3. Remove the _last_emitted gate from the XML safety net. The gate
   was added to prevent retroactive tool execution after visible
   content, but with prev_text now reset on tool_start (#2), the
   root cause is fixed and the safety net can correctly handle
   content-then-tool-XML responses (matching pre-PR behavior).
   [8/10]

* Use None instead of {} for empty auth headers in TTS methods

* Include accumulated metrics in STREAMING metadata check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-27 03:13:38 -07:00
Lee Jackson
0233fe7f9c
studio: setup log styling (#4494)
* refactor(studio): unify setup terminal output style and add verbose setup mode

* studio(windows): align setup.ps1 banner/steps with setup.sh (ANSI, verbose)

* studio(setup): revert nvcc path reordering to match main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio(setup): restore fail-fast llama.cpp setup flow

* studio(banner): use IPv6 loopback URL when binding :: or ::1

* Fix IPv6 URL bracketing, try_quiet stderr, _step label clamp

- Bracket IPv6 display_host in external_url to produce clickable URLs
- Redirect try_quiet failure log to stderr instead of stdout
- Clamp _step label to column width to prevent negative padding
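
Illustration only: a minimal sketch of the IPv6 bracketing fix, assuming the banner builds its URL from a host string and a port. The function signature here is hypothetical and is not the one in the Studio source.

```python
def external_url(display_host, port, scheme="http"):
    """Build a clickable URL, bracketing IPv6 literals such as ::1."""
    host = display_host
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"  # a bare IPv6 literal would be ambiguous with the port separator
    return f"{scheme}://{host}:{port}"


print(external_url("::1", 8888))
print(external_url("127.0.0.1", 8888))
```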

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add sandbox integration tests for PR #4494 UX fixes

Simulation harness (tests/simulate_pr4494.py) creates an isolated uv
venv, copies the real source files into it, and runs subprocess tests
for all three fixes with visual before/after demos and edge cases.

Standalone bash test (tests/test_try_quiet.sh) validates try_quiet
stderr redirect across 8 scenarios including broken-version contrast.

39 integration tests total (14 IPv6 + 15 try_quiet + 10 _step), all
existing 75 unit tests still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Truncate step() labels in setup.sh to match PS1 and Python

The %-15s printf format pads short labels but does not truncate long
ones.  Change to %-15.15s so labels wider than 15 chars are clipped,
matching the PowerShell .Substring(0,15) and Python label[:15] logic.
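
Illustration only: the pad-and-truncate rule expressed in Python, assuming a 15-column label followed by a message. `format_step` is a hypothetical stand-in for the real step helpers.

```python
def format_step(label, message, width=15):
    """Pad short labels and clip long ones to exactly `width` columns."""
    clipped = label[:width]
    return f"{clipped:<{width}} {message}"


print(format_step("python", "ok"))
print(format_step("a-very-long-step-label", "ok"))
```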

* Remove sandbox integration tests from PR

These test files are not part of the styling fix and should not
ship with this PR.

* Show error output on failure instead of suppressing it

- install_python_stack.py: restore _red for patch_package_file
  warnings (was downgraded to _dim)
- setup.ps1: capture winget output and show on failure for CUDA,
  Node, Python, and OpenSSL installs (was piped to Out-Null)
- setup.ps1: always show git pull failure warning, not just in
  verbose mode

* Show winget error output for Git and CMake installs on failure

Same capture-and-print-on-failure pattern already used for
Node, Python, CUDA, and OpenSSL winget installs.

* fix: preserve stderr for _run_quiet error messages in setup.sh

The step() helper writes to stdout, but _run_quiet's error header
was originally sent to stderr (>&2). Without the redirect, callers
that separate stdout/stderr would miss the failure headline while
still seeing the log body on stderr. Add >&2 to both step calls
inside _run_quiet to match main's behavior.

* feat: add --verbose flag to setup and update commands

Wire UNSLOTH_VERBOSE=1 through _run_setup_script() so that
'unsloth studio update --verbose' (and the deprecated 'setup')
passes the flag to setup.sh / setup.ps1 / install_python_stack.py.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-27 03:12:48 -07:00
Daniel Han
3a5e3bbd6d
Make Studio shortcuts launch in a visible terminal (#4638)
* Make Studio shortcuts launch in a visible terminal

Studio shortcuts (Desktop/Start Menu) previously launched the server as a
hidden background process. Closing the browser tab did not stop the server,
leaving users with no obvious way to shut it down. This change makes shortcuts
open a visible terminal window so users can see server output and close the
terminal to stop Studio.

Launcher changes (install.sh):
- Add TTY detection in the launcher's main section. When a TTY is present
  (foreground mode), the launcher spawns a background browser-opener and then
  exec's the studio process directly. This means closing the terminal sends
  SIGHUP to studio, stopping it cleanly. When no TTY is present (background
  mode, e.g. macOS .app or headless), the existing _spawn_terminal behavior
  is preserved.
- Add _open_browser_when_ready helper that polls health on the specific
  launch port and opens the browser once ready.
- Add WSL fallback in _open_browser: uses powershell.exe Start-Process or
  cmd.exe /c start instead of unreliable xdg-open under WSL.

Linux .desktop shortcut:
- Change Terminal=false to Terminal=true so the desktop environment opens
  the user's default terminal emulator for the launcher.

WSL support:
- Remove the early-return that skipped WSL entirely. WSL now gets the
  launcher script and studio.conf written.
- Add WSL shortcut creation: generates Windows Desktop and Start Menu .lnk
  files via a temp PowerShell script. Targets wt.exe (Windows Terminal) with
  automatic fallback to wsl.exe. Uses WSL_DISTRO_NAME for multi-distro setups.

Windows launcher (install.ps1):
- Add Find-FreeLaunchPort function that mirrors the Unix _find_launch_port
  logic, scanning Get-NetTCPConnection for busy ports and returning the first
  free port in the configured range.
- Replace the hardcoded $basePort with the dynamic port result, with a
  MessageBox error dialog if no free port is found.

* Fix review findings: lock race, WSL quoting, Windows port fallback

Foreground lock race (10/10 reviewers):
The foreground mode released the single-instance lock before exec,
allowing a second launcher to acquire the lock and race for the same
port during startup. Move lock release into the background subshell
so it only happens after the health check passes.

WSL shortcut quoting (10/10 reviewers):
WSL_DISTRO_NAME values with spaces (e.g. "Ubuntu Preview", "Fedora
Remix for WSL") were not quoted, causing the distro name to be split
across multiple arguments. Add double-quoting around the distro name
and launcher path in the generated shortcut arguments.

Windows port fallback (3/10 reviewers):
Find-FreeLaunchPort silently assumed no ports were listening when
Get-NetTCPConnection was unavailable, which could return 8888 even
when busy. Add a Test-PortBusy fallback that probes ports with
TcpListener when Get-NetTCPConnection fails. Also scope the
Get-NetTCPConnection query to only the port range we care about.
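
Illustration only: the same bind-probe idea sketched in Python rather than PowerShell, assuming a port counts as busy when binding a TCP socket to it fails. The function names and the default scan length are hypothetical. The wildcard bind address follows the later fix in this PR that probes on Any instead of Loopback.

```python
import socket


def port_is_busy(port, host="0.0.0.0"):
    """Return True if a TCP socket cannot be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False


def find_free_port(base_port, max_attempts=20):
    """Return the first port in the range that can be bound, or None."""
    for port in range(base_port, base_port + max_attempts):
        if not port_is_busy(port):
            return port
    return None


print(find_free_port(8888))
```

A probe like this is inherently racy: another process can take the port between the check and the real bind, so the caller still needs to handle a bind failure at launch.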

* Skip powershell.exe shortcut creation if wslpath fails

If wslpath -w fails (returns empty), do not attempt to pass a Linux-style
path to powershell.exe -- it would always fail. Only run powershell.exe
when we have a valid Windows path for the temp PS1 script.

* Remove dead code and fix background health poll target

- Remove unused _open_browser_when_ready function
- Background mode now polls only the specific _launch_port instead of
  scanning all ports via _find_healthy_port, matching foreground behavior
- Add launcher test harness (22 unit + 19 integration tests)

* Fix port probe scope, lock ownership, and T4 test coverage

- Test-PortBusy: bind on Any instead of Loopback to match Studio's
  0.0.0.0 bind scope (prevents false-free in fallback path)
- _release_lock: verify PID ownership before removing lock dir
  (prevents a timed-out subshell from deleting another launcher's lock)
- T4 test: fail first curl call so the test actually exercises the
  lock-contention wait path instead of short-circuiting via fast path

* Temporarily remove launcher test scripts

Tests will be re-added in a follow-up PR to keep this diff focused
on the launcher changes.
2026-03-27 03:12:26 -07:00
Daniel Han
6b5da2ea0f
Fix missing num_items_in_batch in unsloth_prediction_step (#4616)
* Fix missing num_items_in_batch in unsloth_prediction_step

unsloth_prediction_step calls compute_loss without num_items_in_batch
during evaluation. This causes _unsloth_pre_compute_loss to see
num_items_in_batch=None, which triggers a spurious warning for every
model when gradient_accumulation_steps > 1:

  "Unsloth: Not an error, but {model} does not accept num_items_in_batch.
   Using gradient accumulation will be very slightly less accurate."

The standard transformers prediction_step computes num_items_in_batch
via _get_num_items_in_batch before passing it to compute_loss. This
patch does the same in unsloth_prediction_step.

Tested on Llama-3.2-1B-Instruct and Olmo-3-7B-Instruct with
gradient_accumulation_steps=3 and eval_steps=3. Warning is gone and
eval loss is computed correctly for both.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard _get_num_items_in_batch for older transformers versions

_get_num_items_in_batch was added in transformers 4.46. Wrap the call
in try/except so older versions fall back to num_items_in_batch=None,
which preserves the original behavior of not passing it.
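
Illustration only: the guard pattern sketched with stand-in classes rather than a real transformers Trainer. The helper name `optional_call`, both stand-in classes, and the argument passed to the private method are assumptions; the actual signature of `_get_num_items_in_batch` is not reproduced here.

```python
def optional_call(obj, method_name, *args, default=None):
    """Call obj.method_name(*args), returning `default` if it is missing or fails."""
    try:
        return getattr(obj, method_name)(*args)
    except Exception:
        return default


class NewerTrainerStandIn:
    # Hypothetical stand-in: counts label tokens that are not masked with -100.
    def _get_num_items_in_batch(self, batch_samples):
        return sum(1 for sample in batch_samples for label in sample["labels"] if label != -100)


class OlderTrainerStandIn:
    pass  # no _get_num_items_in_batch, like an older library version


batch = [{"labels": [-100, 5, 7]}, {"labels": [3, -100]}]
print(optional_call(NewerTrainerStandIn(), "_get_num_items_in_batch", batch))
print(optional_call(OlderTrainerStandIn(), "_get_num_items_in_batch", batch))
```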

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-27 03:06:59 -07:00
Michael Han
0ffac92cf4
Update Install instructions.md
2026-03-27 03:04:07 -07:00
Michael Han
19298a0b41
Update Uninstall instructions.md
2026-03-27 02:56:34 -07:00
Daniel Han
5c9a22b816
Fix Gemma3N audio training stride assertion with non-reentrant checkpointing (#4629)
* Fix Gemma3N audio training stride assertion with non-reentrant checkpointing

Gemma3N audio conformer processes variable-length audio tensors
that cause stride mismatches in AOT autograd compiled backward
when non-reentrant gradient checkpointing is used. The error
manifests as:

    AssertionError: expected size 2==2, stride 1928==1936 at dim=0

This happens because the audio conformer's conv/norm layers produce
tensors whose strides vary with audio clip duration, but AOT autograd
traces the backward graph assuming fixed strides from the first batch.

The notebook sets gradient_checkpointing_kwargs={"use_reentrant": False}
and TRL 0.27.0+ also forces this. Both override Unsloth's own
use_reentrant=True set during prepare_model_for_training.

Fix: intercept gradient_checkpointing_enable on Gemma3N models to
always force use_reentrant=True, regardless of what the notebook
or TRL passes.
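
Illustration only: a minimal sketch of intercepting `gradient_checkpointing_enable`, assuming the model exposes that method with a `gradient_checkpointing_kwargs` argument. The wrapper name `force_reentrant_checkpointing` and the stand-in model class are hypothetical; the actual patch in this PR may hook in differently.

```python
def force_reentrant_checkpointing(model):
    """Wrap gradient_checkpointing_enable so use_reentrant is always True."""
    original = model.gradient_checkpointing_enable

    def patched(gradient_checkpointing_kwargs=None, **kwargs):
        merged = dict(gradient_checkpointing_kwargs or {})
        merged["use_reentrant"] = True  # override whatever the caller requested
        return original(gradient_checkpointing_kwargs=merged, **kwargs)

    model.gradient_checkpointing_enable = patched
    return model


class ModelStandIn:
    """Hypothetical stand-in that records the kwargs it was called with."""

    def __init__(self):
        self.seen = None

    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
        self.seen = gradient_checkpointing_kwargs


model = force_reentrant_checkpointing(ModelStandIn())
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})
print(model.seen)
```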

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-27 02:53:21 -07:00
Daniel Han
3c9f0ed149
fix: use unsloth[huggingfacenotorch] instead of --no-deps in no-torch mode (#4647)
The previous --no-deps approach skipped ALL dependencies, not just
torch. This left safetensors, transformers, datasets, accelerate, etc.
missing, causing PackageNotFoundError at runtime.

Fix: in no-torch mode, install unsloth[huggingfacenotorch] (which pulls
all runtime deps except torch), then install unsloth-zoo with --no-deps
(since zoo's published metadata still declares torch as a hard dep).
This gives a working no-torch environment with all non-torch packages.

Applied to all three installer files: install.sh, install.ps1, and
studio/install_python_stack.py.
2026-03-27 02:38:11 -07:00
Daniel Han
2ffc8d2cea
tests: add no-torch / Intel Mac test suite (#4646)
* tests: add no-torch / Intel Mac test suite

Add comprehensive test coverage for the no-torch / --no-torch installer
and Studio backend changes introduced in #4624.

Shell tests (tests/sh/test_mac_intel_compat.sh):
- version_ge edge cases (9 tests)
- Architecture detection + Python version resolution (4 tests)
- get_torch_index_url on Darwin (2 tests)
- UNSLOTH_NO_TORCH propagation via SKIP_TORCH (5 tests)
- E2E uv venv creation at Python 3.12 (3 tests)
- E2E torch skip with mock uv shim (4 tests)
- UNSLOTH_NO_TORCH env propagation (4 tests)
- --python override flag parsing + resolution (11 tests)
- --no-torch flag parsing (4 tests)
- SKIP_TORCH unification (3 tests)
- CPU hint printing (2 tests)

Python tests (tests/python/test_no_torch_filtering.py):
- _filter_requirements unit tests with synthetic + real requirements files
- NO_TORCH / IS_MACOS constant parsing
- Subprocess mock of install_python_stack() across platform configs
- install.sh --no-torch flag structural + subprocess tests

Python tests (tests/python/test_studio_import_no_torch.py):
- AST checks for data_collators.py, chat_templates.py, format_conversion.py
- Parametrized venv tests (Python 3.12 + 3.13) for no-torch exec
- Dataclass instantiation without torch
- format_conversion convert functions without torch
- Negative controls (import torch fails, torchao fails)

Python tests (tests/python/test_e2e_no_torch_sandbox.py):
- Before/after import chain tests
- Edge cases (broken torch, fake torch, lazy import)
- Hardware detection without torch
- install.sh logic tests (flag parsing, version resolution)
- install_python_stack filtering tests
- Live server startup tests (opt-in via @server marker)

* fix: address review comments on test suite

- Fix always-true assertion in test_studio_import_no_torch.py (or True)
- Make IS_MACOS test platform-aware instead of hardcoding Linux
- Restore torchvision + torchaudio in server test cleanup (not just torch)
- Include server stderr in skip message for easier debugging

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-27 02:33:45 -07:00
Daniel Han
e9ac785346
fix: install.sh Mac Intel compatibility + Studio no-torch support (#4624)
* fix: install.sh Mac Intel compatibility + Studio no-torch support (#4621)

On Intel Macs (x86_64), PyTorch has no wheels for torch >= 2.3, so the
installer crashes. Even when torch is absent, Studio crashes on startup
because two files have bare top-level torch imports.

Studio's GGUF inference (llama.cpp) does not need PyTorch. Training and
HF-inference already isolate torch to subprocesses. Only 2 files in the
server startup chain had top-level torch imports preventing startup.

Changes:
- install.sh: detect architecture, default to Python 3.12 on Intel Mac,
  skip torch install, add Python 3.13.8 guard for arm64, pass
  UNSLOTH_NO_TORCH env var to setup.sh
- data_collators.py: remove unused `import torch` (no torch.* refs)
- chat_templates.py: lazy-import IterableDataset into function bodies
- install_python_stack.py: add IS_MACOS/NO_TORCH constants, skip
  torch-dependent packages, skip overrides.txt, skip triton on macOS

No existing working flow changes. Linux/WSL and macOS arm64 behavior is
identical.

* tests: add test suite for Mac Intel compat + no-torch mode

Shell tests (test_mac_intel_compat.sh):
- version_ge edge cases (9 tests)
- Architecture detection for Darwin x86_64/arm64, Linux x86_64/aarch64
- get_torch_index_url returns cpu on simulated Darwin
- UNSLOTH_NO_TORCH propagation to both setup.sh branches

Python unit tests (test_no_torch_filtering.py):
- _filter_requirements with NO_TORCH_SKIP_PACKAGES
- NO_TORCH env var parsing (true/1/TRUE/false/0/unset)
- IS_MACOS constant check
- Overrides skip and triton macOS skip guards

Python import tests (test_studio_import_no_torch.py):
- data_collators.py loads in isolated no-torch venv
- chat_templates.py has no top-level torch imports
- Negative control confirms import torch fails without torch
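
The env var parsing exercised by the tests above can be sketched as follows. This is a minimal illustration, not the code in install_python_stack.py: the helper name `parse_no_torch` is hypothetical, and the accepted truthy values are an assumption taken from the test list (true/1/TRUE truthy; false/0/unset falsy).

```python
import os


def parse_no_torch(env=None):
    """Return True when UNSLOTH_NO_TORCH holds a truthy value.

    Hypothetical helper: the truthy set ("1", "true", case-insensitive)
    is assumed from the test description, not copied from the source.
    """
    env = os.environ if env is None else env
    raw = env.get("UNSLOTH_NO_TORCH", "")
    return raw.strip().lower() in {"1", "true"}
```

Passing a plain dict instead of `os.environ` keeps the check easy to test without mutating the process environment.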

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tests: add E2E sandbox tests for Mac Intel no-torch mode

Replace static/synthetic test stubs with real sandbox tests:

- Shell: E2E uv venv creation at Python 3.12, mock uv shim to verify
  torch install is skipped when MAC_INTEL=true, dynamic env propagation
  test for UNSLOTH_NO_TORCH in both local and non-local install paths
- Python filtering: test real extras.txt and extras-no-deps.txt with
  NO_TORCH_SKIP_PACKAGES, subprocess mock of install_python_stack() for
  5 platform configs (NO_TORCH+macOS, Windows+NO_TORCH, normal Linux,
  Windows-only, macOS-only), VCS URL and env marker edge cases
- Python imports: parametrized Python 3.12+3.13 venv fixture, dataclass
  instantiation for all 3 collator classes, chat_templates.py exec with
  stubs, negative controls proving import torch and torchao install fail
  in no-torch venvs

91 total tests, all passing.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: address reviewer findings for Intel Mac no-torch mode

P1 fixes:
- Auto-infer NO_TORCH in install_python_stack.py via platform.machine()
  so `unsloth studio update` preserves GGUF-only mode without needing
  the UNSLOTH_NO_TORCH env var (6/10 reviewers)
- Add openai-whisper and transformers-cfg to NO_TORCH_SKIP_PACKAGES
  since both have unconditional torch dependencies (4/10 reviewers)
- Skip unsloth-zoo on Intel Mac --local installs (depends on torch)
  in both migrated and fresh install paths (1/10)
- Recreate stale 3.13 venvs as 3.12 on Intel Mac re-runs (1/10)
- Detect Apple Silicon under Rosetta via sysctl hw.optional.arm64
  and warn user to use native arm64 terminal (1/10)

P2 fixes:
- Wire new test files into tests/run_all.sh (4/10 reviewers)
- Add update-path tests (skip_base=False) for Intel Mac
- Add _infer_no_torch tests for platform auto-detection

P3 fixes:
- Fix macOS progress bar total (triton step skipped but was counted)
- Fix temp file leak when Windows + NO_TORCH filters stack

All tests pass: 30 shell, 66 Python (96 total).
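
The platform auto-detection described in the P1 fixes might look roughly like this. The function name and the exact condition (Darwin on x86_64) are assumptions for illustration; the real `_infer_no_torch` may check more than this.

```python
import platform


def infer_no_torch(system=None, machine=None):
    """Guess whether this host should run in GGUF-only (no-torch) mode.

    Sketch only: treats Intel macOS (Darwin + x86_64) as the no-torch
    case. Arguments default to the live platform values so tests can
    inject simulated platforms.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "x86_64"
```

Inferring the mode from the platform means an update run does not depend on the environment variable having been exported by the original installer.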

* feat: add --python override flag to install.sh

Lets users force a specific Python version, e.g. ./install.sh --python 3.12.
Addresses M2 Mac users whose systems resolve to a problematic 3.13.x patch.
When --python is set, the Intel Mac stale-venv guard and 3.13.8 auto-downgrade
are skipped so the user's choice is respected.

* tests: add comprehensive E2E sandbox tests for no-torch mode

Add test_e2e_no_torch_sandbox.py with 7 test groups (43 tests total)
covering the full no-torch import chain, edge cases, and install logic:

- Group 1: BEFORE vs AFTER import chain comparison (proves the bug
  existed and the fix works by synthetically prepending top-level torch
  imports)
- Group 2: Dataclass instantiation without torch
- Group 3: Edge cases with broken/fake torch modules on sys.path
- Group 4: Hardware detection fallback to CPU without torch
- Group 5: install.sh flag parsing, version resolution, arch detection
- Group 6: install_python_stack.py NO_TORCH filtering
- Group 7: Live server startup without torch (marked @server, skipped
  when studio venv is unavailable)

All 43 tests pass on both Python 3.12 and 3.13 isolated venvs.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* feat: add --no-torch flag to install.sh/ps1, fix lazy import bug in dataset formatting

- Fix chat_templates.py: narrow torch IterableDataset import into inner
  try/except ImportError so dataset.map() works without torch installed
- Fix format_conversion.py: same lazy import fix for convert_chatml_to_alpaca
  and convert_alpaca_to_chatml
- Add --no-torch flag to install.sh with unified SKIP_TORCH variable
  (driven by --no-torch flag OR MAC_INTEL auto-detection)
- Add --no-torch flag to install.ps1 with $SkipTorch variable
- Print CPU hint when no GPU detected and --no-torch not set
- Replace MAC_INTEL guards with SKIP_TORCH in torch install sections
- Update shell tests (40 pass) and Python tests (90 pass)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: address reviewer findings for --no-torch installer paths

- Fix migrated-env branch in install.sh and install.ps1: check
  SKIP_TORCH first, then branch on STUDIO_LOCAL_INSTALL. Previously
  SKIP_TORCH+non-local fell into else and installed unsloth-zoo (which
  depends on torch), defeating --no-torch mode.
- Fix $env:UNSLOTH_NO_TORCH leak in install.ps1: always set to "true"
  or "false" instead of only setting on the true branch. Prevents stale
  no-torch state from leaking across runs in the same PS session.
- Fix install_python_stack.py update path: add NO_TORCH guard around
  base.txt install so unsloth studio update does not reinstall
  unsloth-zoo (which depends on torch) in no-torch mode.

* fix: install unsloth + unsloth-zoo with --no-deps in no-torch mode

Instead of skipping unsloth-zoo entirely (which breaks unsloth's
dependency on it), install both packages with --no-deps so they are
present but torch is not pulled in transitively. Applied consistently
across all no-torch paths: migrated-env, fresh-local, fresh-non-local
in install.sh, install.ps1, and install_python_stack.py.

* chore: temporarily remove test files (will be added in a follow-up)

* refactor: deduplicate SKIP_TORCH conditional branches in installers

Collapse if/else blocks that differ only by --no-deps into a single
branch with a conditional flag variable. Applied to migrated-env and
fresh-local paths in install.sh, install.ps1, and install_python_stack.py.

* fix: apply --no-deps to fresh non-local --no-torch install path

The non-local else branch was missing $_no_deps_arg/$noDepsArg, so
uv pip install unsloth would resolve torch from PyPI metadata (the
published unsloth package still declares torch as a hard dep). Now
--no-deps is applied consistently to all SKIP_TORCH code paths.
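
The "single branch with a conditional flag variable" pattern from the refactor above can be sketched like this. The function name is hypothetical and the command is only built, never executed; `--no-deps` is the standard `uv pip install` flag.

```python
def build_install_cmd(package, skip_torch):
    """Build a `uv pip install` command, adding --no-deps in no-torch mode.

    Sketch of the deduplicated branch: one command template, with the
    flag list empty unless torch must be kept out of the environment.
    """
    no_deps_arg = ["--no-deps"] if skip_torch else []
    return ["uv", "pip", "install", *no_deps_arg, package]
```

Because the flag lives in one variable, every install path that reuses it picks up the no-torch behavior, which is the gap the last fix closed for the non-local branch.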

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-27 02:09:21 -07:00
Daniel Han
d57a4d993d studio: fix chat CPU spike (#4632)
The inline querier's identity changed on every render, forcing useLiveQuery
to resubscribe continuously and causing CPU spikes. Store the querier in a
ref and only resubscribe when explicit deps change.
2026-03-27 06:20:26 +00:00
Daniel Han
e62085a3d6
Fix repetition_penalty default causing 24% TPS drop in GGUF inference (#4634)
The ChatCompletionRequest Pydantic model defaulted repetition_penalty
to 1.1 when clients omitted the field. This silently forced
llama-server to perform per-token repetition scanning, dropping
streaming throughput from ~225 TPS to ~172 TPS (a 24% penalty).

The Studio frontend always sends repetition_penalty=1.0 explicitly,
so UI users were unaffected. But any API client hitting
/v1/chat/completions without setting the field (curl, third-party
integrations, Open WebUI, etc.) would get the slow path.

Benchmarked on Qwen3.5-4B Q4_K_XL, GPU 0:
- repeat_penalty=1.0: 225.2 TPS
- repeat_penalty=1.1: 172.7 TPS (24% slower)
- LM Studio (which applies rp internally): 170.8 TPS

This aligns the Pydantic default with the frontend default (1.0),
generate_chat_completion's function signature default (1.0), and
llama-server's own default (1.0).
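
A minimal sketch of the default change, using a standard-library dataclass in place of the real Pydantic model. The class and helper names are hypothetical; only the field name `repetition_penalty`, the `repeat_penalty` parameter, and the 1.0 default come from the message above.

```python
from dataclasses import dataclass


@dataclass
class ChatRequestSketch:
    """Stand-in for ChatCompletionRequest (the real model is Pydantic)."""

    model: str
    # 1.0 is the neutral value: clients that omit the field get no penalty.
    repetition_penalty: float = 1.0


def to_llama_payload(req):
    """Map the request onto a llama-server style sampling payload."""
    return {"model": req.model, "repeat_penalty": req.repetition_penalty}
```

With the old default of 1.1, a request that never mentioned the field would still have produced `repeat_penalty=1.1` in the payload, which is the slow path described above.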
2026-03-26 20:20:53 -07:00
Roland Tannous
e79a178200
Allow install_python_stack to run on Colab (#4633)
* Allow install_python_stack to run on Colab

The _COLAB_NO_VENV flag was setting _SKIP_PYTHON_DEPS=true, which
skipped both the PyPI version check (needs $VENV_DIR/bin/python) and
install_python_stack (uses sys.executable, works without a venv).

Introduce a separate _SKIP_VERSION_CHECK flag for the version check,
so install_python_stack still runs on Colab. The _SKIP_PYTHON_DEPS
flag remains available for the "versions match" fast path.

* Remove colab.py workarounds that broke transformers/hf-hub compatibility

PR #4601 added _pip_install_backend_deps(), _bootstrap_studio_venv(),
and _is_colab() to colab.py as workarounds for install_python_stack
being skipped on Colab. These workarounds:
- Stripped version constraints from studio.txt and installed into system Python
- Upgraded huggingface-hub to >=1.0, breaking Colab's pre-installed
  transformers which requires huggingface-hub<1.0

With install_python_stack now running on Colab (previous commit), these
workarounds are unnecessary — all deps are properly installed by setup.sh.
Restore colab.py to its original PR #4237 structure: just get_colab_url(),
show_link(), and start().

* Remove --local flag from setup.sh in Colab notebook

The --local flag is not needed for the standard Colab flow since
install_python_stack now runs on Colab and installs deps from PyPI.
2026-03-27 00:29:27 +04:00
Wasim Yousef Said
71781272dd
fix: add python-json-logger dependency to data-designer-deps (#4627) 2026-03-26 09:50:51 -07:00
Radouane Elhajali
a6fe743ebe
studio: humanize ETA display for long training runs (#4608)
* studio: humanize ETA display for long training runs

When training takes hours or days, the ETA displayed raw minutes
(e.g. '560m 50s'). This changes the format to:
- Under 1 hour: Xm Ys (unchanged)
- 1-24 hours: Xh Ym Zs
- Over 24 hours: Xd Yh Zm

* Fix formatDuration edge cases and consolidate duplicate for PR #4608

- Guard NaN/Infinity inputs with Number.isFinite() (matches formatNumber in same file)
- Add sub-minute branch so 30s displays as "30s" instead of "0m 30s"
- Accept undefined in type signature to match formatNumber pattern
- Remove duplicate formatDuration from history-card-grid.tsx and import the shared one
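
The formatting rules above can be sketched in Python as follows (the actual helper is a TypeScript function in the Studio frontend). The `"--"` placeholder for missing or non-finite input and the handling of negative values are assumptions, not taken from the source.

```python
import math


def format_duration(seconds):
    """Humanize a duration in seconds following the tiers listed above."""
    # Guard None / NaN / Infinity; the "--" placeholder is an assumption.
    if seconds is None or not math.isfinite(seconds) or seconds < 0:
        return "--"
    total = int(seconds)
    if total < 60:
        return f"{total}s"
    minutes, secs = divmod(total, 60)
    if total < 3600:
        return f"{minutes}m {secs}s"
    hours, minutes = divmod(minutes, 60)
    if total < 86400:
        return f"{hours}h {minutes}m {secs}s"
    days, hours = divmod(hours, 24)
    return f"{days}d {hours}h {minutes}m"
```

For the example in the commit message, `format_duration(560 * 60 + 50)` gives `"9h 20m 50s"` instead of `"560m 50s"`.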

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-26 06:55:54 -07:00
Michael Han
937da02f6c
Update Unsloth_Studio_Colab.ipynb 2026-03-26 05:45:30 -07:00
Etherll
b3a3435ac3
fix: Windows installer fails on _yaml.pyd Access Denied (os error 5) (#4617)
* fix: avoid _yaml.pyd lock on Windows during dependency overrides

* fix: move pytorch_tokenizers and kernels to no-deps install to avoid Windows _yaml.pyd loc
2026-03-26 05:15:19 -07:00
Lee Jackson
352455610b
studio: align Dataset/Parameters/Training cards, fix expandable height, animate LoRA settings (#4614)
* fix(studio): align config cards, dynamic height for expanders, LoRA collapsible

* Fix clipping regressions in training, dataset, and params section cards

- training-section: Add hasMessage conditional so the card expands
  (min-h) when startError, vision/audio incompatibility, or config
  validation messages are present instead of always using fixed height
- dataset-section: Expand card when a local dataset is selected via
  upload (datasetSource === "upload" && selectedLocalDataset), not only
  when the Advanced panel is open
- params-section: Guard loraOpen behind isLora so switching to full
  fine-tune collapses the card instead of staying expanded from stale
  React useState

* Fix dataset card clipping for direct file uploads

Use uploadedFile instead of selectedLocalDataset in the card height
condition. selectedLocalDataset is derived from localDatasets.find()
which only resolves for Data Recipe entries, not direct file uploads
(.jsonl, .csv, .parquet, .arrow). The card already renders the Eval
Dataset panel based on uploadedFile (line 750), so the height gate
should match.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-26 04:05:30 -07:00
Wasim Yousef Said
07abcb46de
fix: normalize search matching for recommended models and LoRA picker (#4615)
Recommended models matching the query were filtered from HF results but the Recommended section was hidden during search, causing them to vanish entirely.

- Show filtered recommended models during search by introducing `filteredRecommendedIds`
- Switch `recommendedSet` to use filtered IDs when searching so dedup against HF results is correct
- Hide empty "Hugging Face" label when recommended matches cover the query
- Add `normalizeForSearch` helper to strip separators (spaces, hyphens, underscores, dots) so queries like "llama 3" match "Llama-3.2-1B" and "qwen 2.5" matches "Qwen2.5-7B" in both the recommended model filter and the LoRA adapter filter
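
A Python sketch of the separator-stripping match (the real `normalizeForSearch` is a TypeScript helper). Lowercasing and substring matching are assumptions inferred from the examples, where "llama 3" matches "Llama-3.2-1B".

```python
import re

# Spaces, hyphens, underscores, and dots are treated as separators.
_SEPARATORS = re.compile(r"[\s\-_.]+")


def normalize_for_search(text):
    """Strip separators and lowercase so formatting differences vanish."""
    return _SEPARATORS.sub("", text).lower()


def matches(query, model_id):
    """True when the normalized query is a substring of the normalized id."""
    return normalize_for_search(query) in normalize_for_search(model_id)
```

Both sides are normalized the same way, so "qwen 2.5" becomes "qwen25" and is found inside "qwen257b".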
2026-03-26 03:40:11 -07:00
Roland Tannous
6b3eb504b2
Fix Colab setup skipping llama.cpp installation (#4618)
* Fix Colab setup skipping llama.cpp installation

The early exit 0 in the Colab no-venv path prevented setup.sh from
ever reaching the llama.cpp install section. Remove the early exit
and instead guard only the venv-dependent Python deps section, so
execution continues through to the llama.cpp prebuilt/source install.

* Simplify _SKIP_PYTHON_DEPS initialization

* Add --local flag to setup.sh in Colab notebook
2026-03-26 13:55:46 +04:00
Abhinav
74ddef1402
fix: skip flex_attention for models with non-zero attention_dropout (#4605) 2026-03-26 01:12:23 -07:00
Michael Han
d4e9b708bb
Update Install instructions.md 2026-03-25 19:55:30 -07:00
Michael Han
d3049db427
Update install instructions.md 2026-03-25 19:04:10 -07:00
Roland Tannous
88a6dfc5cd Revert "Update README.md"
This reverts commit c30e1d2029.
2026-03-25 19:54:12 +00:00
Roland Tannous
c30e1d2029
Update README.md
remove newline from windows command
2026-03-25 23:26:37 +04:00
Daniel Han
9fa67809e6 Update README.md 2026-03-25 09:43:55 -07:00
Roland Tannous
c23c3a17e9
Update README.md (#4604)
Update install instructions for studio
2026-03-25 09:42:32 -07:00
Daniel Han
55db24fc31 Update _utils.py 2026-03-25 09:40:17 -07:00
Daniel Han
baabfa0a6e
Fix Colab huggingface-hub conflict, ensurepip fallback, bump to 2026.3.14 (#4603)
* Fix Colab huggingface-hub conflict, ensurepip fallback, bump to 2026.3.14

- colab.py / setup.sh: relax == pins to >= when installing studio.txt
  on Colab so huggingface-hub does not clobber Colab's bundled version
  (breaks transformers is_offline_mode import)
- install_python_stack.py: when uv is unavailable and pip is missing
  (uv-created venvs), bootstrap via ensurepip before attempting upgrade
- Bump version to 2026.3.14
- Bump installer min version pins to 2026.3.14

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-25 09:38:02 -07:00
Daniel Han
9cb698c774 Update _utils.py 2026-03-25 09:04:23 -07:00
Daniel Han
23eb7fc0a7
Fix Colab Studio launch and setup.ps1 box alignment (#4601)
* Fix Colab Studio launch and setup.ps1 box alignment

- colab.py: when the Studio venv is missing on Colab, pip-install
  backend dependencies (structlog, fastapi, etc.) from studio.txt
  into the current Python instead of failing with ModuleNotFoundError
- setup.sh: on Colab without a venv, install backend deps into system
  Python and skip venv-dependent sections (Python stack update,
  llama.cpp build) that would otherwise fail
- setup.ps1: use PadRight(47) for the done-line so "Setup Complete!"
  and "Update Complete!" both align with the box border

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-25 09:00:08 -07:00
Daniel Han
b713a5085a
Bump installer min version to 2026.3.12 (#4600) 2026-03-25 08:40:53 -07:00
Daniel Han
55d24d7c49
feat(studio): editable context length with Apply/Reset for GGUF settings (#4592)
* feat(studio): editable context length with Apply/Reset for GGUF model settings

Previously the Context Length field was read-only and the backend
hardcoded `-c 0`, ignoring custom values entirely. KV Cache Dtype also
triggered an immediate model reload with no way to cancel.

Backend:
- llama_cpp.py: pass the actual n_ctx value to `-c` instead of always 0
- models/inference.py: relax max_seq_length to 0..1048576 (0 = model
  default) so GGUF models with large context windows are supported

Frontend:
- chat-runtime-store: add customContextLength and loadedKvCacheDtype
  state fields for dirty tracking
- chat-settings-sheet: make Context Length an editable number input,
  stop KV Cache Dtype from auto-reloading, show Apply/Reset buttons
  when either setting has been changed
- use-chat-model-runtime: send customContextLength as max_seq_length
  in the load request, reset after successful load

* fix: preserve maxSeqLength for non-GGUF models in load request

customContextLength ?? 0 sent max_seq_length=0 for non-GGUF models,
breaking the finetuning/inference path that needs the slider value.

Now uses a three-way branch:
- customContextLength set: use it (user edited GGUF context)
- GGUF without custom: 0 (model's native context)
- Non-GGUF: maxSeqLength from the sampling slider
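
The three-way branch can be sketched as below, in Python rather than the frontend's TypeScript. The function and parameter names are hypothetical, and this reflects the behavior at this commit only; a later commit in the same PR changes what the GGUF path sends.

```python
def resolve_max_seq_length(custom_context_length, is_gguf, max_seq_length):
    """Pick the max_seq_length to send in the load request.

    1. A user-edited GGUF context length wins.
    2. GGUF without a custom value sends 0 (model's native context).
    3. Non-GGUF models use the slider value.
    """
    if custom_context_length is not None:
        return custom_context_length
    if is_gguf:
        return 0
    return max_seq_length
```

Checking `is not None` rather than truthiness keeps an explicit custom value distinct from "not set", which is the distinction the earlier `?? 0` expression lost for non-GGUF models.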

* fix: keep max_seq_length default at 4096 for non-GGUF callers

Only relax the bounds (ge=0 for GGUF's "model default" mode,
le=1048576 for large context windows). The default stays at 4096
so API callers that omit max_seq_length still get a sane value
for non-GGUF models.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(studio): rename trust remote code toggle and hide when no model selected

- Rename "Trust remote code" to "Enable custom code"
- Shorten subtitle to "Only enable if sure"
- Hide the toggle when no model is loaded (already hidden for GGUFs)

* fix: restore ge=128 for max_seq_length validation

Keep the minimum at 128 so the API rejects nonsensical values.
GGUF path now sends the model's native context length (from
ggufContextLength) instead of 0 when the user has not customized it.
The upper bound stays at 1048576 for large-context GGUF models.

* feat(studio): replace Context Length input with slider

Use a ParamSlider (512 to model's native context, step 512) instead
of a small number input. Shows "Max" when at the model's native
context length. Consistent with the other slider controls in the
settings panel.

* feat(studio): add editable number input alongside Context Length slider

The slider and number input stay synced -- dragging the slider updates
the number, typing a number moves the slider. The input also accepts
values beyond the slider range for power users who need custom context
lengths larger than the model default.

* fix(studio): widen context length input and use 1024 step for slider

Make the number input wider (100px) so large values like 262144 are
fully visible. Change slider step from 512 to 1024 and min from 512
to 1024.

* fix(studio): context length number input increments by 1024

* fix(studio): cap context length input at model's native max

Adds max attribute and clamps typed/incremented values so the context
length cannot exceed the GGUF model's reported context window.

* fix(studio): point "What's new" link to changelog page

Changed from /blog to /docs/new/changelog.

* fix(studio): preserve custom context length after Apply, remove stale subtitle

- After a reload with a custom context length, keep the user's value
  in the UI instead of snapping back to the model's native max.
  ggufContextLength always reports the model's native metadata value
  regardless of what -c was passed, so we need to preserve
  customContextLength when it differs from native.
- Remove "Reload to apply." from KV Cache Dtype subtitle since the
  Apply/Reset buttons now handle this.

* feat(studio): auto-enable Search and Code tools when model supports them

Previously toolsEnabled and codeToolsEnabled stayed false after loading
a model even if it reported supports_tools=true. Now both toggles are
automatically enabled when the loaded model supports tool calling,
matching the existing behavior for reasoning.

* fix(studio): auto-enable tools in autoLoadSmallestModel path

The suggestion cards trigger autoLoadSmallestModel which bypasses
selectModel entirely. It was hardcoding toolsEnabled: false and
codeToolsEnabled: false even when the model supports tool calling.
Now both are set from the load response, matching the selectModel
behavior. Also sets kvCacheDtype/loadedKvCacheDtype for dirty
tracking consistency.

* fix(studio): re-read tool flags after auto-loading model

The runtime state was captured once at the start of the chat adapter's
run(), before autoLoadSmallestModel() executes. After auto-load enables
tools in the store, the request was still built with the stale snapshot
that had toolsEnabled=false. Now re-reads the store after auto-load so
the first message includes tools.

* fix(studio): re-read entire runtime state after auto-load, not just tools

The runtime snapshot (including params.checkpoint, model id, and all
tool/reasoning flags) was captured once before auto-load. After
autoLoadSmallestModel sets the checkpoint and enables tools, the
request was still built with stale params (empty checkpoint, tools
disabled). Now re-reads the full store state after auto-load so the
first message has the correct model, tools, and reasoning flags.

* feat(studio): add Hugging Face token field in Preferences

Adds a password input under Configuration > Preferences for users to
enter their HF token. The token is persisted in localStorage and
passed to all model validate/load/download calls, replacing the
previously hardcoded null. This enables downloading gated and private
models.

* fix(studio): use model native context for GGUF auto-load, show friendly errors

The auto-load paths and selectModel for GGUF were sending
max_seq_length=4096 which now actually limits the context window
(since we fixed the backend to respect n_ctx). Changed to send 0
for GGUF, which means "use model's native context size".

Also replaced generic "An internal error occurred" messages with
user-friendly descriptions for known errors like context size
exceeded and lost connections.

LoadRequest validation changed to ge=0 to allow the GGUF "model
default" signal. The frontend slider still enforces min=128 for
non-GGUF models.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(studio): filter out FP8 models from model search results

Hide models matching *-FP8-* or *FP8-Dynamic* from both the
recommended list and HF search results. These models are not
yet supported in the inference UI.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-25 08:32:38 -07:00
Daniel Han
6d6008a1ef
Add PID file tracking and unsloth studio stop command (#4598)
* Add PID file tracking and `unsloth studio stop` command

On macOS the .app shortcut launches Studio via osascript into a
Terminal window, then the launcher script exits. The server process
runs outside of the launcher's context with no PID file, so there
is no straightforward way to find or stop it.

This adds:
- PID file at ~/.unsloth/studio/studio.pid, written after the
  server starts and removed on graceful shutdown or via atexit
- `unsloth studio stop` command that reads the PID file and sends
  SIGTERM (or taskkill on Windows) to shut down the server

The PID file is only removed if it still contains the current
process ID, avoiding races when a new server instance replaces
a crashed one.
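
The ownership check on removal can be sketched like this. The helper names are hypothetical and the path is passed in by the caller; the real location is the studio.pid file named above.

```python
import os
from pathlib import Path


def write_pid_file(pid_path, pid=None):
    """Record the server's process ID, creating parent directories."""
    pid = os.getpid() if pid is None else pid
    pid_path = Path(pid_path)
    pid_path.parent.mkdir(parents=True, exist_ok=True)
    pid_path.write_text(str(pid))


def remove_pid_file_if_owned(pid_path, pid=None):
    """Delete the PID file only if it still names this process.

    Returns True when the file was removed. A file that is missing,
    unreadable as an integer, or owned by another PID is left alone.
    """
    pid = os.getpid() if pid is None else pid
    pid_path = Path(pid_path)
    try:
        recorded = int(pid_path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False
    if recorded != pid:
        return False
    pid_path.unlink(missing_ok=True)
    return True
```

If a new server has already overwritten the file with its own PID, the exit handler of the old process sees a mismatch and leaves the file in place.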

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Move atexit PID cleanup into run_server()

The atexit registration was only in the __main__ block, so it
did not cover the `unsloth studio` CLI path that calls
run_server() directly via studio_default(). Moving it into
run_server() ensures the PID file is cleaned up on unexpected
exit regardless of entry point.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-25 08:27:27 -07:00
Daniel Han
561f0f39be Fix install.ps1 --local: pass script args to Install-UnslothStudio
The function was called with no arguments, so $args inside the function
was always empty. Script-level args (--local, --package) were never
forwarded. Use @args splatting to pass them through.
2026-03-25 15:14:51 +00:00
Daniel Han
289c7dd7bb Add --local and --package flags to install.ps1
Windows install.ps1 had no way to install from a local repo checkout,
unlike install.sh which supports ./install.sh --local. This adds:

- --local: install from the local repo via editable install (-e . --no-deps)
  after installing deps from PyPI, mirroring install.sh behavior
- --package: install a different package name for testing

The --local flag:
1. Validates pyproject.toml exists at the script's directory
2. Installs torch + unsloth deps normally
3. Overlays the local checkout with uv pip install -e <repo> --no-deps
4. Passes STUDIO_LOCAL_INSTALL and STUDIO_LOCAL_REPO to setup.ps1
2026-03-25 15:12:56 +00:00
Daniel Han
2683c2ab58
Add unsloth to User PATH on Windows after install (#4597)
After installation, `unsloth studio` only works if the user
activates the Studio venv first or uses the full absolute path.
The Desktop/Start Menu shortcuts work fine, but typing `unsloth
studio` in a fresh terminal does not.

This adds the venv Scripts dir to the persistent User PATH env
var (if not already present) so `unsloth studio` works from any
new terminal window. The current session is also updated via the
existing Refresh-SessionPath helper.
2026-03-25 08:00:44 -07:00
Roland Tannous
48a7884584
feat: multi-source model discovery (HF default, legacy cache, LM Studio) (#4591)
* feat: multi-source model discovery (HF default, legacy cache, LM Studio)

* Fix multi-source model discovery bugs

- Fix lmstudio_model_dirs: add ~/.lmstudio/models as default path,
  remove dead sys.platform branch, add dedup via seen set
- Fix _setup_cache_env: preserve legacy HF cache env vars when the
  legacy hub directory exists and is non-empty
- Fix _scan_lmstudio_dir: use absolute path for id field so
  is_local_path() returns True
- Remove LM Studio dirs from allowed_roots (scanned unconditionally)
- Replace bare except passes with logger.warning in legacy cache blocks
- Fix delete_cached_model to search both default and legacy HF caches
- Make lmstudio_dirs non-optional in TS interface (matches Python schema)
- Exclude lmstudio source from trainable model filter
- Remove unused import sys

* Scan HF default cache alongside legacy and active caches

When _setup_cache_env overrides HF_HUB_CACHE to the legacy Unsloth
path, the standard HF default cache (~/.cache/huggingface/hub) was
never scanned, hiding models downloaded before Unsloth Studio was
installed.

Add hf_default_cache_dir() and _all_hf_cache_scans() helper that
deduplicates and scans all three HF cache locations (active, legacy,
default). Used in list_local_models, list_cached_gguf,
list_cached_models, and delete_cached_model.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-25 07:48:04 -07:00
Daniel Han
ebe22c1e9e Update _utils.py 2026-03-25 07:30:40 -07:00
Daniel Han
366fb048d4
fix(studio): add bun cache validation to Windows setup.ps1 (#4596)
Port the bun cache corruption fix from setup.sh to setup.ps1.

bun's package cache can become corrupt, storing only package metadata
without actual content. This causes bun install to exit 0 but leave
binaries like tsc missing from node_modules/.bin/.

Changes:
- After bun install, verify tsc and vite exist in node_modules\.bin\
- Check for both bare names and .cmd wrappers (Windows creates both)
- If missing, clear the bun cache and retry once
- Only fall back to npm if the retry also fails
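
The post-install check is implemented in PowerShell; as a rough Python illustration of the same idea, with a hypothetical helper name. Accepting either the bare name or the `.cmd` wrapper as proof that a binary exists is an assumption about how the check treats the two forms.

```python
from pathlib import Path

REQUIRED_BINS = ("tsc", "vite")


def missing_bins(project_dir):
    """List required binaries absent from node_modules/.bin.

    A binary counts as present if either its bare name or its .cmd
    wrapper exists (assumed rule for this sketch).
    """
    bin_dir = Path(project_dir) / "node_modules" / ".bin"
    missing = []
    for name in REQUIRED_BINS:
        candidates = (bin_dir / name, bin_dir / f"{name}.cmd")
        if not any(candidate.exists() for candidate in candidates):
            missing.append(name)
    return missing
```

A non-empty result is the signal to clear the bun cache and retry, since the exit code of bun install alone does not reveal the corruption.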
2026-03-25 07:27:08 -07:00
Daniel Han
3efea63e2f
fix(studio): source-build fallback prefers Unsloth's tested tag over upstream latest (#4593)
* fix(studio): source-build fallback prefers Unsloth's tested tag over upstream latest

When the prebuilt install fails and falls back to source build,
--resolve-llama-tag now queries the Unsloth release repo
(unslothai/llama.cpp) first to get the latest tested/approved tag
(e.g. b8508), instead of going straight to ggml-org/llama.cpp which
may return a newer untested tag (e.g. b8514).

This ensures the source-build fallback compiles the same version that
the prebuilt path would have installed, rather than a potentially
incompatible bleeding-edge release.

Resolution order for "latest":
  1. Unsloth release repo (tested/approved)
  2. ggml-org upstream (bleeding-edge)
  3. Raw requested tag string (last resort)
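
The resolution order above amounts to a fallback chain, sketched here with hypothetical names. The lookups are passed in as callables so the sketch makes no claim about how each repo is actually queried.

```python
def resolve_latest_tag(requested, lookups):
    """Return the first tag produced by `lookups`, else the raw request.

    `lookups` is an ordered list of zero-argument callables, e.g. one
    for the Unsloth release repo followed by one for upstream. A lookup
    that raises or returns an empty value is skipped.
    """
    for lookup in lookups:
        try:
            tag = lookup()
        except Exception:
            continue
        if tag:
            return tag
    return requested
```

Ordering the list with the tested repo first is what keeps the source build on the same version as the prebuilt path whenever that repo is reachable.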

Changes:
- resolve_requested_llama_tag() accepts optional published_repo param
  with docstring explaining the resolution order
- CLI --resolve-llama-tag passes --published-repo through
- setup.sh and setup.ps1 pass --published-repo to --resolve-llama-tag
  with inline comments explaining the preference

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-25 07:25:47 -07:00
Daniel Han
bc9cf31478
Pin torch>=2.4,<2.11.0 in Studio installers (#4595)
torch 2.11.0 has a torch.compile/dynamo bug that causes a
StopIteration crash in dict_keys_getitem when compiling MoE
router functions (e.g. GptOssTopKRouter_forward). Pin to
<2.11.0 until the upstream fix lands.

Applies to both install.sh (Linux/macOS) and install.ps1
(Windows) fresh install paths.
2026-03-25 07:20:55 -07:00
Daniel Han
2e4569e06a
fix(studio): clear bun cache on failure and retry before falling back to npm (#4594)
bun's package cache can become corrupt, storing only package metadata
(package.json, README) without actual content (bin/, lib/). When this
happens, bun install exits 0 and reports packages as installed, but
binaries like tsc are missing from node_modules/.bin/.

For example, a corrupt typescript cache entry is 64KB (metadata only)
vs 23MB when correctly downloaded.

Changes:
- After bun install, verify tsc and vite exist in node_modules/.bin/
- If missing, clear the bun cache with bun pm cache rm and retry once
- Only fall back to npm if the retry also fails
- Revert bun installation to npm install -g bun (the binary is fine,
  the cache was the problem)
2026-03-25 07:05:02 -07:00
Daniel Han
457c42964f
fix(studio): validate bun install and retry from official source on failure (#4589)
bun install (specifically the npm "bun" shim v1.3.x installed via
npm install -g bun) can exit 0 while silently failing to install
packages. This causes the frontend build to fail with "tsc: not found"
or missing type declarations, since the fallback to npm only triggers
on a non-zero exit code.

Changes:

1. Initial bun install now tries the official bun.sh installer first
   (which gives a real bun runtime), falling back to npm install -g bun
   only if that fails.

2. After bun install reports success, verify that critical binaries
   (tsc, vite) actually exist in node_modules/.bin/. If they are
   missing, reinstall bun from the official source and retry once
   before falling back to npm.

3. Extract the bun install + validation logic into _try_bun_install()
   to avoid duplicating the check/cleanup across both attempts.
2026-03-25 06:38:32 -07:00
Roland Tannous
1f498a73e6 Revert "feat: multi-source model discovery (HF default, legacy cache, LM Studio)"
This reverts commit d56b115bb4.
2026-03-25 13:35:03 +00:00
Roland Tannous
d56b115bb4 feat: multi-source model discovery (HF default, legacy cache, LM Studio) 2026-03-25 13:24:46 +00:00
Daniel Han
ae2b1b97ba
fix(studio): add pip-installed nvidia CUDA libs to LD_LIBRARY_PATH for llama-server (#4590)
The prebuilt llama.cpp binary (cuda13-newer) links against
libcudart.so.13 and libcublas.so.13. When torch is installed via pip,
these libraries live in the venv's site-packages under
nvidia/cu13/lib/, not in /usr/local/cuda/.

The existing LD_LIBRARY_PATH logic only searched /usr/local/cuda*
paths (which have CUDA 12.x), so the CUDA backend silently failed to
load and llama-server fell back to CPU, even with -ngl -1.

This adds a glob scan of the venv's nvidia package directories
(cu*, cudnn, nvjitlink) to LD_LIBRARY_PATH before launching
llama-server, matching where pip puts the CUDA runtime.
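
A minimal sketch of such a scan, assuming the layout <site-packages>/nvidia/<package>/lib mentioned above. The helper names nvidia_lib_dirs and prepend_ld_library_path are hypothetical and not taken from the Studio code.

```python
import glob
import os


def nvidia_lib_dirs(site_packages):
    """Collect lib dirs of pip-installed nvidia packages (cu*, cudnn, nvjitlink)."""
    patterns = ("cu*", "cudnn", "nvjitlink")
    found = []
    for pattern in patterns:
        matches = glob.glob(os.path.join(site_packages, "nvidia", pattern, "lib"))
        for path in sorted(matches):
            # "cu*" also matches "cudnn", so skip anything already collected.
            if os.path.isdir(path) and path not in found:
                found.append(path)
    return found


def prepend_ld_library_path(dirs, env):
    """Return a copy of env with dirs placed ahead of existing entries."""
    existing = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    merged = list(dirs) + [p for p in existing if p not in dirs]
    new_env = dict(env)
    if merged:
        new_env["LD_LIBRARY_PATH"] = os.pathsep.join(merged)
    return new_env
```

Putting the venv directories first means the pip-provided runtime is found before any system CUDA install with a different major version.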

Tested on Colab with RTX PRO 6000 Blackwell (CUDA 13.0, pip torch):
before -- 3 MiB GPU, 0% util, CPU inference
after  -- 13317 MiB GPU, 77% util, full GPU inference

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-25 06:24:40 -07:00
Daniel Han
d87c21aebf
fix(studio): add -ngl -1 when model fits on GPU to enable GPU offloading (#4588)
When _select_gpus determines that a GGUF model fits on the selected
GPU(s), the code sets CUDA_VISIBLE_DEVICES but never passes -ngl
(number of GPU layers) to llama-server. Without -ngl or --fit,
llama-server defaults to 0 GPU layers and runs entirely on CPU.

This adds -ngl -1 (offload all layers) in the elif branch where
gpu_indices is set and use_fit is False, so models that fit in VRAM
actually use the GPU for inference.

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-25 06:14:33 -07:00
DoubleMathew
f4d8a246bf
Use prebuilt llama.cpp for unsloth studio setup (#4562)
* Use prebuilt llama.cpp for unsloth studio setup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix 3 issues that cause unnecessary fallback to source build

1. Make filelock import optional -- environments without filelock
   (e.g. minimal installs) crashed at import time instead of
   gracefully skipping the lock.

2. Use already-verified converter script from the hydrated source
   tree instead of re-downloading from raw.githubusercontent.com
   with no checksum. Adds symlink with copy fallback for the
   legacy filename.

3. Initialize $SkipPrebuiltInstall in setup.ps1 before first use
   to prevent potential uninitialized variable errors.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Keep network fallback in ensure_converter_scripts

Prefer the local verified copy from the hydrated source tree, but
retain the original network download as a fallback if the file is
missing. Create the legacy hyphenated filename as a symlink with a
copy fallback instead of writing a second full copy.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix 4 bugs in source-build fallback and binary_env paths

- setup.ps1: Replace git pull + checkout FETCH_HEAD with fetch + checkout -B
  to avoid detached HEAD state that breaks re-runs. Use pinned tag in both
  fetch and clone paths.
- setup.sh: Move rm -rf after cmake/git prerequisite checks so a missing
  tool no longer deletes the existing install. Add --branch tag to clone.
- install_llama_prebuilt.py: Add binary_path.parent to Linux LD_LIBRARY_PATH
  in binary_env() so bundled .so files in build/bin are found even without
  RPATH, matching the existing Windows PATH logic.
- Add test for binary_env LD_LIBRARY_PATH on Linux.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Handle unresolved "latest" tag in source-build fallback clone

When tag resolution fails and the requested tag is "latest", both
setup scripts now omit --branch from git clone so the default branch
is cloned instead of failing on a nonexistent "latest" branch/tag.
Similarly, the PS1 fetch path fetches the default ref when the tag
is "latest".

* Resolve actual latest ggml-org tag instead of using literal "latest"

When both Python tag resolution attempts fail and the requested tag
is "latest", query the GitHub API for the actual latest release tag
from ggml-org/llama.cpp (e.g. b8508) instead of passing the literal
string "latest" to git clone --branch, which would fail since no
such branch/tag exists.

setup.sh uses curl + python json parsing; setup.ps1 uses
Invoke-RestMethod. Both fall back to the raw requested tag if the
API call also fails.

* Try Unsloth release repo before ggml-org when resolving latest tag

When falling back to the GitHub API to resolve "latest", query the
Unsloth release repo (unslothai/llama.cpp) first since it has the
prebuilt binaries pinned to tested tags. Only fall back to
ggml-org/llama.cpp if the Unsloth repo query fails.

* Add comprehensive sandbox tests for PR #4562 bug fixes

35 tests covering all fixes across platforms:
- binary_env cross-platform (Linux LD_LIBRARY_PATH, Windows PATH,
  macOS DYLD_LIBRARY_PATH) with edge cases (dedup, ordering, existing paths)
- resolve_requested_llama_tag (concrete, latest, None, empty)
- setup.sh logic via subprocess: prereq check ordering (cmake/git missing
  preserves install), pinned tag in clone, fetch+checkout -B pattern,
  fetch failure warns instead of aborting
- "latest" tag resolution fallback chain (Unsloth API -> ggml-org ->
  raw) with mock curl: success, failure, malformed JSON, empty body,
  empty tag_name, env overrides
- Source code pattern verification for both .sh and .ps1 files

All 138 tests pass in isolated uv venv.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add binary_path.parent to macOS DYLD_LIBRARY_PATH in binary_env

macOS prebuilt .dylib files are overlaid into build/bin (same as
Linux), but binary_env only added install_dir to DYLD_LIBRARY_PATH.
Add binary_path.parent so the loader can find sibling dylibs even
without embedded loader paths.

Mirrors the existing fix for Linux LD_LIBRARY_PATH and the Windows
PATH pattern.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard --branch when resolved tag is "latest"; fix broken test assertion

When all API fallbacks fail and the tag stays as literal "latest",
omit --branch from git clone (clones default branch instead of
failing). Both setup.sh and setup.ps1 now check for "latest" before
passing --branch to git clone/fetch.

Also fix test_setup_ps1_clone_uses_branch_tag, which used the form
assert "x", "y" in z. Python parses that as assert "x" with "y" in z as
the failure message, so it always passes. Changed to
assert "x" in z and "y" in z.
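
To illustrate why the old assertion could never fail: in an assert statement, everything after the comma is the failure message, not a second condition. The function names and sample strings below are made up for the demonstration.

```python
def buggy_check(text):
    # Parsed as: assert "--branch", ("tag" in text)
    # The condition is the non-empty string "--branch", which is always
    # truthy; the membership test is only evaluated as the message.
    assert "--branch", "tag" in text
    return True


def fixed_check(text):
    # Both substrings are now part of the condition being asserted.
    assert "--branch" in text and "tag" in text
    return True
```

Calling buggy_check on a string that contains neither substring still returns True, while fixed_check raises AssertionError for the same input.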

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix macOS DYLD trailing colon, install_lock no-op, and debug log

- binary_env macOS: use dedupe_existing_dirs instead of raw string
  concatenation. Eliminates trailing colon in DYLD_LIBRARY_PATH
  (which causes dyld to search CWD for libraries) and deduplicates
  when binary_path.parent == install_dir. Now consistent with the
  Linux and Windows branches.
- install_lock: when filelock is not installed, use os.O_CREAT|O_EXCL
  as a fallback exclusive file lock with timeout, instead of yielding
  with no locking. Prevents concurrent installs from corrupting each
  other's staging directories.
- setup.ps1: remove [DEBUG] log line that printed to every user on
  every Windows setup run.
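
The trailing-colon problem can be shown with a small sketch. This is not the repository's dedupe_existing_dirs; dedupe_dirs_sketch and build_dyld_path are hypothetical stand-ins that assume the helper drops empty entries, duplicates, and directories that do not exist.

```python
import os


def dedupe_dirs_sketch(paths):
    """Keep existing directories once each, in first-seen order."""
    seen = set()
    result = []
    for raw in paths:
        if not raw:
            continue  # empty entries would otherwise become stray colons
        path = os.path.abspath(raw)
        if path in seen or not os.path.isdir(path):
            continue
        seen.add(path)
        result.append(path)
    return result


def build_dyld_path(binary_dir, install_dir, existing=""):
    """Join the search dirs without leading, trailing, or doubled separators."""
    candidates = [binary_dir, install_dir] + existing.split(os.pathsep)
    return os.pathsep.join(dedupe_dirs_sketch(candidates))
```

With raw concatenation such as binary_dir + ":" + existing, an empty existing value leaves a trailing colon; joining a filtered list avoids that case entirely.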

* Add stale-lock detection and atomic clone-then-swap

install_lock fallback (no filelock): write PID to lock file and
check if the holder process is still alive on contention. Dead PIDs
(ProcessLookupError) and unreadable lock files trigger immediate
cleanup. Live processes owned by other users (PermissionError) are
correctly recognized as alive -- the lock is not removed.

setup.sh/setup.ps1 source-build: clone into a temporary directory
first, then swap into place only on success. If git clone fails,
the existing install is preserved instead of being deleted by the
premature rm -rf.

* Remove redundant upstream_tag != release_tag check

load_approved_release_checksums compared checksums.upstream_tag
against the Unsloth release_tag, which are different namespaces
(upstream ggml-org tag vs Unsloth published tag). This only worked
because both happened to be "b8508" by convention. Would break if
Unsloth ever uses a different release naming scheme.

The existing check at parse_approved_release_checksums (line 950)
already validates the release_tag field correctly.

* Fix lock TOCTOU race and build-in-temp-dir swap

install_lock fallback: add os.fsync(fd) after writing PID to ensure
the PID is visible to racing processes before they check. Treat
empty lock files (PID not yet written) as "wait and retry" instead
of stale, closing the window where two processes could both see an
empty file, both unlink it, and both acquire the lock.
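
A POSIX-only sketch of the fallback lock described in the last few commits. The names try_acquire, lock_state, and pid_is_alive are hypothetical; the real install_lock also handles timeouts and cleanup. Do not reuse the os.kill(pid, 0) probe on Windows, where os.kill has different semantics.

```python
import os


def pid_is_alive(pid):
    """Probe a PID with signal 0 (POSIX only): no signal is delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the holder is dead
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def try_acquire(lock_path):
    """Atomically create the lock file and record our PID in it."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    try:
        os.write(fd, str(os.getpid()).encode())
        os.fsync(fd)  # flush the PID to disk before releasing the descriptor
    finally:
        os.close(fd)
    return True


def lock_state(lock_path):
    """Classify a lock file as 'free', 'retry', 'held', or 'stale'."""
    try:
        with open(lock_path) as fh:
            text = fh.read().strip()
    except FileNotFoundError:
        return "free"
    except OSError:
        return "stale"
    if not text:
        return "retry"  # file created, PID not written yet: wait, do not unlink
    try:
        pid = int(text)
    except ValueError:
        return "stale"
    if pid <= 0:
        return "stale"
    return "held" if pid_is_alive(pid) else "stale"
```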

setup.sh/setup.ps1 source-build: clone AND build in a temp directory
(LLAMA_CPP_DIR.build.$$). Only swap into the final LLAMA_CPP_DIR
after the build succeeds. If clone or cmake or build fails, the temp
dir is cleaned up and the existing working install is preserved.
Previously, rm -rf ran after clone but before build, destroying the
existing install even if the build later failed.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 05:42:43 -07:00
Lee Jackson
cc1be75621
studio: stabilize reasoning panel scroll behavior and prevent composer overlap (#4587)
* fix(studio): reasoning panel scroll and thread footer overlap

* refactor(studio): dedupe reasoning scroll lock teardown
2026-03-25 05:32:31 -07:00
Roland Tannous
19e9c60a8e
Consolidate dual venvs and separate install from update (#4530)
* refactor: consolidate dual venvs into single ~/.unsloth/studio/unsloth_studio

* refactor: separate install.sh (first-time) from setup.sh (smart update with PyPI version check)

* fix: install.sh calls setup.sh directly, keep both setup and update CLI commands

* fix: use importlib.resources.files() directly without _path attribute

* fix: bootstrap uv before pip upgrade to handle uv venvs without pip

* fix: frontend 404 when launched via CLI, add global symlink to ~/.local/bin

* feat: add --local flag to install.sh and unsloth studio update for branch testing

* fix: resolve repo root from script location for --local installs

* feat: add --package flag to install.sh for testing with custom package names

* feat: add --package flag to unsloth studio update

* fix: always nuke venv in install.sh for clean installs

* revert: remove Windows changes, will handle in separate PR

* fix: error when --package is passed without an argument

* revert: restore Windows scripts to current main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: always explicitly set STUDIO_LOCAL_INSTALL and STUDIO_PACKAGE_NAME env vars

* fix: pass explicit STUDIO_LOCAL_REPO env var for --local installs

* fix: align banner box for Setup vs Update labels

* deprecate: hide 'unsloth studio setup' command, point users to update/install.sh

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: check stdout not stdin for auto-launch detection (curl pipe fix)

* fix: update install URL to unsloth.ai/install.sh

* fix: update install.sh usage comments to unsloth.ai/install.sh

* fix: use --upgrade-package for base deps to preserve existing torch/CUDA installs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: --local install now also installs unsloth-zoo via base.txt before editable overlay

* fix: don't skip base packages for --local installs (editable needs unsloth-zoo)

* refactor: move --local full dep install to install.sh, keep SKIP_STUDIO_BASE for all paths

* feat: add migration support for old .venv and CWD-based installs in setup.sh

* Revert "feat: add migration support for old .venv and CWD-based installs in setup.sh"

This reverts commit 301291d002.

* feat: migrate old .venv layout in install.sh instead of always nuking

* feat: validate old .venv with torch CUDA test before migration, recovery message on launch failure

* fix: try CUDA then fall back to CPU for migration validation

* fix: upgrade unsloth/unsloth-zoo with --reinstall-package on migration to preserve torch

* remove: delete unused unsloth ui command (use unsloth studio instead)

* Fix Windows venv path mismatch between install.ps1, setup.ps1, and studio.py

install.ps1 was creating the venv CWD-relative ($VenvName = "unsloth_studio"),
setup.ps1 was using an absolute path to ".unsloth\studio\.venv", and studio.py
looks for ".unsloth\studio\unsloth_studio". All three paths were different, so
the Windows installer would never produce a working Studio setup.

install.ps1:
- Use absolute $StudioHome + $VenvDir matching the Linux install.sh layout
- Add 3-way migration: old .venv at STUDIO_HOME, CWD-relative ~/unsloth_studio
  from the previous install.ps1, or fresh creation with torch validation
- For migrated envs, upgrade unsloth while preserving existing torch/CUDA wheels
- Set SKIP_STUDIO_BASE=1 before calling setup.ps1 (matches install.sh behavior)
- Fix launch instructions to use the absolute venv path

setup.ps1:
- Change $VenvDir from ".unsloth\studio\.venv" to ".unsloth\studio\unsloth_studio"
- Add SKIP_STUDIO_BASE guard: error out if venv is missing when called from
  install.ps1 (which should have already created it)
- Differentiate "Setup" vs "Update" in banners based on SKIP_STUDIO_BASE

* setup.ps1: unconditionally error if venv missing, matching setup.sh

setup.sh always errors out if the venv does not exist (lines 224-228),
telling the user to run install.sh first. setup.ps1 was conditionally
creating a bare venv with python -m venv when SKIP_STUDIO_BASE was not
set, which would produce an empty venv with no torch or unsloth. Now
setup.ps1 matches setup.sh: always error, always point to install.ps1.

* Fix --torch-backend=auto CPU solver dead-end on Linux, macOS, and Windows

On CPU-only machines, `uv pip install unsloth --torch-backend=auto`
falls back to unsloth==2024.8 because the CPU solver cannot satisfy
newer unsloth's dependencies. install.ps1 already solved this with a
two-step approach; this applies the same fix to install.sh and
install_python_stack.py.

install.sh: add get_torch_index_url() that detects GPU via nvidia-smi
and maps CUDA versions to PyTorch index URLs (matching install.ps1's
Get-TorchIndexUrl). Fresh installs now install torch first via explicit
--index-url, then install unsloth with --upgrade-package to preserve
the pre-installed torch. All 5 uses of --torch-backend=auto are removed
from primary paths.

install.ps1: add fallback else-branch when TorchIndexUrl is empty,
using --torch-backend=auto as last resort (matching install.sh).

install_python_stack.py: remove unconditional --torch-backend=auto
from _build_uv_cmd. Torch is pre-installed by install.sh/setup.ps1
by the time this runs. Callers that need it can set UV_TORCH_BACKEND.

Both install.sh and install.ps1 now share the same three-branch logic:
migrated env (upgrade-package only), normal (torch-first + index-url),
and fallback (--torch-backend=auto if URL detection fails).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use --reinstall-package for migrated envs on both Linux and Windows

For migrated environments (moved from legacy venv location),
--reinstall-package is better than --upgrade-package because it forces
a clean reinstall even if the same version is already installed. This
ensures proper .dist-info and .pyc state in the new venv location.

--upgrade-package remains correct for the fresh install path where
torch is already installed and we just want to add unsloth without
re-resolving torch.

* Address review findings: portability, parity, and stale comments

- Replace grep -oP (GNU Perl regex) with POSIX sed in
  get_torch_index_url() so the script works on BSD grep (macOS is
  already guarded by the Darwin early-return, but Alpine/BusyBox
  would silently get the wrong CUDA tag)
- Add LC_ALL=C before nvidia-smi invocation to prevent locale-dependent
  output parsing issues
- Add warning on stderr when nvidia-smi output is unparseable, matching
  install.ps1's [WARN] message
- Add explicit unsloth-zoo positional arg to install.ps1 migrated path,
  matching install.sh (--reinstall-package alone won't install it if it
  was never present in the migrated env)
- Fix stale comment in install_python_stack.py line 392 that still
  claimed --torch-backend=auto is added by _build_uv_cmd
- Add sed to test tools directory (function now uses sed instead of grep)

* Add --index-url to migrated env path to prevent CPU torch resolution

The migrated path runs uv pip install with --reinstall-package for
unsloth/unsloth-zoo. While uv should keep existing torch as satisfied,
the resolver could still re-resolve torch as a transitive dependency.
Without --index-url pointing at the correct CUDA wheel index, the
resolver would fall back to plain PyPI and potentially pull CPU-only
torch. Adding --index-url $TORCH_INDEX_URL ensures CUDA wheels are
available if the resolver needs them.

Applied to both install.sh and install.ps1.

* Revert --index-url on migrated env path

The original install.ps1 on main already handles the migrated path
without --index-url and it works correctly. --reinstall-package only
forces reinstall of the named packages while uv keeps existing torch
as satisfied. No need for the extra flag.

* Fix unsloth studio update --local not installing local checkout

studio.py sets STUDIO_LOCAL_REPO when --local is passed, but
install_python_stack.py never read it. The update path always
installed from PyPI regardless of the --local flag.

Add a local_repo branch that first updates deps from base.txt
(with --upgrade-package to preserve torch), then overlays the
local checkout as an editable install with --no-deps.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 05:24:21 -07:00
Daniel Han
3446e0c489
Add ROCm (AMD GPU) support to studio setup (#4585)
* Add support for ROCm in studio setup

* Fix ROCm detection bugs: ROCM_PATH resolution, CUDA guard, compiler selection

- Set GPU_BACKEND="cuda" when nvcc is found (CUDA path was unreachable)
- Guard ROCm detection with `if [ -z "$GPU_BACKEND" ]` so CUDA takes
  priority on mixed-toolchain hosts
- Rename ROCM_PATH to ROCM_HIPCC for the hipcc binary; resolve the
  actual ROCm root via readlink -f and hipconfig -R into ROCM_ROOT
- Export both ROCM_PATH and HIP_PATH as the resolved root directory
- Use HIPCXX via hipconfig -l instead of legacy CMAKE_C_COMPILER=hipcc
- Switch grep -oP to grep -oE for portability across Linux distros
- Use GPU_TARGETS (upstream cmake variable) instead of AMDGPU_TARGETS
- Remove stale hardcoded fallback targets; let cmake auto-detect instead

* Fix gfx regex to match gfx90a (MI210/MI250/MI250X)

The grep and bash regex used {3,4} digits after 'gfx', which silently
excluded gfx90a (2 digits + letter 'a') -- the architecture for AMD
Instinct MI210, MI250, and MI250X data-center GPUs. Change to {2,4}
so all real gfx targets from gfx90a through gfx1200 are matched.
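
The effect of the quantifier change can be checked with a small regex sketch. The exact pattern in setup.sh is not shown here, so the two patterns below are assumptions that differ only in the digit count and allow one optional trailing letter.

```python
import re

# Assumed shapes of the old and new patterns; only the quantifier differs.
OLD_GFX = re.compile(r"gfx[0-9]{3,4}[a-z]?")
NEW_GFX = re.compile(r"gfx[0-9]{2,4}[a-z]?")


def matches(pattern, target):
    """True when the whole target string is a gfx architecture name."""
    return pattern.fullmatch(target) is not None
```

Under these assumptions, matches(OLD_GFX, "gfx90a") is False because only two digits precede the letter, while NEW_GFX accepts it along with four-digit names such as gfx1200.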

---------

Co-authored-by: edamamez <eda.zhou@amd.com>
2026-03-25 04:50:23 -07:00
cz-03
7eb48512bc
feat(tokenizer): add get_tokenizer_info() diagnostic helper (#4436)
* feat(tokenizer): add get_tokenizer_info() diagnostic helper

Adds get_tokenizer_info(tokenizer) to tokenizer_utils.py, returning a concise dict of key tokenizer properties: class name, is_fast, vocab size, added token count, model_max_length, padding side, special tokens (bos, eos, pad, unk), chat template presence, and total special token count. All fields use getattr(..., None) fallbacks so the function never raises on unusual or partially initialized tokenizers. Exported via __all__ alongside the existing public helpers. Useful for logging, debugging, and surfacing tokenizer state in the Unsloth Studio UI.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix docstring, remove artifact, restore valuable comments in tokenizer_utils.py

- Fix get_tokenizer_info() docstring example: correct tokenizer_class to
  PreTrainedTokenizerFast, vocab_size to 128000, swap added_tokens_count (256)
  and special_tokens_count (3) to match actual Llama-3.2-1B-Instruct output
- Remove accidentally committed "# ... (rest of file unchanged)" diff artifact
- Restore fix_sentencepiece_gguf() docstring with llama.cpp upstream link
- Restore 10 comments containing upstream URLs, model-specific workarounds,
  and non-obvious context (issue #292, sentencepiece#121, Starling hack,
  Kaggle /tmp limit, Deepseek slow tokenizer, twitter/danielhanchen references)

* Revert "Fix docstring, remove artifact, restore valuable comments in tokenizer_utils.py"

This reverts commit 4e525b734b.

* Revert all deletions, keep only get_tokenizer_info() addition

Restore tokenizer_utils.py to main and add only the new
get_tokenizer_info() function and its __all__ entry.
All comment removals, dead code cleanup, and formatting
changes from the original PR are reverted.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 04:29:01 -07:00
Etherll
d69d60ff19
perf(studio): upgrade to Vite 8 + auto-install bun for faster frontend builds (#4522)
* perf(studio): upgrade to Vite 8 + auto-install bun for 3x faster frontend builds

* fix(studio): make bun-to-npm fallback actually reachable

setup.sh used run_quiet() for the bun install attempt, but run_quiet
calls exit on failure. This killed the script before the npm fallback
could run, making the "falling back to npm" branch dead code.

Replace the run_quiet call with a direct bun invocation that captures
output to a temp file (same pattern, but returns instead of exiting).

Also clean up partial node_modules left by a failed bun install before
falling back to npm, in both setup.sh and build.sh. Without this, npm
inherits a corrupted node_modules tree from the failed bun run.

* fix(studio): restore commonjsOptions for dagre CJS interop

The previous commit removed build.commonjsOptions, assuming Vite 8's
Rolldown handles CJS natively. While optimizeDeps.include covers the
dev server (pre-bundling), it does NOT apply to production builds.

The resolve.alias still points @dagrejs/dagre to its .cjs.js entry,
so without commonjsOptions the production bundle fails to resolve
the CJS default export. This causes "TypeError: e is not a function"
on /chat after build (while dev mode works fine).

Restore the original commonjsOptions block to fix production builds.

* fix(studio): use motion/react instead of legacy framer-motion import

* fix(studio): address PR review findings for Vite 8 + bun upgrade

Fixes:
  - Remove bun.lock from repo and add to .gitignore (npm is source of truth)
  - Use & bun install *> $null pattern in setup.ps1 for reliable $LASTEXITCODE
  - Add Remove-Item node_modules before npm fallback in setup.ps1
  - Print bun install failure log in setup.sh before discarding
  - Add Refresh-Environment after npm install -g bun in setup.ps1
  - Tighten Node version check to ^20.19.0 || >=22.12.0 (Vite 8 requirement)
  - Add engines field to package.json
  - Use string comparison for _install_ok in build.sh
  - Remove explicit framer-motion ^11.18.2 from package.json (motion pulls
    framer-motion ^12.38.0 as its own dependency — the old pin caused a
    version conflict)
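
For reference, the range ^20.19.0 || >=22.12.0 means "20.19.0 or later within the 20.x line, or anything from 22.12.0 upward". A minimal sketch of that check follows; the function names are hypothetical, and plain MAJOR.MINOR.PATCH strings (as printed by node --version) are assumed, with no prerelease suffixes.

```python
def parse_version(text):
    """Parse 'v20.19.0' or '20.19' into a (major, minor, patch) tuple."""
    parts = text.strip().lstrip("v").split(".")
    parts = (parts + ["0", "0"])[:3]  # pad missing minor/patch with zeros
    return tuple(int(p) for p in parts)


def node_ok_for_vite8(version_text):
    """Check a Node version against ^20.19.0 || >=22.12.0."""
    major, minor, patch = parse_version(version_text)
    in_20_line = major == 20 and (minor, patch) >= (19, 0)  # ^20.19.0
    in_22_plus = (major, minor, patch) >= (22, 12, 0)  # >=22.12.0
    return in_20_line or in_22_plus
```

Note that Node 21.x and 22.0 through 22.11 fall outside both halves of the range.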

* Fix Colab Node bypass and bun.lock stale-build trigger

Gate the Colab Node shortcut on NODE_OK=true so Colab
environments with a Node version too old for Vite 8 fall
through to the nvm install path instead of silently proceeding.

Exclude bun.lock from the stale-build probe in both setup.sh
and setup.ps1 so it does not force unnecessary frontend rebuilds
on every run.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Shine1i <wasimysdev@gmail.com>
2026-03-25 04:27:41 -07:00
Daniel Han
be2cd7087a
Add macOS and Linux desktop shortcuts to install.sh (#4568)
* Add macOS and Linux desktop shortcuts to install.sh

Adds create_studio_shortcuts() function that creates platform-native
shortcuts after `unsloth studio setup` completes, mirroring the Windows
shortcut behavior from PR #4558.

Linux: .desktop file in ~/.local/share/applications/ and ~/Desktop/
macOS: .app bundle in ~/Applications/ with Info.plist, exec stub, and
       optional .icns icon built from unsloth-gem.png via sips+iconutil

Both platforms share a Bash launcher script at
~/.local/share/unsloth/launch-studio.sh that provides:
- Health check with service fingerprint verification
- Port scanning (8888-8908) via ss/lsof
- PID-file single-instance guard (no flock dependency)
- Terminal spawning (macOS: Terminal.app; Linux: gnome-terminal etc.)
- Browser open after health poll with 60s timeout

WSL is skipped (no native desktop environment).

* Fix 6 issues found by 10 parallel reviewers

1. [10/10] Health check now supports wget as fallback to curl via
   _http_get() helper, matching the installer's own download() pattern.
   Previously wget-only systems would time out on every launch.

2. [9/10] Exe path substitution now escapes sed metacharacters (&, \, |)
   and shell single-quotes before injection, preventing launcher
   corruption for paths like /opt/R&D/bin/unsloth.

3. [4/10] Linux .desktop Exec= field now quotes the launcher path,
   fixing launches from home directories containing spaces.

4. [3/10] macOS AppleScript command now escapes backslashes and
   double-quotes before interpolation into do script "...", fixing
   Terminal.app launch failures.

5. [3/10] Single-instance guard now uses atomic mkdir instead of
   racy check-then-write PID file, preventing duplicate concurrent
   launches on rapid double-click.

6. [1/10] Launcher now scans for a free port via _find_launch_port()
   instead of always hardcoding -p 8888, so Studio starts correctly
   when another service already occupies port 8888.
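
The atomic mkdir guard from item 5 is implemented in Bash in the launcher; the same idea in Python looks roughly like this. The function names and the lock directory are hypothetical.

```python
import os


def acquire_instance_lock(lock_dir):
    """Try to become the single running instance.

    os.mkdir either creates the directory or raises FileExistsError in one
    step, so two racing launchers cannot both succeed. A check-then-write
    PID file has a gap between the check and the write.
    """
    try:
        os.mkdir(lock_dir)
    except FileExistsError:
        return False
    return True


def release_instance_lock(lock_dir):
    """Remove the lock directory; ignore it if already gone."""
    try:
        os.rmdir(lock_dir)
    except FileNotFoundError:
        pass
```

A launcher would call acquire_instance_lock first and exit early on False, then release the lock once startup is complete or on exit.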

Also fixed: `open` command on Linux (openvt) no longer incorrectly
triggers the macOS browser-open path -- now gated on uname=Darwin.

* Fix mktemp guard and exe path escaping from PR review comments

Two real issues identified from automated review comments:

1. Guard mktemp -d failure in macOS icns generation. If mktemp -d
   returned empty, dirname would resolve to / and rm -rf would attempt
   to delete the root directory. Now checks that the temp dir was
   actually created before proceeding.

2. Replace sed-based exe path substitution with a conf file approach.
   The previous sed escaping broke paths containing apostrophes
   (e.g. /home/O'Connor/) because the '\'' escape introduced
   backslashes that were then double-escaped by the metacharacter
   pass. Now writes UNSLOTH_EXE to a separate studio.conf file that
   the launcher sources at runtime, eliminating all sed metacharacter
   and shell quoting interaction issues.

   This also addresses the sed -i.bak portability concern (now moot
   since sed is no longer used on the launcher file).

* Fix unbound variable crash and per-user lock in launcher

- Use ${UNSLOTH_EXE:-} so set -u does not crash before the friendly
  error message when studio.conf is missing or empty.
- Append $(id -u) to the fallback lock path so each user gets their
  own lock directory when XDG_RUNTIME_DIR is unset.

* Mark desktop shortcut as trusted for GNOME/Nautilus

On modern GNOME desktops, chmod +x alone is not sufficient to make
a .desktop file launchable by double-click on ~/Desktop. Nautilus
requires the metadata::trusted attribute to be set via gio, otherwise
it shows a warning dialog instead of launching the application.
2026-03-25 03:37:37 -07:00
Daniel Han
6872c6e850
Remove advanced CodeQL workflow in favor of default setup (#4584)
The repo has both the CodeQL "default setup" (configured in repo
settings) and this advanced workflow file enabled. GitHub does not
allow both simultaneously, causing all PR CI runs to fail with:

  "CodeQL analyses from advanced configurations cannot be processed
   when the default setup is enabled"

Since the default setup already covers the same languages (Python,
JavaScript/TypeScript) with the same build-mode (none), remove the
redundant advanced workflow file.
2026-03-25 03:34:21 -07:00
dependabot[bot]
38405cc18c
build(deps): bump oxc-parser (#4571)
Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser).


Updates `oxc-parser` from 0.116.0 to 0.121.0
- [Release notes](https://github.com/oxc-project/oxc/releases)
- [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md)
- [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.121.0/napi/parser)

---
updated-dependencies:
- dependency-name: oxc-parser
  dependency-version: 0.121.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: npm-oxc-validator
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-25 02:44:38 -07:00
dependabot[bot]
f294161e26
build(deps): bump the actions group with 2 updates (#4570)
Bumps the actions group with 2 updates: [actions/checkout](https://github.com/actions/checkout) and [github/codeql-action](https://github.com/github/codeql-action).


Updates `actions/checkout` from 4 to 6
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

Updates `github/codeql-action` from 3 to 4
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
- dependency-name: github/codeql-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-25 02:44:22 -07:00
Pete Kloehn
efedbe9740
Feature/add dependabot and codeql security checks (#4479)
* Add CodeQL analysis workflow configuration

* Add Dependabot configuration for package updates

Configure Dependabot to check for updates in various ecosystems weekly.

* Fix dependabot.yml: bun ecosystem, missing dir, grouping for PR #4479

1. studio/frontend uses bun.lock not package-lock.json, so change npm to bun
2. Add missing studio/backend/requirements/ pip entry (consumed by studio/setup.sh)
3. Add groups with patterns ["*"] to all pip/bun/npm entries to batch updates
   and avoid 30+ individual Dependabot PRs on the first run

* Consolidate pip blocks to fix overlapping directory violation

GitHub Dependabot forbids multiple same-ecosystem entries with
overlapping directories on the same branch. The root "/" directory
overlapped the 3 nested pip dirs. Merge all 4 pip blocks into one
using the `directories:` (plural) key.

Also remove redundant open-pull-requests-limit from the bun block
since grouping with patterns: ["*"] already limits PR count.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-25 02:41:33 -07:00
Datta Nimmaturi
04359be333
[Studio] Try installing causal-conv1d from prebuilt wheels if available (#4547)
* Try installing causal-conv1d from prebuilt wheels if available

* Prefer installing mamba-ssm from wheel to speed up things

* undo python stack install changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "undo python stack install changes"

This reverts commit d943551092.

* add comments

* Fix wheel installer: model detection, platform tags, torch pin, error handling

- Add nemotron-h (hyphen) and granite-4.0-h / granitemoehybrid to model
  detection for both causal-conv1d and mamba-ssm. These hybrid Mamba models
  were silently skipped since nemotron_h (underscore) never matches real
  HF model IDs like nvidia/Nemotron-H-8B-Base, and granite was missing
  entirely despite being a supported model in model_config.py and loader.py.
- Fix _causal_conv1d_platform_tag to detect linux_aarch64 via
  platform.machine() instead of hardcoding linux_x86_64. Both upstream
  releases publish aarch64 wheels. Drop win_amd64 since neither repo
  publishes Windows wheels (avoids a wasted HTTP probe on every run).
- Pin torch to >=2.6.0,<2.11.0 instead of <=2.10.0 to add a version floor
  and document the wheel coverage range with upstream release links.
- Strip non-numeric suffixes from torch minor version so nightly builds
  like 2.7a0 correctly resolve to wheel tag torch2.7 instead of torch2.7a0.
- Use stderr=_sp.PIPE instead of stderr=_sp.STDOUT in the env probe so
  torch import warnings do not corrupt the JSON output.
- Add timeout=30 to the env probe subprocess to prevent indefinite hangs.
- Catch Exception (not just ImportError) on the existing-install check so
  ABI-broken installs with OSError/RuntimeError are retried rather than
  silently accepted.
- Guard uv invocation with shutil.which("uv") to prevent FileNotFoundError
  crash when uv is not on PATH. Wrap the top-level ensure calls in
  try/except so failures do not kill the training worker.
- Hoist _SSM_MODEL_SUBSTRINGS to module level.
- Remove redundant --torch-backend=auto flag from direct wheel URL install.
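
The version and platform handling described above can be sketched as follows. This is a minimal illustration, not the installer's actual code: the function names `torch_wheel_tag` and `linux_platform_tag` are hypothetical, and the exact tag strings are assumed from the examples in this message (`torch2.7`, `linux_x86_64`, `linux_aarch64`).

```python
import platform
import re
from typing import Optional


def torch_wheel_tag(torch_version: str) -> str:
    """Build a tag like 'torch2.7' from a version such as '2.7a0' (hypothetical helper)."""
    parts = torch_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected torch version: {torch_version!r}")
    # Keep only the leading digits of the minor component ("7a0" -> "7").
    minor = re.match(r"\d+", parts[1])
    if minor is None:
        raise ValueError(f"unexpected torch version: {torch_version!r}")
    return f"torch{parts[0]}.{minor.group(0)}"


def linux_platform_tag(machine: Optional[str] = None) -> Optional[str]:
    """Map a machine name to a Linux wheel tag, or None when no wheel is expected."""
    machine = (machine or platform.machine()).lower()
    if machine in ("x86_64", "amd64"):
        return "linux_x86_64"
    if machine in ("aarch64", "arm64"):
        return "linux_aarch64"
    return None  # caller falls back to a regular install
```

Returning `None` for unknown machines lets the caller skip the wheel probe instead of requesting a URL that cannot exist.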

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add LFM2 to causal-conv1d detection; stop training on install failure

- Add "lfm2" to _model_wants_causal_conv1d so Studio picks up the
  fast kernel path for Liquid Foundation Model 2.
- Replace silent logger.warning on SSM dependency install failure
  with an error event that tells the user to choose another model
  and stops the training job immediately.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Catch subprocess timeout in torch probe; narrow import guard to ImportError

- _probe_causal_conv1d_env: wrap subprocess.run in try/except for
  TimeoutExpired so a slow torch import returns None (falls back to
  PyPI) instead of killing the training job.
- _install_package_wheel_first: narrow except Exception to except
  ImportError on the __import__ check so unexpected errors from a
  broken module still propagate.
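
A rough sketch of the probe behaviour described above, assuming the probe prints JSON on stdout. `probe_env` is a hypothetical name and signature; only the standard-library `subprocess` calls are real.

```python
import json
import subprocess
import sys
from typing import Optional


def probe_env(code: str, timeout: float = 30.0) -> Optional[dict]:
    """Run `code` in a child interpreter and parse its stdout as JSON.

    Returns None on timeout, non-zero exit, or unparsable output so the
    caller can fall back to another install path instead of crashing.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,  # keep warnings out of the JSON on stdout
            timeout=timeout,
            text=True,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError:
        return None
```

Capturing stderr separately (rather than merging it into stdout) is what keeps import-time warnings from corrupting the JSON payload.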

* Remove unconditional torch pin from install_python_stack

The torch>=2.6.0,<2.11.0 pin was added to ensure prebuilt
causal-conv1d / mamba-ssm wheels exist, but it runs at install
time for all users regardless of model choice. This can downgrade
or unnecessarily upgrade torch. The worker already handles wheel
compatibility at training time by probing the environment and
falling back to PyPI, so the install-time pin is not needed.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 02:22:26 -07:00
Wasim Yousef Said
926e74509d
feat(chat): cleaner tool UI, inline LaTeX, clickable links (#4561)
* feat(chat): ghost-style tool containers

Remove borders and card styling from tool call UI. ToolFallback
uses minimal padding with indented content. ToolGroup defaults
to ghost variant with subtle background for multi-tool grouping.

* feat(chat): compact web search source pills

Switch sources from vertical full-width badges to horizontal
wrapping pills with smaller icons.

* feat(chat): left-accent code and terminal tool UI

Replace bordered card layout with a left border accent for
Python and Terminal tool output. Add timer cleanup on unmount
for the copy button in both components.

* feat(chat): inline latex and clickable links

Enable single-dollar $...$ math rendering via createMathPlugin.
Add styled link component with target=_blank for external links.

* fix(chat): inline generating indicator, static tailwind classes, misc fixes

Move generating indicator from viewport footer into assistant
message using AnimatedShinyText shimmer. Only shows when message
content is empty, hides once tool calls or text appear.

Use static size class map in SourceIcon for Tailwind v4 compat.
Use unique keys for web search sources. Remove px-3 from ghost
tool group variant.

* fix(chat): only show generating indicator while message is running

Hide the shimmer when message is cancelled or errored with no
content, preventing stale loading UI on empty completed messages.

* fix: escape currency dollar signs in LaTeX math rendering and fix TS build error

- Add preprocessLaTeX() in lib/latex.ts to escape currency patterns ($5, $1,000, $5.99, $100K)
  before they reach the math parser, preventing false positives when singleDollarTextMath is enabled.
  Code blocks and already-escaped dollars are left untouched.
- Use preprocessLaTeX via useMemo in markdown-text.tsx so Streamdown receives clean input.
- Fix TS18048 in thread.tsx: message.status?.type (optional chaining) since status can be undefined.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 02:06:03 -07:00
Daniel Han
3998f67680
Bump Data Designer to 0.5.4 (removes litellm dependency) (#4569)
* Bump Data Designer to 0.5.4 (removes litellm dependency)

NVIDIA Data Designer v0.5.4 removes litellm entirely and replaces it
with native OpenAI and Anthropic adapters. This follows the litellm
supply chain incident where versions 1.82.7 and 1.82.8 were compromised
with a credential stealer.

Release notes: https://github.com/NVIDIA-NeMo/DataDesigner/releases/tag/v0.5.4

Changes:
- Bump data-designer, data-designer-config, data-designer-engine to 0.5.4
- Sync data-designer-deps.txt with 0.5.4 engine requirements:
  - Added: chardet, fsspec, mcp
  - Removed: python-json-logger, pymupdf, pymupdf4llm, mammoth
    (these remain in the unstructured-seed plugin which still needs them)
  - duckdb constraint relaxed from <1.5 to <2 (upstream fixed record_batch)
- Bump plugin lower bound to >=0.5.4

* Keep pymupdf, pymupdf4llm, mammoth in data-designer-deps

The unstructured-seed plugin is installed with --no-deps, so its
pyproject.toml dependencies are not auto-resolved. These three
packages are needed by the seed route (studio/backend/routes/
data_recipe/seed.py) and must remain in the explicit deps list.
2026-03-25 02:01:43 -07:00
Avaya Aggarwal
45d0a343b5
feat: Implement Q-GaLore optimizer and custom embedding learning rate… (#4511)
* feat: Implement Q-GaLore optimizer and custom embedding learning rate in the Unsloth trainer.

* feat: Implement QGaLoreAdamW8bit optimizer with 8-bit states, GaLore low-rank gradient projection, and optional INT8 weight quantization, along with supporting projector and tests.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* feat: Introduce Q-GaLore AdamW optimizer with low-rank quantized gradient projection and integrate into the trainer, along with dedicated tests.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* feat: Implement Q-GaLore AdamW optimizer with gradient projection and quantization, including trainer integration and corresponding tests.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix 3 bugs in Q-GaLore optimizer and add weight_quant forward hooks

1. Fix use-after-delete crash: move `del p._saved_data` after the
   weight decay block so decoupled weight decay can reference the
   current weights correctly (p.data).

2. Fix substring matching in make_q_galore_param_groups: split
   parameter names on "." and check exact component matches to
   prevent false positives (e.g. "not_q_proj" matching "q_proj").

3. Implement forward pre-hooks for weight_quant: after the optimizer
   quantizes weights to INT8, replace p.data with a 1-element
   placeholder to free float memory. A register_forward_pre_hook
   dequantizes back to float before each forward pass. The trainer
   calls install_weight_quant_hooks() when weight_quant is enabled.

4. Update test_weight_decay_uses_saved_data to match the fixed code
   path (decoupled decay uses p.data, expected value 2.7). Add
   test_weight_quant_hook_restores_float to verify the INT8-to-float
   hook round-trip.
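
The exact-component matching from item 2 can be illustrated with a short sketch. `name_matches_target` is a hypothetical helper, not the function in `make_q_galore_param_groups`.

```python
from typing import Iterable


def name_matches_target(param_name: str, targets: Iterable[str]) -> bool:
    """True when any target equals a whole dotted component of param_name."""
    components = param_name.split(".")
    return any(target in components for target in targets)


# A plain substring test would wrongly accept the second name.
good = "model.layers.0.self_attn.q_proj.weight"
bad = "model.layers.0.not_q_proj.weight"
print(name_matches_target(good, ["q_proj"]), name_matches_target(bad, ["q_proj"]))
```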

All 24/24 Q-GaLore tests pass. Benchmarked on Llama-3.2-1B-Instruct
FFT: Q-GaLore saves 32% VRAM (10.63 -> 7.24 GB) with better loss
convergence (1.3 vs 2.0 at step 100). No regressions in 31-notebook
sweep across Llama, Qwen, Mistral, Phi, Gemma, vision, and GRPO.

* Default weight_quant to False in QGaloreConfig

Benchmarks show weight_quant=True adds ~1 GB on Llama-3.2-1B due to
INT8 copy/scale overhead exceeding savings from the placeholder trick.
Users can still opt in explicitly. The optimizer logic is unchanged.

* Optimize Q-GaLore projector and optimizer step performance

Projector (q_galore_projector.py):
- Use torch.svd_lowrank with oversampling p=10 (Halko et al. 2009) instead
  of full SVD for large matrices. Falls back to full SVD when min(m,n) <= 2*rank.
  SVD steps are 6-8x faster on Llama-3.2-1B (22s -> 3s for first step).
- Cache the dequantized ortho matrix between project() and project_back() to
  avoid redundant dequantization when quant=True.
- Replace F.cosine_similarity with torch.dot for 1-D unit vectors in the
  adaptive schedule. Remove unused torch.nn.functional import.
- Use collections.deque(maxlen=queue_size) instead of list with manual pop(0).

Optimizer (q_galore_adamw.py):
- Remove redundant .clone() on dequantized weights (line 151) and on float
  data before re-quantization (line 211). _dequantize already returns a fresh
  tensor and _quantize/_quantize_stochastic only reads its input.
- Consolidate per-group torch.cuda.synchronize() into a single call after
  all param groups complete.
- Use torch.empty instead of torch.zeros for the scalar placeholder tensor
  that is never read.

Verified: 24/24 unit tests pass. Llama-3.2-1B 61-step training produces
losses within 0.24% relative diff (correlation >0.9999) of the original.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 01:03:10 -07:00
Krishna Chaitanya
11606c5025
fix: remove auto wandb.finish() after train() to allow post-training evaluate() (#4564)
* fix: remove auto wandb.finish() after train() to allow post-training evaluate()

The prepare_for_training_mode wrapper unconditionally called wandb.finish()
after trainer.train() completed. This terminated the active W&B run, causing
trainer.evaluate() to fail with "You must call wandb.init() before wandb.log()".

Users who need multiple training runs in one session can call wandb.finish()
manually between runs to avoid data overwriting.

Fixes #3954

* fix: defer wandb.finish() to next train() call instead of removing it

Instead of calling wandb.finish() at the end of train() (which breaks
evaluate/log) or removing it entirely (which causes data overwriting on
multiple train() calls), defer it to the start of the next train() call.

This way:
- train() + evaluate() works (run stays open after train)
- train() + train() gets separate W&B runs (previous run finished first)
- train() + evaluate() + train() also works correctly

Also resets HF's WandbCallback._initialized flag so it re-calls
wandb.init() for the new run.
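
The deferral can be pictured with a small state holder. This is a sketch under assumptions: `DeferredRunCloser` and its method names are hypothetical, and the `finish_run` callback stands in for whatever the real wrapper does (per the message above, calling wandb.finish() and resetting the callback's initialized flag).

```python
from typing import Callable


class DeferredRunCloser:
    """Finish the previous run at the start of the next train() call."""

    def __init__(self, finish_run: Callable[[], None]) -> None:
        self._finish_run = finish_run
        self._run_open = False

    def before_train(self) -> None:
        # A run left open by an earlier train() is closed here, so the
        # new train() call gets a fresh run.
        if self._run_open:
            self._finish_run()
            self._run_open = False

    def after_train(self) -> None:
        # Leave the run open so evaluate() can still log to it.
        self._run_open = True
```

With this shape, train() followed by evaluate() never triggers the callback, while a second train() triggers it exactly once.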

Fixes #3954

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 01:00:12 -07:00
Wasim Yousef Said
208862218d
feat(studio): training history persistence and past runs viewer (#4501)
* feat(db): add SQLite storage layer for training history

* feat(api): add training history endpoints and response models

* feat(training): integrate DB persistence into training event loop

* feat(ui): add training history views and card grid

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(studio): address review issues in training history persistence

- Strip hf_token/wandb_token from config before SQLite storage
- Add UUID suffix to job_id for collision resistance
- Use isfinite() for 0.0 metric handling throughout
- Respect _should_stop in error event finalization
- Run schema DDL once per process, not per connection
- Close connection on schema init failure
- Guard cleanup_orphaned_runs at startup
- Cap _metric_buffer at 500 entries
- Make FLUSH_THRESHOLD a class constant
- Map 'running' to 'training' phase in historical view
- Derive LR/GradNorm from history arrays in historical view
- Fix nested button with div[role=button] in history cards
- Guard String(value) against null/undefined in config popover
- Clear selectedHistoryRunId on auto tab switch
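
Two of the items above (token stripping and the UUID suffix) lend themselves to a short sketch. The helper names and the job-id format are assumptions made for illustration; only the key names `hf_token` and `wandb_token` come from this message.

```python
import uuid
from typing import Any, Dict

SENSITIVE_KEYS = frozenset({"hf_token", "wandb_token"})


def strip_secrets(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config without credential fields (input is not mutated)."""
    return {key: value for key, value in config.items() if key not in SENSITIVE_KEYS}


def make_job_id(prefix: str = "job") -> str:
    """Append a short random suffix so two jobs never share an id by accident."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"
```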

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(studio): address round-2 review findings across training backend and frontend

Backend (training.py):
- Move state mutation after proc.start() so a failed spawn does not wedge
  the backend with is_training=True
- Create DB run row eagerly after proc.start() so runs appear in history
  during model loading, not after first metric event
- Rewrite _flush_metrics_to_db() with snapshot-before-insert pattern to
  preserve metrics arriving during the write and retain buffer on failure
- Guard eval_loss with float() coercion and math.isfinite(), matching the
  existing grad_norm guard
- Increase pump thread join timeout from 3s to 8s to cover SQLite's
  default 5s lock timeout
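
The snapshot-before-insert idea for the metric flush might look like the sketch below. It assumes an append-only buffer and a single flushing thread; `MetricBuffer` and `write_rows` are hypothetical names, not the code in training.py.

```python
import threading
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class MetricBuffer:
    def __init__(self, write_rows: Callable[[List[Row]], None]) -> None:
        self._write_rows = write_rows
        self._lock = threading.Lock()
        self._buffer: List[Row] = []

    def add(self, row: Row) -> None:
        with self._lock:
            self._buffer.append(row)

    def pending(self) -> List[Row]:
        with self._lock:
            return list(self._buffer)

    def flush(self) -> bool:
        # Snapshot under the lock, then write without holding it.
        with self._lock:
            snapshot = list(self._buffer)
        if not snapshot:
            return True
        try:
            self._write_rows(snapshot)
        except Exception:
            return False  # buffer untouched, so the rows are retried later
        with self._lock:
            # Drop only what was written; rows added during the write remain.
            del self._buffer[: len(snapshot)]
        return True
```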

Frontend (studio-page.tsx):
- Fix history navigation: check isTrainingRunning instead of
  showTrainingView in onSelectRun so completed runs are not misrouted
- Replace activeTab state + auto-switch useEffect with derived tab to
  eliminate react-hooks/set-state-in-effect lint violation

Frontend (historical-training-view.tsx):
- Add explicit "running" branch to message ternary so running runs no
  longer fall through to "Training errored"
- Derive loading from detail/error state and move cleanup to effect
  return to eliminate react-hooks/set-state-in-effect lint violation

Frontend (progress-section.tsx):
- Derive stopRequested from isTrainingRunning && stopRequestedLocal to
  eliminate react-hooks/set-state-in-effect lint violation and remove
  unused useEffect import

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(studio): resolve 3 remaining bugs from round-2 review

1. Stuck on Current Run tab [12/20]: Only force "current-run" tab when
   isTrainingRunning is true, not when stale completed-run data exists.
   After training ends, users can freely navigate to Configure.

2. Incomplete metric sanitization [7/20]: Apply float() coercion and
   isfinite() guards to loss and learning_rate, matching the existing
   pattern used by grad_norm and eval_loss. Prevents TypeError from
   string values and NaN leaks into history arrays.

3. Stop button state leak across runs [10/20]: Add key={runtime.jobId}
   to ProgressSection so React remounts it when a new run starts,
   resetting stopRequestedLocal state.
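
The coercion-plus-finiteness guard from item 2 can be sketched as a single helper. `safe_float` is a hypothetical name; the real handler may structure this differently.

```python
import math
from typing import Any, Optional


def safe_float(value: Any) -> Optional[float]:
    """Coerce value to a finite float, or return None if that is not possible."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None  # e.g. None or a non-numeric string
    return number if math.isfinite(number) else None
```

Note that a legitimate `0.0` survives this guard, which is why a separate check is still needed wherever zero has a special meaning.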

* fix(studio): deduplicate loss/lr sanitization in training event handler

Reuse _safe_loss/_safe_lr from the progress update block instead of
re-sanitizing the same raw event values for metric history.

* fix(studio): restore loss > 0 guard to prevent eval steps injecting 0.0 into metric histories

Round-2/3 fixes relaxed the history append guard from `loss > 0` to
`loss is not None`, which let eval-only log events (where loss defaults
to 0.0) append fake zeros into loss_history and lr_history. Restore the
`loss > 0` check to match the worker's own has_train_loss gate. The
float() coercion and isfinite() sanitization from round-3 remain intact.

* fix(studio): resolve training history bugs — nullable loss/lr, tab nav, sparkline

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 00:58:55 -07:00
Daniel Han
3108750bb0
Remove duplicate frontend assets from wheel to reduce package size (#4567)
The wheel currently ships frontend/public/, frontend/src/, and
frontend/*.lock alongside frontend/dist/. These are build-time inputs
that Vite already copies into dist/ during the build step:

- public/ is copied verbatim into dist/ by vite build (28.6 MB duplicate)
- src/ is TSX source compiled into dist/assets/*.js (2.1 MB, not used at runtime)
- *.lock files are package manager lockfiles (0.9 MB, not used at runtime)

The backend only serves from frontend/dist/ (see main.py setup_frontend
and run.py frontend_path). Nothing references public/ or src/ at runtime.

This drops the wheel from ~62.7 MB to ~31 MB.
2026-03-24 23:48:49 -07:00
Lee Jackson
557743f027
studio: windows desktop shortcut launcher (#4558)
* feat(windows): add Studio desktop/Start shortcuts with health-check launcher

* chore(windows): bundle sloth.ico and set shortcut icons when valid

* chore(windows): add images/sloth.ico

* fix(windows): guard PSScriptRoot for Studio shortcut icon in iex installs

* fix(install): high-DPI sloth.ico and relocate to studio/frontend/publi

* chore(studio): update sloth.ico for clearer desktop and shell icons

* chore(studio): use unsloth.ico for Studio shortcut icon

* feat(windows): improve Studio shortcut launcher (fast health + browser UX)

* fix(windows): stable unsloth.ico URL and Unicode-safe Studio launcher scripts

* fix(windows): escape $ in exe path and write launcher UTF-8 with BOM

* fix(windows): skip shortcuts when Desktop or APPDATA paths are missing

* fix(install): log shortcut/icon/port failures and warn early on missing paths

* fix(install): guard missing LOCALAPPDATA before shortcut paths

* fix(install): harden New-StudioShortcuts and improve success messaging

* fix(install): include port 8908 in studio health check

* fix(install): fix launch-studio.ps1 quoting

* Fix launcher edge cases and normalize indentation in install.ps1

- Handle silent timeout: show a message when Studio is still starting
  but did not become healthy within the timeout, instead of exiting
  with no feedback
- Add -NoProfile to the visible PowerShell terminal launch so the
  user profile cannot hang or error before Studio runs
- Add a named mutex (Local\UnslothStudioLauncher) to prevent
  double-click from spawning duplicate terminals; second instance
  polls for health and opens the browser when ready
- Normalize indentation inside New-StudioShortcuts outer try block
  from mixed 8/12-space to consistent 12-space

* Simplify Get-CandidatePorts port dedup with Sort-Object -Unique

Replace the foreach/-notcontains loop with a single pipeline:
  $ports = (@($basePort) + $listening) | Sort-Object -Unique

* Harden health probe and handle abandoned mutex in launcher

- Test-StudioHealth now checks resp.service == 'Unsloth UI Backend' to
  avoid fingerprinting collisions with other local services on the same
  port range.
- Wrap the mutex WaitOne(0) call in a try/catch for
  AbandonedMutexException so the launcher recovers gracefully when a
  previous instance was killed while holding the mutex.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-24 23:41:02 -07:00
Krishna Chaitanya
9b989ee898
fix: prevent UnicodeEncodeError on Windows CP1252 consoles in studio setup (#4563)
* fix: prevent UnicodeEncodeError on Windows CP1252 consoles in studio setup

On Windows, `unsloth studio setup` crashes with a UnicodeEncodeError
when install_python_stack.py tries to print Unicode status glyphs
(, , ⚠️) to a console that uses a legacy code page like CP1252.

Add a _safe_print() helper that catches UnicodeEncodeError and
gracefully degrades emoji to ASCII equivalents ([OK], [FAIL], [!]).
Replace all print() calls that emit Unicode glyphs with _safe_print().
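
A sketch of the degradation idea, written as a variant that checks encodability up front rather than catching the error raised by print(). The glyph table is an assumption (check mark, cross mark, warning sign), and `to_console_safe` / `safe_print` are hypothetical names rather than the helper added in this commit.

```python
import sys

# Assumed glyph-to-ASCII table; the variation-selector form is listed first
# so it is replaced before the bare warning sign.
_ASCII_FALLBACKS = {
    "\u2705": "[OK]",
    "\u274c": "[FAIL]",
    "\u26a0\ufe0f": "[!]",
    "\u26a0": "[!]",
}


def to_console_safe(text: str, encoding: str) -> str:
    """Return text unchanged if it encodes cleanly, else an ASCII-degraded copy."""
    try:
        text.encode(encoding)
        return text
    except UnicodeEncodeError:
        pass
    for glyph, fallback in _ASCII_FALLBACKS.items():
        text = text.replace(glyph, fallback)
    # Anything still unrepresentable becomes '?' instead of raising.
    return text.encode(encoding, errors="replace").decode(encoding)


def safe_print(text: str) -> None:
    encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    print(to_console_safe(text, encoding))
```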

Fixes #4509

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Replace Unicode dashes with ASCII in install_python_stack.py

Box-drawing (U+2500) and em dash (U+2014) chars in section dividers
and comments are themselves not representable on CP1252 -- replace
with plain ASCII dashes for consistency with the fix.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-24 22:04:09 -07:00
TR-3B
8c94b461fb
Add GRPO resume vLLM cleanup guard (#4411)
* Add GRPO resume vLLM cleanup guard

* Guard GRPO resume sleep on vLLM sleep mode

* Harden GRPO resume vLLM cleanup guard

- Wrap llm.sleep(1) in try/except so a failed sleep does not block
  training resume (best-effort cleanup)
- Also check kwargs["model_path"] which transformers.Trainer.train()
  still accepts and normalizes to resume_from_checkpoint internally

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-24 21:37:45 -07:00
Wasim Yousef Said
085f9529b6
Regroup chat settings sidebar into focused sections (#4551)
* feat(chat): regroup settings sidebar into Model, Sampling, Tools, and Preferences sections

Split the monolithic Settings collapsible into focused sections with
icons. Model section shows context length and KV cache dtype for GGUF
models, trust remote code for non GGUF. Tools section groups auto heal,
max tool calls, and tool call timeout. Preferences section holds auto
title toggle.

* feat(chat): persist collapsible section open/closed state in localStorage

Remember which sections the user expanded or collapsed across sidebar
toggles, mobile sheet reopens, and browser sessions.

* fix(chat): harden collapsible state persistence and restore defaultOpen

- Validate localStorage values are booleans before using them, preventing
  corrupted entries like string "false" from being treated as truthy
- Use Object.hasOwn() instead of `in` operator to avoid prototype chain
  matches on keys like "constructor" or "toString"
- Restore defaultOpen={true} on Model and Preferences sections so they
  are expanded on first visit, matching the old Settings section behavior
- Fix misleading Context Length description to reflect it is read-only
- Downgrade console.error to console.warn for non-critical localStorage
  parse failures

* fix(chat): remove redundant disabled styles on Context Length input

The Input component already applies opacity-50 and cursor-not-allowed
via its disabled: variants. Specifying them unconditionally in the
className is redundant.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-24 19:39:27 -07:00
Daniel Han
acc881452f
fix: pin unsloth>=2026.3.11 in install.sh and install.ps1 (#4556)
Ensures both install scripts always pull a version that has the
litellm removal fix. Without the pin, stale uv/pip caches could
resolve the older 2026.3.10 which still had litellm in
data-designer-deps.txt, causing setup to fail at step 8/11
while PyPI has litellm quarantined.
2026-03-24 07:44:07 -07:00
Daniel Han
76a2f17470
fix(studio): remove litellm dep (quarantined on PyPI) (#4553)
litellm has been quarantined on PyPI due to a supply chain attack
in version 1.82.8 (malicious credential-stealing .pth file).
No versions are currently installable, which blocks
`unsloth studio setup` at step 8/11 (data-designer deps).

Remove litellm from the single-env data-designer requirements
so setup completes. litellm can be re-added once PyPI lifts the
quarantine.

Ref: https://github.com/BerriAI/litellm/issues/24512
2026-03-24 07:10:26 -07:00
Daniel Han
fac6f7887e Versioning 2026-03-24 06:50:36 -07:00
Daniel Han
95d2748278
fix: give @0xKushwaha git history credit for completion_only_loss fix (#4552)
* Revert "fix: handle prompt/completion datasets in slow-path BOS detection (#4548)"

This reverts commit fca83182af.

* fix: support completion_only_loss=True with prompt/completion dataset columns

When completion_only_loss=True, TRL rejects formatting_func but Unsloth's
patched _prepare_dataset/_prepare_non_packed_dataloader assumed either
formatting_func or dataset_text_field was always set, causing a catch-22.

Now handles prompt/completion columns as a third case for BOS token
detection, with a safe None fallback for all other cases.

(cherry picked from commit 978f78c6f1)

* fix: handle prompt/completion datasets in slow-path BOS detection

The slow-path check_text blocks in rl_replacements.py and
tokenizer_utils.py crash when a prompt/completion dataset is used
because they unconditionally access dataset[0][dataset_text_field]
even when the dataset does not have a text field.

This fixes both files to:
- Default dataset_text_field to None instead of raising when undefined
- Detect prompt/completion columns and concatenate them for BOS check
- Guard with isinstance(str) on both prompt and completion to handle
  conversational format (list of dicts) by setting test_text to None
- Add test_text is not None guard on has_bos_token_already to prevent
  AttributeError on NoneType.startswith()

This is the slow-path complement to unslothai/unsloth-zoo#560 which
fixes the fast-path in sft_prepare_dataset.

Closes #4486

(cherry picked from commit b6ce5786d0)

* fix: preserve chat_template BOS check when test_text is None

The has_bos_token_already guard wrapped both test_text.startswith()
and bos_token in chat_template with test_text is not None, which
disabled the chat_template BOS detection for conversational datasets
where test_text is set to None.

Split the guard so test_text is not None only applies to the
startswith() call, while bos_token in chat_template is always checked.
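
The split guard can be sketched as below. Both functions are simplified, hypothetical stand-ins for the logic in rl_replacements.py and tokenizer_utils.py; the column names `prompt` and `completion` are taken from the messages above.

```python
from typing import Any, Mapping, Optional


def pick_test_text(example: Mapping[str, Any], dataset_text_field: Optional[str] = None) -> Optional[str]:
    """Choose the text used for the BOS check, or None when no plain string exists."""
    if dataset_text_field is not None and dataset_text_field in example:
        value = example[dataset_text_field]
        return value if isinstance(value, str) else None
    if "prompt" in example and "completion" in example:
        prompt, completion = example["prompt"], example["completion"]
        # Conversational format (list of dicts) has no plain string to test.
        if isinstance(prompt, str) and isinstance(completion, str):
            return prompt + completion
    return None


def has_bos_token_already(test_text: Optional[str], bos_token: Optional[str], chat_template: Optional[str]) -> bool:
    if not bos_token:
        return False
    # The None guard applies only to the startswith() half of the check.
    starts_with_bos = test_text is not None and test_text.startswith(bos_token)
    in_template = chat_template is not None and bos_token in chat_template
    return starts_with_bos or in_template
```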

(cherry picked from commit 40bd8b8917)

---------

Co-authored-by: Ayush Kushwaha <148432773+ayushkushwaha240@users.noreply.github.com>
2026-03-24 06:38:57 -07:00
Daniel Han
fca83182af
fix: handle prompt/completion datasets in slow-path BOS detection (#4548)
* fix: handle prompt/completion datasets in slow-path BOS detection

The slow-path check_text blocks in rl_replacements.py and
tokenizer_utils.py crash when a prompt/completion dataset is used
because they unconditionally access dataset[0][dataset_text_field]
even when the dataset does not have a text field.

This fixes both files to:
- Default dataset_text_field to None instead of raising when undefined
- Detect prompt/completion columns and concatenate them for BOS check
- Guard with isinstance(str) on both prompt and completion to handle
  conversational format (list of dicts) by setting test_text to None
- Add test_text is not None guard on has_bos_token_already to prevent
  AttributeError on NoneType.startswith()

This is the slow-path complement to unslothai/unsloth-zoo#560 which
fixes the fast-path in sft_prepare_dataset.

Closes #4486

* fix: preserve chat_template BOS check when test_text is None

The has_bos_token_already guard wrapped both test_text.startswith()
and bos_token in chat_template with test_text is not None, which
disabled the chat_template BOS detection for conversational datasets
where test_text is set to None.

Split the guard so test_text is not None only applies to the
startswith() call, while bos_token in chat_template is always checked.
2026-03-24 05:27:59 -07:00
Michael Han
a41dbb6ab2
Add r/unsloth Reddit.md 2026-03-24 04:13:38 -07:00
Michael Han
381f509695
Adding Qwen3.5 RL.md 2026-03-24 04:06:23 -07:00
Wasim Yousef Said
c8057d911b
fix: system prompt ignored in unsloth inference (#4528)
* fix: system prompt was dropped in unsloth text and vision inference

* refactor: simplify system prompt message construction

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: use multimodal typed content parts for vision system message and add fallback

The system message content must use typed content parts
([{"type": "text", "text": ...}]) instead of a plain string to match
the multimodal processor contract (consistent with the audio path).
Plain strings cause some processors (e.g. LLaVA) to silently drop the
system prompt.

Also wraps processor.apply_chat_template in try/except so models that
reject the system role gracefully fall back to no system message with
a warning log.
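
A sketch of the message shape and the fallback described above. The helper names are hypothetical, `apply_template` stands in for the processor's chat-template call, and the `{"type": "image"}` user part is an assumed convention rather than something stated in this message.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def build_vision_messages(system_prompt: str, user_text: str) -> List[Message]:
    messages: List[Message] = []
    if system_prompt:
        # Typed content part instead of a plain string.
        messages.append({"role": "system", "content": [{"type": "text", "text": system_prompt}]})
    messages.append(
        {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": user_text}]}
    )
    return messages


def apply_with_fallback(
    apply_template: Callable[[List[Message]], str],
    messages: List[Message],
    warn: Callable[[str], None] = print,
) -> str:
    try:
        return apply_template(messages)
    except Exception as exc:
        # Retry without the system message for templates that reject the role.
        warn(f"chat template rejected system message: {exc!r}")
        return apply_template([m for m in messages if m["role"] != "system"])
```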

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: capture and log original exception in vision system prompt fallback

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-24 04:01:33 -07:00
Wasim Yousef Said
3dc212e218
fix: always show chat tool icons (#4525)
* fix: always show chat tool icons, gray out when model doesn't support them

Tool icons (Think, Search, Code) were hidden unless a model was loaded
and supported those features. Now they're always visible so users can
see and pre-select them. If a loaded model doesn't support a feature,
the button gets grayed out and disabled instead of being removed.

* refactor: centralize Qwen thinking params in store

* fix: disable tool buttons when no model is loaded

Change disabled condition from `modelLoaded && !supportsX` to
`!modelLoaded || !supportsX` so buttons are grayed out both when
no model is loaded and when the loaded model lacks the capability.

* Fix Qwen3 param clobbering and restore SuggestionItem capability guards

- Revert setReasoningEnabled() in the store to a pure boolean setter.
  Moving the Qwen3 param logic into it caused reconnect/load/refresh
  paths (which also call setReasoningEnabled) to silently overwrite
  user-customized or server-provided temperature/topP/topK/minP.

- Restore applyQwenThinkingParams() as a standalone function called
  only from explicit user toggle click handlers in thread.tsx and
  shared-composer.tsx, matching the pre-PR behavior.

- Re-add supportsReasoning/supportsTools guards in the SuggestionItem
  click handler so that clicking a suggestion card only activates
  tool toggles the loaded model actually supports.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-24 03:26:56 -07:00
Daniel Han
77b21333fb
fix(studio): restore scroll lock on reasoning panel collapse (#4545)
PR #4543 removed useScrollLock from ReasoningRoot, causing the thread
viewport to jump when a user collapses a reasoning panel. Restore the
hook to freeze scrollTop during the 200ms collapse animation, matching
the pattern used by tool-fallback.tsx and tool-group.tsx.
2026-03-24 02:27:06 -07:00
Wasim Yousef Said
1129ea44bc
fix(studio): show Windows-specific reset-password command on login error (#4529) 2026-03-23 23:04:00 -07:00
Daniel Han
5916bcb2e3
Fix Studio port conflict detection for loopback addresses (#4532)
* Fix port conflict detection when loopback address is held by another process

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use getaddrinfo for IPv6 host support, restore emojis in terminal output

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard against conn.pid being None in _get_pid_on_port

psutil.net_connections() can return entries with pid=None when the
current user lacks privileges to see the owning process (common on
macOS without root, Windows without admin, and some Linux configs).

psutil.Process(None) does not raise -- it silently returns the
current process, which would make the warning incorrectly blame
Unsloth Studio itself for blocking the port.
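
A minimal sketch of such a guard, under stated assumptions: `Addr` and `Conn` are hypothetical stand-ins for the entries psutil.net_connections() returns (only the two fields used here are modelled), and `pid_on_port` is an illustrative name, not the actual `_get_pid_on_port` code.

```python
from collections import namedtuple

# Hypothetical stand-ins for psutil connection entries; only the local
# address and the owning pid are modelled.
Addr = namedtuple("Addr", ["ip", "port"])
Conn = namedtuple("Conn", ["laddr", "pid"])


def pid_on_port(connections, port):
    """Return the pid bound to `port`, or None if it cannot be determined."""
    for conn in connections:
        # Entries without a local address, or on another port, are irrelevant.
        if not conn.laddr or conn.laddr.port != port:
            continue
        # pid is None when the owning process is not visible to this user;
        # skip it rather than passing None on to a process lookup.
        if conn.pid is None:
            continue
        return conn.pid
    return None


conns = [
    Conn(Addr("127.0.0.1", 8000), None),
    Conn(Addr("127.0.0.1", 8001), 4242),
]
hidden_owner = pid_on_port(conns, 8000)
visible_owner = pid_on_port(conns, 8001)
```
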

Skip entries with pid=None so the caller falls back to the generic
"port is already in use" message instead.

* Update studio/backend/run.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-23 22:34:47 -07:00
Lee Jackson
45e4a0473a
studio: stop scroll hijack during generation and fix thinking panel layout shift (#4543)
* fix(chat): stabilize thinking panel and thread scroll during generation

* fix: match ChatGPT scroll and thinking panel behavior

- Remove autoScroll={false} from thread viewport to restore default
  follow-scroll during streaming (pauses when user scrolls up, resumes
  at bottom)
- Rewrite reasoning panel state: auto-opens on stream start, user can
  close during streaming, auto-collapses when reasoning ends, user can
  re-expand after collapse

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-23 22:33:46 -07:00
Lee Jackson
01d7dce3f4
studio: persist system prompt and preset settings across navigation (#4538)
* fix(studio): harden system prompt persistence and storage fallback

* Exclude checkpoint from localStorage persistence for PR #4538

checkpoint is backend-owned state -- refresh() already syncs it from
getInferenceStatus() on every page load. Persisting it to localStorage
causes a stale model ID to survive across backend restarts, which
prevents auto-load from triggering when no model is actually loaded.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-23 22:21:04 -07:00
金黄色葡萄球君君
2b330e2f24
fix: store embedding_learning_rate on self in UnslothTrainingArguments (#4531)
Fixes #4492

The embedding_learning_rate parameter was assigned to a local variable
instead of self.embedding_learning_rate, causing UnslothTrainer.create_optimizer()
to always get None via getattr and silently fall back to a single param group.

Bug: embedding_learning_rate = embedding_learning_rate (no-op)
Fix: self.embedding_learning_rate = embedding_learning_rate
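
The difference can be reproduced in isolation. In the sketch below, `BuggyArgs`, `FixedArgs` and `lookup` are hypothetical stand-ins, not the real UnslothTrainingArguments or create_optimizer code; only the assignment pattern and the getattr read are taken from the description above.

```python
class BuggyArgs:
    def __init__(self, embedding_learning_rate=None):
        # Rebinds the local name to itself: nothing is stored on the instance.
        embedding_learning_rate = embedding_learning_rate


class FixedArgs:
    def __init__(self, embedding_learning_rate=None):
        # Stores the value on the instance so later getattr() calls see it.
        self.embedding_learning_rate = embedding_learning_rate


def lookup(args):
    # Same style of read as described above: getattr with a None default.
    return getattr(args, "embedding_learning_rate", None)


buggy_value = lookup(BuggyArgs(embedding_learning_rate=5e-6))
fixed_value = lookup(FixedArgs(embedding_learning_rate=5e-6))
```

With the buggy class the lookup yields None, which is why the optimizer never saw a separate embedding learning rate.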
2026-03-23 21:08:29 -07:00
pre-commit-ci[bot]
a5be6904a6
[pre-commit.ci] pre-commit autoupdate (#4542)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.6 → v0.15.7](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.6...v0.15.7)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-23 14:55:27 -07:00
Datta Nimmaturi
cd65584f19
Update issue template 2026-03-23 10:10:15 +05:30
Daniel Han
1ecb55faa2 Update _utils.py 2026-03-22 08:23:40 -07:00
Daniel Han
797ddd201e
Fix Studio silently exiting on Windows without error output (#4527)
* Fix Studio silently exiting on Windows without error output

On Windows, `unsloth studio` launches a child process via
subprocess.Popen to run the server in the studio venv. If the child
crashes (e.g. due to a missing package), the parent just calls
typer.Exit(rc) with no message -- the user sees "Launching Unsloth
Studio... Please wait..." and then the prompt returns with zero
feedback.

Root cause: `data_designer_unstructured_seed` is imported at the top
level in seed.py. If this package is not installed in the studio venv,
the entire import chain (seed.py -> routes/__init__.py -> main.py ->
run_server()) crashes with ModuleNotFoundError. Since run.py has no
try/except around run_server() and studio.py does not report nonzero
exit codes, the failure is completely silent.

Changes:
- run.py: wrap run_server() in try/except, print clear error with
  traceback to stderr. Also reconfigure stderr encoding on Windows so
  tracebacks with non-ASCII paths do not cause secondary failures.
- studio.py: print an error message when the child process exits with
  a nonzero code on Windows, so the user knows something went wrong.
- seed.py: make data_designer_unstructured_seed import optional with
  a try/except fallback. The server starts normally and only returns
  HTTP 500 if the unstructured seed endpoints are actually called.
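
The optional-import pattern from the seed.py change can be sketched roughly as follows. `optional_import` and `require_plugin` are hypothetical helper names, and the real code may simply use a try/except around a plain import statement; the point is only that a missing package is recorded as None at import time and reported when the feature is used.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require_plugin(module, feature):
    """Fail at call time (not import time) when the optional plugin is absent."""
    if module is None:
        raise RuntimeError(f"{feature} is unavailable: optional package not installed")
    return module


# A module name that is assumed not to exist, to exercise the fallback path.
missing = optional_import("definitely_not_installed_module_xyz")
present = optional_import("json")
```
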

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Skip Anaconda/Miniconda Python when creating Studio venv on Windows

Conda-bundled CPython ships modified DLL search paths that prevent
torch from loading c10.dll on Windows. The Studio server fails
silently at startup because the venv was created with conda's Python.

Standalone CPython (python.org, winget, uv) does not have this issue.

Both install.ps1 and setup.ps1 now skip any Python binary whose path
contains conda, miniconda, anaconda, miniforge, or mambaforge when
selecting the interpreter for the studio venv. If only conda Python
is available, the scripts print an error with instructions to install
standalone CPython.

* Fix multi-file preview crash and improve setup.ps1 Python discovery

Addresses review findings [10/10] and [8/10]:

1. seed.py: _read_preview_rows_from_multi_files() had a hard import
   of build_multi_file_preview_rows inside the function body, bypassing
   the optional-plugin guard. Moved it into the top-level try/except
   block and added a None guard matching the other functions.

2. setup.ps1: Python discovery now probes py.exe (Python Launcher)
   first, uses Get-Command -All to look past conda entries that shadow
   standalone CPython further down PATH, skips WindowsApps stubs, and
   resolves the actual executable path so venv creation does not
   re-resolve back to a conda interpreter.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Check sys.base_prefix to catch venvs created from conda Python

A venv created from conda Python (e.g. C:\Users\danie\.venv) has a
path that does not contain "conda", but sys.base_prefix still points
to the conda install (e.g. C:\Users\danie\miniconda3). The previous
path-only check missed this case entirely.

Both install.ps1 and setup.ps1 now use a Test-IsConda helper that
checks both the executable path AND sys.base_prefix against the
conda/miniconda/anaconda/miniforge/mambaforge pattern. This catches:
- Direct conda Python executables
- Venvs created from conda Python (base_prefix reveals the origin)
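
The check itself is small. Below is a Python sketch of the same idea (the actual Test-IsConda helper lives in the PowerShell scripts); `looks_like_conda` and `CONDA_PATTERN` are illustrative names, and the `Scripts\python.exe` suffix and `C:\Python312` path in the examples are assumed for illustration.

```python
import re

# "conda" also matches miniconda and anaconda.
CONDA_PATTERN = re.compile(r"conda|miniforge|mambaforge", re.IGNORECASE)


def looks_like_conda(executable, base_prefix):
    """True if either the interpreter path or its base prefix points at conda."""
    return any(CONDA_PATTERN.search(p or "") for p in (executable, base_prefix))


# A venv whose own path looks clean but whose base prefix reveals conda.
venv_from_conda = looks_like_conda(
    r"C:\Users\danie\.venv\Scripts\python.exe",
    r"C:\Users\danie\miniconda3",
)
# A standalone CPython install (hypothetical path).
standalone = looks_like_conda(r"C:\Python312\python.exe", r"C:\Python312")
```

For the running interpreter the inputs would be `sys.executable` and `sys.base_prefix`.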

* Fix install.ps1 passing version string to uv venv instead of resolved path

Find-CompatiblePython returned a bare version string (e.g. "3.13")
which was passed to `uv venv --python 3.13`. uv performs its own
interpreter discovery and can resolve that version string back to a
conda Python, defeating the entire conda-skip logic.

Now Find-CompatiblePython returns a hashtable with both .Version (for
display) and .Path (the resolved absolute executable path). The venv
is created with `uv venv --python <absolute-path>`, ensuring uv uses
the exact interpreter we validated.

* Quote resolved Python path in uv venv call for paths with spaces

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-22 08:23:03 -07:00
Daniel Han
866cb33ce0 Update _utils.py 2026-03-22 06:14:35 -07:00
NuoFang
4cedeba8c2
fix(studio): prevent ModuleNotFoundError in dataset.map() on Windows (#4473)
* fix(studio): prevent ModuleNotFoundError in dataset.map() on Windows

On Windows, dataset.map() uses "spawn", which requires workers to
import compiled modules from disk. Previously, clear_unsloth_compiled_cache()
deleted the entire directory, causing workers to crash when looking for
UnslothSFTTrainer.py.

Changes:
1. Added `preserve_patterns` to cache cleanup to keep `Unsloth*Trainer.py`
   on Windows while clearing model-specific files.
2. Added the cache directory to PYTHONPATH for spawn workers.
Linux/macOS behavior is unchanged.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix spawn-platform coverage, CWD path mismatch, and race condition for PR #4473

- Extend platform guard from win32-only to include macOS (also uses spawn
  since Python 3.8, same ModuleNotFoundError would occur)
- Replace fragile CWD-based PYTHONPATH registration with centralized
  register_compiled_cache_on_path() that uses the same __file__-relative
  _CACHE_DIRS already used by cache_cleanup -- fixes path mismatch when
  studio is launched from a directory other than the repo root
- Move PYTHONPATH registration to the top of _train_worker(), before any
  dataset.map() call (previously it ran late in config assembly, after
  dataset formatting which also calls dataset.map())
- Update inference.py model-unload to preserve trainer files on spawn
  platforms, preventing a race where unloading a model via inference tab
  would delete UnslothSFTTrainer.py while training workers are importing it

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix cache-dir precedence reversal in register_compiled_cache_on_path()

Iterating _CACHE_DIRS in forward order while calling insert(0) each time
reverses the declared priority: later entries shadow earlier ones. When
multiple compiled-cache directories exist, spawned workers could import a
stale trainer from the wrong cache.

Fix: iterate in reverse so that the highest-priority entry (first in
_CACHE_DIRS) is inserted last and ends up at position 0 in sys.path and
PYTHONPATH.
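
The ordering effect is easy to reproduce with a plain list. The directory names and the two helper functions below are hypothetical; only the insert(0) behaviour is the point.

```python
def register_forward(cache_dirs, path):
    """Buggy order: each insert(0) pushes earlier entries further back."""
    path = list(path)
    for d in cache_dirs:
        path.insert(0, d)
    return path


def register_reversed(cache_dirs, path):
    """Fixed order: iterate in reverse so the first entry ends up at index 0."""
    path = list(path)
    for d in reversed(cache_dirs):
        path.insert(0, d)
    return path


cache_dirs = ["primary_cache", "secondary_cache"]  # highest priority first
base = ["site-packages"]
forward = register_forward(cache_dirs, base)
fixed = register_reversed(cache_dirs, base)
```
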

* fix: harden worker-count helpers against cpu_count=None and desired<=0

- safe_num_proc: guard os.cpu_count() with `or 1`, clamp multi-GPU
  path with max(1, min(4, desired)), clamp return with max(1, desired)
- safe_thread_num_proc: same os.cpu_count() guard and return clamp
- Add regression tests (31 L1 unit + 10 sandbox edge-case tests)
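
A simplified sketch of the clamps listed above. `clamp_workers` is a hypothetical name, and the real safe_num_proc has additional platform and GPU logic that is not reproduced here; the `multi_gpu` flag is an assumed stand-in for that detection.

```python
import os


def clamp_workers(desired=None, multi_gpu=False):
    """Return a worker count that is always at least 1."""
    # os.cpu_count() may return None; fall back to a single CPU.
    cpu_total = os.cpu_count() or 1
    if desired is None:
        desired = cpu_total
    if multi_gpu:
        # Cap at 4 workers, but never drop below 1.
        return max(1, min(4, desired))
    # Guard against desired <= 0.
    return max(1, desired)
```
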

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove regression tests from PR

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-22 06:11:24 -07:00
Daniel Han
62e3de181f
Update weather dashboard suggestion to request HTML code output (#4523)
The previous prompt "Show me a live weather dashboard, no API key needed"
was too vague. The new wording explicitly asks for HTML code, which
produces more useful and consistent responses.
2026-03-22 06:09:48 -07:00
Leo Borcherding
71c77d4e96
fix(install.ps1): fix non-NVIDIA package resolution — split torch+unsloth install (#4515)
* fix(install.ps1): split torch+unsloth install to fix non-NVIDIA package resolution

--torch-backend=auto on a non-NVIDIA Windows machine causes uv to resolve
unsloth==2024.8 (pre-CLI, no unsloth.exe). Fix: detect GPU robustly (PATH +
hardcoded fallback paths, mirrors setup.ps1), install torch first with an
explicit --index-url (CUDA variant for NVIDIA, CPU for everyone else), then
install unsloth separately without --torch-backend so the solver always picks
a modern release that ships the Studio CLI.

Closes the remaining gap flagged in #4478.

* fix(install.ps1): align warning with setup.ps1, add --upgrade, handle CUDA 11.x

- Match the no-GPU warning message to studio/setup.ps1 wording
  (chat-only GGUF mode, driver download link)
- Add CUDA 11.x floor check in Get-TorchIndexUrl so old drivers
  fall back to CPU wheels instead of silently getting cu124
- Log a warning when nvidia-smi output cannot be parsed
- Add --upgrade to both uv pip install calls so re-runs pick up
  newer package versions

* revert --upgrade from uv pip install calls

uv pip install already resolves to the latest satisfying version;
--upgrade is unnecessary and could force unwanted re-installs.

* fix: replace frozen cu124 fallbacks with cu126, guard CUDA 11.x

cu124 wheels are frozen at torch 2.6.0 -- falling back to them pins
users to an outdated PyTorch. Three issues fixed in both install.ps1
and setup.ps1:

1. CUDA 12.0-12.5 now maps to cu126 (was cu124).
2. CUDA 11.x and older now falls back to cpu (was cu124, which would
   silently install incompatible GPU wheels).
3. Parse-failure and no-nvidia-smi fallbacks updated to cu126/cpu.

Adds tests/test_cuda_wheel_mapping.py covering the mapping logic,
nvidia-smi parsing, PS1 file sync, PyTorch index URL validation,
and sandbox torch installs.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove test file from PR branch

Test file kept locally, not needed in the PR.

* fix: map CUDA 11.x to cu118 instead of cpu

PyTorch still publishes cu118 wheels (up to torch 2.7.1), so CUDA 11.x
users get GPU-accelerated torch rather than being forced to CPU-only.
Only CUDA 10.x and older fall back to cpu.

* fix: revert CUDA 12.0-12.5 to cu124, handle cpu tag in setup.ps1

CUDA 12.0-12.5 drivers only support up to their reported CUDA version,
so cu126 wheels (built with CUDA 12.6) fail to load. Revert the catch-
all for 12.0-12.5 back to cu124.
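
Taken together with the cu118 change above, the driver-to-wheel mapping ends up roughly as sketched below. This is a Python illustration, not the PowerShell in install.ps1/setup.ps1; `torch_wheel_tag` is a hypothetical name, and the tag returned for CUDA 12.6 and newer is an assumption (the real scripts may distinguish further tags).

```python
def torch_wheel_tag(cuda_major, cuda_minor):
    """Map the driver-reported CUDA version to a PyTorch wheel tag."""
    version = (cuda_major, cuda_minor)
    if version < (11, 0):
        # CUDA 10.x and older: no GPU wheels, fall back to CPU.
        return "cpu"
    if version < (12, 0):
        # CUDA 11.x: cu118 wheels.
        return "cu118"
    if version < (12, 6):
        # CUDA 12.0-12.5: these drivers cannot load cu126 wheels.
        return "cu124"
    # ASSUMPTION: 12.6 and newer use cu126 here; not confirmed by the log.
    return "cu126"
```
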

Also fix setup.ps1 caller: when Get-PytorchCudaTag returns "cpu" (e.g.
CUDA 10.x driver), the installer now correctly skips Triton and prints
"CPU-only" instead of "CUDA support (cpu)".

* fix: add --upgrade to unsloth install for stale venv repair

On reruns against an existing venv, uv pip install unsloth makes no
changes if unsloth==2024.8 is already installed (it satisfies the
constraint). Adding --upgrade only to the unsloth install ensures
stale installs get repaired without forcing a multi-GB torch
re-download.

* fix: use --upgrade-package to avoid clobbering torch CUDA wheels

`--upgrade unsloth` re-resolves torch from default PyPI, stripping the
+cuXXX suffix installed in step 1. `--upgrade-package unsloth unsloth`
upgrades only unsloth (and pulls missing deps like transformers, trl)
while preserving the pinned torch from the CUDA-specific index.

* docs: explain why split-install and --upgrade-package are needed

Expand the inline comment block to document both design decisions:
1. Why torch is installed separately (solver fallback to 2024.8)
2. Why --upgrade-package is used instead of --upgrade (preserves CUDA wheels)

---------

Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-22 05:41:58 -07:00
Daniel Han
100b8857f2
Fix Studio crash on Anaconda/conda-forge Python (#4484)
* Fix Studio crash on Anaconda Python due to platform._sys_version() parse failure

Anaconda and conda-forge modify sys.version to include distributor
metadata between pipe characters, e.g.:

    3.12.4 | packaged by Anaconda, Inc. | (main, ...) [MSC v.1929 ...]

Python's platform._sys_version() has a hardcoded regex that cannot
parse this format, raising ValueError. CPython closed this as "not
planned" (cpython#102396) since Anaconda modified the binary.

This breaks the import chain: run.py -> structlog -> rich -> attrs,
which calls platform.python_implementation() at module scope.

Fix: before any library imports, strip the pipe segments, parse the
cleaned version string via the standard parser, and cache the result
under the original sys.version key so all subsequent platform calls
hit the cache.
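
The cleaning step alone can be sketched as below. `strip_distributor_label` is a hypothetical name and the regex is one possible way to drop a paired-pipe segment, not necessarily the one used in the patch; parsing and cache seeding are left out. The sample string keeps the elisions from the example above.

```python
import re


def strip_distributor_label(version):
    """Remove a '| ... |' distributor segment from a sys.version string."""
    return re.sub(r"\s*\|[^|]*\|\s*", " ", version).strip()


sample = "3.12.4 | packaged by Anaconda, Inc. | (main, ...) [MSC v.1929 ...]"
cleaned = strip_distributor_label(sample)

# A string without pipes passes through unchanged.
plain = "3.12.4 (main, ...) [MSC v.1929 ...]"
untouched = strip_distributor_label(plain)
```
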

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add defensive fallback for unpaired pipe edge cases in version patch

Address Gemini review suggestion: if the paired-pipe regex leaves
residual pipes (hypothetical single-pipe distributor metadata), fall
back to extracting the version number and the parenthesized build
info directly. Wrap the entire patch in try/except so unexpected
version string formats degrade gracefully instead of crashing the
patch itself.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor into shared _platform_compat module, cover colab.py entrypoint

Address reviewer feedback:

1. Extract the Anaconda/conda-forge sys.version fix into a shared
   _platform_compat.py module that wraps platform._sys_version() with
   a retry-on-ValueError fallback. This is more robust than cache-seeding
   because it handles all future platform._sys_version() calls, not just
   the first one.

2. Import the fix from both run.py and colab.py entrypoints, so Studio
   no longer crashes on Anaconda Python regardless of the launch path.

3. The wrapper is idempotent (guarded by a flag) and handles edge cases:
   paired pipes (Anaconda, conda-forge), unpaired pipes (hypothetical),
   and standard CPython strings (no-op since ValueError is never raised).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Replace monkey-patch with cache-prime, fix colab.py duplicate sys.path, cover main.py

- Rewrite _platform_compat.py: replace function-wrapping monkey-patch with
  one-shot cache seed (_seed_sys_version_cache). Parses cleaned sys.version
  once and seeds platform._sys_version_cache so the stdlib parser never sees
  the problematic Anaconda/conda-forge pipe-delimited string. No function
  replacement, no idempotency flag, no reload edge cases.

- colab.py: remove duplicate backend_path sys.path insertion after
  _bootstrap_studio_venv(). The early insertion (before _platform_compat
  import) already covers it. This also fixes backend/ ending up behind
  venv site-packages in sys.path ordering.

- run.py: move PYTHONWARNINGS=ignore before _platform_compat import to
  preserve original intent of suppressing warnings early.

- main.py: add sys.path + _platform_compat import before route imports,
  covering the direct `uvicorn main:app` launch path.

- Add test_platform_compat.py with 7 tests covering Anaconda, conda-forge,
  and standard CPython version strings, plus the loggers import chain.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove test_platform_compat.py from PR

* Handle Format B conda-forge version strings with duplicate paren groups

Some conda-forge builds produce sys.version with the build info both
before and after the pipe label (e.g. "3.9.7 (default, ...) | packaged
by conda-forge | (default, ...) \n[GCC 7.5.0]"). After stripping the
pipe segment, two consecutive (...) groups remain, which still fails
platform._sys_version(). Add a second regex pass to drop the duplicate
paren group.

* Guard _sys_version call with try/except to avoid making things worse

If the cleaned version string is still unparseable by the stdlib regex
(e.g. nested parens, exotic multi-pipe formats), silently give up
instead of letting ValueError propagate at import time -- which would
be a worse crash than the original deferred one.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-22 05:36:55 -07:00
Andrew Barnes
2c5d3c48ec
fix: subprocess crash during map operation on Windows (#4507)
* fix: handle Windows subprocess crash during dataset.map()

Windows uses spawn (not fork) for multiprocessing. Spawned workers
cannot resolve Unsloth's dynamically compiled cache modules from
unsloth_compiled_cache/, causing ModuleNotFoundError and RuntimeError
during dataset.map() tokenization.

Add two platform-guarded patches for sys.platform == "win32":
1. Force HF_DATASETS_MULTITHREADING_MAX_WORKERS=1 and set spawn method
2. Monkey-patch Dataset.map() to force num_proc=None

Fixes #4490

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address review: extend spawn fix to macOS, add multiprocess fallback

- Change platform checks from sys.platform == "win32" to
  sys.platform != "linux" so macOS (also spawn-based) is covered
- Wrap multiprocess import in try/except falling back to stdlib
  multiprocessing when the multiprocess package isn't installed
- Rename _win32_safe_map to _spawn_safe_map to reflect broader scope

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: replace global Dataset.map monkey-patch with targeted num_proc routing

The previous approach had issues: Patch 1 set HF_DATASETS_MULTITHREADING_MAX_WORKERS
and forced set_start_method (dead code on platforms already using spawn), and Patch 2
globally monkey-patched Dataset.map() (too broad, missed Dataset.filter()).

Replace with a two-layer fix:

1. Studio layer: Add dataset_map_num_proc() that returns None on spawn platforms
   (Windows, macOS). Unlike num_proc=1 which still creates Pool(1) and spawns a
   worker, num_proc=None runs Dataset.map()/filter() truly in-process.
   Update all dataset.map() callsites to use it. ThreadPoolExecutor callers
   (format_conversion.py) keep using safe_num_proc() since threads are unaffected.

2. Root-cause layer: Propagate UNSLOTH_COMPILE_LOCATION via PYTHONPATH on spawn
   platforms so spawned workers can import compiled modules. Mirrors the .venv_t5
   pattern in worker.py. Does not import unsloth_zoo.compiler (heavy torch/triton
   imports). Completely skipped on Linux.
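
A rough sketch of the Studio-layer routing. The function name matches the one described above, but the body is an assumption: the real helper delegates to safe_num_proc() on Linux, which is replaced here by a simple clamp, and the `platform` parameter exists only to make the example testable.

```python
import sys


def dataset_map_num_proc(desired, platform=sys.platform):
    """Return None on spawn platforms so map()/filter() run in-process."""
    if platform in ("win32", "darwin"):
        # None avoids creating a worker pool at all, unlike num_proc=1.
        return None
    # Simplified stand-in for the real worker-count heuristic.
    return max(1, desired)
```

A call site would then pass the result straight through, e.g. `dataset.map(fn, num_proc=dataset_map_num_proc(8))`.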

Also extend safe_num_proc() to return 1 on macOS (was only guarding Windows),
and narrow the transformers 5.x dataloader guard from != "linux" to explicit
("win32", "darwin").

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: add safe_thread_num_proc() for ThreadPoolExecutor callsites

safe_num_proc() correctly caps to 1 on macOS/Windows for process-based
multiprocessing, but format_conversion.py reuses it for ThreadPoolExecutor
workers. Threads share address space and are unaffected by spawn, so
capping to 1 makes image URL downloads sequential -- a real regression.

Add safe_thread_num_proc() that skips the platform guard but keeps the
cpu_count heuristic, and switch both ThreadPoolExecutor callsites in
format_conversion.py to use it.

* fix: remove double-wrap in dataset_num_proc + fix num_proc=1 in datasets route

- trainer.py:3009: Replace safe_num_proc(max(1, os.cpu_count() // 4))
  with max(1, (os.cpu_count() or 1) // 4) to avoid double-wrapping
  inside dataset_map_num_proc which already calls safe_num_proc
- trainer.py:15-20: Clarify comment on PYTHONPATH propagation
- datasets.py:445: Change num_proc=1 to num_proc=None for 10-row
  preview slice (avoids unnecessary multiprocessing overhead)

* fix: guard os.cpu_count() against None in worker-count helpers

os.cpu_count() can return None on some platforms. Use (os.cpu_count() or 1)
to prevent TypeError in safe_num_proc() and safe_thread_num_proc().

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-22 05:21:09 -07:00
DoubleMathew
4c1a6cb962
gate on min uv version and shortcut python candidate search if known (#4489)
* gate on min uv version and shortcut python candidate search if known

* fix sort -V cross compat issue, run_quiet early exit on llamacpp, autolaunch

* update launch message

* Fix PR comments

* auto launch and find open port

* remove dev install

* Fix review findings: major-version guard, non-fatal port fallback, tty comment, restore local

* Remove autolaunch, clean up dead state and debug noise

- Remove find_open_port, TTY-gated autolaunch, and </dev/tty
  redirection from install.sh; just print launch instructions
- Remove unused BEST_MAJOR variable from studio/setup.sh
- Remove stray "finished finding best python" debug echo
- Fix stale comment "below 3.12" to "below 3.11"

* Reject prerelease uv at exact minimum version boundary

* Remove 2>/dev/null from version_ge numeric comparisons

Let non-numeric version parts surface errors on stderr
instead of being silently swallowed.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-22 05:20:25 -07:00
Daniel Han
64f9f389a0 Created using Colab 2026-03-22 04:57:26 -07:00
Velsa
981f477e31
fix: reconfigure stdout to UTF-8 on Windows to prevent UnicodeEncodeError on startup (#4493)
* fix: reconfigure stdout UTF-8 on Windows to prevent UnicodeEncodeError from emoji

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: default frontend_path when None to fix blank page when venv is pre-activated

* Restore Windows UTF-8 stdout fix dropped in earlier commit

The cp1252 console encoding on Windows cannot render emoji characters
used in startup messages (e.g. print(" Frontend loaded ...")).
This causes UnicodeEncodeError and crashes the server before it starts.

Place sys.stdout.reconfigure(encoding="utf-8", errors="replace") at the
top of run_server(), unconditionally before any print() or structlog
call, so all emoji output is covered -- including the frontend status
messages and silent=True paths that the original placement missed.

Guarded by sys.platform == "win32" and hasattr check, so it is a no-op
on Linux/macOS and safe in non-standard stdout environments (Jupyter,
piped IO).
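
A minimal sketch of the guarded call. `force_utf8_stdout` is a hypothetical wrapper name; the patch places the call directly at the top of run_server().

```python
import sys


def force_utf8_stdout():
    """Switch stdout to UTF-8 on Windows; do nothing elsewhere."""
    if sys.platform == "win32" and hasattr(sys.stdout, "reconfigure"):
        # errors="replace" keeps unencodable characters from raising.
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")
        return True
    return False


# Evaluate the expectation first, since the call itself touches sys.stdout.
expected = sys.platform == "win32" and hasattr(sys.stdout, "reconfigure")
applied = force_utf8_stdout()
```
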

* fix: preserve run_server(None) as headless, fix CLI frontend kwarg

Remove the frontend_path=None fallback in run_server() that changed
None from "headless/API-only" to "mount bundled frontend", breaking
backwards compatibility for embedders.

The blank-page bug was actually caused by the CLI wrappers always
passing frontend_path=frontend (even when frontend=None), which
overrode run_server()'s default. Fix studio.py and ui.py to only
pass frontend_path when the user explicitly sets --frontend.

* fix: use timeout loop for shutdown event in ui command

Match studio_default()'s shutdown loop that uses a 1-second timeout
on Event.wait(). Without a timeout, the bare wait() blocks at the C
level on Linux, preventing Python from delivering SIGINT (Ctrl+C).
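
The loop shape is roughly the following. `wait_for_shutdown` is a hypothetical name; the 1-second poll interval is the one mentioned above.

```python
import threading


def wait_for_shutdown(event, poll_seconds=1.0):
    """Block until `event` is set, waking up periodically."""
    # wait() returns False on timeout, so control returns to Python
    # every poll interval and pending signals can be handled.
    while not event.wait(timeout=poll_seconds):
        pass


shutdown = threading.Event()
# Set the event shortly after starting, to show that the loop exits.
threading.Timer(0.05, shutdown.set).start()
wait_for_shutdown(shutdown, poll_seconds=0.01)
```
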

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-22 04:49:59 -07:00
Leo Borcherding
96edad9c95
PR: Fix/cuda minimum check and abort (#4517)
* fix: add CUDA minimum version check and abort for llama.cpp (>= 12.4)

- setup.ps1/setup.sh: abort with clear error if CUDA toolkit < 12.4
  (llama.cpp requirement); link to cuda-toolkit-archive for upgrade
- setup.ps1: promote CUDA VS integration copy failure from WARN to
  ERROR + exit 1; remove manual-copy hack instructions per Roland —
  correct fix is re-installing CUDA/MSBuild, not a manual workaround

Fixes: https://github.com/unslothai/unsloth/issues/4437
Reported by: Sebastien

* fix: wipe stale studio venv when torch CUDA tag changes

When the NVIDIA driver is updated, the required PyTorch CUDA tag changes
(e.g. cu124 -> cu130) but setup.ps1 was silently reusing the existing
.venv, leaving the old torch wheel in place and breaking the UI for
everyone on the next setup run.

Before creating/reusing the venv, inspect the installed torch version
string. If its CUDA tag does not match what the current driver requires,
wipe the venv so we always get a clean, correct install.

* Fix CUDA version check: portability, non-fatal fallback, stale venv detection

- setup.sh: Replace grep -oP with POSIX sed for macOS compatibility
- setup.sh: Replace exit 1 with NVCC_PATH="" to fall back to CPU-only build
- setup.sh: Move version check before -DGGML_CUDA=ON append
- setup.sh: Add else branch warning when nvcc version is unparseable
- setup.ps1: Replace exit 1 with $NvccPath=$null for non-fatal CUDA fallback
- setup.ps1: Add driver vs toolkit guidance in version warning
- setup.ps1: Guard CUDA env/VS integration setup with if ($NvccPath)
- setup.ps1: VS integration catch: downgrade to WARN, restore source/dest paths
- setup.ps1: Stale venv: detect CPU torch and untagged wheels, not just +cuNNN
- setup.ps1: Stale venv: rebuild on failed torch import
- setup.ps1: Stale venv: wrap Remove-Item in try/catch for locked files

* Remove incorrect CUDA >= 12.4 check, keep only stale venv detection

llama.cpp has no hard minimum CUDA version -- it builds with CUDA as old
as 11.2 and degrades features gracefully via #if CUDART_VERSION guards.
The 12.4 figure was the default Docker/CI baseline, not a build requirement.

Reverted:
- CUDA version check in setup.sh (entirely removed)
- CUDA version check in setup.ps1 (entirely removed)
- VS integration catch block cosmetic changes (restored to main)
- if ($NvccPath) guard around CUDA env setup (not needed without version check)

Kept:
- Stale venv detection in setup.ps1: detects torch CUDA tag mismatch
  (cu124 vs cu130, cpu vs cuXXX, broken torch import) and rebuilds venv

* Fix stale venv detection: incomplete venvs, timeout, fatal delete failure

- Add 30s timeout for torch import probe via ProcessStartInfo/WaitForExit
- Use Test-Path -PathType Container to reject files masquerading as venv dir
- Trigger rebuild when python.exe is missing (incomplete venv)
- Make Remove-Item failure fatal ([ERROR] + exit 1) instead of warn-and-continue
- Move $expectedTorchTag computation inside -not $shouldRebuild guard

---------

Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-22 04:46:36 -07:00
Daniel Han
bcf28466c2
fix: exclude .ipynb from ruff pre-commit hook (#4521)
The ruff pre-commit hook runs on all file types by default, including
.ipynb notebooks. Colab notebooks are authored in Colab's editor and
can contain IPython magics (%cd, !git) that ruff cannot parse. This
causes pre-commit.ci to fail on unrelated PRs when a notebook on main
has syntax ruff does not understand.

Add `exclude: '\.ipynb$'` to the ruff hook so notebooks are skipped.
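
To see which paths the pattern skips, here is a small Python check. It assumes the hook's `exclude` value is applied as a Python regex searched against each file path; `is_skipped` is a hypothetical helper for illustration only.

```python
import re

# Same pattern as the hook's exclude value.
EXCLUDE = re.compile(r"\.ipynb$")


def is_skipped(path: str) -> bool:
    """True when the path matches the exclude pattern."""
    return EXCLUDE.search(path) is not None
```

The `$` anchor matters: a file such as `notes.ipynb.py` still gets linted because the suffix is not at the end of the path.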
2026-03-22 03:25:58 -07:00
Daniel Han
17dc83dc34 Created using Colab 2026-03-22 01:56:35 -07:00
Michael Han
d50e605f08
Update README.md 2026-03-21 14:55:18 -07:00
Sridhar Nandigam
a20c824711
FIX: Broken link to NVIDIA DataDesigner in README (#4500) 2026-03-21 14:41:09 -07:00
Wasim Yousef Said
50cccfd55e
feat(chat): server-side timings, context display & source hover cards (#4467)
* feat(chat): add server-side timings and context display for GGUF

Extract timings/usage metadata from llama-server SSE stream and forward
through the full stack. Replace client-side estimates with accurate
server-reported metrics (prompt eval, tok/s, token counts, cache hits).
Add context window usage bar to chat top nav.

* feat(chat): source badges with hover cards and 2-row collapse

- Add hover cards to source badges showing favicon, title, URL and
  snippet description on hover
- Limit source badges to 2 rows with +X more expand/collapse
- Parse snippet from web search results for hover card descriptions
- Replace individual Source rendering with grouped SourcesGroup component

* fix(chat): add null guards for server timings edge cases

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(chat): reset contextUsage on thread switch, remove unused context-display

* fix(chat): stop double-counting completion tokens in tool-calling path

* fix(chat): skip metadata events in llm_assist consumers

* fix(chat): hide context usage bar in compare mode

* fix(chat): harden timings pipeline and context usage persistence

Accumulate prompt_ms, predicted_ms, and predicted_n from intermediate
tool-detection passes so the final metadata reflects total server work.
Persist contextUsage in message metadata (Dexie) and restore on thread
load. Add type guard in gguf_stream_chunks for unexpected dict events.
Clear contextUsage when entering compare mode.

* feat(chat): make GGUF stream metadata OpenAI-compatible

* fix(chat): address PR review feedback

* feat(chat): address PR review feedback

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-20 23:42:01 -07:00
Wasim Yousef Said
dd283b0605
feat(studio): multi-file unstructured seed upload with better backend extraction (#4468)
* fix(recipe-studio): prevent fitView from zooming to wrong location on recipe load

* feat: add pymupdf/python-docx deps and unstructured uploads storage root

* feat: add POST /seed/upload-unstructured-file endpoint

* feat: add multi-file chunking with source_file column

* feat: update frontend types and API layer for multi-file upload

* feat: round-robin preview rows across source files

Ensures every uploaded file is represented in the preview table
by cycling through sources instead of just taking the first N rows.
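
A minimal sketch of the round-robin selection, assuming each row is a dict carrying the `source_file` column added earlier in this PR. The function name and signature are hypothetical, not the backend's API.

```python
from collections import defaultdict
from itertools import zip_longest


def round_robin_preview(rows, limit, key="source_file"):
    """Pick up to `limit` rows, cycling through sources in first-seen order."""
    if limit <= 0:
        return []
    groups = defaultdict(list)  # dicts keep insertion order
    for row in rows:
        groups[row[key]].append(row)

    missing = object()  # padding marker for sources that ran out of rows
    preview = []
    for tier in zip_longest(*groups.values(), fillvalue=missing):
        for row in tier:
            if row is missing:
                continue
            preview.append(row)
            if len(preview) == limit:
                return preview
    return preview
```

With three files and a limit of four, the first row of every file appears before any file contributes a second row.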

* fix: disable OCR, fix auto-load timing, fix persistence on reload

- Disable pymupdf4llm OCR with write_images=False, show_progress=False
- Replace onAllUploaded callback with useEffect that detects uploading→done
  transition (avoids stale closure reading empty file IDs)
- Fix importer to preserve file IDs from saved recipes instead of clearing
  (clearing only happens at share time via sanitizeSeedForShare)

* fix: harden unstructured upload with input validation and state fixes

Validate block_id/file_id with alphanumeric regex to prevent path
traversal, use exact stem match for file deletion, add error handling
for metadata writes and empty files, fix React stale closures and
object mutations in upload loop, and correct validation logic for
unstructured seed resolved_paths.
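
The ID validation could look like the sketch below. The exact pattern used by the endpoint is not shown in this message, so `[A-Za-z0-9]+` is an assumption, and `is_safe_id` is a hypothetical helper name.

```python
import re

# Assumed pattern: one or more ASCII letters or digits, nothing else.
_ID_RE = re.compile(r"[A-Za-z0-9]+")


def is_safe_id(value) -> bool:
    """Accept only plain alphanumeric IDs, so no '/', '\\' or '..' can slip in."""
    return isinstance(value, str) and _ID_RE.fullmatch(value) is not None
```

Using `fullmatch` rather than `search` is the important detail: a value like `abc/../etc` contains alphanumeric runs but must still be rejected.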

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: address PR review - legacy path import, share sanitizer, sync effect

Promote legacy source.path into resolved_paths for old unstructured
recipes, clear source.paths in share sanitizer to prevent leaking local
filesystem paths, and gate file sync effect to dialog open transition
so users can actually delete all uploaded files.

* fix: CSV column fix (BOM + whitespace + unnamed index re-save) for #4470

* fix: harden unstructured upload flow and polish dialog UX

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-20 13:22:42 -07:00
Michael Han
f113f3511d
Update Install method.md 2026-03-20 05:17:05 -07:00
Daniel Han
ef0491e0fe
Fix Windows installer Python detection and winget error handling (#4483)
* Fix Windows installer Python detection and winget error handling

The PowerShell installer crashes on some Windows machines due to two
issues:

1. Windows Store App Execution Aliases: Get-Command finds the stub at
   WindowsApps\python.exe, then python --version writes to stderr.
   With $ErrorActionPreference = "Stop" on PowerShell 5.1, stderr
   from native commands becomes a terminating error, killing the
   script before it tries to install Python.

2. winget "already installed" exit code: winget returns -1978335189
   (APPINSTALLER_CLI_ERROR_UPDATE_NOT_APPLICABLE) when the package is
   already at the latest version. The script treated any non-zero exit
   as failure. The fallback Get-Command check could also find the
   Store stub or fail if Python was partially uninstalled.
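
The negative exit code is a 32-bit HRESULT read as a signed integer. A short Python sketch of the conversion (`to_hresult` is a hypothetical helper) makes the value easier to look up:

```python
def to_hresult(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex HRESULT."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"
```

For the code quoted above, `to_hresult(-1978335189)` gives `0x8A15002B`, which is the form winget error listings normally use.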

Changes:

- Add Find-CompatiblePython helper that tries the py launcher first,
  then python3/python via Get-Command -All, explicitly skipping any
  WindowsApps stubs. All invocations wrapped in try-catch so stderr
  never triggers ErrorActionPreference.

- Replace exit-code-based winget error handling with outcome-based:
  re-detect Python after install, retry with --force if not found,
  show actionable manual install instructions on final failure.

- Deduplicate PATH entries in Refresh-SessionPath to prevent unbounded
  growth from repeated machine+user path prepending.
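
The PATH deduplication can be illustrated with an order-preserving filter. This Python sketch is not the PowerShell in Refresh-SessionPath; treating entries as case-insensitive and ignoring a trailing slash are assumptions made here for Windows-style paths.

```python
def dedupe_path(path_value: str, sep: str = ";") -> str:
    """Drop empty and repeated PATH entries, keeping the first occurrence."""
    seen = set()
    kept = []
    for entry in path_value.split(sep):
        # Assumed normalisation: ignore case and trailing slashes when comparing.
        key = entry.rstrip("\\/").lower()
        if not entry or key in seen:
            continue
        seen.add(key)
        kept.append(entry)
    return sep.join(kept)
```

Keeping the first occurrence preserves lookup priority, so prepending machine and user paths repeatedly no longer changes the result after the first pass.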

* Address reviewer feedback: wrap winget calls, remove blanket WindowsApps filter

Three fixes based on code review:

1. Wrap all winget install calls in $ErrorActionPreference = "Continue"
   blocks so that winget stderr (progress bars, warnings) does not
   become a terminating error on PowerShell 5.1. This matches the
   pattern already used in studio/setup.ps1 line 983.

2. Remove the blanket *\WindowsApps\* path filter that rejected all
   WindowsApps executables including valid Microsoft Store Python
   installs. Instead, rely on the existing try-catch + version regex
   probing to determine if a candidate is functional. Non-functional
   entries (App Execution Alias stubs) fail the try-catch and are
   skipped naturally.

3. Use $pyLauncher.Source (resolved path) instead of bare py name,
   add -CommandType Application to avoid matching aliases/functions,
   and derive winget package ID from $PythonVersion variable instead
   of hardcoding Python.Python.3.13.

* Add back WindowsApps filter for python3/python fallback path

The App Execution Alias stubs in WindowsApps can open the Microsoft
Store as a side effect when invoked, even though the try-catch handles
the error. Since the py launcher (tried first) already detects
legitimate Store Python -- Store packages include py since Python
3.11 -- filtering WindowsApps in the python3/python fallback is safe
and avoids the Store popup.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-20 02:01:23 -07:00
Leo Borcherding
239ca98643
fix: detect AMD/no-NVIDIA GPU early in Windows installer and guard unsloth.exe existence (#4478)
* fix(install.ps1): detect AMD/no-NVIDIA GPU early and guard unsloth.exe existence

When a user has an AMD GPU (no nvidia-smi), uv's --torch-backend=auto
resolves to CPU torch, which constrains the solver to unsloth==2024.8.
That ancient release has no unsloth.exe CLI entry point, so the subsequent
& \ studio setup call throws a confusing PowerShell
'module could not be loaded' CommandNotFoundException instead of a
clear error.

Two fixes:
- Detect nvidia-smi early; if no NVIDIA GPU is found, print a clear
  error explaining AMD/Intel GPUs are unsupported and exit before
  wasting time installing the wrong package version.
- Guard Test-Path \ before invoking it, so any future case
  where the CLI entry point is missing produces a readable error
  instead of a cryptic PowerShell exception.

Fixes: unsloth_studio\Scripts\unsloth.exe CommandNotFoundException
on AMD GPU systems (Windows).

* fix(install.ps1): correct GPU support message - AMD is Linux-only via ROCm

* Slim down to just the unsloth.exe existence guard

Remove the early NVIDIA GPU detection gate -- Studio supports Windows
and Mac without a GPU (finetuning is simply disabled). The GPU gate
was blocking legitimate non-NVIDIA users from installing.

Keep only the Test-Path guard on unsloth.exe before invoking it. This
turns the confusing PowerShell CommandNotFoundException into a clear
error message pointing at the likely cause (older unsloth version
resolved by the package solver that does not include the Studio CLI).

* Fix quickstart link in unsloth.exe guard message

---------

Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-20 01:48:45 -07:00
Roland Tannous
ebe45981dd
feat: support GGUF export for non-PEFT models + fix venv_t5 switching for local checkpoints (#4455)
* feat: support full model GGUF export, disable incompatible methods in UI

* fix: resolve base model from config.json for venv_t5 export switching

* feat: detect BNB-quantized models and disable all export methods for quantized non-PEFT checkpoints

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: relocate Ollama Modelfile alongside GGUFs during non-PEFT export cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-20 12:13:18 +04:00
Manan Shah
be901ecdea
Adding launch command to install scripts (#4477)
* Adding launch command to install scripts

* Making launch only for interactive env
2026-03-20 10:45:33 +04:00
Michael Han
07aabf45c0
Update Install instructions.md 2026-03-19 21:51:10 -07:00
Daniel Han
d0e5a1d61e
Fix macOS install.sh: stdin consumption and Python discovery (#4472)
* Fix macOS install.sh: stdin consumption and Python discovery

Two issues when running `curl | sh` on macOS:

1. Commands like `brew install` consume bytes from the piped stdin,
   causing the shell to lose its place in the script. The remaining
   source code gets printed as text instead of being executed, so
   users have to run the installer twice. Fixed by redirecting stdin
   from /dev/null for brew, apt-get, xcode-select, and the uv
   installer subprocess.

2. setup.sh searches for Python 3.11-3.13 on the system PATH via
   `compgen -c`. On macOS systems that only have Python 3.9 and/or
   3.14, this fails with "No Python version between 3.11 and 3.13
   found" even though uv already installed Python 3.13 into the
   venv. Fixed by adding the venv's bin/ to PATH before invoking
   `unsloth studio setup`.
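
The first fix is the shell idiom `cmd </dev/null`. The same idea in Python, shown only to illustrate the mechanism (the installer itself is a shell script, and `run_detached_from_stdin` is a hypothetical name):

```python
import subprocess
import sys


def run_detached_from_stdin(cmd):
    """Run a child with stdin bound to the null device, so it cannot read
    bytes that belong to the parent's own input stream."""
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        check=False,
    )


# A child that tries to read all of stdin sees immediate EOF.
result = run_detached_from_stdin(
    [sys.executable, "-c", "import sys; print(repr(sys.stdin.read()))"]
)
```

Because the child gets EOF at once, whatever is still queued on the parent's stdin (the rest of the piped script, in the `curl | sh` case) stays untouched.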

* Guard PATH export against empty VENV_ABS_BIN

If cd into the venv bin/ fails, VENV_ABS_BIN would be empty and
PATH would start with ":", causing the current directory to be
searched for executables. Wrap the export in a non-empty check.
2026-03-19 11:52:32 -07:00
Michael Han
29270a3726
Data recipes now works for Mac and CPU.md 2026-03-19 07:26:28 -07:00
Daniel Han
3faa9af148 Update _utils.py 2026-03-19 02:31:45 -07:00
Daniel Han
709a611356 Update README.md 2026-03-19 02:28:53 -07:00
Daniel Han
074a07981e Merge branch 'main' of https://github.com/unslothai/unsloth 2026-03-19 02:26:46 -07:00
Daniel Han
2b8bfa5b19 Update README.md 2026-03-19 02:26:18 -07:00
Datta Nimmaturi
729a0cb0ae
[studio] full finetuning studio (#4461)
* full finetuning studio

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update studio/backend/core/training/trainer.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-19 02:18:46 -07:00
Manan Shah
6f129a214b
Fix Install commands for Windows + 1 line installs (#4447)
* One liner setup for unsloth studio

* Fix install scripts: system deps, activation bugs, curl/wget support

- install.sh: detect platform (macOS/Linux/WSL) and check for missing
  system dependencies (cmake, git, build-essential, libcurl4-openssl-dev).
  Prompt user once for permission to install all missing packages via
  brew (macOS) or sudo apt-get (Linux/WSL). Add wget fallback via
  download() helper since curl is not always present on minimal Linux
  installs. Fix nested curl|sh stdin stealing by downloading uv installer
  to a tempfile first. Replace venv activation (no-op in a pipe subshell)
  with explicit --python flag for uv pip install and direct venv binary
  invocation. Add idempotency guard for venv creation. Redirect stdin
  on unsloth studio setup to prevent pipe consumption. On macOS, check
  for Xcode Command Line Tools and trigger install if missing.

- install.ps1: wrap script body in Install-UnslothStudio function so
  that errors use return instead of exit (exit kills the terminal when
  run via irm|iex). Remove activate.ps1 invocation entirely -- use
  explicit --python path for uv pip install and & $UnslothExe for
  studio setup. This avoids both the child-scope activation bug (& vs
  dot-source) and the execution policy error on default Windows systems.
  Add winget availability check with clear error message. Fix PATH
  refresh to append registry paths instead of replacing the session PATH.
  Add uv installer fallback via astral.sh PowerShell script if winget
  install does not put uv on PATH. Broaden Python version check to
  accept 3.11-3.13. Add idempotency guard for venv creation.

- README.md: add wget one-liner alternative for systems without curl.

* Fix Tailwind CSS v4 .gitignore bug on Windows (#4444)

- Add .gitignore hiding workaround to setup.ps1 (matching existing
  setup.sh logic) so venv .gitignore files containing "*" don't prevent
  Tailwind's oxide scanner from finding .tsx source files
- Add CSS size validation to setup.sh, setup.ps1, and build.sh to catch
  truncated Tailwind builds early
- Remove stray force-rebuild overrides that made the "skip build if
  current" cache check dead code in both setup scripts
- Add rm -rf dist to build.sh to force clean rebuilds for wheel packaging

* Change default port 8000 to 8888, fix installer bugs, improve UX

- Change default Studio port from 8000 to 8888 across all entry points
  (run.py, studio.py, ui.py, colab.py, vite.config.ts, setup scripts)
- Update launch banner: "Launching with studio venv..." to
  "Launching Unsloth Studio... Please wait..."
- Add "Open your web browser" banner and rename labels
  (Local -> Local Access, External -> Worldwide Web Address)
- Fix venv idempotency: check for bin/python instead of just directory
  existence, clean up partial venvs on retry
- Fix build.sh CSS validation: handle empty CSS case that silently
  bypassed the check with "integer expression expected"
- Fix install.sh sudo handling: try apt-get without sudo first (works
  when root), then escalate with per-package tracking and user prompt
- Fix install.ps1: check exit code from studio setup, fail on error
- Add pciutils to WSL GGUF build dependencies
- Apply same smart apt-get escalation pattern to studio/setup.sh

* Use detected Python version for venv, abort on non-apt Linux

- install.ps1: detect existing Python 3.11/3.12/3.13 and use that
  version for venv creation instead of always forcing 3.13
- install.sh: exit with error on non-apt Linux distros when required
  packages cannot be auto-installed, instead of silently continuing

* Make sudo permission prompt more prominent with warning banner

* Add Accept [Y/n] sudo prompt to studio/setup.sh for consistency

* Fix native command exit code handling and sudo decline flow

install.ps1: Add $LASTEXITCODE checks after winget (Python), uv venv,
and uv pip install calls. $ErrorActionPreference only catches PowerShell
cmdlet errors, not native executable failures. The Python check also
handles winget returning non-zero for "already installed".

setup.sh: Skip llama-server build when user declines sudo or sudo is
unavailable. Previously the script continued to section 8 which would
fail with confusing errors (e.g. "gcc: command not found") since
build-essential was never installed.

* Move rm -rf llama.cpp inside build branch to preserve existing install

When _SKIP_GGUF_BUILD is set (user declined sudo or sudo unavailable),
the previous rm -rf would destroy an already-working llama-server before
the skip check ran. Move it inside the else branch so existing builds
are preserved when the rebuild is skipped.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-19 02:09:09 -07:00
Wasim Yousef Said
6c2bfebb20
fix(studio): mobile navbar layout and chat settings sheet (#4458)
* fix(studio): mobile navbar layout and chat settings sheet

* fix(studio): portal select dropdowns inside sheet modal subtree
2026-03-19 02:04:53 -07:00
Manan Shah
72b768e0be
Fixing Qwen3.5 bug and adding Outetts dependencies (#4459)
* Fixing Qwen3.5 bug and adding Outetts dependencies

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-19 01:52:07 -07:00
Manan Shah
e793c378db
turning data recipes on for mac (#4454) 2026-03-19 11:11:27 +04:00
Michael Han
e6a42d0073
Update Install instructions.md 2026-03-18 20:13:48 -07:00
Michael Han
6f9d8ad4c3
Add BETA in README.md 2026-03-18 17:15:10 -07:00
Daniel Han
8b4a0f2191 Update README.md 2026-03-18 11:13:12 -07:00
Daniel Han
e0a9e772d1 Update README.md 2026-03-18 09:57:44 -07:00
Datta Nimmaturi
d4c8c0cb84
Make instructions mac friendly (#4432) 2026-03-18 09:48:02 -07:00
Daniel Han
28407a1742 Update _utils.py 2026-03-18 09:10:36 -07:00
Daniel Han
8582ce3e9c
Fix studio chat crash on Mac: vendor check_signal_escape_patterns (#4431)
* Fix studio crash on Mac: vendor check_signal_escape_patterns from unsloth_zoo

Vendor the `check_signal_escape_patterns` function from
`unsloth_zoo.rl_environments` directly into `tools.py`. The function is
pure Python (only uses stdlib `ast`) and has zero GPU dependencies, but
importing it from unsloth_zoo triggers `unsloth_zoo.__init__` which calls
`get_device_type()` at module scope -- raising NotImplementedError on
Apple Silicon Macs.

By vendoring the code, the safety checks still run on all platforms
(Mac, Linux, Windows) without needing unsloth_zoo at all.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-18 09:10:13 -07:00
Daniel Han
e38212281a
Fix TypeScript build errors in studio frontend (#4429)
- tool-ui-python.tsx: use explicit tuple type instead of `as const` to
  match the mutable `[BundledTheme, BundledTheme]` expected by Streamdown
- chat-adapter.ts: add missing `argsText` field required by
  ToolCallMessagePart and fix `args` type to use ReadonlyJSONObject
2026-03-18 08:44:38 -07:00
Michael Han
7d270825fb
Update README.md 2026-03-18 08:30:53 -07:00
Daniel Han
596fae1de2 Update _utils.py 2026-03-18 08:29:09 -07:00
Daniel Han
9c95148045
Fix tool call parsing, add tool outputs panel and UI improvements (#4416)
* Add elapsed timer to tool status pill in Studio

Show a count-up seconds timer (0s, 1s, 2s, ...) next to the tool status
text in the composer area. Helps users gauge how long a tool call (web
search, code execution) has been running. Timer resets when a new tool
starts and disappears when all tools finish.

* Fix tool call parsing, add tool outputs panel and reasoning copy button

Backend:
- Rewrite tool call XML parser to use balanced-brace JSON extraction
  instead of greedy regex, fixing truncation on nested braces in
  code/JSON arguments
- Handle optional closing tags (</tool_call>, </function>, </parameter>)
  that models frequently omit
- Support bare <function=...> tags without <tool_call> wrapper
- Strip tool call markup from streamed content so raw XML never leaks
  into the chat UI
- Use a persistent ~/studio_sandbox/ working directory for tool
  execution so files persist across calls within a session
- Emit tool_start/tool_end SSE events so the frontend can display
  tool inputs and outputs
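
A minimal sketch of balanced-brace JSON extraction, written under the assumption that the arguments are a single JSON object following the opening tag. `extract_json_object` is a hypothetical name and this is not the parser's actual code.

```python
import json
from typing import Optional


def extract_json_object(text: str, start: int = 0) -> Optional[str]:
    """Return the first balanced {...} span at or after `start`, or None.

    Braces inside JSON string literals are ignored, so code or nested JSON
    passed as an argument does not end the span early.
    """
    begin = text.find("{", start)
    if begin == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(begin, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[begin : i + 1]
    return None  # unbalanced, e.g. output truncated mid-call
```

Returning `None` on unbalanced input lets the caller wait for more streamed text instead of parsing a partial object, and the closing tag is never needed to find the end of the span.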

Frontend:
- Add collapsible "Tool Outputs" panel below assistant messages showing
  each tool call's input and output with copy buttons
- Add copy button to reasoning blocks
- Add elapsed timer to tool status pill
- Update project URLs in pyproject.toml (http -> https, add docs link)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add interactive HTML preview with fullscreen toggle for code blocks

HTML code fences now render an interactive sandboxed iframe preview
below the syntax-highlighted code, similar to how SVG fences show
an image preview. The iframe uses sandbox="allow-scripts" to allow
JavaScript execution while blocking access to the parent page.

Includes a fullscreen toggle (enlarge/minimize button) that expands
the preview into a viewport overlay, dismissible via button, Escape
key, or backdrop click. A streaming placeholder prevents partial
HTML from rendering mid-stream.

* Add tool call settings: auto-heal toggle, max iterations, timeout

Add three user-configurable tool call settings to the Studio Settings panel:

- Auto Heal Tool Calls: toggle to control fallback XML parsing of malformed
  tool calls from model output (default: on)
- Max Tool Calls Per Message: slider 0-40 + Max to cap tool call iterations
  per message (default: 10)
- Max Tool Call Duration: slider 1-30 minutes + Max to set per-tool-call
  execution timeout (default: 5 minutes)

All settings persist to localStorage and flow through the full stack:
frontend store -> API request -> Pydantic model -> route -> llama_cpp -> tools.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix tool call timeout: respect no-limit and apply to web search

- Use a sentinel to distinguish timeout=None (no limit) from the default
  (300s). Previously None was silently replaced with _EXEC_TIMEOUT.
- Pass the configured timeout to DDGS() for web searches so the setting
  applies uniformly to all tool types.
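
The sentinel pattern referred to above, as a small Python sketch. `_UNSET` and `resolve_timeout` are illustrative names; only the 300s default and the meaning of `None` come from this commit.

```python
_DEFAULT_TIMEOUT_S = 300.0
_UNSET = object()  # unique marker: "caller did not pass a timeout"


def resolve_timeout(timeout=_UNSET):
    """Return the effective timeout in seconds, or None for no limit."""
    if timeout is _UNSET:
        return _DEFAULT_TIMEOUT_S  # argument omitted: use the default
    return timeout  # explicit value, including None, is respected
```

With `timeout=None` as the default argument, "not given" and "no limit" would be indistinguishable, which is the bug the sentinel removes.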

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add input validation bounds and per-thread sandbox isolation

- Add ge=0 constraint to max_tool_calls_per_message (rejects negative values)
- Add ge=1 constraint to tool_call_timeout (minimum 1 second)
- Thread session_id from frontend through backend to tool execution
- Scope sandbox directories per conversation: ~/studio_sandbox/{thread_id}/
- Backwards compatible: API callers without session_id use ~/studio_sandbox/

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix non-monotonic streaming and Python temp script path

- Split tool markup stripping into closed-only (mid-stream) and full
  (final flush) to prevent cumulative text from shrinking mid-stream
- Enforce monotonicity: only emit when cleaned text grows, so the
  proxy's delta logic (cumulative[len(prev_text):]) never breaks
- Place Python temp scripts in the sandbox workdir instead of /tmp so
  sys.path[0] points to the sandbox and cross-call imports work
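
The monotonicity rule can be sketched as a small stateful filter. `MonotonicEmitter` is a hypothetical class, and the prefix check is an extra guard added here rather than something stated in the commit.

```python
class MonotonicEmitter:
    """Emit only the newly added suffix of a cumulative, cleaned text."""

    def __init__(self) -> None:
        self._emitted = ""

    def push(self, cleaned: str) -> str:
        """Return the delta to send, or '' when the text did not grow."""
        grew = len(cleaned) > len(self._emitted)
        # Extra guard (assumption): the new text must extend what was sent.
        if not grew or not cleaned.startswith(self._emitted):
            return ""
        delta = cleaned[len(self._emitted):]
        self._emitted = cleaned
        return delta
```

If stripping markup makes the cumulative text temporarily shorter, `push` returns an empty delta and keeps the previous high-water mark, so a downstream `cumulative[len(prev_text):]` slice never goes backwards.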

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sanitize session_id to prevent path traversal in sandbox

Strip path separators and parent-dir references from session_id before
using it as a directory name. Verify the resolved path stays under
~/studio_sandbox/ as a second guard.
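
Both guards together, as a hedged Python sketch. The commit strips separators and parent-dir references; this example uses a stricter character allowlist instead, which is an assumption, and `sandbox_dir` is a hypothetical name.

```python
import re
from pathlib import Path


def sandbox_dir(session_id, root):
    """Map a session ID to a directory that is guaranteed to sit under root."""
    root = Path(root).resolve()
    if not session_id:
        return root  # callers without a session share the root sandbox
    # Guard 1 (assumed allowlist): drop everything except letters, digits, _ and -.
    safe = re.sub(r"[^A-Za-z0-9_-]", "", session_id)
    if not safe:
        return root
    # Guard 2: the resolved path must still be inside root.
    candidate = (root / safe).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"session path escapes sandbox: {candidate}")
    return candidate
```

The second check looks redundant after sanitising, but it also covers cases the character filter cannot see, such as a symlink inside the sandbox pointing elsewhere.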

* feat(chat): proper assistant-ui tool call UIs with sources

Replace custom metadata-based ToolOutputsGroup with native assistant-ui
tool-call content parts. Backend SSE tool_start/tool_end events now emit
proper { type: "tool-call" } parts from the adapter, enabling per-tool
UIs registered via tools.by_name in MessagePrimitive.Parts.

- Web search: Globe icon, Source badges with favicons, auto-collapse
  when LLM starts responding
- Python: Code icon, syntax-highlighted code via Streamdown/shiki,
  output block with copy
- Terminal: Terminal icon, command in trigger, output with copy
- ToolGroup wraps consecutive tool calls (skips for single calls)
- Sources component renders URL badges at end of message
- Flattened code block CSS (single border, no nested boxes)

* fix(inference): respect empty enabled_tools allowlist

`if payload.enabled_tools:` is falsy for [], falling through to
ALL_TOOLS. Use `is not None` so an explicit empty list disables
all tools as intended.
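
The difference between the two checks in a self-contained Python sketch. `ALL_TOOLS` here holds placeholder names and `select_tools` is a hypothetical stand-in for the route logic.

```python
ALL_TOOLS = ["web_search", "python", "terminal"]  # placeholder tool names


def select_tools(enabled_tools=None):
    """None means 'no allowlist given'; [] means 'allow nothing'."""
    if enabled_tools is not None:
        return [tool for tool in ALL_TOOLS if tool in enabled_tools]
    return list(ALL_TOOLS)
```

A truthiness test treats `[]` and `None` alike, so an explicit empty allowlist would have enabled every tool.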

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Shine1i <wasimysdev@gmail.com>
2026-03-18 08:28:02 -07:00
Daniel Han
11b5e7abf3 Update README.md 2026-03-18 08:15:07 -07:00
Daniel Han
d45abae5b3 Update README.md 2026-03-18 08:12:20 -07:00
Daniel Han
7ddb660b0c
revert: always rebuild frontend, override caching with _NEED_FRONTEND_BUILD=true (#4427)
* revert: remove frontend build caching from setup scripts

The mtime-based caching introduced in #4404/#4413 can incorrectly skip
frontend builds -- e.g. after git pull when filesystem timestamps are
not preserved, or after our Tailwind v4 discovery that the site-packages
.gitignore must be hidden before vite build (which the cached path
doesn't handle).

Always rebuild the frontend on setup. The build takes ~15s and is
safer than risking a stale dist/.

* revert: disable frontend build caching, keep code commented out

Caching disabled by always setting _NEED_FRONTEND_BUILD=true.
The mtime-based logic is preserved in comments for future re-enabling.

Reasons for disabling:
- Git does not preserve file timestamps, so cached dist/ can appear
  newer than freshly checked-out source after a pull
- Tailwind v4 requires hiding site-packages/.gitignore before vite
  build; the cache path bypasses this, producing broken CSS

* revert: always rebuild frontend, remove mtime caching

* revert: always rebuild frontend, override caching with _NEED_FRONTEND_BUILD=true
2026-03-18 07:37:53 -07:00
Daniel Han
2a7646c4ca Update README.md 2026-03-18 07:27:04 -07:00
Daniel Han
e9fa12acd3 Update pyproject.toml 2026-03-18 07:26:40 -07:00
Daniel Han
1ab020115e Update pyproject.toml 2026-03-18 07:17:20 -07:00
Daniel Han
6bf81e4a48 Update README.md 2026-03-18 06:59:37 -07:00
Daniel Han
38217bcdcc Update README.md 2026-03-18 06:58:42 -07:00
Daniel Han
9c89d7b22b Update README.md 2026-03-18 06:52:27 -07:00
Daniel Han
7517e0fb2f Update README.md 2026-03-18 06:33:54 -07:00
Daniel Han
52f9c30513
fix: exclude nemotron_h from flex_attention (#4424)
* fix: exclude nemotron_h from flex_attention

NemotronHForCausalLM does not support flex_attention and raises:
  NotImplementedError: NemotronHForCausalLM does not support an
  attention implementation through torch's flex_attention.

Add nemotron_h to the exclusion list alongside gpt_oss and mllama
so Unsloth falls back to the default attention implementation.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-18 06:11:11 -07:00
Wasim Yousef Said
51c08ded9b
fix(studio): deduplicate context length validation and sync input with store (#4423) 2026-03-18 06:06:54 -07:00
Daniel Han
95bfc50b35
Fix inference stall during prefill (retry storm) (#4409)
* Fix inference stall during prefill by removing retry storm

The _stream_with_retry method used a 0.5s read timeout and retried by
sending a brand new POST request each time. During prompt prefill (which
can take 5-30+ seconds for long contexts or reasoning models), this
caused 10-60 duplicate requests that forced llama-server to restart
processing from scratch each time, resulting in 10-20s stalls visible
as "Generating" with no progress in the UI.

Fix: send the request ONCE with a 120s read timeout for the initial
response headers. Cancel support during the prefill wait is handled by
a background thread that monitors cancel_event (checked every 0.3s)
and closes the response to unblock the httpx read immediately. This
preserves the ability to stop/cancel/refresh during generation.

The existing 0.5s timeout on the httpx.Client is still used by
_iter_text_cancellable for per-token cancel checking during streaming
(after prefill), which is unaffected by this change.

* Fix race in cancel watcher when response is not yet created

When cancel_event fires before client.stream() returns (response is
still None), the watcher would hit return and exit without closing
anything. The main thread stays blocked for up to 120s.

Fix: after cancel is requested, keep polling _response_ref every 0.1s
until the response object appears (then close it) or _cancel_closed
is set (main thread finished on its own).
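
A rough sketch of the two-phase watcher described above, using only the standard library. The names `start_cancel_watcher`, `response_ref`, and `done_event` are illustrative (the commit's own `_response_ref` and `_cancel_closed` play similar roles), and `FakeResponse` stands in for the real httpx response.

```python
import threading


class FakeResponse:
    """Stand-in for an HTTP response; records that close() was called."""

    def __init__(self) -> None:
        self.closed = threading.Event()

    def close(self) -> None:
        self.closed.set()


def start_cancel_watcher(cancel_event, response_ref, done_event, poll_s=0.3):
    """Close the in-flight response once cancel is requested.

    response_ref is a dict the main thread fills with {"response": obj}
    as soon as the response object exists.
    """

    def watch():
        # Phase 1: wait for a cancel request, or stop if the stream finished.
        while not cancel_event.wait(poll_s):
            if done_event.is_set():
                return
        # Phase 2: cancel requested; the response may not exist yet.
        while not done_event.is_set():
            response = response_ref.get("response")
            if response is not None:
                try:
                    response.close()
                    return
                except Exception:
                    pass  # keep polling and retry the close
            done_event.wait(0.1)

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread
```

The second loop is what closes the race: a cancel that arrives before the response exists is remembered and applied as soon as the main thread publishes the response.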

* Minor cleanup: remove redundant None check, add debug logging in cancel watcher

Address Gemini review: cancel_event is guaranteed non-None when the
watcher thread runs, and logging the close exception aids debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Retry r.close() on failure instead of giving up

If r.close() raises, stay in the polling loop and retry rather than
returning and leaving the main thread blocked for up to 120s.

* fix: keep short read timeout during token streaming

The prefill_timeout (read=120s) was passed to client.stream(), which
applied to ALL reads -- not just the initial response headers. This
meant _iter_text_cancellable's ReadTimeout-based cancel checking was
broken during token streaming: the Stop button could take up to 120s
to respond instead of 0.5s.

Fix: keep the client's short read timeout (0.5s) for the stream call.
During prefill, catch ReadTimeout in a loop and re-check cancel_event
instead of re-sending the POST (which was the original retry storm).
Once the first bytes arrive, yield the response with a PrependStream
wrapper so iter_text() sees the buffered first chunk.

This preserves both:
- Fast cancel during prefill (via cancel watcher + ReadTimeout loop)
- Fast cancel during streaming (via _iter_text_cancellable's 0.5s
  ReadTimeout, which now fires correctly again)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: swap to short-timeout stream after prefill completes

Address two review issues:

1. _PrependStream did not inherit from httpx.SyncByteStream, so
   Response.iter_raw() would raise RuntimeError. Replaced with a
   _ShortTimeoutStream that inherits SyncByteStream properly.

2. client.stream() entry itself raises ReadTimeout during slow prefill
   (before headers arrive). The previous fix tried to catch this at
   the body-read level but missed the connection-level timeout.

New approach: keep the 120s read timeout for client.stream() so the
connection survives long prefills. Once headers arrive, replace the
response stream with _ShortTimeoutStream -- a wrapper that uses a
background reader thread and a Queue with a short get() timeout to
re-raise ReadTimeout at the original 0.5s interval. This way
_iter_text_cancellable's cancel-checking remains responsive during
token streaming while prefill gets the long timeout it needs.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: move _ShortTimeoutStream before LlamaCppBackend class

The class was placed inside LlamaCppBackend's body, splitting the
class in two and making _codec_mgr and other attributes unreachable.
Move it to module level before LlamaCppBackend.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: remove _ShortTimeoutStream, use watcher for all cancel

_ShortTimeoutStream had two critical issues:
1. Raising ReadTimeout from a generator kills it -- Python finalizes
   generators after an uncaught exception, so the next next() call
   hits StopIteration and streaming ends mid-response.
2. The unbounded Queue in the background reader loses backpressure,
   causing memory spikes with slow clients.
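
The first issue is ordinary Python generator behavior and can be reproduced without httpx. A minimal sketch, assuming a hypothetical `chunks()` generator and the built-in `TimeoutError` as a stand-in for `httpx.ReadTimeout`:

```python
def chunks():
    """Hypothetical stand-in for a streaming body iterator."""
    yield "first"
    # Stand-in for httpx.ReadTimeout raised from inside the generator.
    raise TimeoutError("simulated read timeout")
    yield "second"  # never reached


def drain(gen):
    """Pull from gen, treating timeouts as 'retry the read'."""
    received, timeouts = [], 0
    while True:
        try:
            received.append(next(gen))
        except TimeoutError:
            timeouts += 1  # the caller intends to retry
        except StopIteration:
            break  # the generator was finalized by the earlier exception
    return received, timeouts


received, timeouts = drain(chunks())
```

The retry after the timeout never sees "second": once the exception escapes the generator, the generator is finished.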

Simpler approach: use the 120s read timeout for the entire stream and
rely on the cancel watcher thread for all cancellation (both prefill
and streaming). The watcher closes the response on cancel_event,
which unblocks any blocking httpx read within ~0.3s. This eliminates
the need for short timeout tricks entirely.

Cancel latency:
- Prefill: ~0.3s (watcher polls cancel_event every 0.3s)
- Streaming: ~0.3s (same watcher mechanism)
- Both faster than the old 0.5s ReadTimeout approach
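
The watcher mechanism described in these commits could look roughly like the sketch below. All names (`FakeResponse`, `start_cancel_watcher`, `response_ref`, `done_event`) are hypothetical and the real code works on an httpx response, so treat this as an illustration of the polling and retry logic only:

```python
import threading
import time


class FakeResponse:
    """Hypothetical stand-in for an httpx.Response; only close() matters."""

    def __init__(self):
        self.closed = threading.Event()

    def close(self):
        self.closed.set()


def start_cancel_watcher(response_ref, cancel_event, done_event, poll=0.3):
    """Close the response once cancel_event is set.

    response_ref is a one-element list that the main thread fills in when
    the response object exists; done_event tells the watcher to exit when
    the stream finished on its own.
    """

    def watch():
        # Phase 1: wait for a cancel request or for normal completion.
        while not cancel_event.wait(poll):
            if done_event.is_set():
                return
        # Phase 2: cancel requested; poll until there is something to close.
        while not done_event.is_set():
            response = response_ref[0]
            if response is not None:
                try:
                    response.close()
                    return
                except Exception:
                    pass  # retry on the next poll instead of giving up
            time.sleep(0.1)

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread
```

Closing the response from a second thread is what unblocks the read in the main thread; the sketch does not model that part.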

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docs: clarify cancel limitations in _stream_with_retry

The docstrings claimed ~0.3s cancel in all cases, but httpx cannot
interrupt a blocked read before the response object exists. Update
the docstrings to accurately describe the behavior:

- Cancel during prefill (header wait) is deferred until headers arrive
- Cancel during streaming works via response.close() from the watcher
- _iter_text_cancellable docstring updated to reflect the watcher-based
  cancel mechanism instead of the old ReadTimeout polling

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-18 05:10:32 -07:00
Michael Han
0922a2bb17
Update README.md 2026-03-18 04:21:18 -07:00
Daniel Han
1f12ba16df
Combine studio setup fixes: frontend caching, venv isolation, Windows CPU support (#4413)
* Allow Windows setup to complete without NVIDIA GPU

setup.ps1 previously hard-exited if nvidia-smi was not found, blocking
setup entirely on CPU-only or non-NVIDIA machines. The backend already
supports CPU and MLX (Apple Silicon) in chat-only GGUF mode, and the
Linux/Mac setup.sh handles missing GPUs gracefully.

Changes:
- Convert the GPU check from a hard exit to a warning
- Guard CUDA toolkit installation behind $HasNvidiaSmi
- Install CPU-only PyTorch when no GPU is detected
- Build llama.cpp without CUDA flags when no GPU is present
- Update doc comment to reflect CPU support

* Cache frontend build across setup runs

Skip the frontend npm install + build if frontend/dist already exists.
Previously setup.ps1 nuked node_modules and package-lock.json on every
run, and both scripts always rebuilt even when dist/ was already present.

On a git clone editable install, the first setup run still builds the
frontend as before. Subsequent runs skip it, saving several minutes.
To force a rebuild, delete frontend/dist and re-run setup.

* Show pip progress for PyTorch download on Windows

The torch CUDA wheel is ~2.8 GB and the CPU wheel is ~300 MB. With
| Out-Null suppressing all output, the install appeared completely
frozen with no feedback. Remove | Out-Null for the torch install
lines so pip's download progress bar is visible. Add a size hint
so users know the download is expected to take a while.

Also moves the Triton success message inside the GPU branch so it
only prints when Triton was actually installed.

* Guard CUDA env re-sanitization behind GPU check in llama.cpp build

The CUDA_PATH re-sanitization block (lines 1020-1033) references
$CudaToolkitRoot which is only set when $HasNvidiaSmi is true and
the CUDA Toolkit section runs. On CPU-only machines, $CudaToolkitRoot
is null, causing Split-Path to throw:

  Split-Path : Cannot bind argument to parameter 'Path' because it is null.

Wrap the entire block in `if ($HasNvidiaSmi -and $CudaToolkitRoot)`.

* Rebuild frontend when source files are newer than dist/

Instead of only checking if dist/ exists, compare source file timestamps
against the dist/ directory. If any file in frontend/src/ is newer than
dist/, trigger a rebuild. This handles the case where a developer pulls
new frontend changes and re-runs setup -- stale assets get rebuilt
automatically.
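
The setup scripts do this in shell and PowerShell. The same timestamp comparison, sketched in Python with a hypothetical `frontend_needs_rebuild` helper and a throwaway directory layout:

```python
import os
import tempfile
from pathlib import Path


def frontend_needs_rebuild(frontend_dir):
    """Return True when dist/ is missing or any src/ file is newer than it."""
    frontend = Path(frontend_dir)
    dist = frontend / "dist"
    if not dist.is_dir():
        return True
    dist_mtime = dist.stat().st_mtime
    src = frontend / "src"
    if not src.is_dir():
        return False
    return any(
        path.is_file() and path.stat().st_mtime > dist_mtime
        for path in src.rglob("*")
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "dist").mkdir()
    source = root / "src" / "main.tsx"
    source.write_text("// placeholder\n")
    os.utime(root / "dist", (1_000, 1_000))
    os.utime(source, (500, 500))
    up_to_date = frontend_needs_rebuild(root)  # source older than dist/
    os.utime(source, (2_000, 2_000))
    stale = frontend_needs_rebuild(root)  # source newer than dist/
```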

* Fix cmake not found on Windows after winget install

Two issues fixed:

1. After winget installs cmake, Refresh-Environment may not pick up the
   new PATH entry (MSI PATH changes sometimes need a new shell). Added a
   fallback that probes cmake's default install locations (Program Files,
   LocalAppData) and adds the directory to PATH explicitly if found.

2. If cmake is still unavailable when the llama.cpp build starts (e.g.
   winget failed silently or PATH was not updated), the build now skips
   gracefully with a [SKIP] warning instead of crashing with
   "cmake : The term 'cmake' is not recognized".

* Fix frontend rebuild detection and decouple oxc-validator install

Address review feedback:

- Check entire frontend/ directory for changes, not just src/.
  The build also depends on package.json, vite.config.ts,
  tailwind.config.ts, public/, and other config files. A change
  to any of these now triggers a rebuild.
- Move oxc-validator npm install outside the frontend build gate
  in setup.sh so it always runs on setup, matching setup.ps1
  which already had it outside the gate.

* Show cmake errors on failure and retry CUDA VS integration with elevation

Two fixes for issue #4405 (Windows setup fails at cmake configure):

1. cmake configure: capture output and display it on failure instead of
   piping to Out-Null. When the error mentions "No CUDA toolset found",
   print a hint about the CUDA VS integration files.

2. CUDA VS integration copy: when the direct Copy-Item fails (needs
   admin access to write to Program Files), retry with Start-Process
   -Verb RunAs to prompt for elevation. This is the root cause of the
   "No CUDA toolset found" cmake failure -- the .targets files that let
   MSBuild compile .cu files are missing from the VS BuildCustomizations
   directory.

* Address reviewer feedback: cmake PATH persistence, stale cache, torch error check

1. Persist cmake PATH to user registry so Refresh-Environment cannot
   drop it later in the same setup run. Previously the process-only
   PATH addition at phase 1 could vanish when Refresh-Environment
   rebuilt PATH from registry during phase 2/3 installs.

2. Clean stale CMake cache before configure. If a previous run built
   with CUDA and the user reruns without a GPU (or vice versa), the
   cached GGML_CUDA value would persist. Now the build dir is removed
   before configure.

3. Explicitly set -DGGML_CUDA=OFF for CPU-only builds instead of just
   omitting CUDA flags. This prevents cmake from auto-detecting a
   partial CUDA installation.

4. Fix CUDA cmake flag indentation -- was misaligned from the original
   PR, now consistently indented inside the if/else block.

5. Fail hard if pip install torch returns a non-zero exit code instead
   of silently continuing with a broken environment.

* Remove extra CUDA cmake flags to align Windows with Linux build

Drop GGML_CUDA_FA_ALL_QUANTS, GGML_CUDA_F16, GGML_CUDA_GRAPHS,
GGML_CUDA_FORCE_CUBLAS, and GGML_CUDA_PEER_MAX_BATCH_SIZE flags.
The Linux build in setup.sh only sets GGML_CUDA=ON and lets llama.cpp
use its defaults for everything else. Keep Windows consistent.

* Address reviewer round 2: GPU probe fallback, Triton check, stale binary rebuild

1. GPU detection: fallback to default nvidia-smi install locations
   (Program Files\NVIDIA Corporation\NVSMI, System32) when nvidia-smi
   is not on PATH. Prevents silent CPU-only provisioning on machines
   that have a GPU but a broken PATH.

2. Triton: check $LASTEXITCODE after pip install and print [WARN]
   on failure instead of unconditional [OK].

3. Stale llama-server: check CMakeCache.txt for GGML_CUDA setting
   and rebuild if the existing binary does not match the current GPU
   mode (e.g. CUDA binary on a now-CPU-only rerun, or vice versa).
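
The stale-binary check in item 3 amounts to reading one entry from CMakeCache.txt. A sketch in Python, assuming the usual `KEY:TYPE=VALUE` cache line format; the function names are hypothetical and the real check lives in setup.ps1:

```python
import re


def cached_cuda_enabled(cache_text):
    """Return True/False for the cached GGML_CUDA value, or None if absent."""
    # Anchored so that entries such as GGML_CUDA_F16 do not match.
    match = re.search(r"^GGML_CUDA(?::\w+)?=(.*)$", cache_text, re.MULTILINE)
    if match is None:
        return None
    return match.group(1).strip().upper() in {"ON", "1", "TRUE", "YES", "Y"}


def needs_rebuild(cache_text, has_gpu):
    """Rebuild when the cached mode is unknown or differs from the GPU mode."""
    cached = cached_cuda_enabled(cache_text)
    return cached is None or cached != has_gpu


sample_cache = (
    "CMAKE_BUILD_TYPE:STRING=Release\n"
    "GGML_CUDA_F16:BOOL=OFF\n"
    "GGML_CUDA:BOOL=ON\n"
)
```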

* Fix frontend rebuild detection and npm dependency issues

Addresses reviewer feedback on the frontend caching logic:

1. setup.sh: Fix broken find command that caused exit under pipefail.
   The piped `find | xargs find -newer` had paths after the expression
   which GNU find rejects. Replaced with a simpler `find -maxdepth 1
   -type f -newer dist/` that checks ALL top-level files (catches
   index.html, bun.lock, etc. that the extension allowlist missed).

2. setup.sh: Guard oxc-validator npm install behind `command -v npm`
   check. When the frontend build is skipped (dist/ is cached), Node
   bootstrap is also skipped, so npm may not be available.

3. setup.ps1: Replace Get-ChildItem -Include with explicit path
   probing for src/ and public/. PowerShell's -Include without a
   trailing wildcard silently returns nothing, so src/public changes
   were never detected. Also check ALL top-level files instead of
   just .json/.ts/.js/.mjs extensions.

* Fix studio setup: venv isolation, centralized .venv_t5, uv targeting

- All platforms (including Colab) now create ~/.unsloth/studio/.venv
  with --without-pip fallback for broken ensurepip environments
- Add --python sys.executable to uv pip install in install_python_stack.py
  so uv targets the correct venv instead of system Python
- Centralize .venv_t5 bootstrap in transformers_version.py with proper
  validation (checks required packages exist, not just non-empty dir)
- Replace ~150 lines of duplicated install code across 3 worker files
  with calls to the shared _ensure_venv_t5_exists() helper
- Use uv-if-present with pip fallback; do not install uv at runtime
- Add site.addsitedir() shim in colab.py so notebook cells can import
  studio packages from the venv without system-Python double-install
- Update .venv_t5 packages: huggingface_hub 1.3.0->1.7.1, add hf_xet
- Bump transformers pin 4.57.1->4.57.6 in requirements + constraints
- Add Fast-Install helper to setup.ps1 with uv+pip fallback
- Keep Colab-specific completion banner in setup.sh
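
The "uv-if-present with pip fallback" and `--python sys.executable` points can be sketched as a command builder. `build_install_cmd` is a hypothetical name, not the helper in install_python_stack.py:

```python
import shutil
import sys


def build_install_cmd(packages, uv_path=None, python=sys.executable):
    """Build an install command: uv when available, pip otherwise."""
    if uv_path:
        # --python makes uv target this interpreter's environment
        # instead of whatever Python it would discover on its own.
        return [uv_path, "pip", "install", "--python", python, *packages]
    return [python, "-m", "pip", "install", *packages]


# shutil.which returns None when uv is not on PATH, selecting the pip branch.
cmd = build_install_cmd(["transformers==4.57.6"], uv_path=shutil.which("uv"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; nothing here installs uv itself.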

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix nvidia-smi PATH persistence and cmake requirement for CPU-only

1. Store nvidia-smi as an absolute path ($NvidiaSmiExe) on first
   detection. All later calls (Get-CudaComputeCapability,
   Get-PytorchCudaTag, CUDA toolkit detection) use this absolute
   path instead of relying on PATH. This survives Refresh-Environment
   which rebuilds PATH from the registry and drops process-only
   additions.

2. Make cmake fatal for CPU-only installs. CPU-only machines depend
   entirely on llama-server for GGUF chat mode, so reporting "Setup
   Complete!" without it is misleading. GPU machines can still skip
   the llama-server build since they have other inference paths.

* Fix broken frontend freshness detection in setup scripts

- setup.sh: Replace broken `find | xargs find -newer` pipeline with
  single `find ... -newer` call. The old pipeline produced "paths must
  precede expression" errors (silently suppressed by 2>/dev/null),
  causing top-level config changes to never trigger a rebuild.
- setup.sh: Add `command -v npm` guard to oxc-validator block so it
  does not fail when Node was not installed (build-skip path).
- setup.ps1: Replace `Get-ChildItem -Include` (unreliable without
  -Recurse on PS 5.1) with explicit directory paths for src/ and
  public/ scanning.
- Both: Add *.html to tracked file patterns so index.html (Vite
  entry point) changes trigger a rebuild.
- Both: Use -print -quit instead of piping to head -1 for efficiency.

* Fix bugs found during review of PRs #4404, #4400, #4399

- setup.sh: Add || true guard to find command that checks frontend/src
  and frontend/public dirs, preventing script abort under set -euo
  pipefail when either directory is missing

- colab.py: Use sys.path.insert(0, ...) instead of site.addsitedir()
  so Studio venv packages take priority over system copies. Add warning
  when venv is missing instead of silently failing.

- transformers_version.py: _venv_t5_is_valid() now checks installed
  package versions via .dist-info metadata, not just directory presence.
  Prevents false positives from stale or wrong-version packages.

- transformers_version.py: _install_to_venv_t5() now passes --upgrade
  so pip replaces existing stale packages in the target directory.

- setup.ps1: CPU-only PyTorch install uses --index-url for cpu wheel
  and all install commands use Fast-Install (uv with pip fallback).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix _venv_t5_is_valid dist-info loop exiting after first directory

Remove premature break that caused the loop over .dist-info directories
to exit after the first match even if it had no METADATA file. Now
continues iterating until a valid METADATA is found or all dirs are
exhausted.
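
A sketch of the corrected loop shape, assuming a hypothetical `installed_version` helper and a simplified name match on the directory prefix (the actual `_venv_t5_is_valid` may differ):

```python
import tempfile
from pathlib import Path


def installed_version(target_dir, package):
    """Return the version from the first readable METADATA, else None."""
    prefix = package.lower().replace("-", "_") + "-"
    for info_dir in sorted(Path(target_dir).glob("*.dist-info")):
        if not info_dir.name.lower().startswith(prefix):
            continue
        metadata = info_dir / "METADATA"
        if not metadata.is_file():
            continue  # keep scanning; breaking out here was the bug
        for line in metadata.read_text(encoding="utf-8").splitlines():
            if line.startswith("Version:"):
                return line.split(":", 1)[1].strip()
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Leftover directory without METADATA, followed by a valid one.
    (root / "huggingface_hub-1.3.0.dist-info").mkdir()
    good = root / "huggingface_hub-1.7.1.dist-info"
    good.mkdir()
    (good / "METADATA").write_text(
        "Name: huggingface_hub\nVersion: 1.7.1\n", encoding="utf-8"
    )
    found = installed_version(root, "huggingface_hub")
    missing = installed_version(root, "tiktoken")
```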

* Capture error output on failure instead of discarding with Out-Null

setup.ps1: 6 locations changed from `| Out-Null` to `| Out-String` with
output shown on failure -- PyTorch GPU/CPU install, Triton install,
venv_t5 package loop, cmake llama-server and llama-quantize builds.

transformers_version.py: clean stale .venv_t5 directory before reinstall
when validation detects missing or version-mismatched packages.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix ModuleNotFoundError when CLI imports studio.backend.core

The backend uses bare "from utils.*" imports everywhere, relying on
backend/ being on sys.path. Workers and routes add it at startup, but
the CLI imports studio.backend.core as a package -- backend/ was never
added. Add sys.path setup at the top of core/__init__.py so lazy
imports resolve correctly regardless of entry point.
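
A minimal sketch of that kind of sys.path setup, using a hypothetical `ensure_on_sys_path` helper; the exact code in core/__init__.py is not shown in this log:

```python
import sys
from pathlib import Path


def ensure_on_sys_path(directory):
    """Prepend directory to sys.path once so bare imports can resolve."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Inside a package __init__.py the argument would typically be derived
# from __file__, e.g. Path(__file__).resolve().parent.parent.
added = ensure_on_sys_path("hypothetical_backend_dir")
```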

Fixes: unsloth inference unsloth/Qwen3-8B "who are you" crashing with
"No module named 'utils'"

* Fix frontend freshness check to detect all top-level file changes

The extension allowlist (*.json, *.ts, *.js, *.mjs, *.html) missed
files like bun.lock, so lockfile-only dependency changes could skip
the frontend rebuild. Check all top-level files instead.

* Add tiktoken to .venv_t5 for Qwen-family tokenizers

Qwen models use tiktoken-based tokenizers which fail when routed through
the transformers 5.x overlay without tiktoken installed. Add it to the
setup scripts (with deps for Windows) and runtime fallback list.

Integrates PR #4418.

* Fix tiktoken crash in _venv_t5_is_valid and stray brace in setup.ps1

_venv_t5_is_valid() crashed with ValueError on unpinned packages like
"tiktoken" (no ==version). Handle by splitting safely and skipping
version check for unpinned packages (existence check only).
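
A ValueError of this kind is what an unpacking such as `name, version = spec.split("==")` raises when there is no `==` in the string; whether that was the exact original line is an assumption. A sketch of the safe split, with hypothetical helper names:

```python
def split_requirement(spec):
    """Split 'name==version' into (name, version); version is None if unpinned."""
    name, sep, version = spec.partition("==")
    return name.strip(), (version.strip() if sep else None)


def requirement_satisfied(spec, installed):
    """installed maps package name -> installed version string."""
    name, wanted = split_requirement(spec)
    if name not in installed:
        return False
    # Unpinned packages only need to exist.
    return wanted is None or installed[name] == wanted
```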

Also remove stray closing brace in setup.ps1 tiktoken install block.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-18 03:52:25 -07:00
Wasim Yousef Said
7b07ad0fa3
fix(studio): UI fixes for chat and studio routes (#4419)
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-18 03:47:13 -07:00
Daniel Han
65acefd2a6
feat(studio): infinite scroll for recommended models list (#4414)
* feat(studio): infinite scroll for recommended models list

The model selector showed a hard cap of 4 GGUFs + 4 safetensors in the
Recommended section. Users who wanted to browse more had to search
manually on Hugging Face.

Backend: increase the default model pool from 8+8 to 40+40 (the HF
fetch already pulls 80, so no extra network cost).

Frontend: replace the static 4+4 cap with on-demand lazy loading.
A page counter tracks how many groups of 4 to show per category.
An IntersectionObserver on a sentinel div at the bottom of the list
increments the page when the user scrolls down. Models are interleaved
in groups of 4 GGUFs then 4 hub models per page for a balanced view.
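
The paging rule is implemented in the TypeScript frontend; the sketch below restates it in Python for consistency with the other examples in this log. `visible_models` and the sample IDs are hypothetical:

```python
def visible_models(gguf_ids, hub_ids, pages, group_size=4):
    """Interleave page by page: a group of GGUFs, then a group of hub models."""
    visible = []
    for page in range(pages):
        start, end = page * group_size, (page + 1) * group_size
        visible.extend(gguf_ids[start:end])
        visible.extend(hub_ids[start:end])
    return visible


gguf = [f"g{i}" for i in range(6)]
hub = [f"h{i}" for i in range(5)]
first_page = visible_models(gguf, hub, pages=1)
two_pages = visible_models(gguf, hub, pages=2)
```

Each scroll to the sentinel would increment `pages` by one, so the visible list only ever grows at the end.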

Key implementation details:
- Callback ref for the sentinel so the observer attaches reliably on
  first popover open (useRef would miss the initial mount)
- Observer disconnects after each fire and re-attaches via useEffect
  with a 100ms layout delay to prevent runaway page loading
- VRAM info fetched incrementally via useRecommendedModelVram on the
  visible slice only
- recommendedSet uses visible IDs so HF search dedup stays correct

* refactor: address review feedback on recommended infinite scroll

- Simplify visibleRecommendedIds: use findIndex to locate the GGUF/hub
  split point instead of re-filtering the entire array each time.
  recommendedIds is already sorted GGUF-first, so a single slice is
  enough.

- Fix VRAM refetch churn: pass the full recommendedIds (stable across
  page increments) to useRecommendedModelVram instead of the growing
  visibleRecommendedIds slice. The hook derives its stableKey from the
  sorted+joined input, so passing the same pool on every page avoids
  redundant HF modelInfo requests.
2026-03-18 03:17:01 -07:00
Michael Han
67d3519cab
Update README.md 2026-03-17 23:04:54 -07:00
Daniel Han
767c31b0e0 Update README.md 2026-03-17 22:53:11 -07:00
Daniel Han
24753290ba Update README.md 2026-03-17 22:50:55 -07:00
Lee Jackson
9232126734
fix(studio): use explicit Cancel for model load toast (#4377) 2026-03-17 22:39:51 -07:00
Daniel Han
f3f52e2d84
Use blobless clone in README install instructions (#4403)
Reduces clone size from ~50MB to ~5MB by skipping blobs that are
no longer in the current tree but still in git history.
2026-03-17 22:07:21 -07:00
Daniel Han
3a28446a54
Trim ~255 MB of unused packages from Studio setup (#4395)
* Comment out large unused packages from Studio setup requirements

Audited all packages installed by `unsloth studio setup` against actual
imports in unsloth, unsloth_zoo, and studio/backend. The following have
zero imports anywhere and are the largest offenders by disk size:

- gradio (148 MB) in studio.txt -- Studio uses React + FastAPI, not Gradio
- executorch (41.5 MB) in extras-no-deps.txt -- no imports found
- scikit-learn (31.8 MB) in extras.txt -- no imports found
- MeCab (19.9 MB) in extras.txt -- Japanese tokenizer, no imports found
- coremltools (10.2 MB) in extras.txt -- Apple CoreML, no imports found
- uroman (4.0 MB) in extras.txt -- romanization tool, no imports found

Total savings: ~255 MB (~32% of the 805 MB installed by setup).

Each line is commented out with the package size annotated so they can be
re-enabled easily if needed in the future.

* Restore scikit-learn -- needed by sentence_transformers

sentence_transformers is installed with --no-deps in extras-no-deps.txt,
so its sklearn dependency is not auto-resolved. Multiple modules in
sentence_transformers import sklearn at the top level (evaluation,
util/similarity), so removing scikit-learn would break embedding jobs.
2026-03-17 21:32:38 -07:00
Coenraad Loubser
ca87669937
Unused return value causes build failures (#4385)
* Unused return value causes build failures

* Update toast messages to include model loading status
2026-03-17 20:57:27 -07:00
DoubleMathew
fd72376a7e
Fix/studio full finetuning (#4391)
* Wire Studio full finetuning into training loaders

* Preserve load_model positional compatibility
2026-03-17 20:47:26 -07:00
Daniel Han
0c8d407793
Rename cli/ to unsloth_cli/ to fix namespace collision with stringzilla (#4393)
* Rename cli/ to unsloth_cli/ to fix namespace collision with stringzilla

stringzilla installs a namespace package at cli/ (cli/split.py, cli/wc.py)
in site-packages without an __init__.py. When unsloth is installed as an
editable package (pip install -e .), the entry point script does
`from cli import app` which finds stringzilla's namespace cli/ first and
fails with `ImportError: cannot import name 'app' from 'cli'`.

Non-editable installs happened to work because unsloth's cli/__init__.py
overwrites the namespace directory, but this is fragile and breaks if
stringzilla is installed after unsloth.

Renaming to unsloth_cli/ avoids the collision entirely and fixes both
editable and non-editable install paths.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update stale cli/ references in comments and license files

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-17 20:40:21 -07:00
Michael Han
75da2e00c2
Update install instructions.md 2026-03-17 20:04:04 -07:00
Michael Han
8bca62aa78
Dual License clarification.md 2026-03-17 18:48:00 -07:00
Michael Han
e138a3d48b
Update install instructions.md 2026-03-17 16:08:40 -07:00
Wasim Yousef Said
03736a82ba
Relax frontend unused local check (#4388) 2026-03-17 16:04:11 -07:00
Michael Han
523ebf1e2f
Update Unsloth_Studio_Colab.ipynb 2026-03-17 15:42:38 -07:00
Manan Shah
93ab09d195
[Feature] compare for 2 diff models (#4356)
* compare for 2 diff models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolving gemini comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(studio): refine model-load toast stop action and compare selector sizing (#4369)

Co-authored-by: imagineer99 <samleejackson0@gmail.com>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: imagineer99 <samleejackson0@gmail.com>
2026-03-17 22:58:34 +04:00
Michael Han
2a8d6b2b82
Update Unsloth_Studio_Colab.ipynb 2026-03-17 11:31:09 -07:00
Michael Han
01dfbab5a8
Update Unsloth_Studio_Colab.ipynb 2026-03-17 11:27:04 -07:00
Michael Han
a46cb120bb
Update Unsloth_Studio_Colab.ipynb 2026-03-17 11:24:54 -07:00
Michael Han
051b6f27f9
Update Unsloth_Studio_Colab.ipynb 2026-03-17 11:16:20 -07:00
Michael Han
a943705b4c
Update Unsloth_Studio_Colab.ipynb 2026-03-17 11:14:45 -07:00
Michael Han
685a0348e1
Update Unsloth_Studio_Colab.ipynb 2026-03-17 10:00:57 -07:00
Michael Han
881e057964
Unsloth Studio update.md 2026-03-17 08:42:03 -07:00
Daniel Han
880b59a301 Update README.md 2026-03-17 08:03:32 -07:00
Michael Han
deb76dfa1d
Update README.md 2026-03-17 07:57:46 -07:00
Daniel Han
1fffd0e17a Merge branch 'main' of https://github.com/unslothai/unsloth 2026-03-17 07:54:41 -07:00
Daniel Han
ebfaa18094 Update pyproject.toml 2026-03-17 07:54:32 -07:00
Michael Han
c60636695c
Unsloth Studio.md 2026-03-17 07:53:50 -07:00
Daniel Han
0acd1c7eec
studio: improve onboarding UX, tooltips, and training defaults (#4355)
* studio: improve onboarding UX, tooltips, and training defaults

- Change splash text to "Train and run LLMs locally"
- Add "Chat Only" card with BubbleChatIcon to skip directly to chat
- Add Skip/Skip to Chat buttons in sidebar and footer
- Back button on step 1 returns to splash screen instead of being disabled
- Change "Watch video guide" to "Get started with our guide" with new URL
- Update intro text to mention all model types + chat
- Make all tooltips clickable (in addition to hover) via React context
- Strip surrounding quotes from pasted HF tokens
- Rename "Eval Split" to "Evaluation Split"
- Add SparklesIcon to "Auto Detect" format option
- Change step 4 heading to "Choose your training parameters"
- Default max_steps to 60
- Learning rate displayed in scientific notation with +/- stepper
- Context length options capped by model's max_position_embeddings (via AutoConfig)
- Fix "QLORA"/"LORA" to "QLoRA"/"LoRA" in summary step
- Backend: add max_position_embeddings to model config endpoint

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* compare for 2 diff models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolving gemini comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: disable thinking for Qwen3.5 <9B and always for AI Assist

- Change Qwen3.5 thinking threshold from <=2B to <9B (0.8B, 2B, 4B
  all disable thinking by default; 9B+ enables it)
- Always pass enable_thinking=False in AI Assist helper calls
  (_run_with_helper and _generate_with_backend) regardless of chat
  thinking settings

* studio: address PR review comments

- Extract _get_max_position_embeddings helper to DRY config extraction
- Fix "Skip to Chat" to navigate to /chat on step 1 (was /studio)

* fix: comment out debug print statements

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: skip Shiki highlighting for incomplete SVG code fences

While streaming SVG content, the syntax highlighter (Shiki) re-parses
the entire growing SVG on every token, blocking the main thread and
freezing the code area until the fence closes. Show a plain-text
preview for incomplete SVG fences instead, similar to how Mermaid
diagrams show a placeholder while streaming.

* studio: fix default top_k from 50/40 to 20 for chat inference

Per Qwen3.5 docs (unsloth.ai/docs/models/qwen3.5), top_k should be 20
for both thinking and non-thinking modes. The model-specific config in
inference_defaults.json already had top_k=20 for Qwen3.5, but the
generic fallback defaults were wrong:
- Frontend DEFAULT_INFERENCE_PARAMS.topK: 50 -> 20
- Backend generate_chat_completion top_k: 40 -> 20
- Backend generate_chat_completion_with_tools top_k: 40 -> 20
- Frontend title generation top_k: 40 -> 20

* studio: set universal inference defaults for unknown models

Default params for any model without specific config:
  temperature=0.6, top_p=0.95, top_k=20, min_p=0.01,
  presence_penalty=0.0, repetition_penalty=1.0

Models with entries in inference_defaults.json (Qwen3.5, Gemma-3,
Llama, etc.) override these with their recommended values.

Updated in: frontend DEFAULT_INFERENCE_PARAMS, backend Pydantic
request models, and backend generate_chat_completion defaults.
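
The override order can be sketched as a plain dictionary merge. The default values are the ones listed above; `resolve_inference_params` and the example override are hypothetical:

```python
UNIVERSAL_DEFAULTS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.01,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0,
}


def resolve_inference_params(model_overrides=None):
    """Start from the universal defaults and let model-specific values win."""
    params = dict(UNIVERSAL_DEFAULTS)
    params.update(model_overrides or {})
    return params


# Hypothetical override entry, for illustration only.
example = resolve_inference_params({"temperature": 1.0})
```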

* studio: only trust_remote_code for unsloth/ models in AutoConfig

Only set trust_remote_code=True when the model name starts with
"unsloth/". All other models default to False for safety.

* studio: move Generating spinner above the composer

The "Generating" spinner was below the send message bar, causing
the bar to jump up and down. Move it above the composer in both
the regular thread view and the welcome/empty view.

* studio: adjust toast close button position away from edge

Move the X close button on toasts (like "Starting model...") from
top-1.5 to top-3 and add right-3, giving more breathing room from
the top-right corner.

* studio: make Think button smaller with tighter icon-text gap

Reduce gap from 1.5 to 0.5, padding from px-2.5/py-1 to px-2/py-0.5,
and icon from size-3.5 to size-3.

* studio: multiple onboarding and chat UX improvements

- Move Generating spinner above composer (fixes jumping send bar)
- Make Think button smaller with tighter icon-text gap
- Chat card now inside grid (same size as Audio/Embeddings cards)
- Rename "Chat Only" to "Chat"
- Chat card requires Continue to proceed (no auto-advance)
- Continue on Chat selection skips onboarding and goes to /chat
- Tooltip (i) click on Chat card doesn't trigger navigation
- Step 1 footer Back button goes back to splash (label is "Back")
- Splash "Skip Onboarding" renamed to "Skip to Chat", navigates to /chat
- Toast close button moved away from edge

* studio: align Skip to Chat button, add Skip to footer

- Sidebar "Skip to Chat" now uses primary (green) Button style with
  arrow icon, full width, aligned like step items. Shows on all steps.
- Footer: added "Skip" outline button next to Continue that goes
  directly to /studio with progress saved (markOnboardingDone)

* studio: change default max steps from 30 to 60 in toggle hook

The DEFAULT_MAX_STEPS in use-max-steps-epochs-toggle.ts was still 30,
used as fallback when toggling from epochs back to max steps.

* studio: extend context length options to 262K

CONTEXT_LENGTHS now includes 65536, 131072, 262144 in addition to
the existing 512-32768 range. The onboarding step filters these by
the model's max_position_embeddings (e.g. Nemotron-3-Nano-4B has
262144), showing powers of 2 up to the model's maximum.
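
A sketch of the filter in Python (the real list lives in the frontend). The behavior when `max_position_embeddings` is unknown is an assumption, and `context_length_options` is a hypothetical name:

```python
# Powers of two from 512 to 262144, matching the range described above.
CONTEXT_LENGTHS = [512 * 2**i for i in range(10)]


def context_length_options(max_position_embeddings=None):
    """Return the selectable context lengths for a model."""
    if max_position_embeddings is None:
        return list(CONTEXT_LENGTHS)  # assumption: no cap when unknown
    return [n for n in CONTEXT_LENGTHS if n <= max_position_embeddings]
```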

* studio: auto-select LoRA vs QLoRA based on model size and GPU memory

After selecting a model in onboarding, detect the total model weight
file size from HF Hub (safetensors/bin files). Then estimate memory
needed: model_size_gb * 1.5 * context_scale, where context_scale is:
  - <=8192 tokens: 1.0x
  - 8193-16383 tokens: 1.7x
  - 16384-32767 tokens: 2.0x
  - >=32768 tokens: 4.0x

If the estimate fits in free GPU VRAM, default to LoRA (16-bit).
Otherwise default to QLoRA (4-bit).
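
The estimate can be sketched in Python as follows (the real logic is `autoSelectTrainingMethod()` in the frontend store). Treating "GB" as GiB and "fits" as less than or equal are both assumptions, and the function names are hypothetical:

```python
GIB = 1024**3  # assumption: "GB" is treated as GiB here


def context_scale(context_length):
    """Multiplier tiers from the commit message, checked from largest down."""
    if context_length >= 32768:
        return 4.0
    if context_length >= 16384:
        return 2.0
    if context_length > 8192:
        return 1.7
    return 1.0


def pick_training_method(model_size_bytes, context_length, vram_free_gb):
    """Return 'lora' if the estimate fits in free VRAM, else 'qlora'."""
    estimate_gb = (model_size_bytes / GIB) * 1.5 * context_scale(context_length)
    return "lora" if estimate_gb <= vram_free_gb else "qlora"
```

For example, an 8 GiB model at 2048 tokens is estimated at 12 GB and fits in 24 GB of free VRAM, while the same model at 32768 tokens is estimated at 48 GB and falls back to QLoRA.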

Backend changes:
- Add model_size_bytes to ModelDetails (models.py)
- Add _get_model_size_bytes() using HfApi.repo_info (routes/models.py)
- Add vram_free_gb to get_gpu_summary (hardware.py)

Frontend changes:
- Add autoSelectTrainingMethod() in training-config-store.ts
- Called after model defaults are loaded
- Add model_size_bytes to ModelConfigResponse type
- Add vramFreeGb to HardwareInfo hook

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: rename "Importing ML libraries..." to "Importing Unsloth..."

* studio: show model/dataset in training status, fix LoRA/QLoRA casing

- Training status now shows 'Training "model_name"' and 'Dataset = ...'
  instead of generic "Starting training..."
- Fix Studio progress section to show QLoRA/LoRA instead of QLORA/LORA

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: rename 'Skip to Chat' to 'Skip Onboarding' on splash screen

* studio: add presence_penalty support for chat inference

Add presence_penalty as a parameter across the full stack:
- Backend: llama_cpp.py generate_chat_completion/with_tools, Pydantic
  models (inference.py), routes/inference.py pass-through
- Frontend: InferenceParams type, DEFAULT_INFERENCE_PARAMS (0.0),
  chat-adapter.ts payload, chat-settings-sheet.tsx slider (0-2),
  model defaults loading from inference_defaults.json
- Set Qwen3.5 default presence_penalty to 1.5 per official docs
- Default for unknown models is 0.0 (off)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: fix Chat card deselecting Text and aligning with other cards

* studio: fix presence_penalty not loading from inference defaults

The inference_config.py load_inference_config() was not including
presence_penalty in the returned config dict, so the Qwen3.5
default of 1.5 from inference_defaults.json never reached the
frontend. Added it to the config builder.

* studio: add delete button for cached models in model selector

Add trash icon on each downloaded model row (GGUF and safetensors) with
confirmation dialog. Backend DELETE /api/models/delete-cached endpoint
uses huggingface_hub scan_cache_dir + delete_revisions to cleanly remove
cached repos, refusing if the model is currently loaded.

* studio: restore inference defaults, reasoning, and tools on page refresh

On page refresh with a model already loaded, the frontend was not
re-applying model-specific inference defaults (presence_penalty,
temperature, etc.) or restoring reasoning/tools support flags.

Backend: Add inference config, supports_reasoning, supports_tools,
and context_length to InferenceStatusResponse.

Frontend: In the refresh callback, when an active model is detected,
apply mergeRecommendedInference and restore reasoning/tools flags
with proper Qwen3.5 size-based defaults.

* studio: fix delete dialog closing before async completes

Prevent AlertDialogAction's default close behavior with
e.preventDefault() so the dialog stays open during deletion.
Also block onOpenChange dismiss while deleting is in progress.

* fix: add Dict and Any imports to inference models

* studio: fix Qwen3.5 reasoning threshold in frontend load path

The frontend loadModel handler still used the old threshold (<=2) for
disabling reasoning on small Qwen3.5 models. Changed it to <9 to
match the backend. With the old threshold, the 4B model kept
thinking enabled by default when auto-loaded.

* studio: move GGUF delete to per-variant level

For GGUF repos, the trash icon now appears on each downloaded variant
row inside the quantization expander instead of on the repo-level row.
Backend accepts optional variant param to delete specific GGUF files
(blob + symlink) rather than the entire repo cache.

* studio: restore ggufContextLength on page refresh

The Max Tokens slider was capped at 32768 on page refresh because
ggufContextLength was not restored from the status response.
Now set it from statusRes.context_length on reconnect.

* fix: remove <think> from Qwen3.5 response template marker

The train-on-responses-only feature uses template markers to find
where the assistant response starts. The Qwen3.5 response marker
included '<think>\n' which is only present when thinking mode is
enabled. With thinking disabled (default for <9B), the marker
never matched, causing 100% of samples to be dropped.

Changed response marker from '<|im_start|>assistant\n<think>\n'
to '<|im_start|>assistant\n' which works regardless of thinking mode.
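
A minimal sketch of why the old marker dropped every sample when thinking was disabled. The helper name `response_start` and the two sample strings are illustrative assumptions, not the actual train-on-responses-only code:

```python
OLD_MARKER = "<|im_start|>assistant\n<think>\n"
NEW_MARKER = "<|im_start|>assistant\n"


def response_start(text, marker):
    """Return the index where the assistant response begins, or -1 if the marker is absent."""
    idx = text.find(marker)
    return -1 if idx == -1 else idx + len(marker)


# Hypothetical rendered samples: one without a think block, one with.
thinking_off = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\nHello!<|im_end|>\n"
)
thinking_on = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\n<think>\nplan\n</think>\nHello!<|im_end|>\n"
)

# The old marker only matches when the think block is present.
old_hits = [response_start(t, OLD_MARKER) != -1 for t in (thinking_off, thinking_on)]
# The new marker is a prefix of the old one, so it matches both renderings.
new_hits = [response_start(t, NEW_MARKER) != -1 for t in (thinking_off, thinking_on)]
```

With the shorter marker, a thinking-enabled sample keeps its `<think>` block inside the trained response span.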

* studio: fix sloth ASCII art alignment in training overlay

* fix: correct sloth ASCII art alignment to match Unsloth banner

* studio: add Python and terminal tool calling to chat

Register python and terminal tools alongside web search. Python
executor validates imports (stdlib only) via unsloth_zoo
rl_environments, runs code in a subprocess sandbox with 5-min
timeout and cancel support. Terminal executor blocks dangerous
commands (rm, sudo, etc.) and runs in a temp directory.

Update llama_cpp tool loop to show tool-specific status messages
and pass cancel_event through to executors. Rename composer
toggle from "Search" to "Tools" and show TerminalIcon for
execution status pills.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: fix Nemotron/transformers 5.x support, onboarding navigation, port binding

Backend:
- Dynamic transformers 5.x detection via tokenizer_config.json fetch
  (checks for TokenizersBackend class, cached per-model)
- Bump transformers 5.x version from 5.2.0 to 5.3.0 across all workers and
  setup scripts (setup.sh, setup.ps1)
- Auto-enable trust_remote_code for unsloth/* models needing transformers 5.x
  (workaround for NemotronH config parsing bug in transformers)
- Auto-install mamba-ssm/causal-conv1d for SSM models (NemotronH, Falcon-H1)
  with --no-build-isolation --no-deps to avoid torch version conflicts
- Add SO_REUSEADDR to port check in run.py (fixes Colab proxy stale connection
  falsely reporting port as in-use)

Frontend:
- Fix "Skip to Chat" navigation: use window.location.href instead of React
  Router navigate() to bypass useEffect redirect race
- Fix "Skip Onboarding" on splash: navigates to /studio (not /chat)
- Fix onboarding guard: only check isOnboardingDone() on initial mount
- Fix Chat card on step 1: add sr-only spacer for consistent alignment
- Fix Chat+Text both selected: clear RadioGroup value when Chat is selected

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: split tools toggle into Search and Code buttons

Replace the single "Tools" toggle with two independent toggles:
- "Search" (globe icon) enables web search only
- "Code" (terminal icon) enables Python and terminal execution

Add enabled_tools list field to the inference payload so the
backend only registers the tools the user has toggled on. Both
toggles appear in the main composer and the compare composer.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: fix tool calling import validation and error logging

Replace unsloth_zoo-dependent import checker with a standalone
ast-based validator using sys.stdlib_module_names. This properly
blocks non-stdlib imports (numpy, requests, etc.) and returns a
clear error message to the model so it can rewrite using only
stdlib.

Add full traceback to tool streaming error logs for debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: parse gpt-oss harmony channels for clean safetensors chat output

gpt-oss models emit multi-channel output via harmony protocol tokens
(<|channel|>analysis<|message|>... and <|channel|>final<|message|>...).
TextIteratorStreamer with skip_special_tokens=True strips the special
tokens but leaves channel names concatenated with content, producing
garbled output like "analysisWe need to...assistantfinalHello!".

Add HarmonyTextStreamer that decodes with skip_special_tokens=False,
parses harmony markup via regex, and emits <think>analysis</think>
for the analysis channel and plain text for the final channel --
reusing the existing frontend reasoning UI.

Also expose supports_reasoning=True for non-GGUF gpt-oss models in
the /status endpoint so the frontend enables the Think toggle.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: use unsloth_zoo for Python sandbox validation

Set UNSLOTH_IS_PRESENT=1 and import check_python_modules and
check_signal_escape_patterns directly from unsloth_zoo instead
of a standalone fallback. This gives us the full Unsloth
validation including stdlib-only import checks and signal/timeout
escape pattern detection.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: allow all imports in Python tool sandbox

Remove stdlib-only import restriction. Keep signal escape
pattern detection via unsloth_zoo for safety.

* studio: fix ReadTimeout on tool streaming final pass

The 0.5s read timeout used for cancel-checking during streaming
also fires when waiting for the first response from llama-server
(e.g. reasoning model thinking for 15+ seconds). Add
_stream_with_retry() context manager that retries on ReadTimeout
while checking cancel_event, so the model has unlimited time to
think before producing the first token. Applied to both the
regular streaming path and the tool-calling final pass.

* fix: rewrite HarmonyTextStreamer with stateful incremental parsing

The delta-on-transformed approach had two critical bugs:

1. Before the full <|channel|>X<|message|> pattern was complete, the
   strip-tokens fallback emitted "analysis" as plain text. Then when
   the regex matched, _transform returned a completely different format
   (<think>...</think>) and the delta was computed against the wrong
   base string, producing fragments like "think>", "nk>", ">".

2. Even with full matches, the closing </think> tag shifted position
   as content grew, so text[prev_len:] produced garbled deltas.

Replace with stateful incremental parsing that:
- Buffers until a complete channel+message pair is seen
- Emits <think> once when analysis channel first appears
- Streams analysis content deltas (computed on channel content directly)
- Emits </think> once when final channel first appears
- Streams final content deltas
- Closes open think tags in end()

Also skip the generic all_special_tokens stripping in
_clean_generated_text for gpt-oss since HarmonyTextStreamer already
produces clean output and the generic stripping was mangling <think>
tags.
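
The stateful approach can be sketched roughly as follows. This is a simplified illustration, not the actual HarmonyTextStreamer: the class name, the `feed`/`end` interface, and the choice to hold back a trailing `<` are assumptions made for the example:

```python
import re

HEADER_RE = re.compile(r"<\|channel\|>(analysis|final)<\|message\|>")


class HarmonyStreamSketch:
    """Incrementally convert harmony channel markup into <think>...</think> plus plain text."""

    def __init__(self):
        self.buffer = ""        # all decoded text received so far
        self.pos = 0            # index of the first character not yet handled
        self.channel = None     # channel whose content is currently streaming
        self.think_open = False

    def feed(self, chunk):
        self.buffer += chunk
        out = []
        while True:
            if self.channel is None:
                # Buffer until a complete channel+message header is visible.
                match = HEADER_RE.search(self.buffer, self.pos)
                if match is None:
                    break
                self.channel = match.group(1)
                self.pos = match.end()
                if self.channel == "analysis" and not self.think_open:
                    out.append("<think>")
                    self.think_open = True
                elif self.channel == "final" and self.think_open:
                    out.append("</think>")
                    self.think_open = False
            end = self.buffer.find("<|", self.pos)
            if end == -1:
                # Hold back a trailing "<" that may be the start of a special token.
                safe_end = len(self.buffer) - (1 if self.buffer.endswith("<") else 0)
                out.append(self.buffer[self.pos:safe_end])
                self.pos = safe_end
                break
            # A special token starts here: emit the content before it and close the channel.
            out.append(self.buffer[self.pos:end])
            self.pos = end
            self.channel = None
        return "".join(out)

    def end(self):
        tail = self.buffer[self.pos:] if self.channel is not None else ""
        self.pos = len(self.buffer)
        if self.think_open:
            self.think_open = False
            return tail + "</think>"
        return tail
```

Because deltas are taken from the channel content itself rather than from a re-transformed string, each tag is emitted exactly once and never shifts position.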

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: strip all <|...|> tokens in gpt-oss cleanup, not just harmony subset

The gpt-oss tokenizer has added tokens like <|return|> (id=200002) that
are not part of the harmony channel protocol but can leak into output.
The previous regex only stripped channel|message|start|end tokens.

Broaden the _clean_generated_text regex for gpt-oss to <\|[a-z_]+\|>
which catches all pipe-delimited tokens (return, constrain, reserved,
etc.) without matching <think>/</think> tags.

Verified: gpt-oss all_special_tokens are only <|return|>,
<|reserved_200017|>, <|startoftext|> -- none overlap with <think>.
The harmony tokens (channel, message, start, end) are added_tokens
but not in all_special_tokens.
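
For illustration, the pattern quoted above behaves as follows on a few sample strings. The helper name `strip_pipe_tokens` is hypothetical; only the regex is taken from the commit message:

```python
import re

PIPE_TOKEN_RE = re.compile(r"<\|[a-z_]+\|>")


def strip_pipe_tokens(text):
    """Remove pipe-delimited tokens such as <|return|> while leaving <think> tags alone."""
    return PIPE_TOKEN_RE.sub("", text)


cleaned = strip_pipe_tokens("<think>plan</think>Hello<|return|>")
with_digits = strip_pipe_tokens("Hi<|reserved_200017|>")
```

As written, the character class `[a-z_]` does not include digits, so in this sketch a token such as `<|reserved_200017|>` passes through unchanged; a class like `[a-z0-9_]` would be needed to cover it.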

* fix: hide config-only model repos from cached models list

Repos that only have metadata/config files cached (no .safetensors or
.bin weight files) were showing up in the Downloaded list with tiny
sizes like "1.8 KB" or "24 KB". These are just leftover config
snapshots from architecture checks, not usable models.

Filter the cached-models endpoint to only include repos that contain
actual model weight files (.safetensors or .bin).
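
A minimal sketch of the weight-file filter, assuming the cache listing is available as a mapping from repo id to cached file names. The function names and the sample repos are made up for the example:

```python
WEIGHT_SUFFIXES = (".safetensors", ".bin")


def has_weight_files(filenames):
    """True if at least one cached file looks like a model weight file."""
    return any(name.endswith(WEIGHT_SUFFIXES) for name in filenames)


def usable_repos(cached):
    """Keep only repos whose cached snapshot contains weights, not just config files."""
    return [repo for repo, files in cached.items() if has_weight_files(files)]


# Hypothetical cache contents.
cache = {
    "org/full-model": ["config.json", "model-00001-of-00002.safetensors"],
    "org/config-only": ["config.json", "tokenizer_config.json"],
    "org/legacy-model": ["config.json", "pytorch_model.bin"],
}
visible = usable_repos(cache)
```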

* studio: fix toast description text contrast in dark mode

Add explicit !text-muted-foreground to toast description classNames
so secondary text (e.g. "Releases VRAM and resets inference state.")
is readable in dark mode.

* studio: fix Chat card icon alignment with size-4 spacer

Replace sr-only span (takes no space) with a size-4 shrink-0 div
matching the RadioGroupItem dimensions in other cards, so the Chat
icon aligns vertically with Text/Audio/Vision/Embeddings icons.

---------

Co-authored-by: workspace <user@workspace.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Manan17 <shahmanan170602@gmail.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-03-17 07:46:07 -07:00
Daniel Han
29f7fddac6 Studio UI 2026-03-17 07:44:54 -07:00
Michael Han
f3b6e0e486
Add files via upload 2026-03-17 06:42:25 -07:00
Roland Tannous
c6bd55ec61
fix(llm_assist): disable thinking mode for helper model JSON output (#4358)
* fix(llm_assist): disable thinking mode for helper model JSON output

Pass enable_thinking=False to generate_chat_completion() in both
_run_with_helper() and _generate_with_backend() so the Qwen3.5-4B
helper model produces clean JSON instead of wrapping responses in
<think> tags.

* fix(llm_assist): log per-request enable_thinking=False override

Add info-level log lines so the user can see that each helper/advisor
request overrides the server-level thinking default to False.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-17 15:58:08 +04:00
Roland Tannous
a0aba96ebd
fix: comment out debug print statements (#4357) 2026-03-17 15:43:27 +04:00
Daniel Han
37fe04f7bf
studio: add SVG preview, fix streaming bug and model selector state (#4354)
- Add SVG preview rendering below code blocks using safe data URI
  in <img> tag. Includes sanitization to block script/event handlers.
- Fix GGUF streaming crash: cache response.iter_text() iterator
  instead of creating a new one on every loop iteration.
- Fix model selector showing "Select model..." after auto-load by
  re-reading store state after setCheckpoint before setParams.
- Remove unused warmupToastShown variable (TS6133 build error).
- Change default suggestion to "Draw an SVG of a cute sloth".
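
The iterator fix can be illustrated with a stand-in response object. `FakeResponse` is an assumption for this sketch; it mimics a streaming response whose body can only be iterated once and is not the real client class:

```python
class FakeResponse:
    """Stand-in for a streaming HTTP response whose body may be consumed only once."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._consumed = False

    def iter_text(self):
        if self._consumed:
            raise RuntimeError("stream already consumed")
        self._consumed = True
        yield from self._chunks


def read_chunks(response):
    """Create the iterator once and reuse it on every loop iteration."""
    text_iter = response.iter_text()
    out = []
    while True:
        chunk = next(text_iter, None)
        if chunk is None:
            break
        out.append(chunk)
    return out


def read_chunks_buggy(response):
    """Calls iter_text() on every iteration, which fails on the second pass."""
    out = []
    while True:
        chunk = next(response.iter_text(), None)
        if chunk is None:
            break
        out.append(chunk)
    return out
```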
2026-03-17 02:34:05 -07:00
Datta Nimmaturi
33dc47da72
Fix spacing in setup.sh echo statements 2026-03-17 14:53:55 +05:30
Roland Tannous
df01a139a0
fix: remove unused warmupToastShown variable to fix TS6133 build error (#4353) 2026-03-17 02:03:15 -07:00
Daniel Han
fe05b700dc
studio: fix slow cancellation of GGUF generation (#4352)
The streaming loop used response.iter_text() with timeout=None, which
blocks until the next chunk arrives from llama-server. On large models
like Qwen3.5-27B where each token takes seconds, pressing Stop in the
UI would not take effect until the next token was produced.

Fix by using a 0.5s read timeout and a new _iter_text_cancellable()
helper that checks cancel_event between timeout windows and explicitly
closes the response when cancelled. Applied to both the regular chat
completion and tool-calling streaming paths.
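
The cancel-aware polling pattern can be sketched with the standard library alone. Here a `queue.Queue` stands in for the HTTP stream and `queue.Empty` plays the role of the read timeout; the function name and the `None` end-of-stream sentinel are assumptions for the example, not the actual `_iter_text_cancellable()` helper:

```python
import queue
import threading


def iter_chunks_cancellable(chunks, cancel_event, poll_seconds=0.5):
    """Yield chunks from `chunks`, checking `cancel_event` between short timeout windows."""
    while True:
        if cancel_event.is_set():
            return  # stop promptly instead of waiting for the next chunk
        try:
            chunk = chunks.get(timeout=poll_seconds)
        except queue.Empty:
            continue  # timeout window elapsed: loop around and re-check cancellation
        if chunk is None:
            return  # sentinel marking the end of the stream
        yield chunk


stream = queue.Queue()
for item in ("Hel", "lo", None):
    stream.put(item)
received = list(iter_chunks_cancellable(stream, threading.Event(), poll_seconds=0.01))
```

The short timeout bounds how long a Stop request can go unnoticed, independent of how slowly the model produces tokens.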
2026-03-17 01:47:21 -07:00
Daniel Han
b437c9a36d
studio: update Creative/Precise presets, show "Off" for disabled samplers (#4350)
Creative: temperature=1.5, min_p=0.1, top_p=Off (1.0), top_k=Off (0)
Precise: temperature=0.1, top_p=0.95, top_k=80, min_p=0.01

Also show "Off" in the slider label for top_p=1.0, top_k=0, and
repetition_penalty=1.0 since those values disable their respective
samplers. Changed top_k slider min from -1 to 0.
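
The labeling rule can be sketched as a small lookup. The real slider lives in the frontend; this Python version with the hypothetical name `slider_label` only illustrates the logic:

```python
# Values at which each sampler is effectively disabled, per the commit message.
DISABLED_VALUES = {"top_p": 1.0, "top_k": 0, "repetition_penalty": 1.0}


def slider_label(name, value):
    """Return "Off" when the value disables the sampler, otherwise the value as text."""
    if name in DISABLED_VALUES and value == DISABLED_VALUES[name]:
        return "Off"
    return str(value)
```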
2026-03-17 01:32:18 -07:00
Daniel Han
ee6f057cc2
studio: show "Off" for repetition penalty = 1 (#4349) 2026-03-17 01:28:33 -07:00
Daniel Han
c00a993a68
studio: fix stale GGUF metadata, update helper model, auth improvements (#4346)
* studio: switch helper model to Qwen3.5-4B-GGUF

Replace Qwen3-4B-Instruct-2507-GGUF with Qwen3.5-4B-GGUF as the
default helper model for LLM-assisted dataset detection. Same
UD-Q4_K_XL variant.

* studio: fix stale GGUF metadata when switching models (#4347)

Reset _supports_reasoning, _supports_tools, _context_length, and
_chat_template at the start of _read_gguf_metadata() to prevent
stale settings from a previous model leaking into the next load.

Co-authored-by: Daniel Han <daniel@unsloth.ai>

* studio: change login error to "Incorrect password", add reset-password CLI

- Login error now says "Incorrect password" instead of the generic
  "Incorrect username or password" since Studio only has one account.
- Add `unsloth studio reset-password` command that deletes the auth
  database so a fresh admin account with a new random password is
  created on the next server start.

* studio: include reset command in login error message

* studio: change password setup subtitle wording
2026-03-17 01:22:08 -07:00
Daniel Han
eeffa4c065
studio: web search, KV cache dtype, training progress, inference fixes
## Summary
- Add web search tool calling for GGUF models (Search toggle, DuckDuckGo via ddgs)
- Add KV cache dtype dropdown (f16/bf16/q8_0/q5_1/q4_1) in Chat Settings
- Fix Qwen3/3.5 inference defaults per official docs (thinking on/off params)
- Enable reasoning by default for Qwen3.5 4B and 9B
- Replace "Generating" toast with inline spinner
- Fix stop button via asyncio.to_thread (event loop no longer blocked)
- Fix CUDA 12 compat lib paths for llama-server on CUDA 13 systems
- Fix auto-load model name not appearing in selector
- Training progress messages + dataset_num_proc fix

Integrated PRs:
- #4327 (imagineer99): BETA badge alignment (already in tree)
- #4340 (Manan Shah): prioritize training models in model selection
- #4344 (Roland Tannous): setup.sh macOS python version compatibility
- #4345 (Manan Shah): revamp model+dataset checking logic
2026-03-17 00:30:01 -07:00
pluesclues
f5e1f52b48
Add check to disable xformers on newer GPUs (#4342)
Disable xformers for GPUs with compute capability >= 12 to ensure compatibility with newer hardware.
2026-03-16 22:42:38 -07:00
Michael Han
a804325171
Update Unsloth_Studio_Colab.ipynb 2026-03-16 22:30:12 -07:00
Michael Han
674ce29131
Update Unsloth_Studio_Colab.ipynb 2026-03-16 22:28:58 -07:00
Michael Han
f0afafd4ba
Update Unsloth_Studio_Colab.ipynb 2026-03-16 22:16:39 -07:00
Michael Han
227759df61
Update Unsloth_Studio_Colab.ipynb 2026-03-16 22:15:07 -07:00
Datta Nimmaturi
bbf6414caf
Fix formatting of launch command in setup.ps1 2026-03-17 10:19:16 +05:30
Leo Borcherding
df98569f12
studio: improve Colab notebook, redesign ready popup, and clean up install output (#4339)
* Removing .precommit config

* edited colab comments

* studio: update Unsloth_Studio_Colab.ipynb

* studio: update Unsloth_Studio_Colab.ipynb

* studio: add Colab T4 GPU metadata to force T4 instance

* style: update colab popup to black/white theme with gem icon and play button

* feat: center landscape image in colab notebook

* style: shrink popup to fit content, truncate URL display

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* feat: center landscape image in colab notebook

* feat: use GitHub raw URL for studio landscape image in notebook

* chore: update colab notebook

* feat: add studio landscape colab display image and update notebook

* feat: update notebook with studio landscape image

* style: remove colors, add progress bar, add VERBOSE flag to install output

* docs: add comments explaining VERBOSE flag and progress bar

* chore: update colab notebook

* fix: define VERBOSE, _STEP, _TOTAL at module level to fix NameError

---------

Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-16 21:39:25 -07:00
Daniel Han
dc2879a048
Fix xformers Blackwell guard: broader coverage and root cause docs (#4338)
* Remove outdated xformers Blackwell version guard

The guard at _utils.py:976-989 blocked xformers 0.0.32.post2 on
Blackwell/RTX 50x/Jetson GPUs (SM 10.0/11.0/12.0) due to a FA3
dispatch bug that caused CUDA errors (issue #1329).

This is no longer needed because:

1. xformers fixed the FA3 dispatch in 0.0.33.post2 by capping it
   at SM <= 9.0, so FA3 is never attempted on Blackwell. The FA2
   backend works correctly via PTX forward compatibility.

2. The only blocked version (0.0.32.post2) was built for torch 2.8.0
   and cannot load on torch 2.9+ due to ABI mismatch, so the guard
   never actually triggers for any current user.

3. The existing _register_extensions() check plus the except Exception
   fallback already handle broken xformers installs gracefully by
   falling back to SDPA.

Verified on NVIDIA RTX PRO 6000 Blackwell (SM 12.0) with both
pre-built wheels (0.0.33.post2) and source builds -- all attention
tests pass with exact numerical match vs SDPA.

* Update xformers Blackwell guard with root cause and broader coverage

Changes to the xformers version guard for Blackwell/RTX 50x/Jetson GPUs:

1. Broaden version check from `in (0.0.32.post2,)` to `<= 0.0.32.post2`
   to cover all versions with the broken FA3 dispatch, not just one.

2. Add `DEVICE_TYPE == "cuda"` guard to avoid calling
   `get_device_capability()` on non-CUDA devices (XPU, etc.).

3. Document the root cause: xformers <= 0.0.32.post2 used
   `capability >= (9, 0)` in the FA3 dispatch, which matched
   Blackwell SM 12.0 and attempted sm_90a Hopper kernels on it.
   Fixed upstream in 0.0.33 with `<= (9, 0)`.

4. Update error message to include the installed version, mention
   the fix (upgrade to >= 0.0.33), and keep the build-from-source
   fallback. The raise is caught by `except Exception` which shows
   the message when UNSLOTH_ENABLE_LOGGING is set and falls back
   to SDPA.
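
The root cause in item 3 comes down to tuple comparison. A small sketch, with function names invented for the example, shows how the two conditions treat a Blackwell capability of (12, 0):

```python
def fa3_selected_old(capability):
    """Dispatch condition described for xformers <= 0.0.32.post2."""
    return capability >= (9, 0)


def fa3_selected_fixed(capability):
    """Dispatch condition described for the upstream fix."""
    return capability <= (9, 0)


hopper = (9, 0)
blackwell = (12, 0)

# Python compares tuples element by element, so (12, 0) >= (9, 0) is True.
old_result = (fa3_selected_old(hopper), fa3_selected_old(blackwell))
fixed_result = (fa3_selected_fixed(hopper), fa3_selected_fixed(blackwell))
```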

Verified on NVIDIA RTX PRO 6000 Blackwell (SM 12.0):
- xformers 0.0.33.post2 pre-built wheel: works (FA2 via PTX)
- xformers source build: works (FA2 native)
- Both have exact numerical match vs SDPA

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-16 21:30:03 -07:00
Daniel Han
6912a15a42
fix: add Qwen3.5 version gate in loader dispatch (#4335)
* fix: add Qwen3.5 version gate in loader dispatch (#4188)

Qwen3.5 (model_type qwen3_5) only exists in transformers >= 5.0.0.
Without this gate, loading a Qwen3.5 model on transformers 4.x gives
an unhelpful generic error. This adds a clear version check before the
qwen3 dispatch to prevent substring misrouting and give a useful error
message pointing users to upgrade.

No dedicated FastQwen3_5Model is needed -- the compiler already applies
fused CE automatically via apply_fused_lm_head for both
Qwen3_5ForCausalLM and Qwen3_5ForConditionalGeneration. The generic
FastModel fallback path handles everything.

FORCE_FLOAT32 already has qwen3_5 on main.

Tested on transformers 5.3.0: Qwen3.5-0.8B 4bit, 1.38 GB peak memory.
Backwards compatible: import unsloth works on transformers 4.57.6.

* fix: update FORCE_FLOAT32 comment for qwen3_5

The (1+w) RMSNorm pattern does not overflow float16 since Qwen3_5RMSNorm
computes in float32 internally. The actual reason FORCE_FLOAT32 is needed
is that Qwen3.5 GDN layers produce NaN grad norms during float16 training.
Updated the comment to reflect the real reason.

* fix: move qwen3_5 version check before dispatch chain

The elif block intercepted qwen3_5 on transformers >= 5.0.0 without
setting dispatch_model, causing UnboundLocalError at line 715.

Move the version check before the if/elif dispatch chain so on
transformers >= 5.0.0 the model_type falls through to the generic
FastModel path as intended.

* fix: qwen3_5 requires transformers >= 5.2.0, not 5.0.0

Checked all 5.x releases:
- 5.0.0: no qwen3_5 module
- 5.1.0: no qwen3_5 module
- 5.2.0: qwen3_5 available
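
A minimal sketch of the resulting version gate, assuming plain `major.minor.patch` version strings. The helper names are hypothetical and the real check may use a dedicated version parser:

```python
import re

MIN_QWEN3_5 = (5, 2, 0)  # first transformers release listed above as shipping qwen3_5


def parse_version(text):
    """Extract (major, minor, patch) from a version string such as "5.3.0"."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_qwen3_5(transformers_version):
    return parse_version(transformers_version) >= MIN_QWEN3_5


checks = {v: supports_qwen3_5(v) for v in ("4.57.6", "5.0.0", "5.1.0", "5.2.0", "5.3.0")}
```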

* fix: move qwen3_5 version check into AutoConfig error handler

The previous version check at the dispatch chain was unreachable --
AutoConfig.from_pretrained fails first with a generic "does not
recognize this architecture" error on transformers < 5.2.0, so
execution never reached the check.

Move the qwen3_5-specific error message into the AutoConfig exception
handler where "architecture" errors are caught. This intercepts the
error before the generic message and gives users a clear upgrade path.

Also remove the now-redundant check before the dispatch chain.
Both FastLanguageModel and FastModel paths are covered.

Tested: transformers 4.57.6 shows the Qwen3.5-specific error,
transformers 5.3.0 loads and trains normally.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-16 20:37:42 -07:00
Leo Borcherding
262271a20d
Fix/colab comment edits (#4317)
* Removing .precommit config

* edited colab comments

* studio: update Unsloth_Studio_Colab.ipynb

* studio: update Unsloth_Studio_Colab.ipynb

* studio: add Colab T4 GPU metadata to force T4 instance

* style: update colab popup to black/white theme with gem icon and play button

* feat: center landscape image in colab notebook

* style: shrink popup to fit content, truncate URL display

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* feat: center landscape image in colab notebook

* feat: use GitHub raw URL for studio landscape image in notebook

* chore: update colab notebook

---------

Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-16 16:15:46 -07:00
pre-commit-ci[bot]
1c3f201943
[pre-commit.ci] pre-commit autoupdate (#4332)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.5 → v0.15.6](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.5...v0.15.6)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-16 14:41:49 -07:00
Roland Tannous
46f9be3dd1
fix: Resolve CUDA toolkit mismatch on multi-CUDA Windows systems (#4324)
* fix: prefer existing CUDA_PATH toolkit to avoid version mismatch on multi-CUDA systems

* fix: validate GPU arch support before accepting CUDA toolkit (sm_120 + CUDA 12.4 fallback)

* debug: add temporary CUDA compatibility check print

* fix: auto-copy CUDA VS integration files when missing (No CUDA toolset found)

* fix: return false when nvcc --list-gpu-arch unavailable (reject old toolkit, scan for newer)

* fix: re-sanitize CUDA env vars before cmake build (survives Refresh-Environment)

* fix: use --list-gpu-code (sm_*) instead of --list-gpu-arch (compute_*) for arch probing
2026-03-16 18:16:16 +04:00
Daniel Han
44dcf30b9b
studio: per-model inference defaults, GGUF slider fix, reasoning toggle (#4325)
* studio: extract param count from model name as fallback

When HuggingFace API doesn't return totalParams for a model,
extract the param count from the model name (e.g. "Qwen3-0.6B"
-> "0.6B", "Llama-3.2-1B-Instruct" -> "1B"). Applied to both
the recommended list and HF search results.
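
One way such a name-based fallback could look. The regex and the function name are assumptions for illustration and may not match the real frontend logic:

```python
import re

# A number followed by "B", not glued to a preceding letter, digit, or dot.
PARAM_RE = re.compile(r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)[Bb](?![A-Za-z0-9])")


def param_count_from_name(name):
    """Return e.g. "0.6B" for "Qwen3-0.6B", or None if no size is present."""
    match = PARAM_RE.search(name)
    return f"{match.group(1)}B" if match else None


sizes = [
    param_count_from_name(n)
    for n in ("Qwen3-0.6B", "Llama-3.2-1B-Instruct", "Qwen3.5-27B", "gpt-oss")
]
```

The lookbehind keeps version numbers such as the "3.2" in "Llama-3.2-1B-Instruct" from being mistaken for a parameter count.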

* studio: read GGUF context_length via fast header parser, set max tokens

- Fast GGUF metadata reader (~30-55ms) parses only the KV header, skipping
  tensor data and large arrays (tokenizer vocab, etc.)
- Extracts context_length and chat_template from GGUF metadata
- Returns context_length in LoadResponse for frontend to use
- Frontend sets maxTokens to actual context_length for GGUFs (e.g.
  262144 for Qwen3.5-9B, 131072 for Qwen2.5-7B)
- Max Tokens slider shows "Max" and is locked for GGUFs
- Auto-load path also uses actual context_length from load response
- Toast auto-dismiss (5s) and close button for auto-load toast
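
A rough sketch of a header-only GGUF reader in this spirit. It assumes the little-endian GGUF v2/v3 layout (magic, version, tensor count, KV count, then key/value pairs) and is not the actual Studio parser; `read_gguf_kv` and its `wanted_suffixes` parameter are hypothetical:

```python
import io
import struct

# GGUF scalar value types mapped to struct formats; 8 is string, 9 is array.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _read_string(f):
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _skip_array(f):
    """Seek past an array value without materializing its elements."""
    elem_type = _read(f, "<I")
    count = _read(f, "<Q")
    if elem_type == _STRING:
        for _ in range(count):
            f.seek(_read(f, "<Q"), 1)
    elif elem_type in _SCALARS:
        f.seek(count * struct.calcsize(_SCALARS[elem_type]), 1)
    else:
        raise ValueError(f"unsupported array element type {elem_type}")


def read_gguf_kv(f, wanted_suffixes=(".context_length", "tokenizer.chat_template")):
    """Read only the KV header and return the entries whose key ends with a wanted suffix."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    _version = _read(f, "<I")
    _tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    found = {}
    for _ in range(kv_count):
        key = _read_string(f)
        value_type = _read(f, "<I")
        if value_type == _ARRAY:
            _skip_array(f)  # e.g. the tokenizer vocabulary
            continue
        if value_type == _STRING:
            value = _read_string(f)
        else:
            value = _read(f, _SCALARS[value_type])
        if key.endswith(wanted_suffixes):
            found[key] = value
    return found
```

Reading stops after the last key/value pair, so tensor data is never touched, which is what would keep such a reader fast on multi-gigabyte files.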

* studio: GGUF TTS audio support (from PR #4318)

Add GGUF TTS audio generation via llama-server. When a GGUF model
loads, the backend probes its vocabulary to detect audio codecs
(SNAC/BiCodec/DAC/CSM/Whisper). If detected, the codec is pre-loaded
and the model is reported as audio to the frontend.

During chat, TTS models route to the audio generation path which sends
a per-codec prompt to llama-server's /completion endpoint, extracts
generated tokens/text, and decodes to WAV using AudioCodecManager.

Also strips base64 audio data from prior assistant messages to prevent
context overflow.

Co-authored-by: Manan Shah <mananshah511@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove package-lock.json from tracking

* studio: per-model inference defaults, GGUF max tokens fix, reasoning toggle

- Add inference_defaults.json with per-model-family sampling parameters
  for ~50 families (Qwen3.5, Qwen3, Gemma-3, Llama-3, DeepSeek, etc.).
  Values sourced from unslothai/docs and Ollama params blobs.

- Family-based lookup in inference_config.py: extracts model family from
  identifier, matches against patterns (longest match first), merges with
  priority: model-specific YAML > family JSON > default.yaml.

- Fix GGUF Max Tokens slider locked at "Max": store ggufContextLength
  separately from maxTokens so the slider is adjustable (step=64).

- Fix Ministral YAML: top_p was literal string "default", now 0.95.

- Add reasoning toggle for thinking models (Qwen3.5, Qwen3, DeepSeek-R1,
  DeepSeek-V3.1, etc.): detect enable_thinking support from GGUF chat
  template metadata, pass --jinja to llama-server, send
  chat_template_kwargs per-request. Frontend shows "Reasoning is ON/OFF"
  pill button next to attachment button in composer.
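
The family lookup and merge priority described above can be sketched as follows. The function name, the substring matching, and the sample temperature values are assumptions for the example; only the longest-match-first rule and the priority order come from the commit message:

```python
def resolve_inference_config(model_id, family_defaults, model_specific, global_defaults):
    """Merge defaults with priority: model-specific > family > global."""
    name = model_id.lower()
    family = {}
    # Try longer patterns first so "qwen3.5" wins over "qwen3".
    for pattern in sorted(family_defaults, key=len, reverse=True):
        if pattern in name:
            family = family_defaults[pattern]
            break
    return {**global_defaults, **family, **model_specific}


# Hypothetical values, apart from the Qwen3.5 presence_penalty of 1.5 mentioned in this log.
families = {
    "qwen3": {"temperature": 0.6},
    "qwen3.5": {"temperature": 0.7, "presence_penalty": 1.5},
}
defaults = {"temperature": 1.0, "top_p": 0.95}
resolved = resolve_inference_config("unsloth/Qwen3.5-4B-GGUF", families, {}, defaults)
```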

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: remove default system prompt injection

Backend was injecting "You are a helpful AI assistant." when no system
prompt was provided. Neither unslothai/docs nor Ollama specify a default
system prompt for most models. Now defaults to empty string, letting the
model's own chat template handle system behavior.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: use lightbulb icons and "Think" label for reasoning toggle

Lightbulb on when thinking enabled, lightbulb-off when disabled.
Label is just "Think" in both states; grayed out styling when off.

* studio: fix HTML file upload breaking chat

Replace SimpleTextAttachmentAdapter with custom TextAttachmentAdapter
(excludes text/html) and HtmlAttachmentAdapter that strips tags via
DOMParser, removing scripts/styles and extracting readable text content
instead of dumping raw HTML markup into the conversation.

* studio: show chat template in Configuration panel

Display the model's Jinja2 chat template in a new "Chat Template"
section under Settings (now open by default). For GGUFs, reads from
GGUF metadata; for safetensors, reads from tokenizer.chat_template.

Template is editable with a "Restore default chat template" button
that appears when modified. Section only shows when a model with a
chat template is loaded.

* studio: editable chat template with Apply & Reload

Chat template section now functional:
- Editing the template shows "Apply & Reload" (reloads model with
  custom template) and "Revert changes" buttons
- For GGUFs: writes template to temp .jinja file, passes
  --chat-template-file to llama-server on reload
- For non-GGUF: passes chat_template_override in load request
- Settings section now open by default
- selectModel supports forceReload to reload same model

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: fix DeepSeek reasoning detection and auto-load metadata

- Set _model_identifier before _read_gguf_metadata so DeepSeek
  "thinking" template detection works (was always None before)
- Populate ggufContextLength, supportsReasoning, reasoningEnabled,
  defaultChatTemplate in autoLoadSmallestModel GGUF path

* studio: add spacing before BETA badge in navbar

Add gap-1.5 on the logo Link container to space the BETA label
from the wordmark.

Co-authored-by: Imagineer99 <Imagineer99@users.noreply.github.com>

* studio: vertically center BETA badge with logo

---------

Co-authored-by: Manan Shah <mananshah511@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Imagineer99 <Imagineer99@users.noreply.github.com>
2026-03-16 06:37:55 -07:00
Roland Tannous
6d12a6b13b
Improve AI Assist: Update default model, model output parsing, logging, and dataset mapping UX (#4323)
* Strip <think> blocks from LLM assist model output

* Add debug logging for raw LLM assist output

* Quiet llama-server logs, use structlog in llm_assist

* Fix think-tag stripping when response is inside tags

* Remove debug logging of raw model output

* Clarify GGUF download logs: show cache hit vs actual download

* Clarify heuristic-detected mapping in UI text

* Default helper model to Qwen3-4B-Instruct-2507 UD-Q4_K_XL

* Remove package-lock.json from tracking, add to .gitignore

* Auto-open mapping dialog on Start Training for custom_heuristic format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use last think block when extracting inner content (review feedback)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
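
The think-tag handling listed in this PR can be sketched roughly as below. The function name and the exact fallback order are assumptions, since the commit titles only state that think blocks are stripped and that the last block is used when the response sits inside the tags:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def strip_think(text):
    """Drop <think> blocks; if nothing remains, fall back to the last block's content."""
    outside = THINK_RE.sub("", text).strip()
    if outside:
        return outside
    blocks = THINK_RE.findall(text)
    return blocks[-1].strip() if blocks else ""


normal = strip_think('<think>checking columns</think>{"format": "alpaca"}')
inside = strip_think('<think>draft</think><think>{"format": "sharegpt"}</think>')
```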

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-16 16:04:35 +04:00
Daniel Han
11449208f4
Fix VLM GRPO matmul shape mismatch in _get_per_token_logps_and_entropies (#4301)
* Fix VLM GRPO matmul shape mismatch in _get_per_token_logps_and_entropies

VLM models (e.g. Qwen2.5-VL) can return logits [B*T, vocab_size] instead
of hidden states [B*T, hidden_dim] from their forward pass. When this
happens, chunked_hidden_states_selective_log_softmax tries to compute
logits @ lm_head.t() which fails with a shape mismatch.

Add a shape guard in the VLM branch of _get_per_token_logps_and_entropies:
check output.shape[-1] against lm_head.shape[1] (hidden_dim). When hidden
states are returned, the existing path is taken. When logits are returned,
scaling/softcapping/temperature are applied manually and
chunked_selective_log_softmax is used instead.

Also add chunked_selective_log_softmax to the import from unsloth_zoo.

The text-only branch (pixel_values is None) is unchanged.

Companion PR to unslothai/unsloth-zoo for grpo_accumulated_loss.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove redundant scaling in logits fallback path

When COMPILE_DISABLE=1 and the model returns logits directly, scaling
and softcapping are already applied by the model forward. Only
temperature (a GRPO training parameter) needs to be applied.

* Pass temperature to chunked_selective_log_softmax instead of manual cast

Use the new temperature parameter in chunked_selective_log_softmax
(added in companion zoo PR) to avoid casting the entire logits tensor
to float32 before the function call.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-16 03:54:16 -07:00
Daniel Han
356538d760 Apply use_reentrant removal to all TRL trainer configs, not just GRPO
The existing fix that removes use_reentrant=False from
gradient_checkpointing_kwargs was gated behind RLConfig_name ==
"GRPOConfig", so only GRPOConfig was protected. SFTConfig, DPOConfig,
KTOConfig, CPOConfig, ORPOConfig etc. were all still affected.

Remove the GRPOConfig guard so the fix applies to all compiled trainer
configs when TRL >= 0.27.0.

This is defense-in-depth alongside the unsloth_zoo fix that forces
use_reentrant=True in unsloth_checkpoint() itself.
2026-03-16 03:51:35 -07:00
Daniel Han
ec9a0906eb studio: GGUF unlimited context, auto-load, settings UX, recommended list
- GGUF: use -c 0 for model's native context size (no 4096 cap)
- GGUF: hide Max Seq Length slider (irrelevant), set Max Tokens to Max
- Non-GGUF: default Max Tokens to 4096
- Max Tokens slider shows "Max" label when at ceiling for GGUFs
- Run non-GGUF load_model in asyncio.to_thread for progress polling
- Auto-load smallest downloaded model when chatting without selection
- Wait for in-progress model load before inference (modelLoading store flag)
- Recommended list: 4 GGUFs + 4 hub models after case-insensitive dedup
- Model selector waits for cached data before rendering
- Toast close button repositioned, Sampling section open by default
- Add logging to _get_repo_size_cached exception handler
2026-03-16 02:46:56 -07:00
pre-commit-ci[bot]
9945843fa9 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-16 02:46:56 -07:00
Daniel Han
991a2bfc35 studio: GGUF unlimited context, auto-load, wait-for-load, UX fixes
- Use -c 0 for llama-server (model's native context size, no 4096 cap)
- Run non-GGUF backend.load_model in asyncio.to_thread for progress polling
- Auto-load smallest downloaded model when user chats without selecting one
- Wait for in-progress model load before inference (no "No model loaded" error)
- Add modelLoading flag to zustand store for cross-component coordination
- Dynamic top models: send 8 GGUFs + 8 hub models, frontend caps 4+4 after dedup
- Case-insensitive dedup: downloaded models correctly hide from recommended list
- Prevent duplicate toasts: guard against double selectModel calls
- Model selector waits for cached data before rendering (no empty flash)
- Toast close button positioned at top-right with proper spacing
- Sampling section expanded by default in chat settings
- Global toast close button styling fix
2026-03-16 02:46:56 -07:00
Daniel Han
20c6d9a26a Set repetition_penalty default to 1.0 (disabled) everywhere
Change all repetition_penalty defaults from 1.1 (or 1.05/1.2 in
presets) to 1.0 across the entire backend and frontend. Most models
handle repetition well on their own and a non-1.0 penalty can degrade
output quality, especially for code, structured output, and creative
tasks.

Files changed:
- Backend: inference.py, llama_cpp.py, orchestrator.py, worker.py,
  models/inference.py (Field defaults)
- Frontend: chat-settings-sheet.tsx (Creative/Precise presets),
  runtime-provider.tsx (auto-title generation)
2026-03-16 02:46:56 -07:00
Daniel Han
b985471637 Increase default max tokens to 8192, disable repetition penalty
- maxTokens: 2048 -> 8192. The old 2048 limit caused generation to
  stop mid-output for longer responses (e.g. reasoning/thinking models
  that produce long chain-of-thought before the answer).
- repetitionPenalty: 1.1 -> 1.0 (disabled). Most models handle
  repetition well on their own. A penalty of 1.1 can hurt quality
  for creative tasks like code generation and ASCII art.
- Change welcome message from "Run LLMs or test your fine-tune" to
  "Chat with your model".
2026-03-16 02:46:56 -07:00
Daniel Han
8ffd86012f Change "Stop loading" to outlined "Stop" button 2026-03-16 02:46:56 -07:00
Daniel Han
9cbeecc16a Incorporate PR #4304 toast UX improvements
Merge the toast UX refactor from PR #4304 (by @Shine1i):
- Toast duration 5s default with close button (X) for manual dismiss
- Inline progress bar component (ModelLoadInlineStatus) shown in the
  header after toast is dismissed
- Model switch warning only for image compatibility (not generic)
- activeThreadId tracked in store via ActiveThreadSync
- Loading state cleanup via resetLoadingUi helper
- Toast uses Infinity duration during loading with onDismiss handler

Re-applied non-GGUF download progress additions on top:
- getDownloadProgress for all models (not just GGUF)
- hasShownProgress flag, loadingModelRef race condition checks
- First poll at 500ms, bytes-only fallback when expected size unknown
2026-03-16 02:46:56 -07:00
Daniel Han
042598d9f1 Suppress model-switch warning on empty chat threads
Don't show "Model changed for this chat" toast when the thread has
no messages. On a fresh page load with a stale thread from a previous
session, this warning is confusing. The warning is only useful
mid-conversation to alert about image compatibility with the new model.

When messages.length === 0, silently update the thread's modelId and
proceed with loading.
2026-03-16 02:46:56 -07:00
Daniel Han
f4d54a8de7 Fix vision detection subprocess using undefined logger
The _VISION_CHECK_SCRIPT subprocess used logger.info() but logger was
never defined in the subprocess context. This caused a NameError on
every vision check, making all transformers 5.x models (Qwen3.5,
GLM, etc.) fall back to text-only mode even when they support vision.

Replace logger.info() with print() since the parent process reads
the subprocess stdout via result.stdout.
2026-03-16 02:46:56 -07:00
Daniel Han
3a5d751f19 Add logging to download-progress exception handler 2026-03-16 02:46:56 -07:00
Daniel Han
2642f6d21d Add sloth emoji to section labels, friendlier network error
- Add sloth emoji prefix to "Downloaded" and "Recommended" section
  labels in the Hub model picker so they are visually distinct.
- Replace browser network errors ("NetworkError when attempting to
  fetch resource" / "Failed to fetch") with a clearer message:
  "Studio isn't running -- please relaunch it."
2026-03-16 02:46:56 -07:00
pre-commit-ci[bot]
a45babc620 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-16 02:46:56 -07:00
Daniel Han
d417407087 Convert images to PNG before sending to llama-server
llama-server uses stb_image internally, which does not support WebP,
TIFF, AVIF, or several other formats that browsers accept for upload.
Uploading a WebP image to a vision GGUF model caused a 400 error:
"Failed to load image or audio file" / "failed to decode image bytes".

Convert all uploaded images to PNG via PIL before base64-encoding and
forwarding to llama-server. This handles WebP, TIFF, BMP, GIF, AVIF,
and any other format PIL supports. RGBA images are converted to RGB
first since PNG with alpha can cause issues in some vision pipelines.
2026-03-16 02:46:56 -07:00
pre-commit-ci[bot]
c842e019d8 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-16 02:46:56 -07:00
Daniel Han
39854f4429 Auto-download mmproj for vision-capable GGUF models
GGUF repos with mmproj files (e.g. Qwen3.5-0.8B-GGUF) are already
detected as vision-capable by list_gguf_variants(), and is_vision is
set correctly in ModelConfig. However, the HF download path only
downloaded the main GGUF file without the mmproj projection file,
so llama-server started without --mmproj and rejected image uploads
with "text-only model" errors.

Add _download_mmproj() to LlamaCppBackend that:
- Lists repo files for mmproj*.gguf matches
- Prefers mmproj-F16.gguf (best quality), falls back to any mmproj
- Downloads via hf_hub_download (uses the same HF cache)

In load_model(), when is_vision=True and no explicit mmproj_path was
provided (HF mode), auto-download the mmproj after the main GGUF.
The downloaded path is passed to llama-server via --mmproj.
2026-03-16 02:46:56 -07:00
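The selection rule from 39854f4429 above (prefer `mmproj-F16.gguf`, otherwise any mmproj file) can be sketched as a pure function over a repo file listing. `pick_mmproj` is a hypothetical name, not the actual `_download_mmproj()` code, and the alphabetical tie-break is an assumption. The download step itself is omitted because it needs network access.

```python
from typing import List, Optional


def pick_mmproj(repo_files: List[str]) -> Optional[str]:
    """Pick an mmproj*.gguf file from a repo listing (hypothetical sketch)."""

    def basename(path: str) -> str:
        return path.rsplit("/", 1)[-1].lower()

    candidates = [
        f for f in repo_files
        if basename(f).startswith("mmproj") and basename(f).endswith(".gguf")
    ]
    if not candidates:
        return None
    for f in candidates:
        if basename(f) == "mmproj-f16.gguf":
            return f  # preferred variant named in the commit message
    # Assumption: any remaining mmproj is acceptable; pick deterministically.
    return sorted(candidates)[0]
```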
Daniel Han
f20c7ca54d Friendlier unsupported model errors, show estimated download size
1. Backend: When a model fails with "No config file found" or similar
   unsupported-model errors, wrap the message with "This model is not
   supported yet. Try a different model." instead of showing the raw
   Unsloth exception.

2. Frontend: Compute estimated download size from the HF search API's
   safetensors.parameters dtype breakdown (BF16 = 2 bytes/param, I32 =
   4 bytes/param, F32 = 4 bytes/param, etc.) and show it in the model
   picker instead of just the param count. For example, Kimi-K2.5 now
   shows "~554 GB" instead of "171B" (which was misleading, since 171B
   params != 171 GB download).
2026-03-16 02:46:56 -07:00
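The size estimate in f20c7ca54d above amounts to a weighted sum over the dtype breakdown. The sketch below is in Python for illustration, although the commit places this logic in the frontend. `estimate_download_bytes` and the fallback for unknown dtypes are assumptions; only the BF16, I32, and F32 sizes come from the commit message.

```python
from typing import Dict

# Bytes per parameter. BF16/I32/F32 are from the commit message;
# F16 is added here as an assumption (it is also a 2-byte type).
BYTES_PER_PARAM = {"BF16": 2, "F16": 2, "I32": 4, "F32": 4}


def estimate_download_bytes(dtype_counts: Dict[str, int], default_bytes: int = 2) -> int:
    """Estimate download size from a {dtype: parameter_count} mapping.

    Hypothetical helper: unknown dtypes fall back to default_bytes.
    """
    return sum(
        count * BYTES_PER_PARAM.get(dtype, default_bytes)
        for dtype, count in dtype_counts.items()
    )
```

For example, `estimate_download_bytes({"BF16": 1_000, "F32": 10})` counts 2 bytes for each BF16 parameter and 4 bytes for each F32 parameter.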
Daniel Han
1471c63b96 Fix download progress bugs: false completion, stale UI, dedup
Three fixes on top of the download progress feature:

1. Backend: Replace broken "no .incomplete = done" completion check
   with a 95% byte threshold. HF downloads files sequentially, so
   between files there are briefly no .incomplete files even though
   the download is far from done (e.g. Kimi-K2.5 reported "done"
   after downloading 22KB of config files out of 595GB).

2. Frontend: Track hasShownProgress flag. Only show "Download
   complete. Loading into memory..." if we actually displayed
   download progress before. For already-cached models where the
   first poll returns progress=1.0, this avoids the misleading
   "Download complete" message.

3. Frontend: Deduplicate recommended vs downloaded -- filter out
   models already in the "Downloaded" section. Cache the fetched
   lists at module level so re-mounting the popover does not flash
   an empty "Downloaded" section.
2026-03-16 02:46:56 -07:00
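The 95% byte threshold from fix 1 in 1471c63b96 above can be sketched as below. `download_progress` is a hypothetical name, and treating only fully committed bytes as counting toward completion is an assumption; the commit message states only that a 95% byte threshold replaced the ".incomplete files absent" check.

```python
from typing import Tuple


def download_progress(
    completed_bytes: int,
    incomplete_bytes: int,
    expected_bytes: int,
    done_threshold: float = 0.95,
) -> Tuple[float, bool]:
    """Return (fraction_downloaded, is_done) for a repo download.

    Hypothetical sketch: 'done' is decided by bytes, not by the absence
    of .incomplete files, so gaps between sequential files do not
    report a false completion.
    """
    if expected_bytes <= 0:
        return 0.0, False  # size unknown: cannot compute a percentage
    fraction = min((completed_bytes + incomplete_bytes) / expected_bytes, 1.0)
    is_done = completed_bytes / expected_bytes >= done_threshold
    return fraction, is_done
```

With the numbers from the commit message (22 KB downloaded out of 595 GB), this reports a fraction near zero and `is_done` as false.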
pre-commit-ci[bot]
e03a809994 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-16 02:46:56 -07:00
Daniel Han
b84f167d5a Add download progress bar for non-GGUF models in Chat
Previously only GGUF models showed download progress in Chat. Non-GGUF
models (safetensors, bnb quantized, etc.) showed a static message with
no progress indication. This adds progress tracking for all model types
and fixes several related issues.

Backend:
- Add /api/models/download-progress endpoint that checks the HF cache
  blobs directory for completed and .incomplete files. Uses model_info()
  (cached per repo) to determine expected total size for percentage.
- Add /api/models/cached-models endpoint that lists non-GGUF model repos
  from the HF cache via scan_cache_dir().
- Fix progress stuck at 0.99: when no .incomplete files remain, report
  1.0 immediately (blob deduplication can make byte totals mismatch).

Frontend:
- Remove the ggufVariant gate so download progress polling works for all
  non-cached models, not just GGUFs.
- Use GGUF-specific endpoint when variant + expectedBytes available,
  otherwise use the general download-progress endpoint.
- Fix toast stuck after load: check loadingModelRef.current before and
  after the async poll to prevent overwriting the success toast.
- First poll at 500ms instead of waiting for the 2s interval.
- Show downloaded non-GGUF models in the Hub model picker "Downloaded"
  section alongside GGUFs.
2026-03-16 02:46:56 -07:00
Roland Tannous
08b5879101
fix: Ctrl+C not terminating backend on Linux (#4316)
* fix: Ctrl+C not breaking out of backend on Linux

threading.Event.wait() without a timeout blocks at the C level on
Linux, preventing Python from delivering SIGINT. Use a 1-second
timeout loop so the interpreter can process pending signals.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-16 11:58:09 +04:00
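The timeout loop described in 08b5879101 above looks roughly like this. `wait_for_shutdown` is a hypothetical name used for illustration; the 1-second default matches the commit message.

```python
import threading


def wait_for_shutdown(stop_event: threading.Event, poll_seconds: float = 1.0) -> None:
    """Block until stop_event is set, waking up every poll_seconds.

    Event.wait(timeout) returns False on timeout and True once the event
    is set, so the loop re-enters the interpreter periodically and gives
    it a chance to run pending signal handlers such as KeyboardInterrupt.
    """
    while not stop_event.wait(timeout=poll_seconds):
        pass
```

A caller would typically wrap this in `try/except KeyboardInterrupt` and set the event from its shutdown path.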
Manan Shah
164b5a5b06
[Feature] studio: user can upload eval dataset (#4307)
* user can upload eval dataset, removed bugs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolving merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolving gpt comments

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-03-16 11:15:50 +04:00
Daniel Han
6c2a593522 Fix setup.sh crash on Mac with empty gitignore array
The `set -u` (nounset) flag in setup.sh causes `${_HIDDEN_GITIGNORES[@]}`
to fail with "unbound variable" when no parent .gitignore with `*` is
found (common on Mac where the install is not inside a Python venv).

Use the `${arr[@]+"${arr[@]}"}` idiom to safely expand empty arrays
under nounset mode.
2026-03-15 22:33:04 -07:00
Daniel Han
a8f02c9f3f Fix studio frontend build producing empty Tailwind CSS
Two issues caused the studio frontend to render without any styling
when installed via `pip install` (non-editable):

1. `pyproject.toml` package-data only included `frontend/dist/**/*`.
   The `include-package-data = true` setting relies on `git ls-files`,
   which fails in isolated builds (pip/uv copy source to a temp dir
   without `.git`). This meant `frontend/src/`, `package.json`,
   `vite.config.ts`, and other build files were missing from the
   installed package. Tailwind had no source files to scan.

2. Python venvs auto-create a `.gitignore` with a bare `*` pattern.
   Tailwind v4's oxide scanner walks parent directories and respects
   `.gitignore` -- so even when source files are present, the venv's
   `*` pattern causes the scanner to skip all `.tsx` files. The result
   is a 34KB CSS skeleton with zero utility classes instead of the
   expected 265KB.

Additionally, Vite adds `crossorigin` to script/link tags by default.
This forces CORS mode on font subresource loads, which Firefox
HTTPS-Only Mode does not exempt -- causing all @font-face downloads
to fail silently when Studio is served over HTTP.

Changes:
- pyproject.toml: Expand package-data to include frontend source,
  config files, setup scripts, and backend requirements using glob
  patterns (no node_modules)
- studio/setup.sh: Temporarily hide parent .gitignore files containing
  a bare `*` during `npm run build`, with trap-based restoration
- studio/backend/main.py: Strip `crossorigin` attributes from HTML
  at serve time so fonts load correctly on any protocol
2026-03-15 22:00:00 -07:00
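The serve-time `crossorigin` stripping mentioned for studio/backend/main.py in a8f02c9f3f above could be done with a small regex pass. This is a sketch under assumptions: `strip_crossorigin` is a hypothetical name, and a regex is a simplification that could also touch matching text outside of tags.

```python
import re

# Matches a bare `crossorigin` attribute or `crossorigin=<value>`
# (quoted or unquoted), including the whitespace before it.
_CROSSORIGIN_RE = re.compile(
    r"""\s+crossorigin(?:=(?:"[^"]*"|'[^']*'|[^\s>]+))?(?=[\s/>])""",
    re.IGNORECASE,
)


def strip_crossorigin(html: str) -> str:
    """Remove crossorigin attributes from an HTML string (hypothetical)."""
    return _CROSSORIGIN_RE.sub("", html)
```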
Lee Jackson
15e7d0dd5c
fix: preserve save_steps when toggling to epochs mode (#4308) 2026-03-16 08:43:49 +04:00
Lee Jackson
7b1ea88739
studio: simplify auth UX to password-only login (#4305)
* feat(studio): switch to password-only login and simplify first-time setup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: align change-password button state with validation rules

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-03-16 03:02:58 +04:00
Roland Tannous
0818f78617
Graceful shutdown on Windows (signal handlers for Ctrl+C) (#4306)
* fix: graceful shutdown on Windows (signal handlers for Ctrl+C)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-16 03:01:15 +04:00
Leo Borcherding
ce8e530e6d
Fix/colab plugin editable install (#4281)
* fix: update Colab notebook to use public unsloth repo and correct paths

* Update studio/Unsloth_Studio_Colab.ipynb

For efficiency, especially in environments like Colab, it's better to perform a shallow clone of the repository. This fetches only the latest commit from the specified branch, which is significantly faster and uses less disk space than cloning the entire project history.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update Unsloth_Studio_Colab.ipynb

* studio: add standard Unsloth header, news, section headings, and footer to Colab notebook

* studio: refine Colab notebook section headings and cell cleanup

---------

Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>
2026-03-16 01:34:37 +04:00
Lee Jackson
1e3aa4ff92
studio: add max steps and epochs toggle switch (#4296)
* feat: add Epochs toggle for Max Steps

* refactor: dedupe max-steps/epochs toggle logic and fix input bug

* fix(studio): max-steps input validation and prevSaveSteps seed in epochs mode

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-03-16 01:33:51 +04:00
Manan Shah
b2dce8e3a8
chat only with gguf for mac devices (#4300)
* chat only with gguf for mac devices

* resolving gpt comments

* add change-password for chat only

* hide lora adaptors dropdown

* solving gpt comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* addressing the comment

* fixing auth flow

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-15 23:20:48 +04:00
pre-commit-ci[bot]
050240b27a [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
11612f6dc9 studio: fix GGUF download UX -- progress bar, cancel, sorting, auto-scroll
- Run GGUF load_model in asyncio.to_thread so the event loop stays free
  for progress polling during download (was blocking all requests).
- Extract download phase out of the lock in LlamaCppBackend.load_model
  so unload_model/cancel can take effect immediately during download.
- Fix "downloaded" badge for split GGUFs: check total cached bytes
  across all shards vs expected size, not just first shard existence.
- Respect CUDA_VISIBLE_DEVICES in /api/system GPU reporting so the
  frontend GGUF fit estimation uses actual available VRAM.
- Sort tight variants (need CPU offload) smallest-first instead of
  largest-first -- closer to GPU budget = faster inference.
- Fix cancel: use refs instead of React state for abort controller and
  toast ID so both cancel buttons (text + toast) work reliably. Make
  cancel synchronous (fire-and-forget unload) for instant UI response.
  Check abortCtrl.signal.aborted after loadModel returns to prevent
  ghost model state. Skip rollback and suppress errors on cancel.
- Dynamic top 4 GGUF models fetched from HF API sorted by downloads,
  prepended to the default recommended list.
- Remove turnAnchor="top" for auto-scroll to bottom during generation.
- Set default toast duration to 10s (was infinite for loading toasts).
- Deduplicate cached GGUF repos using scan_cache_dir API (fixes
  Qwen/X-GGUF vs qwen/x-gguf duplicates from lowercased HF cache).
- Pre-compile repo_id validation regex to silence CodeQL ReDoS warning.
- Change welcome text and default suggestion text.
2026-03-15 05:24:06 -07:00
Daniel Han
bb57236e29 studio: revert -- always respect CUDA_VISIBLE_DEVICES in GPU memory query 2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]
851cb2af68 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
5603ced75f studio: ignore CUDA_VISIBLE_DEVICES in GPU memory query for llama-server
_get_gpu_free_memory was filtering by CUDA_VISIBLE_DEVICES, so with
CUDA_VISIBLE_DEVICES='0' set by the training env, llama-server only
saw 1 GPU and used --fit for CPU offloading instead of spreading
across all 8 GPUs.

Since llama-server manages its own GPU allocation (the _select_gpus
method picks GPUs and sets CUDA_VISIBLE_DEVICES for the subprocess),
the query must see ALL physical GPUs to make the right decision.
2026-03-15 05:24:06 -07:00
Daniel Han
1dfba866be studio: fix download progress -- track per-variant, include incomplete blobs
1. Progress endpoint now takes a variant parameter and only counts
   .gguf files matching that variant (not all files in the repo cache,
   which would include previously downloaded variants)

2. Tracks .incomplete files in HF blobs dir for in-progress single-shard
   downloads, capping at 99% until the file is fully committed

3. Fixed loading text: "Loading model..." for cached, "Downloading
   model..." for new downloads, with appropriate descriptions

4. Wording: "Downloading and loading model. Large models can take a
   while." instead of "This may include downloading."
2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]
b1dda44745 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
475ba417dc studio: context-aware loading text + download progress bar
1. Loading text: shows "Loading model..." for cached models,
   "Downloading model..." for new downloads. Toast description
   adapts accordingly.

2. Download progress: polls /api/models/gguf-download-progress every
   2s during downloads, updating the toast with percentage and GB
   downloaded. Progress is estimated by checking the HF cache folder
   size against the expected total bytes.

3. Passes isDownloaded and expectedBytes through the full chain from
   variant click to selectModel for accurate UI state.
2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]
061de08f86 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
1e9d19126b studio: fix P1 issues from PR review comments
1. n_gpu_layers kwarg: accept (and ignore) in load_model signature
   so callers like llm_assist.py don't get TypeError

2. mmproj exclusion: filter out mmproj files in _find_smallest_fitting_variant
   so fallback doesn't pick a tiny vision projection as the "model"

3. Shard preservation after fallback: re-discover shards for the
   fallback variant instead of resetting to empty list, so split
   GGUFs download all shards

4. Orphan cleanup safety: only kill llama-server processes whose
   cmdline contains ".unsloth/", avoiding termination of unrelated
   llama-server instances on the same machine

5. Path expression sanitization: validate repo_id format before using
   it in cache directory lookups
2026-03-15 05:24:06 -07:00
Daniel Han
cf45ff7232 studio: fix downloaded check -- compare basename not full path
The variant filename includes a subfolder prefix (e.g.
UD-Q4_K_XL/Kimi-K2.5-UD-Q4_K_XL-00001-of-00013.gguf) but rglob
returns just the filename. Use Path.name for the comparison.
2026-03-15 05:24:06 -07:00
Daniel Han
92670a90dd studio: fix case-insensitive HF cache lookup for downloaded GGUF variants
HF cache dirs use the exact case from the repo_id at download time
(e.g. models--unsloth--kimi-k2.5-gguf) which may differ from the
canonical HF repo_id (unsloth/Kimi-K2.5-GGUF). Use case-insensitive
matching to find the cache directory.
2026-03-15 05:24:06 -07:00
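The case-insensitive lookup from 92670a90dd above can be sketched as a directory scan. `find_cache_dir` is a hypothetical name; the `models--<org>--<name>` directory naming is taken from the example in the commit message.

```python
from pathlib import Path
from typing import Optional


def find_cache_dir(cache_root: Path, repo_id: str) -> Optional[Path]:
    """Find the HF cache directory for repo_id, ignoring case.

    Hypothetical sketch: compares lowercased directory names instead of
    assuming the on-disk case matches the canonical repo_id.
    """
    target = ("models--" + repo_id.replace("/", "--")).lower()
    if not cache_root.is_dir():
        return None
    for entry in cache_root.iterdir():
        if entry.is_dir() and entry.name.lower() == target:
            return entry
    return None
```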
Daniel Han
7b65073311 studio: show 'downloaded' badge instead of 'recommended' when variant is cached 2026-03-15 05:24:06 -07:00
Daniel Han
bcb382def9 studio: sort downloaded GGUF variants before recommended
Downloaded variants now take priority over the recommended badge in
sort order. Within the same tier (downloaded+fits, etc.), recommended
still sorts first. Order: downloaded -> recommended -> fits -> tight -> OOM
2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]
64ab7554b1 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
4d35699c65 studio: show downloaded status in GGUF variant list, sort downloaded first
- Backend: /gguf-variants now checks HF cache for each variant's file
  and returns a downloaded flag per variant
- Frontend: downloaded variants sort before non-downloaded (after
  recommended), and show a green "downloaded" badge
- Sort order: recommended -> downloaded+fits -> downloaded+tight ->
  fits -> tight -> OOM
2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]
904ac86f4a [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
897d8b426a studio: interruptible GGUF downloads, cached models endpoint, Downloaded section
1. Interruptible downloads: load_model now checks a cancel event
   between shard downloads. unload_model sets the event so cancel
   stops the download at the next shard boundary.

2. /api/models/cached-gguf endpoint: scans the HF cache for
   already-downloaded GGUF repos with their total size and cache path.

3. "Downloaded" section in Hub model picker: shows cached GGUF repos
   at the top (before Recommended) so users can quickly re-load
   previously downloaded models without re-downloading.
2026-03-15 05:24:06 -07:00
Daniel Han
226ece0c9e studio: fix cancel to actually kill llama-server during loading
The unload endpoint checked is_loaded (requires healthy=True), but
during initial loading the server is not yet healthy. Cancel had no
effect because the unload route fell through to the Unsloth backend.

Fix: add is_active property (process exists, loading or loaded) and
check it in the unload route so cancel kills llama-server even during
the download/loading phase.

Also: toast cancel button now properly triggers the backend unload.
2026-03-15 05:24:06 -07:00
Daniel Han
a0fdf03340 studio: add Cancel button to model loading toast popup
Replace toast.promise with a manual toast.loading that includes a
Cancel action button. Users can now cancel model downloads/loads from
the toast notification itself, not just from the header bar spinner.
2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]
1c4efa6c3d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
c59f028150 studio: kill orphaned llama-server processes on startup
When the studio process is killed (SIGTERM/SIGKILL), atexit handlers
may not run in the subprocess orchestrator, leaving llama-server
processes orphaned and holding GPU memory. This caused OOM errors when
trying to load a new model after a studio restart.

On init, LlamaCppBackend now runs pgrep to find and SIGKILL any stale
llama-server processes before starting fresh.
2026-03-15 05:24:06 -07:00
Daniel Han
7b19cb418e studio: sort TIGHT (CPU offload) GGUF variants after GPU-only fits
Sort order is now: recommended -> fits (largest first) -> tight/CPU
offload (largest first) -> OOM (smallest first). Previously tight
variants were mixed with fits variants.
2026-03-15 05:24:06 -07:00
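The ordering stated in 7b19cb418e above maps naturally onto a tuple sort key. The sketch is in Python for illustration (the commit log places this sorting in the frontend), and the dictionary fields `recommended`, `fit`, and `size_bytes` are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

_TIER_RANK = {"fits": 1, "tight": 2, "oom": 3}


def sort_variants(variants: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Order: recommended, fits (largest first), tight (largest first),
    OOM (smallest first). Field names are assumptions for this sketch."""

    def key(v: Dict[str, Any]) -> Tuple[int, int]:
        if v.get("recommended"):
            return (0, 0)  # pinned at the top regardless of size
        size = v["size_bytes"]
        # Negate size for descending order; OOM variants stay ascending.
        return (_TIER_RANK[v["fit"]], size if v["fit"] == "oom" else -size)

    return sorted(variants, key=key)
```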
Daniel Han
5bb783850a studio: GGUF OOM accounts for CPU offload via --fit (GPU + system RAM)
Updated GGUF fit classification to match llama-server's --fit behavior:

- fits:  model <= 70% of total GPU memory (all GPUs)
- tight: model > 70% GPU but <= 70% GPU + 70% available system RAM
         (llama-server uses --fit to offload layers to CPU)
- OOM:   model exceeds both GPU and system RAM budgets

useGpuInfo now also returns systemRamAvailableGb from /api/system so the
frontend can compute the combined GPU+RAM budget.
2026-03-15 05:24:06 -07:00
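The three-way classification in 5bb783850a above can be written as a small function. This is a Python sketch of logic the commit places in the frontend; `classify_gguf_fit` and its parameter names are hypothetical, while the 70% budgets come from the commit message.

```python
def classify_gguf_fit(
    model_gb: float,
    total_gpu_gb: float,
    system_ram_available_gb: float,
    budget: float = 0.70,
) -> str:
    """Classify a GGUF variant as 'fits', 'tight', or 'oom'.

    fits:  model <= 70% of total GPU memory (all GPUs)
    tight: model <= 70% GPU + 70% available system RAM (CPU offload)
    oom:   model exceeds both budgets
    """
    gpu_budget = budget * total_gpu_gb
    combined_budget = gpu_budget + budget * system_ram_available_gb
    if model_gb <= gpu_budget:
        return "fits"
    if model_gb <= combined_budget:
        return "tight"
    return "oom"
```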
pre-commit-ci[bot]
1625565da2 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
c9c485a7b0 studio: use nvidia-smi for all GPUs + 70% VRAM threshold for GGUF OOM
Two fixes for accurate GGUF OOM detection:

1. /api/system now uses nvidia-smi to enumerate all physical GPUs
   instead of torch.cuda which only sees CUDA_VISIBLE_DEVICES. This
   matches llama-server which can use all GPUs regardless of the env
   var. Falls back to torch-based detection if nvidia-smi unavailable.

2. Frontend GGUF OOM check now uses 70% of total GPU memory as the
   budget, matching the PR's _select_gpus logic (30% reserved for KV
   cache and compute buffers). Previously used checkVramFit's 100%
   threshold which was too generous.
2026-03-15 05:24:06 -07:00
Daniel Han
f5f631e5d1 studio: add cancel button for model loading/downloading
Adds a Cancel button next to the "Downloading model..." spinner so
users can abort long downloads. Clicking it aborts the in-flight load,
calls unloadModel to kill any running llama-server process, and clears
the loading state.
2026-03-15 05:24:06 -07:00
Daniel Han
4600131fea studio: sort OOM GGUF variants smallest-to-largest
OOM variants are more useful sorted ascending by size since smaller ones
are more likely to run with --fit. Non-OOM variants remain largest-first
(best quality).
2026-03-15 05:24:06 -07:00
Daniel Han
ea45370ab8 studio: use total multi-GPU VRAM for OOM checks, recommend smallest when all OOM
Two fixes for GGUF variant dropdown:

1. useGpuInfo now sums memory across all GPU devices instead of only
   reading devices[0]. This matches llama-server's multi-GPU allocation
   where models can be split across GPUs.

2. When the backend-recommended variant (e.g. UD-Q4_K_XL) exceeds total
   GPU VRAM, the frontend picks the largest variant that fits instead.
   If all variants are OOM, it recommends the smallest one (most likely
   to work with --fit).
2026-03-15 05:24:06 -07:00
Daniel Han
10c4db04d8 studio: fix React hooks order -- move useMemo before early returns
The useMemo for sortedVariants was placed after the loading/error early
returns, which violated React's rules of hooks (hooks must be called in
the same order every render). Move it before the conditional returns.

Fixes: Minified React error #310
2026-03-15 05:24:06 -07:00
Daniel Han
3c1b8d7ab7 studio: sort GGUF dropdown client-side -- recommended first, OOM last, rest by size descending
Move the sort logic from the backend to the frontend GgufVariantExpander
component where GPU VRAM info is available. The backend now does a simple
size-descending sort. The frontend pins the recommended variant at the
top, pushes OOM variants to the bottom, and sorts the rest by file size
descending (largest/best quality first).
2026-03-15 05:24:06 -07:00
Daniel Han
dd2d979b40 studio: sort GGUF quants largest-first so best quality that fits is at the top 2026-03-15 05:24:06 -07:00
pre-commit-ci[bot]
1f861e185b [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
2f5347cb4d studio: sort GGUF quant variants -- recommended first, then UD by size, then standard by size
The variants list was returned in HuggingFace file listing order (alphabetical),
making the dropdown confusing (e.g. BF16 before Q4_0). Now sorted as:

1. Recommended variant (from _pick_best_gguf) pinned at top
2. Other UD (Unsloth Dynamic) variants sorted by disk size ascending
3. Non-UD variants sorted by disk size ascending
2026-03-15 05:24:06 -07:00
Daniel Han
928868f07d studio: auto-find free port if requested port is in use
If the requested port (default 8000) is already in use, auto-
increment and try the next port, up to 20 attempts. Prints a
message like "Port 8000 is in use, using port 8001 instead".

Previously, if port 8000 was busy, uvicorn would fail with
"[Errno 98] address already in use" and the studio would not
start. Now it gracefully finds the next free port.

Uses socket.bind() to check availability before starting uvicorn.
Cross-platform (Linux, macOS, Windows).
2026-03-15 05:24:06 -07:00
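A minimal sketch of the port probing described in 928868f07d, assuming a loopback bind is an acceptable availability check. The function name and the error message are illustrative, not taken from the studio code.

```python
import socket


def find_free_port(start: int = 8000, max_attempts: int = 20,
                   host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + max_attempts) that can be bound."""
    for port in range(start, start + max_attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port
    raise RuntimeError(
        f"No free port found in range {start}-{start + max_attempts - 1}"
    )
```

A caller can compare the returned port with the requested one to decide whether to print the "Port 8000 is in use, using port 8001 instead" message. The probe socket is closed before the server starts, so another process could still take the port in between.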
Daniel Han
ab6fdccfb5 studio: reorder GGUF preference -- UD-Q4_K_XL first, all UD above standard
Reorder _GGUF_QUANT_PREFERENCE so all UD (Unsloth Dynamic) variants
come before standard quants. UD-Q4_K_XL is the default (best
size/quality tradeoff), followed by other UD quants in decreasing
preference order.

For repos without UD variants (e.g., bartowski), falls through to
standard quants starting with Q4_K_M.

Verified with:
  - unsloth/Qwen3.5-35B-A3B-GGUF -> UD-Q4_K_XL
  - bartowski/Qwen_Qwen3.5-35B-A3B-GGUF -> Q4_K_M
  - unsloth/DeepSeek-V3.2-GGUF -> UD-Q4_K_XL (9 shards)
  - unsloth/Llama-3.2-1B-Instruct-GGUF -> UD-Q4_K_XL
2026-03-15 05:24:06 -07:00
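The preference walk in ab6fdccfb5 can be sketched as below. The list is a shortened, illustrative stand-in for `_GGUF_QUANT_PREFERENCE`. Only the facts stated in the commits (UD-Q4_K_XL first, all UD variants above standard quants, Q4_K_M leading the standard quants) are taken from the source; the exact order and the remaining entries are assumptions.

```python
from typing import List, Optional

# Illustrative only: the real preference list is longer and may be ordered differently.
ILLUSTRATIVE_PREFERENCE = [
    "UD-Q4_K_XL",  # default
    "UD-Q2_K_XL",
    "UD-IQ2_M",
    "UD-IQ1_M",
    "UD-IQ1_S",
    "Q4_K_M",      # first standard quant, used when a repo has no UD variants
    "Q8_0",
]


def pick_best_gguf(
    available: List[str], preference: List[str] = ILLUSTRATIVE_PREFERENCE
) -> Optional[str]:
    """Return the first preferred quant that the repo actually provides."""
    available_set = set(available)
    for quant in preference:
        if quant in available_set:
            return quant
    return None
```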
pre-commit-ci[bot]
1dba26012c [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
8ccb461570 studio: group GGUF shards by variant in size-based fallback
The smallest-fitting-variant fallback now groups split GGUF shards
by their variant prefix and sums all shard sizes per variant.

For example, DeepSeek-V3.2 UD-Q4_K_XL has 9 shards totaling
379.8 GB. The previous code treated each shard as a separate
"variant" and would have incorrectly selected a single 50 GB shard
as fitting, ignoring the other 8 shards needed.

Tested with unsloth/DeepSeek-V3.2-GGUF (237 GGUF files, 27
variants from 150 GB to 1.25 TB). Correctly groups and sorts
all variants by total size.
2026-03-15 05:24:06 -07:00
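A sketch of the shard grouping from 8ccb461570, assuming shards follow the `-00001-of-00009.gguf` naming used for split GGUF files. The helper name and the regex are illustrative; the actual code may derive the variant prefix differently.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches the "-00001-of-00009" part of a split GGUF file name.
_SHARD_SUFFIX = re.compile(r"-\d{5}-of-\d{5}(?=\.gguf$)")


def group_shard_sizes(file_sizes: Dict[str, int]) -> Dict[str, int]:
    """Sum shard sizes so each variant is represented by its total size."""
    totals: Dict[str, int] = defaultdict(int)
    for path, size in file_sizes.items():
        variant_key = _SHARD_SUFFIX.sub("", path)  # unsplit files are unchanged
        totals[variant_key] += size
    return dict(totals)
```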
pre-commit-ci[bot]
d5a18e5a00 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
93ec05ced2 studio: default to UD-Q4_K_XL for GGUFs, fall back to smallest
Two changes for GGUF variant selection:

1. Default variant preference now starts with UD-Q4_K_XL (Unsloth
   Dynamic quantization) which provides better quality per bit than
   standard Q4_K_M. Also added UD-Q2_K_XL, UD-IQ2_M, UD-IQ1_M,
   UD-IQ1_S as small fallback options.

2. If the selected variant doesn't fit on disk, automatically fall
   back to the smallest GGUF variant in the repo that does fit.
   Queries all GGUF file sizes via get_paths_info() and picks the
   smallest one under the free disk space limit. If nothing fits,
   raises a clear error.

This means users with limited disk space won't get a download
error -- they'll get a smaller quantization instead.
2026-03-15 05:24:06 -07:00
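The disk-space fallback in 93ec05ced2 reduces to a small decision function. This sketch assumes the per-variant sizes have already been fetched; the function name and the error text are hypothetical.

```python
from typing import Dict


def choose_variant_for_disk(
    variant_sizes: Dict[str, int], preferred: str, free_bytes: int
) -> str:
    """Keep the preferred variant if it fits, else the smallest one that does."""
    preferred_size = variant_sizes.get(preferred)
    if preferred_size is not None and preferred_size <= free_bytes:
        return preferred

    fitting = {name: size for name, size in variant_sizes.items()
               if size <= free_bytes}
    if not fitting:
        raise RuntimeError(
            f"No GGUF variant fits in {free_bytes} bytes of free disk space"
        )
    # Smallest variant under the free-space limit.
    return min(fitting, key=lambda name: fitting[name])
```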
pre-commit-ci[bot]
12f3f4361d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
38d700ecb0 studio: check disk space before downloading GGUF models
Query file sizes from HuggingFace via get_paths_info() before
downloading, and compare against free disk space on the cache
partition. Raises a clear error if there is not enough space,
instead of failing mid-download.

Uses get_paths_info() instead of repo_info() because xet-stored
repos return size=None from repo_info().siblings, but
get_paths_info() returns the actual file sizes.

If the size check fails for any reason (network error, API change),
it logs a warning and continues with the download anyway.
2026-03-15 05:24:06 -07:00
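A sketch of the pre-download check from 38d700ecb0. `get_required_bytes` is a hypothetical callable standing in for the `get_paths_info()` size query, so the example runs without network access. The return values (True when verified, False when the check was skipped) are also an assumption.

```python
import logging
import shutil
from typing import Callable

logger = logging.getLogger(__name__)


def check_disk_space(get_required_bytes: Callable[[], int], cache_dir: str) -> bool:
    """Raise if the download cannot fit; skip the check if sizes are unavailable."""
    try:
        required = get_required_bytes()
    except Exception as exc:  # network error, API change, ...
        logger.warning("Disk space check skipped: %s", exc)
        return False

    free = shutil.disk_usage(cache_dir).free
    if required > free:
        raise RuntimeError(
            f"Not enough disk space: need {required} bytes, {free} bytes free"
        )
    return True
```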
pre-commit-ci[bot]
f4fbbcaec8 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
f4f69f16a6 studio: centralize cache directory for all downloads
Set HF_HOME, HF_HUB_CACHE, HF_XET_CACHE, UV_CACHE_DIR, and
VLLM_CACHE_ROOT to a unified location under ~/.unsloth/studio/cache/
on startup. This keeps all model downloads, datasets, and caches
in one place instead of scattered across ~/.cache/huggingface,
~/.cache/uv, etc.

Layout:
  ~/.unsloth/studio/cache/
    huggingface/       (HF_HOME)
      hub/             (HF_HUB_CACHE -- model/dataset downloads)
      xet/             (HF_XET_CACHE -- xet blob store)
    uv/                (UV_CACHE_DIR -- uv package cache)
    vllm/              (VLLM_CACHE_ROOT -- vllm compiled kernels)

Only sets variables that are not already in the environment, so
user overrides (e.g. HF_HOME=/data/models) are respected.

Cross-platform: uses Path.home() which resolves correctly on
Linux (~), macOS (~), and Windows (C:\Users\<user>).
2026-03-15 05:24:06 -07:00
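The "only set what is not already set" behavior from f4f69f16a6 maps onto `setdefault`. A minimal sketch follows; the function name and the injectable `env` / `home` parameters are illustrative additions for testability, and directory creation is left out.

```python
import os
from pathlib import Path
from typing import MutableMapping, Optional


def configure_cache_dirs(
    env: Optional[MutableMapping[str, str]] = None,
    home: Optional[Path] = None,
) -> MutableMapping[str, str]:
    """Point cache variables at ~/.unsloth/studio/cache unless already set."""
    env = os.environ if env is None else env
    root = (home or Path.home()) / ".unsloth" / "studio" / "cache"
    defaults = {
        "HF_HOME": root / "huggingface",
        "HF_HUB_CACHE": root / "huggingface" / "hub",
        "HF_XET_CACHE": root / "huggingface" / "xet",
        "UV_CACHE_DIR": root / "uv",
        "VLLM_CACHE_ROOT": root / "vllm",
    }
    for name, path in defaults.items():
        env.setdefault(name, str(path))  # user overrides win
    return env
```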
Daniel Han
f1293fe7d8 studio: respect existing CUDA_VISIBLE_DEVICES in GPU selection
If CUDA_VISIBLE_DEVICES is already set in the environment (e.g.,
by the user or a wrapper script), only consider those GPUs when
selecting devices for llama-server. nvidia-smi reports all physical
GPUs regardless of CUDA_VISIBLE_DEVICES, so we filter its output
to match the allowed set.

Without this, the GPU selector could pick a GPU outside the user's
allowed set, overriding their restriction.
2026-03-15 05:24:06 -07:00
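The filtering step in f1293fe7d8 could look roughly like this. The sketch assumes `CUDA_VISIBLE_DEVICES` holds comma-separated integer indices; UUID-style entries are ignored here, which the real implementation may handle differently. The function name is hypothetical.

```python
from typing import List, Optional


def allowed_gpu_indices(
    cuda_visible_devices: Optional[str], physical: List[int]
) -> List[int]:
    """Restrict the nvidia-smi GPU list to what CUDA_VISIBLE_DEVICES permits."""
    if cuda_visible_devices is None:
        return list(physical)  # no restriction in the environment

    allowed = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        if token.isdigit() and int(token) in physical:
            allowed.append(int(token))
    return allowed
```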
pre-commit-ci[bot]
e885d7308e [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
12183e0656 studio: smart GPU allocation for GGUF inference
Automatically select the best GPU(s) for a GGUF model based on
file size and available VRAM, instead of relying on hardcoded
-ngl -1 or letting llama-server guess.

Logic:
1. Measure total GGUF file size (including split shards)
2. Query free memory per GPU via nvidia-smi
3. If the model fits in 70% of the most-free GPU's memory,
   pin to that single GPU (CUDA_VISIBLE_DEVICES=X, no --fit)
4. If it needs multiple GPUs, pick the N most-free GPUs
   (CUDA_VISIBLE_DEVICES=X,Y, no --fit)
5. If it's too large for all GPUs combined, omit
   CUDA_VISIBLE_DEVICES and use --fit on to let llama-server
   handle partial offloading

The 70% threshold accounts for KV cache and compute buffers
that sit on top of the model weights.

Removed the -ngl parameter (was hardcoded to -1). llama-server's
default of "auto" handles layer offloading correctly, especially
with --fit on for oversized models.

Tested on 8x B200:
  - 1B model (0.75 GB):  picks 1 GPU, no --fit
  - 27B model (17 GB):   picks 1 GPU, no --fit
  - 405B model (230 GB): picks 2 GPUs, no --fit
  - 2TB model:           all GPUs, --fit on
2026-03-15 05:24:06 -07:00
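The allocation steps in 12183e0656 can be sketched as a greedy walk over GPUs sorted by free memory. The commit states the 70% threshold only for the single-GPU case; applying the same fraction to the multi-GPU sum is an assumption of this sketch, as are the function name and the return shape.

```python
from typing import Dict, List, Optional, Tuple


def plan_gpus(
    model_bytes: int,
    free_bytes_by_gpu: Dict[int, int],
    usable_fraction: float = 0.70,
) -> Tuple[Optional[List[int]], bool]:
    """Return (gpu indices for CUDA_VISIBLE_DEVICES or None, whether to pass --fit on)."""
    ranked = sorted(free_bytes_by_gpu.items(), key=lambda kv: kv[1], reverse=True)

    chosen: List[int] = []
    usable = 0.0
    for index, free in ranked:
        chosen.append(index)
        usable += free * usable_fraction  # leave headroom for KV cache and buffers
        if model_bytes <= usable:
            return chosen, False  # fits on the N most-free GPUs, no --fit

    # Too large for all GPUs combined: leave CUDA_VISIBLE_DEVICES alone, use --fit on.
    return None, True
```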
pre-commit-ci[bot]
7202f81985 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-15 05:24:06 -07:00
Daniel Han
80d84a5b5f studio: optimize llama-server flags for single-user studio
Refactor command building (deduplicate HF/local paths) and add
flags for better performance:

- --parallel 1: studio is single-user, so only 1 inference slot
  is needed. The previous auto-detect picked 4 slots, wasting
  VRAM on 3 unused KV caches.
- --flash-attn on: force flash attention for faster inference.
  Default is "auto" which may not always enable it.
- --fit on: auto-adjust parameters to fit in available device
  memory. Already the default but now explicit.

Also cleaned up the duplicated command building for HF vs local
mode into a single block.
2026-03-15 05:24:06 -07:00
Daniel Han
887e7a31c4 studio: don't cap max_tokens for GGUF inference
Remove the hard max_tokens=2048 default and le=4096 cap for GGUF
chat completions. When max_tokens is not set (None), the field is
omitted from the llama-server payload entirely, letting the model
generate until it produces an EOS token or hits the context limit.

This is critical for thinking/reasoning models (Qwen3.5, DeepSeek-R1,
etc.) where the thinking phase alone can consume 1000+ tokens before
the actual answer. With the previous 2048 default, simple questions
like "What is 2+2?" used all tokens on thinking and produced empty
visible responses.

Changes:
- llama_cpp.py: max_tokens default None, only include in payload
  when explicitly set
- models/inference.py: default None, remove le=4096 cap
- routes/inference.py: pass max_tokens directly, no "or 2048" fallback

llama-server handles omitted max_tokens gracefully (generates until
EOS or context limit). The context size (-c flag, default 4096) acts
as the hard upper bound.
2026-03-15 05:24:06 -07:00
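The payload change in 887e7a31c4 amounts to omitting the key rather than sending a default. A minimal sketch, with a hypothetical helper name and a reduced set of payload fields:

```python
from typing import Any, Dict, List, Optional


def build_chat_payload(
    messages: List[Dict[str, str]],
    max_tokens: Optional[int] = None,
    stream: bool = True,
) -> Dict[str, Any]:
    """Build a chat completion payload, leaving max_tokens out when unset."""
    payload: Dict[str, Any] = {"messages": messages, "stream": stream}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens  # only sent when explicitly requested
    return payload
```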
Daniel Han
961720c1b1 studio: handle reasoning_content in GGUF streaming
llama-server sends thinking/reasoning tokens as "reasoning_content"
in the SSE delta (separate from "content"). The studio was only
reading delta.content, so all reasoning tokens from models like
Qwen3.5, Qwen3-Thinking, DeepSeek-R1, etc. were silently dropped.

This caused "replies with nothing" for thinking models: the model
would spend its entire token budget on reasoning, produce zero
content tokens, and the user would see an empty response.

Fix: read reasoning_content from the delta and wrap it in
<think>...</think> tags. The frontend already has full support
for these tags (parse-assistant-content.ts splits them into
reasoning parts, reasoning.tsx renders a collapsible "Thinking..."
indicator).

Verified with Qwen3.5-27B-GGUF (UD-Q4_K_XL):
  - Before: "What is 2+2?" -> empty response (all tokens in reasoning)
  - After: shows collapsible thinking + answer "4"
2026-03-15 05:24:06 -07:00
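A sketch of the wrapping described in 961720c1b1, operating on already-parsed SSE deltas. The generator name is hypothetical, and the tag placement (open on the first reasoning chunk, close before the first content chunk or at end of stream) is one plausible reading of the commit rather than a copy of the studio code.

```python
from typing import Dict, Iterable, Iterator, Optional


def wrap_reasoning(deltas: Iterable[Dict[str, Optional[str]]]) -> Iterator[str]:
    """Yield text chunks, wrapping reasoning_content in <think>...</think>."""
    in_think = False
    for delta in deltas:
        reasoning = delta.get("reasoning_content")
        content = delta.get("content")
        if reasoning:
            if not in_think:
                in_think = True
                yield "<think>"
            yield reasoning
        if content:
            if in_think:
                in_think = False
                yield "</think>"
            yield content
    if in_think:
        # The stream ended while still reasoning: close the tag anyway.
        yield "</think>"
```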
Roland Tannous
477e68675b
Fix: Compare Mode Deadlock, Cancel Event Poisoning & IPC Optimization (#4303)
* fix: resolve compare mode deadlock, cancel_event poisoning, and add dispatcher-based IPC optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert to 2048 tokens

* refactor: extract dispatcher timeout values into named constants

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: guard dispatcher shutdown against active compare mailboxes

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-15 16:11:44 +04:00
Wasim Yousef Said
e280b0bebc
miscellaneous studio (#4293)
* miscellaneous studio

* chore: upload dataset misc

* chore: redundancy studio cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: address the PR comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: address comments about recipes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-15 14:42:11 +04:00
Roland Tannous
f44857b2df
PR: Windows Setup Improvements (#4299)
* quiet llama.cpp build, smarter CUDA install via winget, accept Python 3.11-3.13

* studio: hide Python traceback when setup script exits with error

* setup.ps1: auto-add Python Scripts dir to PATH so 'unsloth' command works in new terminals

* setup.ps1: fix GPU check to run nvidia-smi instead of just checking command existence

* setup.ps1: fix PATH check to use exact entry comparison instead of substring match

* setup.ps1: validate Python probe exit code before persisting Scripts PATH
2026-03-14 23:59:49 +04:00
Wasim Yousef Said
629199e3a6
fix: remove old comments (#4292)
* fix: quotation marks

* diceware passphrase generation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-14 16:50:13 +04:00
pre-commit-ci[bot]
b20b3b80df [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-14 00:54:09 -07:00
Daniel Han
4b6f5c76c1 studio: probe-based --system detection for uv
Replace _in_virtualenv() heuristic with a runtime probe. At
bootstrap time, try a dry-run uv install without --system. If
that fails (exit code 2, "No virtual environment found"), retry
with --system to confirm it works. This handles all environments
correctly: venvs, Colab (system Python), local machines, containers.
2026-03-14 00:54:09 -07:00
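The probe in 4b6f5c76c1 might be structured as follows. The `run` parameter is injectable so the sketch can be exercised without uv installed; the probe package and the `None` return for "uv unusable" are assumptions.

```python
import subprocess
from typing import Callable, List, Optional


def detect_uv_system_flag(
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
    probe_package: str = "pip",  # hypothetical probe target
) -> Optional[List[str]]:
    """Return [] inside a venv, ["--system"] without one, None if uv cannot be used."""
    base = ["uv", "pip", "install", "--dry-run", probe_package]

    # First try without --system: succeeds when a virtual environment is active.
    if run(base, capture_output=True, text=True).returncode == 0:
        return []

    # Retry with --system to confirm that system Python installs work.
    if run(base + ["--system"], capture_output=True, text=True).returncode == 0:
        return ["--system"]

    return None
```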
Daniel Han
9b7eaf8f0c studio: make uv optional + fix --system for Colab
Three fixes based on review:

1. Make uv truly optional: _bootstrap_uv() now only checks if uv is
   already on PATH. It no longer tries to pip install uv. If uv is
   not present, pip is used with zero changes to behavior.

2. Add --system flag for Colab: on Colab there is no venv (packages
   install into system Python). uv requires --system in this case,
   otherwise it errors with "No virtual environment found". Added
   _in_virtualenv() check that detects VIRTUAL_ENV, sys.real_prefix,
   or sys.base_prefix != sys.prefix.

3. Fix label printed twice on uv fallback: when uv fails and falls
   back to pip, the label now says "(pip)" to distinguish from the
   initial uv attempt, instead of printing the same label twice.

Tested:
  - venv path: no --system flag, uv installs correctly
  - no-venv path (Colab sim): --system flag added automatically
  - full unsloth studio setup + training run (Llama-3.2-1B, 10 steps)
2026-03-14 00:54:09 -07:00
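The three signals listed for `_in_virtualenv()` in 9b7eaf8f0c translate directly into Python. This is a sketch based on the commit description, not a copy of the function:

```python
import os
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv or virtualenv, by any of three signals."""
    return (
        bool(os.environ.get("VIRTUAL_ENV"))  # set by activate scripts
        or hasattr(sys, "real_prefix")       # legacy virtualenv marker
        or sys.base_prefix != sys.prefix     # venv / modern virtualenv
    )
```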
Daniel Han
a7a66a66b9 studio: address review feedback
install_python_stack.py:
- Print uv error output on failure for debuggability
- Refactor pip_install() to use early return after uv success,
  removing duplicated pip command path

setup.sh:
- Guard nvidia-smi command substitution with || true so it does
  not abort the script under set -euo pipefail when nvidia-smi
  fails (e.g., containerized environments, driver quirks)
- Read all GPU compute capabilities and deduplicate, so
  mixed-GPU hosts get kernels built for all present architectures
  instead of only the first GPU
2026-03-14 00:54:09 -07:00
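The compute-capability handling from a7a66a66b9 is done in shell inside setup.sh; the Python sketch below shows the same deduplication idea. It assumes the input is the output of `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` (one value such as `10.0` per GPU) and that the result feeds `CMAKE_CUDA_ARCHITECTURES`; the function name is hypothetical.

```python
def cuda_architectures(nvidia_smi_output: str) -> str:
    """Turn per-GPU compute capabilities into a deduplicated 'NN;MM' list."""
    archs = []
    for line in nvidia_smi_output.splitlines():
        arch = line.strip().replace(".", "")  # "10.0" -> "100"
        if arch.isdigit() and arch not in archs:
            archs.append(arch)  # keep first-seen order, skip duplicates
    return ";".join(archs)
```

An empty result (for example when nvidia-smi fails and the command substitution falls back to an empty string) would leave the architecture list unset.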
Daniel Han
6dda8c4c23 studio: revert combined targets, keep separate builds
Restore separate cmake --build calls for llama-server and
llama-quantize on both setup.sh and setup.ps1. The combined
approach made llama-quantize failure fatal, but it was originally
best-effort (|| true on Linux, [WARN] on Windows). Combining the
targets saved only ~2.7s, which is not worth the semantic
change.

The Ninja + arch detection speedups are preserved (55s vs 1m 37s).
2026-03-14 00:54:09 -07:00
Daniel Han
e4a5da8d96 studio: combine llama.cpp build targets in setup.ps1
Build llama-server and llama-quantize in a single cmake --build
invocation on Windows, matching the same optimization done in
setup.sh. This allows MSBuild to better parallelize the two targets.

The Visual Studio generator is kept as-is (not switching to Ninja on
Windows since VS generator is the standard approach and interacts
with MSBuild).
2026-03-14 00:54:09 -07:00
Daniel Han
f8dc7c9a5c studio: speed up llama.cpp build with Ninja + arch detection
Four improvements to the llama.cpp build step in setup.sh:

1. Detect GPU compute capability via nvidia-smi and limit
   CMAKE_CUDA_ARCHITECTURES to the current GPU. Without this, cmake
   builds for all default CUDA architectures which is very slow.

2. Use Ninja build generator when available. Ninja has better
   parallelism than Make for CUDA compilation.

3. Build both llama-server and llama-quantize targets in a single
   cmake --build invocation for better parallelism.

4. Add --threads=0 to CMAKE_CUDA_FLAGS for multi-threaded nvcc
   compilation.

Measured on 192-core machine with B200 (sm_100):
  Make (all archs):       very slow (minutes for each arch)
  Make (single arch):     1m 37s
  Ninja (single arch):    55s
  Speedup:                ~1.7x

Combined with the uv change, total setup goes from ~4m 35s to ~1m 40s.
2026-03-14 00:54:09 -07:00
pre-commit-ci[bot]
174d61e0f5 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-14 00:54:09 -07:00
Daniel Han
a537ece7eb studio: use uv for Python package installs (8x faster)
Replace pip with uv in install_python_stack.py to speed up the Python
dependency installation phase of `unsloth studio setup`.

- Add _bootstrap_uv() that checks for uv on PATH, and if not found,
  installs it via pip. Falls back to pip if uv is unavailable.
- Translate pip flags to uv equivalents (--no-cache-dir dropped since
  uv caching is fast, --force-reinstall becomes --reinstall).
- Add --torch-backend=auto so uv auto-detects CUDA version for
  PyTorch ecosystem packages.
- Per-install fallback: if any uv install step fails, it retries that
  step with pip before exiting.

Measured on clean venv setup:
  Python packages (pip):  2m 28s
  Python packages (uv):  18s
  Speedup:               ~8x

Total setup time goes from ~4m 35s to ~2m 30s (llama.cpp build is
now the bottleneck at 1m 40s).
2026-03-14 00:54:09 -07:00
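The flag translation described in a537ece7eb can be sketched as a small mapping. Only the two translations named in the commit are handled; the helper names are hypothetical.

```python
from typing import List


def translate_pip_args(pip_args: List[str]) -> List[str]:
    """Map pip install flags to their uv equivalents."""
    translated = []
    for arg in pip_args:
        if arg == "--no-cache-dir":
            continue  # dropped: uv keeps its cache
        if arg == "--force-reinstall":
            translated.append("--reinstall")
            continue
        translated.append(arg)
    return translated


def uv_install_command(pip_args: List[str]) -> List[str]:
    """Build the uv command line for one install step."""
    return ["uv", "pip", "install", "--torch-backend=auto",
            *translate_pip_args(pip_args)]
```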
Daniel Han
2bb72a2244 Revert "add support for mixtral"
This reverts commit c8f712b614.
2026-03-13 22:39:15 -07:00
tohrnii
943e8f6d84 add support for mixtral
(cherry picked from commit a55b740062)
2026-03-13 22:39:15 -07:00
Daniel Han
936c18424e Revert "patch vlm trainer to resize images"
This reverts commit 481b22fdf4.
2026-03-13 22:39:07 -07:00
oliveirabruno01
aa8d91b241 patch vlm trainer to resize images
(cherry picked from commit 14c282c4ec)
2026-03-13 22:39:07 -07:00
Daniel Han
b8eee7a8ba Revert "Initial changes: Refactor Attention"
This reverts commit a2af843271.
2026-03-13 22:38:57 -07:00
Shikhar Mishra
7502195443 Initial changes: Refactor Attention
(cherry picked from commit 5a7237abfd)
2026-03-13 22:38:57 -07:00
Daniel Han
49132ced50 Revert "feat: Add Mixtral model support"
This reverts commit 99c302d873.
2026-03-13 22:38:49 -07:00
Shikhar Mishra
659281c508 feat: Add Mixtral model support
(cherry picked from commit 2258875885)
2026-03-13 22:38:49 -07:00
Daniel Han
30a18786bf Revert "Improve documentation on how to export model from Colab"
This reverts commit 703c235a7d.
2026-03-13 22:38:41 -07:00
Vishwanath Martur
022a5d566a Improve documentation on how to export model from Colab
Related to #1615

Add documentation and function for exporting models from Colab to local machines.

* **README.md**: Add a new section titled "Exporting Models from Colab to Local Machine" under " Finetune for Free" with detailed steps for exporting models from Colab to local machines.
* **CONTRIBUTING.md**: Add a note about the new documentation section for exporting models from Colab.
* **unsloth/save.py**: Add a new function `export_model_to_local` to handle exporting models from Colab to local machines.

(cherry picked from commit 0361bd658f)
2026-03-13 22:38:41 -07:00
Daniel Han
c5fa314937 Revert "adding tools to be able to profile model fwds to see what to turn into kernels"
This reverts commit d32b00ecd8.
2026-03-13 22:38:31 -07:00
cm2435
12898b5bef adding tools to be able to profile model fwds to see what to turn into kernels
(cherry picked from commit 6db5b126b6)
2026-03-13 22:38:31 -07:00
LeoBorcherding
3ab282fd40 fix: install data-designer plugin non-editable for Colab compatibility
Editable installs (-e) work via a .pth file that is only processed at
Python startup. In Colab the kernel is already running when setup.sh
installs the plugin, so the .pth file never gets picked up and
data_designer_unstructured_seed is not importable.

Remove -e so pip copies the package files directly into site-packages,
which the live kernel can find immediately. Local venv installs are
unaffected since the venv is always created fresh before install.
2026-03-13 13:44:08 -07:00
pre-commit-ci[bot]
6baa181890 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-13 13:38:19 -07:00
Daniel Han
eb7637013e Update CODEOWNERS 2026-03-13 13:38:19 -07:00
Roland Tannous
b95242a80f fix: only skip frontend build for PyPI prebuilt (site-packages + dist check) 2026-03-13 20:26:10 +00:00
Roland Tannous
bf54225f86 fix: site-packages + dist check for frontend build, fix ruff blank lines
2026-03-13 20:13:34 +00:00
Roland Tannous
0e0325127d Revert "site-packages + dist check"
This reverts commit 82063d8edb.
2026-03-13 20:09:41 +00:00
Roland Tannous
82063d8edb site-packages + dist check 2026-03-13 20:04:15 +00:00
Roland Tannous
8ce2b64df7 allow install from source 2026-03-13 20:04:15 +00:00
Daniel Han
1f99dee027
fix(seed): disable remote code execution in seed inspect dataset loads (#4275)
* fix(seed): disable remote code execution for seed inspect loads

* fix(test): use __file__-relative path in seed test

The test used a CWD-relative path (`studio/backend/routes/...`) which
only resolved when pytest was invoked from the repo root. Use
`Path(__file__).resolve()` so the test passes regardless of CWD.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-13 19:37:43 +04:00
Daniel Han
88c7b08faa
fix: prevent ai-assist model config RCE via untrusted Hugging Face repos (#4274)
* fix: disable remote code loading for ai-assist model hint lookup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-13 19:29:11 +04:00
Roland Tannous
e539965740 fix error for chat template 2026-03-13 15:18:04 +00:00
Roland Tannous
8108f1bf11 Fix nvm/npmrc prefix conflict in setup.sh 2026-03-13 08:59:51 +00:00
Daniel Han
3e8f085474 Limit rocm711-torch291 to Linux 2026-03-13 01:40:56 -07:00
sstamenk
a54a913431 Add more ROCm/PyTorch combinations
(cherry picked from commit d02aa7f9c3)
2026-03-13 01:40:56 -07:00
sstamenk
c752c8107a Add more ROCm/PyTorch versions
(cherry picked from commit ed6877fadd)
2026-03-13 01:40:56 -07:00
Daniel Han
51bf500f57
Remove Blackwell flex attention disable workaround from studio (#4273)
The studio was disabling flex attention entirely on Blackwell+ GPUs
(sm_120 and above) by setting UNSLOTH_ENABLE_FLEX_ATTENTION=0 at
startup. This was a workaround for the flex_attention backward kernel
exceeding shared memory limits on these GPUs.

The root cause is now fixed in unsloth-zoo (PR #542) which patches the
backward kernel config selection to generate safe fallback configs that
fit within the GPU's shared memory limit. With that fix, flex attention
works correctly on Blackwell GPUs and provides a ~1.3x speedup over
the SDPA fallback.
2026-03-13 01:35:17 -07:00
Daniel Han
37b8d5e440
remove duplicate import (#4271)
(cherry picked from commit d1f4fb5d6a)

Co-authored-by: electron271 <66094410+electron271@users.noreply.github.com>
2026-03-13 00:26:38 -07:00
Daniel Han
d6e40df8fa
Fix llm_int8_skip_modules for VLM dynamic quants on transformers 5.x (#4249)
Fix `llm_int8_skip_modules` not being respected for VLMs with dynamic quantization on transformers 5.x.

Dynamic quant checkpoints (e.g. `gemma-3-4b-it-unsloth-bnb-4bit`) encode skip paths as `language_model.model.layers.*`, but the live module tree on 5.x surfaces them as `model.language_model.layers.*`. This prefix mismatch causes `should_convert_module` to miss the skip list, so 22 modules meant to stay in 16-bit get wrapped in `Linear4bit` without a `quant_state`, producing "Skipping ... no quant_state found" warnings.

Patches `should_convert_module` to expand both the module name and the skip patterns into all equivalent alias forms before matching. Guarded by `hasattr` so it is a no-op on transformers 4.x where the bug does not exist.

Closes #4208
2026-03-13 00:17:00 -07:00
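The alias expansion in d6e40df8fa can be illustrated with a simplified matcher. This is not the transformers `should_convert_module` implementation or the actual patch. The two prefixes come from the commit message; the function names and the prefix-matching rule are assumptions.

```python
from typing import Iterable, Set

# The two spellings of the same subtree named in the commit message.
_ALIAS_PREFIXES = ("language_model.model.", "model.language_model.")


def alias_forms(name: str) -> Set[str]:
    """Return the name together with its equivalent prefix spellings."""
    forms = {name}
    for prefix in _ALIAS_PREFIXES:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            forms.update(alias + rest for alias in _ALIAS_PREFIXES)
    return forms


def should_skip(module_name: str, skip_patterns: Iterable[str]) -> bool:
    """True if any alias of the module matches any alias of a skip pattern."""
    names = alias_forms(module_name)
    for pattern in skip_patterns:
        for alias in alias_forms(pattern):
            for name in names:
                # Exact match, or the pattern names a parent module.
                if name == alias or name.startswith(alias + "."):
                    return True
    return False
```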
Daniel Han
1ca441a3f3
[Feature] VLMs support for GRPO (#4265)
* Updated rl and rl_replacements

* Revert "Updated rl and rl_replacements"

This reverts commit 077fd5996daa73c9c58c9f213657f33f47f5d73b.

---------

Co-authored-by: Sinoué GAD <85933501+GAD-cell@users.noreply.github.com>
2026-03-12 16:09:02 -07:00
Daniel Han
74c1497f2f
[Feature] Support Sequence Classification (#4264)
* initial commit for sequence classification implementation

* Revert "initial commit for sequence classification implementation"

This reverts commit 0f3200cdf2dfb8446e5d69dcbe40d6f70bc520e7.

---------

Co-authored-by: Rabin Tiwari <84705625+rabintiwari45@users.noreply.github.com>
2026-03-12 16:08:49 -07:00
Daniel Han
96ff5c5f61
Update CODEOWNERS for studio and cli (#4266)
* Update CODEOWNERS for studio and cli

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-12 15:16:38 -07:00
Daniel Han
c26aa1a1e8 Restore non-studio files from main after history recovery 2026-03-12 21:48:45 +00:00
Daniel Han
6f0bca70f8 Merge remote-tracking branch 'studio/feature/merge-build-main' into history-recovery-candidate 2026-03-12 21:48:30 +00:00
Daniel Han
17ae3d3cba Revert "Studio (#4237)"
This reverts commit f08aef1804.
2026-03-12 21:48:23 +00:00
Roland Tannous
2b04c0da40 add build.sh 2026-03-12 20:52:42 +00:00
Roland Tannous
47654cb91c Final cleanup 2026-03-12 18:28:04 +00:00
Roland Tannous
3613166b6d Delete studio/frontend/README.md 2026-03-12 22:20:34 +04:00
Roland Tannous
8002913b7c Delete studio/frontend/AGENTS.md 2026-03-12 22:20:23 +04:00
Roland Tannous
a2baf80511 Update license headers 2026-03-12 17:23:10 +00:00
Roland Tannous
7de1c18c14 Update llm_assist.py 2026-03-12 21:06:04 +04:00
Roland Tannous
063cdc6072 Update llm_assist.py 2026-03-12 20:32:04 +04:00
Roland Tannous
5798a34606 Update run.py 2026-03-12 19:28:48 +04:00
Roland Tannous
9bb64fbd96 Update run.py 2026-03-12 18:53:19 +04:00
Roland Tannous
10bee32f3d Update run.py 2026-03-12 18:30:00 +04:00
Roland Tannous
20aeb2ef19 Update studio.py 2026-03-12 18:13:56 +04:00
Roland Tannous
7881fc253f Update install_python_stack.py 2026-03-12 18:06:37 +04:00
Roland Tannous
542a25977a Update run.py 2026-03-12 18:01:14 +04:00
Roland Tannous
f598e69f38 Update studio.py 2026-03-12 17:30:10 +04:00
Roland Tannous
874711912d Update studio.py 2026-03-12 17:28:08 +04:00
Roland Tannous
788b120114 Update setup.ps1 2026-03-12 17:11:20 +04:00
Roland Tannous
3cf27589a6 Remove AGENTS.md from frontend folder 2026-03-12 12:00:42 +00:00
Roland Tannous
a98164af50 Remove README.md from frontend folder 2026-03-12 11:59:56 +00:00
Daniel Han
36785caf80 Cache packed sequence metadata to reduce D2H syncs across layers (#4243)
* packing optimization with cache to reduce D2H copy

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cache per device to avoid race condition for multi-gpu

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add cache freeing up func

---------

Co-authored-by: ruixiangw <ruixiangw@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ruixiang <wangruixiang07@outlook.com>
2026-03-12 03:37:49 -07:00
Daniel Han
f08aef1804 Studio (#4237)
* Rebuild Studio branch on top of main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix security and code quality issues for Studio PR #4237

- Validate models_dir query param against allowed directory roots
  to prevent path traversal in /api/models/local endpoint
- Replace string startswith() with Path.is_relative_to() for
  frontend path traversal check in serve_frontend
- Sanitize SSE error messages to not leak exception details to
  clients (4 locations in inference.py)
- Bind port-discovery socket to 127.0.0.1 instead of all interfaces
  in llama_cpp backend
- Import datasets_root and resolve_output_dir in embedding training
  function to fix NameError and use managed output directory
- Remove stale .gitignore entries for package-lock.json and test
  directories so tests can be tracked in version control
- Add venv-reexecution logic to ui CLI command matching the studio
  command behavior

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Move models_dir path validation before try/except block

The HTTPException(403) was inside the try/except Exception handler,
so it would be caught and re-raised as a 500. Moving the validation
before the try block ensures the 403 is returned directly and also
makes the control flow clearer for static analysis (path is validated
before any filesystem operations).

* Use os.path.realpath + startswith for models_dir validation

CodeQL py/path-injection does not recognize Path.is_relative_to() as
a sanitizer. Switched to os.path.realpath + str.startswith which is
a recognized sanitizer pattern in CodeQL's taint analysis. The
startswith check uses root_str + os.sep to prevent prefix collisions
(e.g. /app/models_evil matching /app/models).

* Never pass user input to Path constructor in models_dir validation

CodeQL traces taint through Path(resolved) even after a startswith
barrier guard. Fix: the user-supplied models_dir is only used as a
string for comparison against allowed roots. The Path object passed
to _scan_models_dir comes from the trusted allowed_roots list, not
from user input. This fully breaks the taint chain.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-12 03:36:19 -07:00
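The final `models_dir` validation described in f08aef1804 (realpath, `startswith` with a trailing separator, and returning the trusted root rather than a path built from user input) can be sketched as follows. The function name is hypothetical and the roots in the usage are made up.

```python
import os
from pathlib import Path
from typing import List, Optional


def resolve_allowed_root(user_dir: str, allowed_roots: List[Path]) -> Optional[Path]:
    """Return the trusted root containing user_dir, or None if outside all roots."""
    resolved = os.path.realpath(user_dir)
    for root in allowed_roots:
        root_str = os.path.realpath(str(root))
        # The os.sep suffix prevents /app/models_evil from matching /app/models.
        if resolved == root_str or resolved.startswith(root_str + os.sep):
            return root  # the caller scans the trusted Path, never the user string
    return None
```

A caller would map a `None` result to the 403 response before any filesystem access takes place.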
Roland Tannous
400b6ecede Update setup.ps1 2026-03-12 02:44:25 +04:00
Roland Tannous
220a7bb1ed Update setup.sh 2026-03-12 02:42:43 +04:00
Roland Tannous
11e74b2dc5 resolved conflicts 2026-03-11 20:58:25 +00:00
Roland Tannous
1087216cb5 Merge branch 'fix/pre-merge-cleanup' into feature/merge-build-final 2026-03-11 20:56:49 +00:00
Roland Tannous
6f77c63229 refactor: remove project_root passing, use self-resolved paths and ~/.unsloth/studio
- Workers now compute backend_path and venv_t5 locally via Path(__file__)
- Moved .venv_t5 to ~/.unsloth/studio/.venv_t5
- Added ensure_studio_directories() call on server startup
- Expanded CLI studio command into sub-app with setup subcommand
2026-03-11 20:32:18 +00:00
Manan17
fbccac8cee shifting setup & co inside studio 2026-03-11 20:19:52 +00:00
Shine1i
bbb4cd0f0b feat(studio): add auth-specific paths and integrate auth database location 2026-03-11 20:19:52 +00:00
Shine1i
7012b8396f fix(studio): update temporary directory path to use system temp dir 2026-03-11 20:19:52 +00:00
Shine1i
904e440513 feat(studio): studio storage roots path utilities 2026-03-11 20:19:52 +00:00
Roland Tannous
d6e4a0644f resolved format_conversion conflict 2026-03-11 19:53:53 +00:00
Roland Tannous
6926a8b091 fix: prefer tabular files over archives in Tier 1 dataset preview
Tier 1 check-format was picking images.zip over testmini.parquet,
causing wrong columns (image/label) and broken VLM mapping.
Also log first VLM conversion failure instead of swallowing silently.
2026-03-11 19:13:11 +00:00
Roland Tannous
a63196c93e updated on completion response markers for qwen3.5 2026-03-11 19:00:29 +00:00
Roland Tannous
e455b307be add ffmpeg system support for linux and windows 2026-03-11 18:50:11 +00:00
Roland Tannous
adf919f47a Remove test_llama_cpp.ps1 2026-03-11 18:18:19 +00:00
Roland Tannous
a6aa9a0efa Remove test_llama_cpp.ps1 from tracking 2026-03-11 18:10:25 +00:00
Roland Tannous
859bfe23c4 Merge pull request #375 from unslothai/feature/llm-assist-detection
Feature/llm assist detection
2026-03-11 22:02:33 +04:00
Roland Tannous
b274e9e0c6 chore: merge nightly & update dataset preview dialog mapping text 2026-03-11 17:00:14 +00:00
Roland Tannous
0e3ac91e2a feat: target AI Assist mapping prompts for audio & embedding models 2026-03-11 16:55:43 +00:00
Roland Tannous
9dac1bedf9 Merge remote-tracking branch 'origin/nightly' into feature/llm-assist-detection 2026-03-11 16:23:09 +00:00
Roland Tannous
6d6a62821e Merge pull request #374 from unslothai/fix/model-caching-issues
Fix: Normalize HuggingFace model identifiers to lowercase
2026-03-11 18:40:26 +04:00
Roland Tannous
7862e70211 fix: lowercase remote Hugging Face model IDs in ModelConfig and routes to prevent caching mismatches with Unsloth 2026-03-11 14:20:25 +00:00
Roland Tannous
fb211f3254 Merge pull request #372 from unslothai/fix/input-focus-clipping
Input focus outline clipped by container
2026-03-11 18:12:04 +04:00
Roland Tannous
774c9b17fd Merge pull request #373 from unslothai/feature/structlog-logging-system
feat: integrate structlog, configure workers for prod logging, and mi…
2026-03-11 16:52:49 +04:00
Roland Tannous
817f2e8dcc feat: integrate structlog, configure workers for prod logging, and migrate print statements 2026-03-11 12:33:16 +00:00
imagineer99
014695b38a fix: scope overflow-visible to studio collapsibles 2026-03-11 11:26:43 +00:00
imagineer99
984f4a4acb fix: input focus outline clipping 2026-03-11 11:11:57 +00:00
Roland Tannous
ee063c5910 Merge pull request #367 from unslothai/fix/yaml-syntax
Modified to fix the yaml syntax for unsloth_Qwen3-14B-Base-unsloth-bnb-4bit
2026-03-11 13:39:48 +04:00
Roland Tannous
16eb39b53d Merge pull request #369 from unslothai/fix/model-mappping-syntax
fixed string concatenation in model mapping
2026-03-11 12:39:13 +04:00
Samit
379bbbdbdd fixed string concatenation in model mapping 2026-03-11 00:07:26 -07:00
Samit
822050bf57 modified to fix the yaml syntax 2026-03-10 23:58:51 -07:00
Manan Shah
2aa9322167 Merge pull request #365 from unslothai/fix/gguf-gemma-with-text
fixing gguf export for gemma with text
2026-03-10 17:59:22 -07:00
Manan17
5ca623a166 fixing gguf export for gemma with text 2026-03-11 00:58:22 +00:00
Wasim Yousef Said
739838bd48 Merge pull request #364 from unslothai/feature/chat-seq-slider
chat seq slider
2026-03-11 01:56:48 +01:00
Shine1i
4a8a96b1af chat seq slider 2026-03-11 01:41:25 +01:00
Manan Shah
cce274717b Merge pull request #357 from unslothai/feat/embedding-models
feat: add embedding model training support
2026-03-10 14:59:20 -07:00
Manan17
983c20bbb2 local model's embedding nature check 2026-03-10 21:58:45 +00:00
Manan17
294a3d3e47 fix: reset isEmbeddingModel in error fallback paths to prevent stale state 2026-03-10 21:33:13 +00:00
Roland Tannous
5b042165e6 Merge pull request #363 from unslothai/feature/enable-all-modalities
Removed audio and embedding from coming soon
2026-03-11 01:32:43 +04:00
Manan17
bc5a72dd8c fix: local directory dataset loading 2026-03-10 21:29:51 +00:00
imagineer99
8d6f88577f chore: removed audio and embedding from coming soon 2026-03-10 21:29:18 +00:00
Wasim Yousef Said
98e3396fbe Merge pull request #356 from unslothai/fix/summary-step-spacing-and-colors
Redesign summary step with consistent card layout, spacing and icons
2026-03-10 22:26:10 +01:00
Wasim Yousef Said
27311e3986 Merge pull request #362 from unslothai/feature/setup-no-llama-nuke
fix(setup): stop deleting llama.cpp in setup
2026-03-10 22:25:12 +01:00
Manan Shah
f696ef81e8 Merge branch 'nightly' into feat/embedding-models 2026-03-10 14:16:05 -07:00
Roland Tannous
08d9c84f1f Merge pull request #359 from unslothai/fix/stream-manual-slice-dataset
fix: stream HF dataset when manual slice is specified
2026-03-11 01:13:51 +04:00
Manan17
9523e5c1f9 fixing embedding model search 2026-03-10 21:12:24 +00:00
Shine1i
2895518f0c fix(setup): stop nuking llama.cpp in setup 2026-03-10 22:03:01 +01:00
Wasim Yousef Said
29e56e8649 Merge pull request #361 from unslothai/fix/tooltip-z-index
Increase tooltip z-index to appear above dropdowns
2026-03-10 22:01:30 +01:00
imagineer99
d572c43814 fix: increase tooltip z-index to appear above dropdowns 2026-03-10 20:57:12 +00:00
Manan17
3b0b002b34 fixing logging for each step 2026-03-10 20:32:40 +00:00
Roland Tannous
21ef22a9ff fix: skip streaming when dataset_slice_start > dataset_slice_end
Prevents training on the wrong row range when start exceeds end by
falling back to full download where existing clamping handles it.
2026-03-10 20:21:34 +00:00
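The streaming guards described in this commit and in the two related slice commits (negative `dataset_slice_end`, and streaming up to `slice_end + 1` rows) could be combined roughly as below. This is a minimal sketch, not the project's code: the function name `rows_to_stream` is hypothetical, and treating a missing `slice_end` as "do not stream" is an assumption.

```python
from typing import Optional


def rows_to_stream(slice_start: Optional[int], slice_end: Optional[int]) -> Optional[int]:
    """Return how many rows to take from a streamed dataset, or None to
    signal "fall back to the full download" (hypothetical helper)."""
    # Assumption: without an explicit end there is nothing to bound the stream.
    if slice_end is None or slice_end < 0:
        return None
    # A start beyond the end is left to the existing clamping logic.
    if slice_start is not None and slice_start > slice_end:
        return None
    # slice_end is inclusive, so one extra row is needed.
    return slice_end + 1
```

A caller would presumably pass the returned count to the streaming dataset's `take()` and use the non-streaming path whenever `None` comes back.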
imagineer99
5dcbf86d09 fix: reject negative manual dataset slices
Prevent negative Train Split Start/End values in the dataset advanced UI and sanitize payload mapping so negative slice values are never sent to the backend.

Made-with: Cursor
2026-03-10 20:13:46 +00:00
Roland Tannous
226f251589 fix: guard against negative dataset_slice_end before streaming
Fall back to full download when dataset_slice_end is negative,
avoiding an empty stream.take(0) that would produce a broken dataset.
2026-03-10 20:12:42 +00:00
Roland Tannous
b91cdda2b9 Merge pull request #354 from unslothai/fix/audio-train-completions
fix: uncheck train_on_completions for audio models
2026-03-11 00:05:51 +04:00
Roland Tannous
949f2ac87e Merge pull request #358 from unslothai/fix/sharded-gguf
fix: download all GGUF shards for split models
2026-03-11 00:05:01 +04:00
Roland Tannous
970a029108 fix: stream HF dataset when manual slice is specified
Instead of downloading the full dataset and then slicing, use
streaming mode to only fetch the rows needed (up to slice_end + 1)
when a manual dataset slice is configured.
2026-03-10 19:50:53 +00:00
Roland Tannous
c986174c56 fix: preserve zero-valued dataset slice boundaries in embedding worker
Use explicit None checks instead of falsy `or` for slice_start and
slice_end so that a valid slice_end=0 is not replaced with the full
dataset length.
2026-03-10 19:33:10 +00:00
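The difference between the falsy `or` pattern and an explicit `None` check can be illustrated as follows. Both function names and the `dataset_len` parameter are assumptions made for this sketch; the worker's real code may be shaped differently.

```python
from typing import Optional, Tuple


def resolve_slice_falsy(
    start: Optional[int], end: Optional[int], dataset_len: int
) -> Tuple[int, int]:
    # Buggy pattern: `or` treats 0 the same as None,
    # so end=0 silently becomes the full dataset length.
    return (start or 0, end or dataset_len)


def resolve_slice_explicit(
    start: Optional[int], end: Optional[int], dataset_len: int
) -> Tuple[int, int]:
    # Fixed pattern: only a missing value (None) falls back to the default.
    resolved_start = 0 if start is None else start
    resolved_end = dataset_len if end is None else end
    return (resolved_start, resolved_end)
```

With `end=0` and a 100-row dataset, the first variant yields an end of 100 while the second keeps 0.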
Roland Tannous
b84202e8db fix: restrict shard siblings to exact basename and total count
startswith(prefix) could match unrelated split variants whose names
extend the selected file's prefix (e.g. model-Q8_0-v2-00001-of-...).
Now builds an exact regex from the chosen file's base prefix and shard
total so only true siblings are downloaded.
2026-03-10 19:28:26 +00:00
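One way to build such an exact sibling pattern is sketched below. It assumes the common `<prefix>-00001-of-00003.gguf` shard naming with five-digit counters; the helper name `find_shard_siblings` is hypothetical and not taken from the repository.

```python
import re
from typing import List

# Assumed shard naming: <prefix>-<5-digit index>-of-<5-digit total>.gguf
_SHARD_RE = re.compile(r"^(?P<prefix>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def find_shard_siblings(selected: str, repo_files: List[str]) -> List[str]:
    """Return every shard that shares the selected file's exact prefix and total."""
    match = _SHARD_RE.match(selected)
    if match is None:
        # Not a split model: the selected file is the only one needed.
        return [selected]
    prefix = re.escape(match.group("prefix"))
    total = match.group("total")
    # Anchored on both ends, so "model-Q8_0-v2-..." cannot match "model-Q8_0-...".
    sibling_re = re.compile(rf"^{prefix}-\d{{5}}-of-{total}\.gguf$")
    return sorted(name for name in repo_files if sibling_re.match(name))
```

Unlike `startswith(prefix)`, the anchored pattern rejects both longer prefixes and shard sets with a different total.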
Shine1i
18a60b930a chore/fix(studio): add placeholder dropdowns for dataset subset and splits in disabled state 2026-03-10 20:27:11 +01:00
Roland Tannous
b8678a3ed6 fix: pass hf_token for gated embedding models and key cache by token
- Forward hf_token to FastSentenceTransformer.from_pretrained() so
  private/gated embedding repos authenticate correctly
- Key _embedding_detection_cache by (model_name, hf_token) tuple so
  unauthenticated lookups don't shadow subsequent authenticated ones
2026-03-10 19:20:12 +00:00
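The cache-key change can be pictured with the sketch below. The names `_detection_cache` and `cached_is_embedding` and the injected `detect` callable are illustrative assumptions; only the `(model_name, hf_token)` key shape comes from the commit message.

```python
from typing import Callable, Dict, Optional, Tuple

# Keyed by (model_name, hf_token) so results for different credentials stay separate.
_detection_cache: Dict[Tuple[str, Optional[str]], bool] = {}


def cached_is_embedding(
    model_name: str,
    hf_token: Optional[str],
    detect: Callable[[str, Optional[str]], bool],
) -> bool:
    """Memoize a detection call per model and token (hypothetical helper)."""
    key = (model_name, hf_token)
    if key not in _detection_cache:
        _detection_cache[key] = detect(model_name, hf_token)
    return _detection_cache[key]
```

If the cache were keyed by `model_name` alone, a failed unauthenticated lookup would be returned again after the user supplies a token.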
Roland Tannous
d635846b8d fix: use exact variant matching and shard-prefix discovery for split GGUFs
Substring matching (e.g. "Q8_0" in filename) could match superset
variants like "IQ8_0", causing wrong quantizations to be downloaded.
Now uses word-boundary regex for variant matching and discovers split
shards by shared filename prefix rather than treating all variant
matches as shards.
2026-03-10 19:13:03 +00:00
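The substring pitfall and a word-boundary alternative look roughly like this. The helper name `matches_variant` is hypothetical, and the actual pattern in the repository may differ (for example, `\b` does not fire next to an underscore).

```python
import re


def matches_variant(filename: str, variant: str) -> bool:
    """True when `variant` appears as a whole token in `filename`."""
    # \b requires a non-word character (or string edge) on each side,
    # so "Q8_0" is not found inside "IQ8_0".
    pattern = re.compile(rf"\b{re.escape(variant)}\b", re.IGNORECASE)
    return pattern.search(filename) is not None
```

A plain `"Q8_0" in "model-IQ8_0.gguf"` check is true, which is the false positive the commit describes.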
Roland Tannous
d6ae910edc fix: propagate is_embedding into worker subprocess config
start_training() cherry-picks kwargs into a config dict but was missing
is_embedding, so config.get("is_embedding", False) in worker.py always
returned False and embedding training never ran.
2026-03-10 19:05:47 +00:00
Roland Tannous
defa761fb2 fix: download all GGUF shards for split models (e.g. 7B Q8_0)
LlamaCppBackend.load_model() only downloaded the first matching GGUF
file. For split models (e.g. 7B Q8_0 with 3 shards), llama-server
needs all shards present. Now collects and downloads all matching files.
2026-03-10 19:04:10 +00:00
Roland Tannous
846cc2cf2a fix: always force-uncheck trainOnCompletions for pure audio models in dataset check
Separate pure-audio from audio-VLM logic in runDatasetCheck so pure
audio models are always forced to trainOnCompletions=false regardless
of dataset type, while audio VLMs (gemma3n) only uncheck when the
dataset is audio.
2026-03-10 19:02:49 +00:00
Roland Tannous
d9f2d08267 fix: reset isAudioModel on model config fetch failure
Clear stale isAudioModel in the fallback path when getModelConfig
fails, preventing a previously-selected audio model's flag from
leaking into the next model selection.
2026-03-10 19:00:56 +00:00
Roland Tannous
5a086353ab feat: add embedding model training support
Add end-to-end embedding/sentence-transformer training pipeline using
FastSentenceTransformer, SentenceTransformerTrainer, and
MultipleNegativesRankingLoss with BatchSamplers.NO_DUPLICATES.

Backend:
- Add is_embedding_model() detection via HF tags + pipeline_tag
- Add /check-embedding/ API route and EmbeddingCheckResponse
- Extend derive_model_type() to return "embeddings"
- Add _run_embedding_training() in worker.py with progress callbacks,
  stop handling, LoRA (task_type=FEATURE_EXTRACTION), and model saving
- Add is_embedding field to TrainingStartRequest and ModelDetails
- Add YAML configs for 5 models: all-MiniLM-L6-v2, bge-m3,
  embeddinggemma-300m, gte-modernbert-base, Qwen3-Embedding-0.6B

Frontend:
- Wire isEmbeddingModel flag through store, API types, and mappers
- Force packing=false, train_on_completions=false, warmup_ratio=0.03
- Hide packing and train_on_completions checkboxes for embedding models
- Auto-set modelType to "embeddings" from backend model_type response
2026-03-10 18:10:09 +00:00
Roland Tannous
5b8f5bc554 fix: improve advisor prompts for more reliable column role assignment
- Pass 1: clearer definition of "conversational" vs non-conversational,
  constrained dataset_type to specific enum values
- Pass 2: much more explicit worked examples with step-by-step reasoning,
  added "skip" role for metadata columns, stronger reminder at end that
  all-user is wrong
- Pass 3: returns raw text instead of JSON for cleaner system prompts,
  removed system message to give model more freedom
2026-03-10 18:01:20 +00:00
Roland Tannous
1430bbc604 fix: uncheck train_on_completions for audio models
Pure audio models (orpheus, sparktts, whisper, sesame-csm) now
always have trainOnCompletions auto-unchecked when selected.
Gemma3n (audio_vlm) only unchecks when the dataset is audio.

- Add is_audio to frontend ModelConfigResponse (backend already returns it)
- Add isAudioModel state to training config store
- Auto-set trainOnCompletions=false for pure audio models on model load
- Auto-set trainOnCompletions=false for audio VLMs when dataset is audio
- Respect manual user override via existing _trainOnCompletionsManuallySet flag
2026-03-10 17:39:35 +00:00
imagineer99
c895cc56a4 fix: redesign summary step with consistent card layout, icons, and compact spacing 2026-03-10 17:38:10 +00:00
Roland Tannous
cb389fb756 Merge pull request #353 from unslothai/feat/dataset-shortlist-and-model-type
Curated dataset shortlists and model type plumbing
2026-03-10 21:31:16 +04:00
Roland Tannous
2fc50ff0cf refactor: advisor maps columns to roles instead of generating templates
The advisor now only assigns columns to user/assistant roles and
generates a system prompt. Templates (user_template, assistant_template)
are removed entirely — the LLM was frequently putting all columns in
user or copying actual data values into templates.

Column values are now used directly as message content, grouped and
concatenated by role. This is simpler, more robust, and prevents the
class of bugs where the advisor generates bad template content.
2026-03-10 17:17:27 +00:00
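Grouping column values by role, as this refactor describes, might look like the sketch below. The function name `build_messages`, the newline separator, and the handling of roles other than `user` and `assistant` are assumptions for illustration only.

```python
from typing import Any, Dict, List, Optional


def build_messages(
    row: Dict[str, Any],
    column_roles: Dict[str, str],
    system_prompt: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Turn one dataset row into chat messages using a column-to-role mapping."""
    parts: Dict[str, List[str]] = {"user": [], "assistant": []}
    for column, role in column_roles.items():
        # Columns mapped to any other role are ignored in this sketch.
        if role in parts and column in row:
            parts[role].append(str(row[column]))

    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for role in ("user", "assistant"):
        if parts[role]:
            # Assumption: values for the same role are joined with a newline.
            messages.append({"role": role, "content": "\n".join(parts[role])})
    return messages
```

Because no template string is involved, the model cannot copy data values into a template or leave the assistant side without a column.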
Roland Tannous
21cff233e5 feat: add model_type field to backend /config and /list responses
Derive a single model_type string ("text" | "vision" | "audio" | "embeddings")
from existing is_vision and audio_type detection, so the frontend doesn't have
to infer modality from scattered boolean flags.
2026-03-10 16:54:19 +00:00
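A single-string modality could be derived along these lines. This is a sketch under stated assumptions: the name `sketch_model_type`, the `is_embedding` parameter, and above all the precedence order (embeddings, then audio, then vision) are guesses, not the backend's confirmed behaviour.

```python
from typing import Optional


def sketch_model_type(
    is_vision: bool,
    audio_type: Optional[str],
    is_embedding: bool = False,
) -> str:
    """Collapse scattered capability flags into one modality string."""
    # Assumed precedence; the real backend may order these checks differently.
    if is_embedding:
        return "embeddings"
    if audio_type is not None:
        return "audio"
    if is_vision:
        return "vision"
    return "text"
```

The frontend can then branch on one value instead of re-deriving modality from several booleans.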
Roland Tannous
a30153e1bb fix: improve Pass 2 prompt to correctly split INPUT/OUTPUT columns
The LLM was putting all columns in user_template (e.g. summarization
dataset had both document AND summary as user input). Fixed by:

- Reframed system message: explicitly states user=INPUT, assistant=OUTPUT
- Added 4 concrete correct examples (summarization, NLI, translation, QA)
  showing exactly how to split columns
- Added "NEVER put the output/target column in the user template" rule
- Added sanity check: if assistant_template has no column placeholders,
  reject the result and fall back to simple classification
2026-03-10 16:47:50 +00:00
Roland Tannous
5db251b31c fix: include label mapping in Pass 3 system prompt generation
Pass 3 now sees the label mapping from Pass 2 (e.g. "0 = does not follow,
1 = follows, 2 = entailed") so the generated system prompt can explain
what each label value means. Also bumped to 2-4 sentences to give room
for the label descriptions.
2026-03-10 16:21:54 +00:00
Roland Tannous
49a4089dfa feat: Beta badge, generated System column, fix table scroll
- Add "Beta" badge next to AI Assist button text
- When advisor generates a system prompt, show it as a "System (generated)"
  column prepended to the data table so user can see it alongside data
- Fix table being squished to near-zero height when advisor notification
  banner is present: add min-h-[250px] to table wrapper, change body
  from overflow-hidden to overflow-auto
2026-03-10 16:14:16 +00:00
Roland Tannous
78489e41c4 refactor: 3-pass advisor — dedicated system prompt generation
Pass 1: Classify dataset type (unchanged)
Pass 2: Generate user/assistant templates + label mapping + column roles
  (system_prompt removed from this pass to keep it focused)
Pass 3: Generate system prompt (only for non-conversational datasets)
  - Dedicated pass with focused prompt that sees the templates from Pass 2
  - Skipped entirely for conversational datasets
  - Produces specific, task-relevant system prompts
2026-03-10 16:07:30 +00:00
Roland Tannous
76cc5b19cb fix: show generated templates in UI, make system prompt optional
- System prompt is now optional — LLM only generates one when the task
  is ambiguous from the data alone (persona, domain, format constraints)
- Sanitize system_prompt extraction (handle literal "null" string)
- Show system prompt, user template, and assistant template in the
  advisor notification banner so user can see exactly what was generated
- Templates displayed in monospace with labeled sections
2026-03-10 16:01:57 +00:00
Roland Tannous
48a5e49313 fix: remove Pass 3 self-scoring, trust Pass 2 output directly
The LLM was bad at scoring its own conversion quality — rejecting good
Pass 2 output (score 5/10 for a perfectly usable conversion). Instead:
- Remove Pass 3 entirely (saves ~0.4s and one inference call)
- Trust Pass 2 output and return it to the user
- Build notification from Pass 1 classification info instead
- User can always adjust mapping via dropdowns if they disagree
2026-03-10 15:56:48 +00:00
Roland Tannous
ed849b7d0d fix: advisor quality gate, better prompts, always show AI Assist button
- Reject advisor result when Pass 3 scores < 6 or is_acceptable=false,
  falls back to simple column classification instead of using bad output
- Improved Pass 2 prompt: explicit rules for label_mapping completeness,
  {column_name} vs {column_name_name} for mapped labels, column_roles
  must match which template uses them
- Build suggested_mapping from ALL template-referenced columns (not just
  first match per role) — fixes hypothesis being dropped from SNLI mapping
- Guard against LLM returning literal string "null" for revised_system_prompt
- Always show AI Assist button when available, even when mapping looks complete
2026-03-10 15:51:14 +00:00
Roland Tannous
ab58121cd8 fix: harden template mapping for complex column types and curly braces
- Handle dict columns (e.g. squad answers) by extracting text instead
  of raw repr()
- Handle list columns by joining or extracting single value
- Catch ValueError in .format() calls (stray { } in column data)
- Add missing json import to dataset_utils.py
2026-03-10 15:43:35 +00:00
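The hardening steps listed in this commit could be sketched as below. The helper names, the choice of the `"text"` key for dict columns, and the fallback of joining the raw values are all assumptions; only the `ValueError` from `str.format()` on unbalanced braces is standard Python behaviour.

```python
from typing import Any, Dict


def value_to_text(value: Any) -> str:
    """Flatten dict and list column values into plain text."""
    # Dict columns (e.g. SQuAD-style answers): prefer the "text" field if present.
    if isinstance(value, dict) and "text" in value:
        value = value["text"]
    # List columns: join the items (a single item is returned as-is).
    if isinstance(value, (list, tuple)):
        return ", ".join(str(item) for item in value)
    return str(value)


def render_template(template: str, row: Dict[str, Any]) -> str:
    """Fill a {column} template, tolerating unbalanced braces in the template."""
    fields = {name: value_to_text(value) for name, value in row.items()}
    try:
        return template.format(**fields)
    except ValueError:
        # A stray "{" or "}" makes str.format() raise; fall back to the raw values.
        return " ".join(fields.values())
```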
Roland Tannous
202780c32c feat: Dataset Conversion Advisor — multi-pass LLM for non-conversational datasets
Non-conversational HF datasets (e.g. stanfordnlp/snli) were naively mapped
column→role, producing poor training results. The AI Assist button now runs
a 3-pass advisor using Qwen 7B that:
1. Fetches the HF dataset card/README to understand the dataset purpose
2. Classifies the dataset type and determines if conversion is needed
3. Generates a system prompt, user/assistant templates with {column}
   placeholders, and label mappings (e.g. 0→entailment)
4. Validates the conversion quality (score ≥7/10 required)

Architecture: advisor metadata flows as __-prefixed keys in
custom_format_mapping (e.g. __system_prompt, __user_template,
__assistant_template, __label_mapping). The existing _apply_user_mapping()
detects these keys and routes to template-based conversation construction.
No __ keys = existing simple mode (backwards compatible).

Backend: upgraded llm_assist.py (7B default, multi-pass advisor,
HF card fetching), extended API models, added _apply_template_mapping()
to dataset_utils.py.

Frontend: extended store with advisor state fields, wired AI Assist
to store templates/system prompt, inject __ metadata in training request,
show advisor notification banner in mapping card.
2026-03-10 15:39:56 +00:00
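The `__`-prefixed metadata convention from this commit can be separated from ordinary column mappings as sketched here. The function name `split_mapping` is hypothetical, and stripping the prefix from the returned keys is a choice made for this example.

```python
from typing import Any, Dict, Tuple


def split_mapping(
    custom_format_mapping: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate plain column->role entries from __-prefixed advisor metadata."""
    column_roles = {
        key: value
        for key, value in custom_format_mapping.items()
        if not key.startswith("__")
    }
    advisor_meta = {
        key[2:]: value  # drop the leading "__"
        for key, value in custom_format_mapping.items()
        if key.startswith("__")
    }
    return column_roles, advisor_meta
```

An empty `advisor_meta` corresponds to the backwards-compatible simple mode, where no `__` keys are present.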
Roland Tannous
c2dd0f4cf1 fix: download all GGUF shards for split models (e.g. 7B Q8_0)
LlamaCppBackend.load_model() and precache_helper_gguf() only downloaded
the first matching GGUF file. For split models (e.g. 7B Q8_0 with 3
shards), llama-server needs all shards present. Now collects and
downloads all matching files.
2026-03-10 15:08:20 +00:00
Roland Tannous
7f1fd28acd debug: decode first sample after train_on_completions masking 2026-03-10 14:08:14 +00:00
imagineer99
3de197ac31 rename: tts model type to audio for broader category support 2026-03-10 13:28:49 +00:00
Roland Tannous
49b29fb1fd debug: fix dataset access - result is a dict, use dataset['dataset'] 2026-03-10 13:19:31 +00:00
imagineer99
968f11f60a feat: infer tts model type from backend is_audio flag 2026-03-10 12:57:40 +00:00
Roland Tannous
21cd9f9d02 debug: improve sample preview with type info and traceback 2026-03-10 12:56:24 +00:00
Roland Tannous
a36c073770 debug: switch to print() for subprocess visibility 2026-03-10 12:49:01 +00:00
Roland Tannous
97612af993 debug: add temporary log statements for dataset preview and VLM instruction 2026-03-10 12:35:55 +00:00
imagineer99
8cba556bea feat: curated dataset shortlists and model type plumbing 2026-03-10 12:00:09 +00:00
Roland Tannous
5d471d7e4a feat: add AI Assist button for user-triggered column classification
Move LLM-assisted column mapping from silent /check-format automation
to an explicit "AI Assist" button in the dataset mapping dialog. This
makes the feature transparent and user-controlled.

- Remove llm_classify_columns() from check_dataset_format() (heuristic-only)
- Remove auto-save suggested_mapping from use-training-actions.ts
- Add POST /api/datasets/ai-assist-mapping endpoint (receives preview
  samples from frontend, no dataset re-loading needed)
- Add AiAssistMappingRequest/Response models
- Add aiAssistMapping() frontend API function
- Add Sparkles AI Assist button to DatasetMappingCard with loading state
- Wire up handleAiAssist handler in dataset-preview-dialog.tsx
2026-03-10 11:09:01 +00:00
Roland Tannous
6ae931ca46 Merge pull request #343 from unslothai/fix/cli-changes
Fix/cli changes
2026-03-10 14:38:35 +04:00
Roland Tannous
a26a5cc6be Merge pull request #352 from unslothai/fix/cancel-training
Fix/cancel training
2026-03-10 14:38:30 +04:00
Roland Tannous
0ec340d3e1 fix: LLM-assisted mapping flows from /check-format to training
- Frontend auto-saves suggested_mapping into datasetManualMapping when
  check-format returns requires_manual_mapping=false, so the mapping
  flows to training via custom_format_mapping (no redundant AI calls)
- Backend returns meaningful warning when column detection fails
  (LLM-generated or static fallback) for both text and VLM datasets
- /check-format endpoint merges check_dataset_format warnings with
  existing URL-based image detection warnings
2026-03-10 09:58:58 +00:00
Roland Tannous
f7ca361c5c feat: add LLM-assisted dataset detection using ephemeral GGUF helper
Uses Qwen2.5-3B-Instruct Q8_0 via LlamaCppBackend to complement
heuristic-based dataset detection when heuristics are uncertain.

- New llm_assist.py: VLM instruction generation, column classification,
  and user-friendly warning generation for dataset issues
- Pre-cache helper GGUF on FastAPI startup (background thread)
- Reorder training pipeline: dataset processing runs BEFORE model load
  to avoid VRAM contention (detect → dataset → model → train)
- Add pre_detect_and_load_tokenizer() for lightweight detection
- LLM warnings on VLM conversion failures (broken URLs, missing images)
- LLM column classification fallback when heuristics return unknown
- Graceful degradation: all paths unchanged when helper unavailable
2026-03-10 09:20:45 +00:00
Manan17
fd7ca8bda8 distinguish cancel and stop for force terminate 2026-03-10 02:35:32 +00:00
pre-commit-ci[bot]
bced78373f [pre-commit.ci] pre-commit autoupdate (#4192)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.4 → v0.15.5](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.4...v0.15.5)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-09 19:29:08 -07:00
Manan17
9be55f0c1b fixing cancel training 2026-03-10 02:20:56 +00:00
Roland Tannous
daa50d0756 Revert "Merge pull request #347 from unslothai/feature/studio-storage-roots"
This reverts commit 6b43e33ff1, reversing
changes made to 9edadaf21f.
2026-03-10 01:52:47 +00:00
Manan17
8c493bbd20 CLI fix for backend changes 2026-03-10 01:51:00 +00:00
Roland Tannous
6b43e33ff1 Merge pull request #347 from unslothai/feature/studio-storage-roots
update studio storage roots
2026-03-10 05:49:42 +04:00
Roland Tannous
9edadaf21f Merge pull request #350 from unslothai/fix/vision-datasets-fix
Fix VLM dataset detection and conversion
2026-03-10 05:48:27 +04:00
Roland Tannous
8488c2b1df fix: fall back to auto-detection when user VLM mapping fails
Instead of erroring out when custom_format_mapping fails conversion,
clear it and let auto-detection try. Handles stale cached mappings.
2026-03-10 01:42:25 +00:00
Roland Tannous
dd6c38cc7b fix: probe image column candidates when multiple exist
When multiple image columns are found, probes them (HEAD for URLs,
os.path.exists for paths) and picks the first that works.
Skips probing when top candidate is PIL/dict (score >= 75).
2026-03-10 01:38:33 +00:00
Roland Tannous
81adc47b6e fix: prefer URL image columns over bare filenames, add value-based fallback
find_image_column now scores candidates by resolvability (PIL > dict > URL > path)
and has a Pass 2 value-based fallback for columns not matching image keywords.
Fixes phiyodr/coco2017 picking file_name (unresolvable) over coco_url (resolvable).
2026-03-10 01:36:19 +00:00
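Scoring candidates by resolvability might be approximated as follows. Only the ordering (PIL > dict > URL > path) comes from the commit; the numeric scores, the duck-typed PIL check, the extension list, and both function names are assumptions for this sketch.

```python
from typing import Any, Dict, Optional

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp")


def resolvability_score(value: Any) -> int:
    """Higher means the value is more likely to load as an image."""
    # Duck-typed stand-in for a PIL image, to avoid importing Pillow here.
    if hasattr(value, "size") and hasattr(value, "mode"):
        return 100
    if isinstance(value, dict) and ("bytes" in value or "path" in value):
        return 75
    if isinstance(value, str) and value.lower().endswith(IMAGE_EXTENSIONS):
        if value.lower().startswith(("http://", "https://")):
            return 50  # a URL can be fetched directly
        return 25  # a bare filename or path may not resolve
    return 0


def pick_image_column(sample: Dict[str, Any]) -> Optional[str]:
    """Return the column whose sample value scores highest, if any scores > 0."""
    scored = {name: resolvability_score(value) for name, value in sample.items()}
    best = max(scored, key=lambda name: scored[name], default=None)
    return best if best is not None and scored[best] > 0 else None
```

With a row that has both a bare `file_name` and a full URL column, the URL column wins, which is the behaviour the commit describes for phiyodr/coco2017.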
Roland Tannous
d6803de35a fix: detect list-of-strings text columns and pick random element for VLM conversion
Handles datasets like phiyodr/coco2017 where captions is a list of strings.
2026-03-10 01:32:19 +00:00
Roland Tannous
0b8325ab96 feat: add ShareGPT+image VLM format support and improve image column detection
- Detect and convert ShareGPT/ChatML conversations with <image> placeholders
- Add file_name/filename as image column keywords
- Detect image paths and URLs by value (string ending in .jpg/.png/etc)
2026-03-10 01:27:36 +00:00
Roland Tannous
56d02a3b57 fix: use word-boundary matching for image/audio column detection
Substring matching caused false positives like 'pic' in 'topic',
leading to non-deterministic image column selection.
2026-03-10 00:38:02 +00:00
Manan17
32569fc8a8 shifting setup & co inside studio 2026-03-09 23:48:31 +00:00
Shine1i
109db14817 feat(studio): add auth-specific paths and integrate auth database location 2026-03-09 23:48:31 +00:00
Shine1i
958bdef43e fix(studio): update temporary directory path to use system temp dir 2026-03-09 23:48:31 +00:00
Shine1i
5301514775 feat(studio): studio storage roots path utilities 2026-03-09 23:48:31 +00:00
Roland Tannous
32bbccc573 fix: resolve bare-filename images via HF repo lookup
Datasets like VQAonline store image filenames (e.g. "img.png") without
the directory prefix. Build a basename→repo_path lookup using
list_repo_files, then resolve each file via hf_hub_download.
2026-03-09 23:37:00 +00:00
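The basename lookup described above can be sketched without any network access. The function name `build_basename_lookup` is hypothetical, and keeping the first path when two files share a basename is an assumption.

```python
import posixpath
from typing import Dict, Iterable


def build_basename_lookup(repo_files: Iterable[str]) -> Dict[str, str]:
    """Map each file's basename to its full path inside the repo."""
    lookup: Dict[str, str] = {}
    for repo_path in repo_files:
        # Assumption: on a basename collision, the first path seen wins.
        lookup.setdefault(posixpath.basename(repo_path), repo_path)
    return lookup
```

In practice the input would presumably be the result of `list_repo_files` for the dataset repo, and each resolved path would then be handed to `hf_hub_download`.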
Roland Tannous
c272c4f844 fix: prefer tabular files over archives in Tier 1 dataset preview
Tier 1 check-format was picking images.zip over testmini.parquet,
causing wrong columns (image/label) and broken VLM mapping.
Also log first VLM conversion failure instead of swallowing silently.
2026-03-09 22:00:20 +00:00
Roland Tannous
65413c95fb Merge pull request #349 from unslothai/license/agpl3-studio
Add AGPL-3.0 SPDX headers to all source files
2026-03-10 00:30:29 +04:00
Roland Tannous
d882678fe4 Add AGPL-3.0 SPDX headers to all source files 2026-03-09 20:17:45 +00:00
Roland Tannous
198ca7efce Merge pull request #348 from unslothai/license/agpl3-studio
Main license file for studio codebase
2026-03-09 23:38:08 +04:00
Roland Tannous
ac2906f357 Add AGPL-3.0 license to studio folder 2026-03-09 19:36:25 +00:00
Wasim Yousef Said
a4bc6330a0 Merge pull request #344 from unslothai/style/ui-feedback
Refine UI spacing, icons, and border radius per feedback
2026-03-09 19:24:12 +01:00
Wasim Yousef Said
971ef40d85 Merge pull request #345 from unslothai/feature/fixes-client
feat(studio): fix chat code block actions and some training view changes
2026-03-09 19:22:19 +01:00
Shine1i
1fe8995f1c feat(recipe-studio): add support for managing tools by provider in tool profiles 2026-03-09 19:19:14 +01:00
Shine1i
2ccb75f2b7 Merge remote-tracking branch 'origin/nightly' into feature/fixes-client 2026-03-09 19:07:42 +01:00
Roland Tannous
b6811bc5c4 Merge pull request #342 from unslothai/local-dataset
dataset upload
2026-03-09 21:22:23 +04:00
Roland Tannous
022bafaf92 store uploaded datasets under assets/datasets/uploads instead of ~/.cache 2026-03-09 17:06:36 +00:00
Roland Tannous
ae89101e81 Revert "narrow stale selection guard to only skip clearing for uploaded files"
This reverts commit fbcd111a70.
2026-03-09 16:51:30 +00:00
Roland Tannous
ffeefd15d1 Merge pull request #346 from unslothai/fix/eval-loss-worker-filtering
fix: eval loss broken after subprocess isolation refactor
2026-03-09 20:43:19 +04:00
Roland Tannous
41351e1566 fix: split dataset 80/20 when eval split matches train split 2026-03-09 16:36:44 +00:00
Shine1i
542d9126cc chore(data-recipe): bump data-designer to 0.5.2 and pin duckdb<1.5 2026-03-09 17:27:02 +01:00
Roland Tannous
6eba6fff43 fix: disable Start Training when eval_steps set without eval split 2026-03-09 16:20:00 +00:00
Shine1i
1f37b76b19 feat(recipe-studio): remove MCP tools-related dialogs and refactor tool profile management logic 2026-03-09 17:04:15 +01:00
Datta Nimmaturi
cff1e554fc [trl] Trl v0.28 (and above) rl fixes (#4156)
* Refactor loss computation to include completion_mask

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for trl 0.28 and above

Remove sync/reload weights calls, remove vllm.LLM instantiation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor loss computation to include completion_mask

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for trl 0.28 and above

Remove sync/reload weights calls, remove vllm.LLM instantiation

* patch rpc in openenv for newer trl

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pluesclues <136766175+pluesclues@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-09 12:01:49 -04:00
Roland Tannous
2a11e79b8b fix: restore eval_enabled early signal for subprocess training 2026-03-09 15:35:49 +00:00
Shine1i
c67f4ba29f feat(recipe-studio): improve UI responsiveness and fix JSON preview handling 2026-03-09 16:04:46 +01:00
Roland Tannous
c3185d5d98 fix: allow eval-only progress events through worker callback filter 2026-03-09 14:39:49 +00:00
Shine1i
1ae8fb402b feat(studio): centralize chart styling and formatting 2026-03-09 15:34:15 +01:00
Roland Tannous
fbcd111a70 narrow stale selection guard to only skip clearing for uploaded files 2026-03-09 14:01:02 +00:00
Roland Tannous
1d06e2f54c switch dataset upload from base64 JSON to multipart/form-data with streamed writes 2026-03-09 13:55:45 +00:00
Roland Tannous
56412f2362 include all candidate files when scanning a directory, not just the first 2026-03-09 13:52:45 +00:00
Shine1i
d66bc2760b feat(studio): rework chart settings with a new preferences store and revamped settings UI 2026-03-09 14:47:20 +01:00
Roland Tannous
c998227fec add client-side file size validation before upload 2026-03-09 13:38:00 +00:00
Roland Tannous
4c5ded4c52 normalize uploaded filename extension to lowercase for consistent downstream checks 2026-03-09 13:35:55 +00:00
imagineer99
8fa021e839 style: reduce border radius on onboarding summary cards 2026-03-09 13:33:00 +00:00
Roland Tannous
19b2a0a0da Merge pull request #340 from unslothai/fix/auth-audio
Added auth to audio generate endpoint
2026-03-09 17:25:49 +04:00
Roland Tannous
87c8d7b3da Merge pull request #338 from unslothai/fix/trust-code
Exposed trust_remote_code through the UI
2026-03-09 17:19:56 +04:00
Roland Tannous
91dd7fc762 merge nightly, resolve conflict in use-chat-model-runtime 2026-03-09 13:19:17 +00:00
Roland Tannous
c719f1ba54 training: restore YAML fallback for trust_remote_code (no UI toggle) 2026-03-09 13:10:24 +00:00
imagineer99
d1c544b705 style: remove playground sidebar right border 2026-03-09 13:08:59 +00:00
Roland Tannous
7989cd4567 respect trust_remote_code toggle, return helpful error when required 2026-03-09 13:06:55 +00:00
Shine1i
95f9e0ba41 feat(studio): add support for code block actions including copy and download options in markdown blocks 2026-03-09 13:07:54 +01:00
Roland Tannous
4858204c62 backend: resolve trust_remote_code from YAML when not set by frontend 2026-03-09 11:58:23 +00:00
imagineer99
e188a7b067 style: improve navbar spacing and icon rendering 2026-03-09 11:37:19 +00:00
Roland Tannous
1ddd138da8 add trust_remote_code to BackendTrainingDefaults type 2026-03-09 10:51:28 +00:00
Roland Tannous
a1105d8ef3 wire trust_remote_code from YAML configs to frontend toggles 2026-03-09 10:15:15 +00:00
Roland Tannous
5e36ae2629 add trust_remote_code defaults to all model configs 2026-03-09 09:54:52 +00:00
Roland Tannous
e1b6798a4e Merge pull request #329 from unslothai/fix/add-title
modified the title
2026-03-09 13:10:17 +04:00
Manan17
a08b73e385 remove file size limit 2026-03-09 07:04:02 +00:00
Manan17
d132730f6b CLI fix for backend changes 2026-03-09 07:00:06 +00:00
Manan17
a49638c504 dataset upload 2026-03-09 05:50:18 +00:00
Wasim Yousef Said
91e81227bd Merge pull request #273 from unslothai/feature/data-reciper-enchansments
UX + layout polish & WIP data-reciper client & backend finalization p2
2026-03-09 02:57:32 +01:00
Shine1i
f41a552c29 feat(recipe-studio): enforce run name validation for full runs and refine validation UI 2026-03-09 02:53:59 +01:00
Shine1i
3b1663b1e9 feat(recipe-studio, datasets): improve dataset handling and update metadata logic 2026-03-09 02:47:32 +01:00
Shine1i
84f005fb25 feat(recipe-studio, studio): dataset logic, refine run settings, and improve validation UI 2026-03-09 02:06:12 +01:00
samit
2db36c0b30 added auth to audio generate endpoint 2026-03-08 17:46:54 -07:00
Roland Tannous
254f10e37a Merge pull request #328 from unslothai/fix/chat-unloading-model
fixed model unload before load without validation
2026-03-09 04:40:05 +04:00
Shine1i
e00d7c6745 feat(studio): refine dataset selection logic with Hugging Face and local dataset support 2026-03-09 01:16:45 +01:00
samit
662cb1c440 Adding trust_remote_code to the orchestrator and worker 2026-03-08 16:44:41 -07:00
Shine1i
a2dde15367 merge nightly 2026-03-09 00:32:33 +01:00
samit
86e94b5844 exposed trust_remote_code through the UI 2026-03-08 16:28:56 -07:00
Roland Tannous
2c879bf4a4 Merge pull request #320 from unslothai/fix/stale-dataset-split-on-switch
Fix/stale dataset split on switch
2026-03-09 00:14:09 +04:00
Roland Tannous
1bf0d39f3d Merge pull request #323 from unslothai/fix/vision-dataset-search-filter
fixing update model type
2026-03-09 00:13:50 +04:00
Roland Tannous
d98d4da6c8 Merge pull request #223 from unslothai/feature/support-for-audio-models
Adding support for audio llms
2026-03-09 00:09:59 +04:00
Roland Tannous
7b76fccb9b fix: loosen executorch pin for python 3.13 compat 2026-03-08 19:56:09 +00:00
Roland Tannous
a1778d6655 fix: replace is_dataset_multimodal with is_dataset_image/is_dataset_audio in training orchestrator 2026-03-08 19:40:00 +00:00
Roland Tannous
d10012d9fb silence pip check output 2026-03-08 19:22:31 +00:00
Roland Tannous
0f11415c22 make pip check non-fatal for known third-party conflicts 2026-03-08 19:21:28 +00:00
Manan17
80b704d7b7 Audio_VLM bug fix 2026-03-08 19:14:07 +00:00
Roland Tannous
5a52a0131a add extras-no-deps install step for audio model support 2026-03-08 18:49:55 +00:00
imagineer99
cc0aa43d5d fix: replace favicon with branded sloth icon 2026-03-08 18:46:42 +00:00
Roland Tannous
7ee81dd7df feat: route audio inference (TTS, ASR, Whisper) through orchestrator/worker subprocess 2026-03-08 18:25:27 +00:00
Roland Tannous
f4393ed3e5 fix: pin streamdown package versions to avoid type mismatch 2026-03-08 16:29:28 +00:00
Wasim Yousef Said
efaae9dc4c Merge pull request #308 from unslothai/fix/browser-autofill-hf-token
Prevent browser credential autofill in HF token fields
2026-03-08 16:51:11 +01:00
Wasim Yousef Said
2001abfb26 Merge pull request #331 from unslothai/fix/slider-fill-alignment
Align slider fill bar with thumb across value range
2026-03-08 16:50:38 +01:00
Roland Tannous
1e39e7d05f fix: handle structured audio part type in chat adapter 2026-03-08 14:18:16 +00:00
Daniel Han
1fe8f9061b Bug fixed version 2026-03-08 06:39:19 -07:00
DoubleMathew
c3b7614bd5 Fix gpt temporary patch for grpo to happen after compile (#4180)
* Fix gpt temporary patch for grpo to happen after compile

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-08 06:14:38 -07:00
pluesclues
f7baf6cc02 Completion mask fix (#4140)
* Refactor loss computation to include completion_mask

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-08 06:05:47 -07:00
Roland Tannous
1435dbaf59 merge nightly into audio branch (mock test) 2026-03-08 10:23:44 +00:00
imagineer99
075bfe961b fix: track live slider values for uncontrolled mode and scope fill to horizontal 2026-03-08 10:06:17 +00:00
Manan17
ef714f010e adding export support 2026-03-08 04:18:20 +00:00
Roland Tannous
ff56a5f785 Merge pull request #318 from unslothai/fix/recharts-dimension-warning
Fix Recharts -1 dimension warning on chart mount
2026-03-08 03:34:58 +04:00
Roland Tannous
aab35f2ed3 Merge pull request #324 from unslothai/feature/subprocess-isolation-version-switching
Subprocess isolation for training, inference, and export with automatic transformers version switching
2026-03-08 03:34:12 +04:00
Roland Tannous
a7c34b42be fix: clear stale model state on failed inference subprocess reload 2026-03-07 23:32:53 +00:00
Roland Tannous
4f766bbe25 fix: reset checkpoint metadata on failed export checkpoint reload 2026-03-07 23:29:34 +00:00
Roland Tannous
1dab9f57b6 fix: validate pip exit codes for .venv_t5 installs in setup.ps1 2026-03-07 23:25:13 +00:00
Roland Tannous
adf0fa6f81 add .venv_t5/ to .gitignore 2026-03-07 23:21:48 +00:00
Manan17
6487f81113 check for gated repo 2026-03-07 21:32:50 +00:00
Manan17
905f989521 fixing sesame model 2026-03-07 19:06:00 +00:00
Roland Tannous
8454e6dd2b fix: scope dataloader_num_workers=0 to Windows + transformers 5.x only 2026-03-07 17:55:59 +00:00
Roland Tannous
ef9184c731 fix: prevent training hang on Windows by adding triton-windows support 2026-03-07 17:53:36 +00:00
Roland Tannous
e25705a211 fix: propagate PYTHONPATH to child subprocesses, revert tokenizer patching 2026-03-07 11:28:24 +00:00
Roland Tannous
9330588015 fix: patch TokenizersBackend in export output after save_pretrained 2026-03-07 10:57:51 +00:00
Roland Tannous
76c78afb8f fix: patch TokenizersBackend by model name - Qwen3.5→Qwen2Tokenizer, GLM→PreTrainedTokenizer 2026-03-07 10:29:59 +00:00
Roland Tannous
d60cd2843f fix: patch Qwen3.5 broken tokenizer_class TokenizersBackend across all backends 2026-03-07 09:43:25 +00:00
Strahinja Stamenkovic
1db4a013a1 Conditionally enable 4bit on CDNA for bitsandbytes>=v0.49.2 (#4161) 2026-03-07 01:33:40 -08:00
Roland Tannous
29fa91be07 fix: bump transformers to 5.2.0 and pin huggingface_hub in setup.ps1 2026-03-07 09:12:12 +00:00
Roland Tannous
bd60562145 fix: bump transformers 5.x pin from 5.1.0 to 5.2.0 for Qwen3.5 support 2026-03-07 09:10:09 +00:00
Roland Tannous
0b3397cc3a fix: fail fast if runtime pip install of transformers 5.x fails 2026-03-07 08:40:25 +00:00
Roland Tannous
f3aeceeb24 fix: join prior pump thread before starting new training job 2026-03-07 08:37:03 +00:00
Roland Tannous
f7a3092cbd fix: correct project root depth in model_config.py vision check 2026-03-07 08:15:29 +00:00
Roland Tannous
728420b290 fix: drain stale events from resp_queue after generation cancel 2026-03-07 08:12:16 +00:00
samit
3eef1dcfe3 modified the title 2026-03-06 22:33:30 -08:00
Samit
5f902af456 fixed model unload before load 2026-03-06 22:01:27 -08:00
Roland Tannous
25b51fad3b fix: wait for training shutdown before export load, clear stop flag on reset
1. Export route: stop_training() only signals the subprocess — wait up to
   30s for it to actually exit before loading the export checkpoint, avoiding
   a GPU memory race.

2. Training reset: clear _should_stop so /api/train/status returns phase=idle
   instead of staying stuck on phase=stopped after a user-triggered stop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 04:16:10 +00:00
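The bounded wait described in 25b51fad3b can be sketched as follows. This is a minimal illustration, not the Studio code: `wait_for_exit` is a hypothetical helper, and it assumes the training subprocess handle exposes `join(timeout)` and `is_alive()`, as `multiprocessing.Process` does.

```python
def wait_for_exit(proc, timeout=30.0):
    """Block until `proc` exits or `timeout` seconds pass.

    Returns True if the process has exited, False if it is still running.
    `proc` is any handle with join(timeout) and is_alive(), for example a
    multiprocessing.Process (assumption: the real handle type may differ).
    """
    proc.join(timeout)          # returns early as soon as the process exits
    return not proc.is_alive()  # still alive means the wait timed out
```

A caller would signal the stop first, then load the export checkpoint only if `wait_for_exit(...)` returns True, so the two processes do not compete for GPU memory.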
Roland Tannous
f45f1f74c0 fix: add /v1 proxy entry to vite dev server config
Without this, /v1/chat/completions requests in local dev are served by
Vite instead of being proxied to the FastAPI backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 04:09:28 +00:00
Roland Tannous
609e3168a1 fix: serialize generation with _gen_lock to prevent concurrent queue readers
Two overlapping /chat/completions requests could both read from the shared
resp_queue, consuming and dropping each other's token events. Replace the
request_id filtering (which silently dropped non-matching messages) with a
threading.Lock that serializes generation — correct for single-GPU inference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 04:06:51 +00:00
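A rough sketch of the serialization described in 609e3168a1. Only the names `_gen_lock` and `resp_queue` come from the commit message; `generate`, `produce_tokens`, and the `None` end-of-stream sentinel are assumptions made for illustration.

```python
import queue
import threading

_gen_lock = threading.Lock()   # name taken from the commit message
resp_queue = queue.Queue()     # stand-in for the shared response queue


def generate(produce_tokens):
    """Run one generation at a time so only one reader drains resp_queue.

    `produce_tokens` is a hypothetical callable that pushes tokens onto the
    queue and finishes with None as an end-of-stream marker.
    """
    with _gen_lock:
        produce_tokens(resp_queue)
        tokens = []
        while True:
            item = resp_queue.get()
            if item is None:
                break
            tokens.append(item)
        return tokens
```

Because the lock is held for the whole generation, a second request waits instead of reading events that belong to the first one.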
Roland Tannous
9470957bb9 fix: log final GGUF file locations after relocation 2026-03-06 18:04:27 +00:00
Roland Tannous
5a828ebd43 fix: increase export timeout to 1 hour for large model GGUF conversion 2026-03-06 17:59:42 +00:00
Roland Tannous
4b7ad23b3a feat: broaden Qwen3.5 matching to cover entire family 2026-03-06 16:48:28 +00:00
Roland Tannous
ed1e63c814 feat: add Qwen3.5-35B-A3B and Qwen3-Next to transformers 5.x model list 2026-03-06 10:54:48 +00:00
Manan17
be517cc958 derive effective model type from isVisionModel for dataset search filtering 2026-03-06 08:47:56 +00:00
Shine1i
5e5feb5c00 feat(recipe-studio, validators): tweak OXC validator with lint suppression support and improve error normalization logic 2026-03-06 09:40:53 +01:00
Roland Tannous
d910759121 feat: add OpenAI-compatible /v1/chat/completions endpoint 2026-03-06 07:48:09 +00:00
Manan17
cb3f3f4d0c fixing update model type 2026-03-06 07:44:57 +00:00
imagineer99
15efb0f235 fix: harden chart container sizing with legacy event rechecks 2026-03-06 07:16:36 +00:00
Roland Tannous
c3bc19494f fix: pin huggingface_hub==1.3.0 in .venv_t5 (satisfies transformers 5.x) 2026-03-06 06:19:28 +00:00
imagineer99
005e8ac671 fix: preserve chart sizing updates without ResizeObserver 2026-03-06 06:06:50 +00:00
Roland Tannous
c5f4503b9e fix: unload competing subprocesses before load across all routes 2026-03-06 06:05:31 +00:00
Roland Tannous
6b32af0bdc feat: subprocess-based export, pin huggingface_hub==0.36.0 2026-03-06 06:03:09 +00:00
imagineer99
ad2acbc07a fix: align slider fill bar with thumb across value range 2026-03-06 05:21:44 +00:00
Roland Tannous
b5cfd0952c fix: use subprocess with transformers 5.x for vision detection
Models like GLM-4.7-Flash have architectures (glm4_moe_lite) that
AutoConfig in the main process (transformers 4.57.x) can't recognize.
Instead of a raw config.json workaround, run the AutoConfig check in
a subprocess with .venv_t5/ activated — same pattern as training and
inference workers. This is more robust and consistent.
2026-03-06 04:51:23 +00:00
Roland Tannous
e5c7a18f72 fix: handle unrecognized model architectures in vision detection
AutoConfig.from_pretrained() fails for models needing transformers 5.x
(e.g. glm4_moe_lite) when running with 4.57.x. Add a raw config.json
fallback that bypasses AutoConfig's architecture registry — fetches
config.json directly from local path or HuggingFace Hub and checks
for vision indicators without needing the architecture to be registered.
2026-03-06 04:46:51 +00:00
Roland Tannous
1167be2798 refactor: consolidate version switching to .venv_t5, remove .venv_overlay
All version switching now uses .venv_t5/ (pre-installed by setup.sh).
The old .venv_overlay/ with runtime pip installs is removed.
ensure_transformers_version() (used only by export) now does a
lightweight sys.path swap instead of pip installing at runtime.
2026-03-06 04:37:06 +00:00
Manan17
821ba4936f Fixing dataset split issues 2026-03-06 01:13:55 +00:00
Shine1i
93063c3212 feat(recipe-studio, validators): extend OXC validator with code shape support and integrate into recipe studio 2026-03-06 02:04:05 +01:00
Shine1i
f118216898 feat(recipe-studio): add inference_timeout configuration and validation logic 2026-03-06 00:51:28 +01:00
Shine1i
cf8cb9109b feat(recipe-studio): add support for inference_extra_body configuration with collapsible UI and enhanced validation logic 2026-03-05 23:33:48 +01:00
Roland Tannous
cbe2896705 fix: unload inference model before training to free GPU memory
When starting training, shut down the inference subprocess first
so the training subprocess has full GPU memory available.
2026-03-05 22:28:11 +00:00
imagineer99
5d907b0449 fix: guard recharts ResponsiveContainer behind measured container dimensions 2026-03-05 21:54:27 +00:00
Roland Tannous
31334cece1 fix: indentation error in orchestrator load_model 2026-03-05 19:43:30 +00:00
Shine1i
3cafc0506e feat(data-recipes, validators): extend OXC validator with linting mode support and integrate new modes into recipe studio 2026-03-05 20:19:31 +01:00
Roland Tannous
5bd6fac80e fix: always spawn fresh subprocess per model load
Reusing a subprocess after unsloth patches torch internals causes
inspect.getsource() failures when loading a different model type.
Each load now gets a clean Python interpreter.
2026-03-05 19:15:37 +00:00
Roland Tannous
7fc563731a fix: use mp.Event for instant cross-process generation cancel
Replaces cmd_queue-based cancel polling with a shared mp.Event.
Fixes two issues:
- Loading a new model while generating no longer hangs (cancel is instant)
- Subprocess shuts down cleanly after explicit stop generation
2026-03-05 18:54:17 +00:00
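The cancel mechanism in 7fc563731a can be illustrated roughly as below. `stream_until_cancelled` is a hypothetical name; the sketch assumes the worker checks a shared event between tokens. In the subprocess setup the event would be a `multiprocessing.Event` created by the parent and passed to the child, but any object with `is_set()` works here.

```python
def stream_until_cancelled(token_source, cancel_event):
    """Yield tokens from `token_source` until `cancel_event` is set.

    The event is checked before every token, so a cancel takes effect on the
    next token boundary without waiting for a command-queue poll.
    """
    for token in token_source:
        if cancel_event.is_set():
            break
        yield token
```

The parent only has to call `cancel_event.set()`; nothing needs to be enqueued or dequeued for the worker to notice.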
Shine1i
552eb06bed feat(data-recipes, validators): add OXC validator runtime and integration with recipe studio 2026-03-05 19:48:26 +01:00
Roland Tannous
4eabc74f34 feat: subprocess-based inference for transformers version switching
Inference now runs in a persistent subprocess, solving the same
transformers version-switching problem that was fixed for training.
The subprocess stays alive between requests (model in GPU memory)
and is only restarted when switching transformers versions.

New files:
- core/inference/worker.py: subprocess entry point with command loop
- core/inference/orchestrator.py: parent-side proxy with same API

Modified:
- core/inference/__init__.py: exports orchestrator as default backend
- routes/inference.py: removed in-process ensure_transformers_version()
2026-03-05 17:47:57 +00:00
Roland Tannous
1e04149ddf fix: handle None job_id before first training run 2026-03-05 16:59:37 +00:00
Roland Tannous
842c05e75a fix: lazy imports in core/__init__ to prevent subprocess importing ML libs early 2026-03-05 16:56:45 +00:00
Roland Tannous
9696bd557a fix: exclude bitsandbytes from module purge to prevent duplicate operator registration 2026-03-05 16:40:20 +00:00
Roland Tannous
e3a1811c79 fix: remove in-process version switching from models routes 2026-03-05 16:22:32 +00:00
Roland Tannous
878f8f3924 fix: remove UnslothTrainer/get_trainer from core __init__ exports 2026-03-05 15:57:07 +00:00
Roland Tannous
f8bd4303f7 feat: subprocess-based training for transformers version switching 2026-03-05 15:40:32 +00:00
Shine1i
b277308b7e merge: nightly into feature/data-reciper-enchansments 2026-03-05 14:51:08 +01:00
Shine1i
9a5cea201a feat(recipe-studio): runtime edge handling with template refs and reversed edge support 2026-03-05 14:46:48 +01:00
Shine1i
889b3f78a8 refactor(studio): replace inputValue with searchQuery for improved clarity, add input reason tracking, and streamline dataset filtering logic 2026-03-05 14:06:17 +01:00
Shine1i
cc74d01df7 feat(recipe-studio): improve tab switch fit logic with animation and delay support 2026-03-05 13:29:38 +01:00
Shine1i
a66b1678e8 feat(recipe-studio): normalize and slugify run_name, update job naming logic 2026-03-05 12:25:51 +01:00
Shine1i
e30fc87187 refactor(studio): add local data-recipe dataset selection + training wiring 2026-03-05 12:25:51 +01:00
Shine1i
85d92281f3 feat(data-recipes, recipe-studio): refactor and enhance recipe templates with updated model configurations, structure changes, and added validation logic 2026-03-05 12:14:01 +01:00
Shine1i
bffda3a479 feat(recipe-studio): persist advanced collapsible states across components and sessions 2026-03-05 11:56:40 +01:00
Shine1i
9062755e8f feat(data-recipes, recipe-studio): recipe changes, image context selector 2026-03-05 11:46:42 +01:00
Shine1i
337cb4de8d feat(data-recipes): update recipe templates 2026-03-05 11:29:20 +01:00
Manan17
79cc850a50 remove tracked OuteTTS embedded repo reference 2026-03-05 08:44:23 +00:00
Manan17
9909111982 resolved merge conflicts 2026-03-05 07:59:43 +00:00
Manan17
c723f8d4da fix SNAC training crash on variable-length sequences with DataCollatorForSeq2Seq 2026-03-05 07:04:53 +00:00
Roland Tannous
81b4928e99 Merge nightly into feature/transformers-v5-support 2026-03-05 06:49:44 +00:00
Roland Tannous
4e9c248fa8 Merge pull request #314 from unslothai/fix/vlm-dataset-conversion-error-handling-local
Fix VLM training abort on URL-based dataset conversion failure
2026-03-05 10:10:58 +04:00
Roland Tannous
c171573a8f fix: check for http(s) prefix instead of bare string type for URL detection 2026-03-05 06:10:10 +00:00
Roland Tannous
657cdaa151 fix: remove benchmark scripts from git tracking
These are standalone benchmark scripts that were force-added despite being
gitignored. They have no test functions and run network calls at module
level, which breaks pytest collection in CI.
2026-03-05 06:06:47 +00:00
imagineer99
69299c168c feat(data-recipes): add OCR learning recipe template 2026-03-05 00:58:29 +00:00
Roland Tannous
8218cb651e Merge pull request #313 from unslothai/feature/index-range-dataset-slicing
Fix: clear dataset slice state on file upload
2026-03-05 03:48:10 +04:00
Roland Tannous
352fe023a5 fix: clear dataset slice state when switching to uploaded file
Prevents stale slice values from silently truncating uploaded datasets.
2026-03-04 23:42:23 +00:00
Roland Tannous
9ca45826d4 feat: parallel URL image probe with time estimate and progress reporting
- Add 200-sample parallel probe using ThreadPoolExecutor + safe_num_proc
  to estimate download speed and failure rate before full conversion
- Abort with clear error if >=30% of probe images fail to download
- Show estimated download time in the training overlay modal
- Parallel batch conversion for URL-based datasets (vs sequential for local)
- Add warning field to /check-format response for URL-based image datasets
- Display URL warning in dataset preview dialog (amber banner)
- Thread progress_callback from trainer through format_and_template_dataset
  to convert_to_vlm_format for real-time status updates
2026-03-04 23:40:38 +00:00
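The probe step in 9ca45826d4 could look roughly like this. It is a sketch under stated assumptions: `probe_urls`, `fetch`, and `workers` are hypothetical names (the commit sizes its pool with `safe_num_proc`, which is not reproduced here), and the time estimate and progress reporting are left out.

```python
from concurrent.futures import ThreadPoolExecutor


def probe_urls(urls, fetch, sample_size=200, max_failure_rate=0.30, workers=8):
    """Try `fetch` on up to `sample_size` URLs in parallel.

    Returns the observed failure rate, or raises RuntimeError when the rate
    reaches `max_failure_rate` (the commit aborts at >= 30%).
    """
    sample = urls[:sample_size]
    if not sample:
        return 0.0

    def succeeded(url):
        try:
            fetch(url)  # hypothetical downloader; any exception counts as a failure
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(succeeded, sample))

    failure_rate = results.count(False) / len(sample)
    if failure_rate >= max_failure_rate:
        raise RuntimeError(
            f"{failure_rate:.0%} of probed image URLs failed to download"
        )
    return failure_rate
```

Probing a fixed-size sample keeps the check cheap even for very large datasets, at the cost of only estimating the true failure rate.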
Roland Tannous
195c1a3ce3 test: add parallel download benchmark with ThreadPoolExecutor 2026-03-04 23:29:43 +00:00
Roland Tannous
f59eaad212 feat: add tqdm progress bar to VLM conversion and download benchmark test 2026-03-04 23:29:43 +00:00
Roland Tannous
50885a7aa3 fix: add early probe to fail fast on datasets with too many broken image URLs 2026-03-04 23:29:43 +00:00
Roland Tannous
fdc23f4a43 fix: use fsspec for URL image downloads with per-sample error handling 2026-03-04 23:29:43 +00:00
Roland Tannous
8039eebcd5 test: add URL image loading comparison script 2026-03-04 23:29:43 +00:00
Roland Tannous
2b704221f7 fix: abort training pipeline on dataset conversion failure 2026-03-04 23:29:43 +00:00
Roland Tannous
929c3e9e1e fix: cast URL image columns to HF Image() type in VLM conversion 2026-03-04 23:29:43 +00:00
Roland Tannous
f55153249d Merge pull request #312 from unslothai/feature/index-range-dataset-slicing
Add index range dataset slicing to Studio training page
2026-03-05 03:25:21 +04:00
Roland Tannous
40f2dc517f fix: remove unnecessary tooltip copy from train split start 2026-03-04 23:24:09 +00:00
Roland Tannous
8199e0d2c0 refactor: move train split slice controls back to Advanced section
Place Train Split Start / End inputs inside the Advanced collapsible
with descriptive tooltips clarifying they slice the training split.
Revert the selectors component to its original eval-split-only layout.
2026-03-04 23:24:09 +00:00
Roland Tannous
5f0559926c refactor: move index range fields next to eval split in 3-col grid
Place Slice Start and Slice End inputs alongside the Eval Split
selector in a single row (grid-cols-3) so the dataset card stays
compact. Remove the duplicate controls from the Advanced section.
2026-03-04 23:24:09 +00:00
Roland Tannous
a80188848d feat: add index range dataset slicing to studio training page
Add Start/End index inputs under Advanced in the dataset card,
allowing users to slice a dataset by row range before training.
Wired end-to-end: frontend store, API payload, backend Pydantic
model, and trainer dataset loading (inclusive on both ends).
2026-03-04 23:24:09 +00:00
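Since a80188848d specifies that the range is inclusive on both ends, the slicing rule can be sketched as below. `slice_rows` is a hypothetical helper operating on a plain list; how the trainer applies the range to an actual dataset object is not shown in the log.

```python
def slice_rows(rows, start=None, end=None):
    """Return rows[start..end] with both bounds inclusive.

    None means "open-ended" on that side; out-of-range bounds are clamped.
    """
    lo = 0 if start is None else max(start, 0)
    hi = len(rows) if end is None else min(end + 1, len(rows))  # +1: inclusive end
    return rows[lo:hi]
```

The `end + 1` is the only difference from ordinary Python slicing, where the upper bound is exclusive.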
Roland Tannous
505487f66a Merge pull request #311 from unslothai/revert-310-feature/index-range-dataset-slicing
Revert "Add index range dataset slicing to Studio training page"
2026-03-05 03:23:31 +04:00
Roland Tannous
91783c0fb2 Revert "Add index range dataset slicing to Studio training page" 2026-03-05 03:21:07 +04:00
Roland Tannous
9f9d480e63 Merge pull request #310 from unslothai/feature/index-range-dataset-slicing
Add index range dataset slicing to Studio training page
2026-03-05 03:20:31 +04:00
Roland Tannous
42ee6fe443 fix: remove unnecessary tooltip copy from train split start 2026-03-04 23:15:49 +00:00
Roland Tannous
7f8c0867d5 refactor: move train split slice controls back to Advanced section
Place Train Split Start / End inputs inside the Advanced collapsible
with descriptive tooltips clarifying they slice the training split.
Revert the selectors component to its original eval-split-only layout.
2026-03-04 23:07:36 +00:00
Roland Tannous
07bbe7bae5 refactor: move index range fields next to eval split in 3-col grid
Place Slice Start and Slice End inputs alongside the Eval Split
selector in a single row (grid-cols-3) so the dataset card stays
compact. Remove the duplicate controls from the Advanced section.
2026-03-04 22:35:17 +00:00
Roland Tannous
11ebea6a4b feat: add index range dataset slicing to studio training page
Add Start/End index inputs under Advanced in the dataset card,
allowing users to slice a dataset by row range before training.
Wired end-to-end: frontend store, API payload, backend Pydantic
model, and trainer dataset loading (inclusive on both ends).
2026-03-04 21:48:40 +00:00
Roland Tannous
880633e42b test: add parallel download benchmark with ThreadPoolExecutor 2026-03-04 14:30:11 +00:00
Roland Tannous
e4ec16296e feat: add tqdm progress bar to VLM conversion and download benchmark test 2026-03-04 13:30:27 +00:00
Manan Shah
950e405c89 Delete studio/TESTING.md 2026-03-04 03:47:19 -07:00
Manan17
a5825f8d44 dynamic detection of audio models and fixing autoencoder issues 2026-03-04 10:44:44 +00:00
imagineer99
6c7d61d70e fix: prevent browser credential autofill in HF token fields 2026-03-04 08:49:17 +00:00
Roland Tannous
5ee9479e37 fix: add early probe to fail fast on datasets with too many broken image URLs 2026-03-04 08:05:40 +00:00
Roland Tannous
722744cf04 fix: use fsspec for URL image downloads with per-sample error handling 2026-03-04 07:50:55 +00:00
Roland Tannous
6ba669c8eb test: add URL image loading comparison script 2026-03-04 07:39:35 +00:00
Roland Tannous
645d7d357a fix: abort training pipeline on dataset conversion failure 2026-03-04 06:42:48 +00:00
Roland Tannous
34fb9ec973 fix: cast URL image columns to HF Image() type in VLM conversion 2026-03-04 06:42:37 +00:00
Roland Tannous
2575b9e37d Merge pull request #305 from unslothai/fix/dropdown-layout-shift
Prevent select dropdowns from shifting layout when opened
2026-03-04 10:19:27 +04:00
Roland Tannous
12a4350a61 Merge pull request #306 from unslothai/fix/hf-dataset-error-message
Sanitize dataset script errors and persist training start error
2026-03-04 10:14:26 +04:00
Roland Tannous
43bf599b33 Remove overly broad .py check from dataset error normalization 2026-03-04 06:13:47 +00:00
Roland Tannous
2d7d3cd27e Merge pull request #287 from unslothai/fix/duplicate-def-inference
Deleted duplicate definitions for load_for_eval, load_adapter, and load_model_simple in core Inference
2026-03-04 10:06:04 +04:00
Roland Tannous
46550ecf24 Merge pull request #289 from unslothai/fix/datasets-auth
Added auth to dataset endpoints
2026-03-04 08:21:42 +04:00
Shine1i
29299d73b8 merge: nightly into feature/data-reciper-enchansments
resolve setup.sh conflict by keeping nightly installer flow and preserving local data-designer plugin install via install_python_stack.py
2026-03-03 22:21:04 +01:00
Shine1i
2473043fe1 feat(recipe-studio): enhance edge synchronization logic with layout direction support 2026-03-03 22:17:50 +01:00
Shine1i
6997919c65 feat(recipe-studio): add support for naming full runs, enhance empty states, and refine UI components 2026-03-03 21:56:37 +01:00
Shine1i
df553fc955 feat(recipe-studio): add authentication to API requests and backend routes 2026-03-03 21:32:39 +01:00
imagineer99
a0f4566173 fix: sanitize dataset script errors and persist training start error 2026-03-03 20:15:23 +00:00
Shine1i
1334b24bea refactor(recipe-studio): update UI components with consistent styling and improved hierarchy 2026-03-03 21:02:57 +01:00
Roland Tannous
50b88bfb34 Updated README 2026-03-03 18:42:35 +00:00
imagineer99
84bb57f208 fix: prevent select scroll-lock margin from shifting layout 2026-03-03 18:37:46 +00:00
Roland Tannous
4ddf59f781 Merge pull request #296 from unslothai/feature/windows-native-support
PR: Windows Native Support + llama.cpp Build Migration
2026-03-03 22:23:35 +04:00
Roland Tannous
7bc235bed2 Merge branch 'nightly' into feature/windows-native-support 2026-03-03 22:23:18 +04:00
Roland Tannous
58b00db5cb chore: add cross-platform Python installer with updated unsloth patch URLs 2026-03-03 17:31:59 +00:00
Roland Tannous
a4d2853fbc fix: align llama-server binary discovery with upstream unsloth-zoo paths 2026-03-03 17:03:01 +00:00
Daniel Han
892caf5eb7 Update _utils.py 2026-03-03 08:29:33 -08:00
Daniel Han
a665c9b57d Also patch accelerate's is_wandb_available for trl callbacks path (#4148)
trl/trainer/callbacks.py imports is_wandb_available from
accelerate.utils, not from transformers. The original fix in #4147
only patched the transformers version, so `from trl import GRPOTrainer`
still crashed via the callbacks.py -> accelerate -> wandb path.

Must patch both the source module (accelerate.utils.imports) AND the
re-export namespace (accelerate.utils) since Python's
`from accelerate.utils import X` reads from the latter, which holds
its own cached reference.
2026-03-03 08:28:55 -08:00
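The point a665c9b57d makes about re-exported names can be reproduced with a toy package. Everything below (`toy_pkg`, `is_available`) is invented for the demonstration and does not touch accelerate or transformers.

```python
import sys
import types

# Build a toy package: toy_pkg.imports defines is_available, toy_pkg re-exports it.
source = types.ModuleType("toy_pkg.imports")
source.is_available = lambda: True
package = types.ModuleType("toy_pkg")
package.imports = source
package.is_available = source.is_available  # the re-export holds its own reference
sys.modules["toy_pkg"] = package
sys.modules["toy_pkg.imports"] = source

# Patching only the source module leaves the re-exported name untouched.
source.is_available = lambda: False
from toy_pkg import is_available as seen_after_source_patch

# Patching the re-export namespace as well covers `from toy_pkg import ...`.
package.is_available = source.is_available
from toy_pkg import is_available as seen_after_both_patches
```

Here `seen_after_source_patch()` still returns True, while `seen_after_both_patches()` returns False, which is why both namespaces need the patch.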
Daniel Han
f4da8c3819 Update _utils.py 2026-03-03 07:14:15 -08:00
Daniel Han
cb13a1fe04 Fix broken wandb import crashing unsloth startup (#4147)
* Fix broken wandb import crashing unsloth startup

When wandb is installed but broken (e.g., wandb < 0.19.11 with
protobuf >= 6.0), the import chain unsloth -> trl -> transformers ->
is_wandb_available() -> import wandb crashes with:

  ImportError: cannot import name 'Imports' from
  'wandb.proto.wandb_telemetry_pb2'

This happens because transformers' is_wandb_available() has no
try/except around `import wandb`. The error propagates up and kills
`from unsloth import FastLanguageModel` even though wandb is optional.

Add disable_broken_wandb() following the same pattern as
disable_torchcodec_if_broken(). It proactively tries importing wandb
during early init, and if the import fails, patches
is_wandb_available() to return False and sets WANDB_DISABLED=true.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-03 07:08:12 -08:00
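The proactive import check from cb13a1fe04 follows a pattern that can be sketched generically. `disable_if_broken` is a hypothetical helper, not the actual `disable_broken_wandb()`; it only sets an environment flag and omits the step that patches `is_wandb_available()` to return False.

```python
import importlib
import importlib.util
import os


def disable_if_broken(module_name, env_flag):
    """Set `env_flag` to "true" when `module_name` is installed but fails to import.

    Returns True if the module was found broken, False otherwise.
    """
    if importlib.util.find_spec(module_name) is None:
        return False  # not installed, so there is nothing to disable
    try:
        importlib.import_module(module_name)
    except Exception:
        # Installed but unusable: record that callers should skip it.
        os.environ[env_flag] = "true"
        return True
    return False
```

Under these assumptions, a call such as `disable_if_broken("wandb", "WANDB_DISABLED")` during early init would keep an optional dependency from taking down the main import.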
Datta Nimmaturi
f840119fa4 Fixup mapper issues and resolve properly (#4124)
* Fixup mapper issues and resolve properly

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-03 06:57:25 -08:00
Daniel Han
e238fd14aa Update __init__.py 2026-03-03 06:55:08 -08:00
Daniel Han
9b4a216b57 Update 2026-03-03 06:53:58 -08:00
Mustafa Eyceoz
6762a380e3 Fix multi-node distributed training with single GPU per node (#4143) 2026-03-03 20:15:41 +05:30
Roland Tannous
b1b9262198 fix: update GGUF save paths to use ~/.unsloth/llama.cpp with Windows support (#4138)
* fix: update GGUF save paths to use ~/.unsloth/llama.cpp with Windows support

* fix: quote LLAMA_CPP_DEFAULT_DIR in fallback shell commands to handle paths with spaces

* refactor: deduplicate platform-specific build instructions in quantization error message

* chore: remove accidentally committed PR description file

* Fix import safety and f-string bugs in save.py

- H4: Add defensive try/except for LLAMA_CPP_DEFAULT_DIR and IS_WINDOWS imports
  with fallback defaults, so save.py works even if zoo PR #526 is not merged yet
- H5: Fix Kaggle error path using plain "Error: {e}" instead of f"Error: {e}",
  so the actual exception is shown to users

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-03 06:34:09 -08:00
Lei Zhenyuan
6d42e0a7c8 add intel support for torch210 within pyproject.toml (#4144)
* add intel support for torch210

* fix for typo
2026-03-03 06:33:45 -08:00
Datta Nimmaturi
b7ec64c96f [Fix] lm_head lora save (#4106)
* Fix lm_head lora save

* Fix _need_to_train_embeddings guard for lm_head LoRA targets

When lm_head is already in final_modules as a LoRA target, the
_need_to_train_embeddings block should not also add it to
modules_to_save. This prevents dual-wrapping (LoRA + modules_to_save
on the same module) which causes assertion failures downstream.

Check if embed_tokens/lm_head are already being trained as LoRA
targets before adding them to modules_to_save. Also prevents
duplicate entries with elif guards.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-03 06:30:13 -08:00
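The guard described in b7ec64c96f amounts to the following rule, shown here as a standalone sketch. `extend_modules_to_save` and its parameters are hypothetical names; the real logic lives around `_need_to_train_embeddings` and is not reproduced in the log.

```python
def extend_modules_to_save(lora_targets, modules_to_save,
                           wanted=("embed_tokens", "lm_head")):
    """Add embedding modules to modules_to_save unless LoRA already targets them.

    Skipping LoRA targets avoids wrapping one module twice, and the
    membership check avoids duplicate entries.
    """
    result = list(modules_to_save)
    for name in wanted:
        if name in lora_targets:
            continue  # already trained through LoRA; do not dual-wrap
        if name not in result:
            result.append(name)
    return result
```

For example, with `lm_head` among the LoRA targets only `embed_tokens` is added.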
金黄色葡萄球君君
1ebf994da1 fix(ROCm): restrict is_rdna() to ROCm-officially-supported RDNA GPUs (#4136)
Current arch.startswith("gfx1") incorrectly matches:
  - RDNA1 (gfx10xx) and RDNA2 (gfx103x): not ROCm supported
  - gfx1102 (RX 7600), gfx1103 (Phoenix APU): not in ROCm support matrix
  - gfx1150/1151/1152 (RDNA3.5 APUs): not in ROCm support matrix

Replace with explicit whitelist aligned to the ROCm Linux support matrix:
  https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html

  gfx1100 - RDNA3 discrete (RX 7900 series, PRO W7900/W7800)
  gfx1101 - RDNA3 discrete (RX 7800/7700 series, PRO W7700)
  gfx1200 - RDNA4 discrete (RX 9060 series)
  gfx1201 - RDNA4 discrete (RX 9070 series, AI PRO R9700)

Mirrors the existing is_cdna() pattern. Avoids silently applying
unverified Triton kernel tuning to unsupported hardware.
2026-03-03 03:05:38 -08:00
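The whitelist from 1ebf994da1 can be written out as a small check. The four architecture names come from the commit message; `is_supported_rdna` is a hypothetical stand-in for the patched `is_rdna()`, and stripping a `:`-separated feature suffix from the arch string is an assumption about the input format.

```python
# Architectures listed in the commit message as ROCm-supported RDNA GPUs.
SUPPORTED_RDNA_ARCHS = frozenset({"gfx1100", "gfx1101", "gfx1200", "gfx1201"})


def is_supported_rdna(arch):
    """Return True only for whitelisted RDNA architectures.

    Assumption: `arch` may carry feature suffixes such as
    "gfx1100:sramecc+:xnack-", so only the part before the first ":" is compared.
    """
    return arch.split(":")[0] in SUPPORTED_RDNA_ARCHS
```

Unlike a `startswith("gfx1")` prefix test, this rejects `gfx1030`, `gfx1102`, and `gfx1151`, matching the cases the commit calls out.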
金黄色葡萄球君君
5e781900fb Revert "perf(ROCm): optimize chunked CE loss num_warps for RDNA GPUs (#4123)" (#4139)
This reverts commit 721bf4852a.
2026-03-03 03:05:32 -08:00
Shine1i
0166cb6d38 feat(recipe-studio): add HF repo ID inference and reset logic for HF state 2026-03-03 11:37:49 +01:00
Shine1i
c88cce8185 refactor(seed): package unstructured seed reader as local Data Designer plugin 2026-03-03 11:22:04 +01:00
Shine1i
95dd202ab3 merge nightly into feature/data-reciper-enchansments 2026-03-03 11:14:18 +01:00
Shine1i
bdc825298d feat(seed): backend unstructured seed reader + server-side chunking, remove client chunk splitter 2026-03-03 11:11:26 +01:00
Roland Tannous
bded396923 Merge pull request #297 from unslothai/fix/fix-pip-issues
fix: make pip check non-fatal and install jedi for Colab compatibility
2026-03-03 13:35:29 +04:00
Manan17
f04c684d8a variable changes and some cleanup 2026-03-03 09:35:11 +00:00
Roland Tannous
f190d5a16d fix: make pip check non-fatal and install jedi for Colab compatibility 2026-03-03 09:34:35 +00:00
Shine1i
7d35463abc feat(recipe-studio): add execution progress island and collapsible advanced options for validators 2026-03-03 10:34:32 +01:00
Michael Han
59f7a9006a Qwen3.5 Update.md
Updated with Qwen3.5 Small models
2026-03-02 23:33:22 -08:00
pre-commit-ci[bot]
2089c158a7 [pre-commit.ci] pre-commit autoupdate (#4141)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.2 → v0.15.4](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.2...v0.15.4)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-02 21:48:36 -08:00
Etherll
65f212b940 Add Qwen 3.5 to FORCE_FLOAT32 (#4134)
* Add Qwen3.5 to FORCE_FLOAT32

* fix vision encoder dtype mismatch

* revert vision cast changes
2026-03-02 13:36:28 -06:00
Roland Tannous
87f2b2a9db Merge branch 'nightly' into feature/support-for-audio-models 2026-03-02 15:55:25 +04:00
Roland Tannous
c64e50b46f Patch unsloth-zoo llama_cpp.py and unsloth save.py from windows-support branch 2026-03-02 10:45:09 +00:00
Roland Tannous
e280e457d1 Move llama.cpp clone/build from in-tree to ~/.unsloth/llama.cpp
- setup.sh: builds at ~/.unsloth/llama.cpp instead of ./llama.cpp
- setup.ps1: builds at %USERPROFILE%/.unsloth/llama.cpp
- inference llama_cpp.py: searches ~/.unsloth/ first, in-tree as legacy
- export.py: updated comments (unsloth-zoo handles path natively)
2026-03-02 04:04:41 +00:00
DoubleMathew
a835b266ef Fix auto padding free logic to respect user passed False (#4128)
* Fix auto padding free logic to respect user passed

* Update unsloth/trainer.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-01 19:30:47 -08:00
Wasim Yousef Said
dc7976c534 Merge pull request #294 from unslothai/fix/navbar-center-tabs-shift
Prevent navbar tab shift when navigating across pages
2026-03-01 15:42:19 +01:00
Wasim Yousef Said
2df2d671ce Merge pull request #290 from unslothai/fix/model-dropdown-visual-consistency
Standardize OOM/TIGHT model status indicators across model dropdowns
2026-03-01 15:41:39 +01:00
Roland Tannous
674cc67d78 Tighten Python bounds to >= 3.11, < 3.14 (matching setup.sh), only auto-install if missing 2026-03-01 13:05:10 +00:00
Roland Tannous
d5644d2d0d Add Python 3.12 prerequisite check with auto-install via winget 2026-03-01 13:05:10 +00:00
Roland Tannous
0267ba0a18 Auto-enable Windows Long Paths via UAC elevation during setup 2026-03-01 13:05:10 +00:00
Roland Tannous
6536bfb33b Remove unused CMP0194 cmake policy (eliminates cmake warning) 2026-03-01 13:05:10 +00:00
Roland Tannous
453f423d22 Simplify: use winget OpenSSL.Dev instead of vcpkg for HTTPS support 2026-03-01 13:05:10 +00:00
Roland Tannous
2e102b683e Add vcpkg/curl[ssl] for HTTPS support in llama-server, enable LLAMA_CURL=ON 2026-03-01 13:05:10 +00:00
Roland Tannous
6e5a3d1744 Download GGUF via huggingface_hub instead of llama-server -hf (fixes HTTPS not supported on Windows) 2026-03-01 13:05:10 +00:00
Roland Tannous
9eb0ff074b Add .venv/Scripts to User PATH so unsloth-studio works without activation 2026-03-01 13:05:10 +00:00
Roland Tannous
8e22b16bd8 Simplify completion banner: no venv activation needed 2026-03-01 13:05:10 +00:00
Roland Tannous
12867f701b Auto-add CUDA DLLs to PATH when launching llama-server on Windows 2026-03-01 13:05:10 +00:00
Roland Tannous
d79fe439ed Warn user to uninstall incompatible CUDA toolkit instead of failed side-by-side 2026-03-01 13:05:10 +00:00
Roland Tannous
b22c5b6ed8 Fallback: try descending CUDA versions if exact driver-max install fails 2026-03-01 13:05:10 +00:00
Roland Tannous
7b4d074857 Always persist compatible CUDA_PATH to User registry (overwrite stale values) 2026-03-01 13:05:10 +00:00
Roland Tannous
7576552717 Fix: scan side-by-side CUDA installs, pick compatible toolkit version 2026-03-01 13:05:10 +00:00
Roland Tannous
3521de7040 Build llama.cpp in-tree, auto-detect driver CUDA version for compatible toolkit 2026-03-01 13:05:10 +00:00
Roland Tannous
8d272ff8d5 Auto-detect driver CUDA version, install compatible toolkit instead of latest 2026-03-01 13:05:10 +00:00
Roland Tannous
bccbd26f3a Fix non-ASCII chars in test script for Windows PS 5.1 2026-03-01 13:05:10 +00:00
Roland Tannous
1684e48b1e Add llama-cpp Windows test script, fix binary lookup paths 2026-03-01 13:05:10 +00:00
Roland Tannous
f036a70681 Fix llama-server binary lookup for Windows (.exe, Release dir, ~/.unsloth) 2026-03-01 13:05:10 +00:00
Roland Tannous
7e021886c8 Force num_proc=1 on Windows to avoid slow spawn overhead 2026-03-01 13:05:10 +00:00
Roland Tannous
bd7c17708b Set short TORCHINDUCTOR_CACHE_DIR to fix Windows MAX_PATH crash 2026-03-01 13:05:10 +00:00
Roland Tannous
e1cc5e61b1 Fix npm Invalid Version: delete package-lock.json, relax Node constraint 2026-03-01 13:05:10 +00:00
Roland Tannous
aba3d8e29b Enforce Node LTS (v20-v22), add npm error checking, clean node_modules 2026-03-01 13:05:10 +00:00
Roland Tannous
2dfe0abaa1 Fix npm stderr crash on Windows ErrorActionPreference 2026-03-01 13:05:10 +00:00
Roland Tannous
662a1eb9d5 Fix Windows frontend build, add setup.bat, ANSI colors, aliases 2026-03-01 13:05:10 +00:00
Roland Tannous
783f0caf5f add setup.bat 2026-03-01 13:05:10 +00:00
Roland Tannous
28bac1859a Extract shared install_python_stack.py for cross-platform setup 2026-03-01 13:05:10 +00:00
Roland Tannous
4aba18375b Merge pull request #295 from unslothai/fix/fix-local-vision-gguf-loading
fix: support mmproj for local vision GGUF models + fix Windows pipe d…
2026-03-01 17:03:24 +04:00
Roland Tannous
ff93c97024 fix: support mmproj for local vision GGUF models + fix Windows pipe deadlock 2026-03-01 12:58:38 +00:00
Shine1i
761c84f92b feat(recipe-studio): introduce validator blocks for code validation with Python and SQL engines 2026-03-01 13:01:00 +01:00
Shine1i
891739a56a feat(recipe-studio): add LLM trace modes and reasoning content extraction support 2026-03-01 12:01:48 +01:00
Shine1i
b7ee065ffd refactor(recipe-studio): add image preview support for dataset and LLM configurations p2 2026-03-01 11:21:10 +01:00
Shine1i
c3c65cded8 feat(recipe-studio): add image preview support for dataset and LLM configurations p1 2026-03-01 10:57:51 +01:00
Shine1i
4d718db5a0 feat(recipe-studio): auto-fit editor viewport on tab switch, track manual viewport adjustments 2026-03-01 10:21:02 +01:00
Daniel Han
54119f2060 rl: guard warnings_issued before TRL estimate_tokens write (#4034)
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-03-01 00:42:37 -08:00
金黄色葡萄球君君
afd5b687aa Fix global dequantize buffer dtype mismatch across mixed-precision loads (#4026)
Fix global dequantize buffer dtype mismatch when loading multiple 4-bit models with different dtypes in the same process. Adds dtype check alongside existing None check for WEIGHT_BUFFER in both CUDA/HIP and XPU paths.
2026-03-01 00:15:47 -08:00
Manan17
c636fd5a42 code cleanup 2026-03-01 08:04:38 +00:00
金黄色葡萄球君君
721bf4852a perf(ROCm): optimize chunked CE loss num_warps for RDNA GPUs (#4123)
Use 16 warps for RDNA in the chunked cross-entropy forward kernel
(large vocab > 65536), matching the existing CDNA optimization.

Benchmarked on W7900 (gfx1100) with actual unsloth kernels (5 trials, median):
  - Chunked CE forward (BS=65536): 16 warps = 2.4-2.6x faster than 32
  - All other kernels (LayerNorm, RoPE, SwiGLU): default heuristic is
    already optimal for RDNA; no modification needed.

Depends on: #4109 (provides is_rdna() detection)
2026-02-28 23:59:34 -08:00
金黄色葡萄球君君
48e8f78042 fix(ROCm): prevent false TMA support detection on AMD GPUs (#4126)
TMA (Tensor Memory Accelerator) is an NVIDIA Hopper+ feature that does
not exist on AMD GPUs.  However, _check_tma_support() incorrectly
returns True on ROCm because:

1. torch.cuda.get_device_capability() returns (11, 0) for gfx1100,
   satisfying the >= 9 check intended for Hopper (sm_90).
2. ROCm Triton exports tl.make_tensor_descriptor (the symbol exists
   even though the hardware does not support TMA).

This would cause MoE grouped_gemm to attempt TMA operations on AMD
GPUs, leading to runtime failures.

Fix: early-return False for HIP devices, matching the existing XPU
guard.
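
A minimal sketch of the guard described above, in plain Python. This is not the actual `_check_tma_support()`: the function name `tma_supported` and its three parameters are hypothetical stand-ins for the values the real check reads from torch and Triton.

```python
def tma_supported(device_type: str, capability: tuple, has_descriptor_api: bool) -> bool:
    """Hypothetical sketch of a TMA capability check with vendor guards."""
    # TMA is NVIDIA Hopper+ hardware, so return early for non-NVIDIA backends
    # before any version number is inspected.
    if device_type in ("hip", "xpu"):
        return False
    major, _minor = capability
    # Without the early return, gfx1100 reporting (11, 0) would pass this check.
    return major >= 9 and has_descriptor_api


print(tma_supported("hip", (11, 0), True))   # AMD: capability and symbol both look fine
print(tma_supported("cuda", (9, 0), True))   # Hopper
print(tma_supported("cuda", (8, 0), True))   # pre-Hopper NVIDIA
```

The point of the ordering is that neither the capability tuple nor the presence of the Triton symbol is trusted until the vendor has been ruled in.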
2026-02-28 23:59:27 -08:00
金黄色葡萄球君君
17795e4f14 fix(Triton): ensure float32 eps in RMS LayerNorm rsqrt for HIP/ROCm (#4110)
* fix(Triton): ensure float32 eps in RMS LayerNorm rsqrt for HIP/ROCm

On HIP (AMD ROCm), Triton constexpr eps may not promote to float32
in rsqrt, causing numerical instability (NaN/Inf) on RDNA GPUs
(gfx1100, gfx1151 Strix Halo, etc.).

Use tl.full((), eps, tl.float32) to explicitly create a float32
scalar before adding to row_var in rsqrt. Applied to both standard
and Gemma RMS LayerNorm forward kernels.

Tested on W7900 (gfx1100): full test suite passed (dim 512-2048,
bf16/fp16, various seqlen).

Related: #3385, #3588

* Apply same float32 eps fix to layernorm.py for PR #4110

layernorm.py has the identical tl.constexpr eps pattern in
layernorm_forward that can misfire on HIP/ROCm. Apply the same
tl.full((), eps, tl.float32) fix for consistency.

Both testing_suite_layernorm (standard LayerNorm) and
testing_suite_layernorm (RMS LayerNorm) pass on NVIDIA after
this change.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-02-28 23:59:22 -08:00
金黄色葡萄球君君
8a8dcd48dd fix(ROCm): Comprehensive RDNA GPU support - fix Gemma3 NaN & add is_rdna() (#4109)
* fix(ROCm): comprehensive RDNA GPU support - fix Gemma3 NaN & add is_rdna()

- Add is_rdna() detection for RDNA3/3.5/RDNA4 consumer GPUs (gfx11xx, gfx1151, gfx12xx)
- Disable torch.compile for Gemma3 on HIP to fix NaN loss (fixes #3385, #4029)
- Export is_cdna/is_rdna from kernels for downstream use
- Import is_rdna into cross_entropy_loss for future RDNA-specific tuning

Tested on AMD Radeon PRO W7900 (gfx1100) with ROCm 7.1:
  ✓ Gemma3-1B: loss 3.37→3.25 (no NaN)
  ✓ Llama-3.2-1B: loss 2.44→2.37 (no NaN)
  ✓ Qwen2.5-1.5B: loss 1.89→1.85 (no NaN)
  ✓ RMS LayerNorm Triton kernel: bf16/fp16 PASSED
  ✓ Cross Entropy Loss Triton kernel: 32K/256K vocab PASSED

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: scope compile disable to RDNA only, use partial mode, remove unused import

Changes based on Daniel's review:
1. (HIGH) Replace DEVICE_TYPE=='hip' with is_rdna() to avoid disabling
   torch.compile on CDNA GPUs (MI250X/MI300X/MI350) where it works fine
2. (MEDIUM) Use 'partial' instead of '1' for UNSLOTH_COMPILE_DISABLE to
   only disable model forward compilation while keeping loss compilation,
   matching the existing Sesame pattern
3. (LOW) Remove unused is_rdna import from cross_entropy_loss.py (F401)

* Remove redundant is_cdna/is_rdna exports from kernels/__init__.py

These functions are imported directly from .utils where needed
(e.g. cross_entropy_loss.py, loader.py). No external code imports
them from the unsloth.kernels namespace.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-28 23:59:17 -08:00
金黄色葡萄球君君
a3a1c3457f fix(ROCm): remove fix_rocm_triton_key_error — based on a false premise (#4125)
The function (introduced in #3923) assumed that the absence of
`triton.runtime.triton_key` on ROCm means torch.compile will crash.
Investigation shows this is incorrect:

1. `triton.runtime.triton_key` was renamed/removed in the ROCm Triton
   fork — it does not exist at that path.  However,
   `triton.compiler.compiler.triton_key` (the path torch._inductor
   actually imports) EXISTS and works correctly on ROCm.

2. Both call-sites in torch._inductor (codecache.py and
   async_compile.py) already wrap the import in try/except, so even a
   genuinely missing triton_key would be handled gracefully.

3. Comprehensive testing on ROCm 7.1 + Triton 3.4.0 + gfx1100 confirms
   torch.compile works correctly for matmul, cross-entropy, RMSNorm,
   multi-layer transformer forward+backward, and LoRA — all without
   triton.runtime.triton_key.

The original code was also ineffective (environment variables set after
torch import have no effect on torch._dynamo config), so removing it
has zero behavioral change on existing installations.

Supersedes the compile-disable portion of #3923.
2026-02-28 23:59:12 -08:00
imagineer99
471fc8fd90 fix: prevent navbar tab shift when navigating across pages 2026-03-01 03:02:49 +00:00
Manan17
c48437848d revamping up the code and adding inference 2026-03-01 02:30:31 +00:00
Manan17
ab2ac39017 Changes with audio training 2026-03-01 02:27:45 +00:00
Manan17
ac27edde35 merging with nightly 2026-03-01 02:27:45 +00:00
imagineer99
42f5ba5fcc fix: standardize OOM/TIGHT model status indicators across model dropdowns 2026-03-01 00:02:15 +00:00
samit
ece51dca13 updated fetch to auth fetch in the frontend 2026-02-28 02:10:12 -08:00
samit
d07397c81e added auth to dataset endpoints 2026-02-28 01:17:43 -08:00
Shine1i
ce33d673b9 feat(markdown): fix Mermaid integration with error handling and copy button 2026-02-27 20:02:40 +01:00
samit
862b4100d2 deleted duplicate definitions 2026-02-27 06:00:28 -08:00
Roland Tannous
89d0a98192 Merge pull request #284 from unslothai/fix/drop-model-task-hard-filter
Apply HF task filtering only for empty model queries
2026-02-27 15:46:17 +04:00
imagineer99
49c319c2f7 fix: only apply HF task filter for empty model search queries 2026-02-27 11:31:53 +00:00
Wasim Yousef Said
ad7c1ceb40 Merge pull request #268 from unslothai/fix/delete-custom-config
Updated the delete custom preset in chat tab (filters)
2026-02-27 01:38:01 -08:00
Shine1i
6d74943dea keep checkpoint on preset apply 2026-02-27 10:36:07 +01:00
Wasim Yousef Said
7c364b594c Merge pull request #269 from unslothai/fix/fine-tuned-model-tooltip
Added tooltip for the fine tuned models in chat page
2026-02-27 01:34:49 -08:00
Wasim Yousef Said
5cd3d85882 Merge pull request #262 from unslothai/fix/truncated-text
Reduced name truncation on the training page
2026-02-27 01:33:05 -08:00
Shine1i
3c728f5eb3 merge nightly 2026-02-27 10:31:37 +01:00
Roland Tannous
6e12e2536d Merge pull request #267 from unslothai/fix/config-switch
Preserving model name during configuration type switch in chat page
2026-02-27 13:23:29 +04:00
Roland Tannous
5be7ae925d Merge pull request #266 from unslothai/fix/dataset-search-remove-size-download-badges
Remove dataset metadata badges from HF dataset dropdowns
2026-02-27 13:22:49 +04:00
Roland Tannous
cef36ee8e8 Merge pull request #282 from unslothai/fix/inference-auth
Added auth to inference endpoints
2026-02-27 13:18:38 +04:00
Roland Tannous
32d68b99e3 Merge pull request #278 from unslothai/fix/reorder-model-type-cards-onboarding
Reorder model type cards in onboarding to show Text first
2026-02-27 13:17:00 +04:00
Roland Tannous
90b06063d6 Merge pull request #275 from unslothai/fix/show-size-gguf
Passes metadata to get model size
2026-02-27 13:15:38 +04:00
Manan17
168957a87a Aggregating sharded models, showing fit/oom for quantizations 2026-02-27 08:23:15 +00:00
samit
b18a14d369 added auth to inference endpoints 2026-02-27 00:20:36 -08:00
Daniel Han
9248091500 Fix Whisper auto_model mapping fallback for concrete model classes (#4115) 2026-02-26 23:43:57 -08:00
Manan17
2fea4cadd3 Passes metadata to get model size 2026-02-27 07:38:55 +00:00
Roland Tannous
24c931c374 Merge pull request #276 from unslothai/fix/rebuild-llamacpp-setup
rebuild llama cpp for setup
2026-02-27 10:20:32 +04:00
imagineer99
4aecfb80d4 fix: reorder model type cards in onboarding to show Text first 2026-02-27 06:16:26 +00:00
Manan17
cd7bdf4224 rebuild llama cpp for setup 2026-02-27 04:35:52 +00:00
Daniel Han
d9089de0f7 Guard Gemma3N variants from flex attention defaults (#4116) 2026-02-26 17:48:38 -08:00
Daniel Han
7c68ec439f Update README.md (#4119) 2026-02-26 09:18:29 -08:00
Daniel Han
618ac74ae0 Update README.md (#4118) 2026-02-26 08:06:21 -08:00
Shine1i
bf594c89de refactor(recipe-studio): split page logic into graph/runtime hooks + floating run controls 2026-02-26 16:42:35 +01:00
Roland Tannous
9a35f26307 Merge branch 'nightly' 2026-02-26 19:38:49 +04:00
Shine1i
4c97591e4c fix(recipe-studio): prevent stale empty fitView from offsetting first block zoom 2026-02-26 16:23:27 +01:00
Shine1i
b7edf4e3cd refactor(recipe-studio): simplify runtime graph flow + guard stale active execution lock p2 2026-02-26 15:37:48 +01:00
Shine1i
8a996afbfb feat(recipe-studio): add live execution graph state (active flows, node status, editor lock) p1 2026-02-26 15:27:46 +01:00
Shine1i
7ed6ad1e0c fix(recipe-studio): support user.* refs validation + toggle user badge details; style user refs/node amber 2026-02-26 14:27:36 +01:00
Wasim Yousef Said
77502494d5 Merge pull request #272 from unslothai/feature/data-reciper-enchansments
feat(recipe-studio): UX + layout polish & WIP data-reciper client & backend finalization p1
2026-02-26 05:10:37 -08:00
Shine1i
00a869f837 refactor(data-recipe): centralize json+stage constants, tighten parser/errors, sync seed ui 2026-02-26 14:06:53 +01:00
Shine1i
e4b64f3cd5 refactor(data-recipe): split recipe backend routes for readability (seed/validate/jobs) 2026-02-26 14:05:32 +01:00
Shine1i
aaf62095fe feat(recipe-studio): add jinja ref validation UI for llm/expression fields 2026-02-26 13:35:07 +01:00
Shine1i
1e3d50f876 feat(recipe-studio): add block sheet search + clearer icons in sheet 2026-02-26 12:55:09 +01:00
Shine1i
81a7e38aa6 chore: simplify recipe drag payload parsing 2026-02-26 12:48:04 +01:00
Shine1i
9ee7633bc1 feat(recipe-studio): add sidebar drag-drop block creation + spawn sheet added blocks at viewport center 2026-02-26 12:45:38 +01:00
Shine1i
04d6f5e67b feat(recipe-studio): sanitize shared seed payload + add inline seed UX with HF search 2026-02-26 12:23:05 +01:00
Shine1i
75857cdcea feat(recipe-studio): polish import/llm editors, refs preview, copy toast, and note layout behavior 2026-02-26 11:44:57 +01:00
Shine1i
a53d8cb272 fix(recipe-studio): preserve note positions during auto-layout and fit workflow only 2026-02-26 11:30:24 +01:00
Shine1i
d1047646a9 feat(recipe-studio): optimize model infra auto-layout handles and centering 2026-02-26 11:07:07 +01:00
Shine1i
dd3e1e7293 refactor(recipe-studio): simplify aux node graph logic and remove dead handle/sync code 2026-02-26 10:43:26 +01:00
Shine1i
48f7d40e87 feat(ui): normalize recipe dialogs + chip/category UX polish 2026-02-26 10:13:01 +01:00
Shine1i
1de4b73244 fix: recipe studio dialog combobox click-select + simplify model provider form 2026-02-26 10:03:47 +01:00
Roland Tannous
cbbccefcdf Merge pull request #271 from unslothai/fix/setup-python-version-bounds
fix(setup): enforce Python >= 3.11 and < 3.14 version bounds
2026-02-26 12:13:32 +04:00
Michael Han
e8ae589e84 Qwen3.5 update.md 2026-02-25 23:56:48 -08:00
Roland Tannous
e81516320d fix(setup): restrict Python to >=3.11 and <3.14
Adds lower bound (>= 3.11) and tightens upper bound (< 3.14) for
Python version discovery in setup.sh. Extracts bounds into
MIN_PY_MINOR / MAX_PY_MINOR variables for easy future updates.
2026-02-26 11:54:16 +04:00
Roland Tannous
28e0218263 Merge pull request #270 from unslothai/fix/gguf-export-relocation
Fix GGUF exports saving to wrong directory and missing from chat model selector
2026-02-26 11:48:15 +04:00
Roland Tannous
ed18f9b9dd Flatten GGUF subdirs in export and fix metadata lookup in scanner 2026-02-26 11:35:04 +04:00
Roland Tannous
90f012a444 Write export metadata for GGUF exports to fix Unknown base model 2026-02-26 11:24:32 +04:00
Roland Tannous
2ce63f09c4 Add gguf to toLoraSummary inline type 2026-02-26 10:59:39 +04:00
Roland Tannous
1b822a943c Revert "Add gguf to frontend export_type unions"
This reverts commit 782af39949.
2026-02-26 10:56:26 +04:00
Roland Tannous
782af39949 Add gguf to frontend export_type unions 2026-02-26 10:54:49 +04:00
Roland Tannous
609ae4809a Merge pull request #229 from unslothai/feat/dataset-list-sorting
Feat: Sort and filter dataset search results by model type relevance
2026-02-26 10:40:02 +04:00
Roland Tannous
ea9b22000e Merge pull request #245 from unslothai/fix/datetime-utc-python39-compatibility
fix: replace datetime.UTC with timezone.utc for Python 3.9+ compatibility
2026-02-26 10:37:01 +04:00
imagineer99
852dff564e feat: added datasets of size 5M and 10M to pretraining size category 2026-02-26 06:32:45 +00:00
imagineer99
6e535ed0eb fix: filter OCR datasets from non-vision hub results 2026-02-26 06:27:52 +00:00
Roland Tannous
e0127c0d4c Merge branch 'nightly' 2026-02-26 10:25:41 +04:00
Roland Tannous
808f4655c9 Merge pull request #243 from unslothai/fix/setup-unbound-variable
resolved unbound variable error
2026-02-26 10:18:21 +04:00
samit
04aee4a4c6 updated to make the delete preset work 2026-02-25 21:36:50 -08:00
imagineer99
1c55e2fbaa fix: remove dataset metadata badges from HF dataset dropdowns 2026-02-26 03:57:55 +00:00
samit
7bc752d5f9 passed checkpoint as a parameter to presets 2026-02-25 18:08:02 -08:00
Wasim Yousef Said
2f84bbf0e0 Merge pull request #264 from unslothai/fix/attachment-tsx-type-error
fix(attachment): replace never exhaustive check to fix Colab TS2322 b…
2026-02-25 17:19:44 -08:00
Leo Borcherding
62c6fd9f46 fix(attachment): replace never exhaustive check to fix Colab TS2322 build error
`attachment.type` resolves to `string & {}` via @assistant-ui/store@0.1.6's
generic type chain when installed through npm (package-lock.json), breaking
the `const _exhaustiveCheck: never = type` exhaustive check pattern.

Replace with a direct throw that compiles cleanly across library versions
while preserving identical runtime behaviour.

Fixes #263

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 17:56:02 -06:00
Daniel Han
3fc6cfd32d Fix transformers v5 RoPE inv_freq corruption and generate() BatchEncoding compat (#4112)
* Fix transformers v5 RoPE inv_freq corruption during model loading

Transformers v5 initializes models on the meta device, then
_move_missing_keys_from_meta_to_device() replaces all non-persistent
buffers with torch.empty_like() (uninitialized memory). Vanilla
transformers restores inv_freq via _init_weights() checking for
original_inv_freq, but Unsloth's LlamaRotaryEmbedding subclasses
lack this attribute, so inv_freq stays corrupted with garbage values.

This caused 5-11x higher training loss on transformers v5 for all
models using Unsloth's rope (Llama 3.x, Qwen3, Mistral, TinyLlama,
Granite). Models using native transformers rope (Gemma, Phi-4,
Falcon-H1) were unaffected.

The fix recomputes inv_freq from the stored base/dim after model
loading, applies model-specific scaling via _apply_inv_freq_scaling(),
and rebuilds cos/sin caches. Also handles LongRopeRotaryEmbedding
(Phi-3.5 style short/long inv_freq). Guarded by transformers >= 5.0.0
so it is a no-op on v4.
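
For reference, recomputing `inv_freq` from a stored base and dim can be sketched as below. This assumes the standard RoPE definition (one inverse frequency per pair of dimensions); the function name `compute_inv_freq` is illustrative, and the model-specific scaling and cos/sin cache rebuild mentioned above are deliberately left out.

```python
def compute_inv_freq(base: float, dim: int) -> list:
    """Standard RoPE inverse frequencies: 1 / base^(2i / dim) for i in [0, dim/2)."""
    # range(0, dim, 2) yields the even indices 2i, so i / dim below is 2i / dim.
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


inv_freq = compute_inv_freq(base=10000.0, dim=8)
print(len(inv_freq))   # one entry per pair of dimensions
print(inv_freq)
```

Because the values depend only on `base` and `dim`, they can be regenerated deterministically after loading instead of trusting whatever the buffer happens to contain.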

Tested on: Llama 3.1 8B, Llama 3.2 3B, Qwen3 14B, Qwen3 4B, Phi-4,
TinyLlama, Mistral 7B, Gemma2 2B, Falcon-H1 -- all v5 losses now
match v4 baselines to < 0.004 absolute difference.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Unpack BatchEncoding in generate() for v4/v5 backwards compatibility

Old notebooks pass the full tokenizer output as input_ids:

    inputs = tokenizer(..., return_tensors="pt").to("cuda")
    model.generate(input_ids=inputs, ...)

This worked on transformers v4 because generate() internally
extracted the tensor. Transformers v5 calls .shape on input_ids
directly, which crashes since BatchEncoding has no .shape attribute.

Fix: in unsloth_fast_generate(), detect when input_ids is a dict-like
object (BatchEncoding) and unpack its contents into separate kwargs
before forwarding to the underlying generate(). This makes both old
and new notebook patterns work on both v4 and v5.
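
A rough sketch of the unpacking step, under the assumption that the tokenizer output behaves like a mapping. The helper name `normalize_generate_kwargs` is hypothetical and is not the code in `unsloth_fast_generate()`; letting explicitly passed kwargs take priority is also a choice made here for illustration.

```python
from collections.abc import Mapping


def normalize_generate_kwargs(kwargs: dict) -> dict:
    """Hypothetical helper: unpack a dict-like `input_ids` into separate kwargs."""
    input_ids = kwargs.get("input_ids")
    if not isinstance(input_ids, Mapping):
        return kwargs  # already a tensor (or absent): nothing to do
    unpacked = dict(kwargs)
    packed = unpacked.pop("input_ids")
    for key, value in packed.items():
        # Keys the caller passed explicitly are kept as-is.
        unpacked.setdefault(key, value)
    return unpacked


# Old notebook pattern: the whole tokenizer output is passed as input_ids.
old_style = {"input_ids": {"input_ids": [1, 2], "attention_mask": [1, 1]}, "max_new_tokens": 8}
print(normalize_generate_kwargs(old_style))
```

Plain dicts and lists stand in for the tokenizer output and tensors so the sketch runs without transformers installed.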

* Remove redundant seen_ids dedup in _fix_rope_inv_freq

named_modules() already deduplicates with remove_duplicate=True (default).
Also clarify that native v5 rotary classes (Gemma3 etc.) have original_inv_freq
which transformers v5's _init_weights() uses to restore inv_freq, so they do
not need this fix.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-25 08:18:45 -08:00
DoubleMathew
6d0f864369 Fix/pr 3699 leftpad prefill main (#4100)
* Fix left-padding masks and positions in batched decode/prefill

* Fix batched generation with left padding

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix attention mask handling, padding_idx zeroing, and Mistral batched generation

1. attention_dispatch.py: Fall back from flash/xformers to SDPA when an
   attention_mask is present, since flash attention only supports causal
   masking via flag and cannot consume arbitrary padding masks.

2. gemma2.py: Apply attention_mask during decode inference for bsz > 1.
   Guard against boolean SWA/GA flags with isinstance check. Slice mask
   to match K/V length when sliding window is active. Remove dead
   commented-out SDPA branch (SDPA does not support softcapping).

3. granite.py: Apply attention_mask during decode inference for bsz > 1.
   Remove dead commented-out SDPA branch and misleading comment.

4. mistral.py: Fix 2D-to-4D padding mask conversion -- convert 0/1 mask
   to additive format (0 for keep, -inf for mask) before combining with
   the causal mask. Force SDPA backend when attention_mask is present.

5. llama.py: Skip zeroing embed_tokens.weight[padding_idx] when the
   embedding is weight-tied to lm_head, since zeroing the shared weight
   forces logit(pad) = 0 which is higher than real token logits in models
   like Gemma, causing the decoder to emit pad tokens as gibberish. Also
   add eos != pad guard, clean up unused _seq_length variable, and fix
   get_max_cache_shape handling.

6. vision.py: Same padding_idx fix as llama.py for the vision model
   loading path.

Tested on gemma-2b-it, gemma-2-2b-it, Llama-3.2-1B, Mistral-7B-v0.3,
Qwen2.5-0.5B, Qwen3-0.6B with flash-attn 2.8.3 active. All outputs
coherent, zero crashes, zero resize warnings.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Inference path optimizations: eliminate per-layer GPU-CPU sync, cache inspect.signature, add Granite SDPA split

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* More inference path optimizations across model files

- gemma: hoist rotary_seq_len computation to model level (eliminates N
  per-layer GPU-CPU syncs from position_ids.max().item()), pre-convert
  attention mask to bool once for all layers, use scalar float multiply
  instead of torch.tensor allocation for embedding scaling
- gemma2: use in-place tanh_() for softcap attention, use scalar float
  multiply for embedding scaling
- granite: pre-convert attention mask to bool once for all layers
- cohere: use in-place neg_() for rotary embedding (consistent with
  all other model files)
- falcon_h1: use in-place mul_() for key_multiplier scaling
- llama: use in-place tanh_() for logit softcapping

* Revert scalar multiply for Gemma/Gemma2 embedding scaling

The original torch.tensor(..., dtype=hidden_states.dtype) is intentional:
sqrt(3072) rounds to 55.5 in bfloat16 vs 55.4256 in float32. A plain
scalar multiply may compute at higher precision internally, producing
different results. Restore the explicit dtype-cast tensor to match the
training path in LlamaModel_fast_forward.
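
The rounding difference quoted above can be reproduced without torch. The sketch below emulates bfloat16 by rounding a float32 bit pattern to its top 16 bits (round to nearest, ties to even); `to_bfloat16` is a helper written for this illustration, not an Unsloth or PyTorch function.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a Python float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the upper 16 bits of the float32 pattern; add a bias so that
    # truncating the lower 16 bits performs round-to-nearest-even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


scale = math.sqrt(3072)
print(round(scale, 4))      # higher-precision value
print(to_bfloat16(scale))   # value after rounding to bfloat16
```

With only 7 explicit mantissa bits, neighbouring bfloat16 values near 55 are 0.25 apart, which is why the scale lands on 55.5 rather than something closer to 55.4256.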

* Fix hardcoded cuda:0 device strings and add Cohere .eq(0) bool mask

Replace 15 hardcoded "cuda:0" with f"{DEVICE_TYPE_TORCH}:0" across
gemma.py, gemma2.py, cohere.py, and falcon_h1.py to support multi-GPU
and non-CUDA devices (XPU, etc.). Add .eq(0) bool mask pre-conversion
in CohereModel_fast_forward_inference for batched inference consistency
with llama.py, granite.py, and gemma.py.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Disable flex_attention for Mllama (Llama 3.2 Vision)

Mllama's _update_causal_mask uses the deprecated make_flex_block_causal_mask
which creates a BlockMask with Q_LEN=KV_LEN=total_seq_len. During decode
with KV cache, q_len=1 but the block_mask still has Q_LEN=total_seq_len,
causing a ValueError. This is an upstream transformers issue -- newer models
use flex_attention_mask from masking_utils which handles decode correctly
via cache_position, but mllama has not been updated yet.

Add mllama to the exclusion list in prefer_flex_attn_if_supported alongside
gpt_oss so it falls back to sdpa, which works correctly for both training
and inference.

* Fix off-by-one in sliding window K/V slicing for gemma2, qwen3, falcon_h1, cohere

The old formula `slicing_tokens = 1 - sliding_window` uses negative indexing
that keeps `sliding_window - 1` tokens instead of `sliding_window`. For example
with sliding_window=32 and kv_seq_len=100, `1-32 = -31` keeps indices 69..99
(31 tokens) instead of the correct 68..99 (32 tokens).

Replace with `start = kv_seq_len - sliding_window` to match the fix already
applied in llama.py and the canonical definition in transformers masking_utils
(sliding_window_overlay: kv_idx > q_idx - W, which keeps exactly W tokens).

Also add attention_mask slicing after K/V trim in qwen3, falcon_h1, and cohere
to prevent mask/K dimension mismatch during batched SDPA inference, matching
the pattern already used in llama.py.

Currently only gemma2 (sliding_window=4096) is actively affected. The other
three models have sliding_window=None in their configs so the code path is
not triggered, but this keeps it correct for any future models that set it.
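
The off-by-one is easy to reproduce with a plain list standing in for the K/V sequence dimension. The numbers are the ones from the example above; the clamp to zero is an extra guard added in this sketch for the case where the cache is shorter than the window, not something taken from the commit.

```python
sliding_window = 32
kv_seq_len = 100
positions = list(range(kv_seq_len))  # stand-in for the K/V sequence axis

# Old formula: negative index 1 - W keeps only W - 1 entries.
old_kept = positions[1 - sliding_window:]

# New formula: explicit start index keeps exactly W entries.
start = max(kv_seq_len - sliding_window, 0)
new_kept = positions[start:]

print(len(old_kept), old_kept[0], old_kept[-1])
print(len(new_kept), new_kept[0], new_kept[-1])
```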

* Fix Gemma2 softcapping order: apply mask after softcap, not before

The attention mask must be applied AFTER logit softcapping, not before.
Both the Google DeepMind reference implementation (google-deepmind/gemma,
gm/nn/_modules.py lines 254-277) and transformers' eager_attention_forward
(gemma2/modeling_gemma2.py lines 187-193) use this order:

  1. logits = Q @ K^T * scale
  2. logits = tanh(logits / softcap) * softcap   # softcap first
  3. logits = logits + mask                       # mask after
  4. probs  = softmax(logits)

The PR had the mask addition before softcapping, which causes tanh to
clamp the -inf mask values to -softcap instead of preserving them as -inf
for softmax. While the practical impact is small (masked positions get
~1e-23 probability instead of exact zero), this should match upstream.
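
A small pure-Python sketch of why the order matters. The softcap value of 50.0 and the three logits are illustrative inputs chosen for this example; the helper names are not from the Unsloth or transformers code.

```python
import math


def softcap(x: float, cap: float) -> float:
    return math.tanh(x / cap) * cap


def softmax(xs: list) -> list:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


cap = 50.0
logits = [2.0, 1.0, 3.0]
mask = [0.0, 0.0, float("-inf")]  # additive mask: last position is padding

# Mask before softcap: tanh(-inf) is -1, so -inf is clamped to -cap.
mask_first = softmax([softcap(l + m, cap) for l, m in zip(logits, mask)])

# Softcap before mask: -inf survives and softmax gives an exact zero.
softcap_first = softmax([softcap(l, cap) + m for l, m in zip(logits, mask)])

print(mask_first[2])
print(softcap_first[2])
```

In the first ordering the masked position keeps a tiny but nonzero probability; in the second it is exactly zero.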

* Clarify GQA condition precedence and remove stale comments

Add explicit parentheses to grouped query attention conditions in
llama.py, qwen3.py, granite.py to make operator precedence clear.
The expression `bsz == 1 or not X and Y` relies on Python binding
`not` > `and` > `or` which is correct but easy to misread.
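
The precedence claim can be checked exhaustively over all truth values. The names `implicit`, `explicit`, and `misread` are made up for this check; `x` and `y` stand for the `X` and `Y` placeholders in the expression above.

```python
from itertools import product


def implicit(bsz_is_one: bool, x: bool, y: bool) -> bool:
    return bsz_is_one or not x and y


def explicit(bsz_is_one: bool, x: bool, y: bool) -> bool:
    return bsz_is_one or ((not x) and y)


def misread(bsz_is_one: bool, x: bool, y: bool) -> bool:
    # A plausible wrong reading of the unparenthesized form.
    return (bsz_is_one or not x) and y


cases = list(product([False, True], repeat=3))
print(all(implicit(*c) == explicit(*c) for c in cases))
print([c for c in cases if implicit(*c) != misread(*c)])
```

The parenthesized form is equivalent in every case, while the misreading disagrees whenever the batch size is 1 and `Y` is false.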

Remove dead commented-out code (`# else: # Knn, Vnn = Knn, Vnn`)
and stale mask comments (`# if attention_mask ...`) from the bsz==1
fast path in llama, qwen3, cohere, falcon_h1, gemma2 inference
functions. These were leftover from the pre-batched-inference
structure and no longer apply.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-02-25 07:21:04 -08:00
Daniel Han
9b51b14b2b Support Python 3.14 in package metadata (#4113) 2026-02-25 07:17:16 -08:00
Roland Tannous
c21cf2ffcf Add GGUF tag for exported models in chat page selector 2026-02-25 19:01:47 +04:00
Roland Tannous
bfb1403032 Relocate GGUF exports into exports/ directory 2026-02-25 18:54:39 +04:00
Datta Nimmaturi
3f9e03ff1b Allow fp8 for non fast inference (#3904)
* Allow fp8 for non fast inference

* Extensive fp8 allow and quantizer patch

* Clean up commented-out code, duplicate import, and revert unnecessary Version() changes

- Delete commented-out FP8 fast_inference guard in FastModel (loader.py)
  instead of leaving it commented -- matches FastLanguageModel which was
  properly deleted
- Delete commented-out fast_inference guard in loader_utils.py
- Remove duplicate `from transformers import GenerationConfig, CompileConfig`
  in vision.py (line 112 already imports both plus AutoConfig)
- Revert Version(trl.__version__) back to Version(trl) in trainer.py --
  trainer.py imports Version from unsloth_zoo.utils which already handles
  module objects

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-02-25 06:52:18 -08:00
Daniel Han
00fe9a40c0 Add resilience to TRL internal API reclassification (#4111)
* Add resilience to TRL internal API reclassification

TRL is moving toward v1.0 and will reclassify several
currently-importable symbols as internal with no stability
guarantees. This adds try/except cascading imports with local
fallbacks so Unsloth keeps working regardless of whether TRL
removes, moves, or restructures these symbols.

Changes:
- rl.py: Add try/except cascade for unwrap_model_for_generation
  with local contextmanager fallback. Wire sanitize_logprob from
  RL_REPLACEMENTS into the compiled trainer template (same pipeline
  as selective_log_softmax and other global functions). Add import
  math and import logging to the template header.
- rl_replacements.py: Remove inline import of sanitize_logprob
  from trl.scripts.vllm_serve in the regex replacement. The
  function is now a module-level global in the compiled file.
- tokenizer_utils.py: Wrap dynamic exec import with per-item
  fallback so a single removed symbol does not break the entire
  bulk import.

Depends on unslothai/unsloth-zoo#516.

Tested across all TRL versions from 0.22.2 through 0.29.0.dev0
(git main). Training losses and grad norms are bit-identical
to unpatched runs.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-25 06:34:21 -08:00
Irfan Ali
30fac638ad fix: correct gpt-oss Ollama generation prompt and add quantization wa… (#4087)
* Warn when save_pretrained_gguf overrides quantization to MXFP4 for GPT-OSS

GPT-OSS only supports MXFP4 format. If the user passes a different
quantization_method, log a warning via logger.warning_once before
overriding. Pass quantization_method=None to suppress the warning.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-25 04:39:16 -08:00
Roland Tannous
a8b5b7ed58 Fix GGUF models missing from chat page model search
GGUF was in the global EXCLUDED_TAGS set which filtered it from all
consumers of useHfModelSearch, including the chat page. Move GGUF
exclusion to an opt-in excludeGguf option so only training and
onboarding pages filter out GGUF models.
2026-02-25 16:21:08 +04:00
Roland Tannous
f92e4a3e1b Merge pull request #261 from unslothai/feat/gguf-llama-cpp-inference
Add GGUF model inference via llama-server with quantization variant selection
2026-02-25 16:07:39 +04:00
Roland Tannous
01082b84e5 Merge branch 'nightly' into feat/gguf-llama-cpp-inference 2026-02-25 16:06:03 +04:00
Roland Tannous
a1e064b1c4 Remove UNSLOTH_ENABLE_LOGGING from export pipeline 2026-02-25 16:00:24 +04:00
Roland Tannous
a7fe8a388c Filter GGUF models from training page model selectors
GGUF models can't be fine-tuned, so hide them from the training/studio
page while keeping them available for inference on the chat page.

- Add "gguf" to EXCLUDED_TAGS in HF model search hook
- Filter local models with .gguf extension or -GGUF in ID
2026-02-25 15:47:45 +04:00
samit
bbe208ea38 reduced broad padding 2026-02-25 03:44:05 -08:00
samit
0eab635666 added space to show model/dataset name 2026-02-25 03:40:44 -08:00
Roland Tannous
299ce77467 added vision.py patch for vision processor from PR#260 2026-02-25 11:39:17 +00:00
Roland Tannous
cfaa2f2074 Merge pull request #249 from unslothai/fix/section-card-corner-bleed
Fix: Clip section card overflow to prevent background bleed
2026-02-25 15:27:46 +04:00
Roland Tannous
cb3e4f2c26 Merge pull request #259 from unslothai/feat/dataset-subsets-split
Feat/dataset subsets split
2026-02-25 15:27:12 +04:00
Roland Tannous
96217b5056 Merge pull request #246 from unslothai/fix/dataset-custom-mapping-heuristic
adding custom mapping according to the chat templates
2026-02-25 15:26:36 +04:00
Roland Tannous
d2fe02ff04 Merge pull request #260 from unslothai/fix/fix-vision-processor-unsloth-bug
fix: correct vision.py patch path to unsloth/models/vision.py + add V…
2026-02-25 15:21:01 +04:00
Daniel Han
78963ca19c Fix Nemotron-H and Nemotron-VL model support (#4105)
* Fix Nemotron-H and Nemotron-VL model support

- Add Mamba kernel precision settings for Nemotron-H hybrid models
- Fix VL model auto_model selection for models that only register
  AutoModelForCausalLM in their auto_map
- Skip quantization of out_proj for Nemotron-H Mamba layers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Simplify VLM auto_model selection logic

Reduce three branches to two since the first and third both assign
AutoModelForVision2Seq. The simplified condition checks whether the
auto_map exclusively registers AutoModelForCausalLM without the VLM
class, and defaults to AutoModelForVision2Seq otherwise.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-25 03:14:12 -08:00
Shine1i
db11f1a601 style(studio): align card heights and restore dataset advanced section placement 2026-02-25 12:11:06 +01:00
Roland Tannous
40719c4a6f fix: correct vision.py patch path to unsloth/models/vision.py + add VLM processor diagnostic 2026-02-25 11:06:22 +00:00
Roland Tannous
6f0b7bc38a fix: use raw github URL for vision.py patch + add VLM processor diagnostic logging 2026-02-25 10:29:05 +00:00
Shine1i
cb57d48e7e chore: add tour label next to navbar tour icon 2026-02-25 11:23:43 +01:00
Manan17
6e8e70c987 fixing the chatml None error 2026-02-25 10:23:13 +00:00
Shine1i
122311a6b1 fix recipe output path, remove tracked root datasets 2026-02-25 11:19:10 +01:00
Wasim Yousef Said
c67da8f349 Merge pull request #257 from unslothai/feature/chat-model-switch-warning
feat: chat model switching toast and add image detection logic
2026-02-25 01:58:56 -08:00
Shine1i
44d6abb36c feat: chat model switching toast and add image detection logic 2026-02-25 10:55:53 +01:00
Wasim Yousef Said
a793660960 Merge pull request #254 from unslothai/feature/theme-fix
feat: fix markdown rendering, UI adjustments
2026-02-25 00:48:09 -08:00
Shine1i
8732f3befb feat: fix markdown rendering, UI adjustments 2026-02-25 09:46:13 +01:00
samit
a199e3f682 rebase with nightly 2026-02-25 00:36:25 -08:00
Wasim Yousef Said
468ec99489 Merge pull request #253 from unslothai/feature/theme-fix
feat: fix dark mode support and refine UI assets
2026-02-25 00:27:43 -08:00
Shine1i
adcbf78553 feat: fix dark mode support and refine UI assets 2026-02-25 09:23:05 +01:00
Manan17
47fc79df6d My changes for dataset 2026-02-25 08:15:44 +00:00
Manan17
60912e45e6 adding custom mapping according to the chat templates 2026-02-25 07:56:30 +00:00
imagineer99
0b47ab1eab fix: clip section card overflow to prevent background bleed at rounded corners 2026-02-25 03:54:40 +00:00
imagineer99
dbf5acf486 feat: filter pretraining datasets from search results 2026-02-25 03:00:40 +00:00
Roland Tannous
875c6c8094 Merge pull request #247 from unslothai/fix/path-traversal-vuln
Fix Content-Length crash and path traversal vulnerability in frontend serving
2026-02-25 05:16:00 +04:00
Roland Tannous
a6f1153f9a fix: replace FileResponse with Response for index.html to prevent Content-Length mismatch and add path traversal guard 2026-02-25 01:05:04 +00:00
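A path traversal guard of the kind named in this commit is commonly written along these lines. This is a sketch under assumptions: `safe_resolve` and its parameters are hypothetical names, and the actual guard in the frontend-serving code may differ.

```python
from pathlib import Path


def safe_resolve(static_root, requested_path):
    """Resolve `requested_path` under `static_root`, or return None.

    The candidate is fully resolved first, so `..` segments and absolute
    paths cannot escape the root directory.
    """
    root = Path(static_root).resolve()
    candidate = (root / requested_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        # The resolved path lies outside the root: reject it.
        return None
    return candidate
```

A caller would serve the file only when the result is not `None`, and otherwise fall back to `index.html` or a 404.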
Roland Tannous
da8e58fb8f Merge pull request #30 from unslothai/feature/canvas-lab
Draft: Data Recipes graph editor WIP
2026-02-25 04:01:53 +04:00
Shine1i
773f7945a6 chore: squircle! tooltip 2026-02-25 00:55:35 +01:00
Shine1i
faa36e9abb feat: update icons and enhance dark mode styling for navbar and recipes
- Replaced `CookBookIcon` with `ChefHatIcon` in navbar for improved clarity.
- Added dark mode-specific gradient styles to recipe cards for better visual differentiation.
2026-02-25 00:39:49 +01:00
Shine1i
164560c6c9 feat: improve dark mode styling and simplify navbar 2026-02-25 00:31:20 +01:00
Roland Tannous
7adb69581e Fix GGUF export cwd confusion: remove os.chdir, use absolute paths
Remove os.chdir(save_directory) from export.py which was causing all of
unsloth-zoo's relative-path internals (check_llama_cpp, use_local_gguf,
_download_convert_hf_to_gguf) to resolve against the export directory
instead of the repo root. This caused llama.cpp to be cloned inside each
export dir and destroyed the repo root's llama-server build on cleanup.

Now passes absolute paths to save_pretrained_gguf so unsloth resolves
llama.cpp from the repo root where setup.sh already built it.

Also builds llama-quantize in setup.sh (needed by unsloth-zoo's export
pipeline) and symlinks it to llama.cpp root for check_llama_cpp().
2026-02-25 03:30:54 +04:00
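The idea of passing absolute paths instead of changing the working directory can be illustrated with a small helper. This is only a sketch: `absolute_export_path` and `repo_root` are hypothetical names, not identifiers from export.py.

```python
import os
from pathlib import Path


def absolute_export_path(save_directory, repo_root):
    """Return an absolute export path without calling os.chdir().

    Relative paths are anchored at `repo_root`, so code that later resolves
    its own relative paths still sees the original working directory.
    """
    path = Path(save_directory).expanduser()
    if not path.is_absolute():
        path = Path(repo_root) / path
    return Path(os.path.abspath(path))
```

Because the process working directory never changes, helpers that look for `llama.cpp` relative to the current directory keep resolving against the repo root.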
Shine1i
2548720c01 Merge branch 'feature/canvas-lab' of https://github.com/unslothai/new-ui-prototype into feature/canvas-lab
# Conflicts:
#	studio/frontend/bun.lock
2026-02-25 00:19:19 +01:00
Shine1i
929c7f86e4 feat: add animated theme toggler and refine dark mode styling 2026-02-25 00:18:20 +01:00
Manan17
fdbc60de77 adding custom mapping according to the chat templates 2026-02-24 21:15:56 +00:00
Leo Borcherding
a3daae1c40 fix: replace datetime.UTC with timezone.utc for Python 3.9+ compatibility
- Replace datetime.UTC with datetime.timezone.utc in authentication.py and storage.py
- Fixes ImportError on Python versions < 3.11
- timezone.utc works on Python 3.9+

Resolves #237
2026-02-24 14:37:00 -06:00
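For reference, the portable spelling looks like this. The `datetime.UTC` alias only exists from Python 3.11 onward, while `timezone.utc` is available on all the versions mentioned above. The helper name `utc_now` is illustrative and not taken from authentication.py or storage.py.

```python
from datetime import datetime, timezone


def utc_now():
    """Return the current time as a timezone-aware UTC datetime."""
    # timezone.utc works on older interpreters where datetime.UTC is missing.
    return datetime.now(timezone.utc)
```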
Roland Tannous
0e7c8a2e5e Switch GGUF backend from /v1/completions to /v1/chat/completions
Fixes two bugs:
1. Chat template tags (<|im_start|>, <|im_end|>) leaking into output
   because /v1/completions treated them as literal text
2. Image hallucination because image_b64 was never passed to llama-server

Now llama-server handles chat templates natively and receives images
as OpenAI-format multimodal content parts for vision models.
2026-02-24 19:21:01 +04:00
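A request body in the OpenAI chat format with an optional image part might be assembled as below. This is a hedged sketch: `build_chat_request` is a hypothetical helper, and the `image/png` MIME type is an assumption, since the commit only says the image arrives as `image_b64`.

```python
def build_chat_request(prompt, image_b64=None, mime_type="image/png"):
    """Build a /v1/chat/completions style payload for a single user turn."""
    if image_b64 is None:
        # Text-only turn: content is a plain string.
        content = prompt
    else:
        # Multimodal turn: content is a list of typed parts, with the
        # image passed as a base64 data URL.
        content = [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{image_b64}"},
            },
        ]
    return {"messages": [{"role": "user", "content": content}]}
```

Since the server applies the chat template itself, the prompt is sent as plain text with no `<|im_start|>` or `<|im_end|>` markers.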
Roland Tannous
ef1cd3ac98 Use llama-server -hf mode, add GGUF variant selector, fix vision detection
Replace Python-side GGUF download with llama-server's native -hf flag for
HuggingFace repos. Add frontend variant picker so users can choose
quantization (Q4_K_M, Q8_0, BF16, etc.) with file sizes. Fix vision
detection via mmproj files instead of hardcoding is_vision=False.
2026-02-24 19:03:06 +04:00
Roland Tannous
08aeeaee4b Fix llama-server: build in-tree, fix path resolution, add LD_LIBRARY_PATH 2026-02-24 18:19:29 +04:00
Roland Tannous
4e88092452 Preflight llama-server check before downloading remote GGUF files 2026-02-24 18:02:43 +04:00
Daniel Han
0f5a1fa7c3 Fix FP8 model loading: redirect to BF16 sibling for BNB/16-bit (#4095)
* Fix FP8 model loading for BNB/16-bit: redirect to BF16 sibling

Models like Ministral-3-3B-Instruct-2512 ship with FP8 weights and an FP8
quantization_config in their config.json. Loading these with BNB 4-bit/8-bit
fails because BNB cannot quantize FP8 tensors. Loading with 16-bit also fails
because the FP8 quantization config has activation_scheme=static which is
unsupported by transformers' FineGrainedFP8Config.

When an FP8 model is detected and the user is not explicitly requesting FP8
loading, check if a BF16 sibling repo exists (model_name + "-BF16") and
redirect to it. This happens early in the loading flow before any quantization
config processing.

Also pass the modified model_config to auto_model.from_pretrained to avoid
transformers re-reading the original config from the model repo.

Tested with Ministral-3-3B in 4-bit and 16-bit modes. Both now load and
train correctly.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Simplify FP8 condition and narrow exception handling

Simplify the load_in_fp8 check (works for bool and string values).
Narrow inner except to KeyError and add comment for outer except.

* Warn user when FP8 model has no BF16 sibling for redirect

Previously the except block silently fell through with `pass`,
so users would get a confusing BNB dtype error later. Now prints
a clear message explaining the FP8 situation and suggesting
load_in_fp8=True or uploading a BF16 version.

* Fix FP8 redirect state corruption and add fbgemm_fp8 support

- Fix state corruption: model_name was reassigned before
  AutoConfig.from_pretrained, so if config fetch failed,
  model_name pointed to BF16 repo while auto_config still
  had FP8. Now only updates state after both checks succeed.
- Save original model_name so warning message is correct
  even on failure.
- Handle fbgemm_fp8 quant method in addition to fp8.

* Extract FP8 redirect to shared _redirect_fp8_to_bf16() in _utils.py

Addresses reviewer feedback:
- Move FP8 redirect logic to a shared function callable from both
  vision.py (FastBaseModel) and llama.py (FastLlamaModel)
- Raise RuntimeError instead of warning when BF16 sibling not found
- Add FP8 redirect to llama.py for text-only model loading path

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add Ministral 3B/8B/14B mapper entries

Adds all 9 Ministral model variants to the mapper:
- Instruct (3B, 8B, 14B) with FP8 variant mappings
- Base (3B, 8B, 14B)
- Reasoning (3B, 8B, 14B)

This routes mistralai/Ministral-* to unsloth/Ministral-* repos
(BF16 weights), which also avoids the FP8 config issue for the
standard loading path through loader.py.

* Add FP8 mapper entries for Mistral-Small-3.2 and Magistral-Small-2509

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-253.us-east-2.compute.internal>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-24 05:56:07 -08:00
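The redirect logic described across these commits can be summarized in a small pure function. This is a simplified sketch, not the body of `_redirect_fp8_to_bf16()`: the signature and the injected `repo_exists` callable are assumptions made so the example runs without network access.

```python
FP8_QUANT_METHODS = ("fp8", "fbgemm_fp8")


def redirect_fp8_to_bf16(model_name, quant_method, load_in_fp8, repo_exists):
    """Return the repo to load, redirecting FP8 checkpoints to a BF16 sibling.

    `repo_exists` is a caller-supplied callable (hypothetical here) that
    reports whether a repo id is available.
    """
    if load_in_fp8 or quant_method not in FP8_QUANT_METHODS:
        # Nothing to redirect: the user wants FP8, or the model is not FP8.
        return model_name
    candidate = model_name + "-BF16"
    if not repo_exists(candidate):
        raise RuntimeError(
            f"{model_name} ships FP8 weights and no {candidate} repo was found. "
            "Pass load_in_fp8=True or provide a BF16 version."
        )
    # The original name is only replaced after every check has succeeded.
    return candidate
```

Returning a new name instead of mutating shared state mirrors the state-corruption fix: a failed lookup leaves the caller holding the original `model_name`.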
Roland Tannous
a900eb9ad7 Fix GGUF detection for HuggingFace repo IDs (not just local paths) 2026-02-24 17:49:09 +04:00
Roland Tannous
70c912d788 Fix CUDA detection for llama-server build on multi-GPU machines 2026-02-24 17:45:17 +04:00
Roland Tannous
a40ebb1aab Add GGUF model inference via llama-server backend 2026-02-24 17:40:05 +04:00
Roland Tannous
3f34996288 Merge branch 'nightly' into feature/canvas-lab 2026-02-24 10:08:13 +00:00
Roland Tannous
7bbb1f0a0b Merge pull request #218 from unslothai/fix/stop-startup-modal
Added cancel training button on the overlay
2026-02-24 14:03:25 +04:00
Roland Tannous
d38656139d Merge pull request #241 from unslothai/feature/adding-exported-models-for-chat
Adding exported model for chat
2026-02-24 13:55:45 +04:00
Roland Tannous
3ffbee3586 fix(chat): strip /suffix from lora display name and show type tag instead of base model 2026-02-24 09:54:34 +00:00
Roland Tannous
2149bc74ee Merge pull request #232 from unslothai/fix/disable-eval-by-default
fix/disable eval by default
2026-02-24 13:35:11 +04:00
Roland Tannous
f5057d86ed use explicit float bounds for eval_steps input (0.0–1.0) 2026-02-24 09:31:30 +00:00
Roland Tannous
2be2933846 skip eval split and HF split detection when eval_steps is disabled 2026-02-24 09:26:54 +00:00
imagineer99
b8617a5544 feat: move cancel training button inside terminal startup card 2026-02-24 09:05:02 +00:00
Roland Tannous
c0f6012d77 Merge remote-tracking branch 'origin/nightly' into feature/canvas-lab
# Conflicts:
#	studio/frontend/bun.lock
#	studio/frontend/package.json
2026-02-24 09:02:54 +00:00
Roland Tannous
8aca1cad29 Merge pull request #231 from unslothai/feat/custom-YAML-saving
Feat: Add Upload / Save / Reset training config from local YAML
2026-02-24 12:43:42 +04:00
imagineer99
002fe3d879 feat: improve training config UX and remove unused logging options 2026-02-24 08:22:02 +00:00
Shine1i
a4b7d360de chore: remove unused "Evaluate" navigation item and its icon from navbar 2026-02-24 09:09:56 +01:00
Shine1i
c8c844a4d6 feat: introduce single-env Python dependency management for streamlined compatibility
- Added constrained dependency files for single-env installations: `constraints.txt`, `data-designer.txt`, and `data-designer-deps.txt`.
- Implemented a `patch_metadata.py` script to resolve metadata conflicts between dependency versions.
- Updated `setup.sh` to integrate single-env setup, including dependency installation and metadata patching.
- Upgraded `fastmcp` and `websockets` versions in `extras.txt` for compatibility.
- Commented out unused "Start Tutorial" button in `data-recipes-page.tsx`.
2026-02-24 07:45:40 +01:00
Shine1i
dad78ae0ce feat: improve markdown note styles and layout logic 2026-02-24 04:04:02 +01:00
Shine1i
b80796a7cd feat: enhance markdown note blocks with style options and double-click config access
- Added support for configuring markdown note block styles, including color and opacity.
- Enabled double-click on markdown notes to open their configuration dialog.
- Adjusted layout styles in markdown previews for better interaction control.
- Updated relevant payloads, types, and UI logic to support added styling features.
- Integrated multiple example notes in learning recipes for better visualization.
2026-02-24 03:47:42 +01:00
Shine1i
3989cd6524 feat: introduce markdown note blocks for canvas documentation
- Added "Markdown Note" block to allow users to add UI-only markdown notes to the canvas for documentation purposes.
- Integrated note creation, editing, and rendering in the `recipe-studio` UI, including markdown previews.
- Updated payload generation logic to omit markdown notes from backend payloads.
- Enhanced block types, definitions, and dialog support to include the new "Markdown Note" feature.
2026-02-24 03:11:29 +01:00
Shine1i
ba000dc0f2 feat: add "Multi-Turn Chat" learning recipe with structured conversation outputs
- Introduced "Multi-Turn Chat" recipe to generate structured user-assistant conversations with domain/topic-based goals and constraints.
- Added `conversation.json` with model configuration, sampling strategies, and LLM prompts.
- Updated UI nodes, layout, and graph rendering logic to support new recipe.
- Enhanced `recipe-studio` fit view logic to improve editor layout responsiveness.
2026-02-24 02:38:36 +01:00
samit
32d5cd7198 resolved unbound variable error 2026-02-23 17:37:23 -08:00
Manan17
aeb198f52d Fixing base model export issue for vlms 2026-02-24 01:34:11 +00:00
Manan17
4be677e45d Adding exported model for chat 2026-02-24 01:17:09 +00:00
pre-commit-ci[bot]
36181bad96 [pre-commit.ci] pre-commit autoupdate (#4096)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.1 → v0.15.2](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.1...v0.15.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-23 17:04:34 -08:00
Shine1i
dec6b4b224 feat: add new learning recipes for diverse data transformations
- Added three new learning recipes: "Instruction from Answer," "PDF Grounded QA," and "Structured Outputs Jinja," with respective metadata and configuration.
- Integrated support for unstructured and structured input handling, including sampling strategies, prompt definitions, and model specifications.
- Enhanced JSON structure and UI nodes to facilitate better recipe visualization and execution.
2026-02-24 01:50:55 +01:00
Shine1i
5254f04065 feat: add layout direction support and enhance handle logic
- Introduced `layoutDirection` to control graph orientation ("LR" or "TB") and integrate into edges, nodes, and payloads.
- Enhanced handle management with new default, semantic, and data-specific mappings based on layout direction.
- Added handle normalization for consistent connections across layouts and semantic/data flows.
- Updated UI to reflect layout-aware positioning and semantic connections.
2026-02-24 00:51:49 +01:00
Shine1i
9f574941f9 feat: normalize handle IDs and enhance scorer options UI
- Added handle normalization functions to standardize handle IDs across connections.
- Expanded UI for scorer options with real-time updates, input fields for values and descriptions, and support for adding/removing options.
- Updated graph node handles and their layout logic for better connection visualization.
- Stripped sensitive fields (e.g., `api_key`) from payloads during export.
2026-02-24 00:29:14 +01:00
Shine1i
ad95b4a951 feat: add "Instruction from Answer" learning recipe and badge display enhancements
- Introduced a new "Instruction from Answer" learning recipe with related metadata, payload integration, and UI updates.
- Enhanced badge display logic to include up to 3 badges with overflow indication for additional learning badges.
2026-02-24 00:25:21 +01:00
Shine1i
ab31aa9ed4 feat: add per-column seed drop support with UI integration, validation, and payload enhancements 2026-02-23 23:33:59 +01:00
Shine1i
54382c659c feat: add support for learning recipes with template loading, dialog integration, and enhanced payload handling 2026-02-23 23:20:32 +01:00
Shine1i
71916f1dce feat: add ShineBorder UI component and learning recipe templates to enhance data recipes page 2026-02-23 22:38:06 +01:00
Shine1i
8739a01f56 Merge branch 'nightly' into feature/canvas-lab 2026-02-23 21:54:35 +01:00
Shine1i
b8231a6be6 refactor: keep seed block pos 2026-02-23 21:53:32 +01:00
Shine1i
1323e0af53 refactor: add batch processing support with configuration options and execution enhancements 2026-02-23 21:32:20 +01:00
Shine1i
59a15cb5bc refactor: enhance recipe validation flows with error collection, seed-specific updates, and improved UX in execution dialogs 2026-02-23 20:34:53 +01:00
Shine1i
d4655eb8bf refactor: streamline recipe execution flows with validation support and enhanced run dialog interactions 2026-02-23 20:28:41 +01:00
Shine1i
91cbb0e933 refactor: improve dialog rendering and logging setup for stability and configurability 2026-02-23 20:16:03 +01:00
Leo Borcherding
cdeed53a97 fix: disable eval by default, set eval_steps to 0.0
- Changed default eval_steps from 0.01 to 0.0 across backend and frontend
- Fixed UI to allow eval_steps=0 (removed min=0.001 constraint)
- Added conditional eval logic with helpful console messages
- Updated tooltip to explain how to disable evaluation
- Tested: confirmed eval disabled by default with eval_steps=0.0
2026-02-23 13:07:47 -06:00
imagineer99
6cedc339c6 feat: add Upload / Save / Reset training config from local YAML 2026-02-23 18:55:08 +00:00
Shine1i
71ab9ff4b4 refactor: enhance seed configuration handling with added fields, dynamic chunking logic, and streamlined interactions 2026-02-23 19:40:13 +01:00
Shine1i
424b00b701 refactor: improve seed source handling with additional type support, enhanced parsing logic, and text chunking optimization 2026-02-23 19:29:54 +01:00
Shine1i
3e17e2b0f6 refactor: enhance seed source handling with new source types and streamlined inspection flows 2026-02-23 18:46:02 +01:00
imagineer99
71d698d182 feat: sort and filter dataset search results by model type relevance 2026-02-23 16:22:45 +00:00
Roland Tannous
77b0978d5f Merge pull request #228 from unslothai/fix/cap-num-proc-multigpu-deadlock
Cap dataset.map num_proc on multi-GPU machines to prevent fork deadlocks
2026-02-23 19:04:22 +04:00
Roland Tannous
d74174f7f5 Cap dataset.map num_proc on multi-GPU machines to prevent fork deadlocks 2026-02-23 14:25:31 +00:00
Roland Tannous
6acc2dbf8f Merge branch 'nightly' into feature/transformers-v5-support 2026-02-23 13:40:16 +00:00
Roland Tannous
4cb0cfdaf5 Remove firebase-debug.log and setup_leo.sh from tracking and add to .gitignore 2026-02-23 17:38:22 +04:00
Roland Tannous
c017ce802f Remove firebase-debug.log and setup_leo.sh from tracking and add to .gitignore 2026-02-23 17:37:46 +04:00
Roland Tannous
313e77c5fd Merge branch 'nightly' into feature/transformers-v5-support 2026-02-23 13:32:51 +00:00
Roland Tannous
2e2aa54ad2 Merge pull request #225 from unslothai/fix/fix-response-on-completion-truncation
fix: error on >30% sample drop after `train_on_responses_only` instead of silent DataLoader crash
2026-02-23 16:28:32 +04:00
Roland Tannous
3015916d26 fix: error on >30% sample drop after train_on_responses_only instead of silent DataLoader crash 2026-02-23 12:21:06 +00:00
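A check of this kind could look roughly like the following. The 30% threshold comes from the commit title; the function name `check_sample_drop` and the error wording are hypothetical.

```python
def check_sample_drop(num_before, num_after, max_drop_fraction=0.30):
    """Raise if filtering removed more than `max_drop_fraction` of samples.

    Returns the dropped fraction when it is within the allowed limit.
    """
    if num_before <= 0:
        raise ValueError("Dataset is empty before filtering.")
    dropped = (num_before - num_after) / num_before
    if dropped > max_drop_fraction:
        raise ValueError(
            f"{dropped:.0%} of samples were dropped after response-only "
            f"masking (limit {max_drop_fraction:.0%}). Check that the chat "
            "template matches the instruction and response markers."
        )
    return dropped
```

Failing early with a clear message avoids a later DataLoader crash on a nearly empty dataset.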
Roland Tannous
b03938f6ad Merge nightly into main
Brings main up to date with nightly, including chat attachments, model-per-thread persistence, speech recognition, VRAM recommendations, MoE model configs, VLM fixes, and compile cache cleanup. Conflicts resolved by taking nightly's version for all diverged files (main-only changes were a feature add + immediate revert with net zero effect).
2026-02-23 14:52:57 +04:00
Daniel Han
2ed86865fb Suppress FBGEMM CUTLASS stdout spam on Blackwell GPUs (#4092)
* Suppress FBGEMM CUTLASS "Arch conditional MMA" stdout spam on Blackwell GPUs

On Blackwell GPUs (B200/B100, SM100), FBGEMM's f8f8bf16_blockwise kernel
is hardcoded to cutlass::arch::Sm90 with no SM100 code path. When
test_has_fbgemm() probes this kernel, it fires 2304 "ERROR : Arch
conditional MMA instruction used without targeting appropriate compute
capability" lines before aborting and returning zeros.

The existing HidePrintMessage filter on sys.stderr (line 109) does not
catch these because CUDA device-side printf writes to stdout fd 1 at the
C level, bypassing Python's sys.stdout/sys.stderr entirely.

Fix: add suppress_cuda_printf() context manager in import_fixes.py that
redirects fd 1 and fd 2 to /dev/null at the OS level, with
torch.cuda.synchronize() and libc fflush before restoring. Wrap the
test_has_fbgemm() call in fp8.py with this context manager.

Tested on B200 with fbgemm-gpu-genai 1.4.0+cu130 and 1.5.0+cu130:
- Before: 2304 warning lines on every import
- After: 0 warning lines
- UNSLOTH_HAS_FBGEMM correctly set to 0 (Triton fallback works)
- Works with both UNSLOTH_ENABLE_LOGGING=0 and =1

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard _libc init and fflush to prevent fd leak on failure

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-253.us-east-2.compute.internal>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-23 01:27:10 -08:00
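The OS-level redirection described above can be sketched as a context manager. This is an approximation under assumptions, not the `suppress_cuda_printf()` implementation: the name `suppress_fd_output` is made up, and the `torch.cuda.synchronize()` call from the commit is left out so the example runs without a GPU.

```python
import contextlib
import ctypes
import os
import sys


@contextlib.contextmanager
def suppress_fd_output(fds=(1, 2)):
    """Temporarily point the given file descriptors at os.devnull.

    Works below Python's sys.stdout/sys.stderr, so output written by C
    code directly to the descriptor is hidden as well.
    """
    try:
        libc = ctypes.CDLL(None)  # handle to the C runtime, if available
    except Exception:
        libc = None

    def flush_all():
        for stream in (sys.stdout, sys.stderr):
            if stream is not None:
                stream.flush()
        if libc is not None:
            try:
                libc.fflush(None)  # flush all C stdio buffers
            except Exception:
                pass

    flush_all()
    saved = [(fd, os.dup(fd)) for fd in fds]
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        for fd in fds:
            os.dup2(devnull, fd)
        yield
    finally:
        # Flush while still redirected so buffered text is discarded too,
        # then restore the original descriptors.
        flush_all()
        for fd, saved_copy in saved:
            os.dup2(saved_copy, fd)
            os.close(saved_copy)
        os.close(devnull)
```

In the real change, a device synchronization would sit before the final flush so that pending device-side printf output is emitted while the descriptors still point at `/dev/null`.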
Daniel Han
fec06247c9 Fix VLM processor load degradation and vLLM CUDA version detection (#4091)
* Fix VLM processor load degradation and vLLM CUDA version detection

vision.py - Fix VLM processor load for issue #4085:
- Before loading the processor, scan local config files and strip the
  _Unsloth_Patched_ prefix. AutoProcessor.from_pretrained silently
  degrades to a text-only tokenizer instead of raising an exception
  when it encounters the unrecognized class name, so the existing
  get_auto_processor fallback never triggers. Sanitizing the configs
  before loading fixes backwards compat for old corrupted saves.
- After loading, detect when AutoProcessor returned a text-only
  tokenizer for a VLM model (has no image_processor attribute) and
  trigger the manual fallback constructor.

import_fixes.py - Fix vLLM CUDA version mismatch detection:
- _is_broken_vllm_error now also matches CUDA shared library errors
  (libcudart, libcublas, libnvrtc) with "cannot open shared object
  file". Previously it only matched errors containing "vllm._c" in
  the message text, which missed cases where the error message was
  about the missing CUDA library itself (e.g. vllm built for CUDA 12
  on a CUDA 13 system).
- New _get_vllm_cuda_mismatch_message function extracts the CUDA
  version from the error, compares to the system CUDA version via
  torch.version.cuda, and returns a targeted install command using
  the correct GitHub releases wheel URL.
- disable_broken_vllm uses the targeted message when a CUDA mismatch
  is detected, falling back to the existing generic message otherwise.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-253.us-east-2.compute.internal>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-23 01:06:53 -08:00
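Matching the CUDA shared-library errors listed above and pulling out the version could be done with a regular expression along these lines. This is a sketch only: `cuda_major_from_error` is a hypothetical helper, and the real `_is_broken_vllm_error` and `_get_vllm_cuda_mismatch_message` may parse the message differently.

```python
import re

# Library names taken from the commit message; the ".so.<major>" suffix is
# the usual Linux naming and is an assumption about the error text.
_CUDA_LIB_RE = re.compile(r"(libcudart|libcublas|libnvrtc)\.so\.(\d+)")


def cuda_major_from_error(message):
    """Return the CUDA major version named in a missing-library error.

    Returns None when the message is not a shared-object load failure
    for one of the known CUDA libraries.
    """
    if "cannot open shared object file" not in message:
        return None
    match = _CUDA_LIB_RE.search(message)
    return int(match.group(2)) if match else None
```

The result can then be compared against the major version in `torch.version.cuda` to decide whether a targeted reinstall message applies.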
Roland Tannous
5416bdd4e6 added shutil import to main.py 2026-02-23 07:44:59 +00:00
Roland Tannous
f16a7f2d17 Merge nightly into feature/transformers-v5-support 2026-02-23 07:40:28 +00:00
Roland Tannous
bfb84221fe Merge pull request #221 from unslothai/feature/clear-unsloth-compile-cache
feat: clear unsloth_compiled_cache on startup, shutdown, and between …
2026-02-23 11:30:46 +04:00
Roland Tannous
dbbcdb4f09 feat: clear unsloth_compiled_cache on startup, shutdown, and between model loads 2026-02-23 07:26:22 +00:00
Roland Tannous
2a117f57a9 Merge pull request #220 from unslothai/feature/moe-training-models-configs
Add model defaults for MoE models (Qwen3 MoE, GLM Flash) and GLM response mapping
2026-02-23 10:09:27 +04:00
Roland Tannous
fb1c321ad3 Add GLM, Qwen3 MoE, TinyQwen3 MoE, and Ministral 3 VL model defaults and GLM train_on_responses_only mapping 2026-02-23 05:51:43 +00:00
Roland Tannous
3fe85d36cc Remove stale .venv_overlay on server startup to prevent transformers version conflicts 2026-02-23 05:08:27 +00:00
Roland Tannous
6995aaf077 Clean up stale .venv_overlay directory during setup 2026-02-22 20:30:00 +00:00
Roland Tannous
e3fb4f53df Patch adapter_config.json with unsloth_training_method and auto-detect load_in_4bit for LoRA inference 2026-02-22 20:27:52 +00:00
Roland Tannous
5de6246142 Purge own utils/core modules and use lazy imports so is_vision_model picks up fresh AutoConfig after version switch 2026-02-22 20:04:35 +00:00
samit
7dcaa52083 added cancel training button on the overlay 2026-02-22 12:03:26 -08:00
Roland Tannous
c12d75c472 Add transformers version switch to model config and vision check endpoints for dropdown selection 2026-02-22 19:56:12 +00:00
Roland Tannous
60997a75eb Install transformers into both site-packages and overlay to fix sub-package resolution during version switch 2026-02-22 19:44:12 +00:00
Roland Tannous
7cde520176 Move transformers overlay to local .venv_overlay/, add huggingface-hub to overlay install 2026-02-22 19:34:46 +00:00
Roland Tannous
15cb9b0f37 Use sys.path overlay to switch transformers versions in-process instead of modifying site-packages 2026-02-22 19:19:12 +00:00
Roland Tannous
0050e78aa3 Fix in-memory transformers version detection and aggressive module purge for 5.1.0/4.57.1 switching 2026-02-22 19:08:18 +00:00
Roland Tannous
1c2653fcc2 aggressive reload_transformers 2026-02-22 18:52:06 +00:00
Roland Tannous
4d06258e93 Auto-switch transformers version (5.1.0/4.57.1) for Ministral-3, GLM-4.7-Flash, Qwen3-30B-A3B models with LoRA adapter resolution 2026-02-22 18:29:40 +00:00
Roland Tannous
bb1bd49a68 Merge pull request #217 from unslothai/fix/update-config-yamls
Fix vision LoRA defaults for VLMs and clean up text-only model configs
2026-02-22 19:14:54 +04:00
Roland Tannous
132cdb0547 fix: correct vision LoRA defaults for VLMs and remove vision fields from text-only model configs 2026-02-22 15:09:10 +00:00
Roland Tannous
7ac391d1e0 Merge pull request #215 from unslothai/fix/vlm-processing-class
fix: pass full Processor as processing_class for VLM SFTTrainer
2026-02-22 18:14:14 +04:00
Roland Tannous
202b7cdfa7 fix: pass full Processor as processing_class for VLM SFTTrainer 2026-02-22 14:11:12 +00:00
Roland Tannous
a4346b954e Merge pull request #213 from unslothai/fix/fix-clear-chat-new-model
Fix: Clear Chat and Manage Model Lifecycle on Model Switch
2026-02-22 17:54:27 +04:00
Roland Tannous
761953b50e feat(chat): persist model per thread and auto-load on thread switch 2026-02-22 13:35:45 +00:00
Roland Tannous
536a735acc feat(chat): eject current model and start fresh thread on model switch 2026-02-22 13:17:03 +00:00
Roland Tannous
a490dca8a0 Merge pull request #211 from unslothai/fix/param-count-display
Fix: Remove download count fallback when model param count is unavailable
2026-02-22 16:52:25 +04:00
Roland Tannous
d3fbaa2256 Merge pull request #204 from unslothai/fix/image-preview-thumbnail
Fix: Resolve image preview thumbnail not rendering before send
2026-02-22 16:09:24 +04:00
Roland Tannous
36f10ea90e Merge pull request #212 from unslothai/fix/stop-unclickable
Updated stop button to be unavailable during cancel training
2026-02-22 16:08:47 +04:00
Roland Tannous
c5a88ef90b Merge pull request #208 from unslothai/revert-207-feature/attachment-restore
Revert "fix(chat): persist + hydrate user attachments in IndexedDB history"
2026-02-22 12:22:25 +04:00
Roland Tannous
df8e4bdc40 Merge pull request #203 from unslothai/feat/sort-unsloth-models-first
Feat: Sort unsloth models first in HF search dropdowns
2026-02-22 12:21:42 +04:00
Roland Tannous
75f3e5e2a1 feat: dual-query HF model search to surface all unsloth size variants first 2026-02-22 08:20:25 +00:00
imagineer99
4fe3772aa2 fix: remove download count fallback when model param count is unavailable 2026-02-22 06:30:50 +00:00
Wasim Yousef Said
4779ae8e61 Merge pull request #209 from unslothai/feature/attachment-restore
fix(chat): persist + hydrate user attachments in IndexedDB history
2026-02-21 22:17:18 -08:00
Roland Tannous
ac83e8b668 Revert "fix(chat): persist + hydrate user attachments in IndexedDB history" 2026-02-22 10:15:24 +04:00
Wasim Yousef Said
4f2e434bc3 Merge pull request #207 from unslothai/feature/attachment-restore
fix(chat): persist + hydrate user attachments in IndexedDB history
2026-02-21 22:09:53 -08:00
Shine1i
c726cff4c8 feat: add utils for deep cloning content and attachments in thread messages 2026-02-22 07:08:18 +01:00
Shine1i
7e1e25fb32 refactor: simplify execution overview tab by removing unused token metrics and refining layout spacing 2026-02-22 06:36:32 +01:00
Shine1i
b7dfa2b7e4 refactor: extract and modularize execution tabs and helpers for enhanced code reusability and maintainability 2026-02-22 06:26:40 +01:00
Shine1i
d6921042b0 refactor: enhance execution row handling and dataset pagination for improved interactivity and preview support 2026-02-22 05:44:23 +01:00
Shine1i
4cda750589 refactor: extract reusable runtime utilities and unify execution dialog flows for preview and full runs 2026-02-22 05:37:33 +01:00
Shine1i
17a22fe155 refactor: improve layout direction handling and auxiliary node visibility for LLMS 2026-02-22 05:18:33 +01:00
Shine1i
2cc9981ef9 refactor: enhance variable handling with structured entries and UI updates for badges 2026-02-22 04:07:01 +01:00
Shine1i
0786728323 refactor: extract reusable helpers and streamline seed inspection flow 2026-02-22 03:41:59 +01:00
imagineer99
52a2bfe016 fix: resolve image preview thumbnail not rendering before send 2026-02-22 02:39:56 +00:00
Shine1i
869ac64e18 feat: enhance dataset seed handling with inspection and UI improvements 2026-02-22 03:39:34 +01:00
imagineer99
c815fc045d feat: sort unsloth models first in HF search dropdowns 2026-02-22 01:45:51 +00:00
Shine1i
3b29d088c0 merge: nightly into feature/canvas-lab 2026-02-22 02:31:32 +01:00
Shine1i
63c1b95f20 refactor: replace inline labels with reusable FieldLabel component in dialogs 2026-02-22 02:30:04 +01:00
Shine1i
77bed648ae refactor: update UI styles for graph nodes and components with consistent transitions and rounded elements 2026-02-22 02:20:51 +01:00
Shine1i
e3c3bf75a0 refactor: streamline samplers and block handling, update dialogs and validation 2026-02-22 02:16:09 +01:00
Roland Tannous
000da20237 Merge pull request #202 from unslothai/fix/rename-downloading-model-to-loading-model
renaming downloading model to loading model
2026-02-21 17:32:32 +04:00
Roland Tannous
9bc32789a6 renaming downloading model to loading model 2026-02-21 13:30:41 +00:00
Roland Tannous
44828f582f Merge pull request #200 from unslothai/fix/sloth-z-index-overlay
fix: enable navbar z-index by adding relative position
2026-02-21 17:15:05 +04:00
Roland Tannous
ea324f2806 Merge pull request #193 from unslothai/feature/vram-fit-chat
Added vram fit indicator to models in chat
2026-02-21 17:09:31 +04:00
Roland Tannous
3a4a576128 Merge pull request #196 from unslothai/fix/compare-dictate-attachment-buttons
Added dictate and add attachments feature in chat page
2026-02-21 16:52:56 +04:00
Roland Tannous
9b3cab6d5f move microphone icon in compare page to be next to send button 2026-02-21 12:51:38 +00:00
samit
3fb9b4c056 added SpeechRecognition declarations and missing type packages 2026-02-21 01:00:08 -08:00
imagineer99
33b31c6be6 fix: enable navbar z-index by adding relative position 2026-02-21 08:37:47 +00:00
samit
f4a888cddb Added VRAM fit indicator to recommended models 2026-02-20 23:55:10 -08:00
Roland Tannous
ea0964de75 Merge pull request #192 from unslothai/feature/model-download-status
Updated to edit loading as "downloading model"
2026-02-21 11:07:08 +04:00
samit
e7d32a6461 updated stop button to unavailable during cancel training 2026-02-20 22:45:41 -08:00
Roland Tannous
470d12cf46 Merge pull request #188 from unslothai/fix/gemma-3-chat
fixed the vlm's text only errors
2026-02-21 10:16:20 +04:00
samit
97f40bdc58 Added dictate and add attachments feature 2026-02-20 22:14:27 -08:00
Roland Tannous
c051e3d532 fix: load proper vision processor from base model when FastVisionModel returns raw tokenizer, add tokenize=False to vision chat template 2026-02-21 04:40:29 +00:00
Manan17
f6ebeb1d42 Mapping proper tokenizer for VLMs 2026-02-21 01:57:05 +00:00
samit
08c3c80d31 added vram fit indicator to models in chat 2026-02-20 17:25:03 -08:00
samit
0a3beade35 updated to edit loading as downloading model 2026-02-20 15:49:33 -08:00
Manan17
3fa9e773c2 fixed the vlm's text only errors 2026-02-20 22:23:26 +00:00
Roland Tannous
ef3bd22b02 Merge pull request #187 from unslothai/fix/fix-git-clone-branch-colab
removed branch from colab git clone
2026-02-20 23:16:06 +04:00
Roland Tannous
a77b9717f8 removed branch from colab git clone 2026-02-20 19:15:03 +00:00
Roland Tannous
ed476534f7 Merge pull request #186 from unslothai/fix/colab-setup-fixes
Fix/colab setup fixes
2026-02-20 23:08:25 +04:00
Roland Tannous
08ff8de31d add huggingface-hub==0.36.0 due to colab error 2026-02-20 18:26:20 +00:00
Roland Tannous
48e232b38c add huggingface-hub==0.36.0 due to colab error 2026-02-20 18:24:48 +00:00
Roland Tannous
34131da9a4 moved transformers4.57.1 to no-extra-deps 2026-02-20 18:08:58 +00:00
Roland Tannous
bf34ede725 Merge pull request #180 from unslothai/fix/dropdown-menu-prefill
Fix: model and dataset dropdowns selecting stale value on Enter
2026-02-20 22:01:35 +04:00
Roland Tannous
e5f9ae5c9f Merge branch 'nightly' into fix/dropdown-menu-prefill 2026-02-20 17:44:06 +00:00
Roland Tannous
63c564d7ec Merge pull request #185 from unslothai/fix/remove-warmup-text-inference-status
Fix: remove warmup text inference status
2026-02-20 21:14:37 +04:00
imagineer99
77b7e8a9ba fix: remove warmup text inference status 2026-02-20 15:35:06 +00:00
Shine1i
0d2f81ab3d refactor: extract and consolidate execution runtime and tracking logic 2026-02-20 14:41:38 +01:00
Shine1i
e0d65bb1cf feat: add recipe execution stores, hooks, and logic for managing preview and full executions 2026-02-20 14:38:05 +01:00
Shine1i
4c19a2330a refactor: simplify execution view by removing unused state and redundant logic 2026-02-20 14:19:52 +01:00
Shine1i
f50dd7bcd9 feat: refine execution view with enhanced summary and insights
- Removed unused model usage properties (`total`, `tps`, `requestsSuccess`, etc.) for cleaner data handling.
- Added new metrics: total input/output tokens, null rate, and low uniqueness flags.
- Improved UI for execution summary cards with consolidated insights and model usage tables.
- Introduced detailed analysis for dataset columns, including dropped columns and LLM column counts.
- Optimized rendering logic to reduce clutter and enhance user experience.
2026-02-20 14:05:43 +01:00
Shine1i
360b4daf6d feat: enhance execution log tracking, progress updates, and data visualization
- Added `log_lines` field to track and display runtime logs for executions.
- Enhanced progress tracking with terminal-like log outputs and live log scrolling.
- Introduced detailed "model usage" and "dropped columns" analysis in `ExecutionsView`.
- Optimized UI components for displaying dataset metrics, including input/output token averages.
2026-02-20 13:51:19 +01:00
Shine1i
e7fcfef8c7 feat: improve dataset column visibility and cell expansion in ExecutionsView
- Added column visibility toggles using a dropdown menu for greater customization.
- Introduced expandable table cells for long values with "expand/collapse" functionality.
- Ensured hidden columns reset on execution change, providing a consistent user experience.
2026-02-20 13:04:57 +01:00
Shine1i
391b633cae feat: enhance progress tracking for execution jobs
- Added logic to calculate and manage column-level progress for job executions.
- Introduced `progress_columns_total` and `_column_done` fields for more granular progress updates.
- Improved overall progress computation by considering total columns and individual progress per column.
2026-02-20 12:52:45 +01:00
Shine1i
13e153e448 feat: refactor and extend recipe execution logic
- Extracted shared execution utilities into `execution-helpers.ts` for reusability across features.
- Replaced deprecated `/preview` endpoint and its logic with unified job execution handling.
- Consolidated job execution flows ("Preview" and "Full Run") into shared `runJobExecution` logic.
- Enhanced execution progress tracking with support for column-level progress reporting.
- Added support for handling execution job events and improved error reporting from the backend.
- Updated backend to better manage dataset access errors and provide more informative error messages.
- Cleaned up redundant code in `use-recipe-studio-actions` and streamlined execution APIs.
2026-02-20 12:47:51 +01:00
imagineer99
759ae059db Fix: model and dataset dropdowns selecting stale value on Enter 2026-02-20 11:33:20 +00:00
Shine1i
f3296b1953 feat: add dataset pagination support for recipe executions
- Introduced backend changes to handle dataset pagination with limit, offset, and total row support.
- Updated frontend execution view with dataset pagination controls, including "Next" and "Prev" buttons.
- Extended recipe execution logic to manage dataset pagination details like page number, page size, and total records.
2026-02-20 12:12:02 +01:00
Shine1i
1259b75d15 feat: add support for full recipe executions with detailed progress and analysis
- Introduced "Full Run" support in execution logic, including progress tracking, cancellation, and job status updates.
- Extended backend to manage full execution jobs, handle dataset previews, and return detailed analysis and artifacts.
- Updated frontend components to support full runs, with execution sorting, live updates, and detailed execution views.
- Enhanced `ExecutionsView` with progress indicators, status filtering, and dataset preview capabilities.
- Added IndexedDB schema migration to track additional execution metadata.
2026-02-20 12:05:42 +01:00
Roland Tannous
811a9243b2 Merge pull request #179 from unslothai/fix/copy-mac
Added the copy feature on mac
2026-02-20 14:38:29 +04:00
Shine1i
d378e48c2a feat: introduce execution tracking and analysis for recipe preview
- Added `ExecutionsView` with execution history tracking, live updates, and detailed data analysis.
- Implemented IndexedDB support via Dexie to persist execution records locally.
- Enhanced backend preview logic to return execution analysis and artifacts.
- Updated studio header with view toggling between "Editor" and "Executions."
2026-02-20 11:34:25 +01:00
Roland Tannous
cdd1f7fce2 Merge pull request #166 from unslothai/feat/download-progress-indicator
feat: add download progress indicators for dataset preview and training overlay
2026-02-20 14:11:07 +04:00
Roland Tannous
168055a267 Merge pull request #172 from unslothai/fix/training-param
Added optim and lr_scheduler_type in the frontend
2026-02-20 14:05:33 +04:00
Shine1i
763001b78e refactor: remove Jinja autocomplete components and simplify variable handling
- Deleted `jinja-ref-autocomplete` components and related hooks.
- Replaced custom Jinja variable autocomplete with standard `Textarea` and `Input` components.
- Streamlined variable handling logic by replacing `getAvailableRefItems` with `getAvailableVariables`.
- Removed unused state (`flowMoving`) and redundant logic tied to Jinja-specific functionality.
2026-02-20 10:36:51 +01:00
samit
3d403c6c99 edited the font of the new parameters 2026-02-20 01:29:37 -08:00
Wasim Yousef Said
6dd0e11439 Merge branch 'nightly' into feature/canvas-lab 2026-02-20 01:23:44 -08:00
Roland Tannous
f61392cfb3 Merge pull request #173 from unslothai/fix/hf-token-model-hiding
added hf token validation
2026-02-20 13:19:31 +04:00
samit
5ba8edf9fe added the copy on mac 2026-02-20 00:49:38 -08:00
Roland Tannous
a8c1f5fa84 added NODE OPTIONS export, updating npm to 2.2.6 2026-02-20 08:27:09 +00:00
samit
68028bf7f3 added lr_scheduler type to the frontend 2026-02-19 23:31:46 -08:00
Roland Tannous
fa555c72e8 Merge pull request #175 from unslothai/fix/changing-num-proc-for-filtering
Fix/changing num proc for filtering
2026-02-20 10:39:11 +04:00
Roland Tannous
3c1eff525f Merge pull request #171 from unslothai/fix/compare-mode-race-and-adapter-toggle
Fixing compare feature
2026-02-20 10:38:56 +04:00
Manan17
798bfb8f6f Setting it to total cpu_count // 4 2026-02-20 06:32:01 +00:00
samit
035f765130 added hf token validation 2026-02-19 21:05:27 -08:00
samit
93b31f0db2 added optim in the frontend 2026-02-19 15:29:23 -08:00
Manan17
fdeccec259 Fixing compare feature 2026-02-19 20:15:44 +00:00
Roland Tannous
b6799a21a5 Merge pull request #169 from unslothai/feature/backwards-compatibility-unsloth-ui-command
Add `unsloth-ui` alias for backwards compatibility
2026-02-19 22:44:41 +04:00
Roland Tannous
353cb13618 add unsloth-ui shell alias for backwards compatibility alongside unsloth-studio 2026-02-19 18:41:10 +00:00
Roland Tannous
53820f0eed Merge pull request #168 from unslothai/fix/cli-studio-command-update
Update `studio` CLI command to use new FastAPI backend
2026-02-19 22:17:37 +04:00
Roland Tannous
d486a2ef6f rename cli studio command to use new FastAPI backend and add unsloth-ui alias for backwards compatibility 2026-02-19 17:59:22 +00:00
Daniel Han
3bddfed117 Patch trunc_normal_ for low-precision stability (#4027)
* Fix low-precision trunc_normal initialization instability

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Document TorchTitan trunc_normal low-precision failure mode

* Fix trunc_normal generator positional compatibility

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix trunc_normal generator TypeError fallback

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-19 04:40:14 -08:00
Daniel van Strien
8165266a37 Add optional datasets metadata support to save/push functions (#4076)
* Add `datasets` metadata support to model cards

Add an optional `datasets` parameter to all save/push functions so users
can specify which datasets were used for training. The metadata is set
via `ModelCard.data.datasets` for standard paths and via
`metadata_update` for GGUF and generic save paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix datasets metadata for existing repos, add token, improve errors

- Add metadata_update fallback in create_huggingface_repo and
  upload_to_huggingface so datasets metadata is set even when the
  repo already exists (previously only worked on first creation).
- Pass token=token to all metadata_update calls so they work
  without a global HF login.
- Replace silent except:pass with logger.warning_once for
  metadata failures so users know if something went wrong.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix generic datasets metadata repo resolution for PR #4076

* Fix create_huggingface_repo username resolution for PR #4076

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-19 03:53:35 -08:00
Roland Tannous
f106ca5ed4 Merge pull request #167 from unslothai/fix/train_on_completions-dataset-check-improvements
Simplify dataset check to 2-tier, improve multimodal detection, auto-set trainOnCompletions, recheck dataset on reload
2026-02-19 15:35:50 +04:00
Roland Tannous
18c41c2b08 Simplify dataset check to 2-tier, improve multimodal detection, auto-set trainOnCompletions, recheck dataset on reload 2026-02-19 11:25:54 +00:00
imagineer99
ee6d33fa32 feat: add download progress indicators for dataset preview and training overlay 2026-02-19 08:22:23 +00:00
Roland Tannous
21ee32df40 Auto-set trainOnCompletions based on vision/multimodal state, default all model configs to true, and re-fetch model defaults on page reload 2026-02-19 06:40:40 +00:00
Roland Tannous
e2b7b4b54c change train_on_completions to true 2026-02-19 06:02:16 +00:00
Roland Tannous
9f1caa1ceb Revert "Setting default for the train on responses only"
This reverts commit 982a855df7.
2026-02-19 09:55:21 +04:00
Roland Tannous
b35798c8ff Merge pull request #163 from unslothai/fix/add-vision-and-dataset-to-local
Fix/add vision and dataset to local
2026-02-19 09:46:54 +04:00
Roland Tannous
35194e4382 Merge pull request #161 from unslothai/fix/vision-model-detection
Fixing Vision model detection
2026-02-19 09:43:30 +04:00
Manan17
b7b4cbc949 Adding vision and multimodal dataset to the localstorage 2026-02-19 05:06:41 +00:00
Manan17
982a855df7 Setting default for the train on responses only 2026-02-19 02:59:48 +00:00
Manan17
56869c63bd Passing use_auth = True and adding further checks for cases missed by the is_vision function 2026-02-19 02:55:46 +00:00
Kaitao Yang
fd38dc96c3 reduce code duplication by inheriting from LlamaRotaryEmbedding (#3878)
* simplify_code_using_apply_time_scaling

* modify LlamaRotaryEmbedding for better inheritance

* reduce_code_duplication_LlamaExtendedRotaryEmbedding
2026-02-18 19:13:33 -06:00
Roland Tannous
559ba976bc Merge pull request #159 from unslothai/fix/reduce_dataset_num_proc_25_pct
Fix/reduce dataset num proc 25 pct
2026-02-19 00:55:02 +04:00
Roland Tannous
adc0c78dbc reduce dataset_num_proc to 1/4 of cpu_count 2026-02-18 20:53:32 +00:00
Roland Tannous
6aceaec323 Merge pull request #155 from unslothai/fix/sft-tokenizer-unwrap-for-vlm-text
fix: Unwrap ProcessorMixin to raw tokenizer for text-only SFTTrainer on VLM-architecture models
2026-02-18 23:19:57 +04:00
Roland Tannous
a5529fbb0e fix: unwrap ProcessorMixin to raw tokenizer for text-only SFTTrainer on VLM-architecture models 2026-02-18 19:16:45 +00:00
Roland Tannous
e33920974b fix: unwrap ProcessorMixin to raw tokenizer for text-only SFTTrainer on VLM-architecture models 2026-02-18 19:13:20 +00:00
Roland Tannous
a243ea411d Merge pull request #154 from unslothai/feature/colab-notebook
Add Google Colab Support for Unsloth Studio
2026-02-18 23:02:22 +04:00
Roland Tannous
a6e2fa5b3a Merge remote-tracking branch 'origin/nightly' into fix/dataset-mapping-vlm-text-datasets 2026-02-18 18:11:45 +04:00
Roland Tannous
940328ce1e Merge remote-tracking branch 'origin/nightly' into feature/colab-notebook 2026-02-18 17:41:07 +04:00
Roland Tannous
fb556d9a2c Merge branch 'fix/sm_120-flex-attention-temp-disable' into nightly
renamed UNSLOTH_FLASH_ATTENTION to UNSLOTH_ENABLE_FLASH_ATTENTION to match actual environment variable in unsloth
2026-02-18 09:24:52 +00:00
Roland Tannous
d57b2742ab renamed UNSLOTH_FLEX_ATTENTION to UNSLOTH_ENABLE_FLEX_ATTENTION 2026-02-18 09:21:53 +00:00
Roland Tannous
ee5236228d Merge pull request #152 from unslothai/fix/sm_120-flex-attention-temp-disable
Disable flex attention on Blackwell+ GPUs (sm_120+) at startup
2026-02-18 13:00:43 +04:00
Roland Tannous
5a02ed4f0f Disable flex attention on Blackwell+ GPUs (sm_120+) at startup 2026-02-18 08:58:25 +00:00
Roland Tannous
346d7cd13a Merge pull request #151 from unslothai/fix/trainer-hang-resource-cleanup
Fix/trainer hang resource cleanup
2026-02-18 12:39:40 +04:00
Roland Tannous
d69431fa57 Scale dataset num_proc dynamically to cpu_count//3 instead of hardcap 8 2026-02-18 08:38:53 +00:00
Manan17
76cd1dc24c fixing the hangup of training after multiple back to back training processes 2026-02-18 08:18:13 +00:00
Manan17
c37bf686a6 Dividing the total cpu_count // 3 2026-02-18 07:59:57 +00:00
Roland Tannous
14edb08cf5 Merge pull request #148 from unslothai/fix/linear
linear fix
2026-02-18 11:10:16 +04:00
Manan17
58116e7e7a fix the linear path on backend 2026-02-18 07:08:32 +00:00
Roland Tannous
d2f7eaf085 Merge pull request #145 from unslothai/fix/check-format-sample
fix: stream HF datasets in check-format endpoint to avoid full d…
2026-02-18 10:49:17 +04:00
Roland Tannous
713d4a6ddb Merge pull request #146 from unslothai/feature/fix-cuda-fork-deadlock
fix: cap dataset.map() num_proc to 8 to prevent CUDA fork deadlocks
2026-02-18 10:47:46 +04:00
Lee Jackson
73c7dbec21 Merge pull request #147 from unslothai/feat/disable-navbar-training
feat: disable navbar navigation while training is active
2026-02-18 01:37:11 +00:00
imagineer99
064cd56a21 feat: disable navbar navigation while training is active
Disable Export and Chat nav items (desktop + mobile) when
isTrainingRunning is true, keeping only Studio clickable.
2026-02-18 01:30:13 +00:00
Roland Tannous
d7853efd21 debug statements 2026-02-18 00:37:09 +00:00
Roland Tannous
0d8b67b706 fix: normalize target_modules [all-linear] list to string for Unsloth/PEFT compatibility 2026-02-18 00:32:21 +00:00
Manan17
42ee6178ae linear fix 2026-02-18 00:11:27 +00:00
Roland Tannous
6f1b782172 Update README.md 2026-02-18 03:58:44 +04:00
Roland Tannous
d2332622d1 fix: defensively rename VLM chat column to match model's forward() signature 2026-02-17 23:49:13 +00:00
Roland Tannous
3d0d1c7020 fix: cap dataset.map() num_proc to 8 to prevent CUDA fork deadlocks 2026-02-17 23:12:45 +00:00
Roland Tannous
5864dece26 fix: stream HF datasets in check-format endpoint to avoid full downloads; add info logging to model config endpoints 2026-02-17 22:53:29 +00:00
Roland Tannous
63196042ea Merge pull request #144 from unslothai/fix/remove-hardcoded-port-in-dataset-preview
fix: remove hardcoded port from dataset preview error message
2026-02-18 02:44:17 +04:00
imagineer99
e3fb46a4c6 fix: remove hardcoded port from dataset preview error message 2026-02-17 22:37:52 +00:00
Shine1i
dc0cec772d feat: enhance training stop and reset flow with detailed checks 2026-02-17 23:32:22 +01:00
Leo Borcherding
3e3c315b01 Merge nightly into feature/colab-notebook - resolved setup.sh conflicts 2026-02-17 15:51:14 -06:00
Roland Tannous
5fbcd682b7 update gitignore 2026-02-17 21:45:05 +00:00
Leo Borcherding
7818e3efc8 Add GPU check as first cell in notebook 2026-02-17 15:29:29 -06:00
Wasim Yousef Said
765e1cfee2 Merge pull request #143 from unslothai/feature/local-models
feat: add schemas for local model discovery and listing
2026-02-17 13:10:04 -08:00
Shine1i
972cde7971 feat: add schemas for local model discovery and listing 2026-02-17 21:53:42 +01:00
Roland Tannous
f88f0bc047 Merge pull request #142 from unslothai/feature/wsl-gguf-sudo-fix
fix: skip sudo check on WSL during GGUF export to prevent password pr…
2026-02-18 00:46:15 +04:00
Wasim Yousef Said
43fac18d40 Merge pull request #141 from unslothai/feature/uxui-heuristics
style: improve layout consistency and responsiveness across components
2026-02-17 12:34:37 -08:00
Shine1i
68f0321404 style: improve layout consistency and responsiveness across components
- Adjusted padding, spacing, and grid configurations for better alignment and scaling across screen sizes.
- Enhanced mobile responsiveness by updating flex and grid layouts, ensuring optimal display on smaller devices.
- Tuned container dimensions and card styling to maintain design consistency.
2026-02-17 21:27:02 +01:00
Leo Borcherding
7cac94c930 Skip venv creation in Colab, install packages directly 2026-02-17 14:21:56 -06:00
Wasim Yousef Said
d9869dece1 Merge pull request #140 from unslothai/feature/uxui-heuristics
ux: improve training-to-chat flow, param defaults UX, and guided onboarding/export polish
2026-02-17 12:10:27 -08:00
Shine1i
9c353f7bc2 refactor: wrap splash screen content in a card for improved layout and consistency 2026-02-17 21:08:45 +01:00
Leo Borcherding
25bf39c1e6 Detect Colab environment and upgrade npm directly instead of using nvm 2026-02-17 13:57:45 -06:00
Shine1i
08437def80 feat: add session-based storage for chat training comparison handoff
- Implemented a utility to manage `training-compare-handoff` data in `sessionStorage` with strict validation and expiration logic.
- Added methods to set, retrieve, and clear handoff data for improved chat training flow.
2026-02-17 20:43:08 +01:00
Shine1i
a0235025af fix chat compare handoff: auto-load trained lora, stop refresh loop, add debug logs 2026-02-17 20:42:43 +01:00
Roland Tannous
c9fdce63e7 fix: skip sudo check on WSL during GGUF export to prevent password prompt hang 2026-02-17 19:30:02 +00:00
Leo Borcherding
057f3b9628 Remove unnecessary if statement for token check 2026-02-17 13:27:05 -06:00
Shine1i
ff458a36e9 feat: enhance training flow with new runtime hints, adjustable steps/epochs
- Added halfway/completed training hints with actionable links.
- Introduced sliders for adjusting max steps and epochs dynamically.
- Refined tooltip explanations for configuration parameters.
- Enabled custom overlay styling for `AlertDialogContent`.
2026-02-17 20:08:02 +01:00
Roland Tannous
41ad9fc3d9 Merge pull request #139 from unslothai/fix/move-backend-requirements
move requirements/ to studio/backend/ and update paths in setup.sh
2026-02-17 23:04:44 +04:00
Roland Tannous
39b072d2ee move requirements/ to studio/backend/ and update paths in setup.sh 2026-02-17 19:02:25 +00:00
Leo Borcherding
f48d815797 Clone feature/colab-notebook branch for colab.py 2026-02-17 13:01:16 -06:00
Leo Borcherding
2c5e0ce606 Simplify notebook to use existing setup.sh script 2026-02-17 12:54:07 -06:00
Roland Tannous
c13e1594de Merge pull request #138 from unslothai/fix/reset-max-steps-epoch-defaults
Override Model Defaults for num_epochs and max_steps
2026-02-17 22:51:40 +04:00
Roland Tannous
299bc65e36 chore: override model defaults to use max_steps=30, save_steps=30, num_epochs=0 for testing 2026-02-17 18:46:23 +00:00
Wasim Yousef Said
7874fa9066 Merge pull request #137 from unslothai/feature/chart-fixes
refactor: fix training charts: sticky full-window follow latest + grad line render + stopped metric fallback
2026-02-17 10:41:01 -08:00
Shine1i
4e2b567c29 refactor: simplify chart view logic by removing pan controls and enhancing window size handling 2026-02-17 19:32:32 +01:00
Shine1i
2d90210a69 feat: add reusable chart components for training metrics visualization
- Introduced `EvalLossChartCard`, `GradNormChartCard`, `LearningRateChartCard`, and `TrainingLossChartCard` components.
- Implemented shared chart settings via `SharedChartSettings` to manage scale, outliers, and view configuration.
- Added utilities for metrics formatting, step tick generation, data compression, and smoothing (`utils.ts`).
- Created types and structures for chart data handling (`types.ts`).
2026-02-17 19:10:24 +01:00
Roland Tannous
7307e08bff Merge pull request #130 from unslothai/integrate/exports-page
Integrate/exports page
2026-02-17 22:10:10 +04:00
Roland Tannous
b8171a86ac Merge branch 'nightly' into integrate/exports-page 2026-02-17 22:09:36 +04:00
Roland Tannous
e818d97f24 Merge pull request #136 from unslothai/setup/update-setup-sh-dependencies
setup.sh: Replace inline pip installs with pinned requirements files
2026-02-17 22:04:03 +04:00
Roland Tannous
caa28e5d46 replace with patch from merged PR in unsloth-zoo 2026-02-17 17:55:05 +00:00
Wasim Yousef Said
e03ed04278 Merge pull request #135 from unslothai/feature/chart-fixes
feat: integrate gradient norm tracking in training runtime and metrics
2026-02-17 09:49:29 -08:00
Shine1i
0be3e6f525 feat: integrate gradient norm tracking in training runtime and metrics
- Enhanced chart logic to filter and visualize finite gradient norm values.
2026-02-17 18:26:59 +01:00
Wasim Yousef Said
91b67f197e Merge pull request #134 from unslothai/feature/model-configs
feat: Apply model-config defaults in onboarding + studio
2026-02-17 09:08:21 -08:00
Shine1i
7203766fac refactor: streamline vision model detection and improve state persistence logic
- Removed redundant vision-check controllers.
- Added `NON_PERSISTED_STATE_KEYS` to manage persisted training state.
- Introduced `partializePersistedState` for cleaner state filtering.
2026-02-17 18:01:45 +01:00
Shine1i
3badf6649c feat: add default model configuration mapping and auto-apply logic
- Implemented backend model configuration mapping to training state.
- Added auto-apply logic for default configurations when models are selected.
- Introduced utilities for type conversion and validation within training configuration.
2026-02-17 17:58:36 +01:00
Michael Han
ac70db5556 Update README Install.md
Updating to include new installation links
2026-02-17 07:23:31 -08:00
Roland Tannous
e803e13d3e add full dependency chain for unsloth + unsloth-extras 2026-02-17 14:21:18 +00:00
Leo Borcherding
1530d1c165 fix: Add GitHub authentication cell for private repo
- Add cell 1 for token input (getpass)
- Update clone command to use token from environment
- Now 3 cells: auth, setup, start
2026-02-17 05:07:46 -06:00
Leo Borcherding
bebb45d847 fix: Clean up notebook to just 2 cells
Remove all the overcomplicated markdown and extra cells.
Now it's exactly like the POC: setup and start only.
2026-02-17 05:03:59 -06:00
Leo Borcherding
17df5bf3bf feat: Add simple 2-cell Colab notebook (no tunnel needed)
- Create studio/backend/colab.py using Colab's built-in proxy
- Uses google.colab.kernel.proxyPort() for URL (no cloudflare)
- Shows nice clickable link with IPython.display.HTML
- Notebook has just 2 cells: setup and start
- Much simpler than external tunneling approach
2026-02-17 04:57:30 -06:00
Roland Tannous
8054449606 Remove exports/ from tracking 2026-02-17 07:57:43 +00:00
Roland Tannous
d97e23dfd3 added llama.cpp python dependencies to setup.sh 2026-02-17 07:56:27 +00:00
Roland Tannous
8a4a1554b0 fix: vite build fail - suppress unused estimatedSize prop in export dialog 2026-02-17 07:21:23 +00:00
Roland Tannous
bbd7d6d122 Merge pull request #129 from unslothai/fix/adding-meta-data-for-checkpointing-api
Adding metadata for checkpoints
2026-02-17 11:17:24 +04:00
Lee Jackson
d5b6a69b05 Merge pull request #132 from unslothai/fix/dataset-check-format-subset-param
fix: subset param name
2026-02-17 05:57:05 +00:00
imagineer99
e1fbccfc57 fix: subset param name 2026-02-17 05:53:21 +00:00
pre-commit-ci[bot]
42f5a02f06 [pre-commit.ci] pre-commit autoupdate (#4072)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.0 → v0.15.1](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.0...v0.15.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-16 21:19:45 -08:00
Manan17
6f9dd90d56 Integration of the api with the EXPORT page with UI changes 2026-02-17 00:34:48 +00:00
Manan17
c7b7ecab4f Adding metadata for checkpoints 2026-02-16 23:46:17 +00:00
Wasim Yousef Said
7b8220598e Merge pull request #124 from unslothai/feature/bug-fixes
feat: support disabling top-k sampling with -1 and standardize normalization
2026-02-16 13:21:12 -08:00
Roland Tannous
a6cefadbf7 Merge pull request #122 from unslothai/feature/eval-split-auto-detection
[Feature] evaluation during training
2026-02-17 01:13:46 +04:00
Roland Tannous
ff0aec180a Merge branch 'nightly' into feature/eval-split-auto-detection 2026-02-17 01:11:30 +04:00
Roland Tannous
369486246e Merge pull request #121 from unslothai/fix/vision-model-fixes
fix: Vision model detection and model-dataset compatibility
2026-02-17 00:39:58 +04:00
Shine1i
0db7da96cc feat: support disabling top-k sampling with -1 and standardize normalization logic
- Updated top-k parameter range to accept -1 in models and frontend.
- Added utility to normalize top-k for backend compatibility.
2026-02-16 21:33:24 +01:00
Roland Tannous
fa0ca59215 feat: auto-detect model+dataset compatibility to select VLM vs LLM training path 2026-02-16 19:18:49 +00:00
Roland Tannous
09f3b6bce5 feat(frontend): auto-detect vision models via backend, separate search filter from model classification 2026-02-16 18:24:44 +00:00
Roland Tannous
43d84d7143 Merge pull request #120 from unslothai/fix/back-on-complete
fix: show Back to configuration breadcrumb when training completes
2026-02-16 21:38:40 +04:00
Roland Tannous
f4f2f50364 fix: show Back to configuration breadcrumb when training completes 2026-02-16 17:34:52 +00:00
Roland Tannous
fc36696a52 Merge pull request #119 from unslothai/feature/base-model-chat-template-handling
fix: Apply default chat template for base models without tokenizer chat_template
2026-02-16 20:52:17 +04:00
Roland Tannous
b32ad350c5 feat: apply default chat template for base models without tokenizer chat_template 2026-02-16 15:56:06 +00:00
Roland Tannous
5df3a0b250 feat: add eval_enabled flag and format-first-then-split for eval dataset 2026-02-16 14:13:55 +00:00
Roland Tannous
0aea3f149d feat: add eval split auto-detection, eval_steps hyperparam, and eval_loss chart integration 2026-02-16 13:51:10 +00:00
Roland Tannous
37452d56cf feat: add eval split auto-detection, eval_steps hyperparam, and eval_loss chart integration 2026-02-16 13:38:54 +00:00
Roland Tannous
321096f0cb Merge pull request #118 from unslothai/feature/onboarding-hardware-info
feat(onboarding): hook system info to live /api/system…
2026-02-16 16:29:22 +04:00
Roland Tannous
883ae7ff8c feat(onboarding): replace hardcoded system info with live /api/system/hardware data 2026-02-16 12:26:49 +00:00
Roland Tannous
c18ccbb773 Merge pull request #116 from unslothai/feature/gpu-monitor-training
feat: add live GPU monitor with nvidia-smi polling during training
2026-02-16 15:52:48 +04:00
Roland Tannous
a0ebd9183a feat: add live GPU monitor with nvidia-smi polling during training 2026-02-16 11:47:43 +00:00
Roland Tannous
3cc2a8f326 Merge pull request #115 from unslothai/feature/api-hardware-info
feat: add GET /api/system/hardware endpoint for GPU info and package versions
2026-02-16 14:35:06 +04:00
Roland Tannous
b20d50e8d0 feat: add GET /api/system/hardware endpoint for GPU info and package versions 2026-02-16 10:29:21 +00:00
Roland Tannous
da356afd33 Merge pull request #114 from unslothai/feature/checkpoint-loss-in-api
feat: include training loss per checkpoint in API response
2026-02-16 13:53:53 +04:00
Roland Tannous
fd49c56481 feat: include training loss per checkpoint in /api/models/checkpoints response 2026-02-16 09:50:28 +00:00
Roland Tannous
d4f2fc8a8f Merge pull request #113 from unslothai/feature/refactor-checkpoints-pull-endpoint
Refactor: Move checkpoint scanning to models domain
2026-02-16 13:34:08 +04:00
Roland Tannous
f0298edeb8 refactor: move checkpoint scanning to utils/models and /checkpoints endpoint to models router 2026-02-16 09:32:11 +00:00
Datta Nimmaturi
f3b5090f24 [Feat] FP8 per tensor quant support (#4043)
* FP8 per tensor quant support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-16 01:21:30 -08:00
Shine1i
fc71548a31 feat: add index route with auth guard and redirect logic 2026-02-16 09:57:41 +01:00
Roland Tannous
6ecc03485d Merge pull request #97 from unslothai/fix/progress-metics
Resolved the progress metrics
2026-02-16 11:55:01 +04:00
sshah229
63b34660ed modified the num_tokens logic 2026-02-16 00:38:40 -07:00
Daniel Han
0212f7f7df Fix regressions from security PRs #4042, #4044, and #4045 (#4062)
* Fix security-regression fallout in chat templates and PDL patching

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Drop security regression test files from PR scope

* Apply suggestion from @danielhanchen

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-15 23:16:17 -08:00
Daniel Han
be77c66a84 Add reinstall command to broken vLLM warning (#4070)
* Add vLLM reinstall command to broken-extension warning

* Apply suggestion from @danielhanchen

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-15 23:02:12 -08:00
Roland Tannous
e02c6e4caf Merge pull request #109 from unslothai/feat/backend-generation-implement-min-p
feat: add `min_p` sampling parameter to `/chat/completions` generation pipeline
2026-02-16 10:54:15 +04:00
Roland Tannous
909955767b feat: add min_p sampling parameter to /chat/completions generation pipeline 2026-02-16 06:33:17 +00:00
Wasim Yousef Said
c1d2ed1449 Merge pull request #108 from unslothai/feature/inference-params
feat(chat): apply recommended inference params on model load
2026-02-15 22:19:48 -08:00
Shine1i
96c30b6c38 feat: add inference parameter merging for model loading and runtime updates 2026-02-16 07:14:53 +01:00
Daniel Han
5f81ac8964 Guard optional vLLM imports when extension is broken (#4068)
* Guard optional vLLM imports when extension is broken

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove vLLM import guard tests from PR scope

* Block broken vLLM imports like causal_conv1d

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-15 22:09:29 -08:00
Lee Jackson
d0be326f96 Merge pull request #107 from unslothai/feat/chat-min-p
feat: add min_p inference parameter to chat page
2026-02-16 05:49:10 +00:00
imagineer99
36703c46a8 feat: add min_p inference parameter to chat page 2026-02-16 05:41:36 +00:00
Roland Tannous
9cc8f02818 Merge pull request #106 from unslothai/fix/adding-checkpointing-data-to-api
Fixing the get checkpoint api
2026-02-16 09:13:57 +04:00
Manan17
19276ae60b Fixing the get checkpoint api 2026-02-16 04:47:28 +00:00
Roland Tannous
84427bec9e Merge pull request #105 from unslothai/feature/setup-shell-improvements
Auto-detect Python version & shell RC file in setup script
2026-02-16 08:15:58 +04:00
Roland Tannous
af4c145096 setup: auto-detect best Python ≤3.12 and write alias to user's default shell rc file 2026-02-16 04:10:05 +00:00
Roland Tannous
d064fafcd8 Merge pull request #104 from unslothai/feature/support-dataset-configs-splits
dataset `subset`/`split` params from API routes through to `load_dataset` calls
2026-02-16 07:59:09 +04:00
Roland Tannous
d0964652af feat: thread dataset subset/split params from API routes through to load_dataset calls 2026-02-16 03:56:22 +00:00
Daniel Han
61c8ea6342 Add torchvision upgrade hint to mismatch ImportError (#4067)
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-15 19:36:16 -08:00
Daniel Han
ec80fd3f66 Raise ImportError on stable torch/torchvision mismatch (#4065)
* Raise ImportError for stable torchvision mismatches

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove torchvision compatibility tests from PR scope

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-15 19:14:19 -08:00
Wasim Yousef Said
dff074869c Merge pull request #103 from unslothai/feature/format-mapping
feat: dataset manual mapping (2-col)
2026-02-15 17:10:50 -08:00
Shine1i
d0bcf02784 feat: lock model selector during guided tour and refine open state logic 2026-02-16 02:05:53 +01:00
Shine1i
541179ba98 merge: nightly 2026-02-16 02:02:40 +01:00
Shine1i
1a90741046 refactor: create utilities for dataset manual mapping and improve UI logic consistency in dataset preview dialog 2026-02-16 01:54:09 +01:00
Shine1i
cc6cb00d56 fix: enforce unique input/output mapping and enhance UI feedback in dataset preview dialog 2026-02-16 01:47:28 +01:00
Shine1i
4fd4d2fe76 feat: implement dataset mapping UI and preview dialog 2026-02-16 01:44:26 +01:00
Wasim Yousef Said
34483f2325 Merge pull request #102 from unslothai/feature/guided-tour-p2
feat: Guided tours p2: per-page + navbar trigger
2026-02-15 16:10:07 -08:00
Shine1i
7b20b4848e feat: enable conditional confetti in guided tour and update navbar icon 2026-02-16 01:04:54 +01:00
Shine1i
baf47ab332 chore: text update 2026-02-16 00:58:46 +01:00
Shine1i
448aaa13cb feat: improve guided tour descriptions and add sidebar state management
- Updated step descriptions across Studio, Chat, and Export tours for better clarity.
- Added `openSidebar` state management function and integrated it into the tour logic.
- Improved target detection in guided tours with retry logic for better handling of unavailable elements.
2026-02-16 00:55:13 +01:00
Shine1i
362e679b66 feat: implement guided tours and refactor model selector components 2026-02-15 23:43:10 +01:00
Shine1i
28d42794cd gitignore: .omx 2026-02-15 23:00:01 +01:00
Wasim Yousef Said
ed583f2b78 Merge pull request #101 from unslothai/feature/guided-tour
feat: studio: guided tour (1st visit, skippable)
2026-02-15 12:33:07 -08:00
Wasim Yousef Said
28364f3314 Merge pull request #100 from unslothai/feature/chat-compare
feat: chat compare + inference stream cancel fix
2026-02-15 12:29:54 -08:00
Shine1i
923371d637 setup: nightly 2026-02-15 21:26:11 +01:00
Shine1i
b8d8150201 revert setup.sh 2026-02-15 21:24:30 +01:00
Wasim Yousef Said
a9ec5d894c Merge pull request #98 from unslothai/feat/dataset-config-splits
feat: check dataset configs and splits before hitting check-format
2026-02-15 12:17:22 -08:00
Wasim Yousef Said
b071ae9c63 Merge pull request #99 from unslothai/feature/vram-estimation
feat: VRAM-based model filtering in frontend
2026-02-15 12:17:07 -08:00
Shine1i
3dce0d475d rm vitest 2026-02-15 21:13:15 +01:00
Shine1i
51218702c2 cfg->subset 2026-02-15 21:09:45 +01:00
Shine1i
ef9e5ffc33 fix hf cfg/split ui 2026-02-15 20:53:59 +01:00
Shine1i
e341656d45 feat: add confetti fireworks effect on tour completion 2026-02-15 20:25:23 +01:00
Shine1i
a14e3feed9 refactor: reorganize tour steps and relocate ReadMore component for improved structure 2026-02-15 20:22:17 +01:00
Shine1i
a579fe4eea refactor: define explicit prop types for GuidedTour and SpotlightOverlay components 2026-02-15 20:17:28 +01:00
imagineer99
f285b5379a feat: VRAM-based model filtering in frontend 2026-02-15 19:12:28 +00:00
Shine1i
87cf3b834a refactor: extract tour utils into separate modules for cleaner structure 2026-02-15 20:11:28 +01:00
Shine1i
080b196dfc tour readmore 2026-02-15 19:58:10 +01:00
Shine1i
2b6229f049 tour steps split 2026-02-15 19:57:07 +01:00
Shine1i
ddffa2bbdc rm shine + perf 2026-02-15 19:41:15 +01:00
Shine1i
0db321ab85 tour light card 2026-02-15 19:31:49 +01:00
imagineer99
eb2ba90a03 feat: check dataset configs and splits before hitting check-format 2026-02-15 18:30:54 +00:00
Shine1i
ea36112a05 feat: add guided tour component and integrate with Studio UI elements 2026-02-15 19:20:37 +01:00
Shine1i
1eb07f6ad2 refactor: streamline chat runtime logic and remove warming indicator
- Replaced `setThreadWarming` logic with streamlined token settlement functions (`settleFirstTokenOk` and `settleFirstTokenErr`) for improved readability and reliability.
- Simplified model loading/unloading functions with reusable `performLoad` and `performUnload` patterns.
- Removed `warmingByThreadId` from runtime store and associated code for reduced complexity.
- Enhanced title generation flow by consolidating logic for persisting and streaming titles.
2026-02-15 18:59:03 +01:00
Shine1i
2e9f756ca6 feat: improve model loading/unloading UX and remove _WarmupIndicator_ from thread UI
- Refactored loading/unloading logic to provide detailed toast notifications with statuses (loading, success, error).
- Removed unused `WarmupIndicator` component from thread UI to simplify interface.
- Introduced better error handling for model refresh and inference tasks.
2026-02-15 18:48:02 +01:00
Shine1i
e59ef60a5c feat: conditionally render "Compare" button based on active checkpoint and lora selection, improve fallback title generation, and update default settings 2026-02-15 18:38:25 +01:00
Shine1i
571959e383 feat: add cancelation support for chat generation and streaming tasks 2026-02-15 18:23:27 +01:00
Shine1i
9a4f71c939 chore: remove unused ComponentExample and associated imports and auto title generate 2026-02-15 18:08:46 +01:00
Shine1i
b83eeab603 fix: ensure consistent message order in chat runtime by improving sort logic and adding fallback for createdAt 2026-02-15 17:34:32 +01:00
Shine1i
8529f89a75 fix lora: outputs path local 2026-02-15 16:58:24 +01:00
Shine1i
2ffdd59925 chat compare: send use_adapter 2026-02-15 16:44:14 +01:00
Shine1i
b1258bed0d wip setup: py312 2026-02-15 16:36:53 +01:00
Shine1i
184b786116 fix setup: fish alias, venv no activate 2026-02-15 16:27:44 +01:00
Shine1i
43c66da783 merge nightly 2026-02-15 14:52:07 +01:00
Shine1i
992e60266f chore: ignore frontend .omx 2026-02-15 14:50:26 +01:00
Shine1i
485f174202 feat: add support for event replay and resume in job events API, improve SSE handling, and fix regex patterns in log parsers 2026-02-15 14:49:04 +01:00
Roland Tannous
4dbd77786e Merge pull request #96 from unslothai/feature/inference-yaml-ordered
Added the inference defaults for models
2026-02-15 17:43:31 +04:00
Roland Tannous
6fc52d6535 Merge pull request #94 from unslothai/fix/pass-save-steps-to-trainer
Adding save-steps to the SFTConfig
2026-02-15 17:25:48 +04:00
Shine1i
85653237ea feat: add Data Recipe core functionality with job manager, API routes, and validation services 2026-02-15 13:43:46 +01:00
sshah229
0b1c635b43 resolved the progress metrics 2026-02-15 05:35:32 -07:00
Shine1i
8bd7cfaab1 feat: remove seed inspect/preview and MCP tools fetch support due to backend endpoint deprecation 2026-02-15 12:45:59 +01:00
sshah229
9e50e167d9 added the inference fetching from model mappers 2026-02-15 02:48:53 -07:00
Manan17
6e4cde3bf8 Adding save-steps to the SFTConfig 2026-02-15 09:37:54 +00:00
sshah229
625bc1bbc6 added default inference config for default.yaml 2026-02-15 02:13:24 -07:00
sshah229
238fdc5c4a added default inference config from unsloth notebooks 2026-02-15 02:13:24 -07:00
sshah229
b5d93adcf2 added configs from Ollama 2026-02-15 02:13:24 -07:00
sshah229
ac20103e54 added inference defaults from unsloth guides 2026-02-15 02:13:24 -07:00
nole69
e3c9482cfb [FIX] Move loss and n_items to logits device in fast_cross_entropy_loss loss for multi-GPU support (#4063)
* bug fix for multi-GPU

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-15 01:09:40 -08:00
Roland Tannous
12db4799b1 Merge pull request #93 from unslothai/docs/add-readme
add draft README.md file
2026-02-15 12:51:37 +04:00
Roland Tannous
0bada7e758 README.md draft 2026-02-15 08:49:56 +00:00
Roland Tannous
f453791916 Merge pull request #29 from unslothai/feature/export
Added the export routes and pydantic models
2026-02-15 12:42:01 +04:00
Roland Tannous
0f07afcbbc Merge pull request #91 from unslothai/fix/automatic-signup-if-unauthenticated
Auto-redirect to signup/login on stale auth tokens instead of showing 401 spam
2026-02-15 12:30:49 +04:00
Roland Tannous
0d360d74df fix: auto-redirect to signup/login when auth tokens are stale 2026-02-15 08:27:35 +00:00
Roland Tannous
7b5fc07e87 Merge pull request #89 from unslothai/ux/default-signup-tab-on-first-launch
feat: Redirect first-time users to signup page instead of login
2026-02-15 10:45:17 +04:00
Manan Shah
0ec08ff4c1 Merge pull request #88 from unslothai/fix/change-labels-in-data-card
Changing labels in dataset card
2026-02-15 00:38:57 -06:00
Manan17
e8f9610122 Changing labels in dataset card 2026-02-15 06:34:39 +00:00
Roland Tannous
46e8dd3fad feat: redirect first-time users to signup page instead of login 2026-02-15 06:29:52 +00:00
Roland Tannous
fb712f73f4 Merge pull request #85 from unslothai/fix/password-hint-length
Fix/password hint length
2026-02-15 10:19:03 +04:00
Roland Tannous
2ed9408b50 Merge pull request #83 from unslothai/fix/training-stuck-multiprocessing-cuda
Fixing stuck training processes
2026-02-15 10:18:49 +04:00
Roland Tannous
7411be50a0 Merge pull request #86 from unslothai/fix/index-html-cache-headers
Fix: Browser serving stale frontend after rebuild
2026-02-15 10:17:38 +04:00
Daniel Han
084ca10ac2 Silence Apex Aiter RoPE warning unless logging is enabled (#4058)
* Silence Apex Aiter RoPE warning unless logging is enabled

* Update unsloth/import_fixes.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-14 22:14:05 -08:00
anonymous dev
ba1688c609 [FIX] Move labels to logits device in cross-entropy loss for multi-GPU support (#4041) (#4059)
When using device_map='balanced' with multiple GPUs, the labels tensor
may reside on a different device than the logits/losses tensors. This
causes a RuntimeError at the masked_fill_ call in the chunked
cross-entropy forward path.

Fix: explicitly move labels to the same device as logits at the start
of Fast_CrossEntropyLoss.forward(). This is a no-op on single-GPU
setups.

Fixes #4041
2026-02-14 22:13:07 -08:00
Manan17
45618fb43c Adding hint for password length 2026-02-15 06:12:55 +00:00
Roland Tannous
d5faeb058c Remove test_lora.py from tracking 2026-02-15 06:11:06 +00:00
Roland Tannous
2840efcc08 fix: add no-cache headers to index.html to prevent stale frontend after rebuild 2026-02-15 06:06:27 +00:00
Daniel Han
defcbf8bea Auto-configure AMDGPU_ASIC_ID_TABLE_PATH on ROCm startup (#4060)
* Auto-configure AMDGPU_ASIC_ID_TABLE_PATH on ROCm startup

* Remove ROCm fd2 amdgpu.ids noise filter wrappers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use PyPI bitsandbytes for amd extra to avoid malformed wheel URL

* Add amd-preview extra for bitsandbytes continuous wheel channel

* Keep amd extra on bitsandbytes>=0.49.1 and remove amd-preview

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-14 21:52:31 -08:00
Manan17
ace86ed0b8 Fixing stuck training processes 2026-02-15 05:40:06 +00:00
Manan17
6ccbc4edce Fixing stuck training processes 2026-02-15 05:38:06 +00:00
Roland Tannous
2c5a4359c6 Merge pull request #81 from unslothai/feature/early-stop-or-cancel-training-ui
feat: early stop or cancel training UI
2026-02-15 08:28:39 +04:00
Manan17
52cbf9b699 feat: UI for cancel or save and stop training 2026-02-15 00:22:28 +00:00
Manan17
97c6a09b84 feat: add cancel or save and stop training 2026-02-15 00:00:22 +00:00
Shine1i
070b9d41a4 feat: enhance sampler builders with datetime unit mapping, uuid format handling, and error reporting 2026-02-14 23:07:11 +01:00
Roland Tannous
1d362a36c3 Merge pull request #79 from unslothai/feat/compare-use-adapter
Adapter Toggling for chat compare feature
2026-02-15 00:25:53 +04:00
Roland Tannous
b334e49498 decouple reliance of backend on frontend for is_lora 2026-02-14 20:13:50 +00:00
Roland Tannous
be3934860f strip extra debug statements 2026-02-14 19:23:51 +00:00
Roland Tannous
3ff3def555 replace model unloading and peft loading mechanism for compare feature 2026-02-14 19:18:49 +00:00
Shine1i
ebc411e508 chore: ignore agent.md 2026-02-14 19:22:07 +01:00
Shine1i
964f7d1548 feat: refactor block definitions and utilities into modular components for enhanced maintainability 2026-02-14 19:13:19 +01:00
Shine1i
f2a00d6e44 feat: add seed dataset support with configuration, preview, and builder utilities 2026-02-14 18:44:38 +01:00
Roland Tannous
e7ae901737 del model.peft_config instead of using model.delete_adapter 2026-02-14 17:32:15 +00:00
Roland Tannous
7d8e991c1f added print statements for activate_lora_adapter 2026-02-14 17:25:37 +00:00
Roland Tannous
6fefbe9f0b swapped logger for print statements as logger isn't propagating 2026-02-14 17:21:26 +00:00
Roland Tannous
d0b94eae75 added logging 2026-02-14 17:09:07 +00:00
Roland Tannous
b5c8136957 exclude default from model.delete_adapter 2026-02-14 17:03:52 +00:00
Roland Tannous
35a6e40268 _apply_adapter_state now calls revert_to_base_model and activate_lora_adapter properly 2026-02-14 16:57:24 +00:00
Shine1i
2bd20d7d15 ignore tests 2026-02-14 16:42:58 +01:00
Shine1i
ca30e9f004 merge nightly 2026-02-14 16:42:20 +01:00
Shine1i
d28cc1670b lock 2026-02-14 16:37:38 +01:00
Shine1i
175fd0459c feat: add Jinja reference autocomplete components and enhance graph edges styling 2026-02-14 16:30:01 +01:00
Roland Tannous
f67ee58347 feat(inference): add use_adapter field for per-request adapter toggling in compare mode 2026-02-14 14:52:13 +00:00
Daniel Han
842099f2b0 Wrap models import with ROCm amdgpu ids fd2 filter (#4057)
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-14 04:13:25 -08:00
Daniel Han
191cbe55ee Wrap unsloth_zoo import with HIP amdgpu.ids filter (#4056)
* Wrap unsloth_zoo import with HIP amdgpu.ids filter

* Refactor ROCm ids filter helpers for readability

* Rename ROCm ids filter helper and annotate call sites

* Remove obsolete amdgpu ids filter alias

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-14 03:59:57 -08:00
Daniel Han
66db2a1417 Filter only amdgpu.ids fd2 noise during ROCm startup (#4054)
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-14 03:35:41 -08:00
Daniel Han
66b09f2481 Make ROCm suppression detection robust for custom torch builds (#4053)
* Make ROCm suppression detection robust for custom torch builds

* Add ROCm detection debug logging behind UNSLOTH_ENABLE_LOGGING

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-14 02:59:49 -08:00
金黄色葡萄球君君
dd5ff9dcef ROCm: Add gfx950 (MI355X/CDNA4) to is_cdna() (#4051)
MI355X (gfx950) has the same 1024-thread workgroup limit as MI300X (gfx942),
but was missing from is_cdna(), causing all Triton kernels to use num_warps=32
(2048 threads) instead of 16 (1024 threads), resulting in OutOfResources crash.

Tested on: 8x AMD Instinct MI355X (gfx950), ROCm 7.1
2026-02-14 02:50:05 -08:00
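
The arithmetic behind the gfx950 fix above can be sketched as follows. This is an illustrative sketch only, not Unsloth's actual code: `CDNA_ARCHS`, `pick_num_warps`, and `threads_per_workgroup` are hypothetical names, the arch set lists only the two GPUs named in the commit message, and a 64-thread wavefront is assumed for CDNA hardware.

```python
# Hypothetical sketch of the warp-count selection described in the commit message.
# Assumption: CDNA GPUs run 64-thread wavefronts, so threads = num_warps * 64.
WAVEFRONT_SIZE = 64
MAX_WORKGROUP_THREADS = 1024  # workgroup limit quoted for MI300X and MI355X

# Only the two architectures named in the commit message; the real list may be longer.
CDNA_ARCHS = {"gfx942", "gfx950"}


def is_cdna(arch: str) -> bool:
    """Return True when the gfx architecture string is a known CDNA part."""
    return arch in CDNA_ARCHS


def pick_num_warps(arch: str) -> int:
    """Use 16 warps on CDNA (1024 threads), otherwise 32 warps."""
    return 16 if is_cdna(arch) else 32


def threads_per_workgroup(num_warps: int) -> int:
    """Threads launched per workgroup for a given warp count."""
    return num_warps * WAVEFRONT_SIZE


if __name__ == "__main__":
    for arch in ("gfx942", "gfx950"):
        warps = pick_num_warps(arch)
        print(arch, warps, threads_per_workgroup(warps))
```

Under these assumptions, an architecture missing from the set falls through to 32 warps, which requests 2048 threads and exceeds the 1024-thread limit. That matches the OutOfResources crash the commit describes.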
Daniel Han
6ec46f49a6 Suppress HIP amdgpu.ids stderr noise during causal_conv1d check (#4052)
* Suppress HIP libdrm stderr noise in causal_conv1d probe

* Broaden HIP libdrm stderr suppression for early ROCm startup

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-14 02:44:34 -08:00
Daniel Han
1a929ce6f1 Simplify MI300X startup banner name (#4049)
* Improve HIP GPU name reporting in startup banner

* Drop MI300X arch suffix in banner name

* Normalize _utils.py file mode

* Simplify FA2 fallback text and filter AMD ids noise

* Strip trailing GPU arch suffix via regex

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use gfx lookup default and normalize Ryzen AI naming

* Remove name-path Ryzen AI normalization

* Expand ROCm gfx map to full documented GPU name aliases

* Simplify HIP fallback naming to AMD gfx token

* Remove Ryzen AI torch_name normalization

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-14 02:24:03 -08:00
Wasim Yousef Said
6ed179b459 Merge pull request #78 from unslothai/feature/vision-capabilities-chat
feat(chat): add vision image attachments for OpenAI-compatible chat
2026-02-14 02:01:17 -08:00
Shine1i
7611c7122c feat: add image handling support with Vision adapter and base64 serialization in chat runtime 2026-02-14 10:59:10 +01:00
Roland Tannous
c843a93797 Merge pull request #76 from unslothai/feature/inference-vision-openai-compatible
PR: OpenAI-compatible multimodal vision support + true vision streaming
2026-02-14 13:47:39 +04:00
Roland Tannous
418a374125 migrate _generate_vision_response to use TextIteratorStreamer + background thread 2026-02-14 09:30:32 +00:00
Roland Tannous
9de38cb773 feat(inference): accept OpenAI multimodal content parts (image_url) in /chat/completions 2026-02-14 09:06:25 +00:00
Roland Tannous
44d52b4103 Merge pull request #72 from unslothai/fix/sse-progress-timeout
Fix: SSE progress stream timeout during training
2026-02-14 09:58:14 +04:00
Roland Tannous
4f0fad2156 fix: increase SSE progress timeout to 30min and allow step-0 updates 2026-02-14 05:47:22 +00:00
Roland Tannous
957e39c50d Merge pull request #71 from unslothai/fix/frontend-default-path
Fix/frontend default path
2026-02-14 09:43:24 +04:00
Daniel Han
d3fcba134b Improve HIP GPU name detection in startup banner (#4048)
* Improve HIP GPU name reporting in startup banner

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-13 21:32:34 -08:00
Roland Tannous
414d173624 replace function with alias 2026-02-14 05:28:47 +00:00
Roland Tannous
e09c5dfc7e fix alias command 2026-02-14 05:24:54 +00:00
Daniel Han
c14917b96e Handle broken causal_conv1d at import time (#4047)
* Handle broken causal_conv1d import at runtime

Add a startup import-time probe for causal_conv1d and disable the fast path when the shared library is ABI broken. This keeps Falcon H1/model loading resilient without requiring env flags.

- Add disable_broken_causal_conv1d in import_fixes.
- Invoke it early from unsloth/__init__ during package init.
- Make Falcon H1 optional imports in loader and models/__init__ soft-fail instead of failing hard.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Enforce unavailable semantics for broken causal_conv1d

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove Falcon H1 import swallowing

* Restore optional Falcon H1 import guard

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove causal_conv1d regression tests

* Trim FA2 fallback messaging

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-13 21:20:25 -08:00
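
The import-time probe described in the causal_conv1d commit above could look roughly like this. It is a minimal sketch under stated assumptions, not the contents of `import_fixes`: `probe_optional_module` is a hypothetical helper, and blocking later imports by placing `None` in `sys.modules` is one standard-library way to get "unavailable" semantics, which may differ from what the commit actually does.

```python
import importlib
import sys


def probe_optional_module(name: str) -> bool:
    """Try to import an optional module once at startup.

    Returns True if the import succeeds. On any failure (for example an
    ImportError raised by an ABI-broken shared library) the module is marked
    unavailable so later `import name` statements fail fast with ImportError.
    """
    try:
        importlib.import_module(name)
        return True
    except Exception:
        # A None entry in sys.modules makes subsequent imports of this name
        # raise ModuleNotFoundError instead of retrying the broken load.
        sys.modules[name] = None
        return False


if __name__ == "__main__":
    # Callers can then pick a slow path when the fast path is unavailable.
    has_fast_path = probe_optional_module("causal_conv1d")
    print("fast path available:", has_fast_path)
```

Probing once during package init keeps the decision in a single place, so model loaders can check a boolean rather than each wrapping their own import in a try/except.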
Roland Tannous
7260c09268 change line arguments order in setup.sh 2026-02-14 05:13:49 +00:00
Roland Tannous
6b2a777f97 fix path in run_server 2026-02-14 05:06:17 +00:00
Roland Tannous
f07b919385 change default frontend path in run.py to studio/frontend/dist 2026-02-14 05:02:26 +00:00
Michael Han
2a7d098203 Update README with faster MoE.md
Adding MoE
2026-02-13 19:38:23 -08:00
Roland Tannous
933048d4f7 Merge pull request #70 from unslothai/feat/wire-custom-format-mapping-to-training
feat: wire `custom_format_mapping` through training pipeline
2026-02-14 01:09:35 +04:00
Roland Tannous
67edebfeb3 feat: wire custom_format_mapping through training pipeline to format_and_template_dataset 2026-02-13 21:07:36 +00:00
Roland Tannous
f12c5f61ef Merge pull request #69 from unslothai/fix/auto-detect-lora-in-model-config
fix: auto-detect LoRA adapters for both local and remote HF models in ModelConfig
2026-02-14 00:56:34 +04:00
Roland Tannous
8ce96df66f fix: auto-detect LoRA adapters for both local and remote HF models in ModelConfig 2026-02-13 20:54:40 +00:00
Roland Tannous
3d33753899 Merge pull request #68 from unslothai/fix/datasets-fix-vlm-detection
fix: auto-detect multimodal datasets in /check-format without requiri…
2026-02-14 00:04:51 +04:00
Wasim Yousef Said
bdcaf5518c Merge pull request #66 from unslothai/ui-fixes
UI fixes
2026-02-13 11:29:32 -08:00
Wasim Yousef Said
1f50b5b5f1 Merge pull request #67 from unslothai/style/polish-chat-sidebar-spacing
style: polish chat page spacing, typography adjustment, and panel alignment
2026-02-13 11:28:56 -08:00
Manan17
2210545493 fixing padding for titles 2026-02-13 19:22:39 +00:00
imagineer99
59acc087b6 style: polish chat page spacing, small typography, and panel alignment 2026-02-13 19:18:20 +00:00
Manan17
c0fbe7d4a5 Change of font space for title 2026-02-13 19:03:51 +00:00
Roland Tannous
a62a30bd9b Merge pull request #65 from unslothai/refactor/change-highlighted-text-color
Refactor/change highlighted text color
2026-02-13 21:36:08 +04:00
Roland Tannous
adf1ef5ea5 fix: auto-detect multimodal datasets in /check-format without requiring is_vlm flag 2026-02-13 17:29:39 +00:00
Manan17
acc00bd0a4 Changing the highlighted text color to be black while keeping the checkmark emerald 2026-02-13 17:24:57 +00:00
Wasim Yousef Said
6935fc3593 Merge pull request #63 from unslothai/feature/chat-openai-integration
feat(chat): integrate backend chat runtime + model load flow
2026-02-13 08:47:44 -08:00
Shine1i
ce7c9917ab feat: refactor suggestion handling and centralize defaults for thread UI 2026-02-13 17:42:45 +01:00
Shine1i
5fe4258401 feat: add warm-up indicator, new thread feature, and runtime improvements in chat UI 2026-02-13 17:28:01 +01:00
Roland Tannous
fe5a46ede3 Merge pull request #62 from unslothai/feature/add-easydict-addict
Feature/add easydict addict
2026-02-13 20:22:46 +04:00
Roland Tannous
965be3f5dc add easydict and addict to setup file 2026-02-13 16:19:55 +00:00
Shine1i
c9c4463d5d feat: integrate LoRA model management with UI and runtime synchronization 2026-02-13 17:14:49 +01:00
Shine1i
23d2cfd09d feat: refactor chat runtime with modular APIs, state management, and runtime synchronization 2026-02-13 16:45:00 +01:00
Wasim Yousef Said
a7c6432ffd Merge pull request #61 from unslothai/feature/training-frontend-integration
training frontend integration + backend sync v2
2026-02-13 05:06:17 -08:00
Shine1i
d58fa17c81 feat: add support for serialized previews in dataset API and improve training initialization logging 2026-02-13 13:47:17 +01:00
Shine1i
b0062535a7 feat: add image previews in dataset dialog, enable popularity sorting in model search, refine training config serialization 2026-02-13 13:17:20 +01:00
Shine1i
b0cb7a7305 Merge remote-tracking branch 'origin/nightly' into feature/training-frontend-integration 2026-02-13 12:41:09 +01:00
Shine1i
a4a997eee6 feat: add training feature with state management, API integration, and runtime synchronization 2026-02-13 12:26:28 +01:00
Shine1i
4e0596c395 wip p1 2026-02-13 11:42:19 +01:00
Roland Tannous
8fdbd05cc2 Merge pull request #51 from unslothai/feature/print-outbound-interface-address
Show External IP in Startup Banner
2026-02-13 14:32:39 +04:00
Roland Tannous
5f155010f6 read external ips with fallback to standard notation 0.0.0.0 2026-02-13 10:28:20 +00:00
Roland Tannous
837596a9e7 feat: show external IP in startup banner 2026-02-13 10:23:12 +00:00
Roland Tannous
b10b303f4e Merge pull request #50 from unslothai/fix/auth-setup-rollback
Auth Setup Failure: `auth.db` Created Before Token Generation
2026-02-13 14:14:34 +04:00
Roland Tannous
5602f7ccb4 fix: rollback auth.db user row if token generation fails during setup 2026-02-13 10:11:07 +00:00
Roland Tannous
d201935dae Merge pull request #49 from unslothai/fix/replace-jwt-with-pyjwt
add pyjwt as dependency. remove jwt. fix AttributeError: module 'jwt'…
2026-02-13 14:00:52 +04:00
Roland Tannous
b86503af3f add pyjwt as dependency. remove jwt. fix AttributeError: module 'jwt' has no attribute 'encode' 2026-02-13 09:58:50 +00:00
Roland Tannous
1ab41f67d8 Merge pull request #48 from unslothai/feature/shorten-setup-alias
shorten unsloth-ui alias. auto append frontend dist folder location
2026-02-13 13:57:33 +04:00
Roland Tannous
44cc46bfec shorten unsloth-ui alias. auto append frontend dist folder location 2026-02-13 09:45:47 +00:00
Roland Tannous
0f61fbff6a Merge pull request #47 from unslothai/fix/add-jwt-dependency
add jwt dependency to setup.sh
2026-02-13 13:40:19 +04:00
Roland Tannous
14c5560ace add jwt dependency to setup.sh 2026-02-13 09:39:36 +00:00
Roland Tannous
8a2a9030f6 Merge pull request #46 from unslothai/feature/update-setup-file
Feature/update setup file
2026-02-13 13:33:05 +04:00
Roland Tannous
d68a2ddd67 chore: suppress verbose output in setup.sh, show errors only 2026-02-13 09:32:17 +00:00
Roland Tannous
8db52c7649 Merge pull request #44 from unslothai/fix/remove-gradio-training
refactor: remove gradio dependency from training backend
2026-02-13 13:26:40 +04:00
Roland Tannous
f52bddc23f refactor: remove gradio dependency from training backend 2026-02-13 09:25:49 +00:00
Roland Tannous
cec51ad6d2 Merge pull request #43 from unslothai/feature/setup-file
Add `setup.sh` for automated environment setup
2026-02-13 13:13:38 +04:00
Roland Tannous
96c827020b chore: add setup.sh for automated environment and frontend build 2026-02-13 09:11:07 +00:00
Roland Tannous
2df07aa224 Merge pull request #42 from unslothai/fix/epoch-type-float
Fix: Change `epoch` type from `int` to `float`
2026-02-13 10:53:51 +04:00
Roland Tannous
75f775d088 fix: change epoch type from int to float to match TrainerState 2026-02-13 06:51:55 +00:00
Roland Tannous
c065b05179 Merge pull request #38 from unslothai/fix/model-config-directory
Changed the directory for default configs
2026-02-13 09:20:40 +04:00
Wasim Yousef Said
37e8d43cd5 Merge pull request #35 from unslothai/feature/dataset-preview-table
feat: add dataset viewer using /check-format endpoint
2026-02-12 21:18:36 -08:00
imagineer99
bc9645cbbf feat: add dataset preview dialog using /check-format endpoint 2026-02-13 05:04:11 +00:00
sshah229
82be5b237f fixed the script directory 2026-02-12 21:55:36 -07:00
Roland Tannous
363ffb7d1a Merge pull request #34 from unslothai/feature/openai-chat-completions
PR: OpenAI-Compatible Chat Completions Endpoint
2026-02-12 23:20:52 +04:00
Roland Tannous
8403190cdd feat: add OpenAI-compatible POST /chat/completions endpoint with streaming and non-streaming support 2026-02-12 19:00:05 +00:00
Roland Tannous
78d2fe5ee3 Merge pull request #33 from unslothai/feature/sse-connection-resilience
feat: SSE Connection Resilience
2026-02-12 22:03:20 +04:00
Roland Tannous
509659ba97 feat: add SSE reconnection resilience with spec-compliant event fields, Last-Event-ID resume, and metric_history fallback in /status 2026-02-12 17:58:48 +00:00
Roland Tannous
c6a1e9ca4b Merge pull request #32 from unslothai/feature/datasets-endpoint-preview-samples
Return Raw Preview Samples on Format Detection Failure
2026-02-12 19:53:13 +04:00
Roland Tannous
0dbce96700 untrack package-lock.json and add to gitignore 2026-02-12 15:41:20 +00:00
Roland Tannous
dd71b0f18a return raw preview samples on format detection failure for manual column mapping 2026-02-12 15:39:56 +00:00
Roland Tannous
e0623cae6c Merge pull request #31 from unslothai/feature/datasets-endpoint-return-top-10
Optimize `/check-format` to return preview samples
2026-02-12 15:26:13 +04:00
Roland Tannous
4c791bd5aa feat(datasets): check-format to return preview samples 2026-02-12 11:25:47 +00:00
Daniel Han
08bb85fcda Create CODEOWNERS (#4039) 2026-02-12 02:56:13 -08:00
Shine1i
e145a72adb feat: add builders and components for LLM configuration in Recipe Studio and refactor for readability, preparing for draft 2026-02-12 04:25:15 +01:00
sshah229
ee703dd6c6 added router in main 2026-02-11 18:51:53 -07:00
sshah229
40bfe42974 added the pydantic models and routes for export 2026-02-11 18:34:12 -07:00
Shine1i
30cc509197 feat: implement Data Recipes page feature subfolders for workflow management and saving logic 2026-02-12 02:07:53 +01:00
Shine1i
390e9ed9d2 feat: add Recipe Studio utilities and components for configuring synthetic data pipelines 2026-02-12 01:03:47 +01:00
Shine1i
93f45ffd07 chore: rename Canvas Lab components and utilities 2026-02-12 00:39:08 +01:00
Shine1i
0ed6c141b0 feat: add interactive viewport controls and refactor floating button styles
- Introduced `ViewportControls` for zoom, fit view, and interactive toggle in canvas lab.
- Extracted and reused `CANVAS_FLOATING_ICON_BUTTON_CLASS` for consistent button styling.
- Updated API base paths and server proxy settings.
- Enabled dynamic interaction states for nodes and connections in canvas lab.
2026-02-12 00:18:56 +01:00
shine1i
e39d03c21e Merge remote-tracking branch 'origin/nightly' into feature/canvas-lab
# Conflicts:
#	.gitignore
#	studio/frontend/.gitignore
#	studio/frontend/bun.lock
#	studio/frontend/src/app/router.tsx
2026-02-11 22:36:34 +01:00
Roland Tannous
a8b8da96d1 Merge pull request #28 from unslothai/refactor/centralize-device-selection-and-gpu-cache
[MLX] - Centralize Device Selection & GPU Cache Management
2026-02-11 20:59:23 +04:00
Roland Tannous
da1cde971c use get_device() for device selection and clear_gpu_cache() for GPU memory cleanup in inference, trainer, and export 2026-02-11 16:56:52 +00:00
Roland Tannous
f7529d1503 Merge pull request #27 from unslothai/feature/implement-silicon-utils-compatibility
[MLX] Add Hardware Detection Module & Apple Silicon (MLX) Compatibility
2026-02-11 20:21:08 +04:00
Roland Tannous
1a6bfe51b6 added @needs_torch to test_cuda_oom 2026-02-11 16:12:37 +00:00
Roland Tannous
95038d6129 add @needs_mlx decorator on tests 2026-02-11 16:10:22 +00:00
Roland Tannous
63c583c54f replace torch MPS with MLX 2026-02-11 16:04:35 +00:00
Roland Tannous
7db31723b9 reset DEVICE type on fastapi lifespan exit 2026-02-11 15:58:13 +00:00
Roland Tannous
e7c3e7b48d fixed tests to be hardware specific 2026-02-11 15:40:00 +00:00
Roland Tannous
85fc481afe fixed tests to be hardware specific 2026-02-11 15:37:34 +00:00
Roland Tannous
59d5f24eb5 integrate global hardware detection at lifespan entrypoint 2026-02-11 15:34:26 +00:00
Roland Tannous
107bd2be4c feat: add Apple Silicon (MPS) compatibility to backend utils + tests 2026-02-11 14:00:39 +00:00
Wasim Yousef Said
eddb2d5405 Merge pull request #23 from unslothai/feature/auth-ui
auth setup for the client and auth guard checks
2026-02-11 05:05:07 -08:00
shine1i
8c85fef59f drop docs file 2026-02-11 14:03:45 +01:00
shine1i
77f8316546 feat: auth guard on routes 2026-02-11 14:01:30 +01:00
Roland Tannous
fde1fea5a9 Merge pull request #26 from unslothai/fix/fix-existing-routes-models
Fix: Move Inline Pydantic Models & Add Response Models for Routes
2026-02-11 16:42:25 +04:00
shine1i
cdf7ead71b feat: new auth and refresh token on unauthorized 2026-02-11 13:40:33 +01:00
Roland Tannous
7ee4381936 move inline pydantic models - fix existing models routes integration 2026-02-11 12:39:58 +00:00
Roland Tannous
1e51f0b791 Merge pull request #25 from unslothai/feature/remove-unsloth-compiled-cache-lifespan-exit
Remove `unsloth_compiled_cache` on FastAPI Lifespan Exit
2026-02-11 16:31:57 +04:00
Roland Tannous
28c7df5925 remove unsloth_compiled_cache folder on fastapi lifespan exit 2026-02-11 12:30:05 +00:00
shine1i
3d0b90a862 Merge remote-tracking branch 'origin/nightly' into feature/auth-ui 2026-02-11 13:29:11 +01:00
Roland Tannous
fd3e0e5f09 chore: untrack auth.db (already in .gitignore) 2026-02-11 12:21:54 +00:00
Roland Tannous
e66280119d Merge pull request #24 from unslothai/feature/refactor-authentication-mechanism
Feature/refactor authentication mechanism
2026-02-11 16:13:44 +04:00
Roland Tannous
01fcb4f713 authentication refactor - added setup token and token refresh mechanism 2026-02-11 12:09:47 +00:00
Lei Zhenyuan
cdc9dc1fb1 fix for tma (#4023) 2026-02-10 17:50:33 -08:00
shine1i
de0c142a1c ignore claude, test folder and docs for arch of canvas-lab 2026-02-10 17:39:41 +01:00
Datta Nimmaturi
6804c05130 Misc fixes (#4018)
* convert print to logger

* Print but cleaner

* Hide model on multiple devices

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo transfomers -> transformers, revert MoE message change

* Update MoE detection message to show num_experts and target_modules

* Fix llama-cli path in save info message

* target_parameters warning for moe

* fix should_convert_module for llm_int8_skip_modules

* fix should_convert_module for llm_int8_skip_modules

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Logging filters

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* negation

* remove should_convert_module patch

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-10 06:31:34 -08:00
Daniel Han
10338dbaa4 Fix warmup_ratio deprecation for transformers >= 5.0 (#4019)
* Fix warmup_ratio deprecation warning for transformers >= 5.0

In transformers 5.0, warmup_ratio is deprecated in favor of
warmup_steps which now accepts float values (< 1 = ratio,
>= 1 = absolute steps).

The compiler now conditionally sets warmup_steps=0.1 on
transformers >= 5.0 (same semantics as warmup_ratio=0.1) and
keeps warmup_ratio=0.1 on older versions where warmup_steps
only accepts int.
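
A minimal sketch of the version switch described above. The helper names `parse_major` and `warmup_kwargs` are hypothetical and are not taken from the Unsloth compiler; only the `warmup_steps` / `warmup_ratio` argument names and the 5.0 cutoff come from this commit message.

```python
def parse_major(version: str) -> int:
    """Return the leading integer of a version string such as '5.0.0.dev0'."""
    return int(version.split(".")[0])


def warmup_kwargs(transformers_version: str, ratio: float = 0.1) -> dict:
    """Pick the warmup argument that matches the given transformers version."""
    if parse_major(transformers_version) >= 5:
        # transformers >= 5.0: warmup_steps accepts floats, and < 1 means a ratio.
        return {"warmup_steps": ratio}
    # Older versions: warmup_steps only accepts int, so keep warmup_ratio.
    return {"warmup_ratio": ratio}
```

In practice the version string would come from `transformers.__version__`; it is passed in explicitly here so the sketch runs without the library installed.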

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-10 06:17:47 -08:00
Daniel Han
f106eec5e9 Fix Gemma3 4B training on transformers 5.x (token_type_ids) (#4017)
* Inject token_type_ids for Gemma3 multimodal training on transformers 5.x

In transformers 5.x, create_causal_mask_mapping() raises ValueError when
is_training=True and token_type_ids is None. When doing text-only SFT on
Gemma3 4B (a multimodal model), the dataset_utils detection for
_needs_token_type_ids can miss because:
- The model is wrapped in PeftModel, so type(model).__module__ points to
  peft.peft_model instead of transformers
- The processing_class is a tokenizer (not Gemma3Processor), so the
  fallback MRO check resolves to a module without create_causal_mask_mapping

This adds a fallback in _unsloth_pre_compute_loss that injects
token_type_ids=zeros when:
1. token_type_ids is not already in inputs
2. The inner model config has model_type "gemma3"
3. The model's module has create_causal_mask_mapping (transformers 5.x)
4. The model is in training mode

On transformers 4.x, create_causal_mask_mapping does not exist so this
check is inert.
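
The four conditions above can be sketched as a guard function. This is an illustration only: `maybe_inject_token_type_ids` and its boolean parameters are hypothetical, and nested lists stand in for tensors so the example runs without torch.

```python
def maybe_inject_token_type_ids(inputs, model_type, module_has_mask_mapping, training):
    """Return inputs with an all-zero token_type_ids added when every condition holds."""
    if "token_type_ids" in inputs:
        return inputs  # 1. already provided by the caller
    if model_type != "gemma3":
        return inputs  # 2. only Gemma3 is affected
    if not module_has_mask_mapping:
        return inputs  # 3. transformers 4.x: create_causal_mask_mapping is absent
    if not training:
        return inputs  # 4. the ValueError only fires in training mode
    patched = dict(inputs)
    # Zeros mark every position as text, mirroring token_type_ids=zeros above.
    patched["token_type_ids"] = [[0] * len(row) for row in inputs["input_ids"]]
    return patched
```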

Depends on: unslothai/unsloth-zoo#488

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-10 05:14:36 -08:00
andrewor14
cd24ea0e50 FP8: Load model on-the-fly in vLLM (#3717)
* FP8: Load model on-the-fly in vLLM

**Summary:** Existing support for `load_in_fp8=True` performs
an offline quantization when loading the initial model.
This is no longer necessary as of vllm==0.12.0 (after
https://github.com/vllm-project/vllm/pull/23014), where we
can quantize the model on-the-fly when we load it:

```
llm = LLM(
  ...
  hf_overrides={
    "quantization_config_dict_str": json.dumps(torchao_config),
  },
)
```

**Note:** Needs https://github.com/unslothai/unsloth-zoo/pull/380

**Test Plan:**
https://gist.github.com/andrewor14/5b85119fae46845d07b608d420907423

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix on-the-fly FP8: always check mapper first, fallback to on-the-fly

The original implementation bypasses the FP8 mapper entirely for
vllm >= 0.12.0, meaning models like Llama-3.2-1B-Instruct and Qwen3-8B
that have pre-quantized FP8-Block/FP8 checkpoints would never use them.

This fixes the priority order:
1. Mapper has a pre-quantized model -> use it (always)
2. Mapper has no match + vllm >= 0.12.0 -> on-the-fly FP8 via torchao
3. Mapper has no match + vllm < 0.12.0 -> offline quantization

Changes:
- loader_utils.py: Move vllm >= 0.12.0 check after mapper lookups
- loader.py: Set load_in_fp8=False when mapper resolves to a
  pre-quantized model to prevent double quantization

Tested on B200 with Llama-3.2-1B-Instruct and Qwen3-8B. Corrected code
produces results matching baseline (pre-quantized path preserved).
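
The priority order above can be expressed as a small decision function. This is a sketch: `choose_fp8_path` and its return labels are hypothetical names, not the actual code in loader_utils.py.

```python
def choose_fp8_path(mapper_has_match: bool, vllm_version: tuple) -> str:
    """Return which FP8 loading path applies under the priority order above."""
    if mapper_has_match:
        # 1. A pre-quantized checkpoint always wins.
        return "pre-quantized"
    if vllm_version >= (0, 12, 0):
        # 2. No mapper match on a new enough vLLM: quantize while loading.
        return "on-the-fly"
    # 3. No mapper match on an older vLLM: quantize offline first.
    return "offline"
```

Checking the mapper first is what keeps models with existing FP8 checkpoints on the pre-quantized path regardless of the vLLM version.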

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-10 05:10:13 -08:00
Datta Nimmaturi
3df65308f3 [Misc] Fixes (#4015)
* convert print to logger

* Print but cleaner

* Hide model on multiple devices

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo transfomers -> transformers, revert MoE message change

* Update MoE detection message to show num_experts and target_modules

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-10 02:08:55 -08:00
Roland Tannous
fe5a7d11b6 add llama.cpp prefix to gguf conversion help messages (#4016) 2026-02-10 01:59:05 -08:00
Fizza Mukhtar
a353fad514 Fix #3397: Prevent trainer tokenization hang with safe num_proc (#4013)
* Fix #3397: Prevent trainer tokenization hang with safe num_proc

* Fix #3397: Add missing import sys for Windows-safe tokenization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Consolidate with existing num_proc guard in dataset_utils.py

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-10 01:53:46 -08:00
Daniel Han
acfe670357 Fix EmbeddingGemma float16 NaN via FORCE_FLOAT32 for gemma3_text (#4014)
* Fix EmbeddingGemma float16 NaN by adding gemma3_text to FORCE_FLOAT32 and SDPA lists

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-10 01:40:13 -08:00
Daniel Han
a2f4f04ea5 Inject model reference for dynamic token_type_ids detection in SFTTrainer (#4012)
* Inject model reference for dynamic token_type_ids detection in SFTTrainer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-10 00:37:07 -08:00
Daniel Han
a35e866625 Suppress vLLM v1 executor sleep/wake log messages (#4011)
* Suppress vLLM v1 executor sleep/wake log messages

Add HideLoggingMessage filters for vllm.v1.executor.abstract logger to
suppress repetitive sleep/wake INFO and WARNING messages that spam training
output when UNSLOTH_VLLM_STANDBY is enabled. The existing filter at line 275
handles the legacy vllm.executor.executor_base path; this adds coverage for
the v1 engine path used by vllm 0.11+.
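
For illustration, a message-hiding filter of this kind can be built on the standard `logging.Filter` class. `HideMessageFilter` below is a hypothetical stand-in and not Unsloth's `HideLoggingMessage`; the matched fragment is also an assumption.

```python
import logging


class HideMessageFilter(logging.Filter):
    """Drop any log record whose formatted message contains a given fragment."""

    def __init__(self, fragment: str):
        super().__init__()
        self.fragment = fragment

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging machinery to discard the record.
        return self.fragment not in record.getMessage()


def hide_messages(logger_name: str, fragment: str) -> None:
    """Attach the filter to the named logger."""
    logging.getLogger(logger_name).addFilter(HideMessageFilter(fragment))


# Logger name taken from the commit message; the fragment is illustrative.
hide_messages("vllm.v1.executor.abstract", "sleep")
```

Filters attached to a logger apply only to records created on that exact logger, which is why the v1 path needs its own filter in addition to the legacy one.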

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-09 23:51:58 -08:00
pre-commit-ci[bot]
293b431e77 [pre-commit.ci] pre-commit autoupdate (#4009)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.14 → v0.15.0](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.14...v0.15.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-09 17:32:18 -08:00
shine1i
aab37f8dc2 refactor: consolidate AvailableVariables component and enhance variable display logic across dialogs
- Moved `AvailableVariables` to shared directory.
- Updated dialogs to use shared `AvailableVariables` component.
- Enhanced inline expressions and processors dialog with better variable display.
2026-02-09 20:07:41 +01:00
shine1i
d360372d9c add variable handling components and refactor inputs across dialogs
- Introduce `AvailableVariables` for displaying variables linked to configs.
- Implement `ChipInput` for dynamic value management in category and subcategory dialogs.
- Add `AuxVariableBadges` to aux nodes for displaying variable references.
- Update inline components with comboboxes for better user experience.
- Replace badges and manual inputs with streamlined reusable components.
2026-02-09 19:48:42 +01:00
Daniel Han
4f5de9ba93 Silence peft target_parameters RuntimeWarning for MoE models (#4008)
* Silence peft target_parameters RuntimeWarning for MoE models

Wrap _get_peft_model calls with warnings.catch_warnings() to suppress
the "target_parameters were set but no parameter was matched" warning.
This fires on MoE models where expert layers use nn.Parameter naming
that peft warns about but handles correctly.
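
A minimal sketch of the wrapping pattern, using only the standard `warnings` module. `call_quietly` is a hypothetical helper, and the message pattern is an assumption based on the warning text quoted above.

```python
import warnings


def call_quietly(fn, *args, **kwargs):
    """Call fn while ignoring RuntimeWarnings that mention target_parameters."""
    with warnings.catch_warnings():
        # The filter is scoped to this block and restored on exit.
        warnings.filterwarnings(
            "ignore",
            message=".*target_parameters.*",
            category=RuntimeWarning,
        )
        return fn(*args, **kwargs)


def noisy_factory():
    """Stand-in for a call that emits the warning and still succeeds."""
    warnings.warn(
        "target_parameters were set but no parameter was matched",
        RuntimeWarning,
    )
    return "model"
```

Because `catch_warnings()` restores the previous filter state on exit, other RuntimeWarnings raised outside the wrapped call are unaffected.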

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-09 08:25:40 -08:00
Daniel Han
4924a5f6aa Silence TRL's batch_size=1 padding-free warning in compiled trainer source (#4007)
Strip the "anihilate"/"annihilate" warning block from compiled trainer
source so it does not fire when Unsloth auto-enables padding-free mode
with batch size 1 (the common single-GPU case).

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-09 07:55:29 -08:00
Daniel Han
f3f3c9dfb9 Fix dtype mismatch in fp16 + 4-bit/8-bit LoRA training (#4005)
* Fix dtype mismatch in fp16 + 4-bit/8-bit LoRA training

Two fixes for training with dtype=torch.float16 and load_in_4bit=True:

1. fast_lora.py: fast_dequantize() returns tensors in quant_state.dtype
   (typically bfloat16 or float32), but activations may be float16. The
   subsequent matmul/addmm operations require matching dtypes. Add dtype
   casts after each fast_dequantize() call in LoRA_MLP.backward and
   LoRA_QKV.backward (5 locations total).

2. rl.py: TRL unconditionally casts trainable parameters to bfloat16 in
   the peft init block. When training with fp16=True, this causes
   GradScaler to crash since it requires float32 parameters. Make the
   cast conditional -- use float32 when fp16 is enabled, bfloat16
   otherwise. This is a no-op for GRPOTrainer (whose peft init block is
   already removed by the existing regex), but fixes SFTTrainer and
   other TRL trainers.

Tested with Llama-3.2-1B-Instruct 4-bit on both fp16 and bf16 training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix fp16 + 4-bit LoRA: thread correct_dtype through post_patch

Root cause: fast_dequantize returns tensors in quant_state.dtype, which
for pre-quantized models is bfloat16 (from config.json). The post_patch
methods in llama/gemma/gemma2 call patch_model_and_tokenizer without
passing correct_dtype, so quant_state.dtype is never overridden to match
the user's requested dtype. This causes a dtype mismatch crash in the
backward pass when training with dtype=torch.float16.

Fix: pass the user's dtype from from_pretrained through post_patch to
patch_model_and_tokenizer as correct_dtype, matching the pattern already
used by vision.py.

Revert the 5 symptom-level dtype casts in fast_lora.py (upW, gateW, QW,
KW, VW) since they are no longer needed with quant_state.dtype properly
set at the source.

Tested: fp16+4bit and bf16+4bit Llama-3.2-1B-Instruct 15-step SFT runs
both complete successfully with similar losses (~1.558 vs ~1.563).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove TRL's unconditional bfloat16 cast instead of patching the dtype

TRL 0.26.0+ hardcodes `param.data.to(torch.bfloat16)` for all trainable
params in quantized models, citing the QLoRA paper recommendation. This
is wrong: it ignores the user's requested dtype and breaks GradScaler
when fp16=True. The block exists in sft_trainer, grpo_trainer,
rloo_trainer, and reward_trainer (not dpo_trainer).

Previous fix patched the cast to be dtype-conditional. This commit
replaces the entire guard `if getattr(model, "is_loaded_in_4bit", ...)
or getattr(model, "is_loaded_in_8bit", ...):` with `if False:` to
disable the block entirely. Unsloth already handles adapter dtype via
patch_model_and_tokenizer, making TRL's cast both unnecessary and
harmful.

For GRPOTrainer the enclosing peft init block is already removed by
the regex above, making this a no-op for GRPO.

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-09 07:39:26 -08:00
Daniel Han
0a04b1b22c Fix trl.experimental thin wrapper compilation and OOM from peft_config overwrite (#4006)
* Fix trainer compilation failures from trl.experimental thin wrappers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix OOM from prepare_model_for_kbit_training overwriting peft_config patching

---------

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-09 07:04:55 -08:00
Daniel Han
14fe579629 Fix VLM model + text-only dataset ValueError in TRL 0.22.x (#4004)
TRL 0.22.x checks _is_vlm (model type) instead of _is_vision_dataset
(dataset content, added in 0.25.1+) in _set_signature_columns_if_needed.
When _is_vlm=True (e.g. Gemma3), signature columns are set to vision-only
["messages","prompt","completion","images"], which has zero overlap with
tokenized text columns [input_ids, labels, attention_mask, ...], causing
a ValueError.

Fix: expand the VLM branch signature columns to include both vision and
text column names. Extra columns not present in the dataset are harmlessly
ignored by _remove_unused_columns (it only raises when zero columns match).
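
The overlap argument can be checked with a small model of the column filtering. `kept_columns` is a hypothetical simplification of the behaviour described above, not TRL's `_remove_unused_columns`, and `TEXT_COLUMNS` lists only the three text columns named in this message.

```python
VISION_COLUMNS = ["messages", "prompt", "completion", "images"]
TEXT_COLUMNS = ["input_ids", "labels", "attention_mask"]


def kept_columns(signature_columns, dataset_columns):
    """Keep dataset columns named in the signature; fail only on zero overlap."""
    kept = [c for c in dataset_columns if c in signature_columns]
    if not kept:
        raise ValueError("No dataset column matches the signature columns")
    return kept


# Expanded signature: vision and text names together.
expanded = VISION_COLUMNS + TEXT_COLUMNS
tokenized = ["input_ids", "labels", "attention_mask"]
survivors = kept_columns(expanded, tokenized)
```

With the vision-only list the same call raises, which matches the failure described for TRL 0.22.x.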

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-09 06:24:58 -08:00
Daniel Han
ba7366be53 Fix notebook compatibility for transformers 4.57.6 and TRL 0.22-0.27 (#3998)
* Patch before compile?

* Fix notebook compatibility for transformers 4.57.6 and TRL 0.22-0.27

Fixes several notebook failures discovered during testing all 125
notebooks with transformers==4.57.6 + TRL 0.22.2 and TRL 0.27.1.

Warning suppression (import_fixes.py):
- Suppress torch 2.9+ pin_memory/is_pinned device deprecation warnings
- Suppress cuda.cudart/cuda.nvrtc module deprecation FutureWarning
- Filter vllm "Level is deprecated" stderr noise
- Filter PydanticSerializationUnexpectedValue warnings
- Filter Triton "df: No such file" stderr noise

VLM tokenizer loading (vision.py):
- Add _construct_vlm_processor_fallback() for models where
  AutoProcessor.from_pretrained fails (e.g., ERNIE 4.5 VL, LFM2.5-VL)
- Wrap processor loading in try/except with fallback to manual
  construction from separate image_processor + tokenizer components
- Add fallback to AutoTokenizer/PreTrainedTokenizerFast when tokenizer
  loading or patching fails

TRL 0.27.1 trainer compatibility (trainer.py):
- Add _resolve_trainer_params() to handle thin wrapper trainers that
  only have def __init__(self, *args, **kwargs) (e.g., ORPOTrainer
  in TRL 0.27.1) by walking MRO for real parameter signature
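
The MRO walk can be sketched with `inspect.signature`. `resolve_init_params` and the two example classes are hypothetical; this is not the body of `_resolve_trainer_params`.

```python
import inspect

_VARIADIC = {inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD}


def resolve_init_params(cls):
    """Return parameter names of the first __init__ in the MRO that is not a thin wrapper."""
    for klass in cls.__mro__:
        init = klass.__dict__.get("__init__")
        if init is None:
            continue  # this class inherits __init__ unchanged
        params = list(inspect.signature(init).parameters.values())[1:]  # drop self
        kinds = {p.kind for p in params}
        if kinds and kinds <= _VARIADIC:
            continue  # only *args / **kwargs: keep walking up the MRO
        return [p.name for p in params]
    return []


class RealTrainer:
    def __init__(self, model, args=None, train_dataset=None):
        self.model = model


class ThinTrainer(RealTrainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
```

Here `resolve_init_params(ThinTrainer)` skips the wrapper and reports the parent's parameters.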

VLM _is_vlm detection (rl.py):
- Replace blanket _is_vlm=False override with model-architecture-based
  detection that checks vision_config or ForConditionalGeneration class
  name, fixing VLM training when bare tokenizer is passed as
  processing_class

ModernBERT SDPA compatibility (loader.py, sentence_transformer.py):
- Add "modernbert" to DISABLE_SDPA_MODEL_NAMES to avoid stride
  alignment issues with torch.compile backward pass
- Add DISABLE_SDPA check for sentence transformer models

Other fixes (_utils.py):
- Suppress false uninitialized weight warnings for VLM
  multi_modal_projector.layer_norm

Tested: 92/125 notebooks pass with TRL 0.22.2, 94/125 with TRL 0.27.1.
Remaining failures are infra (missing FFmpeg, network timeouts, GPU
arch) not code bugs.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix KTO shape mismatch on TRL 0.27.2+ and truncation alignment

- Patch KTO get_batch_logps to auto-align logits and labels when Unsloth
  model forward truncates input_ids beyond max_seq_length. TRL 0.27.2
  changed _process_tokens to only truncate completions (not prompts), so
  sequences with long prompts exceed max_seq_length and trigger model-side
  truncation. The original ValueError is replaced with min-length alignment.

- Also truncate attention_mask in LlamaModel forward when input_ids are
  truncated to max_seq_length, preventing shape mismatches in attention.

- Widen except clause in rl_replacements.py openenv import from
  `except ImportError` to `except (ImportError, NameError, Exception)` to
  handle vllm SamplingParams NameError in TRL 0.27.2.
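
The min-length alignment mentioned in the first item can be sketched as follows. Plain lists stand in for the sequence dimension of the logits and labels tensors, and `align_to_min_length` is a hypothetical name.

```python
def align_to_min_length(logits, labels):
    """Trim both sequences to the shorter length instead of raising on a mismatch."""
    n = min(len(logits), len(labels))
    return logits[:n], labels[:n]


# Model-side truncation left 4 logit positions for 6 label positions.
logits, labels = align_to_min_length([0.1, 0.2, 0.3, 0.4], [11, 12, 13, 14, 15, 16])
```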

* Fix TRL 0.26+ thin wrapper resolution, enable ModernBERT SDPA, clean up warning filters

TRL 0.26+ thin wrapper resolution (rl.py):
- Filter _-prefixed private imports when discovering Trainer/Config classes
- Look up Config in separate *_config.py module when not found in trainer module
- Detect thin wrappers (<1000 chars source) and resolve to experimental parent
  via MRO walk; use resolved module for imports and create_new_function
- Enables all 15 trainers to patch successfully (was 5/15 before)

ModernBERT SDPA (loader.py):
- Remove "modernbert" from DISABLE_SDPA_MODEL_NAMES
- SDPA works correctly for both classification and sentence transformers
- Verified: 88.9% accuracy on emotion classification, correct domain-specific
  embeddings after sentence transformer fine-tuning

Warning filter cleanup (import_fixes.py):
- Remove cuda.cudart/cuda.nvrtc FutureWarning filters (no such warnings
  exist in torch 2.9.1+; proactive suppression is unnecessary)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove multi_modal_projector.layer_norm from uninitialized weight guard

The LFM2.5-VL projector LayerNorm is properly initialized by
transformers and does not need to be excluded from the uninitialized
weight check. The original exclusion was added as a workaround but is
no longer needed after the upstream fix.

* Add transformers 5.0 compat: rope_theta helper, config-as-dim detection, BatchEncoding guard, try/except for TRL trainer source, push_to_hub_token compiler fix

- llama.py: Add _get_rope_theta() helper handling both config.rope_theta and rope_parameters dict
- llama.py: Handle BatchEncoding in unsloth_fast_generate (transformers 5.0+ returns BatchEncoding from apply_chat_template)
- gemma.py: Detect config passed as dim arg in GemmaFixedRotaryEmbedding
- tokenizer_utils.py: Add try/except for TRL trainer getsource in patch_sft_trainer_tokenizer
- rl_replacements.py: Add compiler fix replacing bare pop("push_to_hub_token") with pop(..., None)
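
A sketch of a helper that reads the RoPE base from either config layout. `get_rope_theta`, the `"rope_theta"` key inside `rope_parameters`, and the 10000.0 default are assumptions for illustration; the actual `_get_rope_theta()` in llama.py may differ.

```python
from types import SimpleNamespace


def get_rope_theta(config, default=10000.0):
    """Read rope_theta from the attribute or from the rope_parameters dict."""
    theta = getattr(config, "rope_theta", None)
    if theta is not None:
        return theta  # older layout: plain attribute on the config
    params = getattr(config, "rope_parameters", None)
    if isinstance(params, dict) and "rope_theta" in params:
        return params["rope_theta"]  # newer layout: nested dict
    return default  # assumed fallback, not confirmed by the commit


old_style = SimpleNamespace(rope_theta=500000.0)
new_style = SimpleNamespace(rope_parameters={"rope_theta": 1000000.0})
```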

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use trl.experimental string check instead of char-count heuristic for thin wrapper detection

The <1000 / >1000 char threshold was fragile -- XPOConfig's parent is only
994 chars and would be skipped. All thin wrappers in TRL 0.26+ contain
"trl.experimental" in their deprecation warning, while no real trainer or
config class does, making it a reliable detection marker.

* Move DISABLE_SDPA_MODEL_NAMES import to module level in sentence_transformer

The function-level import was redundant since loader.py is already imported
at module level. Move it to the existing loader import line.

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-09 05:11:50 -08:00
siddhu donda
884ce4601f fix: add inputs_embeds support in _fast_prepare_inputs_for_generation (#3798) (#3814)
Add `inputs_embeds` parameter to `_fast_prepare_inputs_for_generation` so
`model.generate(inputs_embeds=...)` works with Unsloth-patched models.

Changes:
- Add `inputs_embeds=None` to function signature (fixes HF inspect check)
- Track `use_inputs_embeds` flag: True when inputs_embeds provided and no cache
- Conditionally return inputs_embeds on first step, input_ids on subsequent steps
- Handle input_ids being None/empty for batch size and device extraction
- Add attention_mask None-guard before slicing
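
The first-step versus later-step selection can be sketched as below. `select_model_inputs` is a hypothetical reduction of the logic, not the patched `_fast_prepare_inputs_for_generation`.

```python
def select_model_inputs(input_ids, inputs_embeds=None, past_key_values=None):
    """Use inputs_embeds only on the first step, before any cache exists."""
    use_inputs_embeds = inputs_embeds is not None and past_key_values is None
    if use_inputs_embeds:
        return {"inputs_embeds": inputs_embeds}
    # Later steps decode from the newly generated token ids.
    return {"input_ids": input_ids}
```

Once a cache exists the embeddings for the prompt are already consumed, so passing them again would not match the single new position being decoded.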

Fixes: https://github.com/unslothai/unsloth/issues/3798

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: siddhudonda <siddhudonda@users.noreply.github.com>
2026-02-09 04:59:43 -08:00
Daniel Han
3b1e8d0ae6 Update README.md 2026-02-09 04:50:54 -08:00
Daniel Han
60dd7269a5 Fix broken documentation links, typos, and formatting in README (#4003)
- Fix 14 broken documentation links (all returning 404) caused by docs
  site restructuring (install-and-update -> install, pages moved to
  /docs/blog/ and /docs/models/tutorials/)
- Fix "Qwen2.3-VL" -> "Qwen3-VL" (model does not exist)
- Fix incorrect "GSPO" label on gpt-oss GRPO notebook
- Fix "4b-bit" typo -> "4-bit"
- Fix "sodoku" typo -> "sudoku"
- Fix double dash formatting on FP8 GRPO notebook list item
- Fix citation URL from http:// to https://
- Update "MultiGPU coming soon" to "is now supported"
- Fix Windows installation step numbering (1,3,5,6,7 -> 1,2,3,4,5)
- Fix Advanced/Troubleshooting step numbering (5,6,5 -> 4,5,6)

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-09 04:46:46 -08:00
Fizza Mukhtar
c98312f229 Fix multi-GPU loading for quantized models in distributed training (#3917)
When using torchrun with quantized models (4bit/8bit/fp8), each rank
must load the model directly onto its own GPU. The default device_map
("sequential") places everything on GPU 0, causing illegal memory
access errors when Accelerate tries to relocate quantized weights.

Use the existing prepare_device_map() utility from loader_utils to
detect distributed training via LOCAL_RANK/WORLD_SIZE env vars and
override device_map to target each rank's local GPU. This is applied
in both FastLanguageModel.from_pretrained and FastModel.from_pretrained,
covering text, vision, and audio model paths.
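
A sketch of the per-rank override, assuming detection means WORLD_SIZE > 1 with LOCAL_RANK set. `pick_device_map` is a hypothetical name and not the `prepare_device_map()` utility itself; `{"": index}` is the Hugging Face convention for placing a whole model on one device.

```python
import os


def pick_device_map(default="sequential", env=None):
    """Return a device_map targeting this rank's GPU under distributed launch."""
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    local_rank = env.get("LOCAL_RANK")
    if world_size > 1 and local_rank is not None:
        # Place every module on this rank's own GPU.
        return {"": int(local_rank)}
    return default
```

Under `torchrun --nproc_per_node=2`, rank 1 would get `{"": 1}`, while a plain single-process run keeps the default.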

Fixes #3914

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-09 04:26:21 -08:00
Mohammad Miadh Angkad
336bec216a Refactor Ollama template wiring and harden packing helpers (#3890)
* Refactor Ollama template wiring and harden packing helpers

Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>

* Fix Qwen3 and Gemma3n template bindings and tidy packing test helper

* Fix gptoss Ollama comment and tinyllama stop parameter

- Fix wrong comment referencing gemma3n for gptoss_ollama in chat_templates.py
- Add missing stop keyword to tinyllama PARAMETER in ollama_template_mappers.py

* Fix _DummyTrainer compatibility across TRL versions

The try/except only handled the removal of return_position_ids
(TRL v0.24+) but not the absence of padding_free (TRL v0.18.2).
Gracefully degrade through all optional collator flags so the
test works from trl>=0.18.2 through v0.27+.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-09 04:04:48 -08:00
RektPunk
f868d8b073 [Feature] separate gguf file path (#3934)
* separate gguf

* fix Modelfile log

* ollama Modelfile create

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix GGUF file placement: move initial conversion to _gguf dir, fix cleanup

- Move initial GGUF files (from convert_to_gguf) into {model_directory}_gguf/
  immediately after conversion, so all GGUF outputs live in the dedicated
  directory regardless of quantization method (fixes bf16-only case where
  quant == first_conversion skipped the loop and _gguf dir was never created)
- Remove redundant gguf_directory/makedirs from inside the re-quant loop
  since the directory is now created before the loop
- Use Path.unlink(missing_ok=True) for base GGUF cleanup robustness
- Unify Modelfile location to {save_directory}_gguf/Modelfile for both
  VLM and non-VLM models
- Fix print message to show actual modelfile_location path
- Add gguf_directory key to return dict
- Clean up {save_directory}_gguf in push_to_hub_gguf error/finally blocks

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-09 04:00:14 -08:00
Etherll
315178b5c3 Add push_to_hub_gguf support for FastSentenceTransformer (#4002)
* Implement GGUF upload method for SentenceTransformer

Added a method to convert and upload SentenceTransformer models to GGUF format, including handling of tokenizer, quantization methods, and repository management on Hugging Face Hub.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-09 00:51:26 -08:00
Daniel Han
b47b081f99 Fix triton 3.6.0 + torch 2.9.x torch.compile crash (missing cluster_dims) (#4001)
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-08 20:18:25 -08:00
Daniel Han
c43a5b8f02 Fix multiprocessing crash on Windows/macOS and unify num_proc logic (#3999)
On Windows and macOS (Python 3.8+), multiprocessing uses the spawn
start method. When datasets .map(num_proc=N) is called, it creates a
Pool(N) which re-imports __main__ in each worker, causing infinite
recursion and a RuntimeError during bootstrapping.

Guard the auto-computed dataset_num_proc in the generated Config
__init__ by checking multiprocessing.get_start_method() != 'fork'.
When the start method is not fork (spawn/forkserver), force
dataset_num_proc = None so datasets takes the single-process path.
Linux fork behavior is unchanged.

Also replace the fixed memory threshold logic with the simpler
adaptive approach: cap at 64, then min(num_proc, int(available_gb)),
with a safety floor of 1 when available memory is at or below 2GB.
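
A rough sketch of the selection logic described in this message, assuming the inputs are passed in explicitly. The function name `pick_dataset_num_proc` and its parameters are hypothetical; the real generated Config __init__ may compute the CPU count and available memory differently.

```python
import multiprocessing


def pick_dataset_num_proc(cpu_count, available_gb, start_method=None):
    """Choose a num_proc value for datasets .map(), or None for single-process."""
    if start_method is None:
        start_method = multiprocessing.get_start_method()
    # spawn / forkserver re-import __main__ in every worker, so stay single-process.
    if start_method != "fork":
        return None
    # Safety floor: with 2GB or less available, use one worker.
    if available_gb <= 2:
        return 1
    # Cap at 64 workers, then allow at most one worker per available GB.
    num_proc = min(cpu_count, 64)
    return max(1, min(num_proc, int(available_gb)))
```

For example, a Linux machine with 128 cores and 512GB free would be capped at 64 workers, while the same call under the spawn start method returns None.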

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-08 02:50:06 -08:00
shine1i
9b4293c677 auth ui flow 2026-02-07 16:14:02 +01:00
shine1i
27a11b6383 merge nightly 2026-02-07 14:30:46 +01:00
Roland Tannous
d54671b1a6 Merge pull request #22 from unslothai/feature/auth
Added jwt authentication
2026-02-07 14:27:35 +04:00
sshah229
0082627801 fixed the errors- renamed jwt to authentication, used raw jwt, and removed search route 2026-02-07 03:14:30 -07:00
shine1i
58b40a9015 inline sizes, aux nodes resize, and auto layout keep nodes close logic 2026-02-06 13:18:18 +01:00
shine1i
baa2a8426c cleanup and restructure 2026-02-06 12:54:55 +01:00
shine1i
0db132813b decouple aux nodes and text dom for aux nodes 2026-02-06 12:12:32 +01:00
shine1i
03b947d8a2 fix edge desync 2026-02-06 11:47:29 +01:00
shine1i
8f1e6622ca ui buttons move, squircle boxes, and resize 2026-02-06 11:28:32 +01:00
sshah229
9c28c592a4 Merge branch 'feature/auth' of https://github.com/unslothai/new-ui-prototype into feature/auth 2026-02-06 03:19:22 -07:00
sshah229
50ff5626f1 refactored the code for username/password and added pydantic models and routes for the same 2026-02-06 03:15:30 -07:00
sshah229
009e93f079 Refactored the training and model routes and added the jwt authentication 2026-02-06 03:15:30 -07:00
sshah229
daa20821c1 refactored the code for username/password and added pydantic models and routes for the same 2026-02-06 02:56:54 -07:00
shine1i
b15f4e7ad5 inline dialogs and react flow ui refactor WIP p1 2026-02-06 00:42:35 +01:00
shine1i
271ddcfb4a refactor of payload files 2026-02-05 23:41:27 +01:00
shine1i
1ca01e5d21 processors and drop column 2026-02-05 22:26:48 +01:00
shine1i
d67efb8516 new blocks timedelta, and some tweaks 2026-02-05 21:55:50 +01:00
shine1i
2a9a332ce3 model config and provider fixes and inline dialog 2026-02-05 20:55:39 +01:00
pluesclues
c6de138e62 Update rl_replacements.py (#3990) 2026-02-05 08:22:42 -08:00
Daniel Han
3a4b1e7fc5 Disable torchcodec in transformers when FFmpeg is missing (#3989)
* Disable torchcodec in transformers when FFmpeg is missing

When torchcodec is installed but FFmpeg libraries are unavailable,
transformers still thinks torchcodec is available (via find_spec check)
and tries to use it for audio loading, causing RuntimeError.

This adds disable_torchcodec_if_broken() which tests if torchcodec can
actually load its native libraries, and if not, patches transformers'
_torchcodec_available to False so it falls back to librosa instead.
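
The difference between "installed" and "actually loadable" can be illustrated with a small sketch. The helper name `module_really_loads` is hypothetical and this is not the body of disable_torchcodec_if_broken(); it only shows why a find_spec check alone is insufficient.

```python
import importlib
import importlib.util


def module_really_loads(name):
    """Return True only if the module is both discoverable and importable."""
    # find_spec only proves the package exists on disk.
    if importlib.util.find_spec(name) is None:
        return False
    try:
        # Importing is what triggers loading of native libraries (e.g. FFmpeg).
        importlib.import_module(name)
    except (ImportError, RuntimeError, OSError, AttributeError):
        return False
    return True
```

A caller could use a False result as the signal to mark the optional backend as unavailable and fall back to another loader.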

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-05 06:54:09 -08:00
Daniel Han
145f6aaeb1 Fix cutlass inductor options for PyTorch < 2.8.0 (#3988)
The cuda.cutlass_epilogue_fusion_enabled and cuda.cutlass_tma_only
inductor config options were added in PyTorch 2.8.0. Using these
options on older PyTorch versions causes a RuntimeError during
GRPOTrainer initialization.

This fix adds a version check to only include these options when
running PyTorch 2.8.0 or later, allowing GRPO training to work on
older PyTorch versions (e.g., Colab environments with PyTorch 2.5-2.7).
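
A minimal sketch of such a version gate, assuming the version string is parsed by hand. The helper names and the option values (True) are illustrative assumptions; only the two option names and the 2.8.0 threshold come from the message above.

```python
import re


def version_tuple(version):
    """Parse the leading 'major.minor' of a string such as '2.7.1+cu126'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def cutlass_options(torch_version):
    """Return the cutlass inductor options only when the torch version has them."""
    options = {}
    if version_tuple(torch_version) >= (2, 8):
        # Values are placeholders for illustration, not the shipped settings.
        options["cuda.cutlass_epilogue_fusion_enabled"] = True
        options["cuda.cutlass_tma_only"] = True
    return options
```

On PyTorch 2.5 through 2.7 the returned dict is empty, so nothing unknown is passed to the inductor config.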

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-05 06:40:11 -08:00
Daniel Han
7b42acae94 Fix RuntimeError not caught when torchcodec fails to load (#3987)
When datasets library has torchcodec installed but FFmpeg libraries
are missing, torchcodec raises a RuntimeError during import. The
exception handler only caught ImportError and AttributeError, causing
the error to propagate and crash Unsloth imports in environments
like Colab where FFmpeg may not be installed.

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-02-05 06:35:10 -08:00
Daniel Han
ce256c43bc Merge branch 'main' of https://github.com/unslothai/unsloth 2026-02-05 06:10:06 -08:00
Daniel Han
f463f692d6 MoE release 2026-02-05 06:09:56 -08:00
Datta Nimmaturi
fad6957555 [MoE] Improve moe kernels for unsloth fine tuning (#3812)
* Improve MoE performance

* small changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix imports

* disable autotune

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* LoRA for MoE

* Make autotune default

* make dy contiguous

* use non lora model as base for RL

* Revert "use non lora model as base for RL"

This reverts commit bc8f15629d060593b2eaf436f158ff5ac9df0d5d.

* fixup derp

* non TMA [T4]

* Revert "non TMA [T4]"

This reverts commit 35304566690e7c9ab9632899920c85bff322409a.

* Fixes for VL MoE and v5 transformers

* [transformers] [v5] remove unused hybridcache (#3910)

* remove unused hybridcache

* cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* No double compile for qwen3moe

* Fix top_k on trl GRPO

* Recognise GLM as MoE

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing RotaryEmbeddingConfigMixin

* Licensing for autotuning cache

* Cleanup

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-02-05 06:03:25 -08:00
Daniel Han
2883ce4091 Update _utils.py 2026-02-05 05:58:00 -08:00
Daniel Han
ff3f78b6b9 Add PyTorch 2.10 and xformers 0.0.34 support (#3985)
- Add cu126/cu128/cu130 xformers 0.0.34 wheel dependencies for torch 2.10
- Add cu126-torch2100, cu128-torch2100, cu130-torch2100 meta-dependencies
- Add cu126-ampere-torch2100, cu128-ampere-torch2100, cu130-ampere-torch2100 variants
- Update _auto_install.py version detection for torch 2.10.x
- Add CUDA check for torch 2.10 (requires CUDA 12.6, 12.8, or 13.0)
- Update README.md with torch 2.10 installation instructions

Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
2026-02-05 05:56:26 -08:00
Daniel Han
7ceebe4554 Silence non-actionable TRL trainer import failures (#3980)
_patch_trl_rl_trainers enumerates all trainer modules from dir(trl.trainer)
and attempts to import each one. Modules like alignprop_trainer fail because
they depend on optional packages (diffusers) that may not be installed. The
failure is harmless but the print() call produces noise on every import.

Change print() to logger.info() so these messages only appear when
UNSLOTH_ENABLE_LOGGING=1.
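
The pattern can be sketched as below. The logger name, the `try_import_trainers` helper and the `importer` callback are hypothetical stand-ins; the sketch only assumes that log verbosity is tied to the UNSLOTH_ENABLE_LOGGING environment variable.

```python
import logging
import os

logger = logging.getLogger("unsloth_sketch")
logger.addHandler(logging.NullHandler())
# INFO messages are only emitted when the user opts in.
enabled = os.environ.get("UNSLOTH_ENABLE_LOGGING", "0") == "1"
logger.setLevel(logging.INFO if enabled else logging.WARNING)


def try_import_trainers(names, importer):
    """Try each trainer module; log failures quietly and return their names."""
    failed = []
    for name in names:
        try:
            importer(name)
        except Exception as error:
            # Previously a print(); now silent unless logging is enabled.
            logger.info("skipping %s: %s", name, error)
            failed.append(name)
    return failed
```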

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-02-05 05:32:52 -08:00
Daniel Han
5798267401 Silence third-party deprecation warnings and fix socket leak (#3983)
* Silence third-party deprecation warnings and fix socket resource leak

- Add warning filters for TorchAO deprecated import paths
- Filter SWIG builtin type warnings from bitsandbytes/triton
- Filter Triton autotuner deprecation warnings
- Filter Python 3.12+ multiprocessing fork warnings
- Filter resource warnings for unclosed sockets/files
- Fix socket leak in has_internet() by properly closing socket
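
The socket fix in the last item can be illustrated with a context manager, which closes the socket on every path. This is a sketch under assumptions: the default host, port and timeout are placeholders and not necessarily what has_internet() uses.

```python
import socket


def has_internet(host="1.1.1.1", port=53, timeout=3.0):
    """Return True if a TCP connection can be opened, always closing the socket."""
    try:
        # The with-block closes the socket even if an exception is raised,
        # which avoids the ResourceWarning for unclosed sockets.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```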

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-05 04:55:52 -08:00
Daniel Han
649865caca Fix GPT-OSS BlockMask error during inference (#3982)
GPT-OSS models use eager attention during inference because flex
attention returns incorrect results (likely due to left padding).
However, when _attn_implementation is set to "flex_attention",
transformers creates BlockMask objects which cause a TypeError
when passed to the eager attention path:

  TypeError: unsupported operand type(s) for +=: 'Tensor' and 'BlockMask'

This fix excludes GPT-OSS from using flex_attention, keeping it on
the eager path to avoid the BlockMask/Tensor type mismatch.
2026-02-05 04:28:46 -08:00
shine1i
7350e52f2f feat: Model Provider and Model config 2026-02-05 12:20:11 +01:00
Daniel Han
6f3e52bbcf Prefer flex attention when available (#3979)
* Enable flex attention by default

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Avoid dropping flex attention when SDPA unsupported

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-05 03:19:04 -08:00
shine1i
fbf5a30c77 sheet icons and llm judge 2026-02-05 11:38:11 +01:00
pluesclues
9b34982509 Trl 0.27.0 update (#3965)
* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update rl_replacements.py

* Update rl.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update rl_replacements.py, remove chat template from codexes commits

* Update rl.py, got rid of gradient checkpointing code that did not work

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-04 23:01:16 -08:00
shine1i
4a909ded0e handle layouting 2026-02-04 17:21:25 +01:00
shine1i
35721763c3 straight lines 2026-02-04 16:37:42 +01:00
shine1i
fbb8adbab5 convert to, lines and fixes 2026-02-04 16:33:42 +01:00
shine1i
4234e23f68 refactor: centralize block definitions and dialogs into registry, streamline node updates using helper utilities 2026-02-04 15:42:46 +01:00
shine1i
dc56229eda save and import, and fixes 2026-02-04 14:32:49 +01:00
shine1i
9b10d81a46 import export 2026-02-04 14:22:03 +01:00
shine1i
3a4e54ef02 sheet tidy 2026-02-04 14:11:09 +01:00
shine1i
1d5d1b625b canvaslab v1 2026-02-04 14:06:38 +01:00
Daniel Han
e1c682e6d2 Fix torchvision compatibility check for source builds and future torch versions (#3978)
* Fix torchvision compatibility check for source builds and future torch versions

The torchvision version check raised a hard ImportError for custom/source-built
PyTorch installations (e.g. AMD ROCm from source with +git* suffixes), even when
the actual build was functional. This also silently skipped any torch version
not already in the hardcoded table, giving no warning at all for future releases.

Changes:
- Detect custom/source builds by checking the raw version string's local
  identifier against known standard prefixes (cu, rocm, cpu, xpu). Our custom
  Version() strips local identifiers via regex, so detection must happen on the
  raw string before parsing.
- Downgrade to a warning (instead of ImportError) for custom/source builds,
  since their version numbers may not follow standard PyPI release pairings.
- Add formula-based inference for future torch versions not yet in the table.
  The torch->torchvision minor version formula (torch 2.x -> tv 0.(x+15)) has
  held for every release from torch 2.0 through 2.9. For formula-predicted
  versions, mismatches produce a warning rather than a hard error.
- Add UNSLOTH_SKIP_TORCHVISION_CHECK=1 env var to skip the check entirely.
- Wrap importlib_version and Version calls in try/except so broken metadata
  never crashes the import.
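
The inference formula from the list above, written out as a tiny sketch. The function name is hypothetical, and the sketch deliberately refuses anything outside torch 2.x because the message only states the pairing for that series.

```python
def expected_torchvision_version(torch_major, torch_minor):
    """Predict the (major, minor) torchvision release paired with torch 2.x."""
    if torch_major != 2:
        raise ValueError("the pairing formula is only stated for torch 2.x")
    # torch 2.x pairs with torchvision 0.(x + 15)
    return (0, torch_minor + 15)
```

For instance, torch 2.0 maps to torchvision 0.15 and torch 2.9 maps to torchvision 0.24; a version predicted this way only justifies a warning on mismatch, not a hard error.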

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: stricter regex, case insensitivity, pre-release detection

Fixes three edge cases found during review:

1. Regex precision: cu/xpu now require a trailing digit (cu\d, xpu\d) to
   avoid false negatives on suffixes like "+custom_build" that happen to
   start with "cu". cpu/xpu match as exact strings only.

2. Case insensitivity: added re.IGNORECASE so "+ROCM6.3" and "+CPU" are
   correctly recognized as standard builds rather than custom ones.

3. Pre-release detection: nightly/dev/alpha/beta/rc builds with standard
   CUDA/ROCm suffixes (e.g. "2.7.0.dev20250301+cu124") now produce a
   warning instead of a hard ImportError. These builds commonly have
   version mismatches that are expected during development.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address PR review comments: fullmatch, env var casing, torchvision pre-release

1. Switch re.match to re.fullmatch for the custom build regex so the
   entire local identifier must match. Fixes false negatives where
   suffixes like +cu124_custom were misclassified as standard because
   re.match only checked the start of the string.

2. Use .lower() for the UNSLOTH_SKIP_TORCHVISION_CHECK env var so
   any casing of "true" / "TRUE" / etc. is accepted.

3. Check torchvision_version_raw for pre-release tags in addition to
   torch_version_raw, so a stable torch paired with a nightly
   torchvision (e.g. 0.23.0.dev...) also gets a warning instead of
   a hard ImportError.
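
The effect of switching from re.match to re.fullmatch can be sketched as follows. The pattern here is a hypothetical approximation written for illustration; the regex in the actual patch may differ.

```python
import re

# Assumed set of "standard" local identifiers, matched case-insensitively.
_STANDARD_LOCAL = re.compile(r"cu\d+|rocm\d+(?:\.\d+)*|cpu|xpu", re.IGNORECASE)


def is_custom_build(raw_version):
    """Return True if the local identifier after '+' is not a standard one."""
    _, plus, local = raw_version.partition("+")
    if not plus:
        # No local identifier at all: treat as a standard release.
        return False
    # fullmatch requires the whole suffix to match, so '+cu124_custom'
    # is no longer accepted just because it starts with 'cu124'.
    return _STANDARD_LOCAL.fullmatch(local) is None
```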

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-04 04:50:26 -08:00
sshah229
482e934e09 Refactored the training and model routes and added the jwt authentication 2026-02-04 05:49:29 -07:00
Wasim Yousef Said
82d42fd8e2 Merge pull request #21 from unslothai/feature/cleanup
biome fixes, and linter fixes for some issues, and some readability changes
2026-02-04 04:33:06 -08:00
shine1i
a48bb53e14 cleanup 2026-02-04 13:28:39 +01:00
shine1i
931891b207 add canvaslab 2026-02-04 13:22:41 +01:00
Roland Tannous
a295a624ce Merge pull request #20 from unslothai/feature/datasets-endpoint
Add Datasets Check-Format Endpoint
2026-02-04 00:45:31 +04:00
Roland Tannous
75bb6c08a5 Add datasets check-format endpoint 2026-02-03 20:42:25 +00:00
Roland Tannous
ddf8fd59eb Merge pull request #19 from unslothai/fix/dataset-utils-custom-mapping
fix custom_format_mapping flow for manual column mapping
2026-02-03 22:43:06 +04:00
Roland Tannous
9f9618980d fix custom_format_mapping flow for manual column mapping 2026-02-03 18:42:07 +00:00
Roland Tannous
5ca7de3698 Merge pull request #18 from unslothai/refactor/add-dataset-detection-status-flag
Add `requires_manual_mapping` Flag for Dataset Detection
2026-02-03 22:22:50 +04:00
Roland Tannous
8bd06e3e35 Add Flag for Dataset Detection 2026-02-03 18:21:05 +00:00
Roland Tannous
c88bf7c1a7 Merge pull request #17 from unslothai/fix/refactor-dataset-utils-part2
Fix/refactor dataset utils part2
2026-02-03 22:04:03 +04:00
Roland Tannous
f57757231b remove duplicates from dataset_utils.py 2026-02-03 18:03:01 +00:00
Roland Tannous
08cfa1ab64 Merge pull request #16 from unslothai/refactor/inference-api-routes-part-1
refactor/inference-api-routes-part-1
2026-02-03 21:00:18 +04:00
Roland Tannous
b4ec0389f0 refactor/inference-api-routes-part-1 2026-02-03 16:57:57 +00:00
Roland Tannous
6df5b1eded Merge pull request #15 from unslothai/enhance/refactor-dataset-utils
Refactor `dataset_utils.py` into focused modules
2026-02-03 18:40:53 +04:00
Roland Tannous
62ddcfa019 Refactor `dataset_utils.py` into focused modules 2026-02-03 14:38:02 +00:00
Daniel Han
4f75ec2fc8 Add vLLM + torch < 2.9.0 + SM100 compatibility check (#3973)
vLLM's distributed module (device_communicators) crashes with std::bad_alloc
when imported on SM100 GPUs (B200/B100/Blackwell) with torch < 2.9.0.

This adds an early check that runs before vLLM is imported, providing a
helpful error message instead of a cryptic C++ exception.

The check:
1. Detects if vLLM is installed
2. Checks if torch version is < 2.9.0
3. Checks if any GPU is SM100 (Blackwell)
4. If all conditions are met, raises RuntimeError with clear upgrade instructions
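
The four steps above can be sketched as a pure function. Everything here is an illustrative assumption: the function name, its parameters and the error text are hypothetical, and in practice the capabilities would come from something like torch.cuda.get_device_capability() for each device.

```python
def check_vllm_sm100(vllm_installed, torch_version, gpu_capabilities):
    """Raise early if vLLM would be imported on SM100 with torch < 2.9.0."""
    # Drop any local suffix such as '+cu128' before comparing.
    major, minor = (int(part) for part in torch_version.split("+")[0].split(".")[:2])
    # SM100 corresponds to compute capability (10, 0).
    has_sm100 = any(tuple(cap) == (10, 0) for cap in gpu_capabilities)
    if vllm_installed and (major, minor) < (2, 9) and has_sm100:
        raise RuntimeError(
            "vLLM on SM100 GPUs needs torch >= 2.9.0; "
            f"found torch {torch_version}. Please upgrade torch."
        )
```

Running the check before vLLM is imported turns the std::bad_alloc crash into an actionable Python exception.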
2026-02-03 03:10:24 -08:00
Daniel Han
d5f5b7d6a6 Add TRL truncation regression and metadata loss fixes (Fixes 1 and 3) (#3971)
* Add TRL truncation regression and metadata loss fixes

Fix 1: TRL 0.24.0-0.25.1 right-truncation regression
- These versions pass max_length=self.max_prompt_length and truncation=True
  to the tokenizer, which right-truncates prompts and strips the assistant
  turn suffix
- Use regex to remove these kwargs from the generated code
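
A minimal sketch of what such a regex-based removal might look like on a source string. The sample tokenizer call and the function name are made up for illustration; the real TRL source and the patterns used in the patch may differ.

```python
import re


def strip_truncation_kwargs(source):
    """Remove the max_length / truncation kwargs from generated trainer code."""
    # Each pattern also consumes the whitespace before the kwarg so that
    # no empty line is left behind.
    source = re.sub(r"\s*\bmax_length\s*=\s*self\.max_prompt_length\s*,", "", source)
    source = re.sub(r"\s*\btruncation\s*=\s*True\s*,", "", source)
    return source


# Hypothetical snippet standing in for the affected TRL code.
sample = (
    "inputs = tokenizer(\n"
    "    text=prompts_text,\n"
    "    padding=True,\n"
    "    max_length=self.max_prompt_length,\n"
    "    truncation=True,\n"
    "    add_special_tokens=False,\n"
    ")\n"
)
cleaned = strip_truncation_kwargs(sample)
```

Sources that do not contain these kwargs pass through unchanged.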

Fix 3: Metadata loss for chat_template_kwargs
- TRL 0.24.0+ extracts prompts = [x["prompt"] for x in inputs], losing metadata
  like reasoning_effort
- Inject code to store per-sample chat_template_kwargs on self before extraction
- Preserve these kwargs in prompts_text generation for all TRL versions

Tested with TRL versions 0.22.2, 0.23.1, 0.24.0, 0.25.1, 0.26.2, and 0.27.1.

* Update Fix 1 comment with detailed TRL version behavior explanation

Expand the comment for the TRL 0.24.0-0.25.1 truncation regression fix
to clarify what each TRL version does:

- TRL 0.22.2-0.23.1: Uses truncate_with_protected_tokens() for smart
  truncation that preserves rightmost tokens and protects special tokens
- TRL 0.24.0-0.25.1: Removed smart truncation, passes kwargs directly
  to tokenizer (max_length, truncation=True, add_special_tokens=False)
- TRL 0.26.2+: Removed these kwargs entirely

The fix removes these problematic kwargs so 0.24.0-0.25.1 behaves like
0.26.2+ (no tokenizer-level truncation).

---------

Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>
2026-02-03 03:00:12 -08:00
Daniel Han
8f44ae0eda Fix num_train_epochs=None causing TypeError in GRPOConfig (#3972)
When users pass `num_train_epochs=None` to GRPOConfig (relying on
max_steps to control training duration), Trainer.__init__ fails with:

  TypeError: '>' not supported between instances of 'NoneType' and 'int'

This happens because transformers.Trainer does `args.num_train_epochs > 0`
in its __init__ which fails when the value is None.

This fix converts None to 3.0 (the default) before Trainer initialization.
The actual training duration is still controlled by max_steps since it
takes precedence when both are set.

Example that now works:
```python
config = GRPOConfig(
    num_train_epochs=None,  # Previously caused TypeError
    max_steps=500,          # This controls actual duration
    ...
)
```
2026-02-03 02:48:40 -08:00
Roland Tannous
d8183237f3 Merge pull request #14 from unslothai/feature/pydantic-models-update
Feature/pydantic models update
2026-02-03 14:47:12 +04:00
Roland Tannous
7bb0aeb756 add grad_norm and num_tokens to TrainingProgress response object 2026-02-03 10:35:58 +00:00
Daniel Han
41417693e4 Fix Vision GRPO string prompts and OpenEnv async compatibility (#3964)
* [fix] Vision GRPO string prompts and OpenEnv async compatibility

- Guard prepare_multimodal_messages in GRPO trainer to skip processing
  when prompts are pre-templated strings. Notebooks that pre-apply
  apply_chat_template() produce strings with image tokens already
  embedded; calling prepare_multimodal_messages on those crashes with
  TypeError.
- Apply nest_asyncio when OpenEnv EnvClient exposes async reset/step,
  so scripts using run_until_complete() wrappers work in all contexts.
- Add wrapper to call patch_torchcodec_audio_decoder() from unsloth_zoo
  for AudioDecoder dict-compatibility.

* Add apply_chat_template guard for pre-templated string prompts in Vision GRPO

When notebooks pre-apply apply_chat_template, prompts become strings.
The existing guard skips prepare_multimodal_messages for strings. This
adds a second guard to skip apply_chat_template in the forward_kwargs
block, using prompts directly as prompts_text instead. Covers both
TRL 0.25.x (no tools param) and TRL 0.26.2+ (with tools=self.tools).
Non-matching replacements silently pass for older TRL versions.

* Add TRL 0.25.1 single-line variant for apply_chat_template guard

TRL 0.25.1 uses single-line formatting for apply_chat_template:
  apply_chat_template({"prompt": prompt}, ...)["prompt"]

While TRL 0.26.2+ uses multi-line formatting:
  apply_chat_template(
      {"prompt": prompt}, ...
  )["prompt"]

Add both variants to ensure full backwards compatibility.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-03 02:03:46 -08:00
Daniel Han
949f1ce573 Fix TRL 0.27.0 GRPO compatibility and PEFT model handling (#3969)
* Fix TRL 0.27.0 GRPO compatibility and PEFT model handling

- Remove use_reentrant=False from gradient_checkpointing_kwargs for TRL 0.27.0+
  TRL 0.27.0 auto-sets use_reentrant=False in GRPOConfig.__post_init__, but
  Unsloth gradient checkpointing requires use_reentrant=True. This adds a
  post-init cleanup that removes the setting when present.

- Handle prepare_peft_model standalone function pattern for TRL 0.22.0+
  TRL changed from self._prepare_peft_model() method to prepare_peft_model()
  standalone function. Both patterns are now bypassed to let Unsloth handle
  PEFT model preparation.
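
The first item's post-init cleanup can be sketched as a small dictionary operation. The helper name `drop_use_reentrant` is hypothetical and the sketch does not touch GRPOConfig itself; it only shows removing the key when present.

```python
def drop_use_reentrant(gradient_checkpointing_kwargs):
    """Return a copy of the kwargs without 'use_reentrant', if it was set."""
    if not gradient_checkpointing_kwargs:
        return gradient_checkpointing_kwargs
    cleaned = dict(gradient_checkpointing_kwargs)
    # pop with a default is a no-op when the key is absent.
    cleaned.pop("use_reentrant", None)
    return cleaned
```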

Tested with TRL versions 0.22.2, 0.23.1, 0.24.0, 0.25.1, 0.26.2, and 0.27.1.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-03 01:56:31 -08:00
Kaitao Yang
7dd3ae8768 reduce code duplication (#3877)
* reduce code duplication

* address reviewer feedback: keep original function name

- Keep original function name `_offload_frozen_module_for_training`
- Make `offload_device` parameter Optional (can be None)
- Keep original error handling (return None for missing modules_to_save)
- Maintain code deduplication by reusing the helper function

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-02-03 00:27:49 -08:00
Daniel Han
8f0b57ae18 Use standard gradient checkpointing for small sequence lengths (#3867)
* Use standard gradient checkpointing for small sequence lengths

When max_seq_length < 512, the overhead of gradient offloading in
gc="unsloth" mode is not worth it. Benchmarks on B200 show:

| seq_len | gc=unsloth | gc=True    | Difference |
|---------|------------|------------|------------|
| 256     | 6,803 t/s  | 6,993 t/s  | +2.8%      |
| 384     | 9,889 t/s  | 9,963 t/s  | +0.7%      |
| 512     | 13,151 t/s | 13,092 t/s | -0.4%      |
| 1024    | 26,662 t/s | 25,094 t/s | -5.9%      |

The crossover point is around seq_len 384-512. For sequences shorter
than 512, we now automatically use standard gradient checkpointing
instead of the custom offloading implementation.

Additionally, when user explicitly sets use_gradient_checkpointing to
True or False in get_peft_model, it now correctly overrides any
previous "unsloth" patching from from_pretrained. This ensures
consistent behavior regardless of the order of function calls.

Updated in three locations:
- FastLlamaModel.get_peft_model (llama.py)
- FastLanguageModel.from_pretrained (loader.py)
- FastModel.from_pretrained (loader.py)

* Refactor: extract gradient checkpointing heuristic into utility function

Addresses code review feedback to reduce duplication. The gradient
checkpointing heuristic logic was duplicated in 3 places:
- FastLlamaModel.get_peft_model (llama.py)
- FastLanguageModel.from_pretrained (loader.py)
- FastModel.from_pretrained (loader.py)

Created apply_unsloth_gradient_checkpointing() utility function in
_utils.py that handles:
- Heuristic: seq < 512 falls back to standard gc
- Explicit True/False overrides unpatch previous patching
- Returns the effective use_gradient_checkpointing value

Net reduction of ~6 lines while improving maintainability.
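
The heuristic part of the utility can be sketched as below. This is a simplified illustration with a hypothetical name; the real apply_unsloth_gradient_checkpointing() also handles patching and unpatching, which is omitted here.

```python
def resolve_gradient_checkpointing(use_gradient_checkpointing, max_seq_length):
    """Return the effective gradient checkpointing setting."""
    # Below 512 tokens the offloading overhead of "unsloth" mode is not
    # worth it, so fall back to standard gradient checkpointing.
    if use_gradient_checkpointing == "unsloth" and max_seq_length < 512:
        return True
    # Explicit True / False (and "unsloth" for longer sequences) pass through.
    return use_gradient_checkpointing
```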

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-02 23:57:09 -08:00
Lei Zhenyuan
8228d89b30 fix for intel devices torch compile configs (#3952)
* fix for intel devices

* Refactor torch_compile_options to use base options with device-specific extensions

- Extract common options into base_options shared by all device types
- CUDA devices get additional CUDA-specific options
- XPU, HIP, and other devices use base options only
- Reduces code duplication and improves maintainability
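
The base-plus-extension layout described above might look like the following sketch. The function name and the specific option keys and values are illustrative assumptions, not the options Unsloth actually sets.

```python
def build_compile_options(device_type):
    """Return torch.compile options: shared base, plus CUDA-only extras."""
    # Options assumed to be safe on every backend.
    base_options = {
        "epilogue_fusion": True,
        "max_autotune": False,
    }
    if device_type == "cuda":
        # CUDA devices extend the base with CUDA-specific settings.
        return {**base_options, "triton.cudagraphs": False}
    # XPU, HIP and other devices use the base options only.
    return dict(base_options)
```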

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-02 21:15:06 -08:00
Roland Tannous
6c017686d9 delete tmp directory 2026-02-02 20:02:18 +00:00
Roland Tannous
ecebd30eca update pydantic models for Models and Training routes 2026-02-02 20:00:23 +00:00
Roland Tannous
4fc9bbf0f1 update pydantic models for Models and Training routes 2026-02-02 20:00:04 +00:00
Roland Tannous
c07b81c083 fix: add utils/models directory that was ignored by gitignore 2026-02-02 19:52:34 +00:00
Roland Tannous
95fe3bed83 fix: restore models directory files deleted during restructure 2026-02-02 19:36:30 +00:00
Roland Tannous
d1dd8a61d6 Merge pull request #11 from unslothai/fix/move-claude-file-to-frontend
moved CLAUDE.md into frontend directory
2026-02-02 22:25:52 +04:00
Roland Tannous
54846cf59c moved CLAUDE.md into frontend directory 2026-02-02 18:25:11 +00:00
Roland Tannous
ae7313193a Merge pull request #10 from unslothai/feature/add-model-training-yaml-configs
added model yaml config files
2026-02-02 22:23:38 +04:00
Roland Tannous
bfa03ebd3c added model yaml config files 2026-02-02 18:22:14 +00:00
Roland Tannous
92d4f52d7d Merge pull request #9 from unslothai/fix/remove-backend-backend-redundant-folder
remove redundant backend.backend folder
2026-02-02 22:04:38 +04:00
Roland Tannous
7448d2401b remove redundant backend.backend folder 2026-02-02 18:03:56 +00:00
Roland Tannous
55eb0bb66a migrated cli. fixed imports. fixed unsloth studio command logic 2026-02-02 17:50:11 +00:00
Roland Tannous
a1b8cd6696 Merge cli from ui-early-access and fix imports 2026-02-02 17:23:30 +00:00
Roland Tannous
396b8fb9a4 Merge pull request #8 from unslothai/feature/frontendui-export-page-client-rebased
feat: Add export page, HF model/dataset search, PDF/DOCX extraction logic, modern AUI API, and code cleanup
2026-02-02 19:55:54 +04:00
shine1i
13ce83baf6 refactor: update quantization options in export constants, remove unused entries, and add F32 option 2026-02-02 16:49:07 +01:00
shine1i
4dc19b63f5 feat: add DOCX attachment support using mammoth, extend attachment handling to process and extract text from DOCX files 2026-02-02 16:33:10 +01:00
shine1i
3a15b915fc feat: add PDF attachment support using unpdf, extend attachment handling and runtime to process and extract text from PDFs 2026-02-02 16:11:39 +01:00
shine1i
db92ab230b refactor: enhance chat and UI elements with animations, tooltips, and improved styling; streamline sidebar, navbar, and chat-page interactions in top bar 2026-02-02 15:25:15 +01:00
shine1i
303865438f refactor: replace deprecated useAssistantRuntime with useAui, update runtime API calls across chat features for consistency 2026-02-02 15:03:01 +01:00
shine1i
a87f14eccd refactor: remove unused components, mock data, and redundant logic across chat features; streamline settings and runtime handling for better maintainability 2026-02-02 14:52:58 +01:00
shine1i
0d30950b75 refactor: remove unused model and dataset configurations, simplify export-page logic by eliminating modelInfo dependency and redundant params display 2026-02-02 14:22:35 +01:00
shine1i
6abe1d6e35 refactor: streamline combobox logic, improve search handling, and remove unused elements across model and dataset sections 2026-02-02 14:06:34 +01:00
shine1i
99bea160b3 refactor: simplify model and dataset combobox logic, remove curated items, and streamline search handling across components 2026-02-02 13:16:08 +01:00
shine1i
af3e8c20ee refactor: format and clean up imports, hooks, and UI components for consistent structure and readability across models and datasets sections 2026-02-02 12:51:04 +01:00
shine1i
e705230499 feat: add Hugging Face search integration for datasets and models, extend infinite scroll support, and improve UI components with animations and tooltips 2026-02-02 12:45:41 +01:00
shine1i
e9857dab0f feat: replace config summary with model export feature, including export methods, quantization options, and new UI components 2026-02-02 11:08:31 +01:00
Roland Tannous
1179735255 Merge pull request #7 from unslothai/fix/restructure-repo-root
Add studio root folder and make frontend and backend as subfolders
2026-02-02 13:18:20 +04:00
Roland Tannous
8b80c71fe1 add studio root folder 2026-02-02 09:14:35 +00:00
Roland Tannous
544d6944d1 root studio folder 2026-02-02 09:13:49 +00:00
Roland Tannous
6db66ab1ff Merge pull request #6 from unslothai/fix/remove-backend-backend-directory
Fix/remove backend backend directory
2026-02-02 10:35:50 +04:00
Roland Tannous
9aec1c6cd4 remove redundant backend.backend directory 2026-02-02 06:35:00 +00:00
Datta Nimmaturi
5cf7b4e34f [fix] qwen3-guard tokenizer (#3959)
* fix for qwen3-guard tokenizer

* Better qwen3guard check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-02-01 22:09:15 -08:00
Roland Tannous
ed8839e009 Merge pull request #5 from unslothai/feature/backend-core-restructuring
backend restructuring and housekeeping
Changes made:

- Moved all files from backend/backend/ → backend/core/ with nested subdirectories
- Created __init__.py for each submodule with proper exports
- Updated all imports in routes (routes/training.py, routes/models.py)
- Updated internal relative imports to use .. for parent references
- Deleted old backend/backend/ directory
- Moved shared modules (path_utils.py, model_config.py) to utils/ subfolder
2026-02-02 09:56:59 +04:00
Roland Tannous
b4861d345b Merge branch 'nightly' into feature/backend-core-restructuring 2026-02-02 09:56:19 +04:00
Roland Tannous
ce34bcd0d2 merge conflict .gitignore 2026-02-02 05:53:43 +00:00
Roland Tannous
75bd759108 fix .gitignore merge conflict 2026-02-02 05:51:25 +00:00
Roland Tannous
023405c76a backend restructuring and housekeeping 2026-02-02 05:48:09 +00:00
Roland Tannous
2761c59012 Merge pull request #4 from unslothai/feature/backend-draft
Pushing the initial draft of the backend
2026-02-02 09:33:01 +04:00
sshah229
c042223a7a moved utils, dataset_utils and datasets, updated the startTraining pydantic model 2026-02-01 16:49:42 -07:00
Roland Tannous
7b70d8fe70 Merge pull request #3 from unslothai/feature/frontendui-onboarding-dashboard
feat: onboarding & dashboard UI
2026-02-01 14:00:54 +04:00
sshah229
d593b069e2 Added the training and models routes 2026-02-01 01:23:16 -07:00
shine1i
aeb5382f0a feat: track and display reasoning duration, enhance runtime with adapter for copying during inference and edit and UI integration 2026-02-01 09:08:23 +01:00
shine1i
f05db56439 refactor: improve reasoning UI with animations and dynamic behavior, minor CSS and layout tweaks 2026-02-01 08:31:22 +01:00
shine1i
614500c117 refactor: chat input bg with fade, reuse it in composer view as well 2026-02-01 08:09:17 +01:00
shine1i
dea9cf1911 refactor: migrate chat sidebar and UI components to modular sidebar framework, some minor UI tweaks (sidebar, lines) 2026-02-01 07:50:53 +01:00
shine1i
cf0cda5b65 chore: update labels and UI minor adjustments for clarity 2026-02-01 06:57:15 +01:00
shine1i
52bd5ebebd feat: add frontend UI codebase 2026-01-31 19:34:16 +01:00
Datta Nimmaturi
2deb583389 [trl] vllm trl topk fixup (#3935)
* [transformers] [v5] remove unused hybridcache (#3910)

* remove unused hybridcache

* cleanup

* Fix top_k on trl GRPO

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-01-31 06:34:07 -08:00
Roland Tannous
a30967a69c added __init__.py for backend 2026-01-31 08:38:29 +00:00
Roland Tannous
0ef09af008 git repo skeleton structure 2026-01-31 08:27:01 +00:00
Roland Tannous
b5aa137b7f first commit 2026-01-27 21:19:48 +04:00
Pádraic Slattery
a09bdb6adb chore: Update outdated GitHub Actions version (#3936) 2026-01-27 07:19:38 -08:00
pre-commit-ci[bot]
a34eb55ecd [pre-commit.ci] pre-commit autoupdate (#3937)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.13 → v0.14.14](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.13...v0.14.14)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-01-27 07:18:26 -08:00
Daniel Han
29edef68a8 Update pyproject.toml 2026-01-27 07:17:45 -08:00
pluesclues
3fde3a91ee Grpo compile settings update (#3927)
* Add torch compile options for GRPOTrainer

* Update CUDA settings based on device capability

* Add triton persistent TMA matmul condition

* Fix syntax for triton.enable_persistent_tma_matmul

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update rl.py

* Update rl.py

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-01-24 17:17:55 -08:00
Michael Han
f3efb70823 Embedding model fine-tuning support 2026-01-22 21:35:46 -08:00
Rachel Li
ca9cb26eed Guard torch.compile on ROCm when triton_key is missing (#3923)
* Guard torch.compile on ROCm when triton_key missing

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update unsloth/import_fixes.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Tighten ROCm Triton import handling

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Rachel Li <rachelliqx07@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-22 15:46:08 -08:00
Michael Han
08e07e7865 Embedding model support 2026-01-22 14:22:03 -08:00
Daniel Han
a78c6a62e4 Update vision.py 2026-01-22 07:40:51 -08:00
electroglyph
101ab17728 add FastSentenceTransformer for easily finetuning SentenceTransformer models (#3719)
* add FastSentenceTransformer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Gemini code review suggestions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unsloth-zoo patch only fixed usage for XLMRobertaForMaskedLM, this is a fix for XLMRobertaModel

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor do_lower_case

* add some comments

* force disable FP8 loading

* refactor pooling detection, add missing pooling types

* add save_pretrained_merged method which gets modules and config

* fix _save_pretrained_merged

* rename read_pooling_mode, load modules instead of hard-coding em

* comment

* revert save_pretrained_merged change

* propagate trust_remote_code properly

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add super hacky mpnet patch from hell

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor _load_modules, add for_inference to from_pretrained, add transformers 5 code for mpnet, add distilbert patches

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add ModernBert

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* deberta-v2 support (provisional), fix remote_code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add generic add_pooling_layer logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix for missing config

* add push_to_hub_merged

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* edit messages, throw exception if no HF token

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix device_map mismatch

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add comments, move import, other suggestions by Datta0

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* re-add adapter removal to save_pretrained_merged, but if saving to folder which had adapters before, leave them

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unsloth branding to save_pretrained_merged

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* propagate dtype to internal module when loading for inference

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix mpnet gradient checkpointing for torch >= 2.9

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* same thing for transformers 5, oops =)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix FastSentenceTransformer performance: 6x speedup via torch.compile + SDPA

The original implementation was 31% slower than naive SentenceTransformer due to
conflicting decorators from Unsloth's auto-compiler (@torch.compile on attention
modules but @torch.compiler.disable on sub-modules).

Changes:
- Add fast encoder path that bypasses Unsloth patching for encoder models
- Use native torch.compile with mode="reduce-overhead" for 6x speedup
- Auto-detect and enable SDPA for models that support it (BERT, RoBERTa, etc.)
- Change defaults: load_in_16bit=True, load_in_4bit=False (16-bit is optimal)
- Change default: use_gradient_checkpointing=False (conflicts with torch.compile)
- Add UNSLOTH_COMPILE_DISABLE=1 env var to fall back to old path if needed

Supported encoder types: mpnet, bert, distilbert, roberta, xlm-roberta, albert, electra

Benchmark results (BS=32, seq_len=128):
- Naive 16-bit LoRA:     13-50ms per iter
- Unsloth 16-bit LoRA:   2-9ms per iter (5.4x-6.7x faster)
- Memory usage:          61MB-1.3GB (even largest model fits easily)

Note: 4-bit + torch.compile has a PyTorch bug (pytorch/pytorch#90665).
4-bit is also 1.7-1.9x slower than 16-bit due to dequantization overhead,
so 16-bit is recommended for these small encoder models anyway.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use Unsloth's prepare_model_for_kbit_training for consistency

Changed from peft.prepare_model_for_kbit_training to
unsloth.models._utils.prepare_model_for_kbit_training.

Unsloth's version provides:
- Float32 mixed precision upcasting for LoRA layers
- Better numerical stability
- Consistency with rest of Unsloth codebase

* Use relative imports and add float16 machine support

- Changed absolute import to relative: from ._utils import prepare_model_for_kbit_training
- Added SUPPORTS_BFLOAT16 import for proper dtype detection
- Handle devices that don't support bfloat16 by falling back to float16

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add save_pretrained_torchao

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add auto-compile for torch.compile based on training step breakeven analysis

Changes:
- Change default compile_mode from "reduce-overhead" to "default" since CUDA
  Graphs (used by reduce-overhead) is incompatible with PEFT/LoRA
- Add _estimate_compile_threshold() to calculate minimum steps needed for
  torch.compile to be beneficial based on model parameter count
- Add _apply_torch_compile() helper with accelerate unwrap_model bug workaround
- Defer torch.compile application to trainer initialization time so we can
  check max_steps against the breakeven threshold
- Patch SentenceTransformerTrainer to auto-apply compile when max_steps
  exceeds the calculated threshold

Breakeven thresholds (with 1.2x safety margin):
- 22M params (MiniLM): ~1388 steps
- 110M params (mpnet): ~242 steps
- 335M params (snowflake): ~203 steps

This ensures torch.compile warmup cost is only paid when training is long
enough to benefit from the speedup.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* do QAT preparation for fast path

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix double loading model, thanks Etherl

* do mpnet gradient checkpoint patch if gc is enabled

* remove distilbert patches from mpnet fix

* sanity check on model params, thanks Etherl

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add save_pretrained_gguf, thanks Etherl

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refine compile threshold estimation for sentence transformers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-01-22 07:35:55 -08:00
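The breakeven reasoning in 101ab17728 (pay the torch.compile warmup cost only when training runs long enough to recover it) can be sketched roughly as below. This is an illustration only, not the commit's `_estimate_compile_threshold()`, which estimates from model parameter count. The function names, the timing inputs, and the formula are assumptions; only the 1.2x safety margin and the "compile when max_steps exceeds the threshold" rule come from the commit message.

```python
import math
from typing import Optional


def breakeven_steps(
    compile_seconds: float,
    eager_step_seconds: float,
    compiled_step_seconds: float,
    safety_margin: float = 1.2,
) -> Optional[int]:
    """Steps needed before compilation repays its warmup (hypothetical helper)."""
    saving_per_step = eager_step_seconds - compiled_step_seconds
    if saving_per_step <= 0:
        # Compiled steps are not faster, so the warmup is never recovered.
        return None
    return math.ceil(safety_margin * compile_seconds / saving_per_step)


def should_compile(max_steps: int, threshold: Optional[int]) -> bool:
    """Compile only when the planned run is longer than the breakeven threshold."""
    return threshold is not None and max_steps > threshold
```

A caller would compute the threshold once at trainer initialization and skip compilation for short runs.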
Daniel Han
09ebbf6e63 Versioning 2026-01-22 07:33:59 -08:00
Roland Tannous
affd52e868 Merge pull request #49 from unslothai/feature/default-param-yaml
Yaml config for default parameters
2026-01-22 05:42:42 +04:00
sshah229
30c75e6d41 fixed parameters (finetune language, vision, attention layers, and mlp_modules) not updating 2026-01-21 02:40:25 -07:00
Daniel Han
509fd4227c Handle Transformers 5 vLLM import errors (#3908)
* Handle Transformers 5 vLLM import errors

* Deduplicate vLLM transformers mismatch handling

---------

Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>
2026-01-20 01:02:39 -08:00
Roland Tannous
02982ceeba set create public gradio share link to true 2026-01-20 07:36:57 +00:00
Roland Tannous
32e72a12ae add studio command line argument to start unsloth studio UI 2026-01-20 07:27:05 +00:00
pluesclues
9172be8cfc Fix vllm ipykernel patch (#3907)
* Implement vLLM patch for notebook detection

Add patch for vLLM compatibility in notebook environments.

* Fix sys.stdout.fileno for vLLM compatibility

Patch sys.stdout.fileno for vLLM compatibility in notebooks.

* Add patch_vllm_for_notebooks to initialization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden vLLM notebook stdout patch

* Use logger for vLLM notebook patch

* Clarify vLLM notebook patch log message

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>
2026-01-19 21:04:27 -08:00
pre-commit-ci[bot]
157c929354 [pre-commit.ci] pre-commit autoupdate (#3905)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.11 → v0.14.13](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.11...v0.14.13)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-01-19 18:42:13 -08:00
electroglyph
d80e69258c add weight-only int8 QAT scheme and update tests for torchao 0.15.0 (#3859)
* add int8 weight-only QAT scheme, add test, fix tests for current torchao version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change quantization to PerAxis

* lambda =/

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add torchao messages, remove group_size from int8

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* raise exception on missing torchao

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* touch up the torchao imports

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-01-16 09:32:29 +05:30
Michael Han
fda54f2634 Update README.md 2026-01-15 08:01:01 -08:00
Daniel Han
c80faef722 Update pyproject.toml 2026-01-15 07:00:25 -08:00
Daniel Han
f719d2b7bd Update _utils.py 2026-01-15 05:09:26 -08:00
pluesclues
e83cbc9fe0 Merge pull request #3628 from pluesclues/alternative_compute_chunked_loss
Chunk Across Batch and Context length for logprob calculations for grpo
2026-01-15 05:01:19 -08:00
Daniel Han
4fc06bd7fb Merge pull request #3895 from Datta0/rl_ref_trl
[trl] use non lora model as base for RL
2026-01-15 03:33:09 -08:00
pre-commit-ci[bot]
e360386719 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-15 11:25:11 +00:00
Datta Nimmaturi
b204148136 use non lora model as base for RL 2026-01-15 11:23:21 +00:00
Daniel Han
5c8ccf0671 Merge pull request #3879 from ducviet00/fix-gc
Disable gradient checkpointing when explicitly off for vision
2026-01-14 04:32:02 -08:00
Michael Han
b03b014336 Update template.md 2026-01-14 03:45:35 -08:00
Daniel Han
f4e378dcc3 Merge pull request #3880 from f14-bertolotti/f14-wrong-ndim
wrong number of dimensions
2026-01-12 21:32:48 -08:00
Daniel Han
1e790b03b2 Apply suggestion from @danielhanchen 2026-01-12 21:32:20 -08:00
Daniel Han
640404a93e Merge pull request #3881 from unslothai/pre-commit-ci-update-config
[pre-commit.ci] pre-commit autoupdate
2026-01-12 21:29:58 -08:00
pre-commit-ci[bot]
ab68311fdd [pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.10 → v0.14.11](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.10...v0.14.11)
2026-01-12 19:08:13 +00:00
Francesco Bertolotti
e15445f13f wrong number of dimensions 2026-01-12 16:19:43 +01:00
Duc-Viet Hoang
432864cb25 Completely disable gradient_checkpointing for vision when use_gradient_checkpointing=False 2026-01-12 10:03:54 +07:00
Daniel Han
6aedc769ee Merge pull request #3865 from ykaitao/ktyang_configure_embedding_for_training
reduce code duplication by _offload_frozen_module_for_training
2026-01-09 21:02:55 -08:00
danielhanchen
3ffc8b1a5b fix: use peft.utils.other for ModulesToSaveWrapper import
ModulesToSaveWrapper was removed from peft.tuners.tuners_utils in PEFT
0.16.0. The class has been available in peft.utils.other since at least
PEFT 0.7.1, which is the minimum version Unsloth requires.

This fixes the ImportError when using PEFT >= 0.16.0.
2026-01-09 23:24:39 +00:00
Kaitao Yang
f7e17fb513 reduce code duplication by _offload_frozen_module_for_training 2026-01-09 06:07:38 -08:00
Daniel Han
6b7063713a Merge pull request #3869 from hnxnq7/fix-kaggle-telemetry-detection
Fix Kaggle telemetry misclassification when COLAB_ keys exist
2026-01-08 17:29:23 -08:00
pre-commit-ci[bot]
76d3e469a0 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-09 00:33:01 +00:00
Rachel Li
93cf0ee805 Fix Kaggle telemetry detection & address review feedback
- Fix Kaggle misclassification by prioritizing filesystem markers over env vars
- Preserve telemetry pings when statistics is explicitly provided
- Replace bare except with except Exception
- Minor cleanup based on automated review feedback
2026-01-08 19:32:33 -05:00
Rachel Li
7b7287c9b9 Fix telemetry ping regression for explicit statistics
Fixed Codex regression: keep snapshot_download pings for explicit statistics values; detection only runs when statistics is None. Also replaced bare except.
2026-01-08 19:20:24 -05:00
pre-commit-ci[bot]
d0f3e4d32e [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-09 00:04:59 +00:00
Rachel Li
8c7b89227e Update _utils.py
fixed indentation
2026-01-08 19:04:30 -05:00
pre-commit-ci[bot]
f52d370da6 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-08 23:49:55 +00:00
Rachel Li
c686140c68 Fix Kaggle telemetry misclassification when COLAB_ keys exist
Problem: Kaggle notebook environments can expose both KAGGLE_* and COLAB_* environment keys. _get_statistics currently checks COLAB_ before KAGGLE_, causing Kaggle sessions to be labeled colab/colabpro.

Prefer filesystem markers (e.g. /kaggle/working, /content + /opt/colab) before env-key heuristics, then fall back to the existing env-key checks. This avoids misclassification when providers leak overlapping env vars.

Kaggle test notebook: https://www.kaggle.com/code/hnxnq07/kaggle-stats-gathering-test
2026-01-08 18:44:22 -05:00
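The detection order described in c686140c68 (filesystem markers first, env-key heuristics as a fallback) might look like the sketch below. The function name, the returned labels, and the injectable `path_exists` parameter are assumptions made for illustration; the marker paths and the COLAB_-before-KAGGLE_ fallback order are taken from the commit message.

```python
import os
from typing import Callable, Mapping


def detect_platform(
    env: Mapping[str, str],
    path_exists: Callable[[str], bool] = os.path.exists,
) -> str:
    """Classify the notebook provider (hypothetical helper, simplified labels)."""
    # 1. Filesystem markers are checked first because providers can leak
    #    overlapping environment variables.
    if path_exists("/kaggle/working"):
        return "kaggle"
    if path_exists("/content") and path_exists("/opt/colab"):
        return "colab"
    # 2. Fall back to the env-key heuristics, in the pre-existing order.
    if any(key.startswith("COLAB_") for key in env):
        return "colab"
    if any(key.startswith("KAGGLE_") for key in env):
        return "kaggle"
    return "other"
```

Passing `path_exists` as a parameter keeps the function testable without touching the real filesystem.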
Daniel Han
0f07e36813 Merge pull request #3612 from Vangmay/feature/raw-text-dataprep
Feature/raw text dataprep
2026-01-08 03:38:15 -08:00
pre-commit-ci[bot]
3620564025 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-08 11:35:21 +00:00
Daniel Han
16a2d901fa Fix bugs and add improvements to RawTextDataLoader
- Fix test file: use return_tokenized instead of return_tensors
- Fix test file: use text_dataset instead of undefined dataset variable
- Move parameter validation to constructor (fail fast on invalid params)
- Add labels field in tokenized output for causal LM training
- Add empty file handling with clear error message
- Add tests for constructor validation and labels field
2026-01-08 11:35:00 +00:00
Daniel Han
e6536a5884 Merge pull request #3863 from unslothai/fix/fbgemm-cutlass-errors-sm100
Fix FBGEMM/CUTLASS errors on SM100 (Blackwell) GPUs
2026-01-08 03:19:53 -08:00
pre-commit-ci[bot]
2ee55010d3 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-08 04:15:17 +00:00
danielhanchen
f26948b493 Fix FBGEMM/CUTLASS errors on SM100 (Blackwell) GPUs
This PR fixes the "Arch conditional MMA instruction used without targeting
appropriate compute capability. Aborting." errors that occur when using
FBGEMM on Blackwell GPUs (B200/B100, SM100).

Changes:
- Add stderr filters in import_fixes.py for CUTLASS/FBGEMM MMA errors
- Add warning filters for various deprecation messages
- Update check_fbgemm_gpu_version() to disable FBGEMM instead of raising
  an error when old versions are detected
- Update test_has_fbgemm() in fp8.py to catch broader CUTLASS/CUDA errors
  and gracefully fall back to Triton kernels
- Update loader_utils.py to disable FBGEMM instead of raising ValueError
  for old fbgemm_gpu versions

The key behavior change is that FBGEMM errors no longer crash the script.
Instead, FBGEMM is disabled and Triton kernels are used automatically.
This allows Unsloth to work on SM100 GPUs where CUTLASS SM90 kernels fail,
and also gracefully handles old FBGEMM versions.
2026-01-08 04:14:53 +00:00
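The behavior change in f26948b493 (disable FBGEMM and fall back to Triton instead of crashing) follows a probe-and-fall-back pattern, sketched below. The function name and the `probe_fbgemm` callable are hypothetical stand-ins; the actual checks live in `test_has_fbgemm()` and `check_fbgemm_gpu_version()` and are not reproduced here.

```python
from typing import Callable


def select_fp8_backend(probe_fbgemm: Callable[[], None]) -> str:
    """Return "fbgemm" if the probe succeeds, otherwise "triton" (illustrative)."""
    try:
        # The probe stands in for running a small FBGEMM kernel.
        probe_fbgemm()
    except Exception as error:
        # Disable FBGEMM rather than re-raising, so the script keeps running.
        print(f"FBGEMM disabled, using Triton kernels instead: {error}")
        return "triton"
    return "fbgemm"
```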
Daniel Han
d930479aa7 Merge pull request #3857 from Datta0/modelscope_stats
[ModelScope] Disable stats when modelscope is being used
2026-01-06 02:56:55 -08:00
pre-commit-ci[bot]
b73f6a9be0 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-06 10:00:17 +00:00
Datta Nimmaturi
dc83a17239 Check env var explicitly
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-06 15:30:06 +05:30
Datta Nimmaturi
67caa21231 Disable stats when modelscope is being used 2026-01-06 09:53:20 +00:00
Daniel Han
52935bb00f Versioning 2026-01-05 07:37:08 -08:00
Daniel Han
3b4ac4aa1c Merge pull request #3843 from unslothai/fix-grpo-version-compat
Unify Version usage and fix TRL version handling
2026-01-05 06:07:41 -08:00
Daniel Han
7e13c424c9 Merge pull request #3851 from unslothai/grpo-fix-on-pr3754
GRPO: restore model mode after generate (stacked on #3754)
2026-01-05 06:05:24 -08:00
danielhanchen
3240ab3391 Merge main into grpo-fix-on-pr3754 2026-01-05 14:02:18 +00:00
danielhanchen
3efb799aad Revert rl_replacements GRPO edits 2026-01-05 13:55:08 +00:00
danielhanchen
6ede9a735d Fix GRPO training state restoration 2026-01-05 13:50:48 +00:00
pre-commit-ci[bot]
18f335cec5 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-05 13:39:16 +00:00
danielhanchen
d5fcc7ddde Restore TRL version fallback in rl.py 2026-01-05 13:39:03 +00:00
Daniel Han
7f93aa0a78 Merge branch 'main' into fix-grpo-version-compat 2026-01-05 05:31:42 -08:00
danielhanchen
9d4ccdbff5 Drop rl.py GRPO changes from this branch 2026-01-05 13:29:58 +00:00
Daniel Han
a1eaf90c7b Merge pull request #3849 from unslothai/fix-pdl-use-vllm-version-check
Replace GitHub API check with vLLM version check for PDL fix
2026-01-05 05:22:16 -08:00
pre-commit-ci[bot]
cd8c6d773d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-05 13:19:44 +00:00
Daniel Han
44f420db4f Address review feedback: add constant and debug logging 2026-01-05 13:19:37 +00:00
Daniel Han
e8def1194d Replace GitHub API check with vLLM version check for PDL fix
The GitHub issue check had issues:
1. Network latency on import
2. Issue being closed does not mean the fix is in the installed vLLM version

Now skip the PDL workaround if vLLM version > 0.13.2, which is when
the upstream fix is expected to be included.
2026-01-05 13:15:17 +00:00
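The version gate described in e8def1194d (skip the PDL workaround when vLLM is newer than 0.13.2) can be sketched with a plain tuple comparison. The constant and function names are assumptions, and the hand-rolled parser is a simplification; real code would more likely rely on a proper version-parsing library.

```python
import re
from typing import Tuple

# Hypothetical constant: last vLLM release assumed to still need the workaround.
PDL_LAST_AFFECTED_VERSION = (0, 13, 2)


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. "0.13.3.dev1" -> (0, 13, 3)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def should_skip_pdl_workaround(vllm_version: str) -> bool:
    """True when the installed vLLM is newer than the last affected release."""
    return parse_release(vllm_version) > PDL_LAST_AFFECTED_VERSION
```

Unlike the earlier GitHub issue lookup, this check needs no network access at import time and reflects the version that is actually installed.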
Daniel Han
45e27a841d Merge pull request #3836 from ykaitao/remove_unused_variable_BlockDiagonalCausalMask
remove unused variable BlockDiagonalCausalMask
2026-01-05 04:42:25 -08:00
Daniel Han
6c79d84318 Merge pull request #3842 from unslothai/fix-vllm-chat-template-sync
Sync chat_template from tokenizer to vLLM
2026-01-05 04:38:39 -08:00
Daniel Han
10926b0e3a Merge pull request #3841 from unslothai/fix-vllm-pdl-blackwell
Fix vLLM PDL bug on Blackwell GPUs (B200/B100)
2026-01-05 04:37:58 -08:00
Daniel Han
c8a585a589 Keep PDL module check but remove unnecessary env var setting
The check skips the GitHub API call for old vLLM versions.
No need to set TRITON_DISABLE_PDL for versions without PDL support.
2026-01-05 12:34:32 +00:00
Daniel Han
bc0f1514f2 Remove unnecessary PDL module existence check
Old vLLM versions without PDL modules don't need the fix.
The patching code already handles missing modules gracefully.
2026-01-05 12:32:16 +00:00
Daniel Han
f22a35d903 Add None check for vLLM tokenizer
- Check _vllm_tok is not None before accessing attributes
- Use getattr for safer chat_template access
2026-01-05 10:02:11 +00:00
pre-commit-ci[bot]
defbf038b2 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-05 07:03:35 +00:00
danielhanchen
8e941a6422 Improve TRL compatibility and GRPO state restore 2026-01-05 07:02:36 +00:00
Daniel Han
9860d8859d Fix PDL patch: target utils.py source module and clear lru_cache
- Patch vllm.lora.ops.triton_ops.utils directly where supports_pdl is defined
- Clear lru_cache before patching to prevent stale cached results
- Add fused_moe_lora_op to consumer modules list
- Use *args, **kwargs in fake function for compatibility
2026-01-05 06:53:42 +00:00
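The patching steps listed in 9860d8859d (patch the source module, clear the `lru_cache` first, then update consumer modules) can be illustrated with stand-in objects. The `SimpleNamespace` modules and the `disable_pdl` helper are hypothetical; only the name `supports_pdl`, the cache clearing, and the `*args, **kwargs` fake come from the commit message.

```python
from functools import lru_cache
from types import SimpleNamespace


@lru_cache(maxsize=None)
def supports_pdl(device=None) -> bool:
    """Stand-in for the cached check defined in the source module."""
    return True


# Stand-ins for the defining module and a module that imported the function.
utils_module = SimpleNamespace(supports_pdl=supports_pdl)
consumer_module = SimpleNamespace(supports_pdl=supports_pdl)


def disable_pdl(source, consumers) -> None:
    """Replace supports_pdl everywhere with a function that returns False."""
    original = source.supports_pdl
    if hasattr(original, "cache_clear"):
        # Drop cached results so no stale True survives the patch.
        original.cache_clear()

    def fake_supports_pdl(*args, **kwargs) -> bool:
        # Accept any call signature for compatibility with existing callers.
        return False

    source.supports_pdl = fake_supports_pdl
    for module in consumers:
        # Consumers hold their own reference, so each one is patched too.
        if hasattr(module, "supports_pdl"):
            module.supports_pdl = fake_supports_pdl
```

Patching only the source module would miss consumers that bound the original function at import time, which is why both are updated.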
Daniel Han
ff846db6f6 Combine nested if statements for clarity 2026-01-05 05:25:53 +00:00
pre-commit-ci[bot]
dabfa79a16 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-05 05:24:59 +00:00
Daniel Han
e627adb4d5 Address review feedback: refactor and scan all GPUs
- Add _spec_exists helper function to reduce duplication
- Scan all GPUs for SM100 instead of just device 0
- Use loop for module patching to improve maintainability
2026-01-05 05:24:52 +00:00
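Scanning every GPU for SM100, as e627adb4d5 describes, rather than inspecting only device 0, reduces to a check over all compute capabilities. In the sketch below the capabilities are passed in as a list so the example runs without a GPU; the function name is an assumption, and in practice the pairs would presumably come from `torch.cuda.get_device_capability(i)` for each `i` in `range(torch.cuda.device_count())`.

```python
from typing import Iterable, Tuple


def has_sm100(capabilities: Iterable[Tuple[int, int]]) -> bool:
    """True if any device reports compute capability 10.x (SM100)."""
    return any(major == 10 for major, _minor in capabilities)
```

On a mixed machine such as `[(9, 0), (10, 0)]`, checking only device 0 would miss the Blackwell card; scanning all devices does not.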
Daniel Han
5976c3f10f Add tokenizer fallback for chat_template sync 2026-01-05 05:10:24 +00:00
Daniel Han
e727c43685 Sync chat_template from tokenizer to vLLM
When using base models with custom chat templates applied after loading,
vLLM's internal tokenizer may not have the chat_template set. This causes
issues during RL training with vLLM inference.

This fix syncs the chat_template from the processing_class (the tokenizer
you loaded and configured) to vLLM's internal tokenizer during trainer
initialization, but only if vLLM's tokenizer does not already have one set.
2026-01-05 05:03:56 +00:00
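The sync rule in e727c43685, together with the None check added in f22a35d903, can be sketched as follows. The function name and its boolean return value are assumptions for illustration; the tokenizer objects are represented by anything with a `chat_template` attribute.

```python
from types import SimpleNamespace


def sync_chat_template(processing_class, vllm_tokenizer) -> bool:
    """Copy chat_template to vLLM's tokenizer only if it has none (illustrative)."""
    if vllm_tokenizer is None:
        # vLLM may not expose a tokenizer, so there is nothing to sync.
        return False
    template = getattr(processing_class, "chat_template", None)
    if template is None:
        return False
    if getattr(vllm_tokenizer, "chat_template", None) is not None:
        # Never overwrite a template vLLM already has.
        return False
    vllm_tokenizer.chat_template = template
    return True


# Usage with stand-in tokenizer objects:
user_tokenizer = SimpleNamespace(chat_template="{{ messages }}")
vllm_tok = SimpleNamespace(chat_template=None)
sync_chat_template(user_tokenizer, vllm_tok)
```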
pre-commit-ci[bot]
1a329b9e4f [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-05 05:03:29 +00:00
Daniel Han
594d3baffe Fix vLLM PDL bug on Blackwell GPUs (B200/B100)
vLLM's LoRA Triton kernels use tl.extra.cuda.gdc_wait() for PDL
optimization on SM90+ GPUs. This fails on SM100 (Blackwell) during
CUDA graph capture because Triton's pipeliner cannot handle gdc_wait
in complex kernels.

This fix:
- Detects SM100 GPUs and applies the workaround automatically
- Sets TRITON_DISABLE_PDL=1 environment variable
- Monkey-patches supports_pdl to return False in lora_expand_op and
  lora_shrink_op
- Checks GitHub issue #30872 status (with 3s timeout) to auto-disable
  the workaround once the upstream fix is merged
- Includes quick internet connectivity check (0.5s) to avoid delays
  when offline

Fixes the error:
'tt.elementwise_inline_asm' op pipeliner doesn't know how to predicate this op
LLVM ERROR: Fatal pipeliner error

See: https://github.com/vllm-project/vllm/issues/30872
2026-01-05 05:02:53 +00:00
Kaitao Yang
d66548f904 remove unused variable BlockDiagonalCausalMask 2026-01-04 09:21:44 -08:00
Daniel Han
1dd67b372e Versioning 2026-01-04 06:12:44 -08:00
Daniel Han
eef05330ca Merge pull request #3835 from unslothai/quant-config-respect
Respect user quantization_config
2026-01-04 05:43:20 -08:00
Daniel Han
7bf648a882 Merge pull request #3834 from unslothai/rl-fixes
rl.py fixes: buffer reset, safer attribute access, typo fix
2026-01-04 05:25:45 -08:00
danielhanchen
15052dc8e7 Keep 4bit flag for fast_inference 2026-01-04 13:18:15 +00:00
danielhanchen
e72808553f Handle dict quantization_config flags 2026-01-04 13:14:03 +00:00
danielhanchen
3bfc927984 Respect user quantization_config 2026-01-04 13:03:06 +00:00
danielhanchen
0b1dbefacb Fix psutil.cpu_count() potentially returning None in save.py 2026-01-04 12:58:45 +00:00
danielhanchen
d1d9832e70 Handle older unsloth-zoo without reset_unsloth_gradient_checkpointing_buffers 2026-01-04 12:57:10 +00:00
danielhanchen
762ef9a20f rl.py fixes: buffer reset, safer attribute access, typo fix
1. Auto-reset gradient checkpointing buffers after trainer.train()
   - Import and call reset_unsloth_gradient_checkpointing_buffers() in
     prepare_for_training_mode wrapper to free memory after training
     while keeping buffers ready for subsequent runs

2. Replace eval/exec with safer getattr/setattr
   - eval(f"trl.trainer.{trainer}") -> getattr(trl.trainer, trainer)
   - exec(f"...{unwrap} = ...") -> setattr(current_trainer, unwrap, ...)
   - exec(f"Trainer.prediction_step=...") -> direct assignment

3. Fix psutil.cpu_count() potentially returning None
   - Change psutil.cpu_count()+4 to (psutil.cpu_count() or 1)+4
   - Prevents TypeError on systems where cpu_count() returns None

4. Fix typo: oriignal_is_vlm_text -> original_is_vlm_text
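
Items 2 and 3 above can be illustrated with a small, self-contained sketch. The names here are assumptions: `trainer_namespace` is a stand-in for `trl.trainer`, and `lookup_trainer` and `worker_count` are hypothetical helpers, not functions from rl.py.

```python
from types import SimpleNamespace

# Stand-in for the trl.trainer namespace (assumption, for illustration only).
trainer_namespace = SimpleNamespace(GRPOTrainer=type("GRPOTrainer", (), {}))


def lookup_trainer(namespace, name):
    # getattr resolves the attribute by name without evaluating a string as code.
    return getattr(namespace, name)


def worker_count(cpu_count):
    # cpu_count may be None on some systems; fall back to 1 before adding.
    return (cpu_count or 1) + 4


trainer_cls = lookup_trainer(trainer_namespace, "GRPOTrainer")
# setattr replaces the exec(f"... = ...") style of dynamic assignment.
setattr(trainer_cls, "unwrapped", "replacement")
```

Unlike `eval`, `getattr` raises a plain `AttributeError` for an unknown name and cannot execute arbitrary expressions.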
2026-01-04 12:21:39 +00:00
Daniel Han
dc538896a5 Merge pull request #3832 from ykaitao/ktyang_remove_redundant_code_has_block
remove redundant code of has_block
2026-01-03 23:18:27 -08:00
Kaitao Yang
d84602e549 remove redundant code of has_block 2026-01-03 22:38:37 -08:00
Daniel Han
6753691c92 Merge pull request #3822 from Fizza-Mukhtar/fix/llama-build-curl
Make llama.cpp CURL dependency optional when building from source
2026-01-03 22:12:50 -08:00
pre-commit-ci[bot]
975e36f888 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-02 16:58:04 +00:00
Fizza-Mukhtar
1b4abe7a4e Make llama.cpp CURL support optional during CMake builds 2026-01-02 08:55:58 -08:00
Fizza-Mukhtar
22210112c6 Make llama.cpp CURL support optional during CMake builds 2026-01-02 08:42:59 -08:00
Daniel Han
dc6df32f81 Merge pull request #3821 from unslothai/nightly
Bug fixes
2026-01-02 06:22:08 -08:00
Daniel Han
52aed3ad14 Bug fixes 2026-01-02 06:07:16 -08:00
Daniel Han
e109102a0d Merge branch 'main' into nightly 2026-01-02 06:06:11 -08:00
Daniel Han
5f2bd3c6e1 Merge pull request #3820 from unslothai/fix/fast-generate-wrapper-helpful-errors
Add helpful error messages for fast_generate when fast_inference=False
2026-01-02 06:02:52 -08:00
pre-commit-ci[bot]
bd45518ba0 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-02 13:58:50 +00:00
danielhanchen
a52c3e545a Add helpful error messages for fast_generate when fast_inference=False
When users load a model with fast_inference=False but then try to use
vLLM-style arguments with fast_generate, they previously got confusing
errors. This adds a wrapper that detects common mistakes and provides
helpful guidance:

- Using sampling_params: explains to use HF generate args instead
- Using lora_request: explains LoRA weights are already merged
- Passing text strings: shows how to tokenize input first

Changes:
- Add make_fast_generate_wrapper to _utils.py
- Apply wrapper in llama.py when fast_inference=False
- Apply wrapper in vision.py when fast_inference=False
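
A wrapper of this kind might look roughly like the sketch below. It is not the `make_fast_generate_wrapper` from `_utils.py`: the name `make_generate_wrapper`, the exception types, and the message wording are all assumptions chosen for illustration.

```python
import functools


def make_generate_wrapper(generate):
    """Wrap a Hugging Face style generate callable with friendlier errors."""

    @functools.wraps(generate)
    def wrapper(*args, **kwargs):
        if "sampling_params" in kwargs:
            raise ValueError(
                "sampling_params is a vLLM argument; pass Hugging Face "
                "generate arguments such as max_new_tokens instead."
            )
        if "lora_request" in kwargs:
            raise ValueError(
                "lora_request is a vLLM argument and is not needed here."
            )
        first = args[0] if args else kwargs.get("input_ids")
        is_text = isinstance(first, str) or (
            isinstance(first, (list, tuple))
            and len(first) > 0
            and isinstance(first[0], str)
        )
        if is_text:
            raise TypeError(
                "Received raw text; tokenize it first and pass input_ids."
            )
        return generate(*args, **kwargs)

    return wrapper
```

Checking the arguments before delegating means the underlying generate call is never reached with inputs it cannot handle.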
2026-01-02 13:58:08 +00:00
Daniel Han
2f7f260213 Merge branch 'main' into nightly 2026-01-02 05:40:32 -08:00
Daniel Han
ee0a242429 Update import_fixes.py 2026-01-02 05:05:47 -08:00
Daniel Han
7459010ab3 Update import_fixes.py 2026-01-02 03:41:51 -08:00
Daniel Han
26fea0ff35 Update loader.py 2026-01-02 02:48:28 -08:00
Daniel Han
8d44bd35d3 fix_huggingface_hub 2026-01-02 00:14:44 -08:00
Daniel Han
27f84990b5 Merge pull request #3818 from unslothai/fix-gemma3-qat-stability
Fix Gemma3 QAT training instability with int8-int4 scheme
2026-01-01 23:23:55 -08:00
danielhanchen
697ea5d1c1 Fix Gemma3 QAT training instability with int8-int4 scheme
Gemma3 models have a large vocabulary (262144 tokens) which causes
training loss to explode when using int8 embedding quantization.

This fix auto-detects Gemma3 models and switches from int8-int4
(phone-deployment) to int4 weight-only QAT for stable training.
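
The scheme switch described above could be sketched as follows. How the commit actually detects Gemma3 is not stated here, so the substring check on a model type string and the helper name `select_qat_scheme` are assumptions.

```python
def select_qat_scheme(model_type, requested_scheme):
    """Return the QAT scheme to use, downgrading int8-int4 for Gemma3."""
    # Assumption: Gemma3 variants can be recognised from the model type string.
    is_gemma3 = "gemma3" in model_type.lower()
    if is_gemma3 and requested_scheme == "int8-int4":
        # Avoid int8 embedding quantization; use int4 weight-only instead.
        return "int4"
    return requested_scheme
```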
2026-01-02 07:19:08 +00:00
Dan Saunders
f47ebfd237 CLI command for UI 2026-01-01 13:50:22 -05:00
Daniel Han
12004df0cb Merge pull request #3711 from oKatanaaa/ensure-weight-tying
FIX: weight tying for LoRA embeddings and lm_head
2026-01-01 04:55:01 -08:00
Daniel
41e6fe557a Add TODO comment for ensure_weight_tying in vision models
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 12:54:21 +00:00
Daniel Han
94c0329e38 Merge pull request #3806 from Fizza-Mukhtar/fix/3d-tensor-matmul
Fix 3D tensor support for bitsandbytes 8-bit matmul in forward pass
2026-01-01 04:07:43 -08:00
Daniel Han
f205be0ea7 Fix correctness bugs across multiple model files (#3813)
1. cohere.py:347-348 - Fixed wrong variable names in QK normalization.
   Used `Q`/`K` but variables were named `Qn`/`Kn`. This caused NameError
   when `use_qk_norm=True` (e.g., c4ai-command-r-plus models).

2. cohere.py:482 - Fixed wrong object reference in inference loop.
   Used `self.mlp` but should be `decoder_layer.mlp` since we're
   iterating through decoder layers. Caused AttributeError during inference.

3. falcon_h1.py:459,461 - Fixed wrong attribute names in inference path.
   Used `post_attention_layernorm` and `mlp` but Falcon H1 uses
   `pre_ff_layernorm` and `feed_forward`. Caused AttributeError during generation.

4. qwen3_moe.py:210 - Fixed wrong module path with incorrect capitalization.
   Used `transformers.models.Qwen3Moe` but should be `transformers.models.qwen3_moe`.
   Caused AttributeError when patching rotary embeddings.

5. qwen3_moe.py:239 - Fixed wrong model_patcher class.
   Used `FastQwen3Model` but should be `FastQwen3MoeModel` for MoE models.
   Caused incorrect patching for Qwen3 MoE models.

6. hf_hub.py:21-22 - Fixed floor division and missing return for billion values.
   Used `//` instead of `/` for millions, and had no return for values >= 1B.
   Caused incorrect formatting and None return for large numbers.

7. save.py:550 - Fixed self-assignment that did nothing.
   `sharded_ram_usage = sharded_ram_usage` should be `= max_shard_size`.
   Caused integer shard sizes to be ignored.

8. rl.py:562-567 - Fixed orphan string not included in length_check.
   The elif branch for max_seq_length validation was a standalone string
   expression, not concatenated to length_check. Caused silent skip of
   the max_seq_length > model_max_seq_length warning.

9. granite.py:49-52 - Fixed wrong model name and version in error message.
   Said "Gemma2" and "4.42.3" but should be "Granite" and "4.45.0".
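
To make item 6 concrete, here is a minimal sketch of a count formatter with both problems avoided. The function name `format_count` and the exact output format are assumptions; only the two fixes (true division, and an explicit return for values of one billion or more) come from the description above.

```python
def format_count(value):
    """Format a count with a K/M/B suffix (illustrative format)."""
    if value < 1_000:
        return str(value)
    if value < 1_000_000:
        return f"{value / 1_000:.1f}K"
    if value < 1_000_000_000:
        # True division keeps the fraction; 1_500_000 // 1_000_000 would give 1.
        return f"{value / 1_000_000:.1f}M"
    # Without this final return the function would fall through and yield None.
    return f"{value / 1_000_000_000:.1f}B"
```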
2026-01-01 02:36:33 -08:00
Daniel Han
cbff64131f Fix correctness bugs in rl.py, rl_replacements.py, and vision.py (#3811)
* Fix correctness bugs in rl.py, rl_replacements.py, and vision.py

1. rl_replacements.py (lines 864, 870): Fixed undefined `nanmin`/`nanmax`
   functions by using `.nan_to_num(nan=inf/-inf).min()/.max()` pattern.
   PyTorch doesn't have torch.nanmin/nanmax, so we replace NaN values
   before computing min/max.

2. vision.py (line 150): Fixed bug where code checked for "input" key
   but then accessed kwargs["input_ids"] instead of kwargs["input"].

3. vision.py (line 159): Fixed bug where literal string "key" was used
   instead of the variable `key` when accessing kwargs.

4. rl.py (lines 903, 905): Fixed non-existent `MathError` exception
   by replacing with `ValueError`.
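
The idea behind item 1 is to substitute a neutral value for NaN before reducing. The sketch below shows the same idea with the standard library only, as a stand-in for the tensor expression `.nan_to_num(nan=inf).min()`; the helper names are hypothetical.

```python
import math


def nan_safe_min(values):
    # Replace NaN with +inf so it can never be selected as the minimum.
    return min(math.inf if math.isnan(v) else v for v in values)


def nan_safe_max(values):
    # Replace NaN with -inf so it can never be selected as the maximum.
    return max(-math.inf if math.isnan(v) else v for v in values)
```

If every value is NaN, these helpers return an infinity, so callers may want to handle that case separately.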

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-31 21:35:48 -08:00
Michael Han
9b5571fb69 Refresh of Unsloth README.md with https://unsloth.ai/docs 2025-12-30 15:14:27 -08:00
pre-commit-ci[bot]
c5a1eccb51 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-30 15:58:41 +00:00
Fizza-Mukhtar
9116aa8dfd Fix 3D tensor support for bitsandbytes 8-bit matmul in forward pass 2025-12-30 07:56:01 -08:00
Fizza-Mukhtar
4f59695810 Fix 3D tensor support for bitsandbytes 8-bit matmul in forward pass 2025-12-30 07:08:10 -08:00
lif
8ab0c0c913 fix: add support for init_lora_weights="corda" in get_peft_model (#3794)
Add "corda" as an allowed value for the init_lora_weights parameter
in FastLanguageModel.get_peft_model() and FastBaseModel.get_peft_model().

This enables users to use CorDA (Correlation-aware Decomposed Adaptation)
initialization from PEFT, which provides an alternative LoRA initialization
strategy for improved finetuning performance.

Fixes #3693

Signed-off-by: majiayu000 <1835304752@qq.com>
2025-12-28 23:17:58 -08:00
ゆり
5f1361aea3 Fix Boolean value of Tensor ambiguity error in mistral.py (#3790)
* Fix is_contiguous() method call and remove duplicate imports

- Fix bug in rope_embedding.py where is_contiguous was used without
  parentheses, causing the method object (always truthy) to be evaluated
  instead of calling the method. This fixes issue #3781 where fast rope
  backpropagation was broken for zero strided/non-contiguous tensors.

- Remove duplicate `import torch` in rl.py (lines 20 and 25)
- Remove duplicate `import functools` and `import types` in vision.py
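
The `is_contiguous` bug in the first bullet is easy to reproduce without a tensor library. In this sketch `FakeTensor` is a stand-in class, not a real tensor type.

```python
class FakeTensor:
    """Minimal stand-in exposing an is_contiguous() method."""

    def __init__(self, contiguous):
        self._contiguous = contiguous

    def is_contiguous(self):
        return self._contiguous


tensor = FakeTensor(contiguous=False)

# Bug: the bound method object itself is truthy, whatever the method returns.
buggy_check = bool(tensor.is_contiguous)

# Fix: call the method and test its return value.
fixed_check = bool(tensor.is_contiguous())
```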

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix Boolean value of Tensor ambiguity error in mistral.py

Replace `or` operator with explicit `is None` check when getting
n_items from kwargs. The `or` operator fails when the value is a
Tensor because Python cannot determine the boolean value of a
multi-element tensor.
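
A small sketch of the difference, using a stand-in class that refuses truth testing the way a multi-element tensor does. The kwargs key names and both helper functions are illustrative assumptions, not the code in mistral.py.

```python
class AmbiguousValue:
    """Stand-in for a multi-element tensor: truth testing raises."""

    def __bool__(self):
        raise RuntimeError("Boolean value of Tensor is ambiguous")


def get_n_items_with_or(kwargs):
    # `or` must evaluate bool(left operand), which raises for AmbiguousValue.
    return kwargs.get("num_items_in_batch") or kwargs.get("n_items")


def get_n_items(kwargs):
    # An identity check against None never calls __bool__.
    n_items = kwargs.get("num_items_in_batch")
    if n_items is None:
        n_items = kwargs.get("n_items")
    return n_items
```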

Fixes #3766

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update rope_embedding.py

---------

Co-authored-by: yurekami <yurekami@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-28 21:30:55 -08:00
Fizza Mukhtar
091a801386 Fix crash when trl.experimental.openenv is unavailable (#3787)
* Guard optional trl.experimental.openenv usage in RL patches

* Simplify optional trl.openenv import handling

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-28 21:23:51 -08:00
Francesco Bertolotti
dabf2a901b fastrope fix for zero strided tensors (#3782)
Co-authored-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
2025-12-28 21:21:48 -08:00
Alkın Ünlü
6180adda1b fix(trainer): import psutil to prevent NameError in _prepare_dataset (#3780)
* fix(trainer): import psutil to prevent NameError in _prepare_dataset

Fixes #3777

* Update rl.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-28 21:18:02 -08:00
Daniel Han
f40fa7a0e8 Update FUNDING.yml (#3792) 2025-12-28 19:57:43 -08:00
Michael Han
96de7a817d Update README for new unsloth.ai/docs.md 2025-12-27 00:49:19 -08:00
Fizza Mukhtar
f57cd25d46 Clarify NotImplementedError for fast_inference with full_finetuning (#3768)
* Improve error message for fast_inference and full_finetuning

* Refine error message string formatting

* Update unsloth/models/vision.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-25 18:46:13 -08:00
Strahinja Stamenkovic
a058885b8a Add missing import of inspect (#3778)
* Add missing import of inspect

* Update device_type.py
2025-12-25 18:43:59 -08:00
pre-commit-ci[bot]
7c1d528c00 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-23 20:06:37 +00:00
numb3r33
3aaca08a0e Refactor return statement replacement to use explicit newlines
Replace f-string triple-quoted approach with explicit newline characters
for clearer string construction in the grpo_trainer patch.
2025-12-24 01:34:21 +05:30
pre-commit-ci[bot]
d8c9e6aafb [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-24 01:34:21 +05:30
numb3r33
fe21809bda Fix indentation handling in grpo_trainer return statement replacement
Use regex to dynamically detect and preserve the original indentation
when replacing the 'return output' statement, instead of hardcoding
spaces. This ensures the patched code maintains consistent indentation
regardless of the original formatting.
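
One way to capture and reuse the indentation is sketched below. The helper `replace_return` and the replacement lines are hypothetical; only the technique (a regex that captures the leading whitespace of `return output`) follows the description above.

```python
import re

# Capture the leading whitespace of a line consisting only of `return output`.
RETURN_PATTERN = re.compile(r"^([ \t]*)return output[ \t]*$", re.MULTILINE)


def replace_return(source, replacement_lines):
    """Swap `return output` for several lines at the same indentation."""

    def _substitute(match):
        indent = match.group(1)
        return "\n".join(indent + line for line in replacement_lines)

    return RETURN_PATTERN.sub(_substitute, source)


original = "def f():\n        output = 1\n        return output\n"
patched = replace_return(original, ["restore_state()", "return output"])
```

Using a function as the substitution argument avoids having to escape backslashes in the replacement text.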
2025-12-24 01:34:21 +05:30
pre-commit-ci[bot]
c8e7bd9f09 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-24 01:34:21 +05:30
numb3r33
a6405b36b4 Remove the comment. 2025-12-24 01:34:21 +05:30
abhishek.sharma
91671433b0 Fix model training state restoration in GRPO trainer
Store the model's training state before generation and restore inference
mode after completion if the model wasn't originally in training mode.
This ensures the model returns to the correct state after generate and
score operations.
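
The save-and-restore pattern can be sketched as follows. `FakeModel` mimics only the `training` flag and the `train()`/`eval()` methods of a torch-style module, and `generate_and_score` is a hypothetical wrapper, not the patched trainer method.

```python
class FakeModel:
    """Stand-in with a torch-style training flag."""

    def __init__(self, training=False):
        self.training = training

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)


def generate_and_score(model, step):
    # Remember the mode before the step, which may flip it.
    was_training = model.training
    try:
        return step(model)
    finally:
        # Return to inference mode only if that is where we started.
        if not was_training:
            model.eval()
```

The `finally` clause restores the mode even when the step raises.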
2025-12-24 01:34:21 +05:30
Daniel Han
dea670a1b6 Merge branch 'main' into nightly 2025-12-23 05:51:04 -08:00
Daniel Han
1ff6fc85f0 llama.cpp fixes 2025-12-23 05:50:26 -08:00
Daniel Han
cbfa7a20f9 Update rl.py 2025-12-23 05:42:58 -08:00
Daniel Han
0ae7d2ba28 Update rl.py 2025-12-23 05:35:06 -08:00
Daniel Han
a4408f0e50 Merge branch 'main' into nightly 2025-12-23 04:52:58 -08:00
Daniel Han
fd42103a9b Nightly (#3767)
* Update _utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [FIX] [Transformers] VLM input embeds fix for gradients (#3715)

* Fix get_input_embeds call for VLMs

* patch input_require_grads instead

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup old patch

* cleanup old patch

* cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

* use logger instead of prints

* Move unsloth present set

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rope_embedding.py

* Fixes

* Update _utils.py

* Update import_fixes.py

* Update rl_replacements.py

* fix_openenv_no_vllm

* Fix

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* logger

* Update __init__.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update __init__.py

* Update import_fixes.py

* Update __init__.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update import_fixes.py

* Update unsloth/import_fixes.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update save.py

* [fbgemm] Silence tma fbgemm (#3735)

* Silence fbgemm TMA print

Also safer .push_to_hub

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update loader.py

* Update save.py

* Update save.py

* Update _utils.py

* Update _utils.py

* Diffusers warnings

* Update pyproject.toml

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [hf_hub] Token login (#3739)

* login on token

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup old code

* safer imports

* cleanup

* Return token after login

* correct return types

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

* add back imports

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* finish return token

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Do not overwrite slots (#3752)

* Do not overwrite slots

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update save.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-23 04:52:29 -08:00
pre-commit-ci[bot]
ad73b7e493 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-23 12:51:09 +00:00
Daniel Han
6866c4b3df Update save.py 2025-12-23 04:46:43 -08:00
Daniel Han
10ef983541 Merge branch 'main' into nightly 2025-12-23 04:46:15 -08:00
Daniel Han
691b5d129f Update save.py 2025-12-23 00:55:06 -08:00
pre-commit-ci[bot]
e134ceed79 [pre-commit.ci] pre-commit autoupdate (#3760)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.9 → v0.14.10](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.9...v0.14.10)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-22 23:12:00 -08:00
Daniel Han
ad5b2f66af Merge branch 'main' into nightly 2025-12-19 19:37:52 -08:00
Daniel Han
25a3141663 Update loader.py 2025-12-19 19:37:49 -08:00
Daniel Han
47ede31b8c Nightly (#3753)
* Update _utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [FIX] [Transformers] VLM input embeds fix for gradients (#3715)

* Fix get_input_embeds call for VLMs

* patch input_require_grads instead

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup old patch

* cleanup old patch

* cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

* use logger instead of prints

* Move unsloth present set

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rope_embedding.py

* Fixes

* Update _utils.py

* Update import_fixes.py

* Update rl_replacements.py

* fix_openenv_no_vllm

* Fix

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* logger

* Update __init__.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update __init__.py

* Update import_fixes.py

* Update __init__.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update import_fixes.py

* Update unsloth/import_fixes.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update save.py

* [fbgemm] Silence tma fbgemm (#3735)

* Silence fbgemm TMA print

Also safer .push_to_hub

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update loader.py

* Update save.py

* Update save.py

* Update _utils.py

* Update _utils.py

* Diffusers warnings

* Update pyproject.toml

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [hf_hub] Token login (#3739)

* login on token

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup old code

* safer imports

* cleanup

* Return token after login

* correct return types

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

* add back imports

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* finish return token

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Do not overwrite slots (#3752)

* Do not overwrite slots

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-19 19:35:41 -08:00
Daniel Han
bfa7301768 Merge branch 'main' into nightly 2025-12-19 19:31:23 -08:00
Daniel Han
9a6d703d3c Update _utils.py 2025-12-19 19:24:49 -08:00
Strahinja Stamenkovic
490153500b Enable 4-bit quantization on AMD Radeon GPUs (#3748)
* Enable 4-bit quant on Radeon

* Fix table centering

* Update comments for clarity

* Handle failure to import Bitsandbytes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update device_type.py

* Apply suggestion from @danielhanchen

* Update device_type.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-19 19:22:56 -08:00
Dan Saunders
1a507b4a82 Fix VLM DDP checkpointing (#3751) 2025-12-19 19:09:16 -08:00
Datta Nimmaturi
3918e07df8 Do not overwrite slots (#3752)
* Do not overwrite slots

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-19 19:08:28 -08:00
Daniel Han
3f238efba5 Merge branch 'main' into nightly 2025-12-18 09:30:19 -08:00
Daniel Han
a36eb9b9a1 FunctionGemma 2025-12-18 09:27:46 -08:00
DoubleMathew
96bd2a7668 Fix Deepseek OCR Lora Model Load (#3738)
* fix deepseek ocr lora_model load: trust_remote_code option

check for import error in autoconfig/peftconfig from_pretrained error

handle import

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-18 04:09:17 -08:00
Datta Nimmaturi
6832ce8098 [hf_hub] Token login (#3739)
* login on token

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup old code

* safer imports

* cleanup

* Return token after login

* correct return types

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

* add back imports

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* finish return token

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-18 04:07:12 -08:00
Dan Saunders
44104fa83d review comments 2025-12-17 15:28:03 -05:00
Dan Saunders
7833191626 review comments 2025-12-17 15:22:42 -05:00
Daniel Han
6b676abaad Merge branch 'main' into nightly 2025-12-17 03:32:57 -08:00
Daniel Han
1e7302cd77 Nightly (#3737)
* Update _utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [FIX] [Transformers] VLM input embeds fix for gradients (#3715)

* Fix get_input_embeds call for VLMs

* patch input_require_grads instead

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup old patch

* cleanup old patch

* cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

* use logger instead of prints

* Move unsloth present set

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rope_embedding.py

* Fixes

* Update _utils.py

* Update import_fixes.py

* Update rl_replacements.py

* fix_openenv_no_vllm

* Fix

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* logger

* Update __init__.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update __init__.py

* Update import_fixes.py

* Update __init__.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update import_fixes.py

* Update unsloth/import_fixes.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update save.py

* [fbgemm] Silence tma fbgemm (#3735)

* Silence fbgemm TMA print

Also safer .push_to_hub

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update loader.py

* Update save.py

* Update save.py

* Update _utils.py

* Update _utils.py

* Diffusers warnings

* Update pyproject.toml

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-17 03:31:48 -08:00
pre-commit-ci[bot]
ec0b96012e [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-17 11:29:40 +00:00
Daniel Han
150f7e017e Update pyproject.toml 2025-12-17 03:26:19 -08:00
Daniel Han
37888f3e10 Diffusers warnings 2025-12-17 03:25:40 -08:00
Daniel Han
fecee1b386 Update _utils.py 2025-12-17 02:54:15 -08:00
Daniel Han
2f7132d7c9 Update _utils.py 2025-12-17 02:38:05 -08:00
Daniel Han
be4b5996d7 Update save.py 2025-12-17 02:28:03 -08:00
Daniel Han
7d82555dc8 Update save.py 2025-12-17 02:21:47 -08:00
Daniel Han
28120b8a88 Update loader.py 2025-12-17 01:51:29 -08:00
Daniel Han
b342b6cf28 Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly 2025-12-17 01:07:29 -08:00
Datta Nimmaturi
a32f28d30b [fbgemm] Silence tma fbgemm (#3735)
* Silence fbgemm TMA print

Also safer .push_to_hub

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-17 01:07:21 -08:00
Daniel Han
59ed4fce4c Update save.py 2025-12-16 23:15:31 -08:00
Daniel Han
85f361a15d Merge branch 'main' into nightly 2025-12-16 21:52:58 -08:00
Daniel Han
23a7ac5d17 Update FUNDING.yml (#3736) 2025-12-16 21:36:25 -08:00
Daniel Han
ae4208f3ae Merge branch 'main' into nightly 2025-12-16 21:07:38 -08:00
Daniel Han
9ef9b60660 Bug fixes (#3734)
* Update _utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [FIX] [Transformers] VLM input embeds fix for gradients (#3715)

* Fix get_input_embeds call for VLMs

* patch input_require_grads instead

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup old patch

* cleanup old patch

* cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

* use logger instead of prints

* Move unsloth present set

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rope_embedding.py

* Fixes

* Update _utils.py

* Update import_fixes.py

* Update rl_replacements.py

* fix_openenv_no_vllm

* Fix

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* logger

* Update __init__.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update __init__.py

* Update import_fixes.py

* Update __init__.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update import_fixes.py

* Update unsloth/import_fixes.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 20:52:57 -08:00
Daniel Han
329c465245 Update unsloth/import_fixes.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 20:52:09 -08:00
Daniel Han
19b283459c Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly 2025-12-16 20:39:00 -08:00
Daniel Han
3fa8a363ab Update import_fixes.py 2025-12-16 20:34:34 -08:00
pre-commit-ci[bot]
c27a5beed8 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-17 04:26:03 +00:00
Daniel Han
66c97eb1ff Update import_fixes.py 2025-12-16 20:15:22 -08:00
Daniel Han
a79766f747 Update import_fixes.py 2025-12-16 20:06:12 -08:00
Daniel Han
c74d5e4ed6 Update import_fixes.py 2025-12-16 19:59:04 -08:00
Daniel Han
967fd51990 Update import_fixes.py 2025-12-16 19:51:10 -08:00
Daniel Han
b206214bb7 Update __init__.py 2025-12-16 19:49:19 -08:00
Daniel Han
4b6afd75de Update import_fixes.py 2025-12-16 19:48:41 -08:00
Daniel Han
c69e45b32f Merge branch 'main' into nightly 2025-12-16 19:46:34 -08:00
Daniel Han
90db3f465e Update import_fixes.py 2025-12-16 16:58:03 -08:00
Daniel Han
ba3f91f72f Update import_fixes.py 2025-12-16 16:57:38 -08:00
Daniel Han
c4d44af095 Update import_fixes.py 2025-12-16 15:46:27 -08:00
Daniel Han
95ce4fa8fa Update rl.py 2025-12-15 22:43:16 -08:00
pre-commit-ci[bot]
3104fd0942 [pre-commit.ci] pre-commit autoupdate (#3731)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.8 → v0.14.9](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.8...v0.14.9)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-15 17:54:15 -08:00
Dan Saunders
e012936d75 nits 2025-12-15 18:46:21 -05:00
Dan Saunders
64a13cd4b5 vision config -> lora 2025-12-15 18:37:26 -05:00
Dan Saunders
1a929732f6 bugfix 2025-12-15 18:37:26 -05:00
Dan Saunders
e79b1c7832 review comments, tests, etc. 2025-12-15 18:37:26 -05:00
Dan Saunders
6e7e52fb26 add export command, nested reorg commands 2025-12-15 18:37:26 -05:00
Dan Saunders
7828f77175 fixes / cleanup 2025-12-15 18:37:26 -05:00
Dan Saunders
cf966fe98e autogen typer options from pydantic models 2025-12-15 18:37:26 -05:00
Dan Saunders
356fb08b03 add dry run 2025-12-15 18:37:26 -05:00
Dan Saunders
22f9a65772 refactor 2025-12-15 18:37:26 -05:00
Dan Saunders
4ef25032c1 add config support + example configs, etc. 2025-12-15 18:37:26 -05:00
Dan Saunders
42490cfbc4 train CLI 2025-12-15 18:37:26 -05:00
Michael Han
086ccd377f Update README.md 2025-12-13 16:44:44 -08:00
oKatanaaa
e368a0bd2a fix: add a log instead of silent exception 2025-12-13 00:06:41 +00:00
Daniel Han
3412452d76 Merge branch 'main' into nightly 2025-12-12 05:53:19 -08:00
Daniel Han
cdc95e33a9 Nightly (#3720)
* Update _utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [FIX] [Transformers] VLM input embeds fix for gradients (#3715)

* Fix get_input_embeds call for VLMs

* patch input_require_grads instead

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup old patch

* cleanup old patch

* cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

* use logger instead of prints

* Move unsloth present set

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rope_embedding.py

* Fixes

* Update _utils.py

* Update import_fixes.py

* Update rl_replacements.py

* fix_openenv_no_vllm

* Fix

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update import_fixes.py

* Update import_fixes.py

* Update import_fixes.py

* logger

* Update __init__.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update __init__.py

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-12-12 05:53:08 -08:00
Daniel Han
e0ee31d814 Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly 2025-12-12 05:51:51 -08:00
Daniel Han
679a77c8f2 Update __init__.py 2025-12-12 05:51:31 -08:00
pre-commit-ci[bot]
95bfd7ba33 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-12 13:48:24 +00:00
Daniel Han
f656a4bf75 Update __init__.py 2025-12-12 05:46:50 -08:00
Daniel Han
6be76ac5f0 logger 2025-12-12 05:44:38 -08:00
Daniel Han
a2e08e689d Update import_fixes.py 2025-12-12 05:40:56 -08:00
Daniel Han
fed8f9a04e Update import_fixes.py 2025-12-12 05:38:24 -08:00
Daniel Han
9c10317550 Update import_fixes.py 2025-12-12 05:36:40 -08:00
Daniel Han
2e814c3ca9 Update __init__.py 2025-12-12 05:34:29 -08:00
Daniel Han
bb06544bb1 Update __init__.py 2025-12-12 05:31:36 -08:00
Daniel Han
35022da494 Update __init__.py 2025-12-12 05:29:25 -08:00
Daniel Han
06223976f6 Fix 2025-12-12 05:27:42 -08:00
Daniel Han
79e959e4d6 fix_openenv_no_vllm 2025-12-12 05:20:09 -08:00
Daniel Han
39e182e732 Update rl_replacements.py 2025-12-12 05:11:12 -08:00
Daniel Han
e479bb71e4 Update import_fixes.py 2025-12-12 05:10:45 -08:00
Daniel Han
890c30fd46 Update _utils.py 2025-12-12 05:01:43 -08:00
Daniel Han
63b6041f07 Fixes 2025-12-12 04:58:43 -08:00
Daniel Han
997931a38a Merge branch 'main' into nightly 2025-12-12 04:07:32 -08:00
Scott Roy
c91e99370b Update torchao save (#3679)
* Update torchao save

* up

* up

* up

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-12 04:07:02 -08:00
Daniel Han
d43dcf704b Update rope_embedding.py 2025-12-12 03:41:09 -08:00
Daniel Han
372764ae65 Merge branch 'main' into nightly 2025-12-12 03:38:09 -08:00
Datta Nimmaturi
3da42dff93 [FIX] [Transformers] VLM input embeds fix for gradients (#3715)
* Fix get_input_embeds call for VLMs

* patch input_require_grads instead

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup old patch

* cleanup old patch

* cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

* use logger instead of prints

* Move unsloth present set

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-12 03:33:39 -08:00
Dan Saunders
43e134a6e9 Mistral packing, train on completions only, simplifications (#3709)
* pipe kwargs through mistral model

* simplify / bugfix

* bugfix for train_on_completions_only

* wire up is_unsupported_model

* nits, edge cases
2025-12-10 23:15:59 -08:00
Lei Zhenyuan
beb83d7f28 [intel] skip xpu fbgemm fp8 (#3625)
* skip xpu fbgemm fp8

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-10 21:13:29 -08:00
Michael Han
401de54fba Padding free packing update 2025-12-10 21:12:13 -08:00
Michael Han
bff336c7a3 Adding new padding free packing support 2025-12-10 21:10:19 -08:00
pre-commit-ci[bot]
8c21f54b74 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-11 03:31:41 +00:00
oKatanaaa
8f08e57d8e fix: weights tying 2025-12-11 03:21:02 +00:00
Dan Saunders
2040946d68 update TRL filter (#3707)
* update TRL filter

* both filters

* Apply suggestion from @danielhanchen

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-10 07:49:52 -08:00
Daniel Han
d45b63eb3d Merge branch 'main' into nightly 2025-12-10 06:25:56 -08:00
Daniel Han
3bc349f2d8 Gemma issue 2025-12-10 06:25:49 -08:00
Daniel Han
22cb954f4d Merge branch 'main' into nightly 2025-12-10 06:11:32 -08:00
Daniel Han
b16af7b0f5 Update _utils.py 2025-12-10 06:11:14 -08:00
Daniel Han
c010cc6421 Update trainer.py 2025-12-10 06:11:03 -08:00
Daniel Han
4761574752 Merge branch 'main' into nightly 2025-12-10 04:15:57 -08:00
Daniel Han
26a9b5b322 Nightly (#3706)
* Update _utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-10 04:13:13 -08:00
pre-commit-ci[bot]
d8564a05b4 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-10 12:12:49 +00:00
Daniel Han
0372af5ca4 Merge branch 'main' into nightly 2025-12-10 04:12:19 -08:00
Daniel Han
f0c8e21d59 Update import_fixes.py 2025-12-10 04:05:14 -08:00
Daniel Han
f9564cf84e Update _utils.py 2025-12-10 03:48:01 -08:00
Datta Nimmaturi
62d19a12ff [FIX] fbgemm version check (#3704)
* fbgemm version check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* safer version check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add check for torchvision-torch compatibility

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor package check logic

* Remove logs and enforce torch

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-10 03:46:30 -08:00
Dan Saunders
75e0d7ce62 Auto-enable padding-free SFT (#3672)
* implement (sdpa, xformers, fa2) sample packing

* attention dispatching

* ddp working OOTB with CLI

* packed SWA and softcap support

* enable batch flattening

* LGPL license headers

* mask packed sequence boundaries

* auto-enable sample packing

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add explicit toggle for sample packing

* Add explicit toggle for sample packing

* Update __init__.py

* Update unsloth/kernels/rope_embedding.py

* Update unsloth/kernels/rope_embedding.py

* remove grad output clones; restore deleted FastLanguageModel arg

* fix

* restore rope embedding clones

* xformers mask cache

* add back accidental deletion

* Update unsloth/kernels/rope_embedding.py

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add **kwargs

* add back clobbered

* Update rope_embedding.py

* Update rope_embedding.py

* simplify trl warnings filter

* docstring

* nit

* bugfix

* add padding-free seqlen metadata

* auto-enable padding free

* gemma2 disable

* Apply suggestion from @danielhanchen

* Update trainer.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update trainer.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-10 03:07:29 -08:00
pre-commit-ci[bot]
fb565d52f0 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-10 05:17:02 +00:00
vangmay
96eba88c90 Fix Chunking loop can hang when stride ≥ chunk_size 2025-12-10 10:46:29 +05:30
vangmay
07966659d8 Fix Incorrect non-relative import in dataprep package 2025-12-10 10:17:23 +05:30
vangmay
fe36643c66 Fix RawTextDataLoader import issue 2025-12-10 10:15:56 +05:30
Daniel Han
c81025b24d Merge branch 'main' into nightly 2025-12-09 17:37:18 -08:00
Dan Saunders
496f84ff6b SFT sample packing (#3566)
* implement (sdpa, xformers, fa2) sample packing

* attention dispatching

* ddp working OOTB with CLI

* packed SWA and softcap support

* enable batch flattening

* LGPL license headers

* mask packed sequence boundaries

* auto-enable sample packing

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add explicit toggle for sample packing

* Add explicit toggle for sample packing

* Update __init__.py

* Update unsloth/kernels/rope_embedding.py

* Update unsloth/kernels/rope_embedding.py

* remove grad output clones; restore deleted FastLanguageModel arg

* fix

* restore rope embedding clones

* xformers mask cache

* add back accidental deletion

* Update unsloth/kernels/rope_embedding.py

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add **kwargs

* add back clobbered

* Update rope_embedding.py

* Update rope_embedding.py

* simplify trl warnings filter

* docstring

* nit

* bugfix

* Apply suggestion from @danielhanchen

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update unsloth/trainer.py

* Update unsloth/trainer.py

* Update unsloth/trainer.py

* Update unsloth/trainer.py

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-09 17:36:45 -08:00
Daniel Han
e561bb3ef5 Merge branch 'main' into nightly 2025-12-09 03:31:30 -08:00
Daniel Han
2b3cb06925 Update _utils.py (#3698) 2025-12-09 03:31:20 -08:00
Daniel Han
cafa92a63b Merge branch 'main' into nightly 2025-12-09 03:30:42 -08:00
Datta Nimmaturi
89787329d3 [Fix] [TRL] load_lora for multi line llm.chat/generate (#3696)
* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove reload_weights rpc call from grpo trainer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use regex instead of static string

* patch openenv reload_weights call

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Better handle sleep and wakeup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reset indentation

* Handle multi line self.llm.chat better

* Use logger

* re-indent

* Stricter regex to replace wildcard

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-09 03:30:23 -08:00
Daniel Han
6264afbf87 Update _utils.py 2025-12-09 01:02:26 -08:00
Datta Nimmaturi
9e5b4052e5 Remove reload_weights rpc call from grpo trainer (#3673)
* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove reload_weights rpc call from grpo trainer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use regex instead of static string

* patch openenv reload_weights call

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Better handle sleep and wakeup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reset indentation

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-08 23:36:22 -08:00
pre-commit-ci[bot]
c579cd7094 [pre-commit.ci] pre-commit autoupdate (#3694)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.7 → v0.14.8](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.7...v0.14.8)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-08 19:44:56 -08:00
Daniel Han
43ad66d37a Versioning 2025-12-08 04:19:10 -08:00
Daniel Han
bebf042e0f Update pyproject.toml 2025-12-08 04:13:45 -08:00
Daniel Han
e72e9d499d Versioning 2025-12-08 04:06:01 -08:00
Noah Kirschmann
a80f1991c5 Update transformers version constraint in pyproject.toml (#3689)
* Update transformers version constraint in pyproject.toml

The latest transformers version just fixes the local training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update transformers version constraint in pyproject.toml

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-08 03:27:18 -08:00
Daniel Han
034a35f6d4 Add **kwargs 2025-12-08 03:24:51 -08:00
Daniel Han
f4de7baea1 Update rl.py 2025-12-08 02:23:43 -08:00
Daniel Han
4408ddc081 Update vision.py 2025-12-07 23:09:13 -08:00
Daniel Han
d86ded6799 Update _utils.py 2025-12-07 16:52:59 -08:00
Daniel Han
cb4d8da5a2 Xformers fix 2025-12-07 16:40:51 -08:00
Michael Han
3d4f236155 Update README.md 2025-12-04 08:21:20 -08:00
Daniel Han
845e61d351 Update README.md 2025-12-02 04:08:54 -08:00
Daniel Han
14e8e3137d Update README.md 2025-12-02 03:52:50 -08:00
pre-commit-ci[bot]
13f6491fe6 [pre-commit.ci] pre-commit autoupdate (#3666)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.6 → v0.14.7](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.6...v0.14.7)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-01 17:45:13 -08:00
Daniel Han
4fd865cc99 Update _utils.py 2025-12-01 08:01:05 -08:00
Daniel Han
d655c7434a Update rl.py 2025-12-01 08:00:08 -08:00
Daniel Han
66649d18bd Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit cad158a56c.
2025-12-01 07:24:58 -08:00
pre-commit-ci[bot]
cad158a56c [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-01 15:24:34 +00:00
Daniel Han
487a951914 Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit 964c9fef95.
2025-12-01 07:24:21 -08:00
pre-commit-ci[bot]
964c9fef95 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-01 15:23:44 +00:00
Daniel Han
5f27bc4db5 Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit d34e0454ac.
2025-12-01 07:23:31 -08:00
pre-commit-ci[bot]
d34e0454ac [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-01 15:20:22 +00:00
Daniel Han
d994280cdc Update unsloth/models/rl.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-01 07:20:00 -08:00
Daniel Han
ebec564689 Update unsloth/models/rl.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-01 07:19:50 -08:00
Daniel Han
b4f5a70878 Update qwen3_moe.py 2025-12-01 07:19:07 -08:00
Datta Nimmaturi
04cfc0d139 Vllm guided decoding (#3663)
* vllm sampling params fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* do not patch base_trainer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* separate vllm fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixup deletion

* Fix indentation

* revert to old style

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-01 07:11:28 -08:00
Daniel Han
3d62c38ada Versioning 2025-12-01 07:09:17 -08:00
Daniel Han
3f4768ad1e Update rl.py 2025-12-01 06:23:23 -08:00
Daniel Han
ba2897a318 Revert "[FIX] Vllm guided decoding params (#3662)"
This reverts commit fb4f0fdf56.
2025-12-01 05:43:45 -08:00
Datta Nimmaturi
fb4f0fdf56 [FIX] Vllm guided decoding params (#3662)
* vllm sampling params fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* do not patch base_trainer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* separate vllm fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestion from @danielhanchen

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit 58b483dc0d1790f99580665801d3fa0d7267c533.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit b2497519659a9f301e7a633795d9efdafdc2b277.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit de3daaf429f81aceb6632932b0cb1af5149652a8.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-12-01 05:42:37 -08:00
Daniel Han
bf28d686a0 Merge branch 'main' into nightly 2025-12-01 04:21:27 -08:00
Santosh Bhavani
f7be4b1140 Fix: Pass gradient_checkpointing parameter to model.for_training() calls (#3659) 2025-12-01 04:18:41 -08:00
Daniel Han
f9fd3c43fa Update vision.py 2025-12-01 01:21:26 -08:00
Daniel Han
085a0a9c2d Typos 2025-12-01 00:01:07 -08:00
Daniel Han
7028dc02a5 Update qwen3_moe.py 2025-11-30 23:37:32 -08:00
Daniel Han
60fd9d870d Update vision.py 2025-11-30 21:32:07 -08:00
VED
f23f17e8ba set default [128, 128] instead of none (#3658)
Co-authored-by: Ved <ved.work2024@gmail.com>
2025-11-30 17:00:31 -08:00
Daniel Han
bf11ba0c53 Update rl.py 2025-11-30 04:40:03 -08:00
Daniel Han
dbcedbbf65 Merge branch 'main' into nightly 2025-11-30 04:39:55 -08:00
DoubleMathew
f9d2a11dba make unsloth_tiled_mlp a from_pretrained arg (#3655)
* make unsloth_tiled_mlp a from_pretrained arg

* adjust patching logic
2025-11-29 22:47:51 -08:00
Bhuvan Prakash
cd24a2896c Fix: prevent load_in_fp8 kwarg from reaching Qwen3MoeForCausalLM constructor (Fix #3649) (#3654)
* Fix: remove load_in_fp8 from kwargs to prevent Qwen3Moe init TypeError (Fix #3649)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-11-29 20:18:11 -08:00
gitpullpull
77d47ecee5 Fix broken link for Advanced pip install instructions (#3652) 2025-11-29 15:33:48 -08:00
Michael Han
bfdf73fa66 Update README.md 2025-11-29 08:01:00 -08:00
DoubleMathew
8953a06764 fix rope_theta -> rope_parameters['rope_theta'] (#3651) 2025-11-29 06:44:26 -08:00
Michael Han
f668897b3c Update README.md 2025-11-27 20:52:27 -08:00
Michael Han
ef30739fd7 Update README.md 2025-11-27 20:49:47 -08:00
Daniel Han
1abf47e27c Merge branch 'main' into nightly 2025-11-27 05:45:20 -08:00
mk0walsk
460f2cf6ad Fix indefinite article usage in comments and docstrings (#3648) 2025-11-26 18:15:27 -08:00
Dina Suehiro Jones
c740e14937 Fix llama tokenizer padding_side when using model.generate in inference mode (#3644)
* Only restore training mode after generation, if the model started out in training mode

Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-11-25 17:33:28 -08:00
Daniel Han
d7eafb1042 Merge branch 'main' into nightly 2025-11-25 07:59:24 -08:00
Daniel Han
ea4afe1154 Update mapper.py 2025-11-25 07:59:21 -08:00
Daniel Han
499758b0b5 Merge branch 'main' into nightly 2025-11-25 07:45:27 -08:00
Daniel Han
8119697202 Update loader.py 2025-11-25 07:45:14 -08:00
Daniel Han
8fc18d1a84 Merge branch 'main' into nightly 2025-11-25 07:39:00 -08:00
Daniel Han
528142dcda Update loader.py 2025-11-25 07:38:51 -08:00
Daniel Han
fce90cc9b3 Merge branch 'main' into nightly 2025-11-25 07:23:55 -08:00
Daniel Han
86f708097d Float8 GRPO, RL (#3640)
* Enable FP8 + RL training for bf16 models (#3440)

* Enable FP8 + RL training for bf16 models

**Summary:** Enable FP8 + RL training using TorchAO for 1.33x faster training and 42% less model memory usage:
- We quantize the frozen base model weights into fp8 and keep the LoRA adapters in bf16
- We leverage TorchAO's `Float8Tensor`, which calls into fbgemm's fp8 x fp8 rowwise matmul kernel
- For now, we need to do an offline quantization first, because vllm doesn't support on-the-fly quantization for torchao yet (this is in progress: https://github.com/vllm-project/vllm/pull/26327)

**Example usage:**
```
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-8B-Base",
    max_seq_length = 2048,
    load_in_4bit = False,
    fast_inference = True,
    max_lora_rank = 32,
    load_in_fp8 = True,  # set this to True
)

# the rest is the same as before
model = FastLanguageModel.get_peft_model(...)
```

**Initial results:**
```
# fp8
{'train_runtime': 1725.4337, 'train_samples_per_second': 0.232, 'train_steps_per_second': 0.058, 'train_loss': 0.00015715716748673002, 'epoch': 0.01}

# bf16
{'train_runtime': 2297.8145, 'train_samples_per_second': 0.174, 'train_steps_per_second': 0.044, 'train_loss': 0.00016081033063528594, 'epoch': 0.01}
```
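
As a quick sanity check, the 1.33x figure in the summary can be recomputed from the `train_runtime` and `train_samples_per_second` values logged above. This is only an illustrative sketch: the variable names are made up for this example, and it does not cover the memory figure, which these logs do not report.

```python
# Values copied from the fp8 and bf16 training logs above.
fp8_runtime = 1725.4337   # seconds
bf16_runtime = 2297.8145  # seconds
fp8_samples_per_second = 0.232
bf16_samples_per_second = 0.174

# Speedup measured as the ratio of wall-clock runtimes (bf16 / fp8).
speedup = bf16_runtime / fp8_runtime

# The same comparison from the throughput side (fp8 / bf16).
throughput_ratio = fp8_samples_per_second / bf16_samples_per_second

print(f"runtime speedup:  {speedup:.2f}x")
print(f"throughput ratio: {throughput_ratio:.2f}x")
```

Both ratios round to about 1.33, which is consistent with the speedup quoted in the summary for this single run.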

<img width="1199" height="448" alt="Screenshot 2025-11-11 at 4 10 50 PM" src="https://github.com/user-attachments/assets/b6304afd-89e9-42b1-8064-775807e17b23" />

Test script: https://gist.github.com/andrewor14/5b85119fae46845d07b608d420907423

**Requires:**
- https://github.com/pytorch/ao/pull/3158 (torchao nightly or 0.15.0+)
- https://github.com/unslothai/unsloth-zoo/pull/351

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* _get_inference_mode_context_manager

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update utils.py

* Update utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Fix/save torchao model loading logic (#3621)

* make loading gpt-oss-BF16 faster. Linked to unsloth-zoo PR #314

* fix model loading and clean merged model directory

* revert default quant

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert mapper.py

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update loader_utils.py

* Update loader_utils.py

* Add 128x128 PerBlock FP8 + RL (#3629)

* Add 128x128 PerBlock FP8 + RL

**Summary:** Following https://github.com/unslothai/unsloth/pull/3440,
this PR extends torchao FP8 + RL support to also handle 128x128
PerBlock granularity (in addition to PerRow).

**Example usage:**

```
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-8B-Base",
    max_seq_length = 2048,
    load_in_4bit = False,
    fast_inference = True,
    max_lora_rank = 32,
    load_in_fp8 = "block",  # or "row" or True
)
```
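
To make the difference between the two granularities concrete, here is a minimal pure-Python sketch of how many scale factors each scheme would keep for one weight matrix. It is not torchao's implementation: the helper names (`per_row_scales`, `per_block_scales`), the toy matrix, and the `absmax / 448` scaling rule are assumptions chosen for illustration, with 448 being the largest finite value of the float8 e4m3 format.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3


def per_row_scales(w):
    """One scale per row: absmax of the row divided by the fp8 maximum."""
    return [max(abs(x) for x in row) / FP8_E4M3_MAX for row in w]


def per_block_scales(w, block=128):
    """One scale per (block x block) tile, laid out as a 2D grid of scales."""
    rows, cols = len(w), len(w[0])
    grid = []
    for r0 in range(0, rows, block):
        grid_row = []
        for c0 in range(0, cols, block):
            # Absolute maximum over the tile, clipped at the matrix edges.
            absmax = max(
                abs(w[r][c])
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            grid_row.append(absmax / FP8_E4M3_MAX)
        grid.append(grid_row)
    return grid


# Toy 256 x 256 "weight" matrix with values in [-5, 5] (hypothetical data).
w = [[float((r * 7 + c * 3) % 11 - 5) for c in range(256)] for r in range(256)]

row_scales = per_row_scales(w)
block_scales = per_block_scales(w, block=128)

print("per-row scales:  ", len(row_scales))
print("per-block scales:", len(block_scales) * len(block_scales[0]))
```

For this toy matrix the per-row scheme keeps one scale per row, while the 128x128 scheme keeps one scale per tile, so the trade-off is between finer resolution along rows and fewer scale factors overall.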

**Initial results:** TBD

**Note:**
- Requires https://github.com/pytorch/ao/pull/3370

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Version

* Update vision.py

* Update rl.py

* Add torch 2.9.1

* Fix auto installer

* Update fp8.py

* Float8

* Update fp8.py

* Update mapper.py

* Update mapper.py

* Update loader_utils.py

* Update loader.py

* Update fp8.py

* Versioning

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: andrewor14 <andrewor14@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2025-11-25 07:23:26 -08:00
pre-commit-ci[bot]
967434c948 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-11-25 15:20:19 +00:00
Daniel Han
7af84b491e Versioning 2025-11-25 07:12:45 -08:00
Daniel Han
83aaeb150e Update fp8.py 2025-11-25 07:11:30 -08:00
Daniel Han
ba0d16ce36 Update loader.py 2025-11-25 07:06:41 -08:00
Daniel Han
8bd5afbd50 Update loader_utils.py 2025-11-25 07:05:43 -08:00
Daniel Han
ee6ab2ec28 Update mapper.py 2025-11-25 07:02:20 -08:00
Daniel Han
ca8b938018 Update mapper.py 2025-11-25 06:53:06 -08:00
Daniel Han
6360dfbf5a Update fp8.py 2025-11-25 06:50:58 -08:00
Daniel Han
f6509e6939 Float8 2025-11-25 06:48:10 -08:00
Daniel Han
06491d1b99 Update fp8.py 2025-11-25 05:35:34 -08:00
vangmay
646629884b Remove training mode arg 2025-11-25 21:01:43 +08:00
Daniel Han
3fee93b48e Fix auto installer 2025-11-25 01:47:58 -08:00
Daniel Han
49607bf27f Add torch 2.9.1 2025-11-25 01:36:11 -08:00
Daniel Han
5c2c53afee Update rl.py 2025-11-24 22:13:17 -08:00
pre-commit-ci[bot]
ba150c34b3 [pre-commit.ci] pre-commit autoupdate (#3634)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.5 → v0.14.6](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.5...v0.14.6)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-11-24 17:16:56 -08:00
Daniel Han
4ed44a2159 Update vision.py 2025-11-24 05:50:22 -08:00
Daniel Han
d7a9f801ff Merge branch 'main' into nightly 2025-11-24 02:16:53 -08:00
Lei Zhenyuan
f746d854c5 [intel] change windows to remove windows-triton for intel xpu (#3168)
* change windows to remove windows-triton for intel xpu

* add changes for different platform

* Update pyproject.toml

* update mode windows

* Update pyproject.toml

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update pyproject.toml

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update pyproject.toml

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update pyproject.toml

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update pyproject.toml

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update pyproject.toml

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update pyproject.toml

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update pyproject.toml

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-23 22:03:54 -08:00
Etherll
f91ea1e9a6 Add trust_remote_code parameter to tokenizer (#3631) 2025-11-23 21:12:40 -08:00
Daniel Han
61ce3f0e73 Version 2025-11-22 06:20:00 -08:00
andrewor14
4320a8e82d Add 128x128 PerBlock FP8 + RL (#3629)
* Add 128x128 PerBlock FP8 + RL

**Summary:** Following https://github.com/unslothai/unsloth/pull/3440,
this PR extends torchao FP8 + RL support to also handle 128x128
PerBlock granularity (in addition to PerRow).
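
To illustrate the difference in granularity, the sketch below counts how many scale factors each scheme would store for a 2-D weight. It assumes one scale per row for PerRow and one scale per 128x128 tile for PerBlock; the function names and the example shape are hypothetical, and this is not torchao's implementation.

```python
import math

BLOCK = 128  # tile edge for the 128x128 PerBlock scheme


def per_row_scale_count(rows: int, cols: int) -> int:
    """One scale per row of the weight (assumed PerRow layout)."""
    return rows


def per_block_scale_count(rows: int, cols: int, block: int = BLOCK) -> int:
    """One scale per block x block tile, rounding up at ragged edges."""
    return math.ceil(rows / block) * math.ceil(cols / block)


rows, cols = 4096, 12288  # illustrative shape, not taken from the PR
print(per_row_scale_count(rows, cols), per_block_scale_count(rows, cols))
```

Under these assumptions PerBlock stores fewer scales than PerRow whenever the weight has more than 128 columns per 128 rows of tiles, at the cost of coarser scaling along each row.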

**Example usage:**

```
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-8B-Base",
    max_seq_length = 2048,
    load_in_4bit = False,
    fast_inference = True,
    max_lora_rank = 32,
    load_in_fp8 = "block",  # or "row" or True
)
```

**Initial results:** TBD

**Note:**
- Requires https://github.com/pytorch/ao/pull/3370

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-11-21 20:09:27 -08:00
Mercury
13b3e7e6a8 Fix missing code and support inputs_embeds only input. (#3623) 2025-11-20 07:56:52 -08:00
vangmay
082da69cc4 remove old function 2025-11-20 21:40:45 +08:00
pre-commit-ci[bot]
3bf8ca7da2 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-11-20 13:09:08 +00:00
vangmay
f05169e56a Make the chunk function efficient 2025-11-20 21:08:33 +08:00
pre-commit-ci[bot]
25e69f2d36 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-11-20 12:57:53 +00:00
vangmay
d253b392fb Merge branch 'feature/raw-text-dataprep' of https://github.com/Vangmay/unsloth into feature/raw-text-dataprep 2025-11-20 20:57:27 +08:00
vangmay
c20a3b40ee Integrate smart dataset loader 2025-11-20 20:53:22 +08:00
pre-commit-ci[bot]
d429363c23 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-11-20 12:51:18 +00:00
Daniel Han
3fd334c2ee Update loader_utils.py 2025-11-20 04:04:19 -08:00
Daniel Han
3d099a3bb6 Update loader_utils.py 2025-11-20 00:06:11 -08:00
Roland Tannous
22e0c63166 Fix/save torchao model loading logic (#3621)
* make loading gpt-oss-BF16 faster. Linked to unsloth-zoo PR #314

* fix model loading and clean merged model directory

* revert default quant

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert mapper.py

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-11-20 00:02:31 -08:00
Daniel Han
1b0852c42e Update __init__.py 2025-11-19 23:56:38 -08:00
andrewor14
f1b24a6152 Enable FP8 + RL training for bf16 models (#3440)
* Enable FP8 + RL training for bf16 models

**Summary:** Enable FP8 + RL training using TorchAO for 1.33x faster training and 42% less model memory usage:
- We quantize the frozen LoRA weights into fp8 and keep the LoRA adapters in bf16
- We leverage TorchAO's `Float8Tensor`, which calls into fbgemm's fp8 x fp8 rowwise matmul kernel
- For now, we need to do an offline quantization first, because vllm doesn't support on-the-fly quantization for torchao yet (this is in progress: https://github.com/vllm-project/vllm/pull/26327)

**Example usage:**
```
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-8B-Base",
    max_seq_length = 2048,
    load_in_4bit = False,
    fast_inference = True,
    max_lora_rank = 32,
    load_in_fp8 = True,  # set this to True
)

# the rest is the same as before
model = FastLanguageModel.get_peft_model(...)
```

**Initial results:**
```
# fp8
{'train_runtime': 1725.4337, 'train_samples_per_second': 0.232, 'train_steps_per_second': 0.058, 'train_loss': 0.00015715716748673002, 'epoch': 0.01}

# bf16
{'train_runtime': 2297.8145, 'train_samples_per_second': 0.174, 'train_steps_per_second': 0.044, 'train_loss': 0.00016081033063528594, 'epoch': 0.01}
```
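
As a quick check on the summary figure, the speedup can be recomputed from the two `train_runtime` values in the logs above. This is only arithmetic on the reported numbers, not a new benchmark, and the variable names are illustrative.

```python
# Runtimes in seconds, copied from the fp8 and bf16 training logs above.
fp8_runtime = 1725.4337
bf16_runtime = 2297.8145

# Speedup is the ratio of the slower (bf16) runtime to the faster (fp8) one.
speedup = bf16_runtime / fp8_runtime
print(f"fp8 speedup over bf16: {speedup:.2f}x")
```

The ratio rounds to 1.33, which agrees with the 1.33x quoted in the summary.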

<img width="1199" height="448" alt="Screenshot 2025-11-11 at 4 10 50 PM" src="https://github.com/user-attachments/assets/b6304afd-89e9-42b1-8064-775807e17b23" />

Test script: https://gist.github.com/andrewor14/5b85119fae46845d07b608d420907423

**Requires:**
- https://github.com/pytorch/ao/pull/3158 (torchao nightly or 0.15.0+)
- https://github.com/unslothai/unsloth-zoo/pull/351

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* _get_inference_mode_context_manager

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update utils.py

* Update utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-11-19 23:51:43 -08:00
DoubleMathew
5b0f19624c Remove grpo requirement bs=num_generations (#3609)
* Remove grpo requirement bs=num_generations

* Update rl.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-11-19 19:57:01 -08:00
DoubleMathew
e8e0f6aa58 Add an int64 path for mlp kernels (#3614)
* Add an int64 path for mlp kernels

* move constant expressions to globals

* fix name
2025-11-19 19:45:10 -08:00
Dan Saunders
a3ed3c395d remove pre-commit workflow (covered by pre-commit app) (#3618) 2025-11-19 15:34:32 -08:00
mk0walsk
8efbd5ac9c Fix broken links and typo in README (#3611)
* README Link Fixes

* Update README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 20:04:14 -08:00
vangmay
171fb12573 Add module to init 2025-11-18 22:44:48 +08:00
vangmay
ee37dd9f92 Write simple test 2025-11-18 22:36:38 +08:00
vangmay
8d482c2129 Add validation code 2025-11-18 22:02:35 +08:00
vangmay
6014bb4dd2 Add logic to clean and extract text sections 2025-11-18 22:01:36 +08:00
vangmay
ed5820e667 Write chunking logic 2025-11-18 22:00:07 +08:00
vangmay
aecfbe1fff Add support for multiple files 2025-11-18 21:59:01 +08:00
vangmay
d75fbb5d0a Add implementation to cli 2025-11-18 21:53:20 +08:00
vangmay
face46d188 Write file and template for raw_text dataprep 2025-11-18 21:46:41 +08:00
pre-commit-ci[bot]
2f68f246a4 [pre-commit.ci] pre-commit autoupdate (#3606)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.4 → v0.14.5](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.4...v0.14.5)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-11-17 17:02:44 -08:00
Datta Nimmaturi
4571ecaca3 Do not force set beta to 0 for DAPO (#3604) 2025-11-16 22:39:36 -08:00
DoubleMathew
daeb4d57a3 fix qwen3 vl gradient accumulation (#3598)
* fix qwen3 vl gradient accumulation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update unsloth/models/_utils.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-15 02:26:34 -08:00
Daniel Han
3f0dde40d1 Update pyproject.toml 2025-11-14 20:01:02 -08:00
Scott Roy
20bd66f49f Extend TorchAOConfig to support mobile usecases (#3587)
* up

* up

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-11-14 03:08:21 -08:00
Yuxiao Cheng
ed829b672a Fix: prevent rope_embedding AssertionError by checking kv_seq_len before reuse (#3578)
* fix: add kv_seq_len boundary check before reusing RoPE embeddings

Prevented AssertionError in rope_embedding.forward when kv_seq_len exceeds
the cached rope size. Added condition to verify kv_seq_len <=
position_embeddings[0].shape[0] before reuse, ensuring dynamic extension
triggers correctly.
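
The boundary check described above can be sketched roughly as follows. This is a minimal illustration rather than the patched unsloth code: `can_reuse_rope` is a hypothetical helper, and plain Python sequences stand in for the cached cos/sin tensors, so `len(cos)` plays the role of `position_embeddings[0].shape[0]`.

```python
from typing import Optional, Sequence, Tuple

# (cos, sin) tables; plain sequences stand in for tensors in this sketch.
RopeCache = Tuple[Sequence[float], Sequence[float]]


def can_reuse_rope(position_embeddings: Optional[RopeCache], kv_seq_len: int) -> bool:
    """Return True only if the cached tables cover kv_seq_len positions."""
    if position_embeddings is None:
        return False
    cos, _sin = position_embeddings
    # Reuse is safe only when the cache has at least kv_seq_len rows;
    # otherwise the caller should fall back to extending the cache.
    return kv_seq_len <= len(cos)


cached = ([1.0] * 8, [0.0] * 8)
print(can_reuse_rope(cached, 8))  # cache covers the sequence
print(can_reuse_rope(cached, 9))  # cache too short, extension needed
```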

Fixes #3036 #3216

* fix falcon h1

---------

Co-authored-by: jarrycyx <dzdzzd@126.com>
2025-11-14 03:06:33 -08:00
Giuseppe Franco
069781bcd6 Support for out-of-source quantizers (#3534)
* Support for out-of-source quantizers

* Fix decorators and functions to be staticmethod

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-11-14 02:52:24 -08:00
DoubleMathew
a3d42aaa28 Patch in tiled mlp (#3584)
* Patch in tiled mlp

* Update unsloth/models/llama.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-11-13 21:26:49 -08:00
DoubleMathew
ccebde2cb3 Resize rope embeddings for long sequence training (#3586) 2025-11-11 18:11:31 -08:00
pre-commit-ci[bot]
3d34ed4def [pre-commit.ci] pre-commit autoupdate (#3576)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.0 → v0.14.4](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.0...v0.14.4)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-11-11 18:10:49 -08:00
Daniel Han
03733cf180 Update _utils.py 2025-11-10 04:49:27 -08:00
Dan Saunders
45865ead0c pre-commit CI config (#3565) 2025-11-07 14:44:18 -08:00
DoubleMathew
01d3794828 add trust_remote_code kwarg (#3564) 2025-11-07 14:16:35 -08:00
Daniel Han
d6bb89ad44 Formatting & bug fixes (#3563)
* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Seasame force float16 / float32

* Fix Seasame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for casual mask (#2868)

* fix for casual mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Donot move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Donot typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Seasame force float16 / float32

* Fix Seasame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for casual mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unuse code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Seasame force float16 / float32

* Fix Seasame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Seasame force float16 / float32

* Fix Seasame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* New models

* Update llama.py

* Versioning

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Fix AMD

* Update _utils.py

* Update llama.py

* Update vision.py

* DEVICE_TYPE_TORCH

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Move DEVICE_TYPE

* Update rl_replacements.py

* Update loader.py

* AMD install script

* Move AMD

* Update _amd_install.sh

* Update pyproject.toml

* Update pyproject.toml

* Delete _amd_install.sh

* Update device_type.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Versioning

* Update pyproject.toml

* Update loader.py

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* local_files_only

* Cut Cross Entropy

* Update llama.py

* Update vision.py

* Update vision.py

* Update vision.py

* Qwen 3 VL vLLM (#3489)

* Update __init__.py

* patch_torchao

* torchao_logger

* Update rl_replacements.py

* Fix

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Versioning

* fbgemm fp8 block quant support (>=1.4.0) (#3531)

* fbgemm fp8 block quant support (>=1.4.0)

* Verify for fp8 support before proceeding

* Use unsloth zoo's Version and improve comments

* spacessss

* Update vision.py

* Update vision.py

* Update rl.py

* vllm_sampling_params

* Update rl.py

* Update rl.py

* Update rl.py

* Add `ruff` pre-commit hook and apply it (#3424)

* Add Ruff pre-commit config and workflow

* Add kwarg spacing enforcement helper

* Apply Ruff formatting

* Update fp8.py

* Revert ruff on some files

* Update

* force-exclude = true

* Datasets issue

* Ruff

* Remove mapper

* Update mapper.py

* Update pyproject.toml

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
Co-authored-by: Dan Saunders <danjsaund@gmail.com>
2025-11-07 06:00:22 -08:00
mk0walsk
d8ae1e266e Fix typos in comment (#3557) 2025-11-05 19:29:36 -08:00
Michael Han
c8421a939b Update README.md 2025-11-04 22:00:06 -08:00
pluesclues
91db850488 Detach logits before returning from function (#3554) 2025-11-04 07:29:27 -08:00
Datta Nimmaturi
7fe58d8c15 Sleep trl patch (#3517)
* Patch sleep mode properly for trl

* empty cache after sleep/wakeup

* no extra wakeups

* Do not redo wakeups

* cleanup

* post trl 0.23 sleep patch
2025-11-03 23:00:54 -08:00
Daniel Han
a9ff4e23c9 Bug fixes (#3546)
* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* New models

* Update llama.py

* Versioning

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Fix AMD

* Update _utils.py

* Update llama.py

* Update vision.py

* DEVICE_TYPE_TORCH

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Move DEVICE_TYPE

* Update rl_replacements.py

* Update loader.py

* AMD install script

* Move AMD

* Update _amd_install.sh

* Update pyproject.toml

* Update pyproject.toml

* Delete _amd_install.sh

* Update device_type.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Versioning

* Update pyproject.toml

* Update loader.py

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* local_files_only

* Cut Cross Entropy

* Update llama.py

* Update vision.py

* Update vision.py

* Update vision.py

* Qwen 3 VL vLLM (#3489)

* Update __init__.py

* patch_torchao

* torchao_logger

* Update rl_replacements.py

* Fix

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Versioning

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-11-03 06:47:26 -08:00
pluesclues
c449c7b06e Handle TRL version compatibility in rl_replacements.py (#3540) 2025-11-01 05:17:27 -07:00
Daniel Han
f67c4a172a Update mapper.py 2025-10-30 06:56:22 -07:00
Daniel Han
d6aa072c29 Update pyproject.toml 2025-10-30 06:48:14 -07:00
Daniel Han
1fd8c72aee Nightly (#3532)
* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* New models

* Update llama.py

* Versioning

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Fix AMD

* Update _utils.py

* Update llama.py

* Update vision.py

* DEVICE_TYPE_TORCH

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Move DEVICE_TYPE

* Update rl_replacements.py

* Update loader.py

* AMD install script

* Move AMD

* Update _amd_install.sh

* Update pyproject.toml

* Update pyproject.toml

* Delete _amd_install.sh

* Update device_type.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Versioning

* Update pyproject.toml

* Update loader.py

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* local_files_only

* Cut Cross Entropy

* Update llama.py

* Update vision.py

* Update vision.py

* Update vision.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-10-30 06:45:57 -07:00
Daniel Han
067db89dc3 Update vision.py 2025-10-30 06:30:43 -07:00
Daniel Han
a3d6b3a4bf Update vision.py 2025-10-30 06:28:05 -07:00
Daniel Han
64136e6336 Update vision.py 2025-10-30 06:27:58 -07:00
Daniel Han
dfe35fb441 Update vision.py 2025-10-30 06:27:31 -07:00
Daniel Han
be1c2ca95c Update vision.py 2025-10-30 06:23:28 -07:00
Daniel Han
b3aa029c7a Update rl_replacements.py 2025-10-30 06:04:30 -07:00
Daniel Han
8f7e0164df Update vision.py 2025-10-30 05:54:26 -07:00
Daniel Han
ab98999a3f Update vision.py 2025-10-30 05:53:43 -07:00
Daniel Han
60081c2f24 Update import_fixes.py 2025-10-30 05:41:06 -07:00
Daniel Han
6ef73397f2 Update vision.py 2025-10-30 05:38:16 -07:00
Daniel Han
e88cb620ab Bug fixes 2025-10-30 05:35:47 -07:00
Daniel Han
810171d82c Merge branch 'main' of https://github.com/unslothai/unsloth 2025-10-29 06:31:40 -07:00
Daniel Han
df4133ac36 Update import_fixes.py 2025-10-29 05:43:36 -07:00
pluesclues
45b1c7f7c8 Grpo gradient accumulation edits (#3390)
* Update rl_replacements.py grpo accumulation kwargs

* Update rl.py, remove bnpo default when setting dapo

* Update rl.py

* Update rl_replacements.py, add support for vllm importance sampling

* Update rl_replacements.py, added ability to get metrics

* Update rl_replacements.py send sampling per token logps to backend

* Update rl_replacements.py, corrected if statement in monkey patch

* Update rl_replacements.py, updating to handle nan cases as well

* Update rl_replacements.py, imported textwrap

* Update rl_replacements.py, yes

* Add error handling for sampling_per_token_logps

Handle NameError for sampling_per_token_logps assignment.

* Add delta check for use_vllm condition

* Refactor vision model flag to use is_vlm variable
2025-10-28 22:54:34 -07:00
Daniel Han
0e766b28f0 Versioning 2025-10-28 05:35:47 -07:00
Daniel Han
160ba77142 Quant Method missing 2025-10-28 05:26:51 -07:00
Daniel Han
2c47b8a7ac Update fp8.py 2025-10-26 23:29:14 -07:00
Daniel Han
52765eff31 Update fp8.py 2025-10-26 23:26:51 -07:00
Daniel Han
3ba905d0cc Update fp8.py 2025-10-26 23:24:57 -07:00
Datta Nimmaturi
2585e57b6e FP8 training enhancements (#3496)
* Fix FP8 for models with non 8 multiple weights

* patch fp8 forward methods for compiled models

* patch hf quantizer for fp8

* Failsafe import of fbgemmfp8linear and fp8linear

* Beautify
2025-10-26 23:22:20 -07:00
Daniel Han
b72306d148 Update pyproject.toml 2025-10-26 23:17:59 -07:00
Lei Zhenyuan
0079619063 enable support 2.9 for intel xpu (#3514) 2025-10-26 23:14:42 -07:00
Lei Zhenyuan
57a03c35f4 fix for intel memory (#3513) 2025-10-26 23:12:18 -07:00
Daniel Han
c9274533d2 Fix GPU name 2025-10-26 22:50:52 -07:00
Daniel Han
6f0f05518b Update loader.py 2025-10-26 22:40:59 -07:00
Daniel Han
0528b4ce71 Fixes 2025-10-26 22:39:38 -07:00
Daniel Han
5273eb5cd5 Update import_fixes.py 2025-10-26 22:34:39 -07:00
Daniel Han
b0498fc4dd OpenEnv patches 2025-10-26 22:31:04 -07:00
Daniel Han
9346b5ab6b Update pyproject.toml 2025-10-26 21:59:51 -07:00
Daniel Han
30631866de Add Torch 2.9 options 2025-10-26 21:49:30 -07:00
Lei Zhenyuan
281e38c918 add code for intel qlora (#3370)
* add code for intel qlora

* add specified code for xpu device
2025-10-26 21:44:29 -07:00
Lei Zhenyuan
e09787ab9d add code changes for pyproject.toml (#3381) 2025-10-26 21:43:17 -07:00
DoubleMathew
1c1f7033cd move PYTORCH_CUDA_ALLOC_CONF into zoo (#3499) 2025-10-26 21:29:18 -07:00
wangxunx
5d86b6e756 fix cross entropy loss issue for small vocab size on amd gpu (#3503) 2025-10-26 21:20:47 -07:00
Michael Han
c2e2474e51 Update CODE_OF_CONDUCT.md 2025-10-25 19:31:05 -07:00
Michael Han
381e181e99 Update README.md 2025-10-25 19:26:05 -07:00
Daniel Han
60ab88301e Versioning 2025-10-23 05:53:12 -07:00
Datta Nimmaturi
635cfdbbb0 Sleep trl patch (#3494)
* Patch sleep mode properly for trl

* empty cache after sleep/wakeup

* no extra wakeups

* Do not redo wakeups

* cleanup
2025-10-23 01:43:55 -07:00
Daniel Han
ee473f6c52 Update pyproject.toml 2025-10-22 07:57:55 -07:00
Daniel Han
54cfe1f241 Update _utils.py 2025-10-22 05:16:22 -07:00
Daniel Han
0cd0635a90 More Qwen3-VL 2025-10-22 05:12:09 -07:00
Datta Nimmaturi
26ddbb5b8e Patch sleep mode properly for trl (#3492) 2025-10-22 05:00:52 -07:00
Daniel Han
06162ad350 Update save.py 2025-10-21 11:33:40 -07:00
Daniel Han
5e1b4e744e Bug fixes (#3484)
* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* New models

* Update llama.py

* Versioning

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Fix AMD

* Update _utils.py

* Update llama.py

* Update vision.py

* DEVICE_TYPE_TORCH

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Move DEVICE_TYPE

* Update rl_replacements.py

* Update loader.py

* AMD install script

* Move AMD

* Update _amd_install.sh

* Update pyproject.toml

* Update pyproject.toml

* Delete _amd_install.sh

* Update device_type.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Versioning

* Update pyproject.toml

* Update loader.py

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* local_files_only

* Cut Cross Entropy

* Update llama.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-10-20 04:57:01 -07:00
Daniel Han
462e59b5e1 Bug fixes (#3483)
* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* New models

* Update llama.py

* Versioning

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Fix AMD

* Update _utils.py

* Update llama.py

* Update vision.py

* DEVICE_TYPE_TORCH

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Move DEVICE_TYPE

* Update rl_replacements.py

* Update loader.py

* AMD install script

* Move AMD

* Update _amd_install.sh

* Update pyproject.toml

* Update pyproject.toml

* Delete _amd_install.sh

* Update device_type.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Versioning

* Update pyproject.toml

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-10-19 23:21:39 -07:00
Daniel Han
267e74b624 Update pyproject.toml 2025-10-18 18:11:58 -07:00
Daniel Han
db4f000e07 Update __init__.py 2025-10-18 17:35:50 -07:00
Daniel Han
7e0ea4c66f Zoo 2025-10-18 17:34:32 -07:00
Daniel Han
7520006b6a Update __init__.py 2025-10-18 17:32:01 -07:00
Daniel Han
9714ab85f2 Update utils.py 2025-10-17 20:51:54 -07:00
Daniel Han
f05f7c019d Update utils.py 2025-10-17 17:07:02 -07:00
wangxunx
caeb7f7cb9 fix out of resources issue for llama3.2 sft on amd gpu (#3455)
Co-authored-by: Xun Wang <xunwang2@amd.com>
2025-10-17 16:24:02 -07:00
Dan Saunders
f845cf964f EOL LF (unix line endings) normalization (#3478) 2025-10-17 16:22:42 -07:00
Daniel Han
f62c454a86 GRPO bug fixes (#3474)
* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* New models

* Update llama.py

* Versioning

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Fix AMD

* Update _utils.py

* Update llama.py

* Update vision.py

* DEVICE_TYPE_TORCH

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Move DEVICE_TYPE

* Update rl_replacements.py

* Update loader.py

* AMD install script

* Move AMD

* Update _amd_install.sh

* Update pyproject.toml

* Update pyproject.toml

* Delete _amd_install.sh

* Update device_type.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-10-17 06:56:12 -07:00
Daniel Han
b80f110a44 Update _utils.py 2025-10-17 06:43:11 -07:00
Daniel Han
657580cd67 Update loader.py 2025-10-17 05:19:26 -07:00
Daniel Han
287b67eb91 Update device_type.py 2025-10-17 04:57:39 -07:00
Daniel Han
279063c7ed Update device_type.py 2025-10-17 04:55:21 -07:00
Daniel Han
33b67f1ebe Missing inspect 2025-10-17 04:54:58 -07:00
Daniel Han
7271f79284 Disable BnB for AMD 2025-10-17 04:48:29 -07:00
Daniel Han
edff83544b Update pyproject.toml 2025-10-17 04:29:48 -07:00
Daniel Han
e5c7fe9c53 Delete _amd_install.sh 2025-10-17 04:13:25 -07:00
Daniel Han
cb8484eaf1 Fix transformers 4.57.1 (#3473)
* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* New models

* Update llama.py

* Versioning

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Fix AMD

* Update _utils.py

* Update llama.py

* Update vision.py

* DEVICE_TYPE_TORCH

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Move DEVICE_TYPE

* Update rl_replacements.py

* Update loader.py

* AMD install script

* Move AMD

* Update _amd_install.sh

* Update pyproject.toml

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-10-17 04:05:10 -07:00
Daniel Han
77b5256f71 Update _utils.py 2025-10-16 15:59:38 -07:00
Daniel Han
c51abba19f Update _utils.py 2025-10-16 15:36:58 -07:00
Daniel Han
0ede099ef0 AMD fixes (#3467)
* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* New models

* Update llama.py

* Versioning

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Fix AMD

* Update _utils.py

* Update llama.py

* Update vision.py

* DEVICE_TYPE_TORCH

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-10-16 07:05:58 -07:00
Daniel Han
6440419207 Fix (#3466)
* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* New models

* Update llama.py

* Versioning

* Update _utils.py

* Update llama.py

* Update _utils.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-10-16 05:38:54 -07:00
Daniel Han
c95dea33a9 Update rl.py 2025-10-16 05:26:36 -07:00
Daniel Han
516f771697 Update _utils.py 2025-10-16 05:13:07 -07:00
Daniel Han
2019d0653f Update _utils.py 2025-10-16 05:06:37 -07:00
Daniel Han
56a4eb4212 Update _utils.py 2025-10-16 04:48:55 -07:00
Daniel Han
7770a24ab2 Update _utils.py 2025-10-16 04:45:27 -07:00
Daniel Han
a6cf91d869 TorchAO 2025-10-16 03:55:20 -07:00
Datta Nimmaturi
1dd5485e95 vLLM FP8 quantized support for SFT/GRPO (#3414)
* Prefer loading model from pretrained instead of config

* Fixup FP8 forward pass and inference

* [WIP] Fix lora forwards

* Infer block size from weight shapes

* reconstruct weights from fp8 quants for lora matmul

* Return weight transpose and fix dtype

* Refactor FP8 operations

* Fix naming :)

* Saner compile

* do not depend on transformers

* [WIP] fix training

* Update comment

* fixup training

* use dequant kernel from deepseek

* Differentiate between fp8 and fbgemmfp8

* fixup differentiation b/w fp8 and fbgemm_fp8

* make inputs contiguous if required

* Improve dequant

* More robust handling

* Fixup backward pass for fbgemm_fp8

* refactor and use bf16 for dequant

* Use torch fp8 block matmul

* Disable torch block matmul for now

* safer import and cosmetics

* more cosmetics

* add torchao operations

* Spaceeeeeee
2025-10-16 03:07:05 -07:00
Daniel Han
7797f373d1 Update _utils.py 2025-10-16 02:42:26 -07:00
Daniel Han
e7ab86db0d Update _utils.py 2025-10-15 14:30:50 -07:00
Daniel Han
dc060f0f7d Update _utils.py 2025-10-15 14:29:06 -07:00
Daniel Han
0dcfb03b8b Update _utils.py 2025-10-15 14:28:19 -07:00
Michael Han
e1a9c130e5 Update README.md
Qwen3-VL + DGX
2025-10-14 20:23:32 -07:00
Daniel Han
a219198d41 Update mapper.py 2025-10-14 16:22:47 -07:00
Daniel Han
7710adc318 Update mapper.py 2025-10-14 08:02:03 -07:00
Daniel Han
b81935bd21 Versioning 2025-10-14 07:27:38 -07:00
Daniel Han
7d2f07a0b2 Update mapper.py 2025-10-14 07:20:22 -07:00
Daniel Han
41ad82c1ba Update __init__.py 2025-10-14 05:54:32 -07:00
Daniel Han
781a4507e5 Update import_fixes.py 2025-10-14 05:40:21 -07:00
Daniel Han
abdf91927c Update loader.py 2025-10-14 05:29:20 -07:00
Daniel Han
b3fc77f1fa Update pyproject.toml 2025-10-14 01:55:39 -07:00
Daniel Han
f98ebd192f Update _utils.py 2025-10-14 01:52:05 -07:00
Roland Tannous
e35be2b490 [Part2] Reinstate llama.cpp Compatibility and GGUF Conversion with Multiple Quantizations and Automated Ollama Modelfile Creation (#3356)
* GGUF conversion code + model to template mappers + chat template adds/fixes

* syntax fixes

* extract tokenizer from video processor

* model file cleanup after multiple quantizations

* flip is_vlm flag if mmproj has text-only llama.cpp support for MLM

* preserve processor files for merge operation

* reinstate chr(92)

* fixed starling mapping

* ollama Modelfile from gguf for text models

* specify bf16 ollama model precision for vision models

* fix keyError in templatedict when no mapping

* revert chat_templates.py to original syntax

* ollama modelfile template to model mapper

* link save to ollama mapper, fix some bugs

* rename to ollama_template_mappers

* Remove old template_mappers file (renamed ollama_template_mappers)

* fix final printout

* fix model list and printout

* remove yi base model, keep chat/instruct

* fixed dangling > in HF repo readme for uploaded models

* added granite model ollama support

* Combine use_local_gguf() blocks

* model_name relative to base_model_name
2025-10-14 01:23:14 -07:00
pluesclues
0aed5ae94a Fix eval metric issue (#3420)
* Update rl.py, added fix eval metric issue from online DPO

* Update rl.py, enabled unsloth return logits flag for metrics
2025-10-13 17:20:46 -07:00
Etherll
3d73aebec8 improve qat (#3446)
* Update save.py

* Update vision.py

* Update save.py
2025-10-13 17:03:15 -07:00
DoubleMathew
33023b9ac9 Handle transformers rename from PretrainedConfig to PreTrainedConfig (#3445) 2025-10-13 16:00:28 -07:00
Michael Han
a049fcd460 Update README.md 2025-10-12 05:32:42 -07:00
Daniel Han
57e7589b55 Update pyproject.toml 2025-10-07 20:52:38 -07:00
Daniel Han
41ee93a2e8 Update pyproject.toml 2025-10-07 20:50:31 -07:00
Daniel Han
452ad1959e Gemma 3 bug fixes (#3410)
* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* New models

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-10-05 00:54:24 -07:00
Michael Han
b235ec7f7f Update README.md 2025-10-04 16:12:02 -07:00
Michael Han
afe9d39981 Update README.md 2025-10-03 04:18:21 -07:00
Michael Han
aeb2829ec9 Update README.md 2025-10-03 04:01:17 -07:00
Scott Roy
291987113b up (#3391) 2025-10-01 19:14:32 -07:00
Michael Han
5745677718 Adding Docker support 2025-10-01 17:04:46 -07:00
Daniel Han
a0ce4a982a Nightly (#3394)
* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-10-01 05:07:14 -07:00
Daniel Han
f78abbc751 Update vision.py 2025-10-01 04:36:58 -07:00
Daniel Han
cc810e17cb Update vision.py 2025-10-01 04:15:14 -07:00
Daniel Han
0ca2a140a0 execute_with_time_limit 2025-10-01 01:20:30 -07:00
Daniel Han
72d4ce88c0 Update __init__.py 2025-09-30 23:14:35 -07:00
Daniel Han
d3e04ca1d6 Update _utils.py 2025-09-30 23:10:29 -07:00
Daniel Han
83bf7d1435 Update _utils.py 2025-09-30 23:07:19 -07:00
Daniel Han
032bfb01e5 Nightly (#3392)
* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update loader.py

* Fix padding issue

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-09-30 05:03:34 -07:00
Michael Han
f529568194 Merge pull request #3384 from Etherll/patch-928
Fix loading as 8bit
2025-09-28 16:55:55 -07:00
Etherll
4e2b12101b Update vision.py 2025-09-28 23:52:08 +03:00
Michael Han
a6dfb2894d Update README.md 2025-09-26 17:31:46 -07:00
Daniel Han
2fd5cc70d8 Versioning 2025-09-26 07:11:48 -07:00
Daniel Han
82f76a8609 Update pyproject.toml 2025-09-26 05:29:02 -07:00
Daniel Han
61da0d3237 GPT OSS RL (#3362)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Message

* Update vision.py

* Update loader.py

* Update vision.py

* cache_implementation

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Save max_seq_length

* Update _utils.py

* Update rl.py

* Update vision.py

* Update llama.py

* Mistral3 vllm (#3349)

* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

* Add mistral 3 support

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>

* Set padding to 0

* Fix patch

* fixup patch (#3359)

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vision.py

* Versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* MXFP4 dequant

* Update loader.py

* Update vision.py

* load_in_16bit

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* offload_embedding

* Update vision.py

* Update vision.py

* Update vision.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-09-26 04:55:12 -07:00
laz-001
373c3188e1 correct python support statement (#3374) 2025-09-26 04:52:23 -07:00
Michael Han
9c9f85b28a Update README.md
Fresh update
2025-09-26 02:50:02 -07:00
DoubleMathew
ab6eb686dd specify different tokenizer_path/name (#3343) 2025-09-19 20:01:33 -07:00
DoubleMathew
35ff0f4564 peft_config before model_config (#3342) 2025-09-19 20:01:13 -07:00
Daniel Han
d5f1abfddd Update vision.py 2025-09-19 04:01:52 -07:00
Daniel Han
8817a91984 Update _utils.py 2025-09-19 01:17:36 -07:00
Daniel Han
a4ad3e0d70 Update loader.py 2025-09-19 01:07:34 -07:00
Daniel Han
af60134d7d Update vision.py 2025-09-18 22:33:13 -07:00
Daniel Han
5d6fbda29a Bug fixes (#3335)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update mapper.py

* Versioning

* Update loader.py

* Update loader.py

* Update rl.py

* Versioning

* Update _utils.py

* Fix auto_mapping

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-09-18 19:00:17 -07:00
Etherll
727c938ac5 Update vision.py (#3339) 2025-09-18 14:27:39 -07:00
DoubleMathew
a26d0de384 Synthetic Data updates (#3333) 2025-09-17 21:43:00 -07:00
DoubleMathew
0faea1fb86 update (#3332) 2025-09-17 21:42:12 -07:00
andrewor14
3ffb3bdcfe Fix QAT + LoRA fast path, add tests (#3307)
**Summary:** The existing QAT + LoRA path only applied fake
quantization to the original slow path, but the default is the
fast path that calls unsloth's fast LoRA primitives. This commit
integrates fake quantization into these fast primitives as well,
and adds unit tests to assert that fake quantization is actually
taking place.
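
For readers unfamiliar with the term, fake quantization rounds a weight onto the low-precision grid and immediately converts it back, so training sees the quantization error while still computing in high precision. The pure-Python sketch below illustrates the idea under stated assumptions: symmetric per-group int4 rounding, with `fake_quantize_int4` and `group_size` as hypothetical names. It is not the code in this PR and may not match the quantization scheme used by the fast LoRA primitives.

```python
def fake_quantize_int4(weights, group_size=4):
    """Quantize-then-dequantize a flat list of floats (illustrative only).

    Assumption: symmetric int4 with one scale per group of `group_size`
    values. Real QAT additionally passes gradients straight through the
    rounding step, which this sketch does not model.
    """
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to the int4 value 7.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        for w in group:
            q = max(-8, min(7, round(w / scale)))  # clamp to the int4 range
            out.append(q * scale)                  # back to float
    return out


weights = [0.0, 1.0, -1.0, 0.5]
print(fake_quantize_int4(weights))
```

The returned values stay in floating point but can only take the few levels representable in int4, which is the property the unit tests in this commit are said to assert for the fast path.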

**Test Plan:**

Unit tests:
```
pytest tests/utils/test_qat.py
```

End-to-end test: https://gist.github.com/andrewor14/6360dd69b5784c71c46e80c14f53e6b6

Full fine-tuning Llama3.1-8B with and without QAT + LoRA on yahma/alpaca-cleaned for 1 epoch:

- Batch size = 8 (no grad accum)
- Learning rate = 2e-4
- Quantization scheme = int4 weight only (with bf16 activations)

Wikitext perplexity:

- Baseline = int4 quantized model finetuned without QAT
- QAT int4 quantized model (with this PR) recovered about 33% of the perplexity degradation that int4 quantization causes in the baseline (8.3548 vs. 8.7655, against 7.5551 for the float model)
- QAT int4 quantized model without this PR was worse than the int4 baseline

```
==> unsloth_model_lora_baseline_output/lm_eval_float.log <==
|        |       |none  |     0|word_perplexity|↓  |7.5551|±  |   N/A|

==> unsloth_model_lora_baseline_output/lm_eval_quantized.log <==
|        |       |none  |     0|word_perplexity|↓  |8.7655|±  |   N/A|

==> unsloth_model_lora_qat_int4_output/lm_eval_quantized.log <==
|        |       |none  |     0|word_perplexity|↓  |8.3548|±  |   N/A|
```
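
As a quick check on the numbers in the logs above, the following sketch computes two readings of the improvement. The helper names are hypothetical, and treating the 33% figure as "share of quantization degradation recovered" is an interpretation of the logs, not something the commit states explicitly.

```python
# Word perplexities copied from the lm_eval logs above.
FLOAT_PPL = 7.5551          # finetuned without QAT, not quantized
INT4_BASELINE_PPL = 8.7655  # finetuned without QAT, then int4 quantized
INT4_QAT_PPL = 8.3548       # finetuned with QAT + LoRA, then int4 quantized


def relative_reduction(baseline, candidate):
    """Fractional drop in perplexity relative to the baseline value."""
    return (baseline - candidate) / baseline


def degradation_recovered(float_ref, baseline, candidate):
    """Share of the float-to-int4 perplexity gap that the candidate closes."""
    return (baseline - candidate) / (baseline - float_ref)


print(f"{relative_reduction(INT4_BASELINE_PPL, INT4_QAT_PPL):.1%}")
print(f"{degradation_recovered(FLOAT_PPL, INT4_BASELINE_PPL, INT4_QAT_PPL):.1%}")
```

The first ratio is under 5%, while the second is roughly one third, so the second reading is the one consistent with the 33% figure quoted in the test plan.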
2025-09-17 15:18:17 -07:00
Daniel Han
70f790a8e4 Bug fixes (#3329)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update vision.py

* Update vision.py

* Fix DataParallel

* Update _utils.py

* Update rl.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-09-17 05:06:23 -07:00
Michael Han
1b3fdd5565 Update README.md 2025-09-16 10:07:02 -07:00
Daniel Han
bf0267450b Versioning 2025-09-16 08:49:42 -07:00
Daniel Han
8b1ad0ae82 Update pyproject.toml 2025-09-16 08:17:21 -07:00
Daniel Han
a6fcbbe814 Update _utils.py 2025-09-16 06:23:19 -07:00
Daniel Han
0569186f14 Update rl.py 2025-09-16 06:20:06 -07:00
pluesclues
d4c653dc2e TRL Updated version of VLM GRPO update along with GSPO (#3132)
* Kept, padding logic

* Made sure prediction step in rl.py allows logging for callbacks in RL trainers

* updated llama.py to new online_dpo changes

* Update rl.py to make logic simpler

* Update rl.py, made sure tokenized_output on eval step was on same device

* Update rl.py, corrected tokenized_outputs to inputs

* Update rl.py, removed sagemaker stuff

* Update llama.py, figures out if there is right padding automatically

* Update llama.py, changed conditional statement for right padding slightly

* Update llama.py, updated OS.environ variable to temp variable

* Update rl.py, made it account for right padding in online dpo and reward modeling

* Update llama.py, automatically figures out if right padding is needed

* Update rl_replacements.py, fixed up passing image data to functions

* Update rl_replacements.py, for VLM GRPO support with TRL

* Update rl_replacements.py, gspo added

* Update rl.py, forgot about Online_DPO changes in this branch

* Update rl.py, forgot to not include Online DPO PR changes

* Update llama.py, forgot to disinclude Online DPO PR changes

* Update rl_replacements.py, updated generate and score completions to be up to date for trl

* Update rl_replacements.py

* Update rl_replacements.py, fixed nan issues with vlms

* Update rl_replacements.py, added indent

* Update rl_replacements.py, added attention mask to calculations of old and ref hidden states

* Update unsloth/models/rl_replacements.py

* Update unsloth/models/rl_replacements.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-09-16 05:43:07 -07:00
Datta Nimmaturi
51879e513a Fast Inference with vLLM for VLMs (#2975)
* [WIP] use vLLM for vision language models

* Update README.md

Editing icon sizes

* Update README.md

Updating icon sizes

* Update README.md (#2885)

* MoE kernels AGPLv3

* versioning

* Many bug fixes (#2908)

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* silently skip falcon h1 import if transformers_version < 4.53.0 (#2912)

* Dynamically adjust get_per_token_logps function and patch as well (#2911)

* add intel gpu with vllm support (#2903)

* [bugs] fix for causal mask (#2868)

* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type

* Explicitly check if xformers exists for attention (#2889)

* Update __init__.py

* Update llama.py

* if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913)

* Move inputs to right devices. (#2919)

* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic

* WIP VLM vLLM

* Make vLLM patch a function

* Add save and load lora functions

* Make fast_inference setup depend on the flag

* Improve fast inference patching mechanism

* Make vision setting depend on checks in fastbasemodel

* Check LoRA and vLLM intercompatibility for vision models

* Comment pointing to vLLM LoRA check

* Improve lora validation on vLLM

* Error out on no vLLM and increase max lora rank

* Bug fixes (#3017)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* fix for causal mask (#3011)

* [intel] add for intel path for llama.py (#3012)

* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* Fix Gemma 2 (#3024)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* falcon force float32 on sm<75 machines (#3026)

* Fix torch compile issues (#3028)

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update pyproject.toml

* Update _utils.py

* Fixup patch vllm

* Disable mllama

* Use variables to decide VLM support

* Better attn_impl handling

* Patch TF protobuf incompatibility

* Torch 2.8 (#3186)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update _auto_install.py

* Update pyproject.toml

* Update rl.py

* Protobuf issue

* Update pyproject.toml

* Fix extras transformers typo in pyproject.toml

* Update _utils.py

* Bug fixes (#3195)

* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* allow float32 dtype in FastLanguageModel (#3204)

* Update loader.py

* Update vision.py

* Suppress message and use unsloth sampling params

* Use trl sampling params for now

* Improve error message

* fixup quantized fast inference model name

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
2025-09-16 05:29:08 -07:00
lightsource
67d918b00b Add support for modules_to_save in FastModel.get_peft_model (#3317)
* patch modules_to_save

* Update vision.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-09-15 22:41:34 -07:00
Daniel Han
977689b2a2 Bug fixes (#3322)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-09-15 01:55:24 -07:00
Daniel Han
6f5b6e90fd Update README.md 2025-09-15 01:46:07 -07:00
Daniel Han
846a5dcbc4 Update README.md 2025-09-15 01:43:11 -07:00
Daniel Han
29ed805a13 Update README.md 2025-09-15 01:42:59 -07:00
Daniel Han
db4f3cde14 Update README.md 2025-09-15 01:40:28 -07:00
Daniel Han
2f8baabd7a Update README.md 2025-09-15 01:40:06 -07:00
Daniel Han
92f972bb7c Update README.md 2025-09-15 01:39:39 -07:00
Daniel Han
46e7370878 Blackwell support 2025-09-15 01:39:03 -07:00
Michael Han
bf92d129b4 Update README.md 2025-09-13 21:45:22 -07:00
Michael Han
8a1ff4a3f0 Update README.md
Adding new install instructions
2025-09-13 21:30:52 -07:00
Daniel Han
2003da34f8 importlib_version 2025-09-12 03:13:15 -07:00
billishyahao
71ae760aa0 [ROCm] add hip device path (#3301) 2025-09-12 02:57:19 -07:00
Daniel Han
6087e203f9 Bug fix 2025-09-10 05:14:15 -07:00
Daniel Han
9e837e2dc3 Update rl.py 2025-09-10 01:39:35 -07:00
Daniel Han
39e532de39 Bug fixes (#3295)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

* Update loader.py

* Update loader.py

* extract_model_type_from_config

* Model types

* Update loader.py

* get_transformers_model_type

* Update loader.py

* Update loader.py

* Update loader.py

* Update rl.py

* Update pyproject.toml

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-09-10 01:28:13 -07:00
Daniel Han
fe2c8eb76e Update loader.py 2025-09-09 02:06:07 -07:00
Daniel Han
66b45b5aaa Merge branch 'main' of https://github.com/unslothai/unsloth 2025-09-09 01:22:27 -07:00
DoubleMathew
034db11215 simplify uns inference (#3291) 2025-09-08 21:07:02 -07:00
Daniel Han
4ae5db3287 Merge branch 'main' of https://github.com/unslothai/unsloth 2025-09-08 17:15:57 -07:00
Daniel Han
14791dc6c2 Update __init__.py 2025-09-08 17:15:55 -07:00
andrewor14
fa93e36312 Add support for QAT full fine-tuning (#3238)
**Summary:** Following https://github.com/unslothai/unsloth/pull/2976,
which adds support for QAT + LoRA, this PR adds support for QAT
during full fine-tuning. See the [torchao QAT README](https://github.com/pytorch/ao/blob/main/torchao/quantization/qat/README.md)
for more details.

Current QAT schemes supported are:
```
fp8-int4, targeting the torch.ops.fbgemm.f8i4bf16_shuffled kernel
fp8-fp8, targeting the torch.ops.fbgemm.f8f8bf16_rowwise kernel
```

**Test Plan:** https://gist.github.com/andrewor14/048b5c1bd01b7fa23c53913856a8ef9f

Full fine-tuning Llama3.1-8B with and without QAT on `yahma/alpaca-cleaned` for 1 epoch:
- Batch size = 16 (no grad accum)
- Learning rate = 4e-5
- Quantization scheme = fp8-int4

Wikitext perplexity:
- QAT improved perplexity by 19.2% compared to regular fine-tuning
- QAT's int4 quantized model even outperformed the bf16 baseline
- Regular int4 quantized model (without QAT) was significantly worse than the bf16 baseline

```
==> unsloth_model_full_baseline_output/eval_float.log <==
|        |       |none  |     0|word_perplexity|↓  |9.8446|±  |   N/A|

==> unsloth_model_full_baseline_output/eval_quantized.log <==
|        |       |none  |     0|word_perplexity|↓  |11.4595|±  |   N/A|

==> unsloth_model_full_qat_fp8-int4_output/eval_quantized.log <==
|        |       |none  |     0|word_perplexity|↓  |9.2336|±  |   N/A|
```
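As a rough check on the numbers above, the sketch below recomputes the relative perplexity change from the rounded values in the three logs. It assumes the improvement is measured as the relative reduction against the quantized baseline; the function and variable names are illustrative only. With these rounded values the reduction comes out at roughly 19%, in line with the figure quoted above, though the exact decimal depends on rounding and on which values were compared.

```python
def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` is lower than `baseline` (lower perplexity is better)."""
    if baseline <= 0:
        raise ValueError("baseline perplexity must be positive")
    return (baseline - candidate) / baseline


# Word perplexities copied from the eval logs above.
float_baseline = 9.8446       # bf16 model, regular fine-tuning
quantized_baseline = 11.4595  # int4 quantized model, regular fine-tuning
quantized_qat = 9.2336        # int4 quantized model, QAT fine-tuning (fp8-int4)

# Assumed definition: QAT-quantized vs. regularly fine-tuned quantized model.
qat_vs_quantized = relative_reduction(quantized_baseline, quantized_qat)
# QAT-quantized vs. the unquantized bf16 baseline.
qat_vs_float = relative_reduction(float_baseline, quantized_qat)

print(f"QAT vs. quantized baseline: {qat_vs_quantized:.1%} lower perplexity")
print(f"QAT vs. bf16 baseline:      {qat_vs_float:.1%} lower perplexity")
```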

Fibonacci test:
- Both bf16 baseline and int4 quantized models correctly identified 13 as the next number
- QAT quantized model was more succinct in its response
- No substantial differences here

```
### Instruction:
Continue the fibonnaci sequence.

### Input:
1, 1, 2, 3, 5, 8

==> unsloth_model_full_baseline_output/eval_float.log <==
### Response:
The next number in the Fibonacci sequence is 13.<|end_of_text|>

==> unsloth_model_full_baseline_output/eval_quantized.log <==
### Response:
The next number in the Fibonacci sequence is 13.<|end_of_text|>

==> unsloth_model_full_qat_fp8-int4_output/eval_quantized.log <==
### Response:
13<|end_of_text|>
```
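For reference, the expected answer in this check can be reproduced with a few lines of Python. This is only an illustrative sketch of the arithmetic behind the prompt, not part of the PR's test script; `next_fibonacci` is a hypothetical helper name.

```python
def next_fibonacci(sequence):
    """Return the next term, i.e. the sum of the last two terms of the sequence."""
    if len(sequence) < 2:
        raise ValueError("need at least two terms to continue the sequence")
    return sequence[-2] + sequence[-1]


# The input given to the models in the prompt above.
prompt_input = [1, 1, 2, 3, 5, 8]
print(next_fibonacci(prompt_input))  # 5 + 8 = 13
```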
2025-09-08 15:07:50 -07:00
DoubleMathew
c9c068fa0b GptAttention turn training off during inference (#3289) 2025-09-08 13:47:32 -07:00
Daniel Han
6e237fac7f Versioning 2025-09-08 06:06:04 -07:00
Daniel Han
8a0de46a71 Update __init__.py 2025-09-08 04:57:04 -07:00
Daniel Han
63b2e8fc35 Update __init__.py 2025-09-08 02:02:11 -07:00
Roland Tannous
2011859430 Add TorchAO quantization tests with FP16 models and serialization workarounds (#3269)
* Add TorchAO quantization tests with FP16 models and serialization workarounds

* remove unrelated files

* cleaned submission
2025-09-04 17:22:07 -07:00
DoubleMathew
b969975ba5 llama vision inference fix (#3270)
* llama vision inference fix

* fix via can_compile_fullgraph instead
2025-09-04 16:06:49 -07:00
Datta Nimmaturi
2c2662b51c Filter executor not sleeping log (#3268) 2025-09-04 22:05:42 +05:30
Daniel Han
5c1b0ae9dd Bug fixes (#3266)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* custom_datatype

* recheck

* Float16

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* torch_dtype

* Update rl.py

* Fix CE Loss

* Versioning

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-09-04 05:17:59 -07:00
DoubleMathew
490027f988 disable _is_vlm (#3265) 2025-09-04 04:52:35 -07:00
Daniel Han
721eee6a80 Update rl.py 2025-09-04 03:25:40 -07:00
Roland Tannous
0135d126df fixed save_pretrained_torchao and associated tests (#3264) 2025-09-03 20:24:12 -07:00
Daniel Han
f42f0d2116 Update import_fixes.py 2025-09-03 20:17:36 -07:00
Daniel Han
4a52d0f78e Update import_fixes.py 2025-09-03 20:12:42 -07:00
Daniel Han
f1f0036a92 Move logging 2025-09-03 20:07:27 -07:00
Daniel Han
7094b4843a Update _utils.py 2025-09-03 20:00:21 -07:00
Daniel Han
fa3575920c Update pyproject.toml 2025-09-03 19:55:10 -07:00
Daniel Han
33ed154e81 Update llama.py 2025-09-03 19:11:53 -07:00
Jerry Zhang
969c6a0bd8 Support saving locally in model.save_pretrained_torchao (#3263)
Summary:
Previously the test was not run correctly, so saving to a local path was never tested.
This PR adds support for saving locally and tests it properly.

Note: `python tests/saving/test_unsloth_save.py` does not run the tests
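
A minimal sketch of why running a test file directly with `python` may execute nothing, assuming the file only defines `test_*` functions and has no `__main__` block (an assumption; the real file is not shown here). The function body below is a hypothetical stand-in, and the loop only imitates the idea of test collection, not pytest's actual mechanism:

```python
calls = []

# Hypothetical stand-in for the contents of a test module.
source = '''
def test_save_torchao():
    calls.append("ran")
'''

# Executing the module top to bottom only *defines* the function.
namespace = {"calls": calls}
exec(source, namespace)
ran_by_plain_python = list(calls)  # still empty: nothing called the test

# A test runner looks up callables named test_* and invokes them.
for name, obj in list(namespace.items()):
    if name.startswith("test_") and callable(obj):
        obj()
ran_by_runner = list(calls)
```

This is why the Test Plan goes through `pytest` with a `-k` filter rather than invoking the file as a script.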

Test Plan:
pytest tests/saving/test_unsloth_save.py -k test_save_torchao

Reviewers:

Subscribers:

Tasks:

Tags:
2025-09-03 17:51:33 -07:00
Daniel Han
15f3ce1372 Update save.py 2025-09-03 15:19:02 -07:00
Lei Zhenyuan
781c890c65 [Intel] make intel device support ROPE (#3164)
* make intel device pass

* abstract torch device stream
2025-09-03 04:39:57 -07:00
stevenxdavis
56dd244340 Fix incorrect function call in test_qwen3_grpo.py (#3212)
* Update test_qwen3_grpo.py to correct function call

This test file calls the function by the wrong name: the correct name is gradient_checkpointing_disable(), not disable_gradient_checkpointing().

I copied the line from test_llama32_sft.py - I'm not sure if this actually is required, just wanted it consistent for when other people like me test this and have no clue what they're doing when it throws an exception.

* Update blackwell/test_qwen3_grpo.py

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-09-03 04:39:12 -07:00
Defi Wimar
8920d2eed2 chore: Fix Typos (#3246) 2025-09-03 04:38:41 -07:00
Tim Paine
3f6ac1ce25 Remove old version constraint in dependency list (#3237)
xref: https://github.com/unslothai/unsloth-zoo/pull/258
2025-09-01 02:29:58 -07:00
pluesclues
2b88f93bce Update mistral.py, showed flag to not call cut cross entropy (#3233)
* Update mistral.py, showed flag to not call cut cross entropy

* Update mistral.py, made it so if its not equal to zero

* Update unsloth/models/mistral.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-08-29 01:32:21 -07:00
Roland Tannous
711ec4a3ac tests for mxfp4 and quantized models merge fix unsloth zoo pr 254 (#3223) 2025-08-29 01:30:48 -07:00
Daniel Han
25b21f4899 GPT OSS Bug fixes (#3231)
* Update rl.py

* Update rl.py

* Update rl.py

* GPT OSS float32

* Update vision.py

* Update loader.py

* Update loader.py
2025-08-28 09:39:46 -07:00
Daniel Han
4058d7861a Merge branch 'main' of https://github.com/unslothai/unsloth 2025-08-28 03:19:16 -07:00
Daniel Han
01500fdcbb Versioning 2025-08-28 03:19:14 -07:00
Michael Han
a10e9d6d49 Merge pull request #3224 from DefiWimar7/typos
chore: Fix Typos

Thank you @DefiWimar7
2025-08-28 02:46:27 -07:00
DoubleMathew
1c08e89cc7 Handle transformers move to dtype from torch_dtype (#3225) 2025-08-28 02:43:41 -07:00
DefiWimar7
8c39cb45e4 chore: Fix Typos 2025-08-28 10:44:28 +08:00
DoubleMathew
ceff1b43b3 Fix gemma-3n (#3219)
* place gemma-3n handling inside gemma-3 conditional

* cleanup
2025-08-26 19:46:27 -07:00
Jerry Zhang
f3ab8c21af Support model.save_pretrained_torchao (#3111)
Summary:
Allow users to merge the LoRA weights and then do post-training quantization with torchao

Usage:

```
from torchao.quantization import Int8DynamicActivationInt8WeightConfig
torchao_config = Int8DynamicActivationInt8WeightConfig()
model.save_pretrained_torchao(
    save_path,
    tokenizer=tokenizer,
    torchao_config=torchao_config,
)
```

Test Plan:
python tests/saving/test_unsloth_save.py

Reviewers:

Subscribers:

Tasks:

Tags:
2025-08-26 04:53:39 -07:00
Lei Zhenyuan
ac78311261 fix is causal for qwen3 (#3213) 2025-08-26 04:45:20 -07:00
Daniel Han
f35077388d Update vision.py 2025-08-22 04:02:59 -07:00
Daniel Han
a33ff972c1 Update loader.py 2025-08-22 04:02:19 -07:00
DoubleMathew
651970094d allow float32 dtype in FastLanguageModel (#3204) 2025-08-21 16:54:39 -07:00
Daniel Han
2525052e4f Bug fixes (#3195)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

* Update loader.py

* UNSLOTH_ENABLE_CCE

* Fix

* Update loader.py

* Update loader.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Import fixes

* Update loader.py

* Fix aimv2 issue

* Update loader.py

* Update import_fixes.py

* Update import_fixes.py

* Update loader.py

* Update loader.py

* Update loader.py

* Upgrade

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-08-20 07:39:43 -07:00
Daniel Han
dfb936743d Update _utils.py 2025-08-19 23:43:58 -07:00
Michael Han
6a9f1ada59 Merge pull request #3187 from parth2510/fix-transformers-typo-extras
Fix extras transformers typo in pyproject.toml
2025-08-19 15:06:11 -07:00
parth2510
3e9ef8024c Fix extras transformers typo in pyproject.toml 2025-08-19 19:55:29 +05:30
Daniel Han
308f1b422b Update pyproject.toml 2025-08-19 05:20:19 -07:00
Daniel Han
6fc745c731 Protobuf issue 2025-08-19 05:19:41 -07:00
Daniel Han
17a1e13b8b Update rl.py 2025-08-19 05:04:17 -07:00
Daniel Han
7bf39fcef2 Update pyproject.toml 2025-08-19 03:24:02 -07:00
Daniel Han
5a8c81c4f9 Update _auto_install.py 2025-08-19 03:20:37 -07:00
Daniel Han
089a0056e2 Torch 2.8 (#3186)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Torch 2.8

* Update rl_replacements.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-08-19 03:16:49 -07:00
Daniel Han
10f68527d8 Bug fixes (#3180)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Upcast norms

* Update loader.py

* Update vision.py

* Upcast layernorms

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Update rl.py

* Update pyproject.toml

* Update rl.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-08-18 06:12:49 -07:00
andrewor14
bb7b2f40fc Add support for QAT + LoRA (#2976)
**Summary:** Quantization-aware training (QAT) helps mitigate
quantization degradation by simulating quantization numerics
in high precision during training (fake quantization). This PR
combines QAT with LoRA by applying torchao's QAT support to
the peft model.

See the following for more details:

- torchao QAT: https://github.com/pytorch/ao/blob/main/torchao/quantization/qat/README.md
- torchtune QAT + LoRA: https://dev-discuss.pytorch.org/t/speeding-up-qat-by-1-89x-with-lora/2700
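
To make "fake quantization" concrete, here is a minimal pure-Python sketch: values are rounded onto a low-precision grid and immediately converted back to floats, so the forward pass sees quantization error while everything stays in high precision. This is an illustration only. It uses symmetric per-tensor int8 rounding as a simpler stand-in for the fp8 and int4 schemes in this PR, and `fake_quantize` is a hypothetical helper, not torchao's API:

```python
def fake_quantize(values, num_bits=8):
    """Quantize floats to a signed integer grid, then dequantize back to floats."""
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                   # nothing to scale
    scale = max_abs / qmax                    # one scale for the whole tensor
    result = []
    for v in values:
        q = round(v / scale)                  # snap to the integer grid
        q = max(-qmax, min(qmax, q))          # clamp to the representable range
        result.append(q * scale)              # back to float ("fake" quantized)
    return result


weights = [0.5, -1.27, 0.003, 1.0]
fake_quantized = fake_quantize(weights)
# Each entry differs from the original by at most half a grid step.
errors = [abs(a - b) for a, b in zip(weights, fake_quantized)]
```

In training frameworks the rounding step is typically paired with a straight-through gradient estimator so the weights can still be updated; that part is omitted here.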

Current QAT schemes supported are:

```
fp8-fp8, targeting the torch.ops.fbgemm.f8f8bf16_rowwise kernel
fp8-int4, targeting the torch.ops.fbgemm.f8i4bf16_shuffled kernel
```

**Test Plan:**

```
from unsloth import FastLanguageModel

lora_rank = 32

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-4B-Base",
    max_seq_length = 2048,
    load_in_4bit = False,
    fast_inference = False,
    max_lora_rank = lora_rank,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = lora_rank*2,
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    qat_scheme = "fp8-fp8",
)

# Printing one of the wrapped LoRA layers then shows output like the following:
lora.Linear(
  (base_layer): FakeQuantizedLinear(
    in_features=2560, out_features=4096, bias=False
    (activation_fake_quantizer): FakeQuantizer(Float8FakeQuantizeConfig(dtype=torch.float8_e4m3fn, granularity=PerRow(), hp_value_lb=None, hp_value_ub=None))
    (weight_fake_quantizer): FakeQuantizer(Float8FakeQuantizeConfig(dtype=torch.float8_e4m3fn, granularity=PerRow(), hp_value_lb=None, hp_value_ub=None))
  )
  ...
)
```
2025-08-18 05:56:35 -07:00
Ball
94c7392c40 fix original_push_to_hub fallback (#3115)
Co-authored-by: root <root@LAPTOP-VEI2ITL9.localdomain>
2025-08-18 05:54:24 -07:00
RJ Nowling
36c563fffd Replace back ticks with single quotes (#3157)
Backticks make the shell execute the enclosed text as a command and capture its output. Single quotes should be used instead.
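
A small sketch of the difference, assuming a POSIX `sh` is available on the PATH. The helper `sh_echo` and the message text are illustrative and are not taken from the changed file:

```python
import subprocess


def sh_echo(script: str) -> str:
    """Run a one-line script with the POSIX shell and return its stdout."""
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Inside double quotes, backticks trigger command substitution:
# the shell runs `echo unsloth` and splices its output into the string.
with_backticks = sh_echo('echo "install `echo unsloth` now"')

# Single quotes inside double quotes are ordinary characters and are printed as-is.
with_quotes = sh_echo("echo \"install 'unsloth' now\"")
```

Here `with_backticks` no longer contains any quoting marks because the shell replaced them with command output, while `with_quotes` keeps the quote marks around the name.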
2025-08-18 05:52:30 -07:00
Daniel Han
176805802f Update _utils.py 2025-08-18 05:28:30 -07:00
Roland Tannous
208f68f164 fix save_to_gguf_generic quantization_method type error (#3173) 2025-08-18 04:10:17 -07:00
Roland Tannous
bacfc57380 Convert generator expression to list to prevent potential bugs if the files variable is used multiple times in future modifications. (#3167) 2025-08-18 04:10:08 -07:00
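
A minimal sketch of the pitfall this commit guards against. The file names and the filter are hypothetical; the point is only that a generator can be consumed once, while a list can be iterated repeatedly:

```python
names = ["model-q4.gguf", "model-q8.gguf", "notes.txt"]

# Generator expression: lazily yields matches, and is exhausted after one pass.
files_gen = (n for n in names if n.endswith(".gguf"))
first_pass = list(files_gen)
second_pass = list(files_gen)   # empty: the generator has nothing left

# List comprehension: materialized once, safe to reuse.
files_list = [n for n in names if n.endswith(".gguf")]
first_list_pass = list(files_list)
second_list_pass = list(files_list)
```

With the generator, any later code that iterates `files` a second time silently sees no entries, which is the kind of bug the conversion to a list prevents.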
Daniel Han
19b2fa3ac8 Fix Blackwell 2025-08-18 03:46:39 -07:00
QL
ea4b7c2c6b Update install instructions for latest vLLM release (#3175)
1. Removed `--extra-index-url https://wheels.vllm.ai/nightly` from the uv install instructions because it causes the install to crash. Removing that flag solves the issue and is more stable overall. Tested with RTX 5090 CUDA 12.8 on Linux.

2. Removed `uv pip install -U triton>=3.3.1` because triton 3.3.1 is already installed with the vllm command.
2025-08-18 03:39:05 -07:00
Daniel Han
7e1581b929 Nightly (#3169)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Versioning

* Update mapper.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-08-15 05:03:38 -07:00
Daniel Han
b45a977c8d Update loader.py 2025-08-15 04:08:21 -07:00
Daniel Han
64dff1456b GPT OSS MXFP4 fix 2025-08-15 04:00:00 -07:00
Daniel Han
bbac3e3de7 Update tokenizer_utils.py 2025-08-14 19:30:33 -07:00
Daniel Han
b11d9cc935 Encoding UTF-8 2025-08-14 15:24:25 -07:00
Daniel Han
abbf1f0a43 Fix GPT OSS (#3154)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

* GPT OSS fix

* GPT OSS fix

* Update loader.py

* Update vision.py

* Update vision.py

* Update loader.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-08-14 05:06:58 -07:00
Daniel Han
6b5fb59eb7 Update loader.py 2025-08-13 06:33:39 -07:00
Daniel Han
806bee2433 Nightly (#3148)
* Fix mamba

* Update loader.py

* Update vision.py

* Update loader.py

* Filter vLLM standby logs (#3131)

* filter vLLM standby logs

* safeguard standby logger patch

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

* Update unsloth/models/_utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Add scaler

* Update llama.py

* Update _utils.py

* Versioning

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
2025-08-13 06:12:38 -07:00
Michael Han
413ec45d8b Update README.md 2025-08-09 15:53:29 -07:00
Michael Han
230061cd48 Merge pull request #3120 from Etherll/patch-8825
Add Qwen3 4B to mapper.py
2025-08-08 14:31:31 -07:00
Etherll
b0004c0001 Update mapper.py 2025-08-08 23:27:21 +03:00
Michael Han
2a8ca1ef5a Update README.md 2025-08-08 12:14:38 -07:00
Daniel Han
9975cf889a Update pyproject.toml 2025-08-08 11:49:56 -07:00
Daniel Han
424e648005 Update _utils.py 2025-08-08 11:48:13 -07:00
Daniel Han
3fc4e4dff9 Update loader.py 2025-08-08 11:44:07 -07:00
Daniel Han
628bb6c97f Update loader.py 2025-08-08 11:38:03 -07:00
Daniel Han
20e7c33550 Merge branch 'main' into nightly 2025-08-08 10:55:52 -07:00
Daniel Han
84ff585855 Update loader.py 2025-08-08 10:55:30 -07:00
Daniel Han
b277a09401 Update _utils.py 2025-08-08 09:59:24 -07:00
Daniel Han
5931011d6c Update vision.py 2025-08-08 09:35:54 -07:00
Daniel Han
34a0f2901f Update vision.py 2025-08-08 09:31:38 -07:00
Michael Han
22326b7b15 Merge pull request #3110 from Etherll/qwen3-chat-template
Add Qwen3 Instruct / Thinking chat templates
2025-08-08 09:27:12 -07:00
Daniel Han
6220a4c700 Update _utils.py 2025-08-08 09:16:37 -07:00
Daniel Han
52c6458fdd Update vision.py 2025-08-08 09:15:52 -07:00
Daniel Han
8372b318c5 Update chat_templates.py 2025-08-08 08:29:52 -07:00
Daniel Han
34776a6b84 Update vision.py 2025-08-08 03:35:56 -07:00
Daniel Han
f83f3f0efe Update vision.py 2025-08-07 16:21:25 -07:00
Daniel Han
11b5136d99 Update vision.py 2025-08-07 16:19:31 -07:00
Daniel Han
82c4bd6005 Merge branch 'main' into nightly 2025-08-07 16:19:20 -07:00
Etherll
ab03d44d9b Update chat_templates.py
add qwen3-instruct/thinking-chat-template
2025-08-08 01:01:58 +03:00
Daniel Han
180917f148 Update loader.py 2025-08-07 09:38:49 -07:00
Daniel Han
aa96d65534 Update loader.py 2025-08-07 09:38:24 -07:00
Daniel Han
9889015485 Update mapper.py 2025-08-07 08:15:59 -07:00
Daniel Han
1a8eabebab Update vision.py 2025-08-07 07:13:40 -07:00
Daniel Han
accb7b1ead Update vision.py 2025-08-07 07:10:36 -07:00
Daniel Han
5a31edef5f Update loader.py 2025-08-07 05:18:53 -07:00
Daniel Han
4d89527df6 Update vision.py 2025-08-07 05:03:21 -07:00
Daniel Han
d605b629ec Update mapper.py 2025-08-07 04:59:19 -07:00
Daniel Han
a1746fc03e Update pyproject.toml 2025-08-07 03:47:08 -07:00
Daniel Han
b7dcf7e5ed Update pyproject.toml 2025-08-07 03:43:25 -07:00
Daniel Han
0171432daf Update pyproject.toml 2025-08-07 03:29:44 -07:00
Daniel Han
382042f3b0 GPT OSS fixes 2025-08-07 02:57:19 -07:00
DoubleMathew
92c024df79 gpt-oss manually call temporary patch (#3104)
Co-authored-by: Mathew Mathew <mathew@Mathews-MacBook-Pro.local>
2025-08-06 12:13:35 -07:00
Daniel Han
3cfdd5f59c Merge branch 'main' into nightly 2025-08-06 06:56:44 -07:00
Daniel Han
806f926750 Nightly (#3102)
* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`

* Update _utils.py

* Update vision.py
2025-08-06 06:40:59 -07:00
Daniel Han
45b6f6e860 Update vision.py 2025-08-06 06:37:03 -07:00
Daniel Han
57d858b27a Update _utils.py 2025-08-06 06:34:07 -07:00
Daniel Han
31fb573e1e Merge branch 'main' into nightly 2025-08-06 06:24:27 -07:00
Daniel Han
5fa56a5cda GPT-OSS 2025-08-06 06:21:28 -07:00
DoubleMathew
9eab2cbc45 GPT-OSS support (#3099) 2025-08-05 20:58:33 -07:00
Daniel Han
d3172a852b Update __init__.py 2025-08-02 03:39:17 -07:00
Daniel Han
5202538bf5 Update 2025-08-02 03:36:05 -07:00
Daniel Han
c14b20010a Merge branch 'main' into nightly 2025-08-02 03:35:03 -07:00
dongbin-lunark
eff74dc966 docs: Add WSL installation guide for xformers (#3079) 2025-08-02 03:32:47 -07:00
DoubleMathew
8a69a68ece get_per_token_logps_and_entropies: return tuple instead of dict (#3080) 2025-08-02 03:31:41 -07:00
Datta Nimmaturi
d75d1dce6d fixup rope sync for everything (#3061) 2025-08-02 03:30:34 -07:00
Daniel Han
8e040e5870 Merge branch 'main' into nightly 2025-07-29 03:07:00 -07:00
Daniel Han
7b8505b4b7 Update _utils.py 2025-07-29 02:19:43 -07:00
Daniel Han
d0ce49732a Update rl_replacements.py 2025-07-29 01:59:08 -07:00
Daniel Han
121785dd23 Update rl.py 2025-07-29 01:32:39 -07:00
Daniel Han
b9bcce89e9 Update pyproject.toml 2025-07-29 01:03:19 -07:00
Daniel Han
461ff94bcc Merge branch 'main' of https://github.com/unslothai/unsloth 2025-07-29 00:43:52 -07:00
Daniel Han
c2540a1897 Merge branch 'pr/3055' 2025-07-29 00:43:19 -07:00
Etherll
b14841facf Add gemma-3n chat template to chat_templates.py (#3051)
* Update chat_templates.py

* Update chat_templates.py
2025-07-29 00:39:38 -07:00
Daniel Han
4968d3d059 Merge branch 'pr/3052' 2025-07-29 00:38:50 -07:00
Daniel Han
284f7ed538 Fix TRL 0.20.0 2025-07-29 00:36:25 -07:00
Daniel Han
96b1b5ac2b Merge branch 'main' of https://github.com/unslothai/unsloth 2025-07-29 00:35:41 -07:00
Sekinal
5899d5441b Fixed wrong syntax in f-string for exception 2025-07-28 23:16:05 -06:00
Sekinal
55fc4a53f7 Fix: Added specific check for Gemma so models like BERT properly initialize 2025-07-28 22:47:58 -06:00
Etherll
4c3139c069 Update loader.py 2025-07-28 22:15:23 +03:00
Etherll
c2b06fc279 Update _utils.py 2025-07-28 21:48:36 +03:00
Etherll
49d75674f4 Update vision.py 2025-07-28 20:55:07 +03:00
Etherll
b98948db85 Update loader.py 2025-07-28 20:53:38 +03:00
Daniel Han
e812382eaf Update _utils.py 2025-07-28 08:28:18 -07:00
Datta Nimmaturi
9deeaeebeb Fixup multi GPU workload. (#3049)
* sync all instead

* sync after move and rope init instead

* sync after rope inside

* Return new tensors and no sync

* Sync only current stream

* Fixup mask for xformers

* sync for prefill only

* clean up
2025-07-28 03:04:49 -07:00
Daniel Han
aec983ea3f Merge branch 'main' into nightly 2025-07-24 23:40:23 -07:00
Edd
8e994f07fb Fix Llama and Gemma inference (#3034)
* Fix Llama and Gemma inference

* Add a simple quality-of-life check for CUDA link errors (which are not captured, since we bypass all errors)
2025-07-24 23:38:20 -07:00
Daniel Han
ffbd2deaea Merge branch 'main' into nightly 2025-07-23 06:08:35 -07:00
Daniel Han
aa391aef66 Update _utils.py 2025-07-23 06:08:31 -07:00
Daniel Han
cc8fe6908b Merge branch 'main' into nightly 2025-07-23 06:04:39 -07:00
Daniel Han
a0836ffdaf Update pyproject.toml 2025-07-23 06:04:35 -07:00
Daniel Han
08f577854f Merge branch 'main' into nightly 2025-07-23 05:53:09 -07:00
Daniel Han
d27e4e44d1 Fix torch compile issues (#3028)
* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* check stride

* Cleanup

* Update rope_embedding.py

* Update gemma2.py

* Fix `set_stance`
2025-07-23 05:52:28 -07:00
Daniel Han
fec1b2d5f6 Fix set_stance 2025-07-23 05:19:08 -07:00
Daniel Han
56cc02b230 Update gemma2.py 2025-07-23 03:27:52 -07:00
Daniel Han
a5f26f4f76 Update rope_embedding.py 2025-07-23 03:26:16 -07:00
Daniel Han
11d8e5fe53 Cleanup 2025-07-23 03:23:23 -07:00
Daniel Han
bf8049c1c9 check stride 2025-07-23 02:52:29 -07:00
Daniel Han
8fd8a051a9 Merge branch 'main' into nightly 2025-07-23 02:51:00 -07:00
DoubleMathew
f2ef5bd16b falcon force float32 on sm<75 machines (#3026) 2025-07-22 13:18:42 -07:00
Daniel Han
282ae72862 Fix Gemma 2 (#3024)
* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py
2025-07-22 04:43:43 -07:00
Daniel Han
e402cc2a74 Update _utils.py 2025-07-22 04:42:51 -07:00
Daniel Han
d23a96b060 Update _utils.py 2025-07-22 04:42:06 -07:00
Daniel Han
9884e991c0 Update _utils.py 2025-07-22 04:39:04 -07:00
Daniel Han
5906e612a0 Merge branch 'main' into nightly 2025-07-22 03:34:55 -07:00
Daniel Han
35dada2a99 Update llama.py 2025-07-22 03:34:51 -07:00
Lei Zhenyuan
7123849cf8 [intel] add for intel path for llama.py (#3012)
* fix for intel path

* remove unused code

* Update unsloth/models/llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-07-22 03:33:26 -07:00
Lei Zhenyuan
f3e41d0b1e fix for causal mask (#3011) 2025-07-22 03:27:57 -07:00
Daniel Han
673476393b Merge branch 'main' into nightly 2025-07-21 05:35:09 -07:00
Daniel Han
80e7af5b9f Bug fixes (#3017)
* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update llama.py

* Update llama.py

* Fix `quantization_method`

* versioning
2025-07-21 05:30:14 -07:00
Daniel Han
7e07941383 versioning 2025-07-21 05:15:48 -07:00
Daniel Han
06f1b961f6 Fix quantization_method 2025-07-21 05:14:17 -07:00
Daniel Han
bef2b47599 Update llama.py 2025-07-20 03:19:10 -07:00
Daniel Han
48296987a3 Update llama.py 2025-07-20 03:17:02 -07:00
Daniel Han
9c6b199716 Update synthetic.py 2025-07-20 00:10:41 -07:00
Daniel Han
27503af0dc Update synthetic.py 2025-07-19 03:36:53 -07:00
Daniel Han
75f615891a Update synthetic.py 2025-07-19 03:18:18 -07:00
Daniel Han
6a65ee478c Update synthetic.py 2025-07-19 03:11:33 -07:00
Daniel Han
36ba3c7c69 Update synthetic.py 2025-07-19 02:54:28 -07:00
Daniel Han
bb0abf54df Merge branch 'main' into nightly 2025-07-19 01:29:41 -07:00
Quentin Gallouédec
580b5bca11 Update README.md (#2991)
* Update README.md

* Update README.md
2025-07-18 15:43:59 -07:00
Daniel Han
f32ee75b45 Merge branch 'main' into nightly 2025-07-18 05:37:10 -07:00
Daniel Han
67b16ae5c0 Bug fixes (#2998)
* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)

This reverts commit 4021da634a.

* skip_guard_eval_unsafe fix
2025-07-18 05:36:15 -07:00
Daniel Han
83892cd097 skip_guard_eval_unsafe fix 2025-07-18 05:31:21 -07:00
Daniel Han
ce6a73986d Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990)
This reverts commit 4021da634a.
2025-07-17 15:37:23 -07:00
Daniel Han
5ee84edade Merge branch 'main' into nightly 2025-07-17 15:36:24 -07:00
Daniel Han
4021da634a Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model merge erro…" (#2988)
This reverts commit 4565698ca5.
2025-07-17 15:35:28 -07:00
Roland Tannous
4565698ca5 Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model merge error (#2986) 2025-07-17 15:35:05 -07:00
DoubleMathew
6cdc11816d use fastmodel (#2987) 2025-07-17 15:34:14 -07:00
Quentin Gallouédec
2960bf8d94 Update unsloth-cli.py (#2985) 2025-07-17 15:08:38 -07:00
Daniel Han
3824a6ad78 Bug fixes (#2982)
* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update vision.py

* Update vision.py

* compiler stance

* Update _utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py
2025-07-17 07:02:38 -07:00
Daniel Han
06da87c79a Update rl_replacements.py 2025-07-17 06:54:25 -07:00
Daniel Han
9d8f2e4c83 Update rl_replacements.py 2025-07-17 06:51:59 -07:00
Daniel Han
8d983b01ff Update rl_replacements.py 2025-07-17 06:49:21 -07:00
Daniel Han
640032f115 Update rl_replacements.py 2025-07-17 06:47:23 -07:00
Daniel Han
2ae069ab2d Update rl_replacements.py 2025-07-17 06:44:28 -07:00
Daniel Han
5ad0f54c2d Update rl_replacements.py 2025-07-17 06:44:13 -07:00
Daniel Han
23dbf731a9 Update rl_replacements.py 2025-07-17 06:39:59 -07:00
Daniel Han
6c1a57ee4d Update rl.py 2025-07-17 06:30:26 -07:00
Daniel Han
f6b6ab7a1b Update rl_replacements.py 2025-07-17 06:26:10 -07:00
Daniel Han
0a67f44bb6 Update rl_replacements.py 2025-07-17 06:25:21 -07:00
Daniel Han
e66792eb05 Update rl_replacements.py 2025-07-17 06:24:33 -07:00
Daniel Han
771d5ff25f Update rl_replacements.py 2025-07-17 06:23:37 -07:00
Daniel Han
72e2debbd6 Update pyproject.toml 2025-07-17 05:38:04 -07:00
Daniel Han
f3606ea3d9 Update pyproject.toml 2025-07-17 05:26:58 -07:00
Daniel Han
1185706555 Merge branch 'main' into nightly 2025-07-17 05:14:52 -07:00
Daniel Han
2c468550e6 Revert "GRPO Fix - Support vllm pre-dequantized quantization states in fast_dequantize kernel (#2943)"
This reverts commit 1cefffa2d2.
2025-07-17 05:02:08 -07:00
Daniel Han
03c880f5da Update _utils.py 2025-07-17 02:08:49 -07:00
Daniel Han
c88758124a compiler stance 2025-07-17 01:40:49 -07:00
Daniel Han
aa8e172396 Update vision.py 2025-07-17 01:19:07 -07:00
Daniel Han
b112838f58 Update vision.py 2025-07-17 00:33:27 -07:00
Daniel Han
98cebf3d06 Update _utils.py 2025-07-14 02:45:19 -07:00
Daniel Han
e718c27474 Merge branch 'main' into nightly 2025-07-14 02:44:42 -07:00
Roland Tannous
1cefffa2d2 GRPO Fix - Support vllm pre-dequantized quantization states in fast_dequantize kernel (#2943)
* Support pre-dequantized quantization states in fast_dequantize kernel

* has_nested_quant conditional set to  only

* Update utils.py

* Update utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-07-14 02:41:15 -07:00
Roland Tannous
32738da0d1 fix dataloader_num_workers value error in GRPOTrainer (#2944) 2025-07-14 01:43:33 -07:00
Muzammil Khan
1eaa52ae55 fix: change lora_dropout from int to float for type consistency (#2949)
Fixes "Argument of type 'float' cannot be assigned to parameter 'lora_dropout' of type 'int'" error by ensuring lora_dropout is consistently a float (0.0) rather than int (0) across vision.py, llama.py, and unsloth-cli.py
2025-07-14 01:42:07 -07:00
Datta Nimmaturi
665a8e4b1d Fix falcon H1 dropout issue (#2938)
Because we don't have down and gate multipliers, the MLP output values are too large, causing NaNs and unstable training. To bypass that, let's rely on HF's implementation for the time being.
2025-07-12 15:53:07 -07:00
DoubleMathew
bbcba7fc21 patch falcon h1 inference (#2932) 2025-07-12 15:52:24 -07:00
Daniel Han
01f649f6fc Update rl.py 2025-07-11 03:13:55 -07:00
Daniel Han
7c8cd3dd6d Update rl.py 2025-07-11 03:11:07 -07:00
Daniel Han
7d2106e5c6 Merge branch 'main' into nightly 2025-07-11 03:04:28 -07:00
Daniel Han
f155097cd8 Uninitialized handler 2025-07-11 03:04:17 -07:00
Daniel Han
681b10dc0c Fixes 2025-07-11 00:01:37 -07:00
Daniel Han
2dae012308 Update llama.py 2025-07-10 17:18:40 -07:00
Daniel Han
17f7447a39 Merge branch 'main' into nightly 2025-07-10 17:12:59 -07:00
Daniel Han
9071adb723 Fix GRPO 2025-07-10 17:12:49 -07:00
Michael Han
6503961a33 Merge pull request #2929 from rolandtannous/fix/fix-grpo-get-per-token-logps-argument-mismatch
Fix argument mismatch in GRPO _get_per_token_logps lambda function
2025-07-10 14:30:05 -07:00
Roland Tannous
8d6da15c2e Fix argument mismatch in GRPO _get_per_token_logps lambda function 2025-07-10 18:24:53 +00:00
Daniel Han
07df9c1233 Merge branch 'main' into nightly 2025-07-10 07:03:59 -07:00
Daniel Han
5a2dcc924b Many bug fixes (#2927)
* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Small fixes

* Update vision.py

* Update vision.py

* versioning

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-07-10 07:03:48 -07:00
Daniel Han
c73406d8ba Update __init__.py 2025-07-10 07:03:28 -07:00
Daniel Han
ee93391155 versioning 2025-07-10 07:01:44 -07:00
Daniel Han
509cf61834 Update vision.py 2025-07-10 05:15:03 -07:00
Daniel Han
d3bd189e47 Update vision.py 2025-07-10 04:34:23 -07:00
Daniel Han
ace8de011d Small fixes 2025-07-10 04:04:51 -07:00
Daniel Han
d4a302ff3d Merge branch 'main' into nightly 2025-07-10 04:02:01 -07:00
Datta Nimmaturi
c77f6c3719 Move inputs to right devices. (#2919)
* Move tensors to right devices

* fix multi gpu for non mistral models

* multi GPU RoPE for gemma2

* Finish up multi GPU inference

* Make multiGPU rope a list

* Remove unnecessary transfer to CPU

* Remove unnecessary move to CPU

* Do not move inputs to device yet

will be handled separately in another PR

* Move inputs to appropriate decoder device

* Make device count global variable

* Cleanup RoPE device code

* Fixup num_gpu to device count

* Cleanup device counts

* Use device index for RoPE get_cache

* Do not typecast

* Use tuple instead of list for tensors. Use device index directly

* fixup move to device logic
2025-07-10 04:01:03 -07:00
Daniel Han
13a32054b7 Merge branch 'main' into nightly 2025-07-10 01:50:14 -07:00
Daniel Han
62c5c315dd Merge branch 'main' of https://github.com/unslothai/unsloth 2025-07-10 01:50:04 -07:00
Daniel Han
87e1a933d8 Update llama.py 2025-07-10 01:50:03 -07:00
DoubleMathew
643f9b068b if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) 2025-07-09 23:29:41 -07:00
Daniel Han
20f665a98a Update __init__.py 2025-07-09 16:30:57 -07:00
Datta Nimmaturi
ced87c6059 Explicitly check if xformers exists for attention (#2889) 2025-07-09 14:15:35 -07:00
Lei Zhenyuan
6a36b6e1fc [bugs] fix for causal mask (#2868)
* fix for causal mask

* use un_casual in sdpa

* add missing mask

* fix for type
2025-07-09 14:10:25 -07:00
Lei Zhenyuan
74b9feb674 add intel gpu with vllm support (#2903) 2025-07-09 14:08:38 -07:00
Datta Nimmaturi
772f15ca49 Dynamically adjust get_per_token_logps function and patch as well (#2911) 2025-07-09 14:07:33 -07:00
DoubleMathew
c4901fd894 silently skip falcon h1 import if transformers_version < 4.53.0 (#2912) 2025-07-09 14:05:41 -07:00
Daniel Han
1fb1b72ae1 Merge branch 'main' into nightly 2025-07-09 14:03:38 -07:00
Daniel Han
7dde992481 Many bug fixes (#2908)
* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

* Update _utils.py

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-07-09 07:14:18 -07:00
Daniel Han
3cd089ec02 Update __init__.py 2025-07-09 07:00:17 -07:00
Daniel Han
8cf37d23dc Update __init__.py 2025-07-09 03:58:13 -07:00
Daniel Han
bf7bf02f3e Update _utils.py 2025-07-07 04:56:04 -07:00
Daniel Han
45c73f7c36 Merge branch 'main' into nightly 2025-07-06 23:01:53 -07:00
Daniel Han
735466b102 versioning 2025-07-06 22:58:14 -07:00
Daniel Han
924283602a Merge branch 'main' into nightly 2025-07-06 22:44:43 -07:00
Daniel Han
538558df53 MoE kernels AGPLv3 2025-07-06 22:44:35 -07:00
Daniel Han
6c98d33275 Merge branch 'main' into nightly 2025-07-06 22:40:31 -07:00
Daniel Han
f0a9442a06 Update README.md (#2885) 2025-07-05 02:07:20 -07:00
Michael Han
78e17304a0 Update README.md
Updating icon sizes
2025-07-04 15:50:31 -07:00
Michael Han
9ecc97a67c Update README.md
Editing icon sizes
2025-07-04 15:37:44 -07:00
DoubleMathew
e858f08047 only warn about prepare causal attention mask when transformers<=4.52.4 (#2867) 2025-07-04 01:54:27 -07:00
Daniel Han
68c279eaef Merge branch 'main' into nightly 2025-07-03 15:55:30 -07:00
Michael Han
eb97606c53 Merge pull request #2873 from Erland366/fix/unslothtrainingarguments
Fix `UnslothTrainingArguments` not patching `trl.Config` properly
2025-07-03 14:00:12 -07:00
Erland366
7a306bfbdc Initialize parent class in UnslothTrainingArguments constructor 2025-07-03 15:47:58 +00:00
Erland366
d2d3875596 Refactor UnslothTrainingArguments to initialize embedding_learning_rate in constructor 2025-07-03 15:47:10 +00:00
Erland366
fe5358bacc Refactor UnslothTrainingArguments to support fallback for TrainingArguments import 2025-07-03 15:39:56 +00:00
Erland366
80206ec1eb Always use SFTConfig 2025-07-03 14:08:04 +00:00
Daniel Han
172084f14f Merge branch 'main' into nightly 2025-07-02 18:02:57 -07:00
DoubleMathew
bd32b2fd66 Update CSM for faster inference (no compile) (#2865) 2025-07-02 16:59:56 -07:00
Roland Tannous
59b30f335c fix quantized model parameter count method (#2855)
* fix quantized model parameter count method

* function cleanup

* parameter space cleanup
2025-07-01 23:36:59 -07:00
Michael Han
f4a922fc6f Update README.md 2025-07-01 09:28:59 -07:00
Daniel Han
ed969515ee Merge branch 'main' into nightly 2025-07-01 07:01:42 -07:00
Daniel Han
9ad09f9ab2 Fix Gemma 3N (#2854)
* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Fix setup.py

* setup.py

* Prints

* Update setup.py

* Update setup.py

* Update setup.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update pyproject.toml

* Update vision.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-07-01 07:01:31 -07:00
Daniel Han
d11d7a2c16 Update vision.py 2025-07-01 06:59:30 -07:00
Daniel Han
0b1545d218 Update pyproject.toml 2025-07-01 06:57:58 -07:00
Daniel Han
93487dbf81 Update vision.py 2025-07-01 04:27:40 -07:00
Daniel Han
34464fa436 Update vision.py 2025-07-01 02:49:48 -07:00
Daniel Han
48986eb6c4 Merge branch 'main' into nightly 2025-07-01 02:47:20 -07:00
Daniel Han
00c947798d Update vision.py 2025-07-01 02:47:17 -07:00
Daniel Han
c2ecdecd5c Merge branch 'main' into nightly 2025-07-01 02:37:07 -07:00
Daniel Han
a98505a5ae Update _utils.py 2025-07-01 02:36:54 -07:00
Daniel Han
19448b2517 Update pyproject.toml 2025-07-01 01:34:17 -07:00
Daniel Han
622f3fc85a Move AMD to AMD branch 2025-07-01 01:10:40 -07:00
Daniel Han
f1e1b890ac Move AMD to AMD branch 2025-07-01 01:02:51 -07:00
Daniel Han
9886a2a3a0 subprocess 2025-07-01 00:51:04 -07:00
Daniel Han
46bc57ce49 Update setup.py 2025-07-01 00:34:27 -07:00
Daniel Han
b4b75a9ea8 Update setup.py 2025-07-01 00:32:23 -07:00
Daniel Han
95d5d7dbcc Update setup.py 2025-07-01 00:27:34 -07:00
Daniel Han
4f7185dc08 Update setup.py 2025-07-01 00:26:34 -07:00
Daniel Han
725c4616ef Update setup.py 2025-07-01 00:19:25 -07:00
Daniel Han
71f7bb26df Update setup.py 2025-07-01 00:18:22 -07:00
Daniel Han
ccd15f09bc Update setup.py 2025-07-01 00:14:58 -07:00
Daniel Han
6093e18e26 Cmake and ninja move 2025-07-01 00:13:00 -07:00
Daniel Han
9d0f44cdf6 Fix setup.py 2025-07-01 00:05:56 -07:00
Daniel Han
507b5c41e4 Update _utils.py 2025-06-30 23:13:33 -07:00
Daniel Han
fba0bff2f4 Remove stale bot 2025-06-30 23:11:57 -07:00
Daniel Han
97a5f2c7e1 Update pyproject.toml 2025-06-30 21:01:55 -07:00
Daniel Han
c6155cc6d3 Update pyproject.toml 2025-06-30 21:01:32 -07:00
Daniel Han
6e612e3aa7 Update pyproject.toml 2025-06-30 21:00:49 -07:00
Daniel Han
3aec8de53b Update pyproject.toml 2025-06-30 20:25:46 -07:00
Daniel Han
713b59a04a Update pyproject.toml 2025-06-30 20:01:13 -07:00
Daniel Han
a409f2a430 Update pyproject.toml 2025-06-30 20:00:20 -07:00
Daniel Han
2b50604876 Update setup.py 2025-06-30 19:50:15 -07:00
Daniel Han
c7d765c425 Update setup.py 2025-06-30 19:48:00 -07:00
Daniel Han
e21ac3c3ba Update setup.py 2025-06-30 19:47:22 -07:00
Daniel Han
719a626af0 Prints 2025-06-30 19:35:59 -07:00
Daniel Han
28a968d118 setup.py 2025-06-30 19:32:32 -07:00
Daniel Han
d7ac7f46c0 Fix setup.py 2025-06-30 19:25:49 -07:00
Daniel Han
021d1f78fc Merge branch 'main' into nightly 2025-06-30 16:57:19 -07:00
billishyahao
25d73efe8a [Feature] enable unsloth on amd gpu (#2520)
* [Feature] enable unsloth on amd gpu

* fix the comment

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-06-30 16:52:05 -07:00
Rishabh
3ff04c3d09 Convert torch.bfloat16, torch.float16, etc. to vLLM valid dtypes (#2811)
* Convert torch.bfloat16, torch.float16, etc. to vLLM valid dtypes

* removed newlines and extra whitespace
2025-06-30 16:39:36 -07:00
DoubleMathew
35b09e2d2a Fix loftq None config for FastBaseModel (#2848)
add new validate_loftq_config to __all__
2025-06-30 16:38:51 -07:00
Daniel Han
e1693c1caf Gemma 3N bug fixes (#2842)
* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

* Update README.md

* add model registry

* move hf hub utils to unsloth/utils

* refactor global model info dicts to dataclasses

* fix dataclass init

* fix llama registration

* remove deprecated key function

* start registry reorg

* add llama vision

* quant types -> Enum

* remap literal quant types to QuantType Enum

* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

* Update vision.py

* gradient checkpointing

* Gemma 3N fixes

* Update loader.py

* Versioning

* Gemma 3N fixes

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

---------

Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-06-30 07:15:48 -07:00
Daniel Han
21ec11eaa4 Update vision.py 2025-06-30 07:07:38 -07:00
Daniel Han
b8e661b39e Update loader.py 2025-06-30 07:06:35 -07:00
Daniel Han
1444b1011a Update vision.py 2025-06-30 07:03:36 -07:00
Daniel Han
ac09d15750 Update vision.py 2025-06-30 06:53:28 -07:00
Daniel Han
743a469d9f Gemma 3N fixes 2025-06-30 06:51:49 -07:00
Daniel Han
e9257e5ecd Versioning 2025-06-30 06:08:14 -07:00
Daniel Han
244d97d1b6 Update loader.py 2025-06-30 06:04:19 -07:00
Roland Tannous
ed93ec6049 Added conda/mamba section to blackwell installation readme (#2817)
* Added conda/mamba section to blackwell installation readme

* fix conda creation suffix and vllm install syntax
2025-06-30 06:02:57 -07:00
Daniel Han
3b998dbe9d Gemma 3N fixes 2025-06-30 06:01:50 -07:00
Daniel Han
ee54d928f6 Merge branch 'main' into nightly 2025-06-30 05:52:19 -07:00
Daniel Han
b0088817cd Update stale.yml 2025-06-30 02:16:09 -07:00
Daniel Han
95d2bdbec3 Create stale.yml (#2836) 2025-06-29 21:59:43 -07:00
Daniel Han
550f19fc0d Delete stale.yml 2025-06-29 21:58:55 -07:00
Daniel Han
901c3216d0 Update stale.yml 2025-06-29 21:57:30 -07:00
Daniel Han
cc69f5c3cb Create stale.yml (#2832) 2025-06-29 17:36:23 -07:00
Daniel Han
4d4d5e6da2 Delete stale.yml 2025-06-29 17:35:01 -07:00
Daniel Han
018f7677a5 Update stale.yml 2025-06-29 17:29:36 -07:00
Daniel Han
c396630163 Create stale.yml 2025-06-29 17:28:18 -07:00
Daniel Han
a59da3065d Merge branch 'main' into nightly 2025-06-29 16:38:27 -07:00
Mehmet Oguz Derin
32eaa27b1a Fix LoftQ with FastBaseModel (#2826)
Pass `init_lora_weights` and `loftq_config` to `LoraConfig` constructor, which enables classes like `FastModel` to use LoftQ support. Thank you very much in advance!
2025-06-29 16:14:02 -07:00
Daniel Han
6cd3980099 gradient checkpointing 2025-06-29 03:19:20 -07:00
Daniel Han
a4a15aa9f3 Merge branch 'main' into nightly 2025-06-28 17:44:21 -07:00
DoubleMathew
f02a29e017 import undefined transformers_version for falcon model (#2822)
* import undefined transformers_version for falcon model

fixed falcon transformers version check and added error handling for FalconH1Attention bad import

* Also, conditionally load module from falcon_h1 depending on whether the transformers version supports it
2025-06-28 17:41:19 -07:00
DoubleMathew
cbc80ca97c granite force layernorm upcast (#2799) 2025-06-28 05:27:56 -07:00
Dhia Eddine Rhaiem
302691a5c1 Add falcon h1 (#2650)
* add falcon h1

* feat: add Falcon-H1 into unsloth

* address comments

* fix

* Update unsloth/models/llama.py

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update unsloth/models/llama.py

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fixes

* fix comments

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Younes Belkada <younes.belkada@tii.ae>
Co-authored-by: ilyasch2 <ilyas.chahed@tii.ae>
2025-06-28 04:55:13 -07:00
jeromeku
dd50a76af8 add instructions for installing on blackwell (#2812) 2025-06-27 04:54:55 -07:00
Daniel Han
6f74526a98 Update vision.py 2025-06-27 04:12:10 -07:00
Daniel Han
bf032eab0b Merge branch 'main' into nightly 2025-06-27 03:10:11 -07:00
Daniel Han
1aa4fa6fe7 Update loader.py 2025-06-26 12:03:29 -07:00
Daniel Han
3a37cccc6b Update _utils.py 2025-06-26 11:31:17 -07:00
Daniel Han
ad0dbb9616 Update loader.py 2025-06-26 11:31:08 -07:00
Daniel Han
4ee91eb5ad Merge branch 'main' into nightly 2025-06-26 10:30:51 -07:00
Daniel Han
c3c2fa2e1b Update pyproject.toml 2025-06-26 09:15:14 -07:00
Daniel Han
e49e2e13f0 Gemma 3N (#2809)
* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Revert

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

* Update README.md

* add model registry

* move hf hub utils to unsloth/utils

* refactor global model info dicts to dataclasses

* fix dataclass init

* fix llama registration

* remove deprecated key function

* start registry reorg

* add llama vision

* quant types -> Enum

* remap literal quant types to QuantType Enum

* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

* Update mapper.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update _utils.py

---------

Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-06-26 09:14:28 -07:00
Daniel Han
48f4ac7888 Update _utils.py 2025-06-26 09:13:20 -07:00
Daniel Han
de24fd3191 Update loader.py 2025-06-26 09:11:30 -07:00
Daniel Han
65102f9b75 Update vision.py 2025-06-26 09:01:07 -07:00
Daniel Han
9298d90853 Update loader.py 2025-06-26 08:57:06 -07:00
Daniel Han
43fb58672b Update vision.py 2025-06-26 08:54:51 -07:00
Daniel Han
3023dc63aa Update mapper.py 2025-06-26 08:53:47 -07:00
Daniel Han
71b910a769 Update loader.py 2025-06-26 08:51:09 -07:00
Daniel Han
d82ebea900 Update mapper.py 2025-06-26 08:37:41 -07:00
Daniel Han
83e8b47a0b Merge branch 'main' into nightly 2025-06-26 04:41:55 -07:00
Daniel Han
9746799feb Bug fixes (#2807)
* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update vision.py

* HF Transfer

* fix(utils): add missing importlib import to fix NameError (#2134)

This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.

* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Revert

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

* Update README.md

* add model registry

* move hf hub utils to unsloth/utils

* refactor global model info dicts to dataclasses

* fix dataclass init

* fix llama registration

* remove deprecated key function

* start registry reorg

* add llama vision

* quant types -> Enum

* remap literal quant types to QuantType Enum

* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

* Update _utils.py

* Update vision.py

---------

Co-authored-by: naliazheli <nalia0316@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-06-26 04:40:47 -07:00
Daniel Han
4457366562 Update vision.py 2025-06-26 03:55:07 -07:00
Daniel Han
4663ba3be0 Update _utils.py 2025-06-26 03:39:13 -07:00
Daniel Han
388f0203df Merge branch 'main' into nightly 2025-06-26 03:16:08 -07:00
Daniel Han
6a83bb53a5 Bug fixes (#2805)
* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update vision.py

* HF Transfer

* fix(utils): add missing importlib import to fix NameError (#2134)

This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.

* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Revert

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

* Update README.md

* add model registry

* move hf hub utils to unsloth/utils

* refactor global model info dicts to dataclasses

* fix dataclass init

* fix llama registration

* remove deprecated key function

* start registry reorg

* add llama vision

* quant types -> Enum

* remap literal quant types to QuantType Enum

* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Debugging only

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Generic efficient GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* Remove debugging

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update llama.py

* Update rl_replacements.py

* versioning

---------

Co-authored-by: naliazheli <nalia0316@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-06-26 02:17:12 -07:00
Daniel Han
83c0e52e90 versioning 2025-06-26 02:13:49 -07:00
Daniel Han
c6e8c516a5 Update rl_replacements.py 2025-06-26 01:57:56 -07:00
Daniel Han
0922ec5544 Update llama.py 2025-06-26 01:56:55 -07:00
Daniel Han
5f28dbe8e4 Merge branch 'main' into nightly 2025-06-26 01:51:20 -07:00
Datta Nimmaturi
e402be69b7 Fix grpo sleep regex and indentation (#2804) 2025-06-26 01:50:47 -07:00
Lei Zhenyuan
48d51bac5f fix for inductor no attribute prop.multi_processor_count (#2803) 2025-06-26 01:44:15 -07:00
Daniel Han
29e4870a45 Update vision.py 2025-06-26 01:27:16 -07:00
Daniel Han
2e9b504b28 Update rl_replacements.py 2025-06-26 01:12:21 -07:00
Daniel Han
5e99e7467f Update rl_replacements.py 2025-06-26 00:56:05 -07:00
Daniel Han
f41bfbc092 Remove debugging 2025-06-26 00:41:37 -07:00
Daniel Han
e1ca077164 Update rl_replacements.py 2025-06-26 00:35:20 -07:00
Daniel Han
c2a901493f Update rl_replacements.py 2025-06-26 00:27:17 -07:00
Daniel Han
33f20f0289 Generic efficient GRPO 2025-06-26 00:10:07 -07:00
DoubleMathew
c928612ee0 [4/N] Enable intel GPU for unsloth (#2801)
* add code for xpu llama

* refine code

* change version check to 2.6.0

* remove unused blank

* resolve commits

* Cleaned up statistics printing

* Update unsloth/models/_utils.py

---------

Co-authored-by: lei,zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-06-25 20:28:07 -07:00
Daniel Han
b526f8b091 Update rl_replacements.py 2025-06-25 20:13:17 -07:00
Daniel Han
2a67486109 Update rl_replacements.py 2025-06-25 19:51:43 -07:00
Daniel Han
5dea7d9076 Update rl_replacements.py 2025-06-25 18:53:19 -07:00
Daniel Han
ba9e9dcef8 Update rl_replacements.py 2025-06-25 18:51:54 -07:00
Daniel Han
2414e577da Update rl_replacements.py 2025-06-25 18:50:45 -07:00
Daniel Han
cda0aacb5c Merge branch 'main' into nightly 2025-06-25 16:50:18 -07:00
Daniel Han
7dc1ce9eb3 Update llama.py 2025-06-25 02:09:56 -07:00
Daniel Han
cd1aff0222 Update llama.py 2025-06-25 01:57:15 -07:00
Daniel Han
537d4b217c Debugging only 2025-06-25 01:44:07 -07:00
Michael Han
b017f2395a Update README.md
Updating links
2025-06-25 01:32:24 -07:00
Daniel Han
404052510b Merge branch 'main' into nightly 2025-06-24 02:03:47 -07:00
Lei Zhenyuan
dcf26ac3fb [3/N] Enable intel GPU for unsloth (#2620)
* enable intel xpu changes within kernels

* resolve torch.version < 2.6

* change version check to 2.6.0

* resolve comments for torch_gpu_device

* resolve amp fwd comments

* fix typo

* change cuda default logic

* clean this pr

* add HAS_CUDA_STREAM as default False

* split GPU streams to cuda and xpu streams

* add optional
2025-06-24 02:01:28 -07:00
Daniel Han
9476eccb31 Merge branch 'main' of https://github.com/unslothai/unsloth into nightly 2025-06-24 01:36:03 -07:00
Daniel Han
48ccca95e8 Merge branch 'main' into nightly 2025-06-24 01:36:02 -07:00
DoubleMathew
0a14b795d0 move min_sms in is_big_gpu inside DEVICE_TYPE if else (#2792)
log is not defined in torch inductor so remove

Remove log.warning entirely
2025-06-23 18:57:55 -07:00
pluesclues
22c4d45d48 Fixed Sequence Classification errors, loaded model weirdly (#2793) 2025-06-23 18:56:56 -07:00
Michael Han
853a72592b Update issue templates 2025-06-23 05:34:46 -07:00
Daniel Han
97c10f9494 Update issue templates 2025-06-23 05:26:28 -07:00
Lei Zhenyuan
1ae3425b07 [5/N] Enable intel GPU for unsloth (#2768)
* add is_big_gpu support for xpu

* make code unsloth's style
2025-06-23 04:47:34 -07:00
kilavvy
1d5af06e00 Docs: Fix typo and improve MoE docstrings (#2784)
* Update qwen3_moe.py

* Update interface.py
2025-06-23 01:09:23 -07:00
Daniel Han
d7b0653a2a Fix GRPO (#2787)
* Update _utils.py

* Update _utils.py

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update vision.py

* HF Transfer

* fix(utils): add missing importlib import to fix NameError (#2134)

This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.

* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Revert

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

* Update README.md

* add model registry

* move hf hub utils to unsloth/utils

* refactor global model info dicts to dataclasses

* fix dataclass init

* fix llama registration

* remove deprecated key function

* start registry reorg

* add llama vision

* quant types -> Enum

* remap literal quant types to QuantType Enum

* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* logits / temperature

* Update rl_replacements.py

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

---------

Co-authored-by: naliazheli <nalia0316@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-06-22 05:54:29 -07:00
Daniel Han
1e8201f465 Update rl_replacements.py 2025-06-22 05:51:07 -07:00
Daniel Han
6f8711e2e6 Update rl_replacements.py 2025-06-22 05:34:52 -07:00
Daniel Han
ae992555e4 Update pyproject.toml 2025-06-22 05:33:57 -07:00
Daniel Han
c8e5a88001 Update rl_replacements.py 2025-06-22 05:32:38 -07:00
Daniel Han
d9601bd14a logits / temperature 2025-06-22 04:54:18 -07:00
Daniel Han
d36abd3b71 Update rl_replacements.py 2025-06-22 03:44:56 -07:00
Daniel Han
7837f6f40c Update rl_replacements.py 2025-06-22 03:39:30 -07:00
Daniel Han
9b612677f9 Update rl.py 2025-06-22 03:29:20 -07:00
Daniel Han
c465cf2fd2 Update rl_replacements.py 2025-06-22 02:55:00 -07:00
Daniel Han
0815525b15 Update rl_replacements.py 2025-06-22 02:37:52 -07:00
Daniel Han
0012c8906c Merge branch 'main' into nightly 2025-06-22 02:37:11 -07:00
Daniel Han
c7c6f2c88d Update rl.py 2025-06-21 23:29:08 -07:00
Daniel Han
d9be145849 Fix bf16 = None 2025-06-21 22:50:29 -07:00
Daniel Han
2f2930ee50 Update rl.py 2025-06-21 22:26:46 -07:00
Daniel Han
6af83b76b0 Merge branch 'main' into nightly 2025-06-21 22:21:10 -07:00
Daniel Han
05867537c1 Update _utils.py 2025-06-21 22:20:46 -07:00
Daniel Han
d71dfb1d01 Update rl_replacements.py 2025-06-21 22:20:32 -07:00
Daniel Han
3461b987fd Fix DAPO, TRL 0.19.0 2025-06-21 22:14:21 -07:00
simpissa
8a202d6175 Fix for grpo_compute_loss_slow (#2702)
* slice last logit

* move slicing
2025-06-21 21:58:06 -07:00
Daniel Han
447ce0fb4f Mistral Small 3.2 2025-06-21 06:44:14 -07:00
amrothemich
ca150bb27a Update pyproject.toml (#2778)
Switched pyproject license to dictionary type
2025-06-21 02:44:24 -07:00
Michael Han
2a200e739a Merge pull request #2780 from rolandtannous/fix/gemma3-grpo-self-llm
Fix AttributeError in GRPO trainer for models without llm attribute
2025-06-20 21:15:54 -07:00
Roland Tannous
8c563abb87 Fix Gemma3ForCausalLM does not have attribute self.llm 2025-06-21 01:07:32 +00:00
Roland Tannous
061f038ec7 Additional tests for unsloth-zoo PR#174 2025-06-21 00:22:00 +00:00
Daniel Han
1be2a6e90a Merge branch 'main' of https://github.com/unslothai/unsloth 2025-06-20 06:30:43 -07:00
Daniel Han
d8846ebdfd Update pyproject.toml 2025-06-20 06:30:35 -07:00
marcandrelarochelle
481292a96a Fix TRL 1.8.2 (#2774)
* Fix for TRL 1.8.2

Regex matching LLM initialization

* Update Regex
2025-06-20 06:28:58 -07:00
Daniel Han
e43babe76f Update __init__.py 2025-06-20 06:13:45 -07:00
Daniel Han
2e9724f279 Fix bugs 2025-06-20 06:09:03 -07:00
Datta Nimmaturi
b87ff3f528 Enable vLLM to share memory space (#2712)
* vLLM sleep once generation is done

* Make enable_sleep_model configurable

* Make default to false

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>

* Force standby under environment variable

---------

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>
2025-06-19 04:04:14 -07:00
Edd
a398484d0a Fix renaming on other model than Llama (#2762) 2025-06-18 13:38:36 -07:00
leopardracer
c6e0366e0d Fix Typos in Documentation and Comments (#2721)
* Update ocr_eval.md

* Update backward.py
2025-06-17 04:34:51 -07:00
pluesclues
440bbf5b52 Reward modeling update (There seems to be another patch) (#2710)
* Update llama.py, sequence_classification update

* Update llama.py, adapting to original commit

* Update llama.py, for sequence classification update

* Update llama.py, added transformer import

* Update llama.py, dealt with output weight

* Update llama.py, renamed it peft model fast forward

* Update llama.py, set up is classification variable

* Update llama.py, updated lora dict to initialize sequence classification object

* Update llama.py, gets model name correctly before Lora dict is initialized

* Update llama.py, Task_type_SEQ_CLS doesn't work but it does work with Task_type.CAUSAL_LM
2025-06-17 04:33:45 -07:00
Michael Han
ee76be7d58 Update issue templates
Adding Reddit link
2025-06-12 01:23:36 -07:00
Roland Tannous
efe2cc43a7 tests for additional merge fix unsloth zoo pr 163 (#2719)
* tests for additional merge fix unsloth zoo pr 163

* fixed load_dataset indent in mistral perplexity test file
2025-06-11 14:08:41 -07:00
Daniel Han
d535bf067e Versioning 2025-06-10 06:51:07 -07:00
user799595
19399e09f9 Making protobuf version more flexible (#2637)
* Making protobuf version more flexible

* Update pyproject.toml

* Update pyproject.toml

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-06-10 04:13:25 -07:00
Daniel Han
af47cfb9a3 Update pyproject.toml 2025-06-10 04:04:25 -07:00
Lei Zhenyuan
fc78af6d76 add support for torch270 (#2709) 2025-06-10 03:59:15 -07:00
Daniel Han
0ab4544d37 Merge branch 'main' into nightly 2025-06-06 05:55:46 -07:00
Daniel Han
16af0ceb8e versioning 2025-06-06 05:46:49 -07:00
Salpingopharyngeus
0012b13573 Ignore None to Subprocess_Commands (#2680)
Ignores None params when building the subprocess_command for vLLM. None values stop vLLM from deploying properly, because --quantize is passed with None if the quantization type isn't specified in the model name.
2025-06-05 01:25:12 -07:00
DoubleMathew
aa50ef2862 Update prepare 4d causal attention call (#2678) 2025-06-04 12:58:50 -07:00
Daniel Han
8f465e21c5 Update rl.py 2025-06-03 00:07:52 -07:00
DoubleMathew
9bf691061d patch sft_trainer to favor max_seq_length over max_length in config (#2669) 2025-06-03 00:06:44 -07:00
DoubleMathew
90a4aacbf8 unsloth checkpointing fix for latest transformers==4.52.x (#2674) 2025-06-03 00:06:06 -07:00
Roland Tannous
58f3a6e29d reroute merge logic language models + comprehensive tests + eval kits (#2673) 2025-06-02 20:32:57 -07:00
Daniel Han
a58fec36a5 Merge branch 'main' into nightly 2025-06-02 18:58:24 -07:00
RunFMe
332eabf309 Fix batched generation for prompts of different lengths (#2216)
* fix ignoring of attention mask after prefill stage in decoding

* update naming to avoid confusion

---------

Co-authored-by: Неизвестный Пользователь722497 <dolegosmirnov@sberbank.ru>
2025-06-02 03:59:10 -07:00
Michael Han
e76172c638 Merge pull request #2662 from Datta0/model_param_fix
Fix quant model param fetch regex
2025-06-01 04:19:12 -07:00
datta0
e8d6ede1fd Make replacement logic concise 2025-06-01 05:57:43 +00:00
Michael Han
45a32bc599 Update issue templates 2025-05-31 14:38:55 -07:00
datta0
f2a8a437b4 Fix quant model param fetch regex 2025-05-31 18:52:46 +00:00
Daniel Han
03965930e7 DeepSeek R1 Qwen 2025-05-30 01:38:53 -07:00
Daniel Han
125e9f5f84 Merge branch 'main' into nightly 2025-05-29 09:59:48 -07:00
Daniel Han
f9677b6cae Bug fixes (#2651)
* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* check

* Update _utils.py

* Update loader.py

* Update loader.py

* Remove prints

* Update README.md

typo

* Update _utils.py

* Update _utils.py

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update vision.py

* HF Transfer

* fix(utils): add missing importlib import to fix NameError (#2134)

This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.

* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Revert

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

* Update README.md

* add model registry

* move hf hub utils to unsloth/utils

* refactor global model info dicts to dataclasses

* fix dataclass init

* fix llama registration

* remove deprecated key function

* start registry reorg

* add llama vision

* quant types -> Enum

* remap literal quant types to QuantType Enum

* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

---------

Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com>
Co-authored-by: naliazheli <nalia0316@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-05-29 09:59:29 -07:00
Daniel Han
b087bfaf55 Update rl.py 2025-05-28 12:27:55 -07:00
Daniel Han
b290557814 Update rl.py 2025-05-28 12:27:43 -07:00
Daniel Han
acb972cee0 versioning 2025-05-28 12:19:43 -07:00
Daniel Han
85b959b9bb Merge branch 'main' into nightly 2025-05-28 12:11:41 -07:00
DoubleMathew
95452eed81 Fix SFTtraining for new trl (#2647)
* fix sft training with trl>0.15.2 with trl DataCollator

* Update fix to accommodate both trl and transformers DataCollatorForLanguageModeling
2025-05-28 11:55:48 -07:00
Daniel Han
2b1462e813 Merge branch 'main' into nightly 2025-05-28 06:15:34 -07:00
Daniel Han
623060ba29 Latest TRL, GRPO + Bug fixes (#2645)
* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* model_type_arch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* check

* Update _utils.py

* Update loader.py

* Update loader.py

* Remove prints

* Update README.md

typo

* Update _utils.py

* Update _utils.py

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update vision.py

* HF Transfer

* fix(utils): add missing importlib import to fix NameError (#2134)

This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.

* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Revert

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

* Update README.md

* add model registry

* move hf hub utils to unsloth/utils

* refactor global model info dicts to dataclasses

* fix dataclass init

* fix llama registration

* remove deprecated key function

* start registry reorg

* add llama vision

* quant types -> Enum

* remap literal quant types to QuantType Enum

* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update rl.py

* versioning

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* logging

* Update pyproject.toml

* Update rl.py

---------

Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com>
Co-authored-by: naliazheli <nalia0316@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-05-28 06:15:12 -07:00
Daniel Han
251066289d Update rl.py 2025-05-28 06:04:23 -07:00
Daniel Han
12912022f4 Update pyproject.toml 2025-05-28 05:58:02 -07:00
Daniel Han
c94e31888c logging 2025-05-28 05:56:19 -07:00
Daniel Han
725d84997b Update rl.py 2025-05-28 05:26:01 -07:00
Daniel Han
1dabfcd2d3 Update rl.py 2025-05-28 05:23:07 -07:00
Daniel Han
86cd1d2786 Update rl.py 2025-05-28 05:09:03 -07:00
Daniel Han
e7f76d53bb Update rl.py 2025-05-28 05:06:24 -07:00
Daniel Han
4fb4d3a36c Update rl.py 2025-05-28 05:04:12 -07:00
Daniel Han
dc48358b2f versioning 2025-05-28 05:02:05 -07:00
Daniel Han
89c30967df Update rl.py 2025-05-28 04:57:30 -07:00
Daniel Han
026ba8e678 Create LICENSE 2025-05-28 03:27:48 -07:00
jeromeku
0b5ac8f2ab Llama4 MoE Grouped GEMM (#2639)
* add llama4 reference layer

* add llama4 reference impl

* formatting
2025-05-28 03:26:35 -07:00
Daniel Han
86ecf655a9 Merge branch 'main' into nightly 2025-05-28 03:24:15 -07:00
Premik
d8bf17959a Check the skip_prepare_dataset before accessing dataset fields. #2496 (#2633) 2025-05-28 03:23:59 -07:00
Daniel Han
0ba75a4359 Merge branch 'main' into nightly 2025-05-28 02:06:42 -07:00
Michael Han
ce9e54755f Update README.md
Better Qwen3 notebook
2025-05-26 23:44:41 -07:00
Daniel Han
5327d1d36d Flash Attention whls 2025-05-26 22:48:46 -07:00
Datta Nimmaturi
811422bbb4 Upgrade trl fix (#2544)
* Update llama.py making set and reset functions in order to properly use autoSequenceClassification

* Update fast_lora.py, added mixed precision PyTorch autocasting

* Update llama.py did not include rotary embeddings in the reset functions correctly

* Update rl.py: correct get reward model added as well as the eval step stuff

* Update rl.py removed function that did not need to be patched

* Update llama.py: kept reset functions and made their names generic

* Update fast_lora.py

* Update rl.py, try except

* Update fast_lora.py, removing downcasting stuff

* Update llama.py removed deprecated LLamaLinearScalingRotaryEmbedding

* Update rl.py for VLLM RLOO and PPO

* Update rl.py reverted

* Update rl.py with peft changes

* Update rl.py, disabling adapters screws inference up

* Update rl.py getting PPO support

* Update rl.py cleanup

* Update rl.py cleaned up not useful commented code

* Update llama.py, enabled new flag, keep padding

* Upgrade trl fix

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Update rl.py made changes relative to the review

* Revert accidental patch block for non grpo

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Fixup sampling params issue

* Fix rl.py regex

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* loss type: grpo, drgrpo and bnpo

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Add trl version check for vllm colocate mode for RL trainers

* Update rl.py

For TRL 0.18.0 (the main branch of TRL at the time, because the release is on 0.17.0), the SFT trainer for some reason deletes the labels column, and Unsloth's internal loss functions need that column for the calculations, so I add it back in like this.

* Update llama.py, merge it to be dattas llama version

* Update rl.py, sft changes to get 0.18.0 to be working

* Update rl_replacements.py, added hidden state stuff

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py, rechanged the accumulated loss

* Fixup num_iterations>1 for grpo

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>

* Update rl_replacements.py

* no unnecessary logits upcast. fix naming

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>

* Update rl_replacements.py returned hidden states from logprobs

* Update rl_replacements.py removed debug logic

* Update rl_replacements.py, should be fine now

* Update rl_replacements.py, should take new args for GRPO trainer

* Update rl_replacements.py, made it compatible with trl 0.15.2

* Update rl_replacements.py, fixed typo in per token-Logps

---------

Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>
Co-authored-by: pluesclues <136766175+pluesclues@users.noreply.github.com>
2025-05-26 17:20:57 -07:00
Daniel Han
a66f3f4cda Colocate vLLM 2025-05-26 00:37:04 -07:00
Michael Han
1f4e74cb96 Update README.md 2025-05-25 03:35:43 -07:00
Quentin Gallouédec
ce5c2d2145 Remove dataset_text_field from SFTConfig (#2609) 2025-05-25 03:20:16 -07:00
Richi
f6c4be39b7 add: path checking for failed llama cpp builds (#2603) 2025-05-25 03:18:07 -07:00
Daniel Han
9b8e6b1e22 Merge branch 'main' into nightly 2025-05-21 23:21:12 -07:00
Daniel Han
dd43200718 Devstral, MedGemma 2025-05-21 07:35:36 -07:00
Michael Han
e771760e53 Update README.md
Updating model support
2025-05-20 09:51:55 -07:00
Michael Han
61b68725e9 Update README.md 2025-05-19 21:26:19 -07:00
Daniel Han
b5269556c2 Update issue templates 2025-05-17 18:30:17 -07:00
Daniel Han
25c11cf2d8 Update issue templates 2025-05-17 18:29:27 -07:00
Daniel Han
7a0c8c7da1 Update issue templates 2025-05-17 05:42:10 -07:00
Daniel Han
4d5f2172f4 Fix Whisper, ModernBERT (#2565)
* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091)

* fix: config.torch_dtype in LlamaModel_fast_forward_inference

* Update llama.py

* update for consistency

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* model_type_arch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* check

* Update _utils.py

* Update loader.py

* Update loader.py

* Remove prints

* Update README.md

typo

* Update _utils.py

* Update _utils.py

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update vision.py

* HF Transfer

* fix(utils): add missing importlib import to fix NameError (#2134)

This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.

* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Revert

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

* Update README.md

* add model registry

* move hf hub utils to unsloth/utils

* refactor global model info dicts to dataclasses

* fix dataclass init

* fix llama registration

* remove deprecated key function

* start registry reorg

* add llama vision

* quant types -> Enum

* remap literal quant types to QuantType Enum

* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

---------

Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com>
Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com>
Co-authored-by: naliazheli <nalia0316@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-05-17 05:11:50 -07:00
Daniel Han
416b89e901 Update _utils.py 2025-05-17 05:11:36 -07:00
Daniel Han
f0bf761832 Merge branch 'main' into nightly 2025-05-17 05:10:00 -07:00
Emmanuel Ferdman
2a1caa746b Display the model name in RoPE scaling unsupported error (#2564)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-17 05:09:06 -07:00
Daniel Han
382406a9ef Update loader.py 2025-05-17 00:29:55 -07:00
Daniel Han
87b98cf970 Update loader.py 2025-05-17 00:29:04 -07:00
Daniel Han
811a86007d Update loader.py 2025-05-17 00:24:27 -07:00
Daniel Han
611463b428 Update loader.py 2025-05-17 00:03:20 -07:00
Daniel Han
74d44dc611 Update vision.py 2025-05-16 23:31:15 -07:00
Daniel Han
c0ae3602e2 Update vision.py 2025-05-16 23:25:32 -07:00
Daniel Han
fe3f1c02f9 Update vision.py 2025-05-16 23:24:32 -07:00
Daniel Han
18489d1c99 Update vision.py 2025-05-16 23:20:57 -07:00
Daniel Han
b790b82e42 Update vision.py 2025-05-16 23:20:18 -07:00
Daniel Han
3c475c834d Update vision.py 2025-05-16 23:03:04 -07:00
Daniel Han
9b571970f8 Update mapper.py 2025-05-16 23:02:06 -07:00
Daniel Han
0078d7bfb5 Merge branch 'main' into nightly 2025-05-16 23:02:00 -07:00
Michael Han
f48e240bad Merge pull request #2563 from davedgd/main
fix issue with qwen3 template double quote escapes
2025-05-16 22:38:39 -07:00
David Dobolyi
a063c4a41e fix issue with qwen3 template double quote escapes 2025-05-16 23:26:03 -06:00
Etherll
99a2a36f73 Fix trust remote code (#2357)
* Update _utils.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update unsloth/models/vision.py

* Update unsloth/models/vision.py

* Update unsloth/models/vision.py

* Update unsloth/models/vision.py

* Update unsloth/models/_utils.py

* Update unsloth/models/vision.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-05-16 16:06:42 -07:00
Daniel Han
2524de493e Update pyproject.toml 2025-05-16 15:38:19 -07:00
Daniel Han
15b6ac613a Merge branch 'main' of https://github.com/unslothai/unsloth 2025-05-16 15:34:41 -07:00
Daniel Han
299c8a94a4 Update _utils.py 2025-05-16 15:33:49 -07:00
Michael Han
4937cd97f0 Merge pull request #2554 from Erland366/fix/generation_config
Quick fix on the CompileConfig error
2025-05-16 12:48:00 -07:00
Erland366
3cdbd879f7 Fix Nonetype on the compile_config 2025-05-16 13:16:34 +00:00
Michael Han
b22e654ef0 Update README.md 2025-05-16 01:56:40 -07:00
Michael Han
41e3701251 Update README.md
TTS support
2025-05-15 15:15:53 -07:00
Daniel Han
dc6c4dc385 TTS (#2545)
* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Remove double generate patch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091)

* fix: config.torch_dtype in LlamaModel_fast_forward_inference

* Update llama.py

* update for consistency

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* model_type_arch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* check

* Update _utils.py

* Update loader.py

* Update loader.py

* Remove prints

* Update README.md

typo

* Update _utils.py

* Update _utils.py

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update vision.py

* HF Transfer

* fix(utils): add missing importlib import to fix NameError (#2134)

This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.

* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Revert

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

* Update README.md

* add model registry

* move hf hub utils to unsloth/utils

* refactor global model info dicts to dataclasses

* fix dataclass init

* fix llama registration

* remove deprecated key function

* start registry reorg

* add llama vision

* quant types -> Enum

* remap literal quant types to QuantType Enum

* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update chat_templates.py

* Sesame force float16 / float32

* Fix Sesame

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* is_multimodal

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* UNSLOTH_DISABLE_STATIC_GENERATION

* Update vision.py

* Auto vision detection

* Sesame

* Whisper

* Update loader.py

* Update loader.py

* Update loader.py

---------

Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com>
Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com>
Co-authored-by: naliazheli <nalia0316@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-05-15 09:23:52 -07:00
Daniel Han
b9b7b2c054 Update loader.py 2025-05-15 09:23:11 -07:00
Daniel Han
2579cd982b Update loader.py 2025-05-15 08:52:10 -07:00
Daniel Han
ed5712c397 Update loader.py 2025-05-15 08:39:37 -07:00
Daniel Han
b425d9da9e Whisper 2025-05-15 08:27:17 -07:00
Daniel Han
a0dd7663c2 Sesame 2025-05-15 07:29:32 -07:00
Michael Han
b5ba71a3d3 Update README.md 2025-05-15 06:54:05 -07:00
Daniel Han
b77d59e97c Auto vision detection 2025-05-15 06:39:03 -07:00
Daniel Han
d77a4a9d5c Update vision.py 2025-05-15 04:52:10 -07:00
Daniel Han
2fe9067d6b UNSLOTH_DISABLE_STATIC_GENERATION 2025-05-15 04:48:42 -07:00
Daniel Han
b02f6035ae Update vision.py 2025-05-15 04:34:49 -07:00
Daniel Han
8865207ba4 Update vision.py 2025-05-15 04:26:05 -07:00
Janusz
b781a7ad38 Add use_rslora reference to LoraConfig initialisation (#2539)
Co-authored-by: jkumz <janusz.kumor01@gmail.com>
2025-05-15 04:24:18 -07:00
omahs
28304e4101 Fix typos (#2540) 2025-05-15 04:23:27 -07:00
Daniel Han
dafdcbe88c Update vision.py 2025-05-15 03:54:22 -07:00
Daniel Han
ba6dcd2498 Update loader.py 2025-05-15 03:46:48 -07:00
Daniel Han
30a1b25752 Update loader.py 2025-05-15 03:39:20 -07:00
Daniel Han
fda9007d77 Update loader.py 2025-05-15 03:18:16 -07:00
Daniel Han
7f22fdd9f9 Update loader.py 2025-05-15 03:00:25 -07:00
Daniel Han
5982e34484 is_multimodal 2025-05-15 02:32:29 -07:00
Daniel Han
b4b073a59d Update loader.py 2025-05-15 02:30:12 -07:00
Daniel Han
f781c871c7 Update vision.py 2025-05-15 02:17:03 -07:00
Daniel Han
633221f8c9 Update vision.py 2025-05-15 02:07:29 -07:00
Daniel Han
151b1ae2e3 Update vision.py 2025-05-15 01:54:05 -07:00
Daniel Han
630f2b5f66 Update loader.py 2025-05-15 01:47:34 -07:00
Daniel Han
c4d8ed4939 Fix Sesame 2025-05-15 01:45:15 -07:00
Daniel Han
9dac587254 Sesame force float16 / float32 2025-05-15 01:27:14 -07:00
Daniel Han
189ed215e6 Update chat_templates.py 2025-05-15 01:13:20 -07:00
Daniel Han
992f9248bb Merge branch 'main' into nightly 2025-05-15 01:07:32 -07:00
Michael Han
b64c84ef33 Merge pull request #2537 from kiankyars/main
Add Qwen-3 chat template and Ollama template support
2025-05-14 20:45:53 -07:00
Daniel Han
c9b93f7b10 Merge branch 'main' into nightly 2025-05-14 20:30:41 -07:00
Kian Kyars
e147af330e undo accident 2025-05-14 19:00:23 -06:00
Kian Kyars
059ccd8221 style: Place Qwen-3 template after Gemma-3, match style with other templates 2025-05-14 18:59:17 -06:00
Kian Kyars
48dc104728 Update Qwen-3 chat and Ollama templates to official full version, placed after Gemma-3 2025-05-14 18:42:55 -06:00
Kian Kyars
40ac241994 Add Qwen-3 chat template and Ollama template support 2025-05-14 18:35:02 -06:00
Daniel Han
e18a41d10f Update pyproject.toml 2025-05-14 05:42:47 -07:00
Daniel Han
da41d4c21f Update _utils.py 2025-05-14 05:42:11 -07:00
Daniel Han
0bbf131238 Update synthetic.py 2025-05-14 04:05:23 -07:00
Daniel Han
f971fce721 Update synthetic.py 2025-05-14 03:54:46 -07:00
Daniel Han
17d8517144 Update synthetic.py 2025-05-14 03:49:27 -07:00
Daniel Han
b47bbd3f55 Update synthetic.py 2025-05-14 03:47:29 -07:00
Daniel Han
e05735db0c Update synthetic.py 2025-05-14 03:44:15 -07:00
Michael Han
074573a13b Merge pull request #2527 from mmathew23/csm
Add Sesame CSM
2025-05-14 02:26:12 -07:00
DoubleMathew
e317fc222d Merge branch 'unslothai:main' into csm 2025-05-13 17:58:11 -05:00
Daniel Han
cdb8eaaf42 Versioning 2025-05-13 09:10:24 -07:00
Daniel Han
99a0627c64 Update synthetic.py 2025-05-13 08:26:17 -07:00
Daniel Han
76e632ea35 Update synthetic.py 2025-05-13 08:25:48 -07:00
Daniel Han
4f9dfadd91 Merge branch 'main' into nightly 2025-05-13 03:39:49 -07:00
Michael Han
f4cbf303fe Update README.md 2025-05-13 01:39:59 -07:00
Daniel Han
65710647b5 Update loader_utils.py 2025-05-12 21:06:30 -07:00
Daniel Han
e99b66e711 Update pyproject.toml 2025-05-12 16:28:50 -07:00
feng lui
48cb9c724c vLLM Windows CUDA support [tested] (#2158)
* Update loader.py

change vllm installed check by transformers utils function

* Update llama.py

change vllm installed check by transformers utils function

* add sample notebook

* fix Indentation

* add global is_vLLM_available function

* Pythonic style

* Delete nb/Qwen2.5_(3B)-GRPO-windows.ipynb

Would be great to move it to https://github.com/unslothai/notebooks - appreciate it!

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-05-12 05:33:42 -07:00
Daniel Han
67026e28ee Update pyproject.toml 2025-05-12 04:24:18 -07:00
Daniel Han
cecdfc5a34 Fix Intel GPU 2025-05-12 03:10:21 -07:00
Lei Zhenyuan
fe6b83fd7e [2/N] Enable intel GPU for unsloth (#2388)
* add DEVICE_TYPE and resolve device specific API

* reuse import torch

* move env under device type

* resolve comments

* add more comments

* add more comments
2025-05-12 02:58:21 -07:00
Lei Zhenyuan
5bf77aabf5 first pr for intel GPU, resolve __init__.py and pyproject.toml (#2350)
add better comments
2025-05-12 02:38:40 -07:00
Daniel Han
c37380c63b Fix GRPO eval 2025-05-12 02:35:31 -07:00
Michael Han
e3f6c5eff4 Merge pull request #2466 from mmathew23/fix_pop_token_type_ids
the pixtral vision notebook fails during inference
2025-05-09 14:57:31 -07:00
Mathew Mathew
3a56b3a24a turn off compilation and fast generation for csm 2025-05-09 16:50:16 -05:00
Michael Han
1a99b4dc94 Merge pull request #2492 from yuanzhedong/yz/dev/fix-readme
Fix readme example
2025-05-07 23:29:59 -07:00
Yuanzhe Dong
75f3f8a7e5 Fix readme example 2025-05-06 19:26:35 -07:00
Michael Han
8821057420 Update README.md
Adding extra synthetic data notebook, cleaning repo
2025-05-05 20:56:01 -07:00
Daniel Han
6c0b8a57e4 Update __init__.py 2025-05-04 18:06:39 -07:00
Michael Han
9e2ef7c50c Uploading HQ Unsloth Sticker 2025-05-04 05:31:57 -07:00
Michael Han
c4d0fd42be Updating HQ logos 2025-05-04 05:25:06 -07:00
Daniel Han
84779ee11b Better vllm deletion 2025-05-04 05:03:58 -07:00
Daniel Han
f9bf537130 Update pyproject.toml 2025-05-04 04:29:10 -07:00
Daniel Han
bad8069807 Update _utils.py 2025-05-04 03:03:50 -07:00
Michael Han
bb802c8a4a Update README.md 2025-05-02 23:14:34 -07:00
Daniel Han
077fe260e6 Update pyproject.toml 2025-05-02 22:52:58 -07:00
Daniel Han
e287e55906 Update pyproject.toml 2025-05-02 22:47:09 -07:00
Daniel Han
fcaf726dda Update pyproject.toml 2025-05-02 22:40:12 -07:00
Roland Tannous
4190502a1a Added missing code of conduct (#2416)
* Added code of conduct

* fixed CONTRIBUTING -> CODE_OF_CONDUCT url link
2025-05-02 21:08:27 -07:00
Johnny
fb3fb77d43 Update pyproject.toml (#2458) 2025-05-02 21:07:59 -07:00
jeromeku
b7fc12c8be MoE Kernel (#2465)
* add moe grouped gemm kernel

* add benchmark, README

* remove formatting from __init__.py
2025-05-02 20:59:23 -07:00
Mathew Mathew
bbe5e2a221 the pixtral vision notebook fails during inference with unused kwargs token_type_ids. This fixes the error 2025-05-02 21:40:15 -05:00
Daniel Han
9edbe23259 Bug fix 2025-05-02 14:06:14 -07:00
Daniel Han
d957caeae7 Fix Qwen 3 mapping 2025-05-02 09:44:01 -07:00
Michael Han
8bfe5fd4ab Update README.md 2025-05-02 09:06:57 -07:00
Daniel Han
2249a5ff88 Qwen3 bug fixes 2025-05-02 07:18:33 -07:00
Daniel Han
8fbd80231c Update mapper.py 2025-05-02 06:26:43 -07:00
Daniel Han
d74e7c19b3 Versioning 2025-05-02 05:05:09 -07:00
Daniel Han
439cae9fae Remove hf_xet warning 2025-05-02 04:23:53 -07:00
Daniel Han
f90224b02f Update __init__.py 2025-05-02 03:43:44 -07:00
Daniel Han
2a231c7ba4 Merge branch 'main' of https://github.com/unslothai/unsloth 2025-05-02 03:09:52 -07:00
Daniel Han
eb495f171f Qwen 3 2025-05-02 03:09:44 -07:00
cblomert
629a3fcfe3 Added k_norm & q_norm to merged Qwen3 layers (#2452) 2025-05-02 03:07:37 -07:00
Michael Han
97a63f809f Update README.md
Qwen3 notebook
2025-05-01 22:52:42 -07:00
Daniel Han
6e2b5a767f Nightly (#2448)
* move float32

* Ensure trust_remote_code propagates down to unsloth_compile_transformers (#2075)

* Update _utils.py

* Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080)

When loading a PEFT model fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this:

```
RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ...
```

This PR just changes it so `autoconfig_error` and `peft_error` are both displayed.

* fix error message (#2046)

* Update vision.py

* Update _utils.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Remove double generate patch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091)

* fix: config.torch_dtype in LlamaModel_fast_forward_inference

* Update llama.py

* update for consistency

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* model_type_arch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* check

* Update _utils.py

* Update loader.py

* Update loader.py

* Remove prints

* Update README.md

typo

* Update _utils.py

* Update _utils.py

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update vision.py

* HF Transfer

* fix(utils): add missing importlib import to fix NameError (#2134)

This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.

* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Revert

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

* Update README.md

* add model registry

* move hf hub utils to unsloth/utils

* refactor global model info dicts to dataclasses

* fix dataclass init

* fix llama registration

* remove deprecated key function

* start registry reorg

* add llama vision

* quant types -> Enum

* remap literal quant types to QuantType Enum

* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update _utils.py

* Update pyproject.toml

* Update synthetic.py

* Update synthetic.py

---------

Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com>
Co-authored-by: Isaac Breen <isaac.breen@icloud.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com>
Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com>
Co-authored-by: naliazheli <nalia0316@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-05-01 07:42:08 -07:00
Daniel Han
88988a8eae Update synthetic.py 2025-05-01 07:41:45 -07:00
Daniel Han
ea30448540 Update synthetic.py 2025-05-01 07:26:51 -07:00
Daniel Han
8590f903e0 Update pyproject.toml 2025-05-01 07:18:29 -07:00
Daniel Han
b6efc4f985 Update _utils.py 2025-05-01 07:17:36 -07:00
Daniel Han
9b8556a100 Update synthetic.py 2025-05-01 06:56:02 -07:00
Daniel Han
8e17139f21 Update synthetic.py 2025-05-01 06:55:43 -07:00
Daniel Han
3ff18046d4 Update synthetic.py 2025-05-01 06:49:46 -07:00
Daniel Han
9aff268a0c Update synthetic.py 2025-05-01 06:46:04 -07:00
Daniel Han
be85c14fec Update synthetic.py 2025-05-01 06:42:49 -07:00
Daniel Han
24f6940bf2 Update synthetic.py 2025-05-01 06:38:16 -07:00
Daniel Han
c7385aa85d Update synthetic.py 2025-05-01 06:38:08 -07:00
Daniel Han
94c90d35be Update synthetic.py 2025-05-01 06:36:16 -07:00
Daniel Han
f2fb23e532 Update synthetic.py 2025-05-01 06:35:28 -07:00
Daniel Han
a4d8dc31c4 Update synthetic.py 2025-05-01 06:32:20 -07:00
Daniel Han
c7d953c452 Update synthetic.py 2025-05-01 06:26:09 -07:00
Daniel Han
09470acdb8 Update synthetic.py 2025-05-01 06:23:54 -07:00
Daniel Han
72ca86306a Update synthetic.py 2025-05-01 06:20:47 -07:00
Daniel Han
8853a0fee4 Update synthetic.py 2025-05-01 06:20:02 -07:00
Daniel Han
d93fe5e656 Update synthetic.py 2025-05-01 06:19:25 -07:00
Daniel Han
706d14ea51 Update synthetic.py 2025-05-01 06:16:58 -07:00
Daniel Han
9b31af17b4 Update synthetic.py 2025-05-01 06:16:35 -07:00
Daniel Han
080fd1b4da Merge branch 'main' into nightly 2025-05-01 05:54:31 -07:00
Daniel Han
382961c9d3 Update mapper.py 2025-05-01 03:07:56 -07:00
Daniel Han
9a930bb095 Qwen 3, Bug Fixes (#2445)
* bug fix #2008 (#2039)

* fix (#2051)

* Update loader.py

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* more prints

* Update loader.py

* LoRA 16bit fix

* Update vision.py

* Update vision.py

* Update _utils.py

* Update vision.py

* move forced float32

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* move print

* Update _utils.py

* disable bfloat16

* Fix forced float32

* move float32

* Ensure trust_remote_code propagates down to unsloth_compile_transformers (#2075)

* Update _utils.py

* Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080)

When loading a PEFT model fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this:

```
RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ...
```

This PR just changes it so `autoconfig_error` and `peft_error` are both displayed.

* fix error message (#2046)

* Update vision.py

* Update _utils.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Remove double generate patch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091)

* fix: config.torch_dtype in LlamaModel_fast_forward_inference

* Update llama.py

* update for consistency

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* model_type_arch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* check

* Update _utils.py

* Update loader.py

* Update loader.py

* Remove prints

* Update README.md

typo

* Update _utils.py

* Update _utils.py

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update vision.py

* HF Transfer

* fix(utils): add missing importlib import to fix NameError (#2134)

This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.

* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Revert

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

* Update README.md

* add model registry

* move hf hub utils to unsloth/utils

* refactor global model info dicts to dataclasses

* fix dataclass init

* fix llama registration

* remove deprecated key function

* start registry reorg

* add llama vision

* quant types -> Enum

* remap literal quant types to QuantType Enum

* add llama model registration

* fix quant tag mapping

* add qwen2.5 models to registry

* add option to include original model in registry

* handle quant types per model size

* separate registration of base and instruct llama3.2

* add QwenQVQ to registry

* add gemma3 to registry

* add phi

* add deepseek v3

* add deepseek r1 base

* add deepseek r1 zero

* add deepseek distill llama

* add deepseek distill models

* remove redundant code when constructing model names

* add mistral small to registry

* rename model registration methods

* rename deepseek registration methods

* refactor naming for mistral and phi

* add global register models

* refactor model registration tests for new registry apis

* add model search method

* remove deprecated registration api

* add quant type test

* add registry readme

* make llama registration more specific

* clear registry when executing individual model registration file

* more registry readme updates

* Update _auto_install.py

* Llama4

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Synthetic data

* Update mapper.py

* Xet and Synthetic

* Update synthetic.py

* Update loader.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update synthetic.py

* Update pyproject.toml

* Delete .gitignore

---------

Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com>
Co-authored-by: Isaac Breen <isaac.breen@icloud.com>
Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com>
Co-authored-by: Jack Shi Wei Lun <87535974+jackswl@users.noreply.github.com>
Co-authored-by: naliazheli <nalia0316@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-04-30 22:38:39 -07:00
Daniel Han
962e788eb1 Delete .gitignore 2025-04-30 22:36:15 -07:00
Daniel Han
64cf5d8e64 Update pyproject.toml 2025-04-30 22:35:22 -07:00
Daniel Han
599a15ea8d Update pyproject.toml 2025-04-30 22:34:07 -07:00
Daniel Han
86a4506309 Update synthetic.py 2025-04-30 12:58:58 -07:00
Daniel Han
676febe4f0 Update synthetic.py 2025-04-30 11:17:00 -07:00
Daniel Han
62d3a9aee2 Update synthetic.py 2025-04-30 10:49:13 -07:00
Daniel Han
a075028f00 Update synthetic.py 2025-04-30 10:22:31 -07:00
Daniel Han
e27063fd5c Update synthetic.py 2025-04-30 10:22:01 -07:00
Daniel Han
71095ce435 Update synthetic.py 2025-04-30 10:08:50 -07:00
Daniel Han
f7bd9718bb Update synthetic.py 2025-04-30 10:07:49 -07:00
Daniel Han
4c20d2a499 Update synthetic.py 2025-04-30 10:07:38 -07:00
Daniel Han
e0ba82eebb Update synthetic.py 2025-04-30 09:59:54 -07:00
Daniel Han
40b623a61d Update synthetic.py 2025-04-30 09:49:27 -07:00
Daniel Han
e08a4ceeb1 Update synthetic.py 2025-04-30 09:48:53 -07:00
Daniel Han
b60af64236 Update synthetic.py 2025-04-30 09:40:13 -07:00
Daniel Han
72b84b8e73 Update synthetic.py 2025-04-30 09:32:57 -07:00
Daniel Han
bc7ac80890 Update synthetic.py 2025-04-30 09:30:33 -07:00
Daniel Han
237075e18e Update synthetic.py 2025-04-30 09:25:46 -07:00
Daniel Han
1236aa851f Update synthetic.py 2025-04-30 09:25:27 -07:00
Daniel Han
b91e804a75 Update synthetic.py 2025-04-30 09:24:31 -07:00
Daniel Han
44cf9b6d37 Update synthetic.py 2025-04-30 09:22:40 -07:00
Daniel Han
6d3f8871d1 Update synthetic.py 2025-04-30 09:21:05 -07:00
Daniel Han
621a1316d6 Update synthetic.py 2025-04-30 09:19:23 -07:00
Daniel Han
c6d8158389 Update synthetic.py 2025-04-30 08:57:23 -07:00
Daniel Han
6d2bb0eda2 Update synthetic.py 2025-04-30 08:56:01 -07:00
Daniel Han
390d55ee8e Update synthetic.py 2025-04-30 08:52:26 -07:00
Daniel Han
69323c5498 Update synthetic.py 2025-04-30 08:50:27 -07:00
Daniel Han
7e70c81342 Update synthetic.py 2025-04-30 08:49:04 -07:00
Daniel Han
45a173217c Update synthetic.py 2025-04-30 08:47:23 -07:00
Daniel Han
ca3db2980d Update synthetic.py 2025-04-30 08:45:04 -07:00
Daniel Han
6c122be95e Update loader.py 2025-04-30 08:44:15 -07:00
Daniel Han
ec9892f636 Update synthetic.py 2025-04-30 08:39:49 -07:00
Daniel Han
ed9709bdcf Xet and Synthetic 2025-04-30 08:37:56 -07:00
Daniel Han
fd07824f0f Update mapper.py 2025-04-30 07:35:02 -07:00
Daniel Han
bf19381ea7 Merge branch 'main' into nightly 2025-04-30 07:31:43 -07:00
Daniel Han
43d483122f Merge branch 'main' of https://github.com/unslothai/unsloth 2025-04-30 07:31:34 -07:00
Daniel Han
46a0d8c7e5 Synthetic data 2025-04-30 07:31:27 -07:00
Michael Han
f4283a800e Merge pull request #2439 from Etherll/patch-1
Update mapper.py to add Qwen3 base
2025-04-30 06:18:46 -07:00
Daniel Han
6bde1af86e Update synthetic.py 2025-04-30 05:21:36 -07:00
Etherll
d4ef475c33 Update mapper.py 2025-04-30 14:39:23 +03:00
Daniel Han
3dee30e5cd Update synthetic.py 2025-04-30 00:16:37 -07:00
Daniel Han
58111d52b0 Update synthetic.py 2025-04-30 00:10:39 -07:00
Daniel Han
8ad73ecd51 Update synthetic.py 2025-04-30 00:09:00 -07:00
Daniel Han
771f5502c0 Update synthetic.py 2025-04-30 00:07:45 -07:00
Daniel Han
d5759b7e51 Update synthetic.py 2025-04-30 00:05:28 -07:00
Daniel Han
b0c256571e Update synthetic.py 2025-04-30 00:03:54 -07:00
Daniel Han
85944576e3 Update synthetic.py 2025-04-30 00:02:52 -07:00
Daniel Han
71daff0aff Update synthetic.py 2025-04-30 00:00:09 -07:00
Daniel Han
7930d38c72 Update synthetic.py 2025-04-29 23:57:46 -07:00
Daniel Han
163f7a1c6d Update synthetic.py 2025-04-29 23:53:13 -07:00
Daniel Han
f331d5e537 Merge branch 'main' into nightly 2025-04-29 23:48:43 -07:00
Daniel Han
a7eb02790e Update synthetic.py 2025-04-29 23:48:33 -07:00
Daniel Han
f03961da09 Update synthetic.py 2025-04-29 23:47:09 -07:00
Michael Han
af337af35c Merge pull request #2436 from Datta0/qwen3_support
Qwen3 inference fixes
2025-04-29 20:18:03 -07:00
Dattu Sharma
5640401435 Qwen3 inference fixes 2025-04-30 03:03:38 +00:00
Daniel Han
ecfd56aabe Update _utils.py 2025-04-29 11:43:24 -07:00
Daniel Han
7b97cf2304 Update synthetic.py 2025-04-29 11:43:13 -07:00
Daniel Han
7e6dbd9ccd Create __init__.py 2025-04-29 11:18:03 -07:00
Daniel Han
8babbeaded Create synthetic.py 2025-04-29 11:17:10 -07:00
Daniel Han
9b5446d5a2 Versioning 2025-04-29 09:50:51 -07:00
Daniel Han
d3f419f6ac Update mapper.py 2025-04-29 00:50:46 -07:00
Michael Han
9945bcb629 Merge pull request #2427 from Datta0/qwen3_support
Fixup qwen3 qk norm
2025-04-28 23:45:21 -07:00
Dattu Sharma
8cb2400e45 fixup qwen3 qk norm
Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
2025-04-29 06:22:07 +00:00
Michael Han
f637694085 Merge pull request #2423 from Datta0/qwen3_support
Fixup qwen3
2025-04-28 19:32:45 -07:00
Dattu Sharma
26cf059574 fixup qwen3
Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
2025-04-29 02:31:55 +00:00
Michael Han
53e6fba362 Update README.md 2025-04-28 19:08:12 -07:00
Michael Han
e4ab75296d Merge pull request #2211 from Datta0/qwen3_support
[WIP] Initial support for Qwen3. Will update when the model is released
2025-04-28 19:05:40 -07:00
Daniel Han
9d9447b7e1 Merge branch 'main' into nightly 2025-04-26 03:56:23 -07:00
Daniel Han
acea2c09bb Update pyproject.toml 2025-04-26 03:45:29 -07:00
Daniel Han
de64444965 Versioning 2025-04-26 03:44:21 -07:00
Daniel Han
6366a4764b Merge branch 'main' into nightly 2025-04-19 19:46:17 -07:00
Michael Han
31a37cb2ce Merge pull request #2381 from Erland366/fix/saving_vlm_4bit
Fix saving 4bit for VLM
2025-04-19 14:26:11 -07:00
Erland366
ed16a50bf9 feat: Add validation for 4bit save method and implement corresponding error handling 2025-04-19 20:36:30 +00:00
Michael Han
f1128eea1b Merge pull request #2375 from unslothai/revert-2358-patch-1
Revert "fix: improved error handling when llama.cpp build fails"
2025-04-17 20:33:40 -07:00
Michael Han
6b261264a2 Revert "fix: improved error handling when llama.cpp build fails" 2025-04-17 20:33:25 -07:00
Michael Han
7a5b81db74 Merge pull request #2358 from Hansehart/patch-1
fix: improved error handling when llama.cpp build fails
2025-04-17 14:03:45 -07:00
Richi
4b3ae022a9 fix: improved error handling when llama.cpp build fails 2025-04-16 09:17:25 +02:00
Etherll
9fc1e3b945 feat: Support custom auto_model for wider model compatibility (Whisper, Bert,etc) & attn_implementation support (#2263)
* Update loader.py

* Update vision.py

* Update vision.py

fix attn_implementation

* Refactor: Improve parameter handling and checks in loader/vision
2025-04-14 14:10:05 -07:00
Daniel Han
0e17bfb282 Merge branch 'main' into nightly 2025-04-09 23:39:14 -07:00
Michael Han
a41cda7dfb Update question.md 2025-04-09 15:31:56 -07:00
Michael Han
76f24eb8a2 Update feature_request.md 2025-04-09 15:30:56 -07:00
Michael Han
ea8b427fc2 Update documentation.md 2025-04-09 15:29:20 -07:00
Michael Han
3b10fbde0a Update bug_report.md 2025-04-09 15:24:35 -07:00
Michael Han
92c87a4e97 Update question.md 2025-04-09 15:20:44 -07:00
Michael Han
9c0663af1a Update feature_request.md 2025-04-09 15:20:06 -07:00
Michael Han
61386a6a1a Update documentation.md 2025-04-09 15:19:40 -07:00
Michael Han
0b5b98d688 Merge pull request #2323 from unslothai/shimmyshimmer-patch-2
Update bug_report.md
2025-04-09 15:19:21 -07:00
Michael Han
f96ff98cc5 Update bug_report.md 2025-04-09 15:17:33 -07:00
Daniel Han
a40b3c5578 Llama4 2025-04-06 01:43:42 -07:00
Michael Han
29b25e36eb Update README.md 2025-04-05 14:56:01 -07:00
Daniel Han
ea14a66e21 Update _auto_install.py 2025-04-05 14:30:10 -07:00
Michael Han
56f7c1edac Merge pull request #2119 from jackswl/patch-1
Update README.md
2025-04-02 21:15:45 -07:00
Michael Han
ad9a0e7672 Merge pull request #2267 from Kimizhao/main
Update README.md
2025-04-02 02:07:35 -07:00
zhaozh
c107f46b5e Update README.md
Gemma3 HF uploaded GGUFs, 4-bit models link.
2025-04-02 16:10:21 +08:00
datta0
00589310df add comments and use modified function 2025-04-02 06:49:06 +00:00
Michael Han
fd20192aef Merge pull request #2255 from jeromeku/registry-refactor
Registry refactor
2025-04-01 23:16:27 -07:00
Daniel Han
5ae9c4359b Merge branch 'main' into nightly 2025-04-01 14:15:13 -07:00
jeromeku
9a14edcd2f more registry readme updates 2025-03-31 18:34:18 -07:00
jeromeku
b33970525c clear registry when executing individual model registration file 2025-03-31 18:24:15 -07:00
jeromeku
8d393f29c1 make llama registration more specific 2025-03-31 18:12:52 -07:00
jeromeku
e2cfec6339 add registry readme 2025-03-31 18:11:58 -07:00
jeromeku
ecf70d6caa add quant type test 2025-03-31 17:58:44 -07:00
jeromeku
bb66d454e2 remove deprecated registration api 2025-03-31 17:36:42 -07:00
jeromeku
959727a2d2 add model search method 2025-03-31 17:36:19 -07:00
jeromeku
d93120db9d refactor model registration tests for new registry apis 2025-03-31 17:22:26 -07:00
jeromeku
2ff490e23b add global register models 2025-03-31 17:11:35 -07:00
jeromeku
65ea6356e4 refactor naming for mistral and phi 2025-03-31 17:08:11 -07:00
jeromeku
9a276978d2 rename deepseek registration methods 2025-03-31 17:03:05 -07:00
jeromeku
16f644e95d rename model registration methods 2025-03-31 17:01:51 -07:00
jeromeku
5c402c9e82 add mistral small to registry 2025-03-31 15:31:01 -07:00
jeromeku
025e22b666 remove redundant code when constructing model names 2025-03-31 15:06:08 -07:00
Michael Han
a8517a3009 Merge pull request #2250 from unslothai/jeromeku-patch-1
Fix feature_request ISSUE_TEMPLATE
2025-03-31 12:51:21 -07:00
jeromeku
79299596cd Fix feature_request ISSUE_TEMPLATE 2025-03-31 12:28:44 -07:00
jeromeku
7157f3c47c add deepseek distill models 2025-03-31 12:04:57 -07:00
jeromeku
7dec39e3b6 add deepseek distill llama 2025-03-31 11:47:51 -07:00
jeromeku
767044e7f2 add deepseek r1 zero 2025-03-31 11:32:21 -07:00
jeromeku
9f8f78c90b add deepseek r1 base 2025-03-31 11:30:47 -07:00
jeromeku
e2ff538fc5 add deepseek v3 2025-03-31 11:23:22 -07:00
jeromeku
a46811c471 add phi 2025-03-31 10:22:50 -07:00
jeromeku
756af9f35f add gemma3 to registry 2025-03-31 10:10:20 -07:00
jeromeku
2222e5ad58 add QwenQVQ to registry 2025-03-31 09:45:15 -07:00
jeromeku
76a2b62766 separate registration of base and instruct llama3.2 2025-03-31 09:35:11 -07:00
jeromeku
0f0aa0c476 handle quant types per model size 2025-03-31 09:27:43 -07:00
jeromeku
671dd3dc14 add option to include original model in registry 2025-03-31 09:09:34 -07:00
jeromeku
0395604928 add qwen2.5 models to registry 2025-03-31 08:45:53 -07:00
Michael Han
79c35a51ba Merge pull request #2242 from jeromeku/issues-templates
Issues templates
2025-03-30 22:04:47 -07:00
jeromeku
95da062046 fix quant tag mapping 2025-03-30 16:14:33 -07:00
jeromeku
7f0059c881 make wording more user-friendly 2025-03-30 15:52:31 -07:00
jeromeku
ee953ef710 improve wording 2025-03-30 15:47:33 -07:00
jeromeku
39926f1730 more edits 2025-03-30 15:46:25 -07:00
jeromeku
bbc5981054 fix typos, better wording 2025-03-30 15:39:20 -07:00
jeromeku
7b113bbe99 make templates more concise 2025-03-30 15:36:45 -07:00
jeromeku
e204fdbd0e more template edits 2025-03-30 15:34:01 -07:00
jeromeku
e00e3dd3a2 generalize documentation template 2025-03-30 15:21:47 -07:00
jeromeku
c3d517406d fix question template 2025-03-30 15:19:58 -07:00
jeromeku
7d93ca559f clean up bug template 2025-03-30 15:19:00 -07:00
jeromeku
d620b54909 fix template labels 2025-03-30 15:11:00 -07:00
jeromeku
c2ea782cd0 add question template 2025-03-30 15:08:38 -07:00
jeromeku
aeffab10f8 Update custom.md 2025-03-30 15:06:42 -07:00
jeromeku
8a587f232e Update issue templates 2025-03-30 15:06:42 -07:00
jeromeku
dcbc2fa776 Update issue templates 2025-03-30 15:06:41 -07:00
jeromeku
dac21f8bdf Update and rename custom.md to documentation_request.md 2025-03-30 15:06:41 -07:00
jeromeku
4e62b2180e Update issue templates 2025-03-30 15:06:41 -07:00
jeromeku
0130265ca8 add llama model registration 2025-03-30 15:05:33 -07:00
jeromeku
6abdb1fef6 remap literal quant types to QuantType Enum 2025-03-30 14:39:57 -07:00
jeromeku
6b4bf12873 quant types -> Enum 2025-03-30 14:37:30 -07:00
jeromeku
3d1249a551 add llama vision 2025-03-30 11:44:52 -07:00
jeromeku
ab7c51b4a5 start registry reorg 2025-03-30 11:36:48 -07:00
jeromeku
35e3b48c2b remove deprecated key function 2025-03-30 11:06:59 -07:00
jeromeku
85209602f3 fix llama registration 2025-03-30 11:06:11 -07:00
jeromeku
410c4b4c76 fix dataclass init 2025-03-30 10:58:51 -07:00
jeromeku
4b3df3d214 refactor global model info dicts to dataclasses 2025-03-30 10:43:00 -07:00
jeromeku
f21d61ae5d move hf hub utils to unsloth/utils 2025-03-28 16:54:38 -07:00
jeromeku
6f6a7a5e9b add model registry 2025-03-28 16:49:12 -07:00
datta0
406eb8cc71 Enable qwen3 and qwen3moe 2025-03-28 05:58:01 +00:00
datta0
71424b35ad Add Qwen3Moe and necessitate transformers version 2025-03-28 05:50:43 +00:00
datta0
73c42e5ded Initial support for Qwen3. Will update when the model is released 2025-03-27 14:35:57 +00:00
Michael Han
0b8e01ddb9 Update README.md 2025-03-27 00:26:18 -07:00
Jack Shi Wei Lun
949ac8eb6f Update README.md 2025-03-26 21:20:16 +08:00
Daniel Han
7afd6afe2c Merge branch 'main' into nightly 2025-03-26 05:21:24 -07:00
Daniel Han
2dc3930a53 Bug Fixes (#2197)
* Update loader.py

* model names

* Gemma 3 chat template

* Bug fixes

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update rl.py

* Update chat_templates.py

* Update chat_templates.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Revert

* Update _utils.py

* forced precision

* Autocast

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* vLLM fixes

* constexpr

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update save.py

* New models

* Triton windows update (#1976)

* Update pyproject.toml

* Update README.md

* Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974)

* Update RMS LayerNorm implementation with optimizations and testing suite

* perf: optimize list comprehension in get_ollama_eos_tokens

* Update Zoo

* Update llama.py

* Update llama.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* grpo fix

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update rl.py

* Update _utils.py

* Version

* Update pyproject.toml

* Update llama.py

* Update llama.py

* bug fix #2008 (#2039)

* fix (#2051)

* Update loader.py

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* more prints

* Update loader.py

* LoRA 16bit fix

* Update vision.py

* Update vision.py

* Update _utils.py

* Update vision.py

* move forced float32

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* move print

* Update _utils.py

* disable bfloat16

* Fix forced float32

* move float32

* Ensure trust_remote_code propagates down to unsloth_compile_transformers (#2075)

* Update _utils.py

* Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080)

When loading a PEFT model fails, only the `autoconfig_error` is shown. The `peft_error`, which is what really matters when loading a PEFT adapter, is hidden, and the user instead sees something like this:

```
RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ...
```

This PR just changes it so `autoconfig_error` and `peft_error` are both displayed.

* fix error message (#2046)

* Update vision.py

* Update _utils.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Remove double generate patch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091)

* fix: config.torch_dtype in LlamaModel_fast_forward_inference

* Update llama.py

* update for consistency

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* model_type_arch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* check

* Update _utils.py

* Update loader.py

* Update loader.py

* Remove prints

* Update _utils.py

* Update _utils.py

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update vision.py

* HF Transfer

* fix(utils): add missing importlib import to fix NameError (#2134)

This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.

* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update loader.py

* Revert

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Bug fix

* Update mapper.py

* check SDPA for Mistral 3, Pixtral

* Update vision.py

* Versioning

* Update rl_replacements.py

---------

Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com>
Co-authored-by: Isaac Breen <isaac.breen@icloud.com>
Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com>
Co-authored-by: naliazheli <nalia0316@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
2025-03-26 05:19:48 -07:00
Daniel Han
9bbba3a511 Update rl_replacements.py 2025-03-26 05:18:18 -07:00
Daniel Han
8e6dfed0ec Versioning 2025-03-26 04:29:44 -07:00
Daniel Han
c6588c73b9 Update vision.py 2025-03-26 04:13:43 -07:00
Daniel Han
599bfeb38c check SDPA for Mistral 3, Pixtral 2025-03-26 04:11:27 -07:00
Daniel Han
98d946acad Update mapper.py 2025-03-26 03:54:57 -07:00
Daniel Han
f197aa2b0a Bug fix 2025-03-26 03:50:46 -07:00
Daniel Han
adf8fb9b80 Update vision.py 2025-03-26 03:13:48 -07:00
Daniel Han
b9914974cd Update vision.py 2025-03-25 23:46:35 -07:00
Daniel Han
fc9f708e8d Update vision.py 2025-03-25 23:31:33 -07:00
Daniel Han
59d04bb523 Update vision.py 2025-03-25 23:20:15 -07:00
Daniel Han
416cfd5cab Update vision.py 2025-03-25 23:20:00 -07:00
Daniel Han
5ade8353d2 Revert 2025-03-25 23:17:19 -07:00
Daniel Han
1a67227923 Update loader.py 2025-03-25 23:15:14 -07:00
Daniel Han
8080c4edb0 Update loader.py 2025-03-25 23:13:26 -07:00
Daniel Han
ebccb588b7 Update vision.py 2025-03-25 23:07:35 -07:00
Daniel Han
8e6d90bec7 Update vision.py 2025-03-25 23:04:11 -07:00
Daniel Han
81984ed2a2 Update vision.py 2025-03-25 22:50:08 -07:00
Daniel Han
92c612b100 Update vision.py 2025-03-25 22:26:24 -07:00
Daniel Han
d2b5c807cc Merge branch 'main' into nightly 2025-03-21 18:02:09 -07:00
Daniel Han
b126e8947d Update pyproject.toml 2025-03-21 18:02:05 -07:00
Daniel Han
6a50448564 Merge branch 'main' into nightly 2025-03-21 17:55:39 -07:00
Daniel Han
c466303956 Fix Transformers 4.45 (#2151)
* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Batch samples

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Temporary patches

* Update loader.py

* model names

* Gemma 3 chat template

* Bug fixes

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update rl.py

* Update chat_templates.py

* Update chat_templates.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Revert

* Update _utils.py

* forced precision

* Autocast

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* vLLM fixes

* constexpr

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update save.py

* New models

* Triton windows update (#1976)

* Update pyproject.toml

* Update README.md

* Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974)

* Update RMS LayerNorm implementation with optimizations and testing suite

* perf: optimize list comprehension in get_ollama_eos_tokens

* Update Zoo

* Update llama.py

* Update llama.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* grpo fix

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update rl.py

* Update _utils.py

* Version

* Update pyproject.toml

* Update llama.py

* Update llama.py

* bug fix #2008 (#2039)

* fix (#2051)

* Update loader.py

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* more prints

* Update loader.py

* LoRA 16bit fix

* Update vision.py

* Update vision.py

* Update _utils.py

* Update vision.py

* move forced float32

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* move print

* Update _utils.py

* disable bfloat16

* Fix forced float32

* move float32

* Ensure trust_remote_code propagates down to unsloth_compile_transformers (#2075)

* Update _utils.py

* Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080)

When loading a PEFT model fails, only the `autoconfig_error` is shown. The `peft_error`, which is what really matters when loading a PEFT adapter, is hidden, and the user instead sees something like this:

```
RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ...
```

This PR just changes it so `autoconfig_error` and `peft_error` are both displayed.

* fix error message (#2046)

* Update vision.py

* Update _utils.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Remove double generate patch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091)

* fix: config.torch_dtype in LlamaModel_fast_forward_inference

* Update llama.py

* update for consistency

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* model_type_arch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* check

* Update _utils.py

* Update loader.py

* Update loader.py

* Remove prints

* Update _utils.py

* Update _utils.py

* versioning

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update vision.py

* HF Transfer

* fix(utils): add missing importlib import to fix NameError (#2134)

This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.

* Add QLoRA Train and Merge16bit Test (#2130)

* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license

* Update pyproject.toml

---------

Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com>
Co-authored-by: Isaac Breen <isaac.breen@icloud.com>
Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com>
Co-authored-by: naliazheli <nalia0316@gmail.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
2025-03-21 17:55:12 -07:00
Daniel Han
65aca02a25 Update pyproject.toml 2025-03-21 17:54:07 -07:00
jeromeku
4a18c881c5 Add QLoRA Train and Merge16bit Test (#2130)
* add reference and unsloth lora merging tests

* add test / dataset printing to test scripts

* allow running tests from repo root

* add qlora test readme

* more readme edits

* ruff formatting

* additional readme comments

* forgot to add actual tests

* add apache license
2025-03-21 17:53:37 -07:00
naliazheli
472dd5462f fix(utils): add missing importlib import to fix NameError (#2134)
This commit fixes a NameError that occurs when `importlib` is referenced in _utils.py
without being imported, especially when UNSLOTH_USE_MODELSCOPE=1 is enabled.
By adding the missing import statement, the code will no longer throw a NameError.
2025-03-21 17:44:25 -07:00
Daniel Han
db7c23ec5a HF Transfer 2025-03-21 17:41:40 -07:00
Daniel Han
7bf58c2465 Update vision.py 2025-03-21 17:39:51 -07:00
Daniel Han
90a5593bf0 Update llama.py 2025-03-21 17:38:39 -07:00
Daniel Han
8e9446e848 Update llama.py 2025-03-21 17:34:27 -07:00
Daniel Han
7d04000128 Update llama.py 2025-03-21 17:26:20 -07:00
Daniel Han
75e0dee7fc Update llama.py 2025-03-21 17:24:41 -07:00
Daniel Han
f8b1b21d43 Update llama.py 2025-03-21 17:14:52 -07:00
Daniel Han
dbb7bd3a7a Update llama.py 2025-03-21 17:13:14 -07:00
Daniel Han
4ce55e6201 Update llama.py 2025-03-21 17:08:00 -07:00
Daniel Han
eaded6d504 Update llama.py 2025-03-21 17:06:06 -07:00
Daniel Han
f900297fca Update llama.py 2025-03-21 17:05:43 -07:00
Daniel Han
aca67486c3 Update llama.py 2025-03-21 17:01:11 -07:00
Daniel Han
df44d6a4fb Update llama.py 2025-03-21 17:00:35 -07:00
Daniel Han
cc3c81e2ed Update llama.py 2025-03-21 16:58:00 -07:00
Daniel Han
d360225a84 Update llama.py 2025-03-21 16:53:44 -07:00
Daniel Han
8014699bec Update llama.py 2025-03-21 16:50:29 -07:00
Daniel Han
6efb8b1143 Update llama.py 2025-03-21 16:48:43 -07:00
Daniel Han
ce20a1f250 Update llama.py 2025-03-21 16:44:10 -07:00
Daniel Han
e666023a5e Update llama.py 2025-03-21 16:42:32 -07:00
Daniel Han
8c5be02e29 Update llama.py 2025-03-21 15:50:32 -07:00
Daniel Han
db4b5c9715 Update llama.py 2025-03-21 15:50:05 -07:00
Daniel Han
0b847e7c91 Update llama.py 2025-03-21 15:46:47 -07:00
Daniel Han
826e7775d8 Update llama.py 2025-03-21 15:44:38 -07:00
Daniel Han
678bdda2b8 Update llama.py 2025-03-21 15:43:21 -07:00
Daniel Han
254988c404 Update llama.py 2025-03-21 15:39:43 -07:00
Daniel Han
f79b6b4cbe Update llama.py 2025-03-21 15:31:00 -07:00
Daniel Han
f12572ee0d Update llama.py 2025-03-21 15:27:47 -07:00
Daniel Han
e783ba5795 Update _utils.py 2025-03-21 15:25:06 -07:00
Daniel Han
2f6d65c934 Update _utils.py 2025-03-21 15:24:17 -07:00
Daniel Han
4057bfe021 Update _utils.py 2025-03-21 15:20:48 -07:00
Daniel Han
82c5e7a45c versioning 2025-03-21 15:18:11 -07:00
Daniel Han
d7dfe1e9b0 Update _utils.py 2025-03-21 15:17:40 -07:00
Daniel Han
544824eb32 Update _utils.py 2025-03-21 15:17:30 -07:00
Daniel Han
b69472a198 Merge branch 'main' into nightly 2025-03-21 15:17:16 -07:00
Jack Shi Wei Lun
f616b65a17 Update README.md
typo
2025-03-20 13:13:58 +08:00
Daniel Han
eaf27d5b43 Small fix (#2114)
* versioning

* Update _utils.py

* Update llama.py

* Update llama.py

* Bug fixes

* FastModel

* __doc__

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* version

* move use_modelscope to _utils (#1938)

* move use_modelscope to _utils

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Don't use revision when loading model_config and is_peft=True (#1949)

* More syntax warnings (#1944)

* move use_modelscope to _utils

* fix

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Full finetuning and other fixes

* UNSLOTH_ENABLE_FULL_FINETUNING

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* full finetuning

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* max_seq_length

* Update rl.py

* Update rl.py

* Update rl.py

* Update pyproject.toml

* AutoModelForImageTextToText

* Update mapper.py

* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Batch samples

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Temporary patches

* Update loader.py

* model names

* Gemma 3 chat template

* Bug fixes

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update rl.py

* Update chat_templates.py

* Update chat_templates.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Revert

* Update _utils.py

* forced precision

* Autocast

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* vLLM fixes

* constexpr

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update save.py

* New models

* Triton windows update (#1976)

* Update pyproject.toml

* Update README.md

* Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974)

* Update RMS LayerNorm implementation with optimizations and testing suite

* perf: optimize list comprehension in get_ollama_eos_tokens

* Update Zoo

* Update llama.py

* Update llama.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* grpo fix

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update rl.py

* Update _utils.py

* Version

* Update pyproject.toml

* Update llama.py

* Update llama.py

* bug fix #2008 (#2039)

* fix (#2051)

* Update loader.py

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* more prints

* Update loader.py

* LoRA 16bit fix

* Update vision.py

* Update vision.py

* Update _utils.py

* Update vision.py

* move forced float32

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* move print

* Update _utils.py

* disable bfloat16

* Fix forced float32

* move float32

* Ensure trust_remote_code propagates down to unsloth_compile_transformers (#2075)

* Update _utils.py

* Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080)

When loading a PEFT model fails, only the `autoconfig_error` is shown. The `peft_error`, which is what really matters when loading a PEFT adapter, is hidden, and the user instead sees something like this:

```
RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ...
```

This PR just changes it so `autoconfig_error` and `peft_error` are both displayed.

* fix error message (#2046)

* Update vision.py

* Update _utils.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Remove double generate patch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091)

* fix: config.torch_dtype in LlamaModel_fast_forward_inference

* Update llama.py

* update for consistency

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* model_type_arch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* check

* Update _utils.py

* Update loader.py

* Update loader.py

* Remove prints

---------

Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com>
Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com>
Co-authored-by: Isaac Breen <isaac.breen@icloud.com>
Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com>
2025-03-19 08:45:52 -07:00
Daniel Han
194508d561 Remove prints 2025-03-19 08:44:22 -07:00
Daniel Han
063cca03c8 Update loader.py 2025-03-19 08:41:40 -07:00
Daniel Han
305c362ba8 Update loader.py 2025-03-19 08:37:53 -07:00
Daniel Han
3424ad1599 Update _utils.py 2025-03-19 08:31:00 -07:00
Daniel Han
1ca384f8ab check 2025-03-19 08:27:36 -07:00
Daniel Han
2e7da38488 Update loader.py 2025-03-19 08:24:31 -07:00
Daniel Han
40d1f36e5c Merge branch 'main' into nightly 2025-03-19 08:24:29 -07:00
Daniel Han
1c5676a83f Bug fixes (#2113)
* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Update _utils.py

* Version

* versioning

* Update _utils.py

* Update llama.py

* Update llama.py

* Bug fixes

* FastModel

* __doc__

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* version

* move use_modelscope to _utils (#1938)

* move use_modelscope to _utils

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Don't use revision when loading model_config and is_peft=True (#1949)

* More syntax warnings (#1944)

* move use_modelscope to _utils

* fix

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Full finetuning and other fixes

* UNSLOTH_ENABLE_FULL_FINETUNING

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* full finetuning

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* max_seq_length

* Update rl.py

* Update rl.py

* Update rl.py

* Update pyproject.toml

* AutoModelForImageTextToText

* Update mapper.py

* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Batch samples

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Temporary patches

* Update loader.py

* model names

* Gemma 3 chat template

* Bug fixes

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update rl.py

* Update chat_templates.py

* Update chat_templates.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Revert

* Update _utils.py

* forced precision

* Autocast

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* vLLM fixes

* constexpr

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update save.py

* New models

* Triton windows update (#1976)

* Update pyproject.toml

* Update README.md

* Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974)

* Update RMS LayerNorm implementation with optimizations and testing suite

* perf: optimize list comprehension in get_ollama_eos_tokens

* Update Zoo

* Update llama.py

* Update llama.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* grpo fix

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update rl.py

* Update _utils.py

* Version

* Update pyproject.toml

* Update llama.py

* Update llama.py

* bug fix #2008 (#2039)

* fix (#2051)

* Update loader.py

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* more prints

* Update loader.py

* LoRA 16bit fix

* Update vision.py

* Update vision.py

* Update _utils.py

* Update vision.py

* move forced float32

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* move print

* Update _utils.py

* disable bfloat16

* Fix forced float32

* move float32

* Ensure trust_remote_code propagates down to unsloth_compile_transformers (#2075)

* Update _utils.py

* Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080)

When loading a PEFT model fails, only the `autoconfig_error` is shown. The `peft_error`, which is what really matters when loading a PEFT adapter, is hidden, and the user instead sees something like this:

```
RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ...
```

This PR just changes it so `autoconfig_error` and `peft_error` are both displayed.

* fix error message (#2046)

* Update vision.py

* Update _utils.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Remove double generate patch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091)

* fix: config.torch_dtype in LlamaModel_fast_forward_inference

* Update llama.py

* update for consistency

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* versioning

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* model_type_arch

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

---------

Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com>
Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com>
Co-authored-by: Isaac Breen <isaac.breen@icloud.com>
Co-authored-by: lurf21 <93976703+lurf21@users.noreply.github.com>
2025-03-19 07:12:08 -07:00
Daniel Han
4c89bec1bd Merge branch 'main' into nightly 2025-03-19 05:37:36 -07:00
Daniel Han
c43a785e7c Update vision.py 2025-03-19 04:34:39 -07:00
Daniel Han
db49a4de37 Update vision.py 2025-03-19 04:30:06 -07:00
Michael Han
2d0885f33f Merge pull request #2110 from unslothai/shimmyshimmer-patch-1
Updating new FFT 8bit support
2025-03-19 04:24:32 -07:00
Michael Han
d8fc81f47b Update README.md 2025-03-19 04:23:52 -07:00
Michael Han
2f0de2be1f Update README.md 2025-03-19 04:21:39 -07:00
Daniel Han
3c1b5a09b6 Update vision.py 2025-03-19 03:45:28 -07:00
Daniel Han
bd9c7d353e Update vision.py 2025-03-19 03:27:49 -07:00
Daniel Han
d8bf75417e Update vision.py 2025-03-19 03:20:55 -07:00
Daniel Han
84ae94b124 Update vision.py 2025-03-19 03:08:17 -07:00
Daniel Han
d2ab2860c8 model_type_arch 2025-03-19 03:03:43 -07:00
Daniel Han
f5569ca2b4 Update vision.py 2025-03-19 02:59:40 -07:00
Daniel Han
f2dffa4537 Update vision.py 2025-03-19 02:52:25 -07:00
Daniel Han
a706ec3d17 Update vision.py 2025-03-19 02:50:24 -07:00
Daniel Han
ab1441ccfd Update vision.py 2025-03-19 02:50:08 -07:00
Daniel Han
a6ae11a426 Update vision.py 2025-03-19 02:47:29 -07:00
Daniel Han
52c62b9420 Update vision.py 2025-03-19 02:41:43 -07:00
Daniel Han
69ab65dfc7 Update vision.py 2025-03-19 02:39:50 -07:00
Daniel Han
67b48d9db4 Update vision.py 2025-03-19 02:29:32 -07:00
Daniel Han
a4d9b192b5 Update vision.py 2025-03-19 02:28:09 -07:00
Daniel Han
17a6bc1bd4 Update vision.py 2025-03-19 02:17:17 -07:00
Daniel Han
c417b1f67e Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly 2025-03-19 02:17:05 -07:00
Daniel Han
333aff6e84 versioning 2025-03-19 02:14:54 -07:00
lurf21
00a98f17f5 fix: config.torch_dtype in LlamaModel_fast_forward_inference (#2091)
* fix: config.torch_dtype in LlamaModel_fast_forward_inference

* Update llama.py

* update for consistency

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-03-19 02:06:48 -07:00
Daniel Han
10d27a5179 Update vision.py 2025-03-19 02:04:12 -07:00
Daniel Han
09b1d254f8 Update mapper.py 2025-03-19 01:59:17 -07:00
Daniel Han
1f9e5769f3 Update vision.py 2025-03-19 01:33:43 -07:00
Daniel Han
d14b36e157 Update vision.py 2025-03-19 01:31:26 -07:00
Daniel Han
cdd005bf5a Update vision.py 2025-03-18 23:53:39 -07:00
Daniel Han
8886f3aa31 Update vision.py 2025-03-18 23:42:26 -07:00
Daniel Han
71c405eda5 Update vision.py 2025-03-18 23:37:34 -07:00
Daniel Han
2795865d6f Remove double generate patch 2025-03-18 23:06:20 -07:00
Daniel Han
897aef9899 Update vision.py 2025-03-18 22:46:30 -07:00
Daniel Han
3109603785 Update vision.py 2025-03-18 22:33:18 -07:00
Daniel Han
3622d7e76d Update vision.py 2025-03-18 22:29:55 -07:00
Daniel Han
f5c94ba3a7 Update vision.py 2025-03-18 22:26:58 -07:00
Daniel Han
8831d1d440 Update vision.py 2025-03-18 21:08:07 -07:00
Daniel Han
9660605540 Update vision.py 2025-03-18 05:29:09 -07:00
Daniel Han
1556e64864 Merge branch 'main' into nightly 2025-03-18 05:25:09 -07:00
Daniel Han
49eece4d94 Many bug fixes (#2087)
* _wrap_fast_inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* SFT dataset prepare

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update utils.py

* bug fix

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Update _utils.py

* Version

* versioning

* Update _utils.py

* Update llama.py

* Update llama.py

* Bug fixes

* FastModel

* __doc__

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* version

* move use_modelscope to _utils (#1938)

* move use_modelscope to _utils

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Don't use revision when loading model_config and is_peft=True (#1949)

* More syntax warnings (#1944)

* move use_modelscope to _utils

* fix

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Full finetuning and other fixes

* UNSLOTH_ENABLE_FULL_FINETUNING

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* full finetuning

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* max_seq_length

* Update rl.py

* Update rl.py

* Update rl.py

* Update pyproject.toml

* AutoModelForImageTextToText

* Update mapper.py

* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Batch samples

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Temporary patches

* Update loader.py

* model names

* Gemma 3 chat template

* Bug fixes

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update rl.py

* Update chat_templates.py

* Update chat_templates.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Revert

* Update _utils.py

* forced precision

* Autocast

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* vLLM fixes

* constexpr

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update save.py

* New models

* Triton windows update (#1976)

* Update pyproject.toml

* Update README.md

* Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974)

* Update RMS LayerNorm implementation with optimizations and testing suite

* perf: optimize list comprehension in get_ollama_eos_tokens

* Update Zoo

* Update llama.py

* Update llama.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* grpo fix

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update rl.py

* Update _utils.py

* Version

* Update pyproject.toml

* Update llama.py

* Update llama.py

* bug fix #2008 (#2039)

* fix (#2051)

* Update loader.py

* Update pyproject.toml

* Update pyproject.toml

* Update vision.py

* more prints

* Update loader.py

* LoRA 16bit fix

* Update vision.py

* Update vision.py

* Update _utils.py

* Update vision.py

* move forced float32

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* move print

* Update _utils.py

* disable bfloat16

* Fix forced float32

* move float32

* Ensure trust_remote_code propagates down to unsloth_compile_transformers (#2075)

* Update _utils.py

* Show both `peft_error` and `autoconfig_error`, not just `autoconfig_error` (#2080)

When loading a PEFT model fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this:

```
RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ...
```

This PR just changes it so `autoconfig_error` and `peft_error` are both displayed.

* fix error message (#2046)

* Update vision.py

* Update _utils.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

---------

Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com>
Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Mukkesh Ganesh <mukmckenzie@gmail.com>
Co-authored-by: Xander Hawthorne <167850078+CuppaXanax@users.noreply.github.com>
Co-authored-by: Isaac Breen <isaac.breen@icloud.com>
2025-03-18 05:15:57 -07:00
Daniel Han
53d1fac54f Update rl_replacements.py 2025-03-18 05:09:35 -07:00
Daniel Han
51503620cc Update vision.py 2025-03-18 05:01:27 -07:00
Daniel Han
3eb4d76b28 Update rl_replacements.py 2025-03-18 04:59:01 -07:00
Daniel Han
a8f24ea882 Update vision.py 2025-03-18 04:51:06 -07:00
Daniel Han
cc0b7135e3 Update vision.py 2025-03-18 04:50:18 -07:00
Daniel Han
fad1ad9122 Update vision.py 2025-03-18 04:49:35 -07:00
Daniel Han
aca62cfa01 Update vision.py 2025-03-18 03:45:55 -07:00
Daniel Han
141084e7a1 Update vision.py 2025-03-18 02:27:20 -07:00
Daniel Han
16237517de Update rl_replacements.py 2025-03-18 02:26:50 -07:00
Daniel Han
0c91c02196 Update rl_replacements.py 2025-03-18 02:23:03 -07:00
Daniel Han
a3f76fb4ad Update rl_replacements.py 2025-03-18 02:05:49 -07:00
Daniel Han
9a60fdce08 Update rl_replacements.py 2025-03-18 02:05:30 -07:00
Daniel Han
f906b6de13 Update vision.py 2025-03-18 01:59:39 -07:00
Daniel Han
043c9f12b8 Update vision.py 2025-03-18 01:53:52 -07:00
Daniel Han
28f80c8957 Update vision.py 2025-03-18 01:44:50 -07:00
Daniel Han
1f12ce24c8 Update vision.py 2025-03-18 01:43:34 -07:00
Daniel Han
1ce93de34b Update vision.py 2025-03-18 01:42:28 -07:00
Daniel Han
f095c5d51d Update vision.py 2025-03-18 01:33:38 -07:00
Daniel Han
f9dee6fd1f Update vision.py 2025-03-18 01:33:29 -07:00
Daniel Han
e1470efb3d Update vision.py 2025-03-18 01:28:49 -07:00
Daniel Han
6b7f14f5b1 Update vision.py 2025-03-18 01:27:39 -07:00
Daniel Han
4f78da7e93 Update __init__.py 2025-03-18 01:10:14 -07:00
Daniel Han
91bfdca738 Update __init__.py 2025-03-18 01:10:04 -07:00
Daniel Han
3cf27c1039 Update pyproject.toml 2025-03-18 01:09:51 -07:00
Daniel Han
098692bd96 Update _utils.py 2025-03-18 01:05:58 -07:00
Daniel Han
28b96e7e77 Update vision.py 2025-03-18 01:01:46 -07:00
Kareem
5d87090e4d fix error message (#2046) 2025-03-17 21:46:20 -07:00
Isaac Breen
7eba1b9708 Show both peft_error and autoconfig_error, not just autoconfig_error (#2080)
When loading a PEFT model fails, only the `autoconfig_error` is shown. Instead of the `peft_error`, which is what really matters when we're trying to load a PEFT adapter, the user will see something like this:

```
RuntimeError: Unrecognized model in my_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, ...
```

This PR just changes it so `autoconfig_error` and `peft_error` are both displayed.
2025-03-17 21:45:29 -07:00
Daniel Han
a22a6ba6bc Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly 2025-03-17 21:44:00 -07:00
Daniel Han
1bd7c733a6 Update _utils.py 2025-03-17 21:43:51 -07:00
Xander Hawthorne
04bcab022c Ensure trust_remote_code propagates down to unsloth_compile_transformers (#2075) 2025-03-17 21:43:47 -07:00
Daniel Han
ebf2a4abb6 move float32 2025-03-17 19:42:49 -07:00
Daniel Han
56a8dc6067 Fix forced float32 2025-03-17 19:24:21 -07:00
Daniel Han
a731e2fb6c disable bfloat16 2025-03-17 18:28:46 -07:00
Daniel Han
7dd9817bd0 Update _utils.py 2025-03-17 16:57:14 -07:00
Daniel Han
845a253ed4 move print 2025-03-17 04:51:28 -07:00
Daniel Han
f8d198a947 Update _utils.py 2025-03-17 04:49:26 -07:00
Daniel Han
96f4073208 Update _utils.py 2025-03-17 04:47:58 -07:00
Daniel Han
5a2c87fa5d Update _utils.py 2025-03-17 04:45:49 -07:00
Daniel Han
d247f1e20d Update _utils.py 2025-03-17 04:42:50 -07:00
Daniel Han
3679194e0e move forced float32 2025-03-17 04:41:44 -07:00
Daniel Han
947f201856 Update vision.py 2025-03-17 04:17:59 -07:00
Daniel Han
47a2492f87 Update _utils.py 2025-03-16 23:37:46 -07:00
Daniel Han
d50f42a8b4 Update vision.py 2025-03-16 23:27:59 -07:00
Daniel Han
74fc293c99 Update vision.py 2025-03-16 23:20:43 -07:00
Daniel Han
268930e2cd LoRA 16bit fix 2025-03-16 23:18:57 -07:00
Daniel Han
3c1dfe2f58 Update loader.py 2025-03-16 22:15:54 -07:00
Daniel Han
ce2884f563 more prints 2025-03-16 22:13:57 -07:00
Daniel Han
f01757601f Update vision.py 2025-03-16 22:10:36 -07:00
Daniel Han
1eafb4a3b3 Update pyproject.toml 2025-03-16 20:31:54 -07:00
Daniel Han
35caede8ea Update pyproject.toml 2025-03-16 20:30:46 -07:00
Daniel Han
4824e17063 Update loader.py 2025-03-16 20:18:53 -07:00
Kareem
136837e5cc fix (#2051) 2025-03-16 15:19:58 -07:00
Mukkesh Ganesh
745e0da8ae bug fix #2008 (#2039) 2025-03-16 15:19:14 -07:00
Daniel Han
9aa93db23c Update llama.py 2025-03-15 23:34:52 -07:00
Daniel Han
344e6616a8 Update llama.py 2025-03-15 22:39:09 -07:00
Daniel Han
7b020eff46 Update pyproject.toml 2025-03-15 19:40:02 -07:00
Daniel Han
10ab6e32b8 Version 2025-03-15 19:22:44 -07:00
Daniel Han
79aab4b74a Merge branch 'main' into nightly 2025-03-15 19:15:26 -07:00
Daniel Han
9233e42c9e Update _utils.py 2025-03-15 17:58:14 -07:00
Michael Han
d82a707a4a Update README.md 2025-03-15 17:47:25 -07:00
Daniel Han
ad08cb9730 Update rl.py 2025-03-15 17:13:18 -07:00
Daniel Han
50d41d6d9f Merge branch 'main' into nightly 2025-03-15 16:36:18 -07:00
Daniel Han
e1c24a01f8 Update README.md (#2028) 2025-03-14 22:06:53 -07:00
Daniel Han
05fdaff970 Gemma 3 readme (#2019)
* Update README.md

* Update README.md

* Update README.md

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2025-03-14 11:12:02 -07:00
Daniel Han
71baea8e55 Update _utils.py 2025-03-14 10:04:10 -07:00
Daniel Han
0f1d78d8e4 Update save.py 2025-03-14 09:54:48 -07:00
Daniel Han
8cfe8a57e6 Precision issues 2025-03-14 08:33:33 -07:00
Daniel Han
b4cd82d59f Update vision.py 2025-03-14 08:19:02 -07:00
Daniel Han
360cc66779 Update _utils.py 2025-03-14 08:17:49 -07:00
Daniel Han
afd297d281 Update vision.py 2025-03-14 08:17:36 -07:00
Daniel Han
0f587bfe2c Update save.py 2025-03-14 08:08:43 -07:00
Daniel Han
b8aaf550a7 GGUF saving (#2017)
* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* fix an import error (#1767)

* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* SamplingParams

* Convert mask to float (#1762)

* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)

* Add latest xformers

* Add a couple of lines to docs

* vLLMSamplingParams

* Update __init__.py

* default num_chunks == -1

* Versioning

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update pyproject.toml

* Update pyproject.toml

* Export Model to ollama.com (#1648)

* Ollama Export Model to ollama.com

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Check for model_name

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* subprocess use instead of requests | added check for ollama server

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model | fix

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Push to Ollama

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Update cross_entropy_loss.py

* torch_cuda_device

* Update utils.py

* Update utils.py

* Update utils.py

* device

* device

* Update loader.py

* Update llama.py

* Update README.md

* Update llama.py

* Update llama.py

* Update _utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* __version__

* Update rl.py

* Bug fixes

* Bug fixes

* Update llama.py

* Update _utils.py

* _wrap_fast_inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* SFT dataset prepare

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update utils.py

* bug fix

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Update _utils.py

* Version

* versioning

* Update _utils.py

* Update llama.py

* Update llama.py

* Bug fixes

* FastModel

* __doc__

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* version

* move use_modelscope to _utils (#1938)

* move use_modelscope to _utils

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Don't use revision when loading model_config and is_peft=True (#1949)

* More syntax warnings (#1944)

* move use_modelscope to _utils

* fix

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Full finetuning and other fixes

* UNSLOTH_ENABLE_FULL_FINETUNING

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* full finetuning

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* max_seq_length

* Update rl.py

* Update rl.py

* Update rl.py

* Update pyproject.toml

* AutoModelForImageTextToText

* Update mapper.py

* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Batch samples

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Temporary patches

* Update loader.py

* model names

* Gemma 3 chat template

* Bug fixes

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update rl.py

* Update chat_templates.py

* Update chat_templates.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Revert

* Update _utils.py

* forced precision

* Autocast

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* vLLM fixes

* constexpr

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update save.py

* New models

* Triton windows update (#1976)

* Update pyproject.toml

* Update README.md

* Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974)

* Update RMS LayerNorm implementation with optimizations and testing suite

* perf: optimize list comprehension in get_ollama_eos_tokens

* Update Zoo

* Update llama.py

* Update llama.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* grpo fix

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>
Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com>
Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com>
2025-03-14 07:58:57 -07:00
Daniel Han
fecb6492de Update save.py 2025-03-14 07:56:56 -07:00
Daniel Han
2b07988c11 Update save.py 2025-03-14 07:36:35 -07:00
Daniel Han
dd0e790d3f Update save.py 2025-03-14 07:26:23 -07:00
Daniel Han
cb4579c199 Update vision.py 2025-03-14 07:09:23 -07:00
Daniel Han
10c166b452 Merge branch 'main' into nightly 2025-03-14 07:09:16 -07:00
Daniel Han
3410744e88 Gemma 3, bug fixes (#2014)
* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* fix an import error (#1767)

* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* SamplingParams

* Convert mask to float (#1762)

* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)

* Add latest xformers

* Add a couple of lines to docs

* vLLMSamplingParams

* Update __init__.py

* default num_chunks == -1

* Versioning

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update pyproject.toml

* Update pyproject.toml

* Export Model to ollama.com (#1648)

* Ollama Export Model to ollama.com

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Check for model_name

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* subprocess use instead of requests | added check for ollama server

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model | fix

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Push to Ollama

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Update cross_entropy_loss.py

* torch_cuda_device

* Update utils.py

* Update utils.py

* Update utils.py

* device

* device

* Update loader.py

* Update llama.py

* Update README.md

* Update llama.py

* Update llama.py

* Update _utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* __version__

* Update rl.py

* Bug fixes

* Bug fixes

* Update llama.py

* Update _utils.py

* _wrap_fast_inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* SFT dataset prepare

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update utils.py

* bug fix

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Update _utils.py

* Version

* versioning

* Update _utils.py

* Update llama.py

* Update llama.py

* Bug fixes

* FastModel

* __doc__

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* version

* move use_modelscope to _utils (#1938)

* move use_modelscope to _utils

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Don't use revision when loading model_config and is_peft=True (#1949)

* More syntax warnings (#1944)

* move use_modelscope to _utils

* fix

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Full finetuning and other fixes

* UNSLOTH_ENABLE_FULL_FINETUNING

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* full finetuning

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* max_seq_length

* Update rl.py

* Update rl.py

* Update rl.py

* Update pyproject.toml

* AutoModelForImageTextToText

* Update mapper.py

* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Batch samples

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Temporary patches

* Update loader.py

* model names

* Gemma 3 chat template

* Bug fixes

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update rl.py

* Update chat_templates.py

* Update chat_templates.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Revert

* Update _utils.py

* forced precision

* Autocast

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* vLLM fixes

* constexpr

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update save.py

* New models

* Triton windows update (#1976)

* Update pyproject.toml

* Update README.md

* Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974)

* Update RMS LayerNorm implementation with optimizations and testing suite

* perf: optimize list comprehension in get_ollama_eos_tokens

* Update Zoo

* Update llama.py

* Update llama.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* grpo fix

* Update rl_replacements.py

* Update vision.py

* Update rl_replacements.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Update vision.py

* Update loader.py

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>
Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com>
Co-authored-by: Akshay Behl <126911424+Captain-T2004@users.noreply.github.com>
2025-03-14 06:42:44 -07:00
Daniel Han
c53be2c3b8 Update loader.py 2025-03-14 06:26:01 -07:00
Daniel Han
f125d6f9ef Update vision.py 2025-03-14 06:17:35 -07:00
Daniel Han
12dadb8c35 Update vision.py 2025-03-14 06:17:20 -07:00
Daniel Han
695ebd956c Update mapper.py 2025-03-14 06:15:44 -07:00
Daniel Han
d42f11f41e Update vision.py 2025-03-14 06:11:51 -07:00
Daniel Han
64820ea4bd Update rl_replacements.py 2025-03-14 06:11:19 -07:00
Daniel Han
c0c42e9716 Update vision.py 2025-03-14 06:06:10 -07:00
Daniel Han
026fb59e2f Update rl_replacements.py 2025-03-14 06:01:42 -07:00
Daniel Han
86bc0a4761 grpo fix 2025-03-14 05:57:17 -07:00
Daniel Han
fdb2f8e177 Update vision.py 2025-03-14 05:54:57 -07:00
Daniel Han
7322f95971 Update rl_replacements.py 2025-03-14 05:51:08 -07:00
Daniel Han
b8500b055d Update vision.py 2025-03-14 05:41:17 -07:00
Daniel Han
0a64bb88f6 Update vision.py 2025-03-14 05:40:56 -07:00
Daniel Han
7c4889a84c Update vision.py 2025-03-14 04:59:29 -07:00
Daniel Han
4d49983b8b Update vision.py 2025-03-14 04:58:32 -07:00
Daniel Han
70d09475fa Update vision.py 2025-03-14 04:51:43 -07:00
Daniel Han
5c2ce48bb1 Update vision.py 2025-03-14 04:43:11 -07:00
Daniel Han
490b1b087a Update vision.py 2025-03-14 04:40:29 -07:00
Daniel Han
99c2eec8e0 Update vision.py 2025-03-14 04:39:00 -07:00
Daniel Han
43db95099b Update vision.py 2025-03-14 04:37:31 -07:00
Daniel Han
87071e4f4c Update vision.py 2025-03-14 04:30:26 -07:00
Daniel Han
a9bd11e336 Update vision.py 2025-03-14 04:30:02 -07:00
Daniel Han
20512f8a9b Update vision.py 2025-03-14 04:26:10 -07:00
Daniel Han
5d85b29a2a Update llama.py 2025-03-14 04:08:17 -07:00
Daniel Han
ca3a09e7dd Update llama.py 2025-03-14 04:02:41 -07:00
Daniel Han
3ff81862ef Merge branch 'main' into nightly 2025-03-14 02:59:25 -07:00
Daniel Han
cbfb67441c Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly 2025-03-14 02:54:47 -07:00
Daniel Han
67e498ae82 Update Zoo 2025-03-14 02:54:40 -07:00
Nino Risteski
f4d97caf5e Update RMS LayerNorm implementation, and list compr. change in chat templates (#1974)
* Update RMS LayerNorm implementation with optimizations and testing suite

* perf: optimize list comprehension in get_ollama_eos_tokens
2025-03-14 02:53:21 -07:00
Akshay Behl
8baebe46a0 Triton windows update (#1976)
* Update pyproject.toml

* Update README.md
2025-03-14 02:51:30 -07:00
Daniel Han
d66f5ff082 New models 2025-03-14 02:41:52 -07:00
Daniel Han
5d6fa45c3f Update save.py 2025-03-14 02:32:53 -07:00
Daniel Han
0a09349096 Update _utils.py 2025-03-14 02:21:15 -07:00
Daniel Han
c92ef07ab7 Update _utils.py 2025-03-14 02:07:13 -07:00
Daniel Han
c508113286 Update _utils.py 2025-03-14 02:00:03 -07:00
Daniel Han
758bca7414 Update _utils.py 2025-03-14 01:44:04 -07:00
Daniel Han
abfe34f7c0 Update llama.py 2025-03-14 01:39:45 -07:00
Daniel Han
f25063f060 Update llama.py 2025-03-14 01:36:12 -07:00
Daniel Han
aaca723291 Update llama.py 2025-03-14 01:26:45 -07:00
Daniel Han
82835c2904 Update llama.py 2025-03-14 01:25:35 -07:00
Daniel Han
88c042500b Update llama.py 2025-03-14 01:20:18 -07:00
Daniel Han
456e05dc57 Update llama.py 2025-03-14 01:18:54 -07:00
Daniel Han
d628218ac8 Update llama.py 2025-03-14 01:18:04 -07:00
Daniel Han
c2b9855084 Update llama.py 2025-03-14 00:16:24 -07:00
Daniel Han
813fb7edcf Update rl.py 2025-03-13 23:08:46 -07:00
Daniel Han
1d7dba52d2 Update vision.py 2025-03-13 18:34:26 -07:00
Daniel Han
6df2ef3667 Update vision.py 2025-03-13 18:31:03 -07:00
Daniel Han
b28b7fa364 Update vision.py 2025-03-13 18:26:30 -07:00
Daniel Han
5843d784f5 constexpr 2025-03-13 18:23:29 -07:00
Daniel Han
78584619f5 vLLM fixes 2025-03-13 17:57:46 -07:00
Daniel Han
325741b7b7 Update rl.py 2025-03-13 16:02:56 -07:00
Daniel Han
d0e0dad7d0 Gemma 3 bug fixes (#2005)
* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649)

* edit save.py to fix gguf saving breaks.

* add check for .exe or not exe file extension for linux and windows

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* unsloth_num_chunks

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py (#1754)

Fix typo in comment: know -> now.

This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well.

* Optional logits

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* fix an import error (#1767)

* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* SamplingParams

* Convert mask to float (#1762)

* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)

* Add latest xformers

* Add a couple of lines to docs

* vLLMSamplingParams

* Update __init__.py

* default num_chunks == -1

* Versioning

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update pyproject.toml

* Update pyproject.toml

* Export Model to ollama.com (#1648)

* Ollama Export Model to ollama.com

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Check for model_name

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* subprocess use instead of requests | added check for ollama server

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model | fix

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Push to Ollama

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Update cross_entropy_loss.py

* torch_cuda_device

* Update utils.py

* Update utils.py

* Update utils.py

* device

* device

* Update loader.py

* Update llama.py

* Update README.md

* Update llama.py

* Update llama.py

* Update _utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* __version__

* Update rl.py

* Bug fixes

* Bug fixes

* Update llama.py

* Update _utils.py

* _wrap_fast_inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* SFT dataset prepare

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update utils.py

* bug fix

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Update _utils.py

* Version

* versioning

* Update _utils.py

* Update llama.py

* Update llama.py

* Bug fixes

* FastModel

* __doc__

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* version

* move use_modelscope to _utils (#1938)

* move use_modelscope to _utils

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Don't use revision when loading model_config and is_peft=True (#1949)

* More syntax warnings (#1944)

* move use_modelscope to _utils

* fix

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Full finetuning and other fixes

* UNSLOTH_ENABLE_FULL_FINETUNING

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* full finetuning

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* max_seq_length

* Update rl.py

* Update rl.py

* Update rl.py

* Update pyproject.toml

* AutoModelForImageTextToText

* Update mapper.py

* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Batch samples

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

* Update vision.py

* Temporary patches

* Update loader.py

* model names

* Gemma 3 chat template

* Bug fixes

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update rl.py

* Update chat_templates.py

* Update chat_templates.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Revert

* Update _utils.py

* forced precision

* Autocast

* Update vision.py

* Update vision.py

* Update rl.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>
Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com>
Co-authored-by: Seth Weidman <seth@sethweidman.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>
Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com>
2025-03-13 06:41:42 -07:00
Daniel Han
df9573c754 Update vision.py 2025-03-13 06:39:17 -07:00
Daniel Han
188ccdde3e Update vision.py 2025-03-13 06:34:55 -07:00
Daniel Han
bf54c7e8a7 Update vision.py 2025-03-13 06:25:18 -07:00
Daniel Han
ed00fb213c Update vision.py 2025-03-13 06:24:37 -07:00
Daniel Han
35664afebd Update vision.py 2025-03-13 06:15:42 -07:00
Daniel Han
43e8a6d714 Update rl.py 2025-03-13 06:12:36 -07:00
Daniel Han
0aab37286c Update vision.py 2025-03-13 06:08:33 -07:00
Daniel Han
f07c4ebc68 Update vision.py 2025-03-13 06:06:24 -07:00
Daniel Han
b2a3c36963 Autocast 2025-03-13 06:02:33 -07:00
Daniel Han
eef49713e4 forced precision 2025-03-13 05:46:02 -07:00
Daniel Han
87c4fcf1bb Update _utils.py 2025-03-13 05:38:21 -07:00
Daniel Han
da0048d2b5 Revert 2025-03-13 05:13:17 -07:00
Daniel Han
b9b68c9ee3 Update vision.py 2025-03-13 01:30:06 -07:00
Daniel Han
1dc76614d8 Update vision.py 2025-03-13 01:27:50 -07:00
Daniel Han
b1eff862e0 Update loader.py 2025-03-13 01:17:48 -07:00
Daniel Han
e6a65ca866 Update vision.py 2025-03-13 01:15:25 -07:00
Daniel Han
66a734b22a Update vision.py 2025-03-12 22:23:29 -07:00
Daniel Han
0dd88f91ec Update vision.py 2025-03-12 22:22:23 -07:00
Daniel Han
d0517a527d Update chat_templates.py 2025-03-12 21:52:38 -07:00
Daniel Han
437eb8184f Update chat_templates.py 2025-03-12 21:47:53 -07:00
Daniel Han
7fa4bf813f Update rl.py 2025-03-12 21:47:01 -07:00
Daniel Han
2441039ce1 Update llama.py 2025-03-12 21:42:27 -07:00
Daniel Han
1d07d4ded2 Update llama.py 2025-03-12 21:40:20 -07:00
Daniel Han
fb5230ae6d Update vision.py 2025-03-12 21:38:51 -07:00
Daniel Han
79b59cc8a6 Update vision.py 2025-03-12 21:35:29 -07:00
Daniel Han
702f85bd54 Update vision.py 2025-03-12 21:34:34 -07:00
Daniel Han
eaa5947342 Update vision.py 2025-03-12 21:33:22 -07:00
Daniel Han
f904e66e7b Update vision.py 2025-03-12 21:31:37 -07:00
Daniel Han
be660d3bb1 Bug fixes 2025-03-12 21:30:02 -07:00
Daniel Han
7fe2874157 Gemma 3 chat template 2025-03-12 20:37:19 -07:00
Daniel Han
5349526e35 model names 2025-03-12 20:04:27 -07:00
Daniel Han
fa6628b3e9 Update loader.py 2025-03-12 19:25:41 -07:00
Daniel Han
914bd92a8c Temporary patches 2025-03-12 19:20:17 -07:00
Daniel Han
8e9f52f16d Update vision.py 2025-03-12 06:51:23 -07:00
Daniel Han
f8a490e16e Merge branch 'main' into nightly 2025-03-12 05:00:45 -07:00
Daniel Han
33e020c064 Update _utils.py 2025-03-12 04:52:07 -07:00
Daniel Han
2c54bfd7ff Update _utils.py 2025-03-12 04:46:34 -07:00
Daniel Han
356a74d4dd Update mapper.py 2025-03-12 04:07:45 -07:00
Daniel Han
f35d5977d6 Gemma 3 (#1986)
* Update llama.py

* GRPO optimized

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Selective Log softmax

* Fix GRPO bsz

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix TRL

* Metrics GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* No compile

* Update rl.py

* Remove docs

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649)

* edit save.py to fix gguf saving breaks.

* add check for .exe or not exe file extension for linux and windows

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* unsloth_num_chunks

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py (#1754)

Fix typo in comment: know -> now.

This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well.

* Optional logits

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* fix an import error (#1767)

* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* SamplingParams

* Convert mask to float (#1762)

* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)

* Add latest xformers

* Add a couple of lines to docs

* vLLMSamplingParams

* Update __init__.py

* default num_chunks == -1

* Versioning

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update pyproject.toml

* Update pyproject.toml

* Export Model to ollama.com (#1648)

* Ollama Export Model to ollama.com

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Check for model_name

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* subprocess use instead of requests | added check for ollama server

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model | fix

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Push to Ollama

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Update cross_entropy_loss.py

* torch_cuda_device

* Update utils.py

* Update utils.py

* Update utils.py

* device

* device

* Update loader.py

* Update llama.py

* Update README.md

* Update llama.py

* Update llama.py

* Update _utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* __version__

* Update rl.py

* Bug fixes

* Bug fixes

* Update llama.py

* Update _utils.py

* _wrap_fast_inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* SFT dataset prepare

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update utils.py

* bug fix

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Update _utils.py

* Version

* versioning

* Update _utils.py

* Update llama.py

* Update llama.py

* Bug fixes

* FastModel

* __doc__

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* version

* move use_modelscope to _utils (#1938)

* move use_modelscope to _utils

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Don't use revision when loading model_config and is_peft=True (#1949)

* More syntax warnings (#1944)

* move use_modelscope to _utils

* fix

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update loader.py

* Full finetuning and other fixes

* UNSLOTH_ENABLE_FULL_FINETUNING

* Update loader.py

* Update loader.py

* Update loader.py

* Update vision.py

* Update vision.py

* full finetuning

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* max_seq_length

* Update rl.py

* Update rl.py

* Update rl.py

* Update pyproject.toml

* AutoModelForImageTextToText

* Update mapper.py

* Update pyproject.toml

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Batch samples

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update mapper.py

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>
Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com>
Co-authored-by: Seth Weidman <seth@sethweidman.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>
Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Wilson Wu <140025193+wiwu2390@users.noreply.github.com>
2025-03-12 01:23:34 -07:00
Daniel Han
89a2517a8d Update mapper.py 2025-03-11 23:35:46 -07:00
Daniel Han
c319cacac4 Update vision.py 2025-03-11 22:55:34 -07:00
Daniel Han
a3c2856977 Update vision.py 2025-03-11 22:36:51 -07:00
Daniel Han
8148f3e7f0 Update vision.py 2025-03-11 22:34:57 -07:00
Daniel Han
7208177c66 Update loader.py 2025-03-11 22:31:14 -07:00
Daniel Han
d4e0f42951 Update vision.py 2025-03-11 22:29:06 -07:00
Daniel Han
7f03388c66 Update loader.py 2025-03-11 22:27:53 -07:00
Daniel Han
b061ab453e Update _utils.py 2025-03-11 22:22:36 -07:00
Daniel Han
032d109d98 Update loader.py 2025-03-11 22:02:39 -07:00
Daniel Han
7e76b42b46 Update loader.py 2025-03-11 20:45:53 -07:00
Daniel Han
50803b0a2f Update loader.py 2025-03-11 20:44:01 -07:00
Daniel Han
2249e266b7 Update loader.py 2025-03-11 20:43:25 -07:00
Daniel Han
904fe3485e Batch samples 2025-03-11 19:46:41 -07:00
Daniel Han
1b71f08854 Update _utils.py 2025-03-11 15:25:21 -07:00
Daniel Han
b7131a0697 Update _utils.py 2025-03-11 15:22:16 -07:00
Daniel Han
57c6894583 Update _utils.py 2025-03-11 15:20:20 -07:00
Daniel Han
361fa338d8 Update pyproject.toml 2025-03-11 14:59:29 -07:00
Daniel Han
683bd5fd9b Update mapper.py 2025-03-11 05:37:29 -07:00
Daniel Han
b78f12cff8 AutoModelForImageTextToText 2025-03-11 04:30:01 -07:00
Daniel Han
e680904d46 Update pyproject.toml 2025-03-11 00:18:55 -07:00
Daniel Han
a564d888bd Update rl.py 2025-03-10 05:00:45 -07:00
Daniel Han
2cedc89ac8 Update rl.py 2025-03-10 04:58:59 -07:00
Daniel Han
b97ac4e76f Update rl.py 2025-03-10 04:57:47 -07:00
Daniel Han
c6f73dd521 max_seq_length 2025-03-10 04:39:25 -07:00
Daniel Han
3443a8503f Update _utils.py 2025-03-10 00:31:49 -07:00
Daniel Han
c9dafd5eaf Update loader.py 2025-03-09 23:33:21 -07:00
Daniel Han
49dce011e9 Update loader.py 2025-03-09 23:24:21 -07:00
Daniel Han
5e23241d6e Update loader.py 2025-03-09 23:22:13 -07:00
Daniel Han
a6215fcc50 full finetuning 2025-03-09 23:18:52 -07:00
Daniel Han
2f737d1144 Update vision.py 2025-03-09 23:13:30 -07:00
Daniel Han
4560215e41 Update vision.py 2025-03-09 23:11:28 -07:00
Daniel Han
4122cc7547 Update loader.py 2025-03-09 23:08:34 -07:00
Daniel Han
4cf033f6c7 Update loader.py 2025-03-09 23:06:06 -07:00
Daniel Han
186a2f6ee6 Update loader.py 2025-03-09 23:03:27 -07:00
Daniel Han
d41e5578e1 UNSLOTH_ENABLE_FULL_FINETUNING 2025-03-09 22:57:24 -07:00
Daniel Han
66f661ea02 Full finetuning and other fixes 2025-03-09 22:52:34 -07:00
Daniel Han
0a33a1d5fc Update loader.py 2025-03-08 18:48:54 -08:00
Kareem
f1d77f7857 More syntax warnings (#1944)
* move use_modelscope to _utils

* fix

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-03-08 17:55:37 -08:00
Wilson Wu
c5e163db74 Don't use revision when loading model_config and is_peft=True (#1949) 2025-03-08 17:51:53 -08:00
Kareem
7a1199cf0b move use_modelscope to _utils (#1938)
* move use_modelscope to _utils

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-03-08 17:51:07 -08:00
Daniel Han
b79ce73b2c Merge branch 'main' into nightly 2025-03-08 17:50:50 -08:00
Daniel Han
08815f9f57 Bug fixes (#1951)
* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* GRPO optimized

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Selective Log softmax

* Fix GRPO bsz

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix TRL

* Metrics GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* No compile

* Update rl.py

* Remove docs

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649)

* edit save.py to fix gguf saving breaks.

* add check for .exe or not exe file extension for linux and windows

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* unsloth_num_chunks

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py (#1754)

Fix typo in comment: know -> now.

This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well.

* Optional logits

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* fix an import error (#1767)

* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* SamplingParams

* Convert mask to float (#1762)

* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)

* Add latest xformers

* Add a couple of lines to docs

* vLLMSamplingParams

* Update __init__.py

* default num_chunks == -1

* Versioning

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update pyproject.toml

* Update pyproject.toml

* Export Model to ollama.com (#1648)

* Ollama Export Model to ollama.com

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Check for model_name

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* subprocess use instead of requests | added check for ollama server

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model | fix

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Push to Ollama

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Update cross_entropy_loss.py

* torch_cuda_device

* Update utils.py

* Update utils.py

* Update utils.py

* device

* device

* Update loader.py

* Update llama.py

* Update README.md

* Update llama.py

* Update llama.py

* Update _utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* __version__

* Update rl.py

* Bug fixes

* Bug fixes

* Update llama.py

* Update _utils.py

* _wrap_fast_inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* SFT dataset prepare

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update utils.py

* bug fix

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Update _utils.py

* Version

* versioning

* Update _utils.py

* Update llama.py

* Update llama.py

* Bug fixes

* FastModel

* __doc__

* Update vision.py

* Update loader.py

* Update loader.py

* Update loader.py

* version

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>
Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com>
Co-authored-by: Seth Weidman <seth@sethweidman.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>
Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>
2025-03-08 04:34:55 -08:00
Daniel Han
e18896439c version 2025-03-08 04:29:19 -08:00
Daniel Han
723dbcd576 Update loader.py 2025-03-08 03:38:22 -08:00
Daniel Han
09b53c79ed Update loader.py 2025-03-08 03:36:45 -08:00
Daniel Han
87c643cb42 Update loader.py 2025-03-08 03:35:22 -08:00
Daniel Han
454b19929a Update vision.py 2025-03-08 03:23:23 -08:00
Daniel Han
964339a236 __doc__ 2025-03-08 03:20:03 -08:00
Daniel Han
ff17bdbc11 FastModel 2025-03-08 03:02:45 -08:00
Daniel Han
a9026d9e9e Bug fixes 2025-03-07 01:43:39 -08:00
Daniel Han
7a0ce38a0d Merge branch 'main' into nightly 2025-03-06 14:47:39 -08:00
Daniel Han
de6be085ca Merge branch 'main' of https://github.com/unslothai/unsloth 2025-03-06 14:44:16 -08:00
Daniel Han
0a646e40f5 Big bug fixes
Fixes:
1. #1932
2. #1931
3. #1928
4. #1925
5. #1921
6. #1918
7. #1923
8. #1922
9. #1921

Please run `pip install --upgrade --force-reinstall --no-deps unsloth unsloth_zoo` on local machines. On Colab / Kaggle, please restart, delete / disconnect the runtime, and rerun.

Apologies for the issues!
2025-03-06 14:43:48 -08:00
Daniel Han
c60621670d Bug fixes 2025-03-06 14:39:04 -08:00
Daniel Han
53d0ba079e Bug fixes (#1920)
* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update pyproject.toml

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* GRPO optimized

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Selective Log softmax

* Fix GRPO bsz

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix TRL

* Metrics GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* No compile

* Update rl.py

* Remove docs

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649)

* edit save.py to fix gguf saving breaks.

* add check for .exe or not exe file extension for linux and windows

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* unsloth_num_chunks

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py (#1754)

Fix typo in comment: know -> now.

This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well.

* Optional logits

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* fix an import error (#1767)

* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* SamplingParams

* Convert mask to float (#1762)

* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)

* Add latest xformers

* Add a couple of lines to docs

* vLLMSamplingParams

* Update __init__.py

* default num_chunks == -1

* Versioning

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update pyproject.toml

* Update pyproject.toml

* Export Model to ollama.com (#1648)

* Ollama Export Model to ollama.com

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Check for model_name

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* subprocess use instead of requests | added check for ollama server

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model | fix

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Push to Ollama

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Update cross_entropy_loss.py

* torch_cuda_device

* Update utils.py

* Update utils.py

* Update utils.py

* device

* device

* Update loader.py

* Update llama.py

* Update README.md

* Update llama.py

* Update llama.py

* Update _utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* __version__

* Update rl.py

* Bug fixes

* Bug fixes

* Update llama.py

* Update _utils.py

* _wrap_fast_inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* SFT dataset prepare

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update utils.py

* bug fix

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Update _utils.py

* Version

* versioning

* Update _utils.py

* Update llama.py

* Update llama.py

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>
Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com>
Co-authored-by: Seth Weidman <seth@sethweidman.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>
Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>
2025-03-06 05:16:15 -08:00
Daniel Han
4000d2ec6b Update llama.py 2025-03-06 03:46:00 -08:00
Daniel Han
ed2cfd1e0a Update llama.py 2025-03-06 03:44:40 -08:00
Daniel Han
5ed619522b Update _utils.py 2025-03-06 03:22:10 -08:00
Daniel Han
a666967405 versioning 2025-03-06 03:14:01 -08:00
Daniel Han
9325426c3e Merge branch 'main' into nightly 2025-03-06 02:35:00 -08:00
Daniel Han
83eaa2f087 Logits fixes (#1916)
* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update pyproject.toml

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* GRPO optimized

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Selective Log softmax

* Fix GRPO bsz

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix TRL

* Metrics GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* No compile

* Update rl.py

* Remove docs

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649)

* edit save.py to fix gguf saving breaks.

* add check for .exe or not exe file extension for linux and windows

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* unsloth_num_chunks

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py (#1754)

Fix typo in comment: know -> now.

This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well.

* Optional logits

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* fix an import error (#1767)

* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* SamplingParams

* Convert mask to float (#1762)

* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)

* Add latest xformers

* Add a couple of lines to docs

* vLLMSamplingParams

* Update __init__.py

* default num_chunks == -1

* Versioning

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update pyproject.toml

* Update pyproject.toml

* Export Model to ollama.com (#1648)

* Ollama Export Model to ollama.com

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Check for model_name

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* subprocess use instead of requests | added check for ollama server

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model | fix

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Push to Ollama

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Update cross_entropy_loss.py

* torch_cuda_device

* Update utils.py

* Update utils.py

* Update utils.py

* device

* device

* Update loader.py

* Update llama.py

* Update README.md

* Update llama.py

* Update llama.py

* Update _utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* __version__

* Update rl.py

* Bug fixes

* Bug fixes

* Update llama.py

* Update _utils.py

* _wrap_fast_inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* SFT dataset prepare

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update utils.py

* bug fix

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update __init__.py

* Update _utils.py

* Version

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>
Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com>
Co-authored-by: Seth Weidman <seth@sethweidman.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>
Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>
2025-03-06 02:32:32 -08:00
Daniel Han
85b2b7e941 Version 2025-03-06 02:29:59 -08:00
Daniel Han
1f7001b5c3 Update _utils.py 2025-03-06 01:31:01 -08:00
Daniel Han
033d6dcf60 Update __init__.py 2025-03-06 01:10:50 -08:00
Daniel Han
a52b5a48dd Update _utils.py 2025-03-05 23:48:00 -08:00
Daniel Han
f52f12a073 Update rl.py 2025-03-05 23:32:04 -08:00
Daniel Han
ed2d8a9303 Update rl.py 2025-03-05 23:19:12 -08:00
Daniel Han
76cfa1a14f Update rl.py 2025-03-05 23:17:51 -08:00
Daniel Han
c97447e983 Update _utils.py 2025-03-05 23:10:48 -08:00
Daniel Han
50397d8510 Update _utils.py 2025-03-05 23:05:52 -08:00
Daniel Han
94d50c367a Update _utils.py 2025-03-05 23:02:15 -08:00
Daniel Han
9313007ded Update _utils.py 2025-03-05 22:59:43 -08:00
Daniel Han
a877c3700b Merge branch 'main' into nightly 2025-03-05 22:58:40 -08:00
Daniel Han
105020e4cc Update _utils.py 2025-03-05 22:58:14 -08:00
Daniel Han
303daeb1e1 Python 3.12 fix 2025-03-05 12:58:24 -08:00
Daniel Han
2afeb37839 Many bug fixes (#1900)
* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* autocast

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update pyproject.toml

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* GRPO optimized

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Selective Log softmax

* Fix GRPO bsz

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix TRL

* Metrics GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* No compile

* Update rl.py

* Remove docs

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649)

* edit save.py to fix gguf saving breaks.

* add check for .exe or not exe file extension for linux and windows

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* unsloth_num_chunks

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py (#1754)

Fix typo in comment: know -> now.

This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well.

* Optional logits

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* fix an import error (#1767)

* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* SamplingParams

* Convert mask to float (#1762)

* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)

* Add latest xformers

* Add a couple of lines to docs

* vLLMSamplingParams

* Update __init__.py

* default num_chunks == -1

* Versioning

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update pyproject.toml

* Update pyproject.toml

* Export Model to ollama.com (#1648)

* Ollama Export Model to ollama.com

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Check for model_name

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* subprocess use instead of requests | added check for ollama server

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model | fix

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Push to Ollama

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Update cross_entropy_loss.py

* torch_cuda_device

* Update utils.py

* Update utils.py

* Update utils.py

* device

* device

* Update loader.py

* Update llama.py

* Update README.md

* Update llama.py

* Update llama.py

* Update _utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* __version__

* Update rl.py

* Bug fixes

* Bug fixes

* Update llama.py

* Update _utils.py

* _wrap_fast_inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* SFT dataset prepare

* Update pyproject.toml

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update utils.py

* bug fix

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>
Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com>
Co-authored-by: Seth Weidman <seth@sethweidman.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>
Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>
2025-03-05 05:13:32 -08:00
Daniel Han
f557986862 Update __init__.py 2025-03-05 04:55:06 -08:00
Daniel Han
5a4c03fc60 Update llama.py 2025-03-05 04:50:26 -08:00
Daniel Han
b28f5dfe64 Update llama.py 2025-03-05 04:47:44 -08:00
Daniel Han
a434bf37f6 Update llama.py 2025-03-05 04:45:46 -08:00
Daniel Han
752c826719 Update llama.py 2025-03-05 04:22:20 -08:00
Daniel Han
3290a0c736 Update llama.py 2025-03-05 03:59:59 -08:00
Daniel Han
fc4101646f bug fix 2025-03-05 03:49:38 -08:00
Daniel Han
5de84d1ead Update utils.py 2025-03-05 03:46:21 -08:00
Daniel Han
0597d535bb Update llama.py 2025-03-05 03:38:16 -08:00
Daniel Han
ea4e8e4371 Update llama.py 2025-03-05 03:30:13 -08:00
Daniel Han
8c181dfdbe Update rl.py 2025-03-05 03:24:08 -08:00
Daniel Han
7d669e8b18 Update rl_replacements.py 2025-03-05 01:11:03 -08:00
Daniel Han
f1b158bdda Update rl_replacements.py 2025-03-05 01:03:13 -08:00
Daniel Han
dd364f3b80 Update rl_replacements.py 2025-03-05 01:00:01 -08:00
Daniel Han
363208bf16 Update pyproject.toml 2025-03-05 00:56:56 -08:00
Daniel Han
2cd1793e83 SFT dataset prepare 2025-03-05 00:51:10 -08:00
Daniel Han
7e84cddb97 Update _utils.py 2025-03-04 19:44:54 -08:00
Daniel Han
2569b0f245 Update llama.py 2025-03-04 19:39:49 -08:00
Daniel Han
ba944d6ef8 Update llama.py 2025-03-04 19:20:38 -08:00
Daniel Han
a95d7322f0 Update llama.py 2025-03-04 19:19:12 -08:00
Daniel Han
6e49701337 Update llama.py 2025-03-04 19:15:40 -08:00
Daniel Han
f54cc4f15c Update llama.py 2025-03-04 19:11:54 -08:00
Daniel Han
6ba41c59ab Update llama.py 2025-03-04 19:08:59 -08:00
Daniel Han
c7e980a208 Update llama.py 2025-03-04 19:04:46 -08:00
Daniel Han
81d0ed64e4 Update llama.py 2025-03-04 19:02:12 -08:00
Daniel Han
f40e145559 Update llama.py 2025-03-04 18:59:55 -08:00
Daniel Han
d006df611f Update llama.py 2025-03-04 18:21:54 -08:00
Daniel Han
fa5e37de2d Update llama.py 2025-03-04 18:18:02 -08:00
Daniel Han
0b1cf7640e _wrap_fast_inference 2025-03-04 18:15:45 -08:00
Daniel Han
ed90df2c4f Update _utils.py 2025-03-04 16:30:57 -08:00
Daniel Han
fe68005095 Update llama.py 2025-03-04 16:16:21 -08:00
Daniel Han
d84f723a82 Merge branch 'main' into nightly 2025-03-04 14:20:27 -08:00
Daniel Han
6a03ea2f81 Bug fixes 2025-03-04 14:19:56 -08:00
Daniel Han
6208d99833 Bug fix 2025-03-04 13:26:47 -08:00
Daniel Han
c813f2de1d Bug fix 2025-03-04 04:22:23 -08:00
Daniel Han
d5c427adea Bug fix 2025-03-04 04:12:38 -08:00
Daniel Han
3e5f061133 Bug fixes (#1891)
* Update rl.py

* Patching

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* NEFTune

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Extra replacements

* Update rl_replacements.py

* Update rl.py

* extra RL replacements

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update _utils.py

* Update loader_utils.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* autocast

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update pyproject.toml

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* GRPO optimized

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Selective Log softmax

* Fix GRPO bsz

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix TRL

* Metrics GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* No compile

* Update rl.py

* Remove docs

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649)

* edit save.py to fix gguf saving breaks.

* add check for .exe or not exe file extension for linux and windows

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* unsloth_num_chunks

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py (#1754)

Fix typo in comment: know -> now.

This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well.

* Optional logits

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* fix an import error (#1767)

* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* SamplingParams

* Convert mask to float (#1762)

* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)

* Add latest xformers

* Add a couple of lines to docs

* vLLMSamplingParams

* Update __init__.py

* default num_chunks == -1

* Versioning

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update pyproject.toml

* Update pyproject.toml

* Export Model to ollama.com (#1648)

* Ollama Export Model to ollama.com

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Check for model_name

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* subprocess use instead of requests | added check for ollama server

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model | fix

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Push to Ollama

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Update cross_entropy_loss.py

* torch_cuda_device

* Update utils.py

* Update utils.py

* Update utils.py

* device

* device

* Update loader.py

* Update llama.py

* Update README.md

* Update llama.py

* Update llama.py

* Update _utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* __version__

* Update rl.py

* Bug fixes

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>
Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com>
Co-authored-by: Seth Weidman <seth@sethweidman.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>
Co-authored-by: Jyotin Goel <120490013+gjyotin305@users.noreply.github.com>
2025-03-04 03:55:49 -08:00
Daniel Han
94bda540e2 Bug fixes 2025-03-04 03:47:51 -08:00
Daniel Han
f22fe83726 Update rl.py 2025-03-04 03:31:38 -08:00
Daniel Han
607539435e __version__ 2025-03-04 02:57:54 -08:00
Daniel Han
e0c0b71831 Update utils.py 2025-03-04 02:38:44 -08:00
Daniel Han
0387f997f7 Update utils.py 2025-03-04 02:37:09 -08:00
Daniel Han
c38c1d52dd Update utils.py 2025-03-04 02:35:19 -08:00
Daniel Han
963e876ea4 Update utils.py 2025-03-04 02:28:35 -08:00
Daniel Han
71735c342f Update llama.py 2025-03-03 23:59:11 -08:00
Daniel Han
a4fa7f920f Update llama.py 2025-03-03 23:58:58 -08:00
Daniel Han
63b7b34424 Update llama.py 2025-03-03 23:41:37 -08:00
Daniel Han
0460a67ed4 Update llama.py 2025-03-03 23:32:02 -08:00
Daniel Han
61a3daa667 Update llama.py 2025-03-03 23:00:26 -08:00
Michael Han
6491abfb78 Merge pull request #1885 from unslothai/shimmyshimmer-patch-6
Update README.md
2025-03-03 21:28:00 -08:00
Michael Han
c018ea28db Update README.md 2025-03-03 21:27:20 -08:00
Daniel Han
9c3bb53a0a Update utils.py 2025-03-03 18:46:16 -08:00
Daniel Han
baed76e0a5 Update utils.py 2025-03-03 18:33:18 -08:00
Daniel Han
2f9767887a Update utils.py 2025-03-03 18:29:35 -08:00
Daniel Han
3671756291 Update utils.py 2025-03-03 17:27:59 -08:00
Daniel Han
806bf910fc Update utils.py 2025-03-03 17:17:25 -08:00
Daniel Han
7e0bb36f9f Update _utils.py 2025-03-03 17:12:56 -08:00
Daniel Han
ce708bef1c Update llama.py 2025-03-03 15:49:04 -08:00
Daniel Han
6662fd652c Update llama.py 2025-03-03 15:48:55 -08:00
Daniel Han
d7ffc09329 Update README.md 2025-03-03 14:58:30 -08:00
Daniel Han
643637dd5e Update llama.py 2025-03-03 02:36:16 -08:00
Daniel Han
64ab4df808 Update loader.py 2025-03-03 02:30:53 -08:00
Daniel Han
391fe2907b device 2025-03-03 00:04:08 -08:00
Daniel Han
37541f149a device 2025-03-02 23:58:17 -08:00
Daniel Han
cb6318299d Update utils.py 2025-03-02 23:43:02 -08:00
Daniel Han
ed45bd9cd3 Update utils.py 2025-03-02 23:41:35 -08:00
Daniel Han
0b69bf34cc Update utils.py 2025-03-02 23:38:32 -08:00
Daniel Han
ed75ff330a torch_cuda_device 2025-03-02 23:31:16 -08:00
Daniel Han
b73a2c39d7 Update cross_entropy_loss.py 2025-03-02 23:08:44 -08:00
Michael Han
e02561d883 Update README.md 2025-03-02 20:44:26 -08:00
Michael Han
8b5883275d Update README.md 2025-03-02 20:35:27 -08:00
Michael Han
788563f8fe Update README.md 2025-03-02 20:34:36 -08:00
Daniel Han
f47415973a Merge branch 'main' into nightly 2025-03-02 20:28:24 -08:00
J. M Areeb Uzair
c6d2433547 Added Python version warning to Windows Install Section (#1872)
I spent half a day on the wrong Python version, so I am adding this big, red sign.
2025-03-02 03:48:21 -08:00
Mohamed Mekkouri
8180e9803c Fix Layernorm when num_cols not a power of 2 (#1867)
* fix

* Update layernorm.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-03-01 14:23:22 -08:00
Daniel Han
8a72e20aa5 Update vision.py 2025-03-01 13:47:01 -08:00
Daniel Han
4f166026d7 LoRA 2025-03-01 02:52:20 -08:00
Daniel Han
481e3e41b2 Update llama.py 2025-03-01 00:23:20 -08:00
Daniel Han
f4748c020d Update _utils.py 2025-03-01 00:21:28 -08:00
Daniel Han
f3b5469213 Update granite.py 2025-03-01 00:20:34 -08:00
Daniel Han
6b735edbcb Update granite.py 2025-03-01 00:19:12 -08:00
Daniel Han
6485cbb499 Prelim release 2025-03-01 00:13:11 -08:00
Aditya Ghai
08bc291300 Direct windows support for unsloth (#1841)
* Direct Windows Support(main)

* Update pyproject.toml

* Update README.md

Added the suggested changes to README
2025-02-27 20:25:46 -08:00
Daniel Han
841626c405 Update rl_replacements.py 2025-02-27 03:46:47 -08:00
Daniel Han
f55395b5f0 Update rl_replacements.py 2025-02-27 03:42:03 -08:00
Michael Han
569b4422c4 Update README.md 2025-02-26 17:03:47 -08:00
Michael Han
86aea0b4f8 Update README.md 2025-02-26 16:58:32 -08:00
Kareem
71d2a24575 fixed syntax warnings (#1522)
* fixed most of syntax warnings

* all syntaxwarnings fixed

* Syntax fixes

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-02-26 03:50:37 -08:00
Charles London
96346a5f35 Fix key error in GRPOTrainer (#1818)
* fix keyerror in GRPOTrainer

* check for train in _metrics
2025-02-25 15:22:35 -08:00
Igor Kilbas
455517aae9 Fix: GRPO with Mistral and importing (#1831)
* fix: mistral and importing

* minor change

* Style :)

* Update mistral.py

* Update mistral.py

* Update mistral.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-02-25 15:21:26 -08:00
Jyotin Goel
c316ad8910 Export Model to ollama.com (#1648)
* Ollama Export Model to ollama.com

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Check for model_name

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* subprocess use instead of requests | added check for ollama server

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* create_ollama_model | fix

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

* Push to Ollama

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>

---------

Signed-off-by: Jyotin Goel <b22ai063@iitj.ac.in>
2025-02-22 02:37:01 -08:00
Michael Han
ab701257d6 Update README.md 2025-02-21 22:59:19 -08:00
Daniel Han
734a9a0611 Update _utils.py 2025-02-20 09:24:07 -08:00
Daniel Han
4570c8b41e Bug Fixes (#1774)
* Update rl.py

* Update tokenizer_utils.py

* Auto patching

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* max seq length

* Update rl.py

* Update rl.py

* Patching

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* NEFTune

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Extra replacements

* Update rl_replacements.py

* Update rl.py

* extra RL replacements

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update _utils.py

* Update loader_utils.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* autocast

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update pyproject.toml

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* GRPO optimized

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Selective Log softmax

* Fix GRPO bsz

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix TRL

* Metrics GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* No compile

* Update rl.py

* Remove docs

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649)

* edit save.py to fix gguf saving breaks.

* add check for .exe or not exe file extension for linux and windows

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* unsloth_num_chunks

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py (#1754)

Fix typo in comment: know -> now.

This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well.

* Optional logits

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* fix an import error (#1767)

* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* SamplingParams

* Convert mask to float (#1762)

* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)

* Add latest xformers

* Add a couple of lines to docs

* vLLMSamplingParams

* Update __init__.py

* default num_chunks == -1

* Versioning

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update pyproject.toml

* Update pyproject.toml

---------

Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com>
Co-authored-by: Seth Weidman <seth@sethweidman.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>
2025-02-20 09:20:49 -08:00
Daniel Han
b39a8605cc Update pyproject.toml 2025-02-20 09:02:41 -08:00
Daniel Han
92ea46eae4 Update pyproject.toml 2025-02-20 08:43:46 -08:00
Daniel Han
0cf0a33c41 Update rl_replacements.py 2025-02-20 08:31:37 -08:00
Daniel Han
7a6e004288 Update rl_replacements.py 2025-02-20 08:28:21 -08:00
Daniel Han
3a06c5054c Update _utils.py 2025-02-20 07:51:33 -08:00
Daniel Han
0d28429ba1 Update llama.py 2025-02-20 07:46:14 -08:00
Daniel Han
4dc880dae8 Update llama.py 2025-02-20 07:40:45 -08:00
Daniel Han
9c812cc72a Update llama.py 2025-02-20 07:36:23 -08:00
Daniel Han
1a0f18e598 Update llama.py 2025-02-20 07:25:31 -08:00
Daniel Han
2c438b3b99 Update llama.py 2025-02-20 07:01:14 -08:00
Daniel Han
0345c8b1c3 Merge branch 'main' into nightly 2025-02-20 05:14:57 -08:00
Daniel Han
cd3a733602 bug fix 2025-02-20 04:47:12 -08:00
Daniel Han
a45a08f91b Memory Efficient GRPO (#1773)
* Update __init__.py

* Update loader.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Better TRL handling

* Update rl.py

* Update tokenizer_utils.py

* Auto patching

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* max seq length

* Update rl.py

* Update rl.py

* Patching

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* NEFTune

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Extra replacements

* Update rl_replacements.py

* Update rl.py

* extra RL replacements

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update _utils.py

* Update loader_utils.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* autocast

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update pyproject.toml

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* GRPO optimized

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Selective Log softmax

* Fix GRPO bsz

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix TRL

* Metrics GRPO

* Update rl_replacements.py

* Update rl_replacements.py

* No compile

* Update rl.py

* Remove docs

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649)

* edit save.py to fix gguf saving breaks.

* add check for .exe or not exe file extension for linux and windows

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* unsloth_num_chunks

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py (#1754)

Fix typo in comment: know -> now.

This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well.

* Optional logits

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* fix an import error (#1767)

* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* SamplingParams

* Convert mask to float (#1762)

* [Windows Support] Add latest `xformers` wheels to pyproject.toml (#1753)

* Add latest xformers

* Add a couple of lines to docs

* vLLMSamplingParams

* Update __init__.py

* default num_chunks == -1

* Versioning

---------

Co-authored-by: Gennadii Manzhos <105049664+everythingisc00l@users.noreply.github.com>
Co-authored-by: Seth Weidman <seth@sethweidman.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Ben <6579034+versipellis@users.noreply.github.com>
2025-02-20 04:23:28 -08:00
Daniel Han
89711bf4f8 Versioning 2025-02-20 04:22:17 -08:00
Daniel Han
3a0fb38744 default num_chunks == -1 2025-02-19 23:51:06 -08:00
Daniel Han
50acb0f4f8 Update __init__.py 2025-02-19 23:45:07 -08:00
Daniel Han
940bce0b04 vLLMSamplingParams 2025-02-19 23:43:52 -08:00
Daniel Han
ad46d1a4a7 Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly 2025-02-19 23:40:50 -08:00
Ben
b8a2ceca14 [Windows Support] Add latest xformers wheels to pyproject.toml (#1753)
* Add latest xformers

* Add a couple of lines to docs
2025-02-19 23:40:07 -08:00
Edd
16e69efc96 Convert mask to float (#1762) 2025-02-19 23:38:48 -08:00
Daniel Han
c27074c28b SamplingParams 2025-02-19 23:37:52 -08:00
Nino Risteski
1b329a6731 fix an import error (#1767)
* fix an import error

* Delete .gitignore

* Update loader.py

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-02-19 23:32:25 -08:00
Daniel Han
e9c0591119 Update rl.py 2025-02-19 23:27:16 -08:00
Daniel Han
f3488a88c5 Merge branch 'main' into nightly 2025-02-19 23:24:21 -08:00
Daniel Han
fbe9ee80d4 Update README.md (#1768) 2025-02-19 23:24:05 -08:00
Daniel Han
55e8c086dd Update rl.py 2025-02-19 23:23:39 -08:00
Daniel Han
4fdbe839ea Update rl.py 2025-02-19 23:15:13 -08:00
Daniel Han
b4bd52f978 Update rl.py 2025-02-19 23:06:18 -08:00
Daniel Han
93804d9998 Update rl_replacements.py 2025-02-19 22:03:47 -08:00
Daniel Han
fe3c84d250 Update rl.py 2025-02-19 21:41:17 -08:00
Daniel Han
4dfbbb2b9a Update rl.py 2025-02-19 17:58:25 -08:00
Daniel Han
90158637d3 Update rl.py 2025-02-19 16:40:37 -08:00
Daniel Han
5129cf3049 Update rl.py 2025-02-19 16:38:44 -08:00
Daniel Han
98ad241821 Update rl.py 2025-02-19 16:37:12 -08:00
Daniel Han
09d8c9b2ca Update rl.py 2025-02-19 12:51:22 -08:00
Daniel Han
e06905724d Update rl.py 2025-02-19 12:47:15 -08:00
Daniel Han
dc04a82307 Update rl.py 2025-02-19 03:41:51 -08:00
Daniel Han
3797412198 Optional logits 2025-02-19 02:23:00 -08:00
Seth Weidman
c2938ccc83 Update rl_replacements.py (#1754)
Fix typo in comment: know -> now.

This was printed when running the Llama3.1_(8B)-GRPO.ipynb example notebook, so I'd expect others to run into it as well.
2025-02-19 02:12:07 -08:00
Daniel Han
b7a53c37bd Update rl_replacements.py 2025-02-18 01:57:18 -08:00
Daniel Han
9eb8e34a15 Update rl_replacements.py 2025-02-18 00:17:07 -08:00
Daniel Han
ab27ddc6a3 Update rl.py 2025-02-18 00:13:26 -08:00
Daniel Han
6148ce8d46 Update rl.py 2025-02-18 00:09:11 -08:00
Daniel Han
79141331f1 Update rl.py 2025-02-18 00:05:40 -08:00
Daniel Han
16f0cc2214 Update rl.py 2025-02-18 00:01:09 -08:00
Daniel Han
7a8ae1f272 Update rl.py 2025-02-17 23:57:57 -08:00
Daniel Han
2c81afe484 Update rl_replacements.py 2025-02-17 23:47:52 -08:00
Daniel Han
d6325aa94d Update rl_replacements.py 2025-02-17 22:30:20 -08:00
Daniel Han
6741b050d7 Update rl_replacements.py 2025-02-17 22:30:13 -08:00
Daniel Han
99b56e4193 Update rl.py 2025-02-17 22:24:57 -08:00
Daniel Han
6688732d2e unsloth_num_chunks 2025-02-17 22:17:11 -08:00
Daniel Han
cc523685ae Update rl_replacements.py 2025-02-17 21:17:58 -08:00
Daniel Han
df9c98bc14 Update rl_replacements.py 2025-02-17 21:10:26 -08:00
Daniel Han
afb60b7772 Update rl_replacements.py 2025-02-17 21:08:32 -08:00
Daniel Han
cb8a2a5550 Update rl_replacements.py 2025-02-17 21:03:44 -08:00
Daniel Han
bd3bd2103d Update rl_replacements.py 2025-02-17 20:38:18 -08:00
Daniel Han
b6f473b804 Update rl_replacements.py 2025-02-17 20:21:20 -08:00
Daniel Han
d05e13c70c Update rl.py 2025-02-17 20:08:15 -08:00
Daniel Han
c9b15eb00f Update rl.py 2025-02-17 20:04:26 -08:00
Daniel Han
d5002e1ebf Update rl_replacements.py 2025-02-17 20:04:14 -08:00
Daniel Han
798f8daaf1 Update rl.py 2025-02-17 20:00:04 -08:00
Daniel Han
c2917b06c7 Update rl.py 2025-02-17 19:58:17 -08:00
Daniel Han
53e14f4b2d Update rl_replacements.py 2025-02-17 19:53:49 -08:00
Daniel Han
fe483b6210 Update rl_replacements.py 2025-02-17 19:45:07 -08:00
Daniel Han
86512dd59f Update rl_replacements.py 2025-02-17 19:44:43 -08:00
Daniel Han
74f9c9e1fd Update llama.py 2025-02-17 19:16:41 -08:00
Daniel Han
397a0e49d5 Update llama.py 2025-02-17 00:49:35 -08:00
Daniel Han
f68ab5ad84 Update rl_replacements.py 2025-02-17 00:43:11 -08:00
Daniel Han
518ecd6638 Update rl_replacements.py 2025-02-17 00:05:32 -08:00
Daniel Han
da456076d3 Update rl_replacements.py 2025-02-16 23:49:20 -08:00
Daniel Han
759d23c7a4 Update llama.py 2025-02-16 21:22:35 -08:00
Daniel Han
74463e92d9 Update rl_replacements.py 2025-02-16 21:06:47 -08:00
Daniel Han
d4e9e38dfe Update rl_replacements.py 2025-02-16 20:55:26 -08:00
Daniel Han
f665fb243d Update rl_replacements.py 2025-02-16 20:55:11 -08:00
Daniel Han
834d3d69a1 Update rl_replacements.py 2025-02-16 20:20:01 -08:00
Daniel Han
3db0f8a0b3 Update rl_replacements.py 2025-02-16 19:47:39 -08:00
Daniel Han
ccc609fa43 Update rl_replacements.py 2025-02-16 18:34:45 -08:00
Daniel Han
0174dfc14f Update rl_replacements.py 2025-02-16 18:27:04 -08:00
Daniel Han
bdf52260d4 Update rl_replacements.py 2025-02-16 18:09:03 -08:00
Daniel Han
435552fc8a Update rl_replacements.py 2025-02-16 17:13:24 -08:00
Daniel Han
4914c563e4 Update rl_replacements.py 2025-02-16 16:14:21 -08:00
Gennadii Manzhos
3584c1b855 llama-quantize on WINDOWS WSL error fix - edit save.py (gguf saving breaks) (#1649)
* edit save.py to fix gguf saving breaks.

* add check for .exe or not exe file extension for linux and windows
2025-02-16 02:04:08 -08:00
Daniel Han
a841d358a9 Update rl_replacements.py 2025-02-16 01:49:29 -08:00
Daniel Han
2cc8a54f7a Update rl_replacements.py 2025-02-15 20:04:35 -08:00
Daniel Han
bfb3494b64 Update rl.py 2025-02-15 18:35:07 -08:00
Daniel Han
e86e739087 Update rl.py 2025-02-15 18:34:25 -08:00
Daniel Han
11e9251e89 Update rl_replacements.py 2025-02-15 18:06:12 -08:00
Daniel Han
780e02656b Update rl.py 2025-02-15 18:03:57 -08:00
Daniel Han
d056dd5c80 Update rl.py 2025-02-15 18:00:08 -08:00
Daniel Han
a87a0a6193 Update rl.py 2025-02-15 17:57:47 -08:00
Daniel Han
28fbb67281 Update rl.py 2025-02-15 17:48:52 -08:00
Daniel Han
cb43671874 Remove docs 2025-02-15 17:36:18 -08:00
Daniel Han
57e6b9ddcc Update rl.py 2025-02-15 16:45:57 -08:00
Daniel Han
630e4258e6 No compile 2025-02-15 16:45:25 -08:00
Daniel Han
6eb707f48d Merge branch 'main' into nightly 2025-02-15 03:12:52 -08:00
Daniel Han
66a1345421 Fix weird tokenizer issue 2025-02-15 03:12:43 -08:00
Daniel Han
895c7ca320 Update mapper.py 2025-02-15 02:48:36 -08:00
Daniel Han
c5e9299a73 Add GRPO metrics (#1718)
* Update llama.py

* Update llama.py

* Faster inference?

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mapper.py

* Fast Inference via vLLM

* Update llama.py

* Update llama.py

* Update utils.py

* Create rl.py

* PatchRL

* Update rl.py

* Update rl.py

* Update rl.py

* PatchRLStatistics

* Update rl.py

* Update rl.py

* Update rl.py

* Update utils.py

* Update utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* RL metrics

* Update rl.py

* RL metrics

* Update __init__.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update chat_templates.py

* Update mapper.py

* Fp8 cache

* Update llama.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update __init__.py

* Update loader.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Better TRL handling

* Update rl.py

* Update tokenizer_utils.py

* Auto patching

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* max seq length

* Update rl.py

* Update rl.py

* Patching

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* NEFTune

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Extra replacements

* Update rl_replacements.py

* Update rl.py

* extra RL replacements

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update _utils.py

* Update loader_utils.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* autocast

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update pyproject.toml

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* GRPO optimized

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Selective Log softmax

* Fix GRPO bsz

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix TRL

* Metrics GRPO

* Update rl_replacements.py

* Update rl_replacements.py
2025-02-15 02:24:01 -08:00
Daniel Han
49321fb88e Update rl_replacements.py 2025-02-15 02:17:26 -08:00
Daniel Han
b014f87c5c Update rl_replacements.py 2025-02-15 02:12:49 -08:00
Daniel Han
7d1e6ae263 Metrics GRPO 2025-02-15 02:08:33 -08:00
Daniel Han
6d385c6f92 Merge branch 'main' into nightly 2025-02-15 01:56:40 -08:00
Daniel Han
c66350a48a Memory efficient GRPO, DPO etc (#1716)
* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Faster inference?

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mapper.py

* Fast Inference via vLLM

* Update llama.py

* Update llama.py

* Update utils.py

* Create rl.py

* PatchRL

* Update rl.py

* Update rl.py

* Update rl.py

* PatchRLStatistics

* Update rl.py

* Update rl.py

* Update rl.py

* Update utils.py

* Update utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* RL metrics

* Update rl.py

* RL metrics

* Update __init__.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update chat_templates.py

* Update mapper.py

* Fp8 cache

* Update llama.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update __init__.py

* Update loader.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Better TRL handling

* Update rl.py

* Update tokenizer_utils.py

* Auto patching

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* max seq length

* Update rl.py

* Update rl.py

* Patching

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* NEFTune

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Extra replacements

* Update rl_replacements.py

* Update rl.py

* extra RL replacements

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update _utils.py

* Update loader_utils.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* autocast

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update pyproject.toml

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* GRPO optimized

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Selective Log softmax

* Fix GRPO bsz

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix TRL
2025-02-15 01:19:39 -08:00
Daniel Han
a7f32f3412 Fix TRL 2025-02-15 01:13:41 -08:00
Daniel Han
40cc4cbe05 Update rl_replacements.py 2025-02-14 16:08:49 -08:00
Daniel Han
b1e963b2ea Update rl_replacements.py 2025-02-14 16:03:13 -08:00
Daniel Han
02b45397d6 Update rl_replacements.py 2025-02-14 16:01:29 -08:00
Daniel Han
caca33f401 Update rl_replacements.py 2025-02-14 15:58:13 -08:00
Daniel Han
ee672a214e Update rl.py 2025-02-14 15:56:05 -08:00
Daniel Han
294037324f Fix GRPO bsz 2025-02-14 15:32:02 -08:00
Daniel Han
4b385df264 Selective Log softmax 2025-02-14 14:56:37 -08:00
Daniel Han
a204e6ecb9 Update rl_replacements.py 2025-02-14 04:53:06 -08:00
Daniel Han
9adbd6909b Update rl_replacements.py 2025-02-14 04:49:41 -08:00
Daniel Han
6de69db2be Update rl_replacements.py 2025-02-14 04:45:48 -08:00
Daniel Han
10c359f231 Update rl.py 2025-02-14 04:44:03 -08:00
Daniel Han
5a9f9b7f24 Update rl.py 2025-02-14 04:42:05 -08:00
Daniel Han
a142182cc9 Update rl.py 2025-02-14 04:38:03 -08:00
Daniel Han
409762bd19 Update rl.py 2025-02-14 04:35:03 -08:00
Daniel Han
546be40339 Update rl_replacements.py 2025-02-14 04:33:41 -08:00
Daniel Han
c1d028ced8 Update rl_replacements.py 2025-02-14 04:32:24 -08:00
Daniel Han
79f345d484 Update rl.py 2025-02-14 04:31:27 -08:00
Daniel Han
194c1869b9 GRPO optimized 2025-02-14 04:30:15 -08:00
Daniel Han
dc2ef0b255 Merge branch 'main' into nightly 2025-02-13 22:18:26 -08:00
Daniel Han
3136fd9611 Fix bugs (#1706)
* Bug fixes

* fix: flash_attn_detection_error (#1556)

* fix: flash_attn_detection_error

* Update _utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mapper.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* dim fix

* Update _utils.py

* Torch 2.6 support

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Faster inference?

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mapper.py

* Fast Inference via vLLM

* Update llama.py

* Update llama.py

* Update utils.py

* Create rl.py

* PatchRL

* Update rl.py

* Update rl.py

* Update rl.py

* PatchRLStatistics

* Update rl.py

* Update rl.py

* Update rl.py

* Update utils.py

* Update utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* RL metrics

* Update rl.py

* RL metrics

* Update __init__.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update chat_templates.py

* Update mapper.py

* Fp8 cache

* Update llama.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update __init__.py

* Update loader.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Better TRL handling

* Update rl.py

* Update tokenizer_utils.py

* Auto patching

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* max seq length

* Update rl.py

* Update rl.py

* Patching

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* NEFTune

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Extra replacements

* Update rl_replacements.py

* Update rl.py

* extra RL replacements

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update _utils.py

* Update loader_utils.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* autocast

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update pyproject.toml

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

---------

Co-authored-by: Zhe Zhang <2631992879@qq.com>
2025-02-13 19:12:19 -08:00
Daniel Han
4d1de7af9e Update llama.py 2025-02-13 19:11:38 -08:00
Daniel Han
4377b396c3 Update llama.py 2025-02-13 17:25:12 -08:00
Daniel Han
4dbfdcebbd Update llama.py 2025-02-13 17:22:39 -08:00
Daniel Han
a1db8042cb Update llama.py 2025-02-13 17:18:42 -08:00
Daniel Han
90375db93b Update rl_replacements.py 2025-02-13 17:15:29 -08:00
Daniel Han
ee1c2b4abd Update llama.py 2025-02-13 17:12:06 -08:00
Daniel Han
c51e4e2d21 Update llama.py 2025-02-13 17:11:33 -08:00
Daniel Han
e20a459343 Update llama.py 2025-02-13 17:05:04 -08:00
Daniel Han
c17bf7e04c Update llama.py 2025-02-13 17:00:14 -08:00
Daniel Han
dccd3999f3 Update rl.py 2025-02-13 16:47:27 -08:00
Daniel Han
8a5b163fe3 Update rl.py 2025-02-13 16:38:21 -08:00
Daniel Han
0651fe19ab Update rl.py 2025-02-13 16:37:05 -08:00
Daniel Han
5346a5f96a Update rl.py 2025-02-13 16:35:09 -08:00
Daniel Han
b9c4ab96cb Update rl.py 2025-02-13 16:27:23 -08:00
Daniel Han
4190857458 Update rl_replacements.py 2025-02-13 16:27:02 -08:00
Daniel Han
144b9b9e53 Update _utils.py 2025-02-13 16:23:51 -08:00
Daniel Han
8ae57e25a7 Update llama.py 2025-02-13 16:11:51 -08:00
Daniel Han
834c3a4492 Merge branch 'main' into nightly 2025-02-13 15:14:35 -08:00
Daniel Han
4f9301d321 Update _utils.py 2025-02-13 15:14:17 -08:00
Daniel Han
26f8d8580b Update pyproject.toml 2025-02-13 15:13:55 -08:00
Daniel Han
bf2ee8eed2 Merge branch 'main' into nightly 2025-02-13 15:02:56 -08:00
Daniel Han
39eaefce14 Update rl.py 2025-02-13 15:02:50 -08:00
Daniel Han
3ad4076e4b Merge branch 'main' into nightly 2025-02-13 14:59:59 -08:00
Daniel Han
635e921506 Update _utils.py 2025-02-13 14:59:52 -08:00
Daniel Han
5c19724e7f Update dpo.py 2025-02-13 14:59:42 -08:00
Daniel Han
f11079ba65 Merge branch 'main' into nightly 2025-02-13 14:57:58 -08:00
Daniel Han
43cda3240c Update __init__.py 2025-02-13 14:55:14 -08:00
Daniel Han
010de17c90 Update _utils.py 2025-02-13 14:54:49 -08:00
Daniel Han
95fb1d699d Fix bugs (#1701)
* Phi 4

* Update llama.py

* Torch.Cuda Is Available Condition and Warning (#1545)

* check for torch.cuda and triton if available
on my machine (Mac M3), CUDA was not available

* Update pyproject.toml

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update mistral.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix

* Bug fixes

* Update mapper.py

* Add dropout to granite to match HF's implementation (#1557)

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>

* Update llama.py

* Update llama.py

* Bug fixes

* fix: flash_attn_detection_error (#1556)

* fix: flash_attn_detection_error

* Update _utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mapper.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* dim fix

* Update _utils.py

* Torch 2.6 support

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Faster inference?

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mapper.py

* Fast Inference via vLLM

* Update llama.py

* Update llama.py

* Update utils.py

* Create rl.py

* PatchRL

* Update rl.py

* Update rl.py

* Update rl.py

* PatchRLStatistics

* Update rl.py

* Update rl.py

* Update rl.py

* Update utils.py

* Update utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* RL metrics

* Update rl.py

* RL metrics

* Update __init__.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update chat_templates.py

* Update mapper.py

* Fp8 cache

* Update llama.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update __init__.py

* Update loader.py

* Update rl.py

* Update rl.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Better TRL handling

* Update rl.py

* Update tokenizer_utils.py

* Auto patching

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update tokenizer_utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* max seq length

* Update rl.py

* Update rl.py

* Patching

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* NEFTune

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Extra replacements

* Update rl_replacements.py

* Update rl.py

* extra RL replacements

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update _utils.py

* Update loader_utils.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* autocast

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update pyproject.toml

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update llama.py

* Update _utils.py

---------

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>
Co-authored-by: AminWhat <88392440+aminwhat@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Zhe Zhang <2631992879@qq.com>
2025-02-13 14:50:44 -08:00
Daniel Han
44e2879efb Update _utils.py 2025-02-13 14:47:54 -08:00
Daniel Han
50568f1c15 Update llama.py 2025-02-13 14:42:38 -08:00
Daniel Han
c59c2738d9 Merge branch 'main' into nightly 2025-02-13 14:40:49 -08:00
Daniel Han
b1d2c9fb30 Update rl_replacements.py 2025-02-13 04:44:21 -08:00
Daniel Han
de7e76c250 Update rl_replacements.py 2025-02-13 04:40:53 -08:00
Daniel Han
1013ab5891 Update rl_replacements.py 2025-02-13 04:40:42 -08:00
Daniel Han
a0ed33d99a Update rl_replacements.py 2025-02-13 04:34:13 -08:00
Daniel Han
dffbba5ff4 Update rl_replacements.py 2025-02-13 04:25:56 -08:00
Daniel Han
8a7a6c1a8e Update rl_replacements.py 2025-02-13 02:06:36 -08:00
Daniel Han
ef9ad91e36 Update rl_replacements.py 2025-02-13 02:04:11 -08:00
Daniel Han
ca8080290c Update rl_replacements.py 2025-02-13 02:01:51 -08:00
Daniel Han
386cb81c23 Update llama.py 2025-02-13 01:57:45 -08:00
Daniel Han
b181a8ff2b Update rl_replacements.py 2025-02-13 01:52:07 -08:00
Daniel Han
5cd34b0b95 Update rl_replacements.py 2025-02-13 01:48:38 -08:00
Daniel Han
d594ba5812 Update rl_replacements.py 2025-02-13 01:44:00 -08:00
Daniel Han
c9da968975 Update rl_replacements.py 2025-02-13 01:42:55 -08:00
Michael Han
28cb38675f Merge pull request #1688 from unslothai/shimmyshimmer-patch-5
Update README.md
2025-02-13 01:14:23 -08:00
Michael Han
6097db77bb Update README.md 2025-02-13 01:14:06 -08:00
Daniel Han
53653128ff Update llama.py 2025-02-13 00:29:34 -08:00
Daniel Han
3ddad518b6 Update llama.py 2025-02-13 00:26:58 -08:00
Daniel Han
8a35445c23 Update llama.py 2025-02-13 00:10:44 -08:00
Daniel Han
005a397436 Update llama.py 2025-02-12 23:03:40 -08:00
Daniel Han
7d881ae18a Update llama.py 2025-02-12 23:00:51 -08:00
Daniel Han
a94e37db16 Update llama.py 2025-02-12 22:56:47 -08:00
Daniel Han
310a40b3f9 Update llama.py 2025-02-12 22:34:50 -08:00
Daniel Han
85bb48288a Update pyproject.toml 2025-02-12 21:16:44 -08:00
Daniel Han
df43bdb8bf Update llama.py 2025-02-12 21:06:33 -08:00
Daniel Han
eb5a14617f Update llama.py 2025-02-12 20:33:37 -08:00
Daniel Han
318a59f6b4 Update llama.py 2025-02-12 20:29:10 -08:00
Daniel Han
35bad803b5 Update llama.py 2025-02-12 20:24:48 -08:00
Daniel Han
e270294550 Update rl_replacements.py 2025-02-12 20:18:07 -08:00
Daniel Han
3f500fdd12 Update llama.py 2025-02-12 19:11:09 -08:00
Daniel Han
56dffe007c Update llama.py 2025-02-12 19:10:48 -08:00
Daniel Han
2f25ff2698 Update llama.py 2025-02-12 19:07:50 -08:00
Daniel Han
46007979eb Update llama.py 2025-02-12 19:01:23 -08:00
Daniel Han
43458dafd4 Update llama.py 2025-02-12 18:57:01 -08:00
Daniel Han
2b7655527b Update rl_replacements.py 2025-02-12 18:50:45 -08:00
Daniel Han
07616241a0 Update llama.py 2025-02-12 18:44:47 -08:00
Daniel Han
77f40b1ed9 Update rl_replacements.py 2025-02-12 16:23:47 -08:00
Daniel Han
5d040c70a7 Update rl_replacements.py 2025-02-12 16:19:34 -08:00
Daniel Han
2b360989c6 Update rl_replacements.py 2025-02-12 16:19:13 -08:00
Daniel Han
956ccb79e3 Update rl_replacements.py 2025-02-12 16:16:31 -08:00
Daniel Han
7681bff612 Update llama.py 2025-02-12 03:56:12 -08:00
Daniel Han
20acdcef31 Update rl_replacements.py 2025-02-12 03:50:32 -08:00
Daniel Han
0ddf688a56 autocast 2025-02-12 03:50:07 -08:00
Daniel Han
6ebf29b9fe Update llama.py 2025-02-12 03:32:12 -08:00
Daniel Han
1f4b8e0c9c Update llama.py 2025-02-12 03:24:32 -08:00
Daniel Han
4a08c6fc32 Update llama.py 2025-02-12 03:08:40 -08:00
Daniel Han
60ba876dc9 Update llama.py 2025-02-12 03:03:43 -08:00
Daniel Han
b88b77efce Update rl.py 2025-02-12 02:27:34 -08:00
Daniel Han
771a6c95e3 Update rl_replacements.py 2025-02-12 01:53:16 -08:00
Daniel Han
c915b0ae2f Update rl_replacements.py 2025-02-11 23:58:26 -08:00
Daniel Han
0e51ebdd58 Update rl.py 2025-02-11 23:47:33 -08:00
Daniel Han
bae1d69611 Update loader_utils.py 2025-02-11 23:45:26 -08:00
Daniel Han
08dea00cfb Update _utils.py 2025-02-11 23:10:11 -08:00
Daniel Han
56d3ea2a7a Update rl_replacements.py 2025-02-11 22:02:41 -08:00
Daniel Han
30adb81fc2 Update llama.py 2025-02-11 22:02:22 -08:00
Daniel Han
725c59bfd2 Update rl_replacements.py 2025-02-11 22:00:44 -08:00
Daniel Han
01e6c71d7c Merge branch 'main' into nightly 2025-02-11 21:31:34 -08:00
Daniel Han
26e5d6ac08 Update rl_replacements.py 2025-02-11 21:31:23 -08:00
Daniel Han
b333b3064d Update rl_replacements.py 2025-02-11 21:18:55 -08:00
Daniel Han
947649af63 Update rl_replacements.py 2025-02-11 21:16:56 -08:00
Daniel Han
f41d01e74d Update rl_replacements.py 2025-02-11 21:14:41 -08:00
Daniel Han
b7b7213295 Update rl_replacements.py 2025-02-11 21:13:31 -08:00
Daniel Han
4bdd2ed59c extra RL replacements 2025-02-11 21:10:32 -08:00
Daniel Han
fd48c77ff7 Update rl.py 2025-02-11 20:39:55 -08:00
Daniel Han
cf14867bc0 Update rl_replacements.py 2025-02-11 20:37:18 -08:00
Daniel Han
c4fdd39c08 Extra replacements 2025-02-11 20:35:34 -08:00
Daniel Han
44810d7876 Update rl.py 2025-02-11 19:34:41 -08:00
Daniel Han
fcebdb08bb Update rl.py 2025-02-11 19:34:29 -08:00
Daniel Han
f24d897a29 Update rl.py 2025-02-11 19:00:53 -08:00
Daniel Han
275b836de1 Update rl.py 2025-02-11 18:57:35 -08:00
Daniel Han
e35dfacb2a Update rl.py 2025-02-11 18:56:09 -08:00
Daniel Han
7b265dbe0c Update rl.py 2025-02-11 18:54:39 -08:00
Daniel Han
2af746b1bd Update rl.py 2025-02-11 18:49:09 -08:00
Daniel Han
7ea918d85b NEFTune 2025-02-11 18:19:16 -08:00
Daniel Han
800536774a Update rl.py 2025-02-11 16:20:14 -08:00
Daniel Han
af4cd27eb9 Update rl.py 2025-02-11 16:04:56 -08:00
Daniel Han
722c4ecca6 Update rl.py 2025-02-11 16:03:33 -08:00
Daniel Han
da96bd8374 Update rl.py 2025-02-11 15:57:32 -08:00
Daniel Han
7c3b51100f Update rl.py 2025-02-11 15:53:46 -08:00
Daniel Han
aae24f5ae3 Patching 2025-02-11 15:11:16 -08:00
Daniel Han
875181b6d2 Update rl.py 2025-02-11 15:00:44 -08:00
Daniel Han
da7e35fd35 Update rl.py 2025-02-11 14:27:31 -08:00
Daniel Han
9680d0f73a max seq length 2025-02-11 14:22:19 -08:00
Daniel Han
6fbba44ff0 Update rl.py 2025-02-11 14:11:33 -08:00
Daniel Han
766c71844f Update rl.py 2025-02-11 03:25:37 -08:00
Daniel Han
f1a924c31f Update rl.py 2025-02-11 03:23:05 -08:00
Daniel Han
c792fa4f20 Update tokenizer_utils.py 2025-02-11 03:21:59 -08:00
Daniel Han
9f1b839d34 Update rl.py 2025-02-11 03:20:41 -08:00
Daniel Han
6f6a544b5c Update rl.py 2025-02-11 03:16:07 -08:00
Daniel Han
764d22e0bd Update rl.py 2025-02-11 03:08:04 -08:00
Daniel Han
cd4778f78f Update rl.py 2025-02-11 03:05:44 -08:00
Daniel Han
edb2a62bdb Update rl.py 2025-02-11 03:04:05 -08:00
Daniel Han
cf4ccb543d Update rl.py 2025-02-11 01:39:51 -08:00
Daniel Han
815ef563e4 Update rl.py 2025-02-11 01:36:57 -08:00
Daniel Han
9a64373b27 Update rl.py 2025-02-11 01:36:30 -08:00
Daniel Han
be18ce5db9 Update rl.py 2025-02-11 01:33:43 -08:00
Daniel Han
0b61f8e86e Update tokenizer_utils.py 2025-02-11 01:30:02 -08:00
Daniel Han
835dab9903 Update tokenizer_utils.py 2025-02-11 01:28:28 -08:00
Daniel Han
d57f8a36ec Update tokenizer_utils.py 2025-02-11 01:27:50 -08:00
Daniel Han
95029e4163 Update tokenizer_utils.py 2025-02-11 01:25:17 -08:00
Daniel Han
3261dffeed Update tokenizer_utils.py 2025-02-11 01:22:58 -08:00
Daniel Han
3ded920491 Update tokenizer_utils.py 2025-02-11 01:17:13 -08:00
Daniel Han
afbad75e20 Update tokenizer_utils.py 2025-02-11 01:14:06 -08:00
Daniel Han
b3bbe3d3f9 Update tokenizer_utils.py 2025-02-11 00:42:47 -08:00
Daniel Han
12cccc52c9 Update rl.py 2025-02-11 00:37:08 -08:00
Daniel Han
0288ca825f Update tokenizer_utils.py 2025-02-11 00:36:12 -08:00
Daniel Han
9d6ad6b400 Update rl.py 2025-02-11 00:24:31 -08:00
Daniel Han
07f39e0b05 Update tokenizer_utils.py 2025-02-11 00:23:24 -08:00
Daniel Han
543e6d5ab3 Update tokenizer_utils.py 2025-02-11 00:22:02 -08:00
Daniel Han
a4cde480a9 Update tokenizer_utils.py 2025-02-11 00:06:08 -08:00
Daniel Han
2aa87bd8f1 Auto patching 2025-02-10 23:33:15 -08:00
Daniel Han
0767a5eccb Update tokenizer_utils.py 2025-02-10 23:30:08 -08:00
Daniel Han
d3fefc2095 Update rl.py 2025-02-10 23:25:37 -08:00
Daniel Han
56921915f5 Better TRL handling 2025-02-10 23:24:36 -08:00
Michael Han
a5d1391d12 Merge pull request #1654 from unslothai/shimmyshimmer-patch-4
Update README.md
2025-02-09 19:57:27 -08:00
Michael Han
9807456b29 Update README.md 2025-02-09 19:57:15 -08:00
Daniel Han
d07aa0e4d3 Update tokenizer_utils.py 2025-02-09 19:21:58 -08:00
Daniel Han
766f9e5d47 Update tokenizer_utils.py 2025-02-09 19:19:51 -08:00
Daniel Han
a80d468199 Merge branch 'main' into nightly 2025-02-09 19:06:32 -08:00
Diogo Neves
36c3d36e74 Fixed Triton url (#1607)
Triton's link was pointing to the old research URL
2025-02-08 19:41:39 -08:00
Daniel Han
bc7897805b Merge branch 'main' into nightly 2025-02-07 00:53:37 -08:00
Michael Han
74fce13683 Update README.md 2025-02-06 17:20:19 -08:00
Daniel Han
cd52ac2e16 GRPO Bug fixes (#1623)
* use exact model name

* Update save.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* print

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update vision.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* accurate_accumulation

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update pyproject.toml

* Update __init__.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Fix Triton heuristics

https://github.com/triton-lang/triton/issues/5224

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Xformers

* Update loader.py

* Update loader.py

* Rewind

* Update _utils.py

* Update _utils.py

* requires grad

* Update loader.py

* Update _utils.py

* Update loader.py

* changing model to base_model if peft model is already used

* Improve debugging experience (#1512)

* Create CONTRIBUTING.md (#1472)

Creating contributing guidelines

* Update CONTRIBUTING.md

improved sentence

* Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>

* Update loader.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a8edd0931a.

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Auto change is_bfloat16_supported

* Update llama.py

* Force data-type

* Update llama.py

* All attention refactor fix (#1491)

* change initialization of n_heads, n_kv_heads, hidden_size in llama.py

* do the same for cohere, mistral, gemma2, granite

* do the same for flexattention, cohere, mistral, granite

* Update llama.py

* Update llama.py

* Update granite to work with latest post_patch methods (#1502)

* Update granite to work with latest post_patch methods

* Pass position_embeddings for granite even if transformers<4.47

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Minor fixes for granite models (#1503)

* Update granite.py

Grab residual multiplier directly from layer

* Update llama.py

Version should read >= 4.47.1 as that is the version requiring the changes

* Update granite.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* support modelscope models and datasets (#1481)

* support modelscope

* change modelscope args

* remove useless import

* remove useless import

* fix

* wip

* fix

* remove useless code

* add readme

* add some comments

* change print to raise error

* update comment

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Merge branch 'main' into nightly

* Phi 4

* Update llama.py

* Torch.Cuda Is Available Condition and Warning (#1545)

* check for torch.cuda and triton if available
on my machine (Mac M3), CUDA was not available

* Update pyproject.toml

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update mistral.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix

* Bug fixes

* Update mapper.py

* Add dropout to granite to match HF's implementation (#1557)

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>

* Update llama.py

* Update llama.py

* Bug fixes

* fix: flash_attn_detection_error (#1556)

* fix: flash_attn_detection_error

* Update _utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mapper.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* dim fix

* Update _utils.py

* Torch 2.6 support

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Faster inference?

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mapper.py

* Fast Inference via vLLM

* Update llama.py

* Update llama.py

* Update utils.py

* Create rl.py

* PatchRL

* Update rl.py

* Update rl.py

* Update rl.py

* PatchRLStatistics

* Update rl.py

* Update rl.py

* Update rl.py

* Update utils.py

* Update utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* RL metrics

* Update rl.py

* RL metrics

* Update __init__.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update chat_templates.py

* Update mapper.py

* Fp8 cache

* Update llama.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update __init__.py

* Update loader.py

* Update rl.py

* Update rl.py

* Update _utils.py

---------

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Itsuro Tajima <tajima@georepublic.de>
Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Z <coffeevampirebusiness@gmail.com>
Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com>
Co-authored-by: AminWhat <88392440+aminwhat@users.noreply.github.com>
Co-authored-by: Zhe Zhang <2631992879@qq.com>
2025-02-06 05:08:22 -08:00
Daniel Han
10b604431a Update _utils.py 2025-02-06 05:07:37 -08:00
Daniel Han
d437235d15 Update rl.py 2025-02-06 04:29:04 -08:00
Daniel Han
0974094f02 Update rl.py 2025-02-06 04:23:20 -08:00
Daniel Han
7854bc2cb8 Merge branch 'main' into nightly 2025-02-06 03:49:16 -08:00
Daniel Han
c8e4d4b767 Update 2025-02-06 03:23:54 -08:00
Daniel Han
5d6275c957 Update _utils.py 2025-02-06 02:44:46 -08:00
Daniel Han
e288d96272 Update pyproject.toml 2025-02-06 02:44:23 -08:00
Daniel Han
144190bd06 GRPO, vLLM, Bug Fixes, Reinforcement Learning (#1620)
* use exact model name

* Update save.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* print

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update vision.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* accurate_accumulation

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update pyproject.toml

* Update __init__.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Fix Triton heuristics

https://github.com/triton-lang/triton/issues/5224

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Xformers

* Update loader.py

* Update loader.py

* Rewind

* Update _utils.py

* Update _utils.py

* requires grad

* Update loader.py

* Update _utils.py

* Update loader.py

* changing model to base_model if peft model is already used

* Improve debugging experience (#1512)

* Create CONTRIBUTING.md (#1472)

Creating contributing guidelines

* Update CONTRIBUTING.md

improved sentence

* Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>

* Update loader.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a8edd0931a.

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Auto change is_bfloat16_supported

* Update llama.py

* Force data-type

* Update llama.py

* All attention refactor fix (#1491)

* change initialization of n_heads, n_kv_heads, hidden_size in llama.py

* do the same for cohere, mistral, gemma2, granite

* do the same for flexattention, cohere, mistral, granite

* Update llama.py

* Update llama.py

* Update granite to work with latest post_patch methods (#1502)

* Update granite to work with latest post_patch methods

* Pass position_embeddings for granite even if transformers<4.47

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Minor fixes for granite models (#1503)

* Update granite.py

Grab residual multiplier directly from layer

* Update llama.py

Version should read >= 4.47.1 as that is the version requiring the changes

* Update granite.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* support modelscope models and datasets (#1481)

* support modelscope

* change modelscope args

* remove useless import

* remove useless import

* fix

* wip

* fix

* remove useless code

* add readme

* add some comments

* change print to raise error

* update comment

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Merge branch 'main' into nightly

* Phi 4

* Update llama.py

* Torch.Cuda Is Available Condition and Warning (#1545)

* check whether torch.cuda and triton are available
on my machine (Mac M3), CUDA was not available

* Update pyproject.toml

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update mistral.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix

* Bug fixes

* Update mapper.py

* Add dropout to granite to match HF's implementation (#1557)

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>

* Update llama.py

* Update llama.py

* Bug fixes

* fix: flash_attn_detection_error (#1556)

* fix: flash_attn_detection_error

* Update _utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mapper.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* dim fix

* Update _utils.py

* Torch 2.6 support

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Faster inference?

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mapper.py

* Fast Inference via vLLM

* Update llama.py

* Update llama.py

* Update utils.py

* Create rl.py

* PatchRL

* Update rl.py

* Update rl.py

* Update rl.py

* PatchRLStatistics

* Update rl.py

* Update rl.py

* Update rl.py

* Update utils.py

* Update utils.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* RL metrics

* Update rl.py

* RL metrics

* Update __init__.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update chat_templates.py

* Update mapper.py

* Fp8 cache

* Update llama.py

* Update llama.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update rl.py

* Update __init__.py

* Update loader.py

---------

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Itsuro Tajima <tajima@georepublic.de>
Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Z <coffeevampirebusiness@gmail.com>
Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com>
Co-authored-by: AminWhat <88392440+aminwhat@users.noreply.github.com>
Co-authored-by: Zhe Zhang <2631992879@qq.com>
2025-02-06 02:41:12 -08:00
Daniel Han
9dd1219358 Update loader.py 2025-02-06 02:36:42 -08:00
Daniel Han
7d9a7aaa3d Update __init__.py 2025-02-06 02:32:34 -08:00
Daniel Han
79d2b0aea3 Update rl.py 2025-02-06 02:31:01 -08:00
Daniel Han
d6ff188cec Update rl.py 2025-02-06 02:24:13 -08:00
Daniel Han
965dee5f5e Update rl.py 2025-02-06 02:21:08 -08:00
Daniel Han
e71a8aa101 Update rl.py 2025-02-06 02:14:59 -08:00
Daniel Han
68da8b69b7 Update rl.py 2025-02-06 01:57:37 -08:00
Daniel Han
da966a4e1c Update rl.py 2025-02-06 01:50:22 -08:00
Daniel Han
fc4555c288 Update rl.py 2025-02-06 01:47:44 -08:00
Daniel Han
e20efe32a3 Update rl.py 2025-02-06 01:16:43 -08:00
Daniel Han
10d65ea437 Update rl.py 2025-02-06 01:12:33 -08:00
Daniel Han
4724be861e Update rl.py 2025-02-06 01:10:42 -08:00
Daniel Han
eb1f9b91f1 Update rl.py 2025-02-06 01:08:31 -08:00
Daniel Han
5c0ed78b9b Update rl.py 2025-02-06 01:07:00 -08:00
Daniel Han
c998665f0b Update rl.py 2025-02-06 01:06:40 -08:00
Daniel Han
c7f2ec3338 Update rl.py 2025-02-06 01:06:20 -08:00
Daniel Han
d52341f06b Update rl.py 2025-02-06 01:05:48 -08:00
Daniel Han
71c34db351 Update rl.py 2025-02-06 01:02:31 -08:00
Daniel Han
32f09a14d6 Update rl.py 2025-02-06 00:59:13 -08:00
Daniel Han
b196ba85ca Update llama.py 2025-02-05 20:30:36 -08:00
Daniel Han
6719a6134b Update llama.py 2025-02-05 20:24:33 -08:00
Daniel Han
18de8f29c8 Fp8 cache 2025-02-05 20:00:02 -08:00
Daniel Han
b8a833e1af Update mapper.py 2025-02-05 18:31:01 -08:00
Daniel Han
7cc03b2b51 Update chat_templates.py 2025-02-05 17:52:36 -08:00
Daniel Han
9455082740 Update rl.py 2025-02-05 15:36:59 -08:00
Daniel Han
1ff75a7dc7 Update rl.py 2025-02-05 15:21:53 -08:00
Daniel Han
c0805c415f Update rl.py 2025-02-05 15:16:44 -08:00
Daniel Han
ba0a2871c2 Update __init__.py 2025-02-05 15:11:40 -08:00
Daniel Han
093cd0cde3 RL metrics 2025-02-05 15:08:10 -08:00
Daniel Han
cc6bb7d1db Update rl.py 2025-02-05 15:02:52 -08:00
Daniel Han
8ce4a73bfc RL metrics 2025-02-05 14:59:01 -08:00
Daniel Han
b7bd548779 Update rl.py 2025-02-05 07:28:16 -08:00
Daniel Han
9b76d49761 Update rl.py 2025-02-05 07:27:07 -08:00
Daniel Han
cc927d2d18 Update rl.py 2025-02-05 07:25:05 -08:00
Daniel Han
d7c3f9cba6 Update rl.py 2025-02-05 07:13:40 -08:00
Daniel Han
aebeeb4901 Update rl.py 2025-02-05 06:58:04 -08:00
Daniel Han
8506d6be95 Update rl.py 2025-02-05 06:56:12 -08:00
Daniel Han
e59e196448 Update rl.py 2025-02-05 06:50:39 -08:00
Daniel Han
7e7fc35625 Update rl.py 2025-02-05 06:50:14 -08:00
Daniel Han
6aa0fc7e28 Update rl.py 2025-02-05 06:48:54 -08:00
Daniel Han
a6f919f60c Update rl.py 2025-02-05 06:44:18 -08:00
Daniel Han
eed2ac7329 Update rl.py 2025-02-05 06:41:37 -08:00
Daniel Han
bfe87a51bf Update rl.py 2025-02-05 06:37:32 -08:00
Daniel Han
0aa4c035e8 Update rl.py 2025-02-05 06:32:51 -08:00
Daniel Han
ea55289b5a Update rl.py 2025-02-05 06:28:54 -08:00
Daniel Han
dc7b58bad3 Update rl.py 2025-02-05 06:14:12 -08:00
Daniel Han
648efd0525 Update utils.py 2025-02-05 06:02:42 -08:00
Daniel Han
e7a1f0458e Update utils.py 2025-02-05 06:01:38 -08:00
Daniel Han
fc02b50a56 Update rl.py 2025-02-05 05:47:23 -08:00
Daniel Han
c0c4f56208 Update rl.py 2025-02-05 05:45:05 -08:00
Daniel Han
6386de4cce Update rl.py 2025-02-05 05:36:51 -08:00
Daniel Han
a94afad455 PatchRLStatistics 2025-02-05 05:36:04 -08:00
Daniel Han
19b40e883b Update rl.py 2025-02-05 05:24:36 -08:00
Daniel Han
e702cfa179 Update rl.py 2025-02-05 05:23:19 -08:00
Daniel Han
7dfd171c55 Update rl.py 2025-02-05 05:19:37 -08:00
Daniel Han
5e69427fda PatchRL 2025-02-05 05:17:38 -08:00
Daniel Han
665f52065f Create rl.py 2025-02-05 05:14:01 -08:00
Daniel Han
b20253f713 Update utils.py 2025-02-05 04:02:40 -08:00
Daniel Han
0cab914893 Update llama.py 2025-02-05 02:56:16 -08:00
Daniel Han
e4ac52fe85 Update llama.py 2025-02-05 02:30:51 -08:00
Daniel Han
4157c640b7 Fast Inference via vLLM 2025-02-05 02:21:15 -08:00
Daniel Han
5b7b456514 Update mapper.py 2025-02-03 19:43:52 -08:00
Daniel Han
82e011e8d4 Update utils.py 2025-02-03 15:43:40 -08:00
Daniel Han
dc7f0fca4f Update utils.py 2025-02-02 17:12:21 -08:00
Daniel Han
a274e75db7 Update utils.py 2025-02-02 17:07:31 -08:00
Daniel Han
9296a6a93d Update utils.py 2025-02-02 17:04:37 -08:00
Daniel Han
126d804d2a Update utils.py 2025-02-02 17:00:24 -08:00
Daniel Han
ed31cf80c6 Update utils.py 2025-02-02 16:31:28 -08:00
Daniel Han
6e5a6af0d3 Update utils.py 2025-02-02 16:29:26 -08:00
Daniel Han
31449956a3 Update utils.py 2025-02-02 16:26:55 -08:00
Daniel Han
e961d68733 Update utils.py 2025-02-02 16:24:10 -08:00
Daniel Han
35609081cd Update utils.py 2025-02-02 16:23:31 -08:00
Daniel Han
56b6c37ec0 Update utils.py 2025-02-02 16:18:56 -08:00
Daniel Han
1f42dc194a Update utils.py 2025-02-02 15:23:05 -08:00
Daniel Han
8d457854ac Update utils.py 2025-02-02 15:17:50 -08:00
Daniel Han
61eb34b6c2 Update llama.py 2025-02-02 14:54:58 -08:00
Daniel Han
e91706febf Update llama.py 2025-02-02 14:51:14 -08:00
Daniel Han
24b84d89ec Update utils.py 2025-02-02 13:52:15 -08:00
Daniel Han
ffe31d5cea Update llama.py 2025-02-02 13:49:25 -08:00
Daniel Han
50d5250a57 Update llama.py 2025-02-02 13:47:14 -08:00
Daniel Han
9314e46e07 Faster inference? 2025-02-02 13:45:25 -08:00
Daniel Han
14d6199e63 Update llama.py 2025-02-02 04:00:47 -08:00
Daniel Han
f6bffaee84 Update llama.py 2025-02-02 03:57:24 -08:00
Daniel Han
f5d65f6570 Update llama.py 2025-02-02 03:56:25 -08:00
Daniel Han
865b7d685f Update llama.py 2025-02-02 03:56:13 -08:00
Daniel Han
6af2af1b48 Update llama.py 2025-02-02 03:49:18 -08:00
Daniel Han
c77471114b Update llama.py 2025-02-02 02:23:53 -08:00
Daniel Han
52aeca630e Update llama.py 2025-02-02 02:18:33 -08:00
Daniel Han
69c2cd23c0 Update llama.py 2025-02-02 02:15:54 -08:00
Daniel Han
2b5250f601 Update llama.py 2025-02-02 02:14:13 -08:00
Daniel Han
17c486acf8 Update llama.py 2025-02-02 02:11:33 -08:00
Daniel Han
ceea79e3a2 Update llama.py 2025-02-02 02:10:17 -08:00
Daniel Han
c571e2395e Update llama.py 2025-02-02 02:06:13 -08:00
Daniel Han
4936049259 Torch 2.6 support 2025-02-02 00:43:26 -08:00
Daniel Han
70fa248fe7 Merge branch 'main' into nightly 2025-02-01 23:46:58 -08:00
Daniel Han
e91ae59e84 Update _utils.py 2025-02-01 23:46:40 -08:00
Daniel Han
890c9cf818 dim fix 2025-02-01 19:27:33 -08:00
Daniel Han
a75869f545 Update gemma.py 2025-02-01 19:15:55 -08:00
Daniel Han
ff7cb20f93 Update gemma.py 2025-02-01 19:11:20 -08:00
Daniel Han
c360fcd526 Update gemma.py 2025-02-01 19:02:36 -08:00
Daniel Han
147623d270 Update gemma.py 2025-02-01 17:54:00 -08:00
Daniel Han
e2b23f17b1 Mistral 24B, Qwen 2.5 VL support (#1598)
* use exact model name

* Update save.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* print

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update vision.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* accurate_accumulation

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update pyproject.toml

* Update __init__.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Fix Triton heuristics

https://github.com/triton-lang/triton/issues/5224

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Xformers

* Update loader.py

* Update loader.py

* Rewind

* Update _utils.py

* Update _utils.py

* requires grad

* Update loader.py

* Update _utils.py

* Update loader.py

* changing model to base_model if peft model is already used

* Improve debugging experience (#1512)

* Create CONTRIBUTING.md (#1472)

Creating contributing guidelines

* Update CONTRIBUTING.md

improved sentence

* Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>

* Update loader.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a8edd0931a.

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Auto change is_bfloat16_supported

* Update llama.py

* Force data-type

* Update llama.py

* All attention refactor fix (#1491)

* change initialization of n_heads, n_kv_heads, hidden_size in llama.py

* do the same for cohere, mistral, gemma2, granite

* do the same for flexattention, cohere, mistral, granite

* Update llama.py

* Update llama.py

* Update granite to work with latest post_patch methods (#1502)

* Update granite to work with latest post_patch methods

* Pass position_embeddings for granite even if transformers<4.47

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Minor fixes for granite models (#1503)

* Update granite.py

Grab residual multiplier directly from layer

* Update llama.py

Version should read >= 4.47.1 as that is the version requiring the changes

* Update granite.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* support modelscope models and datasets (#1481)

* support modelscope

* change modelscope args

* remove useless import

* remove useless import

* fix

* wip

* fix

* remove useless code

* add readme

* add some comments

* change print to raise error

* update comment

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Merge branch 'main' into nightly

* Phi 4

* Update llama.py

* Torch.Cuda Is Available Condition and Warning (#1545)

* check whether torch.cuda and triton are available
on my machine (Mac M3), CUDA was not available

* Update pyproject.toml

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update mistral.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix

* Bug fixes

* Update mapper.py

* Add dropout to granite to match HF's implementation (#1557)

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>

* Update llama.py

* Update llama.py

* Bug fixes

* fix: flash_attn_detection_error (#1556)

* fix: flash_attn_detection_error

* Update _utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mapper.py

---------

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Itsuro Tajima <tajima@georepublic.de>
Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Z <coffeevampirebusiness@gmail.com>
Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com>
Co-authored-by: AminWhat <88392440+aminwhat@users.noreply.github.com>
Co-authored-by: Zhe Zhang <2631992879@qq.com>
2025-01-31 03:34:36 -08:00
Daniel Han
0240dc2c5e Merge branch 'main' into nightly 2025-01-31 03:33:30 -08:00
Daniel Han
4f57c09ee4 Update _utils.py 2025-01-31 03:33:24 -08:00
Daniel Han
7267b22c4e Merge branch 'main' into nightly 2025-01-31 03:02:42 -08:00
Daniel Han
b5763a886d Update mapper.py 2025-01-31 03:02:37 -08:00
Michael Han
edec640658 Merge pull request #1595 from unslothai/shimmyshimmer-patch-3
Update README.md
2025-01-30 21:05:57 -08:00
Michael Han
789af5b7f9 Update README.md 2025-01-30 21:05:45 -08:00
Michael Han
69d879970a Merge pull request #1580 from unslothai/shimmyshimmer-patch-2
Update README.md
2025-01-26 14:12:10 -08:00
Michael Han
748d1f1fd0 Update README.md
Updating super old benchmarks
2025-01-26 14:11:58 -08:00
Daniel Han
d847f90a29 Fix triton.ops 2025-01-22 17:49:20 -08:00
Daniel Han
73d58170b2 move TritonOps 2025-01-22 16:56:01 -08:00
Daniel Han
021bdad687 triton.ops error 2025-01-22 16:53:35 -08:00
Daniel Han
780a799542 Update __init__.py 2025-01-22 16:46:54 -08:00
Daniel Han
5509502af4 Merge branch 'main' into nightly 2025-01-22 16:46:04 -08:00
Daniel Han
2c79a95a0d Update __init__.py 2025-01-22 16:45:41 -08:00
Daniel Han
8ea67d78ac Fix triton.ops missing Triton 3.2 2025-01-22 16:44:48 -08:00
Michael Han
065b49867c Merge pull request #1569 from unslothai/shimmyshimmer-patch-1
Update README.md
2025-01-20 22:13:30 -08:00
Michael Han
b4c3b5eea9 Update README.md 2025-01-20 22:13:07 -08:00
Daniel Han
e633d7f056 Update mapper.py 2025-01-20 08:10:20 -08:00
Daniel Han
f90bd4ec49 Fix Mistral, Qwen (#1565)
* use exact model name

* Update save.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* print

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update vision.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* accurate_accumulation

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update pyproject.toml

* Update __init__.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Fix Triton heuristics

https://github.com/triton-lang/triton/issues/5224

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Xformers

* Update loader.py

* Update loader.py

* Rewind

* Update _utils.py

* Update _utils.py

* requires grad

* Update loader.py

* Update _utils.py

* Update loader.py

* changing model to base_model if peft model is already used

* Improve debugging experience (#1512)

* Create CONTRIBUTING.md (#1472)

Creating contributing guidelines

* Update CONTRIBUTING.md

improved sentence

* Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>

* Update loader.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a8edd0931a.

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Auto change is_bfloat16_supported

* Update llama.py

* Force data-type

* Update llama.py

* All attention refactor fix (#1491)

* change initialization of n_heads, n_kv_heads, hidden_size in llama.py

* do the same for cohere, mistral, gemma2, granite

* do the same for flexattention, cohere, mistral, granite

* Update llama.py

* Update llama.py

* Update granite to work with latest post_patch methods (#1502)

* Update granite to work with latest post_patch methods

* Pass position_embeddings for granite even if transformers<4.47

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Minor fixes for granite models (#1503)

* Update granite.py

Grab residual multiplier directly from layer

* Update llama.py

Version should read >= 4.47.1 as that is the version requiring the changes

* Update granite.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* support modelscope models and datasets (#1481)

* support modelscope

* change modelscope args

* remove useless import

* remove useless import

* fix

* wip

* fix

* remove useless code

* add readme

* add some comments

* change print to raise error

* update comment

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Merge branch 'main' into nightly

* Phi 4

* Update llama.py

* Torch.Cuda Is Available Condition and Warning (#1545)

* check whether torch.cuda and triton are available
on my machine (Mac M3), CUDA was not available

* Update pyproject.toml

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update mistral.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix

* Bug fixes

* Update mapper.py

* Add dropout to granite to match HF's implementation (#1557)

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>

* Update llama.py

* Update llama.py

* Bug fixes

* fix: flash_attn_detection_error (#1556)

* fix: flash_attn_detection_error

* Update _utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

---------

Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Itsuro Tajima <tajima@georepublic.de>
Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Z <coffeevampirebusiness@gmail.com>
Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com>
Co-authored-by: AminWhat <88392440+aminwhat@users.noreply.github.com>
Co-authored-by: Zhe Zhang <2631992879@qq.com>
2025-01-20 01:27:24 -08:00
Zhe Zhang
9c5accea7b fix: flash_attn_detection_error (#1556)
* fix: flash_attn_detection_error

* Update _utils.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-01-20 01:25:31 -08:00
Daniel Han
6b90239a33 Bug fixes 2025-01-20 01:10:55 -08:00
Daniel Han
928a9c631b Update llama.py 2025-01-19 19:19:08 -08:00
Daniel Han
7995b56526 Merge branch 'main' into nightly 2025-01-19 19:18:05 -08:00
Daniel Han
b979a01ba9 Update llama.py 2025-01-19 15:24:14 -08:00
Daniel Han
db16fb3022 Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly 2025-01-19 14:03:11 -08:00
Datta Nimmaturi
f5fb462bec Add dropout to granite to match HF's implementation (#1557)
Signed-off-by: datta0 <venkatadattasainimmaturi@gmail.com>
2025-01-19 03:54:12 -08:00
Daniel Han
0adfa0bc7d Update mapper.py 2025-01-19 01:37:13 -08:00
Daniel Han
fdd0ace6fd Update issue templates 2025-01-17 00:43:11 -08:00
Daniel Han
1576396cd0 Bug fixes 2025-01-16 03:09:02 -08:00
Daniel Han
3b908c36e8 Fix 2025-01-16 01:22:13 -08:00
Daniel Han
883d793607 Update _utils.py 2025-01-16 01:18:15 -08:00
Daniel Han
77e2c4a0d7 Update _utils.py 2025-01-16 01:15:42 -08:00
Daniel Han
d806dcaf8d Update _utils.py 2025-01-16 01:10:40 -08:00
Daniel Han
d05463bcd0 Update _utils.py 2025-01-16 01:09:23 -08:00
Daniel Han
624ad17c9a Update _utils.py 2025-01-16 01:07:23 -08:00
Daniel Han
6b5bfa5147 Update mistral.py 2025-01-16 00:58:46 -08:00
Daniel Han
d996955627 Update mistral.py 2025-01-16 00:56:56 -08:00
AminWhat
6b22725df7 Torch.Cuda Is Available Condition and Warning (#1545)
* check whether torch.cuda and triton are available
on my machine (Mac M3), CUDA was not available

* Update pyproject.toml

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-01-15 23:02:23 -08:00
Daniel Han
5266a43d0b Merge branch 'main' into nightly 2025-01-15 22:55:45 -08:00
Michael Han
3c789bc88e Merge pull request #1542 from unslothai/shimmyshimmer-patch-3
Update README.md
2025-01-14 23:20:19 -08:00
Michael Han
e3162dc5bf Update README.md
Update to benchmark tables
2025-01-14 23:20:07 -08:00
Daniel Han
ef86ce5cba Update llama.py 2025-01-14 22:32:44 -08:00
Daniel Han
fb1397d926 Merge branch 'main' into nightly 2025-01-14 22:32:02 -08:00
Daniel Han
0a2c397393 Update issue templates 2025-01-14 03:13:35 -08:00
Daniel Han
64c54c284e Update bug_report.md (#1538) 2025-01-14 03:12:17 -08:00
Daniel Han
a732ae88f9 Update issue templates 2025-01-14 03:10:29 -08:00
Michael Han
2033f40135 Merge pull request #1529 from unslothai/shimmyshimmer-patch-2
Update README.md
2025-01-11 17:35:11 -08:00
Michael Han
08c330b7cc Update README.md 2025-01-11 17:34:51 -08:00
Michael Han
9569392187 Merge pull request #1515 from unslothai/shimmyshimmer-patch-1
Update README.md for Notebooks
2025-01-10 10:13:04 -08:00
Daniel Han
a72a9d9b06 Update mapper.py 2025-01-10 04:34:23 -08:00
Michael Han
db14c7f182 Update README.md 2025-01-09 16:59:43 -08:00
Michael Han
59d7cd9888 Update README.md 2025-01-08 23:02:27 -08:00
Daniel Han
e42bd98706 Update Unsloth-Zoo 2025-01-08 16:46:04 -08:00
Daniel Han
4feae9ae42 Update _utils.py 2025-01-08 15:48:40 -08:00
Daniel Han
1767be3692 Update tokenizer_utils.py 2025-01-08 15:48:11 -08:00
Daniel Han
3b4364985f Phi-4 bug fix 2025-01-08 15:40:27 -08:00
Daniel Han
6cbfca8c63 Phi-4 (#1523)
* use exact model name

* Update save.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* print

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update vision.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* accurate_accumulation

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update pyproject.toml

* Update __init__.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Fix Triton heuristics

https://github.com/triton-lang/triton/issues/5224

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Xformers

* Update loader.py

* Update loader.py

* Rewind

* Update _utils.py

* Update _utils.py

* requires grad

* Update loader.py

* Update _utils.py

* Update loader.py

* changing model to base_model if peft model is already used

* Improve debugging experience (#1512)

* Create CONTRIBUTING.md (#1472)

Creating contributing guidelines

* Update CONTRIBUTING.md

improved sentence

* Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>

* Update loader.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit b7ddf962d2.

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Auto change is_bfloat16_supported

* Update llama.py

* Force data-type

* Update llama.py

* All attention refactor fix (#1491)

* change initialization of n_heads, n_kv_heads, hidden_size in llama.py

* do the same for cohere, mistral, gemma2, granite

* do the same for flexattention, cohere, mistral, granite

* Update llama.py

* Update llama.py

* Update granite to work with latest post_patch methods (#1502)

* Update granite to work with latest post_patch methods

* Pass position_embeddings for granite even if transformers<4.47

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Minor fixes for granite models (#1503)

* Update granite.py

Grab residual multiplier directly from layer

* Update llama.py

Version should read >= 4.47.1 as that is the version requiring the changes

* Update granite.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* support modelscope models and datasets (#1481)

* support modelscope

* change modelscope args

* remove useless import

* remove useless import

* fix

* wip

* fix

* remove useless code

* add readme

* add some comments

* change print to raise error

* update comment

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Merge branch 'main' into nightly

* Phi 4

---------

Co-authored-by: Itsuro Tajima <tajima@georepublic.de>
Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Z <coffeevampirebusiness@gmail.com>
Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com>
2025-01-08 15:10:46 -08:00
Daniel Han
1820995bae Merge branch 'main' into nightly 2025-01-08 15:10:13 -08:00
Daniel Han
f77d6d608c Phi 4 2025-01-08 14:38:41 -08:00
Daniel Han
0554918864 Merge branch 'main' into nightly 2025-01-08 12:42:18 -08:00
sebaxakerhtc
71ca60c7f0 Update __init__.py (#1520)
* Update __init__.py

This PR solves the [issue](https://github.com/unslothai/unsloth/issues/1518) seen with some GPUs

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-01-07 14:51:17 -08:00
Daniel Han
d90aefea98 Update pyproject.toml 2025-01-07 04:29:09 -08:00
Daniel Han
63782ea3af Bug fixes (#1516)
* use exact model name

* Update save.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* print

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update vision.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* accurate_accumulation

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update pyproject.toml

* Update __init__.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Fix Triton heuristics

https://github.com/triton-lang/triton/issues/5224

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Xformers

* Update loader.py

* Update loader.py

* Rewind

* Update _utils.py

* Update _utils.py

* requires grad

* Update loader.py

* Update _utils.py

* Update loader.py

* changing model to base_model if peft model is already used

* Improve debugging experience (#1512)

* Create CONTRIBUTING.md (#1472)

Creating contributing guidelines

* Update CONTRIBUTING.md

improved sentence

* Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>

* Update loader.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit b7ddf962d2.

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Auto change is_bfloat16_supported

* Update llama.py

* Force data-type

* Update llama.py

* All attention refactor fix (#1491)

* change initialization of n_heads, n_kv_heads, hidden_size in llama.py

* do the same for cohere, mistral, gemma2, granite

* do the same for flexattention, cohere, mistral, granite

* Update llama.py

* Update llama.py

* Update granite to work with latest post_patch methods (#1502)

* Update granite to work with latest post_patch methods

* Pass position_embeddings for granite even if transformers<4.47

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Minor fixes for granite models (#1503)

* Update granite.py

Grab residual multiplier directly from layer

* Update llama.py

Version should read >= 4.47.1 as that is the version requiring the changes

* Update granite.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* support modelscope models and datasets (#1481)

* support modelscope

* change modelscope args

* remove useless import

* remove useless import

* fix

* wip

* fix

* remove useless code

* add readme

* add some comments

* change print to raise error

* update comment

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

---------

Co-authored-by: Itsuro Tajima <tajima@georepublic.de>
Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Z <coffeevampirebusiness@gmail.com>
Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com>
2025-01-07 04:23:14 -08:00
tastelikefeet
83421fd2b5 support modelscope models and datasets (#1481)
* support modelscope

* change modelscope args

* remove useless import

* remove useless import

* fix

* wip

* fix

* remove useless code

* add readme

* add some comments

* change print to raise error

* update comment

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-01-07 04:09:36 -08:00
Z
3cde4e1922 Minor fixes for granite models (#1503)
* Update granite.py

Grab residual multiplier directly from layer

* Update llama.py

Version should read >= 4.47.1 as that is the version requiring the changes

* Update granite.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-01-07 03:58:40 -08:00
Datta Nimmaturi
c8e9dcf4f8 Update granite to work with latest post_patch methods (#1502)
* Update granite to work with latest post_patch methods

* Pass position_embeddings for granite even if transformers<4.47

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-01-07 03:49:11 -08:00
Daniel Han
84fa1af77e Update llama.py 2025-01-07 03:39:08 -08:00
Daniel Han
b7d47c1d8a Update llama.py 2025-01-07 03:33:46 -08:00
Kareem
0e2110f7b8 All attention refactor fix (#1491)
* change initialization of n_heads, n_kv_heads, hidden_size in llama.py

* do the same for cohere, mistral, gemma2, granite

* do the same for flexattention, cohere, mistral, granite
2025-01-07 02:41:15 -08:00
Michael Han
4ce92cfe2c Update README.md
Notebook links
2025-01-07 02:02:59 -08:00
Daniel Han
a4aba47ebb Update llama.py 2025-01-07 01:56:32 -08:00
Daniel Han
0672e71b17 Force data-type 2025-01-07 01:51:49 -08:00
Daniel Han
6320381fb6 Update llama.py 2025-01-07 01:43:20 -08:00
Daniel Han
ab2b72c5f0 Auto change is_bfloat16_supported 2025-01-07 01:40:04 -08:00
Daniel Han
9e00262be6 Update llama.py 2025-01-07 01:10:41 -08:00
Daniel Han
4df3af2f57 Update llama.py 2025-01-07 01:05:04 -08:00
Daniel Han
358316522f Update llama.py 2025-01-07 01:02:35 -08:00
Daniel Han
97c3e282fb Update llama.py 2025-01-07 01:02:24 -08:00
Daniel Han
020c793a1e Update llama.py 2025-01-07 00:50:56 -08:00
Daniel Han
656099cfc3 Update llama.py 2025-01-07 00:45:22 -08:00
Daniel Han
837d620dbd Update llama.py 2025-01-07 00:41:26 -08:00
Daniel Han
c95380cc9c Update llama.py 2025-01-07 00:41:16 -08:00
Daniel Han
689ca57214 Update llama.py 2025-01-07 00:38:04 -08:00
Daniel Han
dc33cc94a7 Update llama.py 2025-01-07 00:34:46 -08:00
Daniel Han
f791766ab9 Update llama.py 2025-01-07 00:34:09 -08:00
Daniel Han
a7740ba8e9 Update llama.py 2025-01-07 00:30:32 -08:00
Daniel Han
883c25d34c Merge branch 'pr/1509' into nightly 2025-01-06 22:08:07 -08:00
Daniel Han
c4720f1baf Update llama.py 2025-01-06 22:06:00 -08:00
Daniel Han
294cd8ea32 Revert "Update llama.py"
This reverts commit a8edd0931a.
2025-01-06 22:05:44 -08:00
Daniel Han
a8edd0931a Update llama.py 2025-01-06 22:05:14 -08:00
Daniel Han
0f6b518ee1 Update llama.py 2025-01-06 18:56:26 -08:00
Daniel Han
adb2dcfd2b Update loader.py 2025-01-06 18:13:48 -08:00
Edd
9940583287 Improve debugging experience (#1512)
* Create CONTRIBUTING.md (#1472)

Creating contributing guidelines

* Update CONTRIBUTING.md

improved sentence

* Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
2025-01-06 18:04:27 -08:00
Daniel Han
8cf3e6fa2b Merge branch 'main' into nightly 2025-01-06 18:03:53 -08:00
Muhammad Osama
6b9e11bdf3 changing model to base_model if peft model is already used 2025-01-05 18:18:42 -06:00
Michael Han
48627f876c Merge pull request #1507 from NinoRisteski/patch-1
Update CONTRIBUTING.md
2025-01-05 01:56:40 -08:00
Nino Risteski
8063abc004 Update CONTRIBUTING.md
improved sentence
2025-01-05 09:24:10 +01:00
Michael Han
fb49390494 Create CONTRIBUTING.md (#1472)
Creating contributing guidelines
2025-01-04 22:09:25 -08:00
Daniel Han
d08c8afd6c Merge branch 'pr/1339' into nightly 2025-01-04 22:08:00 -08:00
Daniel Han
3e1c5ec3a0 Update loader.py 2025-01-04 22:03:11 -08:00
Daniel Han
c697d6d01a Update _utils.py 2025-01-04 00:42:39 -08:00
Daniel Han
5cf47b3e63 Update loader.py 2025-01-02 22:44:17 -08:00
Daniel Han
75ffad921f requires grad 2025-01-02 18:25:25 -08:00
Daniel Han
69017405db Update _utils.py 2025-01-01 22:34:40 -08:00
Daniel Han
163ef43181 Update _utils.py 2025-01-01 22:32:58 -08:00
Daniel Han
f7a322e5d5 Rewind 2025-01-01 16:34:18 -08:00
Daniel Han
16b42fd5e2 Update loader.py 2025-01-01 16:27:08 -08:00
Daniel Han
b3dec6af35 Update loader.py 2025-01-01 02:06:41 -08:00
Daniel Han
f2ff798c4e Xformers 2024-12-31 22:42:28 -08:00
Daniel Han
48c743d508 Update __init__.py 2024-12-31 12:35:56 -08:00
Daniel Han
6e05c84a26 Update __init__.py 2024-12-31 12:31:30 -08:00
Daniel Han
c83f5422a1 Update __init__.py 2024-12-31 00:38:18 -08:00
Daniel Han
fddf14ebc8 Update __init__.py 2024-12-30 14:13:49 -08:00
Daniel Han
5c439a0bd6 Fix Triton heuristics
https://github.com/triton-lang/triton/issues/5224
2024-12-30 13:52:42 -08:00
Daniel Han
9611ee433e Update __init__.py 2024-12-29 19:36:11 -08:00
Daniel Han
7c74db901b Update __init__.py 2024-12-29 19:35:05 -08:00
Daniel Han
10a7d4fc93 Update pyproject.toml 2024-12-29 19:34:30 -08:00
Daniel Han
4b7aa371fa Update __init__.py 2024-12-29 19:29:19 -08:00
Daniel Han
87e8b675e5 Merge branch 'main' into nightly 2024-12-29 03:58:02 -08:00
Daniel Han
408563debc Update _utils.py 2024-12-29 03:57:58 -08:00
Daniel Han
e254125954 Bug fixes (#1484)
* Update save.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* print

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update vision.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* accurate_accumulation

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update pyproject.toml
2024-12-29 03:57:31 -08:00
Daniel Han
4b9c6110b1 Update pyproject.toml 2024-12-29 03:57:21 -08:00
Daniel Han
19beafcebd Update loader.py 2024-12-29 03:53:46 -08:00
Daniel Han
47879d01ae Update loader.py 2024-12-28 21:28:49 -08:00
Daniel Han
ea1286faf4 Update loader.py 2024-12-28 18:02:00 -08:00
Daniel Han
f98b98ae46 Update loader.py 2024-12-28 03:29:58 -08:00
Daniel Han
5930e5e8e2 Update _utils.py 2024-12-28 03:24:18 -08:00
Daniel Han
3402f7403d Update loader.py 2024-12-28 03:21:41 -08:00
Daniel Han
5395a053c0 Update loader.py 2024-12-28 03:12:03 -08:00
Daniel Han
aacc8e227b accurate_accumulation 2024-12-28 03:11:53 -08:00
Daniel Han
6cdf1868ce Update loader.py 2024-12-27 01:02:40 -08:00
Daniel Han
7fdcb7f24d Update _utils.py 2024-12-27 00:59:16 -08:00
Daniel Han
e1880e2b76 Update _utils.py 2024-12-27 00:54:12 -08:00
Daniel Han
8ac2d07946 Update _utils.py 2024-12-27 00:33:41 -08:00
Daniel Han
05d975a591 Update _utils.py 2024-12-26 23:45:19 -08:00
Daniel Han
9e1004f95c Update _utils.py 2024-12-26 23:41:44 -08:00
Daniel Han
59ffd06268 Update _utils.py 2024-12-26 23:37:25 -08:00
Daniel Han
9837ec964d Update _utils.py 2024-12-26 23:35:15 -08:00
Daniel Han
414d55cf89 Update _utils.py 2024-12-26 22:25:19 -08:00
Daniel Han
2ab4dca36c Update vision.py 2024-12-26 21:39:38 -08:00
Daniel Han
4da5306917 Update _utils.py 2024-12-26 19:30:49 -08:00
Daniel Han
1f81f9f5ab Update llama.py 2024-12-26 19:16:58 -08:00
Daniel Han
19cb433a7b Update _utils.py 2024-12-26 19:12:13 -08:00
Daniel Han
f1f390fd88 Update _utils.py 2024-12-26 19:12:02 -08:00
Daniel Han
9197ce07cb print 2024-12-26 18:45:16 -08:00
Daniel Han
bbe37dcc51 Update _utils.py 2024-12-26 18:21:30 -08:00
Daniel Han
03c889dc21 Update _utils.py 2024-12-26 18:19:45 -08:00
Daniel Han
de96d480e0 Update _utils.py 2024-12-26 18:12:52 -08:00
Daniel Han
734c5338dc Update _utils.py 2024-12-26 17:11:22 -08:00
Daniel Han
264b85da26 Update save.py 2024-12-26 17:07:25 -08:00
Daniel Han
da4741bbef Update pyproject.toml 2024-12-26 04:12:46 -08:00
Daniel Han
17b591675a Update _utils.py 2024-12-26 04:05:07 -08:00
Daniel Han
47f210922d Update pyproject.toml 2024-12-26 04:04:23 -08:00
Daniel Han
30bb143a87 Update pyproject.toml 2024-12-26 03:26:01 -08:00
Daniel Han
25d7b25ab5 Merge branch 'main' into nightly 2024-12-26 01:44:31 -08:00
Daniel Han
c8d2f5b8da Update save.py 2024-12-26 01:44:12 -08:00
Daniel Han
160ff801f7 Bug fixes (#1473)
* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts. For a user, at least in a Jupyter notebook, this allows selecting a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first in order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os.environ setting is required.
A small change, but a bit of a time saver for those who copy the tutorials directly.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to now support a system message instead. It is supposed to fix the Ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Feat/kto (#1316)

* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Fix orpo/dpo trainer (#1286)

* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* skip modules

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Fix llama.cpp

* Update save.py

* Update save.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update mapper.py

* modules

* Fix vision model tokenizer padding side. (#1384)

* Dynamic quants (#1379)

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts. For a user, at least in a Jupyter notebook, this allows selecting a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first in order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os.environ setting is required.
A small change, but a bit of a time saver for those who copy the tutorials directly.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to now support a system message instead. It is supposed to fix the Ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Feat/kto (#1316)

* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Fix orpo/dpo trainer (#1286)

* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* skip modules

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Fix llama.cpp

* Update save.py

* Update save.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update mapper.py

* modules

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>

* Update README.md

Unsloth Dynamic 4-bit Quantization Update

* Fix vision model tokenizer padding side.

* Update vision.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Add citation section to README.md (#1377)

* Add citation section to README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Granite support (#1218)

* [WIP] Support for Granite

* Fixup inference

* Cleanup flex attention

* remove sliding window

* Use torch.add for residual multiplier

* Llama 3.3

* Update llama.py

* Update llama.py

* fullgraph

* Fix loader.py to work on Windows (#1453)

* Update README.md

Llama 3.3 + Reddit

* Update README.md

Apple ML Cross Entropy

* Update README.md

Removing double citation

* Fix loader.py to work on Windows

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Update save.py warning message (#1425)

* Update README.md

Llama 3.3 + Reddit

* Update README.md

Apple ML Cross Entropy

* Update README.md

Removing double citation

* Update save.py warning message

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Change _fix_chat_template in case a template has both endif and endfor (#1388)

* Update llama and derivatives to pass position embeddings explicitly for transformers v4.47+ (#1442)

* Update save.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Temp fix

* Update _utils.py

* Update _utils.py

* Update pyproject.toml

* Name Error Bug Fix - import from packaging.version import Version (#1468)

* Version

* Update pyproject.toml

* Update pyproject.toml

* Version

* Update pyproject.toml

* Update pyproject.toml

* dependencies

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update mistral.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update granite.py

* Update cohere.py

* Triton windows

* Update gemma2.py

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

* Residual & LoRA

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Bug fix

* Update loader.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
Co-authored-by: Zewen Shen <zewen.public@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Scott Phillips <polygonguru@gmail.com>
Co-authored-by: qingy1337 <qxli2@students.everettcc.edu>
Co-authored-by: Giulia Baldini <44327645+giuliabaldini@users.noreply.github.com>
Co-authored-by: Yonghye Kwon <developer.0hye@gmail.com>
2024-12-26 01:35:19 -08:00
Daniel Han
ae636f21c9 Update loader.py 2024-12-26 01:32:56 -08:00
Daniel Han
16e7cbf989 Update _utils.py 2024-12-26 01:32:00 -08:00
Daniel Han
c767f5adb1 Merge branch 'main' into nightly 2024-12-26 01:31:22 -08:00
Daniel Han
db2b947223 Update loader.py 2024-12-26 01:29:07 -08:00
Daniel Han
c933c0c30f Update loader.py 2024-12-26 01:26:27 -08:00
Daniel Han
e74fd06280 Update loader.py 2024-12-26 01:23:02 -08:00
Daniel Han
ea04b72532 Bug fix 2024-12-26 01:22:06 -08:00
Daniel Han
e39365018f Update loader.py 2024-12-26 00:33:15 -08:00
Daniel Han
b3da38237c Update loader.py 2024-12-25 23:57:42 -08:00
Daniel Han
d6f27f3084 Update loader.py 2024-12-25 23:49:40 -08:00
Daniel Han
6f2b7be36a Update loader.py 2024-12-25 23:01:54 -08:00
Daniel Han
488a649af0 Residual & LoRA 2024-12-25 23:01:34 -08:00
Daniel Han
a20b380d5e Bug Fixes (#1470)
* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added an os.environ setting to the example to avoid device conflicts. In a Jupyter notebook, at least, this lets a user select a GPU in a multi-GPU setup.
The unsloth init currently checks all GPUs and takes the first in the order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os config is required.
A small change, but a time saver for those who copy the tutorials straight away.
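
The GPU selection described above can be sketched as follows. This is a minimal illustration, not the README's actual snippet: it assumes the variable in question is `CUDA_VISIBLE_DEVICES`, and the helper name `select_gpu` and the index `1` are hypothetical.

```python
import os

# Assumption: the commit message only says "os.environ"; CUDA_VISIBLE_DEVICES
# is the standard CUDA variable for hiding GPUs from a process.
def select_gpu(index: int) -> str:
    """Restrict this process to a single GPU.

    Must be called before torch or unsloth is imported, because CUDA reads
    these variables when it is first initialised.
    """
    # Make the indices follow the PCI bus order reported by nvidia-smi.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]

visible = select_gpu(1)  # hypothetical: pick the second GPU
# Import unsloth only after this point so it sees just the chosen device.
```

Setting the variable at the top of the notebook, before any GPU library is imported, is what keeps the process from picking a device that is already busy.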

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to support a system message instead. It is supposed to fix the ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Feat/kto (#1316)

* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Fix orpo/dpo trainer  (#1286)

* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* skip modules

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Fix llama.cpp

* Update save.py

* Update save.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update mapper.py

* modules

* Fix vision model tokenizer padding side. (#1384)

* Dynamic quants (#1379)

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added an os.environ setting to the example to avoid device conflicts. In a Jupyter notebook, at least, this lets a user select a GPU in a multi-GPU setup.
The unsloth init currently checks all GPUs and takes the first in the order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os config is required.
A small change, but a time saver for those who copy the tutorials straight away.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to support a system message instead. It is supposed to fix the ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Feat/kto (#1316)

* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Fix orpo/dpo trainer  (#1286)

* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* skip modules

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Fix llama.cpp

* Update save.py

* Update save.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update mapper.py

* modules

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>

* Update README.md

Unsloth Dynamic 4-bit Quantization Update

* Fix vision model tokenizer padding side.

* Update vision.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Add citation section to README.md (#1377)

* Add citation section to README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Granite support (#1218)

* [WIP] Support for Granite

* Fixup inference

* Cleanup flex attention

* remove sliding window

* Use torch.add for residual multiplier

* Llama 3.3

* Update llama.py

* Update llama.py

* fullgraph

* Fix loader.py to work on Windows (#1453)

* Update README.md

Llama 3.3 + Reddit

* Update README.md

Apple ML Cross Entropy

* Update README.md

Removing double citation

* Fix loader.py to work on Windows

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Update save.py warning message (#1425)

* Update README.md

Llama 3.3 + Reddit

* Update README.md

Apple ML Cross Entropy

* Update README.md

Removing double citation

* Update save.py warning message

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Change _fix_chat_template in case a template has both endif and endfor (#1388)

* Update llama and derivatives to pass position embeddings explicitly for transformers v4.47+ (#1442)

* Update save.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Temp fix

* Update _utils.py

* Update _utils.py

* Update pyproject.toml

* Name Error Bug Fix - import from packaging.version import Version (#1468)

* Version

* Update pyproject.toml

* Update pyproject.toml

* Version

* Update pyproject.toml

* Update pyproject.toml

* dependencies

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update mistral.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update granite.py

* Update cohere.py

* Triton windows

* Update gemma2.py

* Update pyproject.toml

* Update _utils.py

* Update pyproject.toml

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
Co-authored-by: Zewen Shen <zewen.public@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Scott Phillips <polygonguru@gmail.com>
Co-authored-by: qingy1337 <qxli2@students.everettcc.edu>
Co-authored-by: Giulia Baldini <44327645+giuliabaldini@users.noreply.github.com>
Co-authored-by: Yonghye Kwon <developer.0hye@gmail.com>
2024-12-24 03:37:03 -08:00
Daniel Han
258d8135e7 Update pyproject.toml 2024-12-24 03:35:04 -08:00
Daniel Han
b705130e59 Update _utils.py 2024-12-24 03:34:54 -08:00
Daniel Han
fd2295d6a6 Update pyproject.toml 2024-12-24 01:55:04 -08:00
Daniel Han
1d04daaad8 Update gemma2.py 2024-12-24 00:18:47 -08:00
Daniel Han
77ca6e2dca Triton windows 2024-12-24 00:17:33 -08:00
Daniel Han
739509e3a7 Update cohere.py 2024-12-24 00:11:17 -08:00
Daniel Han
1f8be5012e Update granite.py 2024-12-24 00:09:23 -08:00
Daniel Han
f97f49b08d Update pyproject.toml 2024-12-24 00:08:49 -08:00
Daniel Han
660df7b9ab Update pyproject.toml 2024-12-24 00:08:03 -08:00
Daniel Han
61147bfbad Update pyproject.toml 2024-12-24 00:07:37 -08:00
Daniel Han
c728dd9e87 Update mistral.py 2024-12-24 00:04:36 -08:00
Daniel Han
6e6e65bd46 Update pyproject.toml 2024-12-24 00:02:30 -08:00
Daniel Han
e6d50c8839 Update pyproject.toml 2024-12-23 22:49:16 -08:00
Daniel Han
5323157160 Update pyproject.toml 2024-12-23 22:47:20 -08:00
Daniel Han
fba4cf7fb4 Update pyproject.toml 2024-12-23 22:47:11 -08:00
Daniel Han
753b954702 dependencies 2024-12-23 22:27:35 -08:00
Daniel Han
c939334570 Update pyproject.toml 2024-12-23 21:53:43 -08:00
Daniel Han
01b256b4cf Update pyproject.toml 2024-12-23 21:52:50 -08:00
Daniel Han
4cc97b23cc Version 2024-12-23 21:43:32 -08:00
Daniel Han
2f969fa137 Update pyproject.toml 2024-12-23 21:16:30 -08:00
Daniel Han
25c5b0524d Update pyproject.toml 2024-12-23 21:14:01 -08:00
Daniel Han
69dc1ad694 Version 2024-12-23 21:07:51 -08:00
Yonghye Kwon
cecbb08c03 Name Error Bug Fix - import from packaging.version import Version (#1468) 2024-12-22 23:22:36 -08:00
Daniel Han
e8ae4727d0 Update pyproject.toml 2024-12-21 01:24:09 -08:00
Daniel Han
14886235e3 Update _utils.py 2024-12-21 01:11:30 -08:00
Daniel Han
6181fb7126 Merge branch 'main' into nightly 2024-12-20 03:30:58 -08:00
Daniel Han
42b624bb77 Bug fix 2024-12-20 03:30:01 -08:00
Daniel Han
97ff8efaca Merge branch 'main' into nightly 2024-12-20 03:26:58 -08:00
Daniel Han
87b1ce2824 Typo 2024-12-20 03:26:55 -08:00
Daniel Han
d5830e9c2f Merge branch 'main' into nightly 2024-12-20 03:24:37 -08:00
Daniel Han
703e8b96f1 Typo 2024-12-20 03:24:17 -08:00
Daniel Han
6bf5d8d626 Bug fixes (#1458)
* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added an os.environ setting to the example to avoid device conflicts. In a Jupyter notebook, at least, this lets a user select a GPU in a multi-GPU setup.
The unsloth init currently checks all GPUs and takes the first in the order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os config is required.
A small change, but a time saver for those who copy the tutorials straight away.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to support a system message instead. It is supposed to fix the ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Feat/kto (#1316)

* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Fix orpo/dpo trainer  (#1286)

* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* skip modules

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Fix llama.cpp

* Update save.py

* Update save.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update mapper.py

* modules

* Fix vision model tokenizer padding side. (#1384)

* Dynamic quants (#1379)

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added an os.environ setting to the example to avoid device conflicts. In a Jupyter notebook, at least, this lets a user select a GPU in a multi-GPU setup.
The unsloth init currently checks all GPUs and takes the first in the order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os config is required.
A small change, but a time saver for those who copy the tutorials straight away.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to support a system message instead. It is supposed to fix the ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Feat/kto (#1316)

* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Fix orpo/dpo trainer  (#1286)

* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* skip modules

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Fix llama.cpp

* Update save.py

* Update save.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update mapper.py

* modules

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>

* Update README.md

Unsloth Dynamic 4-bit Quantization Update

* Fix vision model tokenizer padding side.

* Update vision.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Add citation section to README.md (#1377)

* Add citation section to README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Granite support (#1218)

* [WIP] Support for Granite

* Fixup inference

* Cleanup flex attention

* remove sliding window

* Use torch.add for residual multiplier

* Llama 3.3

* Update llama.py

* Update llama.py

* fullgraph

* Fix loader.py to work on Windows (#1453)

* Update README.md

Llama 3.3 + Reddit

* Update README.md

Apple ML Cross Entropy

* Update README.md

Removing double citation

* Fix loader.py to work on Windows

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Update save.py warning message (#1425)

* Update README.md

Llama 3.3 + Reddit

* Update README.md

Apple ML Cross Entropy

* Update README.md

Removing double citation

* Update save.py warning message

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Change _fix_chat_template in case a template has both endif and endfor (#1388)

* Update llama and derivatives to pass position embeddings explicitly for transformers v4.47+ (#1442)

* Update save.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Temp fix

* Update _utils.py

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
Co-authored-by: Zewen Shen <zewen.public@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Scott Phillips <polygonguru@gmail.com>
Co-authored-by: qingy1337 <qxli2@students.everettcc.edu>
Co-authored-by: Giulia Baldini <44327645+giuliabaldini@users.noreply.github.com>
2024-12-20 03:09:59 -08:00
Daniel Han
f5bcf31169 Update _utils.py 2024-12-20 03:08:59 -08:00
Daniel Han
58bcf66d27 Temp fix 2024-12-20 03:07:04 -08:00
Daniel Han
2afe47565a Update llama.py 2024-12-20 03:03:26 -08:00
Daniel Han
69728ffd14 Update llama.py 2024-12-20 03:01:28 -08:00
Daniel Han
cf8ec670ed Update llama.py 2024-12-20 02:59:27 -08:00
Daniel Han
05c99a207b Update llama.py 2024-12-20 02:53:45 -08:00
Daniel Han
faa6825049 Update llama.py 2024-12-20 02:50:23 -08:00
Daniel Han
3d8fc11d96 Update llama.py 2024-12-20 02:47:32 -08:00
Daniel Han
7e5a6ffab1 Update mistral.py 2024-12-20 02:46:29 -08:00
Daniel Han
3019b5967b Update llama.py 2024-12-20 02:45:43 -08:00
Daniel Han
57b8ddf21f Update save.py 2024-12-20 02:40:42 -08:00
Datta Nimmaturi
8e5d68286e Update llama and derivatives to pass position embeddings explicitly for transformers v4.47+ (#1442) 2024-12-20 02:35:42 -08:00
Giulia Baldini
f3e6a28c3f Change _fix_chat_template in case a template has both endif and endfor (#1388) 2024-12-20 02:23:30 -08:00
qingy1337
47cee7fd7e Update save.py warning message (#1425)
* Update README.md

Llama 3.3 + Reddit

* Update README.md

Apple ML Cross Entropy

* Update README.md

Removing double citation

* Update save.py warning message

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-12-20 02:22:27 -08:00
Scott Phillips
104eeac1db Fix loader.py to work on Windows (#1453)
* Update README.md

Llama 3.3 + Reddit

* Update README.md

Apple ML Cross Entropy

* Update README.md

Removing double citation

* Fix loader.py to work on Windows

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-12-20 02:20:15 -08:00
Daniel Han
d9ed4bef09 fullgraph 2024-12-12 01:14:27 -08:00
Michael Han
6dd9d4e2b1 Merge pull request #1412 from unslothai/shimmyshimmer-patch-5
Update README.md
2024-12-10 14:43:50 -08:00
Michael Han
4aa34a0afa Update README.md
Removing double citation
2024-12-10 14:43:27 -08:00
Michael Han
0b25173b10 Merge pull request #1411 from unslothai/shimmyshimmer-patch-4
Update README.md
2024-12-10 12:15:21 -08:00
Michael Han
18a830c971 Update README.md
Apple ML Cross Entropy
2024-12-10 12:15:03 -08:00
Daniel Han
b9fe2588e5 Update llama.py 2024-12-10 02:46:14 -08:00
Daniel Han
67d1e9eb50 Update llama.py 2024-12-09 14:19:56 -08:00
Michael Han
4eec751ba7 Merge pull request #1401 from unslothai/shimmyshimmer-patch-3
Update README.md
2024-12-07 14:25:05 -08:00
Michael Han
89e3ddcb55 Update README.md
Llama 3.3 + Reddit
2024-12-07 14:24:48 -08:00
Daniel Han
e79b11d31d Merge branch 'main' into nightly 2024-12-07 00:15:56 -08:00
Daniel Han
39de04dbdd Update _utils.py 2024-12-07 00:15:48 -08:00
Daniel Han
0e0d8fc322 Llama 3.3 (#1393)
* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts; for a user, at least in a Jupyter notebook, this allows selecting a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first in order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os.environ setting is required.
A small change, but a time saver for those who copy the tutorials directly.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to now support system message instead. It is supposed to fix ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Feat/kto (#1316)

* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Fix orpo/dpo trainer  (#1286)

* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* skip modules

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Fix llama.cpp

* Update save.py

* Update save.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update mapper.py

* modules

* Fix vision model tokenizer padding side. (#1384)

* Dynamic quants (#1379)

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts; for a user, at least in a Jupyter notebook, this allows selecting a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first in order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os.environ setting is required.
A small change, but a time saver for those who copy the tutorials directly.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to now support system message instead. It is supposed to fix ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Feat/kto (#1316)

* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Fix orpo/dpo trainer  (#1286)

* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* skip modules

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Fix llama.cpp

* Update save.py

* Update save.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update mapper.py

* modules

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>

* Update README.md

Unsloth Dynamic 4-bit Quantization Update

* Fix vision model tokenizer padding side.

* Update vision.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Add citation section to README.md (#1377)

* Add citation section to README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Granite support (#1218)

* [WIP] Support for Granite

* Fixup inference

* Cleanup flex attention

* remove sliding window

* Use torch.add for residual multiplier

* Llama 3.3

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
Co-authored-by: Zewen Shen <zewen.public@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-12-06 13:05:15 -08:00
Daniel Han
11f901a19e Llama 3.3 2024-12-06 12:25:08 -08:00
Datta Nimmaturi
0c6813df2f Granite support (#1218)
* [WIP] Support for Granite

* Fixup inference

* Cleanup flex attention

* remove sliding window

* Use torch.add for residual multiplier
2024-12-05 00:01:53 -08:00
Edd
eaee5ddfa9 Add citation section to README.md (#1377)
* Add citation section to README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-12-04 23:59:13 -08:00
Zewen Shen
a0377c529d Fix vision model tokenizer padding side. (#1384)
* Dynamic quants (#1379)

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts; for a user, at least in a Jupyter notebook, this allows selecting a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first in order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os.environ setting is required.
A small change, but a time saver for those who copy the tutorials directly.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to now support system message instead. It is supposed to fix ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Feat/kto (#1316)

* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Fix orpo/dpo trainer (#1286)

* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* skip modules

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Fix llama.cpp

* Update save.py

* Update save.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update mapper.py

* modules

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>

* Update README.md

Unsloth Dynamic 4-bit Quantization Update

* Fix vision model tokenizer padding side.

* Update vision.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-12-04 23:54:18 -08:00
Daniel Han
80dc5bd26f Merge branch 'main' into nightly 2024-12-04 23:53:08 -08:00
Michael Han
4ecfdb5450 Merge pull request #1383 from unslothai/shimmyshimmer-patch-2
Update README.md
2024-12-04 21:32:36 -08:00
Michael Han
da7cdb2c8c Update README.md
Unsloth Dynamic 4-bit Quantization Update
2024-12-04 21:32:23 -08:00
Daniel Han
35ca26c898 Dynamic quants (#1379)
* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts. For a user, at least in a Jupyter notebook, this allows selecting a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first in order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os config is required.
A small change, but a time saver for those who copy the tutorials straight away.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to now support system message instead. It is supposed to fix the ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Feat/kto (#1316)

* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Fix orpo/dpo trainer (#1286)

* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* skip modules

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Fix llama.cpp

* Update save.py

* Update save.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update mapper.py

* modules

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
2024-12-04 05:38:05 -08:00
Daniel Han
9d6ab2ce78 modules 2024-12-04 04:26:36 -08:00
Daniel Han
f5bdfee1a3 Update mapper.py 2024-12-04 02:36:42 -08:00
Daniel Han
a8a00edbda Merge branch 'main' into nightly 2024-12-04 02:36:37 -08:00
Daniel Han
e7edb9b339 Fix llama.cpp GGUF (#1375)
* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts. For a user, at least in a Jupyter notebook, this allows selecting a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first in order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os config is required.
A small change, but a time saver for those who copy the tutorials straight away.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to now support system message instead. It is supposed to fix the ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Feat/kto (#1316)

* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Fix orpo/dpo trainer (#1286)

* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* skip modules

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Fix llama.cpp

* Update save.py

* Update save.py

* Update vision.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
Co-authored-by: cell-dame <122996026+dame-cell@users.noreply.github.com>
2024-12-03 17:29:59 -08:00
Daniel Han
c802979c4d Merge branch 'main' into nightly 2024-12-03 17:29:24 -08:00
Daniel Han
b807f3fd2d Update save.py 2024-12-03 17:28:46 -08:00
Daniel Han
31e472ec9e Update save.py 2024-12-03 17:11:10 -08:00
Daniel Han
b3585ba73b Update _utils.py 2024-12-03 17:01:16 -08:00
Daniel Han
16e475bbd9 Update save.py 2024-12-03 16:59:56 -08:00
Daniel Han
933cfe1ccf Update save.py 2024-12-03 16:57:32 -08:00
Michael Han
a94f1548f9 Merge pull request #1374 from unslothai/shimmyshimmer-patch-1
Update README.md
2024-12-03 16:52:21 -08:00
Michael Han
16cf998173 Update README.md
Fixing Qwen links
2024-12-03 16:50:52 -08:00
Daniel Han
dfcbb8ac26 Update save.py 2024-12-03 16:40:43 -08:00
Daniel Han
8311b13827 Update save.py 2024-12-03 16:40:36 -08:00
Daniel Han
f4f5f32e85 Update save.py 2024-12-03 16:35:38 -08:00
Daniel Han
c09c7ab3a7 Update save.py 2024-12-03 16:34:15 -08:00
Daniel Han
79fa6d3829 Update save.py 2024-12-03 16:25:40 -08:00
Daniel Han
c3b3d3bd03 Update vision.py 2024-12-03 16:25:13 -08:00
Daniel Han
9953ab1593 Update save.py 2024-12-03 16:24:23 -08:00
Daniel Han
853e7c3687 Update save.py 2024-12-03 16:16:28 -08:00
Daniel Han
133772a416 Fix llama.cpp 2024-12-03 16:03:38 -08:00
Daniel Han
e0908e0d30 Update llama.py 2024-12-01 02:36:17 -08:00
Daniel Han
d7d8591f83 Update llama.py 2024-12-01 02:30:21 -08:00
Daniel Han
f730e997b6 Update llama.py 2024-12-01 02:29:11 -08:00
Daniel Han
51bf5eae95 Update llama.py 2024-12-01 02:25:48 -08:00
Daniel Han
4620e76e3d Update llama.py 2024-12-01 02:22:21 -08:00
Daniel Han
f67a062010 Update llama.py 2024-12-01 02:20:54 -08:00
Daniel Han
8f14160dbb Update llama.py 2024-12-01 02:19:07 -08:00
Daniel Han
d44a8e0bdd Update llama.py 2024-12-01 02:15:21 -08:00
Daniel Han
479b4824dc Update llama.py 2024-12-01 02:13:34 -08:00
Daniel Han
f4b8710843 Update llama.py 2024-12-01 02:10:43 -08:00
Daniel Han
a45d642641 Update llama.py 2024-12-01 02:02:31 -08:00
Daniel Han
ec30c12cbb Update vision.py 2024-11-28 00:02:13 -08:00
Daniel Han
aa6ef77fad skip modules 2024-11-28 00:01:25 -08:00
Daniel Han
a823352381 Merge branch 'main' into nightly 2024-11-26 16:40:05 -08:00
cell-dame
5eeb53fb42 Fix orpo/dpo trainer (#1286)
* change the colab notebook for dpo zephyr and orpo

* use original tokenizer

* Update README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-11-26 14:32:06 -08:00
Edd
ae7aabd648 Feat/kto (#1316)
* Add PatchKTOTrainer and update model imports

* Update dpo.py

* Update __init__.py

* Delete unsloth/models/kto.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-11-26 14:22:02 -08:00
Daniel Han
8144766f78 Update pyproject.toml 2024-11-26 03:29:59 -08:00
Daniel Han
67fd43f6f5 Bug fixes for vision (#1340)
* Update __init__.py

* Update __init__.py

* Patching

* Update cross_entropy_loss.py

* CE Loss

* Update _utils.py

* Update _utils.py

* CE Loss

* Update _utils.py

* Update _utils.py

* Layernorm

* Update _utils.py

* Update _utils.py

* Post patch

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts. For a user, at least in a Jupyter notebook, this allows selecting a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first in order, which can be an issue when some GPUs are in use and the list still shows them. To avoid this manually, this os config is required.
A small change, but a time saver for those who copy the tutorials straight away.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to now support system message instead. It is supposed to fix the ollama tokenizer chat template to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

* Update loader.py

* kwargs

* logits

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* error

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update llama.py

* Update vision.py

* Update loader.py

* Old torch versions

* Update loader.py

* Update loader.py

* prints

* recheck

* Update loader.py

* Update loader.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

---------

Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
2024-11-26 03:25:55 -08:00
Itsuro Tajima
4cae1aa0be use exact model name 2024-11-26 20:20:34 +09:00
Daniel Han
0f1fa094c7 Update mapper.py 2024-11-26 03:20:21 -08:00
Daniel Han
3c598843b6 Update _utils.py 2024-11-26 03:18:09 -08:00
Daniel Han
bba36b7d5e Update _utils.py 2024-11-26 03:09:32 -08:00
Daniel Han
41e6f94c03 Update loader.py 2024-11-26 03:07:50 -08:00
Daniel Han
aa17dbedbd Update loader.py 2024-11-26 02:58:16 -08:00
Daniel Han
59da965df7 recheck 2024-11-26 02:52:19 -08:00
Daniel Han
9d8bb08270 prints 2024-11-26 00:47:27 -08:00
Daniel Han
a02de20cb8 Update loader.py 2024-11-26 00:35:05 -08:00
Daniel Han
73bc801914 Update loader.py 2024-11-26 00:34:51 -08:00
Daniel Han
14920eb3eb Old torch versions 2024-11-26 00:26:37 -08:00
Daniel Han
bc028b0dd1 Update loader.py 2024-11-26 00:18:00 -08:00
Daniel Han
485efdcd13 Update vision.py 2024-11-26 00:16:33 -08:00
Daniel Han
aaba695e00 Update llama.py 2024-11-26 00:08:15 -08:00
Daniel Han
e59a427aee Update loader.py 2024-11-26 00:01:13 -08:00
Daniel Han
77a2e3dda5 Update _utils.py 2024-11-25 23:39:26 -08:00
Daniel Han
4017c470b2 Update _utils.py 2024-11-25 23:35:36 -08:00
Daniel Han
66253a0007 Update _utils.py 2024-11-25 23:33:19 -08:00
Daniel Han
90618c4304 Update _utils.py 2024-11-25 23:03:35 -08:00
Daniel Han
b72cdbd5dd Update _utils.py 2024-11-25 22:56:24 -08:00
Daniel Han
946a46a618 Update _utils.py 2024-11-25 22:52:57 -08:00
Daniel Han
2bce754b23 Update _utils.py 2024-11-25 22:37:02 -08:00
Daniel Han
cde73c424c Update _utils.py 2024-11-25 22:35:55 -08:00
Daniel Han
61fbf5757a Update _utils.py 2024-11-25 22:33:18 -08:00
Daniel Han
ad5ca0d59f Update _utils.py 2024-11-25 22:32:27 -08:00
Daniel Han
1c7a2bbe99 Update _utils.py 2024-11-25 22:31:09 -08:00
Daniel Han
f395434291 Update _utils.py 2024-11-25 22:27:49 -08:00
Daniel Han
6e7e6a52ef Update _utils.py 2024-11-25 22:20:16 -08:00
Daniel Han
5d08a89f36 Update _utils.py 2024-11-25 22:18:33 -08:00
Daniel Han
41008f7ece Update _utils.py 2024-11-25 22:17:20 -08:00
Daniel Han
307fd67a83 error 2024-11-25 22:12:22 -08:00
Daniel Han
0487293f4c Update _utils.py 2024-11-25 22:03:08 -08:00
Daniel Han
7188082852 Update _utils.py 2024-11-25 21:59:40 -08:00
Daniel Han
4475041c36 Update _utils.py 2024-11-25 21:57:56 -08:00
Daniel Han
7c298a79ed Update llama.py 2024-11-25 21:52:42 -08:00
Daniel Han
b8631b7bd8 Update llama.py 2024-11-25 21:46:36 -08:00
Daniel Han
549c5be61c Update llama.py 2024-11-25 21:39:19 -08:00
Daniel Han
41878b581f logits 2024-11-25 21:31:43 -08:00
Daniel Han
360a4c8702 kwargs 2024-11-25 18:48:41 -08:00
Daniel Han
7e7656c5a1 Update loader.py 2024-11-22 16:04:53 -08:00
Daniel Han
3bae0beca6 Merge branch 'main' into nightly 2024-11-22 15:03:28 -08:00
Daniel Han
4fe74c93ee Update pyproject.toml 2024-11-21 17:46:22 -08:00
Daniel Han
aebf67a61e Delete docs github button.png 2024-11-21 11:25:12 -08:00
Daniel Han
6d34ab821b Vision (#1318)
* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Update README.md

---------

Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>
2024-11-21 11:24:12 -08:00
Daniel Han
7296f5eed7 Update _utils.py 2024-11-21 06:45:40 -08:00
Daniel Han
967f9fb23d Update vision.py 2024-11-21 06:07:06 -08:00
Daniel Han
ddf118a8fc Vision support (#1315)
* Fix pad token

* Update llama.py

* Typo

* ignored labels

* Revert "ignored labels"

This reverts commit 4b25138ac7.

* More patching

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Feat/all tmp (#1219)

* Update save.py

Check whether path is in /tmp dir for Kaggle environment

* Update save.py

Move temporary_location to /tmp in Kaggle

* Enhance Kaggle environment support in save and tokenizer utilities

---------

Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>

* Bug fixes

* Update pyproject.toml

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Tied weights

* Revert "Tied weights"

This reverts commit 820cd4efef.

* Tied weights

* Utils

* CE Loss patching

* Update __init__.py

* Update __init__.py

* Patching

* Update cross_entropy_loss.py

* CE Loss

* Update _utils.py

* Update _utils.py

* CE Loss

* Update _utils.py

* Update _utils.py

* Layernorm

* Update _utils.py

* Update _utils.py

* Post patch

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts; at least in a Jupyter notebook, this lets a user select a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first one in order, which can be an issue when some GPUs are in use and the list still shows them. Setting this OS environment config manually avoids that.
A small change, but a time saver for those who copy the tutorials directly.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

* Update llama.py

* Fix #853

* fix/sfttrainer-compatibility (#1293)

* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update rms_layernorm.py

* Update rms_layernorm.py

* Gemma

* Update rms_layernorm.py

* Update gemma2.py

* Cut Cross Entropy

* Update llama.py

* Cut Cross Entropy

* Update llama.py

* Update llama.py

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* patch_fast_lora

* vision

* Update fast_lora.py

* Update _utils.py

* Update _utils.py

* Vision

* Update trainer.py

* Update save.py

* FastBaseVisionModel

* Update loader_utils.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update loader.py

* Update vision.py

* Update _utils.py

* tokenizer_name

* Update loader.py

* Update vision.py

* Update save.py

* Update save.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update vision.py

* Update _utils.py

---------

Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
2024-11-21 05:01:44 -08:00
Daniel Han
a02a11a10c Update _utils.py 2024-11-21 04:13:54 -08:00
Daniel Han
6c050b805a Update vision.py 2024-11-21 04:10:43 -08:00
Daniel Han
044d1962c9 Update vision.py 2024-11-21 04:10:24 -08:00
Daniel Han
bef3496dbd Update vision.py 2024-11-21 04:05:05 -08:00
Daniel Han
e22a03948a Update vision.py 2024-11-21 04:01:17 -08:00
Daniel Han
02ed7bfd7f Update vision.py 2024-11-21 03:58:18 -08:00
Daniel Han
8d7b0d9298 Update vision.py 2024-11-21 03:54:09 -08:00
Daniel Han
4c3473082d Update save.py 2024-11-21 03:47:57 -08:00
Daniel Han
331c7d62e4 Update save.py 2024-11-21 03:18:31 -08:00
Daniel Han
38767cd63f Update vision.py 2024-11-21 03:05:13 -08:00
Daniel Han
ef64ffc879 Update loader.py 2024-11-21 02:29:24 -08:00
Daniel Han
104594648d tokenizer_name 2024-11-21 02:28:00 -08:00
Daniel Han
8f2db72aec Update _utils.py 2024-11-21 02:27:07 -08:00
Daniel Han
dd78e6ed9a Update vision.py 2024-11-21 02:23:35 -08:00
Daniel Han
99cdccd91a Update loader.py 2024-11-21 02:23:11 -08:00
Daniel Han
0570871947 Update vision.py 2024-11-21 02:20:47 -08:00
Daniel Han
52e089e17a Update loader.py 2024-11-21 02:18:59 -08:00
Daniel Han
cf302f0d17 Update vision.py 2024-11-21 02:17:20 -08:00
Daniel Han
83c79ba1d9 Update loader_utils.py 2024-11-21 02:15:43 -08:00
Daniel Han
528b7f58af FastBaseVisionModel 2024-11-21 02:12:45 -08:00
Daniel Han
9b69eb9144 Update save.py 2024-11-21 01:51:46 -08:00
Daniel Han
2b47e95f53 Merge branch 'main' into nightly 2024-11-21 01:51:06 -08:00
Daniel Han
0cdda6b3a3 Update trainer.py 2024-11-21 01:49:46 -08:00
Daniel Han
925a63120e Vision 2024-11-21 01:48:20 -08:00
Daniel Han
67d40f3f6d Update _utils.py 2024-11-20 19:40:08 -08:00
Daniel Han
4be7341e49 Update _utils.py 2024-11-20 19:15:05 -08:00
Daniel Han
c67183f183 Update fast_lora.py 2024-11-20 17:07:53 -08:00
Daniel Han
d9f042d7e0 vision 2024-11-20 04:15:53 -08:00
Daniel Han
81cd49d2e7 patch_fast_lora 2024-11-20 03:36:41 -08:00
Michael
778359ee9e Add files via upload 2024-11-20 01:47:23 -08:00
Michael
26a3095d76 Add files via upload 2024-11-20 01:44:15 -08:00
Daniel Han
6c3b7f0e32 Update _utils.py 2024-11-19 16:56:53 -08:00
Daniel Han
65a5049423 Update _utils.py 2024-11-19 15:49:01 -08:00
Daniel Han
e5b2f577de Update _utils.py 2024-11-19 13:01:38 -08:00
Daniel Han
56a19d82de Update _utils.py 2024-11-19 03:54:35 -08:00
Daniel Han
1f62b73677 Update _utils.py 2024-11-19 03:53:17 -08:00
Daniel Han
f8ccb5758a Update _utils.py 2024-11-19 03:50:20 -08:00
Daniel Han
d67bd4cfb7 Update _utils.py 2024-11-19 03:46:32 -08:00
Daniel Han
699a9ff81e Update _utils.py 2024-11-19 03:45:23 -08:00
Daniel Han
80f9f6a225 Update _utils.py 2024-11-19 02:25:22 -08:00
Daniel Han
ee98b75c06 Update mapper.py 2024-11-18 12:18:34 -08:00
Daniel Han
5f59faf526 Update _utils.py 2024-11-17 22:01:20 -08:00
Daniel Han
096a77d9e6 Update _utils.py 2024-11-17 22:01:07 -08:00
Daniel Han
0e77184c23 Update _utils.py 2024-11-17 21:31:17 -08:00
Daniel Han
c8082e46aa Update _utils.py 2024-11-17 20:24:10 -08:00
Daniel Han
26546e68b7 Update _utils.py 2024-11-17 20:16:33 -08:00
Daniel Han
4b30c7a89b Update _utils.py 2024-11-17 17:12:34 -08:00
Daniel Han
db1c5f414a Update _utils.py 2024-11-17 16:56:26 -08:00
Daniel Han
d8c6c3e903 Update _utils.py 2024-11-17 16:54:44 -08:00
Daniel Han
ccf033893e Update __init__.py 2024-11-17 16:13:13 -08:00
Daniel Han
3c2794ecee Update __init__.py 2024-11-17 16:01:06 -08:00
Daniel Han
73bbd9e795 Update llama.py 2024-11-17 16:00:49 -08:00
Daniel Han
df62b6242d Update llama.py 2024-11-17 15:58:00 -08:00
Daniel Han
05fb970edd Update llama.py 2024-11-17 15:54:12 -08:00
Daniel Han
fa8e59eb1b Cut Cross Entropy 2024-11-17 14:32:41 -08:00
Daniel Han
1dc066afda Update llama.py 2024-11-17 00:21:44 -08:00
Daniel Han
c4eacf50da Cut Cross Entropy 2024-11-16 23:53:46 -08:00
Daniel Han
e7ad484169 Update gemma2.py 2024-11-16 15:18:38 -08:00
Daniel Han
d47d838ee8 Update rms_layernorm.py 2024-11-16 15:00:28 -08:00
Daniel Han
263eaaa27f Gemma 2024-11-16 13:55:11 -08:00
Daniel Han
e49a4d9277 Update rms_layernorm.py 2024-11-16 13:01:54 -08:00
Daniel Han
2cf0203166 Update rms_layernorm.py 2024-11-16 12:18:47 -08:00
Edd
b69fee4a36 fix/sfttrainer-compatibility (#1293)
* Refactor trainer.py to import SFTConfig directly and update UnslothTrainingArguments class inheritance

* Update trainer.py

* Update trainer.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-11-14 17:07:29 -08:00
Daniel Han
786aea6365 Fix #853 2024-11-14 01:26:13 -08:00
Daniel Han
8e899bf956 Update llama.py 2024-11-14 01:11:16 -08:00
Daniel Han
686a97d750 Merge branch 'main' into nightly 2024-11-13 19:07:33 -08:00
Daniel Han
892115606d Update _utils.py 2024-11-13 19:07:26 -08:00
Daniel Han
2dca0cb94b Bug fixes (#1288)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

* Update _utils.py

* fix/transformers-unpack (#1180)

* Fix DPO, ORPO (#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Do not upcast lm_head and embeddings to float32 (#1186)

* Cleanup upcast logs (#1188)

* Fix/phi-longrope (#1193)

* Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding

* Typo

* Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache

* Update llama.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update transformers

* Unk token issues

* Update _utils.py

* Fix pad token

* Update llama.py

* Typo

* ignored labels

* Revert "ignored labels"

This reverts commit 4b25138ac7.

* More patching

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Feat/all tmp (#1219)

* Update save.py

Check whether path is in /tmp dir for Kaggle environment

* Update save.py

Move temporary_location to /tmp in Kaggle

* Enhance Kaggle environment support in save and tokenizer utilities

---------

Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>

* Bug fixes

* Update pyproject.toml

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Tied weights

* Revert "Tied weights"

This reverts commit 820cd4efef.

* Tied weights

* Utils

* CE Loss patching

* Update __init__.py

* Update __init__.py

* Patching

* Update cross_entropy_loss.py

* CE Loss

* Update _utils.py

* Update _utils.py

* CE Loss

* Update _utils.py

* Update _utils.py

* Layernorm

* Update _utils.py

* Update _utils.py

* Post patch

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts; at least in a Jupyter notebook, this lets a user select a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first one in order, which can be an issue when some GPUs are in use and the list still shows them. Setting this OS environment config manually avoids that.
A small change, but a time saver for those who copy the tutorials directly.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
2024-11-13 19:05:40 -08:00
Daniel Han
f554e663ec Update tokenizer_utils.py 2024-11-13 19:05:15 -08:00
Daniel Han
2e4bca5cbf Update trainer.py 2024-11-13 18:53:40 -08:00
Daniel Han
022d571835 Update trainer.py 2024-11-13 18:48:59 -08:00
Daniel Han
33c85a3bd0 Update trainer.py 2024-11-13 18:44:54 -08:00
Daniel Han
cec6e570a8 Update __init__.py 2024-11-13 17:38:26 -08:00
Edd
cad6df52c5 fix/sft-trainer (#1276)
* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-11-13 17:33:30 -08:00
Edd
cb3608b72d fix/get_chat_template (#1246)
* Refactor `get_chat_template` to now support system message instead. It supposed to fix ollama tokenizer chattemplate to

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-11-13 00:06:48 -08:00
Uday Girish Maradana
b230fa13eb DOC Update - Update README.md with os.environ in example (#1269)
* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts; at least in a Jupyter notebook, this lets a user select a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first one in order, which can be an issue when some GPUs are in use and the list still shows them. Setting this OS environment config manually avoids that.
A small change, but a time saver for those who copy the tutorials directly.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-11-12 23:55:28 -08:00
Edd
fb9a3ca1a1 Fix/export mistral (#1281)
* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-11-12 23:53:50 -08:00
Daniel Han
c94fa058b0 Merge branch 'main' into nightly 2024-11-12 23:51:46 -08:00
Daniel Han
6007831cef Update _utils.py 2024-11-12 10:54:58 -08:00
Daniel Han
6cc21e378d Qwen 2.5 (#1280)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

* Update _utils.py

* fix/transformers-unpack (#1180)

* Fix DPO, ORPO (#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Do not upcast lm_head and embeddings to float32 (#1186)

* Cleanup upcast logs (#1188)

* Fix/phi-longrope (#1193)

* Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding

* Typo

* Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache

* Update llama.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update transformers

* Unk token issues

* Update _utils.py

* Fix pad token

* Update llama.py

* Typo

* ignored labels

* Revert "ignored labels"

This reverts commit 9d07be077b.

* More patching

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Feat/all tmp (#1219)

* Update save.py

Check whether path is in /tmp dir for Kaggle environment

* Update save.py

Move temporary_location to /tmp in Kaggle

* Enhance Kaggle environment support in save and tokenizer utilities

---------

Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>

* Bug fixes

* Update pyproject.toml

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Tied weights

* Revert "Tied weights"

This reverts commit 8090b7c01a.

* Tied weights

* Utils

* CE Loss patching

* Update __init__.py

* Update __init__.py

* Patching

* Update cross_entropy_loss.py

* CE Loss

* Update _utils.py

* Update _utils.py

* CE Loss

* Update _utils.py

* Update _utils.py

* Layernorm

* Update _utils.py

* Update _utils.py

* Post patch

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
2024-11-12 03:22:41 -08:00
Daniel Han
58810bf37e Qwen 2.5 Coder 2024-11-11 18:46:05 -08:00
Daniel Han
451ed2952d Update utils.py 2024-11-11 00:37:37 -08:00
Daniel Han
bbe1dda8c0 triton_cast 2024-11-11 00:17:22 -08:00
Daniel Han
a935d80026 Update tokenizer_utils.py 2024-11-11 00:04:02 -08:00
Daniel Han
61665e96d8 Update tokenizer_utils.py 2024-11-09 17:40:32 -08:00
Daniel Han
3c9e9b4a44 Update tokenizer_utils.py 2024-11-09 17:37:11 -08:00
Daniel Han
58f370ab26 Update tokenizer_utils.py 2024-11-09 17:34:47 -08:00
Daniel Han
6530a66a82 Update tokenizer_utils.py 2024-11-09 17:01:33 -08:00
Daniel Han
16edb6bdcc Update _utils.py 2024-11-07 01:11:45 -08:00
Daniel Han
e7d0ce17a8 Update cross_entropy_loss.py 2024-11-06 21:07:50 -08:00
Daniel Han
b3ce5868ca Merge branch 'main' into nightly 2024-11-06 19:02:44 -08:00
Daniel Han
a93762532d Update _utils.py 2024-11-06 19:00:42 -08:00
Daniel Han
01cd5b3370 Update loader.py 2024-11-06 19:00:23 -08:00
Daniel Han
a8ddc39482 Update loader.py 2024-11-06 19:00:13 -08:00
Daniel Han
070128c5c1 Merge branch 'main' into nightly 2024-11-06 17:21:21 -08:00
Daniel Han
c9b5d5cea3 Bug fixes (#1259)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

* Update _utils.py

* fix/transformers-unpack (#1180)

* Fix DPO, ORPO (#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Do not upcast lm_head and embeddings to float32 (#1186)

* Cleanup upcast logs (#1188)

* Fix/phi-longrope (#1193)

* Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding

* Typo

* Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache

* Update llama.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update transformers

* Unk token issues

* Update _utils.py

* Fix pad token

* Update llama.py

* Typo

* ignored labels

* Revert "ignored labels"

This reverts commit 9d07be077b.

* More patching

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Feat/all tmp (#1219)

* Update save.py

Check whether path is in /tmp dir for Kaggle environment

* Update save.py

Move temporary_location to /tmp in Kaggle

* Enhance Kaggle environment support in save and tokenizer utilities

---------

Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>

* Bug fixes

* Update pyproject.toml

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Tied weights

* Revert "Tied weights"

This reverts commit 8090b7c01a.

* Tied weights

* Utils

* CE Loss patching

* Update __init__.py

* Update __init__.py

* Patching

* Update cross_entropy_loss.py

* CE Loss

* Update _utils.py

* Update _utils.py

* CE Loss

* Update _utils.py

* Update _utils.py

* Layernorm

* Update _utils.py

* Update _utils.py

* Post patch

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
2024-11-06 17:17:19 -08:00
Daniel Han
c62f901d00 Update _utils.py 2024-11-06 17:14:56 -08:00
Daniel Han
7ce530b5cc Update flex_attention.py 2024-11-06 16:59:15 -08:00
Daniel Han
c8005418c5 Update flex_attention.py 2024-11-06 15:56:52 -08:00
Daniel Han
92c99ce97c Update flex_attention.py 2024-11-06 15:56:26 -08:00
Daniel Han
297e25007c Update flex_attention.py 2024-11-06 15:39:32 -08:00
Daniel Han
746ff24ed2 Update loader.py 2024-11-06 15:13:03 -08:00
Daniel Han
52e3a2bf9a Update loader.py 2024-11-06 15:05:10 -08:00
Daniel Han
1d11e3e391 Update flex_attention.py 2024-11-06 14:54:53 -08:00
Daniel Han
b3f1a866f4 Update flex_attention.py 2024-11-06 14:51:52 -08:00
Daniel Han
981bf005a6 Update _utils.py 2024-11-06 14:49:37 -08:00
Daniel Han
0a6bdf93b5 Update _utils.py 2024-11-06 14:05:52 -08:00
Daniel Han
33874414b8 Update flex_attention.py 2024-11-06 13:09:29 -08:00
Edwin Fennell
08db916009 CLI now handles user input strings for dtype correctly (#1235)
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
2024-11-06 12:23:09 -08:00
Datta Nimmaturi
13ab93547e Throw error when inferencing longer than max_position_embeddings (#1236)
* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-11-06 12:22:08 -08:00
Edd
8a14fe3f1e Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)
* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-11-06 12:16:02 -08:00
Daniel Han
ccce1f24e4 Merge branch 'main' into nightly 2024-11-06 12:13:56 -08:00
Daniel Han
4f8bf42442 Bug fixes (#1255)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

* Update _utils.py

* fix/transformers-unpack (#1180)

* Fix DPO, ORPO (#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Do not upcast lm_head and embeddings to float32 (#1186)

* Cleanup upcast logs (#1188)

* Fix/phi-longrope (#1193)

* Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding

* Typo

* Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache

* Update llama.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update transformers

* Unk token issues

* Update _utils.py

* Fix pad token

* Update llama.py

* Typo

* ignored labels

* Revert "ignored labels"

This reverts commit 9d07be077b.

* More patching

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Feat/all tmp (#1219)

* Update save.py

Check whether path is in /tmp dir for Kaggle environment

* Update save.py

Move temporary_location to /tmp in Kaggle

* Enhance Kaggle environment support in save and tokenizer utilities

---------

Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>

* Bug fixes

* Update pyproject.toml

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Tied weights

* Revert "Tied weights"

This reverts commit 8090b7c01a.

* Tied weights

* Utils

* CE Loss patching

* Update __init__.py

* Update __init__.py

* Patching

* Update cross_entropy_loss.py

* CE Loss

* Update _utils.py

* Update _utils.py

* CE Loss

* Update _utils.py

* Update _utils.py

* Layernorm

* Update _utils.py

* Update _utils.py

* Post patch

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
2024-11-06 12:08:55 -08:00
Daniel Han
4df520f08e Update _utils.py 2024-11-06 02:10:37 -08:00
Daniel Han
126b5a3ff7 Update _utils.py 2024-11-06 00:18:04 -08:00
Daniel Han
7cda12fd10 Update _utils.py 2024-11-05 22:48:36 -08:00
Daniel Han
362d187f53 Update _utils.py 2024-11-05 21:44:38 -08:00
Daniel Han
a12cc6b55e Update _utils.py 2024-11-05 21:24:56 -08:00
Daniel Han
1fa35b1e69 Merge branch 'main' into nightly 2024-11-05 21:24:47 -08:00
Daniel Han
7c684fb793 Bug fix (#1249)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

* Update _utils.py

* fix/transformers-unpack (#1180)

* Fix DPO, ORPO (#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Do not upcast lm_head and embeddings to float32 (#1186)

* Cleanup upcast logs (#1188)

* Fix/phi-longrope (#1193)

* Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding

* Typo

* Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache

* Update llama.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update transformers

* Unk token issues

* Update _utils.py

* Fix pad token

* Update llama.py

* Typo

* ignored labels

* Revert "ignored labels"

This reverts commit 9d07be077b.

* More patching

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Feat/all tmp (#1219)

* Update save.py

Check whether path is in /tmp dir for Kaggle environment

* Update save.py

Move temporary_location to /tmp in Kaggle

* Enhance Kaggle environment support in save and tokenizer utilities

---------

Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>

* Bug fixes

* Update pyproject.toml

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Tied weights

* Revert "Tied weights"

This reverts commit 8090b7c01a.

* Tied weights

* Utils

* CE Loss patching

* Update __init__.py

* Update __init__.py

* Patching

* Update cross_entropy_loss.py

* CE Loss

* Update _utils.py

* Update _utils.py

* CE Loss

* Update _utils.py

* Update _utils.py

* Layernorm

* Update _utils.py

* Update _utils.py

* Post patch

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
2024-11-05 21:08:11 -08:00
Daniel Han
1df89aec1e Update llama.py 2024-11-05 21:07:02 -08:00
Daniel Han
6fed054c2e Merge branch 'main' into nightly 2024-11-05 21:06:13 -08:00
Daniel Han
3f9b395d48 Update cross_entropy_loss.py 2024-11-05 21:01:33 -08:00
Daniel Han
84cf5e1585 Update cross_entropy_loss.py 2024-11-05 20:58:15 -08:00
Daniel Han
792c473b7d Update cross_entropy_loss.py 2024-11-05 20:29:30 -08:00
Daniel Han
e9451a36d4 Update _utils.py 2024-11-05 14:54:07 -08:00
Daniel Han
cf2a714d74 Update cross_entropy_loss.py 2024-11-05 14:43:49 -08:00
Daniel Han
27cd2b3a0d CE Loss 2024-11-05 14:40:57 -08:00
Daniel Han
fef9932bf0 Update llama.py 2024-11-05 13:52:12 -08:00
Daniel Han
92a044197d Update _utils.py 2024-11-05 13:35:43 -08:00
Daniel Han
15268ba184 Bug fixes (#1245)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

* Update _utils.py

* fix/transformers-unpack (#1180)

* Fix DPO, ORPO (#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* do not upcast lm_head and embeddings to float32 (#1186)

* Cleanup upcast logs (#1188)

* Fix/phi-longrope (#1193)

* Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding

* Typo

* Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache

* Update llama.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update transformers

* Unk token issues

* Update _utils.py

* Fix pad token

* Update llama.py

* Typo

* ignored labels

* Revert "ignored labels"

This reverts commit 9d07be077b.

* More patching

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Feat/all tmp (#1219)

* Update save.py

Check whether path is in /tmp dir for Kaggle environment

* Update save.py

Move temporary_location to /tmp in Kaggle

* Enhance Kaggle environment support in save and tokenizer utilities

---------

Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>

* Bug fixes

* Update pyproject.toml

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Tied weights

* Revert "Tied weights"

This reverts commit 8090b7c01a.

* Tied weights

* Utils

* CE Loss patching

* Update __init__.py

* Update __init__.py

* Patching

* Update cross_entropy_loss.py

* CE Loss

* Update _utils.py

* Update _utils.py

* CE Loss

* Update _utils.py

* Update _utils.py

* Layernorm

* Update _utils.py

* Update _utils.py

* Post patch

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
2024-11-05 13:29:37 -08:00
Daniel Han
5bb3e0d462 Update pyproject.toml 2024-11-05 13:28:25 -08:00
Daniel Han
09ad3757c9 Update _utils.py 2024-11-05 12:05:17 -08:00
Daniel Han
0f9b6390a6 Update llama.py 2024-11-05 02:06:27 -08:00
Daniel Han
250109be45 Update llama.py 2024-11-05 01:55:17 -08:00
Daniel Han
116f286d0c Update _utils.py 2024-11-05 01:52:27 -08:00
Daniel Han
c18bac88be Update llama.py 2024-11-05 01:42:44 -08:00
Daniel Han
ed61c33d2c Update _utils.py 2024-11-05 01:41:04 -08:00
Daniel Han
a184f38f7a Forward hook 2024-11-05 01:36:11 -08:00
Daniel Han
09c1f6d677 Update _utils.py 2024-11-05 01:30:34 -08:00
Daniel Han
b0f010259f Update _utils.py 2024-11-05 00:15:56 -08:00
Daniel Han
f627c1bf35 Disable compiling 2024-11-05 00:11:32 -08:00
Daniel Han
81a87cc262 types 2024-11-05 00:09:11 -08:00
Daniel Han
4076dc49cb Update rope_embedding.py 2024-11-05 00:08:16 -08:00
Daniel Han
0ab731d40b typing 2024-11-05 00:05:56 -08:00
Daniel Han
9384cb9f72 Update rms_layernorm.py 2024-11-04 23:54:02 -08:00
Daniel Han
0b2363bacd Update rms_layernorm.py 2024-11-04 23:47:38 -08:00
Daniel Han
6836e43f27 Update rms_layernorm.py 2024-11-04 23:45:25 -08:00
Daniel Han
eddf0c46f4 Update rms_layernorm.py 2024-11-04 23:39:03 -08:00
Daniel Han
bf97546a16 Update rms_layernorm.py 2024-11-04 23:34:11 -08:00
Daniel Han
4485a342e0 Update rms_layernorm.py 2024-11-04 23:33:08 -08:00
Daniel Han
34abdb309d Update rms_layernorm.py 2024-11-04 23:30:55 -08:00
Daniel Han
fd07c2ee73 Update rms_layernorm.py 2024-11-04 23:28:57 -08:00
Daniel Han
c35f115537 Update rms_layernorm.py 2024-11-04 23:27:50 -08:00
Daniel Han
d80444fe3e Update rms_layernorm.py 2024-11-04 23:24:33 -08:00
Daniel Han
40858e4f76 Update rms_layernorm.py 2024-11-04 23:20:32 -08:00
Daniel Han
a3429d208d Update rms_layernorm.py 2024-11-04 23:19:02 -08:00
Daniel Han
4e4bdc4fcf Update utils.py 2024-11-04 23:16:01 -08:00
Daniel Han
578c8fde96 Update rms_layernorm.py 2024-11-04 23:11:03 -08:00
Daniel Han
15fdce80b9 Update rms_layernorm.py 2024-11-04 23:08:05 -08:00
Daniel Han
106d35fe8a Update rms_layernorm.py 2024-11-04 23:06:13 -08:00
Daniel Han
9de3649586 Update rms_layernorm.py 2024-11-04 23:04:34 -08:00
Daniel Han
74e5310ba1 Update rms_layernorm.py 2024-11-04 23:01:14 -08:00
Daniel Han
08f6b3ac3c Update rms_layernorm.py 2024-11-04 22:38:47 -08:00
Daniel Han
1c8582dd34 Update _utils.py 2024-11-04 22:10:52 -08:00
Daniel Han
3524fb6fd9 Update llama.py 2024-11-04 22:08:49 -08:00
Daniel Han
344b32931f Update _utils.py 2024-11-04 21:47:48 -08:00
Daniel Han
1518da6eef Update cross_entropy_loss.py 2024-11-04 19:48:09 -08:00
Daniel Han
543d31f6ec CE 2024-11-04 19:45:10 -08:00
Daniel Han
72c568498a Update _utils.py 2024-11-04 17:56:16 -08:00
Daniel Han
a1b608e889 Update _utils.py 2024-11-04 01:23:24 -08:00
Daniel Han
1f0fd97f46 Update _utils.py 2024-11-04 01:20:31 -08:00
Daniel Han
80dea539a7 Update cross_entropy_loss.py 2024-11-04 00:32:35 -08:00
Daniel Han
287d50c257 Update cross_entropy_loss.py 2024-11-04 00:07:52 -08:00
Daniel Han
0c48e0c179 constexpr 2024-11-04 00:05:51 -08:00
Daniel Han
d2725e7910 constexpr 2024-11-04 00:03:24 -08:00
Daniel Han
006bafb437 Update cross_entropy_loss.py 2024-11-04 00:01:04 -08:00
Daniel Han
ffe5d81c4f Update _utils.py 2024-11-03 23:55:39 -08:00
Daniel Han
eaff11e1c3 int64 2024-11-03 23:53:03 -08:00
Daniel Han
5eddfd8cfe Update cross_entropy_loss.py 2024-11-03 21:42:13 -08:00
Daniel Han
fee02b903b Update cross_entropy_loss.py 2024-11-03 21:40:13 -08:00
Daniel Han
c61a9b5593 Update cross_entropy_loss.py 2024-11-03 21:38:19 -08:00
Daniel Han
c1f3875371 Update cross_entropy_loss.py 2024-11-03 21:35:27 -08:00
Daniel Han
9384452eae Update cross_entropy_loss.py 2024-11-03 21:35:01 -08:00
Daniel Han
539a1406bd Update cross_entropy_loss.py 2024-11-03 21:33:26 -08:00
Daniel Han
9875cbd810 Update cross_entropy_loss.py 2024-11-03 21:31:18 -08:00
Daniel Han
9d13739145 Update cross_entropy_loss.py 2024-11-03 21:29:19 -08:00
Daniel Han
f6acbbeb03 Update cross_entropy_loss.py 2024-11-03 21:27:37 -08:00
Daniel Han
1e98cc3eb2 typing 2024-11-03 21:25:32 -08:00
Daniel Han
43c3da8776 Update cross_entropy_loss.py 2024-11-03 21:23:28 -08:00
Daniel Han
c5b142e7b5 Update cross_entropy_loss.py 2024-11-03 21:22:08 -08:00
Daniel Han
4cedfeae87 Update cross_entropy_loss.py 2024-11-03 21:19:42 -08:00
Daniel Han
c49cf23ac1 Update cross_entropy_loss.py 2024-11-03 21:18:26 -08:00
Daniel Han
adebfa1874 Update cross_entropy_loss.py 2024-11-03 21:16:19 -08:00
Daniel Han
66b807b6dd Update cross_entropy_loss.py 2024-11-03 21:11:34 -08:00
Daniel Han
86f5a300e0 Update cross_entropy_loss.py 2024-11-03 21:09:34 -08:00
Daniel Han
a4471de988 Update cross_entropy_loss.py 2024-11-03 21:07:23 -08:00
Daniel Han
dc552adcf1 Update cross_entropy_loss.py 2024-11-03 21:03:24 -08:00
Daniel Han
7fd6bed2d3 Update cross_entropy_loss.py 2024-11-03 20:58:59 -08:00
Daniel Han
ad79b86e86 Update cross_entropy_loss.py 2024-11-03 20:58:31 -08:00
Daniel Han
f13c60b73d Update cross_entropy_loss.py 2024-11-03 20:49:59 -08:00
Daniel Han
630ec299ac Update cross_entropy_loss.py 2024-11-03 20:03:03 -08:00
Daniel Han
9a68c7fec9 Update cross_entropy_loss.py 2024-11-03 19:59:59 -08:00
Daniel Han
2cff29f70c Update cross_entropy_loss.py 2024-11-03 19:57:26 -08:00
Daniel Han
e773a4ffe6 Update cross_entropy_loss.py 2024-11-03 18:35:43 -08:00
Daniel Han
f3a93319b0 Update cross_entropy_loss.py 2024-11-03 18:33:02 -08:00
Daniel Han
90af825bed Update _utils.py 2024-11-03 18:13:11 -08:00
Daniel Han
5eeffb8965 Update llama.py 2024-11-03 18:08:22 -08:00
Daniel Han
74f0223765 Update _utils.py 2024-11-03 17:56:03 -08:00
Daniel Han
6ca6013f1d Post patch 2024-11-03 17:49:08 -08:00
Daniel Han
374e0d9294 Update _utils.py 2024-11-03 17:15:27 -08:00
Daniel Han
0b79fb1618 Update _utils.py 2024-11-03 17:09:37 -08:00
Daniel Han
68b6bebded Layernorm 2024-11-03 17:05:33 -08:00
Daniel Han
73ed5dcfa0 Update _utils.py 2024-11-03 15:25:18 -08:00
Daniel Han
cc5c57b764 Update _utils.py 2024-11-03 15:21:03 -08:00
Daniel Han
37b160d724 CE Loss 2024-11-03 15:15:58 -08:00
Daniel Han
3591f347eb Update _utils.py 2024-11-03 15:02:38 -08:00
Daniel Han
21d50ad49d Update _utils.py 2024-11-03 15:01:29 -08:00
Daniel Han
2ea161c913 CE Loss 2024-11-03 14:15:08 -08:00
Daniel Han
29a73e562d Update cross_entropy_loss.py 2024-11-03 14:09:47 -08:00
Daniel Han
13e8f1a52d Patching 2024-11-03 01:18:48 -07:00
Daniel Han
b2260942fa Update __init__.py 2024-11-03 00:22:38 -07:00
Daniel Han
0ebecb759a Update __init__.py 2024-11-02 23:14:41 -07:00
Daniel Han
726e7c933b CE Loss patching 2024-11-02 19:20:48 -07:00
Daniel Han
4b8906fe50 Utils 2024-11-02 19:08:57 -07:00
Daniel Han
27d2d1df49 Tied weights 2024-10-31 16:25:26 -07:00
Daniel Han
e2aa4d6a1a Revert "Tied weights"
This reverts commit 820cd4efef.
2024-10-31 12:38:20 -07:00
Daniel Han
820cd4efef Tied weights 2024-10-31 12:36:21 -07:00
Daniel Han
f3bd5d6d33 Update cross_entropy_loss.py 2024-10-31 02:00:04 -07:00
Daniel Han
dd5d035a99 Update cross_entropy_loss.py 2024-10-31 01:56:03 -07:00
Daniel Han
8942b2fc76 Update cross_entropy_loss.py 2024-10-31 01:23:41 -07:00
Daniel Han
67ca220d94 Update cross_entropy_loss.py 2024-10-30 17:35:20 -07:00
Daniel Han
dea8630305 Update cross_entropy_loss.py 2024-10-30 17:26:34 -07:00
Daniel Han
93785d3578 Update cross_entropy_loss.py 2024-10-30 17:19:30 -07:00
Daniel Han
4a369c264a Update cross_entropy_loss.py 2024-10-30 16:55:24 -07:00
Daniel Han
65e99c28cb Update cross_entropy_loss.py 2024-10-30 16:53:04 -07:00
Daniel Han
c8056af236 Update cross_entropy_loss.py 2024-10-30 16:50:56 -07:00
Daniel Han
895a5364f0 Update cross_entropy_loss.py 2024-10-30 16:46:33 -07:00
Daniel Han
6ef574b58a Update cross_entropy_loss.py 2024-10-30 16:37:37 -07:00
Daniel Han
c60053e4af Update cross_entropy_loss.py 2024-10-30 15:46:09 -07:00
Daniel Han
9bf58f1770 Update cross_entropy_loss.py 2024-10-30 15:33:37 -07:00
Daniel Han
cc5a1728ff Update cross_entropy_loss.py 2024-10-30 14:57:14 -07:00
Daniel Han
233ec38c26 Update _utils.py 2024-10-30 14:01:49 -07:00
Daniel Han
353f7eeb9f Update _utils.py 2024-10-30 14:00:03 -07:00
Daniel Han
6ce3958c87 Update _utils.py 2024-10-30 13:54:11 -07:00
Daniel Han
cfc84d83d9 Update _utils.py 2024-10-30 13:51:10 -07:00
Daniel Han
6e6cb3c0da Update __init__.py 2024-10-30 13:47:21 -07:00
Daniel Han
a919f17648 Update __init__.py 2024-10-30 13:44:05 -07:00
Daniel Han
bedce9ac2c Update _utils.py 2024-10-30 13:35:01 -07:00
Daniel Han
f88122bd43 Update pyproject.toml 2024-10-30 13:16:00 -07:00
Daniel Han
2d86834813 Bug fixes 2024-10-30 13:11:43 -07:00
Daniel Han
09f667a533 Feat/all tmp (#1219)
* Update save.py

Check whether path is in /tmp dir for Kaggle environment

* Update save.py

Move temporary_location to /tmp in Kaggle

* Enhance Kaggle environment support in save and tokenizer utilities

---------

Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
2024-10-30 00:43:03 -07:00
Daniel Han
8e30e2e646 Update cross_entropy_loss.py 2024-10-28 15:01:04 -07:00
Daniel Han
d320355de4 Update cross_entropy_loss.py 2024-10-28 14:47:11 -07:00
Daniel Han
1726f04b97 Update cross_entropy_loss.py 2024-10-28 14:30:06 -07:00
Daniel Han
5c669defd5 Update _utils.py 2024-10-28 10:41:31 -07:00
Daniel Han
54ed0fa410 Update _utils.py 2024-10-28 01:12:33 -07:00
Daniel Han
cbbdff23fc More patching 2024-10-28 01:10:23 -07:00
Daniel Han
38e5b23223 Revert "ignored labels"
This reverts commit 4b25138ac7.
2024-10-27 22:18:05 -07:00
Daniel Han
4b25138ac7 ignored labels 2024-10-27 22:10:59 -07:00
Daniel Han
e4205ffad5 Typo 2024-10-27 19:09:27 -07:00
Daniel Han
65f754e0f4 Update llama.py 2024-10-27 19:08:14 -07:00
Daniel Han
ac8d5fc3cb Fix pad token 2024-10-27 19:06:57 -07:00
Daniel Han
6f19b9aecd Update _utils.py 2024-10-27 17:34:33 -07:00
Daniel Han
d818697900 Unk token issues 2024-10-27 17:32:26 -07:00
Daniel Han
b040e3407d Merge branch 'main' into nightly 2024-10-27 16:24:20 -07:00
Daniel Han
9e9d6fe660 Merge branch 'main' of https://github.com/unslothai/unsloth 2024-10-27 15:09:42 -07:00
Daniel Han
55fd65a6ed Update _utils.py 2024-10-27 15:09:35 -07:00
Edd
539fcea071 Fix/casting continue pretraining (#1200)
* Bring back float32 if float16 instead of bfloat16

* Refactor mixed precision handling for lm_head and embed_tokens to ensure correct dtype usage

* Fix dtype retrieval for embed_tokens and lm_head in mixed precision training

* Fix dtype retrieval for embed_tokens and lm_head to use weight dtype in mixed precision training

* Fix dtype handling for embed_tokens and lm_head to ensure correct float32 usage in mixed precision training

* Fix dtype assignment for lm_head modules to ensure correct weight dtype usage in mixed precision training
2024-10-27 15:06:45 -07:00
Daniel Han
9d5f58224d Update pyproject.toml 2024-10-26 18:05:55 -07:00
Daniel Han
e7ede2f7db Torch 2.5 2024-10-26 18:03:15 -07:00
Daniel Han
a4a724bab2 Merge branch 'main' into nightly 2024-10-26 01:22:21 -07:00
Daniel Han
c58dc701c8 Bug fixes (#1195)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

* Update _utils.py

* fix/transformers-unpack (#1180)

* Fix DPO, ORPO (#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* do not upcast lm_head and embeddings to float32 (#1186)

* Cleanup upcast logs (#1188)

* Fix/phi-longrope (#1193)

* Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding

* Typo

* Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache

* Update llama.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update transformers

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
2024-10-26 01:21:24 -07:00
Daniel Han
1e8980127c Update transformers 2024-10-26 01:20:37 -07:00
Edd
c8c4cb3a6d Fix/phi-longrope (#1193)
* Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding

* Typo

* Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache

* Update llama.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-10-25 15:44:10 -07:00
Datta Nimmaturi
1ba5a0161d Cleanup upcast logs (#1188) 2024-10-25 12:17:54 -07:00
Datta Nimmaturi
06050f1802 do not upcast lm_head and embeddings to float32 (#1186) 2024-10-25 01:28:12 -07:00
Daniel Han
dcf27bcca7 Merge branch 'main' into nightly 2024-10-24 12:17:57 -07:00
Daniel Han
519c0df00c Update _utils.py 2024-10-24 12:17:48 -07:00
Daniel Han
06a5c752e3 Fix 4.47 issue (#1182)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

* Update _utils.py

* fix/transformers-unpack (#1180)

* Fix DPO, ORPO (#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
2024-10-24 12:17:21 -07:00
Daniel Han
e24b2db194 Update _utils.py 2024-10-24 12:17:09 -07:00
Daniel Han
6f34885c29 Update _utils.py 2024-10-24 12:14:14 -07:00
Daniel Han
8603d08f3b Update cross_entropy_loss.py 2024-10-24 12:11:38 -07:00
Edd
79effae03d fix/transformers-unpack (#1180)
* Fix DPO, ORPO (#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
2024-10-24 12:10:52 -07:00
Daniel Han
e3e4b7dfc3 Update _utils.py 2024-10-24 01:11:20 -07:00
Daniel Han
a6e4a8bf76 Fix DPO, ORPO (#1177)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
2024-10-24 00:36:37 -07:00
Daniel Han
0bd8517b1e Update _utils.py 2024-10-24 00:25:28 -07:00
Daniel Han
d6382ca656 Merge branch 'main' into nightly 2024-10-24 00:24:27 -07:00
Daniel Han
ccf1f946f3 Fix DPO, ORPO 2024-10-24 00:17:26 -07:00
Daniel Han
7fa3179e88 Update cross_entropy_loss.py 2024-10-23 22:18:24 -07:00
Daniel Han
dd8487a63e n_items 2024-10-23 22:13:45 -07:00
Daniel Han
aa48184c41 Update _utils.py 2024-10-23 12:39:58 -07:00
Edd
da8e547678 Fix/patch tokenizer (#1171)
* fix: correct tokenizer handling in patch_sft_trainer_tokenizer

* Revert "fix: correct tokenizer handling in patch_sft_trainer_tokenizer"

This reverts commit 7a98e465cbd4f980c8b364b0396d44f2d052090f.

* fix: correct condition for test_text assignment in patch_sft_trainer_tokenizer
2024-10-23 12:32:33 -07:00
Daniel Han
4c85177719 Many bug fixes (#1162)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
2024-10-23 03:14:57 -07:00
Daniel Han
8e4cd551e7 Update _utils.py 2024-10-23 03:14:48 -07:00
Daniel Han
f1d3a8ae6c Update tokenizer_utils.py 2024-10-23 03:04:22 -07:00
Daniel Han
6a1bef2c4a Disable Flex Attention 2024-10-23 02:58:40 -07:00
Ikko Eltociear Ashimine
5d1de9d42e chore: update chat_templates.py (#1166)
orginal -> original
2024-10-23 00:59:02 -07:00
timothelaborie
8dbf2d5daa Installation guide (#1165) 2024-10-23 00:55:26 -07:00
Daniel Han
1623c3242b Update tokenizer_utils.py 2024-10-22 01:28:38 -07:00
Daniel Han
444bc97f13 Update tokenizer_utils.py 2024-10-22 01:22:13 -07:00
Daniel Han
e749d57859 Update tokenizer_utils.py 2024-10-22 01:09:20 -07:00
Daniel Han
48eda0ba34 Update tokenizer_utils.py 2024-10-22 01:05:41 -07:00
Daniel Han
ccf31a0285 Update tokenizer_utils.py 2024-10-22 00:57:36 -07:00
Daniel Han
e8bc00bd33 Update tokenizer_utils.py 2024-10-22 00:55:47 -07:00
Daniel Han
30230e6a02 Patch processing_class 2024-10-22 00:53:38 -07:00
Daniel Han
0b20946bcf Update mistral.py 2024-10-22 00:29:30 -07:00
Daniel Han
4a51572765 Fix TRL 2024-10-21 01:02:53 -07:00
Daniel Han
a51a84f62d Update save.py 2024-10-20 01:52:21 -07:00
Daniel Han
108fa8dfbe Update _utils.py 2024-10-18 23:10:30 -07:00
Daniel Han
828ef9afc5 Fix get_token 2024-10-18 23:08:30 -07:00
vo1d-ai
f63a2a5026 fix: compute_loss bug (#1151)
Currently, Unsloth doesn't pass additional parameters such as return_outputs to Trainer.compute_loss, which leads to errors when calling trainer.evaluate(). This change fixes the bug by passing those parameters through to Trainer.compute_loss.
2024-10-18 20:46:07 -07:00
Daniel Han
cde7401259 Update _utils.py 2024-10-17 20:50:05 -07:00
Daniel Han
139c3b29b3 Update README.md 2024-10-17 20:46:11 -07:00
Daniel Han
3a33dad3c9 Update README.md 2024-10-17 20:45:40 -07:00
Daniel Han
d57dcf58a1 Gradient Accumulation Fix (#1146)
* Unsloth Zoo

* Update trainer.py

* Update trainer.py

* Update cross_entropy_loss.py

* n_items

* Update llama.py

* kwargs

* Remove extraneous f prefixes (#1133)

Co-authored-by: Emil Sadek <esadek@users.noreply.github.com>

* Update __init__.py

* kwargs

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Fix GA

* Update _utils.py

* Update llama.py

* Update tokenizer_utils.py

* Warn on old versions

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

---------

Co-authored-by: Emil Sadek <esadek@hotmail.com>
Co-authored-by: Emil Sadek <esadek@users.noreply.github.com>
2024-10-17 20:43:07 -07:00
Daniel Han
6ba202708c Update mapper.py 2024-10-16 21:48:05 -07:00
Daniel Han
d2a032e117 Gradient Accumulation Fix (#1134)
* Unsloth Zoo

* Update trainer.py

* Update trainer.py

* Update cross_entropy_loss.py

* n_items

* Update llama.py

* kwargs

* Remove extraneous f prefixes (#1133)

Co-authored-by: Emil Sadek <esadek@users.noreply.github.com>

* Update __init__.py

---------

Co-authored-by: Emil Sadek <esadek@hotmail.com>
Co-authored-by: Emil Sadek <esadek@users.noreply.github.com>
2024-10-14 19:17:35 -07:00
Daniel Han
5bd7d3640f Update save.py 2024-10-11 00:00:06 -07:00
Daniel Han
9dd4462bf9 Update save.py 2024-10-10 23:22:17 -07:00
Giulia Baldini
592191b061 Only remove folder in sentencepiece check if it was created (#1121) 2024-10-10 23:21:27 -07:00
Giulia Baldini
5f2d5a3021 Handle absolute paths using pathlib (#1120) 2024-10-10 23:20:34 -07:00
Daniel Han
e130e748f0 Reload 2024-10-05 17:21:48 -07:00
Daniel Han
c89ae6b9b4 Merge branch 'nightly' 2024-10-01 00:45:16 -07:00
Daniel Han
3c47723bb2 Update README.md 2024-10-01 00:40:17 -07:00
Daniel Han
7fc9b07b94 Update tokenizer_utils.py 2024-10-01 00:35:54 -07:00
Daniel Han
3017eae097 Update tokenizer_utils.py 2024-10-01 00:20:01 -07:00
Daniel Han
dfc5cd3c80 Update tokenizer_utils.py 2024-10-01 00:14:52 -07:00
Daniel Han
ac3f564f7e Update chat_templates.py 2024-09-30 23:08:46 -07:00
Daniel Han
248c27d205 Fix merges (#1079)
* Layernorm

* Update layernorm.py

* Update layernorm.py

* Update layernorm.py

* Update layernorm.py

* Update layernorm.py

* Update layernorm.py

* Patch layernorm

* Update layernorm.py

* RMS Layernorm

* Update rms_layernorm.py

* Causal LM

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update layernorm.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Llama 3.2

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update vision.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update loader.py

* Update loader.py

* Update loader.py

* Dependencies

* Update pyproject.toml

* Update _utils.py
2024-09-30 03:03:01 -07:00
Daniel Han
0f1f2cf728 Update _utils.py 2024-09-30 02:51:32 -07:00
Daniel Han
906f88bb4d Update pyproject.toml 2024-09-30 02:48:31 -07:00
Daniel Han
e31152134e Dependencies 2024-09-30 02:09:15 -07:00
Daniel Han
45916d36cf Update loader.py 2024-09-29 23:22:05 -07:00
Daniel Han
a529a39c81 Update loader.py 2024-09-29 23:15:22 -07:00
Daniel Han
f744b3159e Update loader.py 2024-09-29 23:13:44 -07:00
Daniel Han
ab43e02a94 Merge branch 'main' into nightly 2024-09-29 23:13:16 -07:00
Daniel Han
afbb140a79 Update loader.py 2024-09-29 01:42:58 -07:00
Daniel Han
b314837622 Update pyproject.toml 2024-09-27 01:36:45 -07:00
Daniel Han
c0b4d640f2 Update tokenizer_utils.py 2024-09-26 01:23:40 -07:00
Daniel Han
88a542a129 Update README.md 2024-09-26 00:12:42 -07:00
Daniel Han
6bbca3aaa8 Update README.md 2024-09-26 00:05:38 -07:00
Daniel Han
4f4ef22035 Update README.md 2024-09-26 00:02:15 -07:00
Daniel Han
930d2ad1a8 Update pyproject.toml 2024-09-25 23:47:15 -07:00
Daniel Han
5b345ec757 Update pyproject.toml 2024-09-25 23:13:49 -07:00
Daniel Han
c331c886ee Remove version checks 2024-09-25 23:00:09 -07:00
Daniel Han
63e3a85efb Update _utils.py 2024-09-25 22:56:41 -07:00
Daniel Han
dc8bca6713 Update llama.py 2024-09-25 22:12:21 -07:00
Daniel Han
8c6acbc6ce Update llama.py 2024-09-25 19:38:01 -07:00
Daniel Han
50b9003936 Update llama.py 2024-09-25 19:15:40 -07:00
Daniel Han
70775fa740 Update llama.py 2024-09-25 18:24:30 -07:00
Daniel Han
013a2ed95b Update llama.py 2024-09-25 18:10:46 -07:00
Daniel Han
8d910ecba9 Update llama.py 2024-09-25 18:07:21 -07:00
Daniel Han
ff921b8601 Update llama.py 2024-09-25 17:55:42 -07:00
Daniel Han
b61d75592a Update llama.py 2024-09-25 17:51:36 -07:00
Daniel Han
2aae911368 Update llama.py 2024-09-25 17:46:29 -07:00
Daniel Han
48777c492a Update vision.py 2024-09-25 17:44:13 -07:00
Daniel Han
62791d8f98 Update llama.py 2024-09-25 14:32:29 -07:00
Daniel Han
2af2e6e439 Update _utils.py 2024-09-25 14:12:16 -07:00
Daniel Han
54e59f3f49 Update _utils.py 2024-09-25 13:28:05 -07:00
Daniel Han
4e516b76c5 Update _utils.py 2024-09-25 13:19:01 -07:00
Daniel Han
6bc0a12470 Merge branch 'main' into nightly 2024-09-25 13:18:42 -07:00
Daniel Han
3cc9c2410e Update llama.py 2024-09-25 12:42:24 -07:00
Daniel Han
0c07b760de Fix version 2024-09-25 12:35:38 -07:00
Daniel Han
f22a14801d Llama 3.2 2024-09-25 12:18:43 -07:00
Daniel Han
0a05da55f0 Llama 3.2 (#1058)
* Layernorm

* Update layernorm.py

* Update layernorm.py

* Update layernorm.py

* Update layernorm.py

* Update layernorm.py

* Update layernorm.py

* Patch layernorm

* Update layernorm.py

* RMS Layernorm

* Update rms_layernorm.py

* Causal LM

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update layernorm.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Llama 3.2
2024-09-25 11:48:24 -07:00
Daniel Han
fa0b63b10b Llama 3.2 2024-09-25 11:42:41 -07:00
Daniel Han
50edf0fd32 Update _utils.py 2024-09-25 00:23:23 -07:00
Daniel Han
b40724b730 Update _utils.py 2024-09-25 00:22:54 -07:00
Daniel Han
fe61245404 Update cross_entropy_loss.py 2024-09-25 00:11:57 -07:00
Daniel Han
dc7b21305d Update cross_entropy_loss.py 2024-09-25 00:09:19 -07:00
Daniel Han
91930ec80c Update cross_entropy_loss.py 2024-09-25 00:07:49 -07:00
Daniel Han
ad4b093ece Update cross_entropy_loss.py 2024-09-25 00:06:10 -07:00
Daniel Han
bb12d6bd2b Update cross_entropy_loss.py 2024-09-25 00:04:58 -07:00
Daniel Han
cb875a5b45 Update cross_entropy_loss.py 2024-09-25 00:01:56 -07:00
Daniel Han
507a490355 Update cross_entropy_loss.py 2024-09-25 00:00:51 -07:00
Daniel Han
e56ce736ff Update layernorm.py 2024-09-24 23:59:09 -07:00
Daniel Han
5bb6b3bbd2 Update cross_entropy_loss.py 2024-09-24 23:57:22 -07:00
Daniel Han
45554fb66d Update cross_entropy_loss.py 2024-09-24 23:55:25 -07:00
Daniel Han
6f39ef47c0 Update cross_entropy_loss.py 2024-09-24 23:54:14 -07:00
Daniel Han
7f74268e01 Update cross_entropy_loss.py 2024-09-24 23:52:54 -07:00
Daniel Han
85556bc385 Causal LM 2024-09-24 23:49:57 -07:00
Daniel Han
634489002d Update rms_layernorm.py 2024-09-24 23:13:51 -07:00
Daniel Han
fc4ca43ee1 RMS Layernorm 2024-09-24 22:50:33 -07:00
Daniel Han
d23fd17447 Update layernorm.py 2024-09-24 17:24:39 -07:00
Daniel Han
5103781c3f Patch layernorm 2024-09-24 17:22:20 -07:00
Daniel Han
9079e3c6e4 Update layernorm.py 2024-09-24 17:03:19 -07:00
Daniel Han
6801979ef3 Update layernorm.py 2024-09-24 17:00:23 -07:00
Daniel Han
4f7e68a3fe Update layernorm.py 2024-09-24 16:54:52 -07:00
Daniel Han
9b2c5d4814 Update layernorm.py 2024-09-24 16:45:50 -07:00
Daniel Han
55d41d72a0 Update layernorm.py 2024-09-24 16:44:33 -07:00
Daniel Han
873f245009 Update layernorm.py 2024-09-24 16:29:54 -07:00
Daniel Han
88269a26a8 Layernorm 2024-09-24 02:49:48 -07:00
Daniel Han
b41c182296 Update _utils.py 2024-09-23 10:56:55 -07:00
Daniel Han
9c26f9d3bb Update README.md 2024-09-23 01:36:50 -07:00
Daniel Han
45ca9501a4 Qwen 2.5 2024-09-23 01:27:12 -07:00
Daniel Han
6e387d8ff8 Update chat_templates.py 2024-09-23 01:07:06 -07:00
Daniel Han
388d5149a9 Upgrade Ollama presets 2024-09-23 01:02:24 -07:00
Daniel Han
a08812cc52 Update chat_templates.py 2024-09-23 00:29:21 -07:00
Daniel Han
1b8ef43c14 Update tokenizer_utils.py 2024-09-23 00:04:33 -07:00
Daniel Han
96fd381293 Update tokenizer_utils.py 2024-09-22 23:00:24 -07:00
Daniel Han
c1c37c49a6 Update tokenizer_utils.py 2024-09-22 22:48:35 -07:00
Daniel Han
7e2654ab7a Update mapper.py 2024-09-22 22:13:59 -07:00
Daniel Han
f216cdc289 Update _utils.py 2024-09-22 02:38:28 -07:00
Nazim Ali
0904f7395d fix: chat_templates.py bug (#1048)
* fix: chat_template bug

* fix: check trainer attribute values are not None
2024-09-22 01:18:37 -07:00
Daniel Han
cf3e072867 Update chat_templates.py 2024-09-21 01:56:17 -07:00
Daniel Han
c7c674472f Merge branch 'nightly' 2024-09-18 14:30:33 -07:00
Daniel Han
9fd82c95aa Update mapper.py 2024-09-18 14:23:22 -07:00
Daniel Han
1d4ae059c5 Update README.md (#1036) 2024-09-18 13:23:45 -07:00
Daniel Han
6d8b4c53a5 Update llama.py 2024-09-17 17:38:12 -07:00
Daniel Han
3289975025 Update mapper.py 2024-09-17 10:50:57 -07:00
Daniel Han
563432635a Update mapper.py 2024-09-15 21:50:00 -07:00
Daniel Han
89ada9ef0b Update _utils.py 2024-09-15 18:04:18 -07:00
Daniel Han
c5d7bb591d Update README.md (#1033) 2024-09-15 17:42:09 -07:00
Daniel Han
d27d992f58 Update utils.py 2024-09-08 19:47:21 -07:00
Daniel Han
0ad16b7c1f Update __init__.py 2024-09-08 15:51:27 -07:00
Daniel Han
ffb6aa905f Update README.md 2024-09-08 14:30:54 -07:00
Daniel Han
1bba6954f1 Update README.md 2024-09-08 12:29:31 -07:00
Daniel Han
74c2141bb3 Bug fixes (#1004)
* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* update token retrieval logic (#952)

* Fix DPO (#947)

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* update hf token retrieval logic

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* get_token

* Update README.md

* Update gemma2.py

* Update rms_layernorm.py

* synchronize

* Update gemma2.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* layernorm

* Update rms_layernorm.py

* Update gemma2.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* revert

* Gemma

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma2.py

* Change UnslothTrainingArguments base class to SFTConfig (#979)

* Cohere

* Update trainer.py

* Cohere

* Cohere

* New models

* Update llama.py

* Update llama.py

* Update cohere.py

* Update llama.py

* Update cohere.py

* retry

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* _apply_lora_mlp

* Update _utils.py

* Gemma fixes

* Update llama.py

* Update flex_attention.py

* Update llama.py

* layernorm

* Update llama.py

* Update llama.py

* Flex Attention

* Update gemma2.py

* Update __init__.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update chat_templates.py (#999)

fix all misspelled "unsued" to "unused"

* Update key from "from" to "user" (#1000)

When using [tokenizer.apply_chat_template](https://huggingface.co/docs/transformers/main/en/chat_templating), the key should be "role" rather than "from". This is linked to [this issue](https://github.com/unslothai/unsloth/issues/994).

I don't know whether this is suitable for all situations; I can also add a dedicated parameter for the key if you think that is better.

* Update chat_templates.py

* Also patch the KTO trainer (#1001)

* flex attention

* Update llama.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update gemma2.py

* Update gemma2.py

---------

Co-authored-by: Hafedh <70411813+not-lain@users.noreply.github.com>
Co-authored-by: Tuan Pham <82665400+vTuanpham@users.noreply.github.com>
Co-authored-by: Yihao Wang <42559837+AgainstEntropy@users.noreply.github.com>
Co-authored-by: Peng <zphu1024@gmail.com>
Co-authored-by: Kyle Corbitt <kyle@openpipe.ai>
2024-09-08 03:16:09 -07:00
Daniel Han
658e162032 Bug fixes 2024-09-04 00:28:53 -07:00
Daniel Han
a8490a2a8a Fix bug 2024-09-03 17:30:40 -07:00
Daniel Han
1b81cf1859 Gemma faster inference (#987)
* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* update token retrieval logic (#952)

* Fix DPO (#947)

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* update hf token retrieval logic

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* get_token

* Update README.md

* Update gemma2.py

* Update rms_layernorm.py

* synchronize

* Update gemma2.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* layernorm

* Update rms_layernorm.py

* Update gemma2.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* revert

* Gemma

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma2.py

* Change UnslothTrainingArguments base class to SFTConfig (#979)

* Cohere

* Update trainer.py

* Cohere

* Cohere

* New models

* Update llama.py

* Update llama.py

* Update cohere.py

* Update llama.py

* Update cohere.py

* retry

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* _apply_lora_mlp

* Update _utils.py

* Gemma fixes

* Update llama.py

* Update flex_attention.py

---------

Co-authored-by: Hafedh <70411813+not-lain@users.noreply.github.com>
Co-authored-by: Tuan Pham <82665400+vTuanpham@users.noreply.github.com>
2024-09-03 13:52:12 -07:00
Daniel Han
7c3d1091ba Cohere, Bug fixes (#984)
* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* update token retrieval logic (#952)

* Fix DPO (#947)

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* update hf token retrieval logic

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* get_token

* Update README.md

* Update gemma2.py

* Update rms_layernorm.py

* synchronize

* Update gemma2.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* layernorm

* Update rms_layernorm.py

* Update gemma2.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* revert

* Gemma

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma2.py

* Change UnslothTrainingArguments base class to SFTConfig (#979)

* Cohere

* Update trainer.py

* Cohere

* Cohere

* New models

* Update llama.py

* Update llama.py

* Update cohere.py

* Update llama.py

* Update cohere.py

* retry

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* _apply_lora_mlp

* Update _utils.py

---------

Co-authored-by: Hafedh <70411813+not-lain@users.noreply.github.com>
Co-authored-by: Tuan Pham <82665400+vTuanpham@users.noreply.github.com>
2024-09-03 01:52:32 -07:00
Daniel Han
b140367a87 Update save.py 2024-08-27 00:08:39 -07:00
Daniel Han
9f36b83db0 Update gemma2.py 2024-08-23 23:43:57 -07:00
Daniel Han
353991f14a Phi 3.5 bug fix (#955)
* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* update token retrieval logic (#952)

* Fix DPO (#947)

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* update hf token retrieval logic

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* get_token

* Update README.md

---------

Co-authored-by: Hafedh <70411813+not-lain@users.noreply.github.com>
2024-08-23 17:38:24 -07:00
Daniel Han
199766c644 Fix DPO (#947)
* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py
2024-08-22 02:18:03 -07:00
Daniel Han
cadff4f883 Update README.md (#941)
Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>
2024-08-20 17:59:50 -07:00
Daniel Han
8e61906e6f Update chat_templates.py 2024-08-20 16:54:11 -07:00
Daniel Han
fb60340a90 Phi 3.5 (#940)
* LongRoPE

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mapper.py

* Phi 3.5
2024-08-20 16:51:39 -07:00
Daniel Han
0927c34392 Update README.md (#938) 2024-08-19 17:18:30 -07:00
Daniel Han
861f232047 Merge branch 'main' into nightly 2024-08-19 17:13:12 -07:00
Daniel Han
92a3aec9e9 Update _auto_install.py 2024-08-19 17:12:46 -07:00
Daniel Han
4450110756 Create _auto_install.py 2024-08-19 17:12:32 -07:00
Daniel Han
bb9539cc82 Fix NEFTune (#937)
* untrained tokens llama 3.1 base

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Bug fixes

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py
2024-08-19 16:17:52 -07:00
Daniel Han
3b4ce17bc9 Merge branch 'main' into nightly 2024-08-19 16:17:00 -07:00
Daniel Han
91bdf27729 Update llama.py 2024-08-19 16:14:01 -07:00
Daniel Han
ce1863d91f Update llama.py 2024-08-19 16:11:09 -07:00
Daniel Han
dcf7e6e952 Update llama.py 2024-08-19 16:08:14 -07:00
Daniel Han
4f51fe0a8c Update tokenizer_utils.py 2024-08-19 16:03:24 -07:00
Daniel Han
b9f71049a4 Update tokenizer_utils.py 2024-08-19 15:52:13 -07:00
Daniel Han
1387a4a23f Update tokenizer_utils.py 2024-08-19 15:50:16 -07:00
Daniel Han
7b35c195c0 Update llama.py 2024-08-19 15:08:53 -07:00
Daniel Han
c16e95decd Bug fixes 2024-08-19 15:04:25 -07:00
Daniel Han
23752a7ab1 Bug #930 (#931)
* untrained tokens llama 3.1 base

* Update tokenizer_utils.py

* Update tokenizer_utils.py
2024-08-16 23:39:44 -07:00
Daniel Han
a8f9f177b3 Update tokenizer_utils.py 2024-08-16 23:38:43 -07:00
Daniel Han
733075c5cd Update tokenizer_utils.py 2024-08-16 23:38:02 -07:00
Daniel Han
a0fa23e66c Merge branch 'main' into nightly 2024-08-16 23:37:16 -07:00
Daniel Han
a3eee645f1 untrained tokens llama 3.1 base (#929) 2024-08-16 19:57:19 -07:00
Daniel Han
bd60ad7d1c untrained tokens llama 3.1 base 2024-08-16 19:28:43 -07:00
Daniel Han
a8bed84683 Update __init__.py 2024-08-15 15:07:42 -07:00
Daniel Han
cdd961d2c1 Bug fixes 2024-08-15 15:04:46 -07:00
Daniel Han
c90bb8dc32 Fix mapping (#921)
* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* fix_tokenizer

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update pyproject.toml

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* gemma 2 mask

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Torch 2.4 Xformers 0.0.27post2

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Gemma 2 fixes

* Update gemma2.py

* Update llama.py

* Update llama.py

* Update save.py

* Update save.py

* Update llama.py

* Update cross_entropy_loss.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Providing more flexibility for users to customize their llama when using LoRA (#910)

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* return model

* Update tokenizer_utils.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Train on completions

* load_in_4bit=False broken

* Update llama.py

* MAP_TO_UNSLOTH_16bit

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update mapper.py

* works!

---------

Co-authored-by: Po-Lung Wang <Brownwang0426@gmail.com>
2024-08-15 01:15:35 -07:00
Daniel Han
1091f03d77 Bug Fixes (#920)
* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* fix_tokenizer

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update pyproject.toml

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* gemma 2 mask

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Torch 2.4 Xformers 0.0.27post2

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Gemma 2 fixes

* Update gemma2.py

* Update llama.py

* Update llama.py

* Update save.py

* Update save.py

* Update llama.py

* Update cross_entropy_loss.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Providing more flexibility for users to customize their llama when using LoRA (#910)

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* return model

* Update tokenizer_utils.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Train on completions

* load_in_4bit=False broken

---------

Co-authored-by: Po-Lung Wang <Brownwang0426@gmail.com>
2024-08-15 00:31:30 -07:00
Daniel Han
56dbb23135 Fix chat templates (#917)
* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* fix_tokenizer

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update pyproject.toml

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* gemma 2 mask

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Torch 2.4 Xformers 0.0.27post2

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Gemma 2 fixes

* Update gemma2.py

* Update llama.py

* Update llama.py

* Update save.py

* Update save.py

* Update llama.py

* Update cross_entropy_loss.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Providing more flexibility for users to customize their llama when using LoRA (#910)

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* return model

* Update tokenizer_utils.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Train on completions

---------

Co-authored-by: Po-Lung Wang <Brownwang0426@gmail.com>
2024-08-14 00:58:02 -07:00
Daniel Han
ec413e63e0 Fix Chat Templates (#916)
* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* fix_tokenizer

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update pyproject.toml

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* gemma 2 mask

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Torch 2.4 Xformers 0.0.27post2

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Gemma 2 fixes

* Update gemma2.py

* Update llama.py

* Update llama.py

* Update save.py

* Update save.py

* Update llama.py

* Update cross_entropy_loss.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Providing more flexibility for users to customize their llama when using LoRA (#910)

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* return model

* Update tokenizer_utils.py

* Update chat_templates.py

* Update tokenizer_utils.py

---------

Co-authored-by: Po-Lung Wang <Brownwang0426@gmail.com>
2024-08-13 17:54:02 -07:00
Daniel Han
1204107724 Fix DPO stats (#906)
* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* fix_tokenizer

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update pyproject.toml

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* gemma 2 mask

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Torch 2.4 Xformers 0.0.27post2

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Gemma 2 fixes

* Update gemma2.py

* Update llama.py

* Update llama.py

* Update save.py

* Update save.py

* Update llama.py

* Update cross_entropy_loss.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py

* Update dpo.py
2024-08-11 18:26:20 -07:00
Daniel Han
1397f2e1ab Torch 2.4, Xformers>0.0.27, TRL>0.9, Python 3.12 + bug fixes (#902)
* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* fix_tokenizer

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update pyproject.toml

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* gemma 2 mask

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Torch 2.4 Xformers 0.0.27post2

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Gemma 2 fixes

* Update gemma2.py

* Update llama.py

* Update llama.py

* Update save.py

* Update save.py
2024-08-10 19:59:40 -07:00
Daniel Han
064ff70bc5 Update _utils.py 2024-08-07 10:48:39 -07:00
Daniel Han
0ea0802d47 Update _utils.py 2024-08-07 10:47:11 -07:00
Daniel Han
d60422fb68 Update _utils.py 2024-08-07 01:11:06 -07:00
Daniel Han
734e478605 Fix tokenizers (#887)
* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* fix_tokenizer

* Update tokenizer_utils.py

* Update tokenizer_utils.py
2024-08-06 20:24:44 -07:00
Daniel Han
9be6480ec5 Update README.md 2024-08-05 00:00:53 -07:00
Daniel Han
ba87b3dd31 Update README.md 2024-08-04 23:59:57 -07:00
Daniel Han
d9e330ded7 Update README.md 2024-08-04 23:50:40 -07:00
Daniel Han
7f9c0d8c20 Update llama.py 2024-08-04 23:49:35 -07:00
emuchogu
fe4b9da764 pascal support (#870)
Co-authored-by: Edward Muchogu <muchogu@gmail.com>
2024-08-04 23:45:51 -07:00
moontidef
8ee7c42a32 fix: fix config.torch_dtype bug (#874)
Fixes bug #404 and the bug reported at https://github.com/hiyouga/LLaMA-Factory/issues/4698#issue-2393500878
2024-08-04 23:45:34 -07:00
Daniel Han
9283909b6d Update pyproject.toml 2024-08-04 11:28:21 -07:00
Daniel Han
8633d860d5 Merge branch 'main' into nightly 2024-08-01 18:19:38 -07:00
Daniel Han
c069555926 Fix RoPE extension (#846)
* bugs

* Update _utils.py

* flash-attn softcapping

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update mapper.py

* Update README.md

* Update _utils.py

* Fix ROPE extension issue and device mismatch (#840)

* When an exception has been assigned using `as target`, it is cleared at the end of the except clause. (https://docs.python.org/3/reference/compound_stmts.html#the-try-statement)

* Update loader.py

* round up to extend rope size

* inv_freq.device changed, make sure they are on the same device

---------

Co-authored-by: xiaoyang <xiaoyang@youzan.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update gemma.py

---------

Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: xiaoyang <xiaoyang@youzan.com>
2024-07-31 12:10:33 -07:00
Daniel Han
9c29d37e6a Update gemma.py 2024-07-31 12:09:33 -07:00
Daniel Han
4e3fde6539 Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly 2024-07-31 12:05:28 -07:00
XiaoYang
32735460a8 Fix ROPE extension issue and device mismatch (#840)
* When an exception has been assigned using `as target`, it is cleared at the end of the except clause. (https://docs.python.org/3/reference/compound_stmts.html#the-try-statement)

* Update loader.py

* round up to extend rope size

* inv_freq.device changed, make sure they are on the same device

---------

Co-authored-by: xiaoyang <xiaoyang@youzan.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-07-31 12:05:08 -07:00
Daniel Han
64d8a32358 Merge branch 'main' into nightly 2024-07-31 12:05:01 -07:00
Daniel Han
2521a8b39f Update README.md 2024-07-31 09:50:11 -07:00
Daniel Han
4e03b77673 Gemma (#843)
* bugs

* Update _utils.py

* flash-attn softcapping

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update mapper.py

* Update README.md

* Update _utils.py
2024-07-31 08:54:58 -07:00
Daniel Han
0989eb3265 Update _utils.py 2024-07-31 08:54:22 -07:00
Daniel Han
8c58eb3901 Update README.md 2024-07-31 08:53:20 -07:00
Daniel Han
2b49611392 Update mapper.py 2024-07-30 23:51:47 -07:00
Daniel Han
5f2990df0d Update gemma2.py 2024-07-30 23:11:35 -07:00
Daniel Han
a82d18d41b Update gemma2.py 2024-07-30 23:11:23 -07:00
Daniel Han
5c403cd0db Update gemma2.py 2024-07-30 23:07:47 -07:00
Daniel Han
04a9f4da74 Update gemma2.py 2024-07-30 23:04:00 -07:00
Daniel Han
282d8a5794 flash-attn softcapping 2024-07-30 22:48:53 -07:00
Daniel Han
b7bdb18552 Update _utils.py 2024-07-30 19:57:53 -07:00
Daniel Han
c0abd06a06 bugs 2024-07-30 19:56:36 -07:00
Daniel Han
9953839b92 Update llama.py 2024-07-30 10:29:54 -07:00
Daniel Han
34838216cc Update loader.py 2024-07-30 10:18:51 -07:00
XiaoYang
fd904b4e97 fix UnboundLocalError (#834)
* When an exception has been assigned using as target, it is cleared at the end of the except clause.(https://docs.python.org/3/reference/compound_stmts.html#the-try-statement)

* Update loader.py

---------

Co-authored-by: xiaoyang <xiaoyang@youzan.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-07-30 10:15:09 -07:00
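Note on fd904b4e97: the Python rule quoted in that commit message (a name bound with `as` is cleared when the `except` clause ends) can be illustrated with a minimal sketch. The function and variable names here are hypothetical and are not taken from loader.py; the sketch only shows the language behaviour and one common workaround, not the change the commit made.

```python
def load_with_fallback():
    """Show that the `as` target is unbound once the except clause ends."""
    try:
        raise ValueError("first attempt failed")
    except ValueError as error:
        # Copy the exception to another name; `error` itself is deleted
        # implicitly when this except clause finishes.
        saved = error

    try:
        _ = error  # `error` is a local name that is no longer bound
    except UnboundLocalError:
        leaked = False
    else:
        leaked = True

    return saved, leaked


saved, leaked = load_with_fallback()
print(type(saved).__name__, leaked)
```

Copying the exception to a second name inside the `except` clause is the usual way to keep it available for later code.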
Daniel Han
7c68bab3d9 Merge branch 'main' into nightly 2024-07-30 10:01:11 -07:00
Daniel Han
1cb2412d85 Better debugging (#826)
* Update __init__.py

* Edits

* Checks

* Update _utils.py

* Update _utils.py

* Update loader.py

* Update _utils.py

* Update mapper.py

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update mapper.py

* Update loader.py
2024-07-28 00:10:02 -07:00
Daniel Han
7a096afec9 Update loader.py 2024-07-27 23:04:23 -07:00
Daniel Han
2b63faeff7 Update mapper.py 2024-07-27 23:03:29 -07:00
Daniel Han
8788703ae3 Update loader.py 2024-07-27 23:03:13 -07:00
Daniel Han
f679e03a06 Update loader.py 2024-07-27 23:02:10 -07:00
Daniel Han
f1f1f7a8c9 Update loader.py 2024-07-27 23:00:47 -07:00
Daniel Han
6195a308af Update loader.py 2024-07-27 22:59:47 -07:00
Daniel Han
961e78fb5e Update loader.py 2024-07-27 22:59:16 -07:00
Daniel Han
1bbf268de1 Update loader.py 2024-07-27 22:58:15 -07:00
Daniel Han
fc5d565e35 Update _utils.py 2024-07-27 22:56:59 -07:00
Daniel Han
e34d92635d Update loader.py 2024-07-27 22:54:49 -07:00
Daniel Han
25025801c0 Update loader.py 2024-07-27 22:52:06 -07:00
Daniel Han
031f552743 Update mapper.py 2024-07-27 22:50:38 -07:00
Daniel Han
971ab6485a Update _utils.py 2024-07-27 22:48:52 -07:00
Daniel Han
28b3c211e4 Update loader.py 2024-07-27 22:35:24 -07:00
Daniel Han
61aba43554 Update _utils.py 2024-07-27 22:32:39 -07:00
Daniel Han
c53b4394dc Update _utils.py 2024-07-27 22:30:48 -07:00
Daniel Han
e6159e0279 Checks 2024-07-27 22:16:33 -07:00
Daniel Han
d28527ce62 Edits 2024-07-27 20:30:32 -07:00
Daniel Han
e0748c93dd Update __init__.py 2024-07-26 16:31:40 -07:00
Daniel Han
9852fdb642 Update llama.py 2024-07-25 08:53:21 -07:00
Daniel Han
393549fa48 Update _utils.py 2024-07-25 00:33:38 -07:00
Daniel Han
1ab912d528 Update _utils.py 2024-07-25 00:29:12 -07:00
Daniel Han
dc39485ec3 Update loader.py 2024-07-25 00:28:20 -07:00
Daniel Han
ef42e61d68 Update llama.py 2024-07-25 00:19:32 -07:00
Daniel Han
f20bc23e84 Fix PEFT 2024-07-25 00:17:19 -07:00
Daniel Han
fcf92c54ca Patch PEFT 2024-07-24 23:45:39 -07:00
Daniel Han
e9ab1ad5bc Mistral 2024-07-24 14:05:31 -07:00
Daniel Han
41bad372af Merge branch 'main' into nightly 2024-07-24 12:37:12 -07:00
Daniel Han
27b23a9bd4 Update README.md 2024-07-23 15:08:09 -07:00
Daniel Han
8ca886825c Create Run.png 2024-07-23 13:14:21 -07:00
Daniel Han
2217e8d86c Update llama.py 2024-07-23 12:28:12 -07:00
Daniel Han
92b3752cad Update llama.py 2024-07-23 12:27:46 -07:00
Daniel Han
affa585b3f Update _utils.py 2024-07-23 12:25:24 -07:00
Daniel Han
da93a2237a Update loader.py 2024-07-23 12:12:29 -07:00
Daniel Han
dd781e0c60 Update README.md 2024-07-23 12:07:27 -07:00
Daniel Han
faa36e853a Update README.md 2024-07-23 11:51:08 -07:00
Daniel Han
56cbd06f1f Llama 3.1 (#797)
* Llama 3.1

* Update _utils.py

* Llama 3.1

* Update _utils.py

* Update llama.py

* Update llama.py

* hack for rotary

* patch RoPE

* refix rope

* Update _utils.py

* Update llama.py

* Llama 3.1 check

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py
2024-07-23 11:40:49 -07:00
Daniel Han
9d415882bf Update llama.py 2024-07-23 11:24:58 -07:00
Daniel Han
eb03e4111a Update llama.py 2024-07-23 11:23:18 -07:00
Daniel Han
7fb8015c88 Update llama.py 2024-07-23 11:23:00 -07:00
Daniel Han
565d2ce6f1 Update llama.py 2024-07-23 11:22:27 -07:00
Daniel Han
a654779617 Update llama.py 2024-07-23 11:21:29 -07:00
Daniel Han
fdce7bff90 Update llama.py 2024-07-23 11:18:12 -07:00
Daniel Han
04096dcff8 Update llama.py 2024-07-23 11:16:40 -07:00
Daniel Han
e89ab65b02 Update llama.py 2024-07-23 11:16:31 -07:00
Daniel Han
122025ed58 Update llama.py 2024-07-23 11:15:35 -07:00
Daniel Han
3b266ebc7b Update llama.py 2024-07-23 11:13:15 -07:00
Daniel Han
e2ef589460 Update llama.py 2024-07-23 11:12:58 -07:00
Daniel Han
a2403852b4 Llama 3.1 check 2024-07-23 11:09:24 -07:00
Daniel Han
a9d6c731ee Update llama.py 2024-07-23 10:58:31 -07:00
Daniel Han
4eca9215a2 Update _utils.py 2024-07-23 10:54:54 -07:00
Daniel Han
c4dc08309e refix rope 2024-07-23 10:53:31 -07:00
Daniel Han
f285c33046 patch RoPE 2024-07-23 10:48:45 -07:00
Daniel Han
d587ce218d hack for rotary 2024-07-23 10:43:36 -07:00
Daniel Han
c17e8ca33d Update llama.py 2024-07-23 10:36:06 -07:00
Daniel Han
9a18fee63f Update llama.py 2024-07-23 10:35:03 -07:00
Daniel Han
19cf853157 Update _utils.py 2024-07-23 10:33:07 -07:00
Daniel Han
5efbd701ba Llama 3.1 2024-07-23 10:27:36 -07:00
Daniel Han
daa4d13564 Update _utils.py 2024-07-22 23:01:18 -07:00
Daniel Han
eda2343056 Llama 3.1 2024-07-22 22:58:02 -07:00
Daniel Han
0690914c62 Update tokenizer_utils.py 2024-07-20 13:25:59 -07:00
Daniel Han
71c4aed1be Update tokenizer_utils.py 2024-07-20 13:22:36 -07:00
Daniel Han
50c51e6ec1 Update llama.py 2024-07-20 12:47:36 -07:00
Daniel Han
07228828a0 Update llama.py 2024-07-20 11:53:32 -07:00
Daniel Han
11dcf38761 Merge branch 'main' into nightly 2024-07-20 09:47:22 -07:00
Daniel Han
32466f7bc4 Update mistral.py 2024-07-19 09:32:27 -07:00
Daniel Han
bffc936663 Fix Gemma 2024-07-19 09:27:18 -07:00
Daniel Han
256b55fcdd Update README.md 2024-07-19 03:05:15 -07:00
Daniel Han
b8e6560b8d Update README.md 2024-07-19 03:03:50 -07:00
Daniel Han
100ac9c052 Nightly (#784)
* Update __init__.py

* dynamic RoPE

* Update mistral.py

* Update llama.py

* Update tokenizer_utils.py

* Update mistral.py

* Update llama.py

* Update __init__.py

* Update flex_attention.py

* Update llama.py

* Update llama.py

* Mistral Nemo

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py
2024-07-19 01:39:08 -07:00
Daniel Han
e6002b1b32 Merge branch 'main' into nightly 2024-07-19 01:38:47 -07:00
Daniel Han
8ae30938d3 Update tokenizer_utils.py 2024-07-19 01:29:52 -07:00
Daniel Han
f6d47c99df Nightly (#783)
* Update __init__.py

* dynamic RoPE

* Update mistral.py

* Update llama.py

* Update tokenizer_utils.py

* Update mistral.py

* Update llama.py

* Update __init__.py

* Update flex_attention.py

* Update llama.py

* Update llama.py

* Mistral Nemo

* Update tokenizer_utils.py

* Update tokenizer_utils.py
2024-07-19 01:24:46 -07:00
Daniel Han
f302c074b7 Update tokenizer_utils.py 2024-07-19 01:06:41 -07:00
Daniel Han
14144ad6fc Update tokenizer_utils.py 2024-07-19 01:03:51 -07:00
Daniel Han
881ee0ed37 Merge branch 'main' into nightly 2024-07-19 00:59:48 -07:00
Daniel Han
8783596962 Update tokenizer_utils.py 2024-07-19 00:41:35 -07:00
Daniel Han
47e08076e6 Mistral Nemo (#782)
* Update __init__.py

* dynamic RoPE

* Update mistral.py

* Update llama.py

* Update tokenizer_utils.py

* Update mistral.py

* Update llama.py

* Update __init__.py

* Update flex_attention.py

* Update llama.py

* Update llama.py

* Mistral Nemo
2024-07-19 00:14:24 -07:00
Daniel Han
2f9556c428 Mistral Nemo 2024-07-18 22:53:23 -07:00
Daniel Han
187157f548 Update llama.py 2024-07-18 22:07:23 -07:00
Daniel Han
ebbbf6be52 Update llama.py 2024-07-18 21:57:33 -07:00
Daniel Han
e1dc32c2a6 Merge branch 'main' into nightly 2024-07-18 21:55:10 -07:00
Daniel Han
742a7629c2 Fix bugs (#779)
* Update __init__.py

* dynamic RoPE

* Update mistral.py

* Update llama.py

* Update tokenizer_utils.py

* Update mistral.py

* Update llama.py

* Update __init__.py

* Update flex_attention.py
2024-07-18 18:19:24 -07:00
Daniel Han
0da004c70e Update flex_attention.py 2024-07-18 18:18:09 -07:00
Daniel Han
d4fa9a0cdf Update __init__.py 2024-07-18 14:43:03 -07:00
Daniel Han
765a7a9330 Update llama.py 2024-07-18 14:31:35 -07:00
Daniel Han
125b3727ff Update mistral.py 2024-07-18 13:33:13 -07:00
Daniel Han
fcac73786c Update tokenizer_utils.py 2024-07-18 13:25:30 -07:00
Daniel Han
1144bbb15c Update llama.py 2024-07-18 12:33:22 -07:00
Daniel Han
72d9e5f5a0 Update mistral.py 2024-07-18 12:08:49 -07:00
Daniel Han
54dd81de67 dynamic RoPE 2024-07-18 12:06:38 -07:00
Daniel Han
5dc52e6b2e Update __init__.py 2024-07-18 11:07:32 -07:00
Daniel Han
66ce2d401a Update pyproject.toml 2024-07-18 10:59:09 -07:00
Daniel Han
1a7c3e1b3c Update __init__.py 2024-07-18 10:58:12 -07:00
Daniel Han
6a437e43f5 Mistral Nemo 12b (#777)
* Update gemma2.py

* Update llama.py

* Update llama.py

* Update gemma2.py

* init

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* All RoPE Scaling support

* cleanup

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* exec

* exec

* Attention_Module

* attention_module

* imports

* exec

* Update llama.py

* Update llama.py

* boolean mask

* revert masking

* Update llama.py

* Update save.py

* Update llama.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update utils.py

* retry

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update chat_templates.py

* Gemma 2 Ollama support

* Update llama.py

* Update llama.py

* error handling

* Update _utils.py

* Update _utils.py

* Stats for debugging

* Update _utils.py

* Update _utils.py

* Debugging

* Update tokenizer_utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Check exec, eval

* Update _utils.py

* Update _utils.py

* Images

* Bug fixes

* Update pyproject.toml

* Bug fixes

* Update _utils.py

* Update _utils.py

* Deprecation fix

* Update chat_templates.py

* Now permitting use of pre-installed llama.cpp (#763)

* Now permitting use of pre-installed llama.cpp

* Update save.py

---------

Co-authored-by: Giuseppe Strafforello <giuseppe.strafforello@titantechnologies.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Deprecation & compile

* typo

* Update chat_templates.py

* Update chat_templates.py

* train_on_responses_only

* Update llama.py

* Update llama.py

* Update save.py

* Update gemma2.py

* Flex Attention

* typos

* Update _utils.py

* Update llama.py

* Update __init__.py

* Update flex_attention.py

* Update llama.py

* Update llama.py

* emulation

* Update __init__.py

* Update rope_embedding.py

* Update flex_attention.py

* Update flex_attention.py

* Update rope_embedding.py

* libdevice

* triton_tanh

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* score

* Update llama.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update llama.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Flex Attention removal

* upload tensorboard training stats to hub if available (#773)

* causal_mask

* Update llama.py

* Update llama.py

* Update flex_attention.py

* Update _utils.py

* Update mapper.py

* Update _utils.py

---------

Co-authored-by: pepistrafforello <pepi.strafforello@gmail.com>
Co-authored-by: Giuseppe Strafforello <giuseppe.strafforello@titantechnologies.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
2024-07-18 10:51:10 -07:00
Daniel Han
fa893e7d67 Chat templates 2024-07-15 14:36:44 -07:00
Daniel Han
ca6c3dcc99 Train on responses only (#770)
* Update gemma2.py

* Update llama.py

* Update llama.py

* Update gemma2.py

* init

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* All RoPE Scaling support

* cleanup

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* exec

* exec

* Attention_Module

* attention_module

* imports

* exec

* Update llama.py

* Update llama.py

* boolean mask

* revert masking

* Update llama.py

* Update save.py

* Update llama.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update utils.py

* retry

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update chat_templates.py

* Gemma 2 Ollama support

* Update llama.py

* Update llama.py

* error handling

* Update _utils.py

* Update _utils.py

* Stats for debugging

* Update _utils.py

* Update _utils.py

* Debugging

* Update tokenizer_utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Check exec, eval

* Update _utils.py

* Update _utils.py

* Images

* Bug fixes

* Update pyproject.toml

* Bug fixes

* Update _utils.py

* Update _utils.py

* Deprecation fix

* Update chat_templates.py

* Now permitting use of pre-installed llama.cpp (#763)

* Now permitting use of pre-installed llama.cpp

* Update save.py

---------

Co-authored-by: Giuseppe Strafforello <giuseppe.strafforello@titantechnologies.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Deprecation & compile

* typo

* Update chat_templates.py

* Update chat_templates.py

* train_on_responses_only

* Update llama.py

* Update llama.py

* Update save.py

* Update gemma2.py

---------

Co-authored-by: pepistrafforello <pepi.strafforello@gmail.com>
Co-authored-by: Giuseppe Strafforello <giuseppe.strafforello@titantechnologies.com>
2024-07-14 22:41:04 -07:00
Daniel Han
f176cbd36a Many bug fixes (#754)
* Update gemma2.py

* Update llama.py

* Update llama.py

* Update gemma2.py

* init

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* All RoPE Scaling support

* cleanup

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* exec

* exec

* Attention_Module

* attention_module

* imports

* exec

* Update llama.py

* Update llama.py

* boolean mask

* revert masking

* Update llama.py

* Update save.py

* Update llama.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update utils.py

* retry

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update chat_templates.py

* Gemma 2 Ollama support

* Update llama.py

* Update llama.py

* error handling

* Update _utils.py

* Update _utils.py

* Stats for debugging

* Update _utils.py

* Update _utils.py

* Debugging

* Update tokenizer_utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Check exec, eval

* Update _utils.py

* Update _utils.py

* Images

* Bug fixes

* Update pyproject.toml

* Bug fixes

* Update _utils.py

* Update _utils.py
2024-07-10 01:59:06 -07:00
Daniel Han
316aaefdf2 Update llama.py 2024-07-08 10:44:19 -07:00
Daniel Han
2eb950872a Update llama.py 2024-07-08 10:38:49 -07:00
Daniel Han
1f1211fbd6 Update _utils.py 2024-07-08 10:01:20 -07:00
Daniel Han
55bf35be5d Update llama.py 2024-07-07 15:46:36 -07:00
Daniel Han
fcc2833767 Nightly (#744)
* Update gemma2.py

* Update llama.py

* Update llama.py

* Update gemma2.py

* init

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* All RoPE Scaling support

* cleanup

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* exec

* exec

* Attention_Module

* attention_module

* imports

* exec

* Update llama.py

* Update llama.py

* boolean mask

* revert masking

* Update llama.py

* Update save.py

* Update llama.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update utils.py

* retry

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update chat_templates.py

* Gemma 2 Ollama support

* Update llama.py

* Update llama.py

* error handling

* Update _utils.py

* Update _utils.py

* Stats for debugging

* Update _utils.py

* Update _utils.py

* Debugging

* Update tokenizer_utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Check exec, eval

* Update _utils.py

* Update _utils.py
2024-07-07 10:22:59 -07:00
Daniel Han
5813e85bb9 Merge branch 'main' of https://github.com/unslothai/unsloth 2024-07-07 09:49:55 -07:00
Daniel Han
05393cc85b Update _utils.py 2024-07-07 09:49:45 -07:00
Daniel Han
775fb647d5 Fix exec, eval (#743)
* Update gemma2.py

* Update llama.py

* Update llama.py

* Update gemma2.py

* init

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* All RoPE Scaling support

* cleanup

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* exec

* exec

* Attention_Module

* attention_module

* imports

* exec

* Update llama.py

* Update llama.py

* boolean mask

* revert masking

* Update llama.py

* Update save.py

* Update llama.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update utils.py

* retry

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update chat_templates.py

* Gemma 2 Ollama support

* Update llama.py

* Update llama.py

* error handling

* Update _utils.py

* Update _utils.py

* Stats for debugging

* Update _utils.py

* Update _utils.py

* Debugging

* Update tokenizer_utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Check exec, eval
2024-07-07 09:33:01 -07:00
Daniel Han
de99c84625 Update llama.py 2024-07-06 23:59:03 -07:00
Daniel Han
75df21a314 Debugging (#739)
* Update gemma2.py

* Update llama.py

* Update llama.py

* Update gemma2.py

* init

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* All RoPE Scaling support

* cleanup

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* exec

* exec

* Attention_Module

* attention_module

* imports

* exec

* Update llama.py

* Update llama.py

* boolean mask

* revert masking

* Update llama.py

* Update save.py

* Update llama.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update utils.py

* retry

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update chat_templates.py

* Gemma 2 Ollama support

* Update llama.py

* Update llama.py

* error handling

* Update _utils.py

* Update _utils.py

* Stats for debugging

* Update _utils.py

* Update _utils.py

* Debugging

* Update tokenizer_utils.py

* Update _utils.py
2024-07-06 18:50:00 -07:00
Daniel Han
86c5675a67 Gemma 2 bug fixes + All RoPE Scaling Support (#736)
* Update gemma2.py

* Update llama.py

* Update llama.py

* Update gemma2.py

* init

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* All RoPE Scaling support

* cleanup

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* exec

* exec

* Attention_Module

* attention_module

* imports

* exec

* Update llama.py

* Update llama.py

* boolean mask

* revert masking

* Update llama.py

* Update save.py

* Update llama.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update utils.py

* retry

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update chat_templates.py

* Gemma 2 Ollama support

* Update llama.py

* Update llama.py
2024-07-05 23:48:42 -07:00
Daniel Han
c1009008e3 Fix GGUF (#731)
* Update mapper.py

* Update Model Conversion Command in `save.py` to `convert_hf_to_gguf.py` (#730)

* Updated convert_hf_to_gguf.py call to align with changes in llama.cpp repository

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Typo Fix (#690)

---------

Co-authored-by: M. Ali Bayram <malibayram91@gmail.com>
Co-authored-by: johnpaulbin <johnpaulbin@gmail.com>
2024-07-04 13:26:57 -07:00
Daniel Han
2510a4abc4 Gemma2 (#723)
* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

---------

Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>
2024-07-03 12:12:21 -07:00
Daniel Han
cc4c5d7785 Gemma2 (#709)
* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save  the lora adapters to convert to GGML

* Remove unwated tokenizer saving for conversion to ggml and added a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional ags added.

* Temporray remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantize versions for the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when i do clone this with recursive it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651)

* Nightly (#649)

* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save  the lora adapters to convert to GGML

* Remove unwated tokenizer saving for conversion to ggml and added a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional ags added.

* Temporray remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantize versions for the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when i do clone this with recursive it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving

* Implemented better list management and then forgot to actually call the new list variable, fixed

* Check type of given quantization method and return type error if not list or string

* Update save.py
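
The quantization_method fix described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in save.py: the helper name `normalize_quantization_methods` is hypothetical, and the only behaviour assumed is what the commit messages state (accept a string or a list, raise a type error otherwise).

```python
from typing import List, Union


def normalize_quantization_methods(
    quantization_method: Union[str, List[str]],
) -> List[str]:
    """Return the requested quantization methods as a list of strings.

    Hypothetical helper: the name and exact behaviour are assumptions
    made for illustration, not taken from save.py.
    """
    if isinstance(quantization_method, str):
        # Wrap a bare string; iterating over it directly would yield
        # single characters instead of one method name.
        methods = [quantization_method]
    elif isinstance(quantization_method, list):
        # Copy so later changes do not affect the caller's list.
        methods = list(quantization_method)
    else:
        raise TypeError(
            "quantization_method must be a str or a list of str, "
            f"got {type(quantization_method).__name__}"
        )

    for method in methods:
        if not isinstance(method, str):
            raise TypeError(
                "every quantization method must be a str, "
                f"got {type(method).__name__}"
            )
    return methods
```

With this shape, callers can loop over the returned list whether the user passed a single method name or several.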

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652)

This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960.

* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653)

This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58.

* Update llama.py

* peft

* patch

* Update loader.py

* retrain

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* offload

* Update llama.py

* Create a starter script for command-line training to integrate in ML ops pipelines. (#623)

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* ollama

* Update mapper.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Fixes

* clearer messages

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* log

* Update __init__.py

* Update llama.py

* Update __init__.py

* Create Merge.png

* Create ollama.png

* Gemma2

* Update llama.py

* Update loader.py

* Update pyproject.toml

* Update pyproject.toml

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Revert Gemma2

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update rms_layernorm.py

* Update gemma2.py

* logit softcapping

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update gemma2.py

* Update gemma2.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update llama.py

* Update gemma2.py

* Update llama.py

* Update llama.py

* Update gemma2.py

* Update gemma2.py

* Update llama.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* compile flags

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update gemma2.py

* Update gemma2.py

* fixes

* Update _utils.py

* Fix generation

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* pad token

* Update gemma2.py

* pad token

* Update _utils.py

* Update llama.py

* Update gemma2.py

* edit warning

* Update tokenizer_utils.py

---------

Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-07-02 22:51:01 -07:00
Daniel Han
cfddc79bc8 Nightly (#676)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651)

* Nightly (#649)

* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving

* Implemented better list management and then forgot to actually call the new list variable, fixed

* Check type of given quantization method and return type error if not list or string

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652)

This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960.

* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653)

This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58.

* Update llama.py

* peft

* patch

* Update loader.py

* retrain

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* offload

* Update llama.py

* Create a starter script for command-line training to integrate in ML ops pipelines. (#623)

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* ollama

* Update mapper.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Fixes

* clearer messages

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* log

* Update __init__.py

* Update llama.py

* Update __init__.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>
2024-06-21 15:32:26 +10:00
Daniel Han
1508654836 Nightly (#673)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651)

* Nightly (#649)

* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving

* Implemented better list management and then forgot to actually call the new list variable, fixed

* Check type of given quantization method and return type error if not list or string

* Update save.py
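
For illustration only, the type check described in the bullets above might look like the minimal sketch below. The helper name `normalize_quantization_method` is hypothetical and is not taken from save.py; only the parameter name `quantization_method` comes from the commit message, and the actual fix may differ.

```python
def normalize_quantization_method(quantization_method):
    """Return quantization_method as a list (hypothetical helper, not from save.py)."""
    if isinstance(quantization_method, str):
        # Wrap a bare string: iterating over it directly would yield
        # single characters instead of one method name.
        return [quantization_method]
    if isinstance(quantization_method, list):
        # Copy so that later changes do not mutate the caller's list.
        return list(quantization_method)
    # Anything else is rejected, as the commit message describes.
    raise TypeError(
        "quantization_method must be a string or a list, "
        f"got {type(quantization_method).__name__}"
    )


methods = normalize_quantization_method("q4_k_m")
print(methods)
```

Normalizing once at the top of the saving routine means the rest of the code can always loop over a list, whether the caller passed one method or several.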

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652)

This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960.

* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653)

This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58.

* Update llama.py

* peft

* patch

* Update loader.py

* retrain

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* offload

* Update llama.py

* Create a starter script for command-line training to integrate in ML ops pipelines. (#623)

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* ollama

* Update mapper.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Fixes

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>
2024-06-21 00:28:52 +10:00
Daniel Han
c2066592aa Ollama (#671)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to GGML and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651)

* Nightly (#649)

* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to GGML and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving

* Implemented better list management and then forgot to actually call the new list variable, fixed

* Check type of given quantization method and return type error if not list or string

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652)

This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960.

* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653)

This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58.

* Update llama.py

* peft

* patch

* Update loader.py

* retrain

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* offload

* Update llama.py

* Create a starter script for command-line training to integrate in ML ops pipelines. (#623)

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* ollama

* Update mapper.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>
2024-06-20 22:28:28 +10:00
Daniel Han-Chen
08d22b8853 Update chat_templates.py 2024-06-20 19:49:38 +10:00
Daniel Han-Chen
55a8016ae6 Update chat_templates.py 2024-06-20 19:45:02 +10:00
Daniel Han
0e6b31dd84 Ollama bug fixes (#667)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to GGML and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651)

* Nightly (#649)

* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to GGML and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this with `--recursive`, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving

* Implemented better list management and then forgot to actually call the new list variable, fixed

* Check type of given quantization method and return type error if not list or string

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652)

This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960.

* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653)

This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58.

* Update llama.py

* peft

* patch

* Update loader.py

* retrain

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* offload

* Update llama.py

* Create a starter script for command-line training to integrate in ML ops pipelines. (#623)

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* ollama

* Update mapper.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>
2024-06-20 04:55:13 +10:00
Daniel Han
9a7f3baa15 Ollama (#665)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this with `--recursive`, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651)

* Nightly (#649)

* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this with `--recursive`, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving

* Implemented better list management and then forgot to actually call the new list variable, fixed

* Check type of given quantization method and return type error if not list or string

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652)

This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960.

* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653)

This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58.

* Update llama.py

* peft

* patch

* Update loader.py

* retrain

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* offload

* Update llama.py

* Create a starter script for command-line training to integrate in ML ops pipelines. (#623)

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>
2024-06-19 04:53:26 +10:00
Daniel Han
34f65c1eaf Fix continuing LoRA finetuning (#656)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this with `--recursive`, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651)

* Nightly (#649)

* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this with `--recursive`, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving

* Implemented better list management and then forgot to actually call the new list variable, fixed

* Check type of given quantization method and return type error if not list or string

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652)

This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960.

* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653)

This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58.

* Update llama.py

* peft

* patch

* Update loader.py

* retrain

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>
2024-06-17 00:39:20 +10:00
Daniel Han
12d294bcb3 Fix GGUF (#654)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and added a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651)

* Nightly (#649)

* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and added a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving

* Implemented better list management and then forgot to actually call the new list variable, fixed

* Check type of given quantization method and return type error if not list or string

* Update save.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>

* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652)

This reverts commit 506cb68867296237e95bc53c32f1bfc9b1757960.

* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653)

This reverts commit 2f48cc9af385579876fd45bd833169d1f1a2ea58.

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
Co-authored-by: ArcadaLabs-Jason <52756218+ArcadaLabs-Jason@users.noreply.github.com>
2024-06-16 14:51:58 +10:00
Daniel Han
0df0509c28 Nightly (#649)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and added a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
2024-06-16 04:32:21 +10:00
Daniel Han
ff6fee6785 Nightly (#648)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and added a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
2024-06-16 03:39:00 +10:00
Daniel Han
7be0f03eb4 Nightly (#646)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
2024-06-15 18:26:25 +10:00
Daniel Han
659889c5bc Fix segfaults (#641)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling

* Update llama.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
2024-06-15 00:52:33 +10:00
Daniel Han
a3fb597fe1 Qwen bug fixes (#639)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
2024-06-14 20:59:45 +10:00
Daniel Han-Chen
dee170293f Update __init__.py 2024-06-14 16:00:26 +10:00
Daniel Han-Chen
a2ea54f62e Update tokenizer_utils.py 2024-06-14 03:24:28 +10:00
Daniel Han
c33afda563 Nightly (#632)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix #2 for saving lora

* Test fix #3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to ggml and add a few print statements.

* Needed tokenizer for saving, added it back, also made it more unslothy style by having positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to older version of the code, and added a few comments.

* Test fix 1 for arch

* Test fix 2 new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone this recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
2024-06-14 02:58:08 +10:00
Daniel Han
be0bba4fc8 Ollama Chat Templates (#582)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-06-13 05:04:54 +10:00
Daniel Han-Chen
dff5c4271b Update llama.py 2024-06-07 04:54:57 +10:00
Daniel Han-Chen
85fac9e038 Update llama.py 2024-06-07 04:25:33 +10:00
Daniel Han-Chen
a89f888dc9 Update utils.py 2024-06-07 03:47:44 +10:00
Daniel Han-Chen
5774e75a1e Qwen2 2024-06-07 02:53:17 +10:00
Daniel Han-Chen
0451f82140 Update pyproject.toml 2024-06-06 01:22:29 +10:00
Daniel Han-Chen
bddf7fd9e9 Update llama.py 2024-06-05 20:57:08 +10:00
Daniel Han-Chen
86bb9f50fb Update README.md 2024-06-05 06:15:27 +10:00
Daniel Han-Chen
a8ad3cef9f Update README.md 2024-06-05 06:14:11 +10:00
Daniel Han
669552b4bf Fix #563 (#564)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-05-31 00:41:44 +10:00
Daniel Han
b2d09de4d4 Fix Phi-3 (#556)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-05-29 14:30:26 +10:00
Daniel Han
5cff582ccf Nightly (#548)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-05-29 00:30:38 +10:00
Daniel Han-Chen
426949c3a8 Update tokenizer_utils.py 2024-05-24 11:21:29 +10:00
Z
ef3d513e76 Update _utils.py (#520)
Fixed a typo in the tokenizer fixer.
2024-05-24 11:10:06 +10:00
Daniel Han
7486340721 Phi-3, Llama-3 bug fixes (#519)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3
2024-05-24 06:36:10 +10:00
Daniel Han
bf4a1aefa3 Phi 3 Medium (#518)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py
2024-05-24 04:24:01 +10:00
Daniel Han-Chen
87176de87b Update README.md 2024-05-23 04:31:26 +10:00
Daniel Han
b90d9c42c6 Mistral v3 (#514)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3
2024-05-23 04:15:02 +10:00
Daniel Han
289b7fcca5 Fix is_bfloat16_supported missing (#510)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py
2024-05-22 20:40:57 +10:00
Daniel Han
72b19da6bd Nightly (#506)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py
2024-05-22 04:45:57 +10:00
Daniel Han
dde6a0f0d3 Nightly (#483)
* peft issue

* Update save.py

* Update __init__.py

* Update pyproject.toml
2024-05-17 23:46:33 +10:00
Daniel Han-Chen
617804de0b Squashed commit of the following:
commit 23bd794c246e9c90c453c9f2ab41a21ac1e41b9d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri May 17 14:16:39 2024 +1000

    Update save.py

commit c12cc6c2b13333c4c6709e0ee88665b08c672887
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri May 17 04:14:05 2024 +1000

    peft issue
2024-05-17 14:18:52 +10:00
Daniel Han
63e175a77d peft issue (#480) 2024-05-17 04:18:18 +10:00
Daniel Han
995dbe5043 Fix generation (#472)
* Fix prompt

* Update chat_templates.py

* fix_untrained_tokens

* Update llama.py

* add tokens

* Update _utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* pad_token

* Update chat_templates.py

* Update chat_templates.py

* tokenizer

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer padding

* Update tokenizer_utils.py

* Update save.py

* Fix: loading models with resized vocabulary (#377)

* new: vocab resize on load

* new: gitignore

* GGUF fix

* Readme (#390)

* Update README.md

* Update README.md

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Update README.md

* Delete .gitignore

* Phi-3

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Fix reserved tokens

* Update save.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update save.py

* Update _utils.py

* Update chat_templates.py

* Adds dependencies and extras for torch 2.3.0 with new xformers versions (#415)

* Adds dependencies and extras for torch 2.3.0 with new xformers versions

* Add 2.3.0 section to readme

* Support Qwen2 (#428)

* support Qwen2

* support Qwen2

* Delete README.md

* Revert "Delete README.md"

This reverts commit 9dde82c35d446393946c3497ad5cf96a2b59197e.

* Update README.md

* Qwen2 == Mistral

* Update llama.py

* Update __init__.py

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update save.py

* test_hf_gguf_equivalence

* Update chat_templates.py

* Update chat_templates.py

* --pad-vocab

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Unspecified max_seq_length

* possible_pad_token

* Update tokenizer_utils.py

* past_key_values

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* _wrap_fast_inference

* Update llama.py

* Update llama.py

* flag

---------

Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com>
Co-authored-by: Yang JianXin <995462226@qq.com>
2024-05-16 15:09:42 +10:00
Daniel Han
3329eb6a2c Nightly (#461)
* Fix prompt

* Update chat_templates.py

* fix_untrained_tokens

* Update llama.py

* add tokens

* Update _utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* pad_token

* Update chat_templates.py

* Update chat_templates.py

* tokenizer

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer padding

* Update tokenizer_utils.py

* Update save.py

* Fix: loading models with resized vocabulary (#377)

* new: vocab resize on load

* new: gitignore

* GGUF fix

* Readme (#390)

* Update README.md

* Update README.md

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Update README.md

* Delete .gitignore

* Phi-3

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Fix reserved tokens

* Update save.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update save.py

* Update _utils.py

* Update chat_templates.py

* Adds dependencies and extras for torch 2.3.0 with new xformers versions (#415)

* Adds dependencies and extras for torch 2.3.0 with new xformers versions

* Add 2.3.0 section to readme

* Support Qwen2 (#428)

* support Qwen2

* support Qwen2

* Delete README.md

* Revert "Delete README.md"

This reverts commit 9dde82c35d446393946c3497ad5cf96a2b59197e.

* Update README.md

* Qwen2 == Mistral

* Update llama.py

* Update __init__.py

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update save.py

* test_hf_gguf_equivalence

* Update chat_templates.py

* Update chat_templates.py

* --pad-vocab

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Unspecified max_seq_length

* possible_pad_token

* Update tokenizer_utils.py

---------

Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com>
Co-authored-by: Yang JianXin <995462226@qq.com>
2024-05-14 04:51:23 +10:00
Daniel Han
9b4ed21ade May 2024 Prelim (#447)
* Fix prompt

* Update chat_templates.py

* fix_untrained_tokens

* Update llama.py

* add tokens

* Update _utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* pad_token

* Update chat_templates.py

* Update chat_templates.py

* tokenizer

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer padding

* Update tokenizer_utils.py

* Update save.py

* Fix: loading models with resized vocabulary (#377)

* new: vocab resize on load

* new: gitignore

* GGUF fix

* Readme (#390)

* Update README.md

* Update README.md

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Update README.md

* Delete .gitignore

* Phi-3

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Fix reserved tokens

* Update save.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update save.py

* Update _utils.py

* Update chat_templates.py

* Adds dependencies and extras for torch 2.3.0 with new xformers versions (#415)

* Adds dependencies and extras for torch 2.3.0 with new xformers versions

* Add 2.3.0 section to readme

* Support Qwen2 (#428)

* support Qwen2

* support Qwen2

* Delete README.md

* Revert "Delete README.md"

This reverts commit 9dde82c35d446393946c3497ad5cf96a2b59197e.

* Update README.md

* Qwen2 == Mistral

* Update llama.py

* Update __init__.py

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update save.py

* test_hf_gguf_equivalence

* Update chat_templates.py

* Update chat_templates.py

* --pad-vocab

* Update tokenizer_utils.py

---------

Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com>
Co-authored-by: Yang JianXin <995462226@qq.com>
2024-05-13 05:22:03 +10:00
Daniel Han
0a433c33ca llama-3 bug fixes (#429)
* Fix prompt

* Update chat_templates.py

* fix_untrained_tokens

* Update llama.py

* add tokens

* Update _utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* pad_token

* Update chat_templates.py

* Update chat_templates.py

* tokenizer

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer padding

* Update tokenizer_utils.py

* Update save.py

* Fix: loading models with resized vocabulary (#377)

* new: vocab resize on load

* new: gitignore

* GGUF fix

* Readme (#390)

* Update README.md

* Update README.md

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Update README.md

* Delete .gitignore

* Phi-3

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Fix reserved tokens

* Update save.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update save.py

* Update _utils.py

* Update chat_templates.py

---------

Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-05-08 07:40:41 +10:00
Daniel Han-Chen
073a987a91 Update save.py 2024-05-05 13:28:21 +10:00
Daniel Han
533f4ba136 Fix llama-3 (#423)
* Fix prompt

* Update chat_templates.py

* fix_untrained_tokens

* Update llama.py

* add tokens

* Update _utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* pad_token

* Update chat_templates.py

* Update chat_templates.py

* tokenizer

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer padding

* Update tokenizer_utils.py

* Update save.py

* Fix: loading models with resized vocabulary (#377)

* new: vocab resize on load

* new: gitignore

* GGUF fix

* Readme (#390)

* Update README.md

* Update README.md

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Update README.md

* Delete .gitignore

* Phi-3

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Fix reserved tokens

* Update save.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

---------

Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-05-05 05:45:01 +10:00
Daniel Han-Chen
d4e23b5d86 Update README.md 2024-04-30 20:26:10 +10:00
Daniel Han
a0d8184a0e Phi-3 (#397)
* Fix prompt

* Update chat_templates.py

* fix_untrained_tokens

* Update llama.py

* add tokens

* Update _utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* pad_token

* Update chat_templates.py

* Update chat_templates.py

* tokenizer

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer padding

* Update tokenizer_utils.py

* Update save.py

* Fix: loading models with resized vocabulary (#377)

* new: vocab resize on load

* new: gitignore

* GGUF fix

* Readme (#390)

* Update README.md

* Update README.md

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Update README.md

* Delete .gitignore

* Phi-3

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

---------

Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-04-30 05:59:02 +10:00
Daniel Han-Chen
308ed4d15d Update save.py 2024-04-29 17:55:04 +10:00
Daniel Han
838ecde97a Nightly (#370)
* Fix prompt

* Update chat_templates.py

* fix_untrained_tokens

* Update llama.py

* add tokens

* Update _utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* pad_token

* Update chat_templates.py

* Update chat_templates.py

* tokenizer

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer padding

* Update tokenizer_utils.py

* Update save.py

* Fix: loading models with resized vocabulary (#377)

* new: vocab resize on load

* new: gitignore

* GGUF fix

* Readme (#390)

* Update README.md

* Update README.md

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Update README.md

* Delete .gitignore

---------

Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-04-29 04:47:03 +10:00
Daniel Han
4a88539991 Fix Llama-3 (#366)
* Fix prompt

* Update chat_templates.py

* fix_untrained_tokens

* Update llama.py

* add tokens

* Update _utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* pad_token

* Update chat_templates.py

* Update chat_templates.py

* tokenizer

* Update save.py

* Update chat_templates.py

* Update chat_templates.py
2024-04-22 05:12:11 +10:00
Daniel Han-Chen
7a0ded3ab4 Update README.md 2024-04-20 14:22:52 +10:00
Daniel Han
79396e367f Fix prompt (#357) 2024-04-20 04:59:19 +10:00
Daniel Han
68d7f13dc2 Update README.md (#352) 2024-04-19 05:50:19 +10:00
Daniel Han
a32efe4dde Update README.md (#351) 2024-04-19 05:47:04 +10:00
Daniel Han-Chen
95b276ceb1 Update mapper.py 2024-04-19 05:37:53 +10:00
Daniel Han-Chen
2771e5042b Update _utils.py 2024-04-19 03:36:45 +10:00
Daniel Han-Chen
faa66d209d Update _utils.py 2024-04-19 03:33:12 +10:00
Daniel Han-Chen
a4123e16c7 Update tokenizer_utils.py 2024-04-19 03:05:54 +10:00
Daniel Han-Chen
69cedf9fcb Update tokenizer_utils.py 2024-04-19 03:03:05 +10:00
Daniel Han-Chen
8f89ca62f1 Llama-3 2024-04-19 02:55:20 +10:00
Daniel Han
6dad5dd932 Tokenizers fix (#336)
* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py

* Patch tokenizer

* Update chat_templates.py

* Heal tokenizers

* Update chat_templates.py

* Update mapper.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* tokenizer patching

* patch_tokenizer

* Update chat_templates.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Edit

* Update mistral.py

* Update mistral.py

* Stats

* Update mistral.py

* attention_mask

* Update llama.py

* Update llama.py

* batch

* Temp fix batch inference

* Update llama.py

* Update gemma.py

* Fix inference

* swiglu

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* fast inference

* model

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update utils.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* overhead

* Update llama.py

* Update llama.py

* compile

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* lora matmul

* Update llama.py

* Update llama.py

* Update llama.py

* offloaded checkpointing

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update gemma.py

* Revert "Update gemma.py"

This reverts commit e3c3c5f3fa3d04a87f854056f6b547ced610d712.

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Saving

* sentencepiece_model_pb2

* Update llama.py

* Update save.py

* Update llama.py

* padding side

* Update tokenizer_utils.py

* cache dir

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update save.py

* Update save.py

* checkpoint

* Gemma 1.1

* more models

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update llama.py

* CodeGemma

* Fix downcasting

* Some bugs

* Fix Yi tokenizer

* HF_TOKEN

* Update llama.py

* Update tokenizer_utils.py
2024-04-15 04:18:01 +10:00
Daniel Han
6d36f6b9a9 Readme Changes (#324)
* Update README.md

* Update README.md

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-04-11 01:43:34 +10:00
Daniel Han-Chen
66874d9918 Update _utils.py 2024-04-10 00:51:06 +10:00
Daniel Han
648bde7f06 Fix downcasting LoRA (#318)
* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py

* Patch tokenizer

* Update chat_templates.py

* Heal tokenizers

* Update chat_templates.py

* Update mapper.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* tokenizer patching

* patch_tokenizer

* Update chat_templates.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Edit

* Update mistral.py

* Update mistral.py

* Stats

* Update mistral.py

* attention_mask

* Update llama.py

* Update llama.py

* batch

* Temp fix batch inference

* Update llama.py

* Update gemma.py

* Fix inference

* swiglu

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* fast inference

* model

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update utils.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* overhead

* Update llama.py

* Update llama.py

* compile

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* lora matmul

* Update llama.py

* Update llama.py

* Update llama.py

* offloaded checkpointing

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update gemma.py

* Revert "Update gemma.py"

This reverts commit c68b59bbfd.

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Saving

* sentencepiece_model_pb2

* Update llama.py

* Update save.py

* Update llama.py

* padding side

* Update tokenizer_utils.py

* cache dir

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update save.py

* Update save.py

* checkpoint

* Gemma 1.1

* more models

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update llama.py

* CodeGemma

* Fix downcasting
2024-04-10 00:44:58 +10:00
Daniel Han
c7649138ee CodeGemma (#317)
* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py

* Patch tokenizer

* Update chat_templates.py

* Heal tokenizers

* Update chat_templates.py

* Update mapper.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* tokenizer patching

* patch_tokenizer

* Update chat_templates.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Edit

* Update mistral.py

* Update mistral.py

* Stats

* Update mistral.py

* attention_mask

* Update llama.py

* Update llama.py

* batch

* Temp fix batch inference

* Update llama.py

* Update gemma.py

* Fix inference

* swiglu

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* fast inference

* model

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update utils.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* overhead

* Update llama.py

* Update llama.py

* compile

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* lora matmul

* Update llama.py

* Update llama.py

* Update llama.py

* offloaded checkpointing

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update gemma.py

* Revert "Update gemma.py"

This reverts commit c68b59bbfd.

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Saving

* sentencepiece_model_pb2

* Update llama.py

* Update save.py

* Update llama.py

* padding side

* Update tokenizer_utils.py

* cache dir

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update save.py

* Update save.py

* checkpoint

* Gemma 1.1

* more models

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update llama.py

* CodeGemma
2024-04-09 23:35:02 +10:00
Daniel Han
6a53964f6b Torch dtype (#314)
* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py

* Patch tokenizer

* Update chat_templates.py

* Heal tokenizers

* Update chat_templates.py

* Update mapper.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* tokenizer patching

* patch_tokenizer

* Update chat_templates.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Edit

* Update mistral.py

* Update mistral.py

* Stats

* Update mistral.py

* attention_mask

* Update llama.py

* Update llama.py

* batch

* Temp fix batch inference

* Update llama.py

* Update gemma.py

* Fix inference

* swiglu

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* fast inference

* model

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update utils.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* overhead

* Update llama.py

* Update llama.py

* compile

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* lora matmul

* Update llama.py

* Update llama.py

* Update llama.py

* offloaded checkpointing

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update gemma.py

* Revert "Update gemma.py"

This reverts commit c68b59bbfd.

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Saving

* sentencepiece_model_pb2

* Update llama.py

* Update save.py

* Update llama.py

* padding side

* Update tokenizer_utils.py

* cache dir

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update save.py

* Update save.py

* checkpoint

* Gemma 1.1

* more models

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* dtype
2024-04-08 23:19:46 +10:00
Daniel Han
4474e4bca4 Fix Gemma GGUF (#311)
* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py

* Patch tokenizer

* Update chat_templates.py

* Heal tokenizers

* Update chat_templates.py

* Update mapper.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* tokenizer patching

* patch_tokenizer

* Update chat_templates.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Edit

* Update mistral.py

* Update mistral.py

* Stats

* Update mistral.py

* attention_mask

* Update llama.py

* Update llama.py

* batch

* Temp fix batch inference

* Update llama.py

* Update gemma.py

* Fix inference

* swiglu

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* fast inference

* model

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update utils.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* overhead

* Update llama.py

* Update llama.py

* compile

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* lora matmul

* Update llama.py

* Update llama.py

* Update llama.py

* offloaded checkpointing

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update gemma.py

* Revert "Update gemma.py"

This reverts commit c68b59bbfd.

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Saving

* sentencepiece_model_pb2

* Update llama.py

* Update save.py

* Update llama.py

* padding side

* Update tokenizer_utils.py

* cache dir

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update save.py

* Update save.py

* checkpoint

* Gemma 1.1

* more models
2024-04-08 01:28:00 +10:00
Daniel Han
f3d05d19e3 Bug fixes (#308)
* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py

* Patch tokenizer

* Update chat_templates.py

* Heal tokenizers

* Update chat_templates.py

* Update mapper.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* tokenizer patching

* patch_tokenizer

* Update chat_templates.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Edit

* Update mistral.py

* Update mistral.py

* Stats

* Update mistral.py

* attention_mask

* Update llama.py

* Update llama.py

* batch

* Temp fix batch inference

* Update llama.py

* Update gemma.py

* Fix inference

* swiglu

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* fast inference

* model

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update utils.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* overhead

* Update llama.py

* Update llama.py

* compile

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* lora matmul

* Update llama.py

* Update llama.py

* Update llama.py

* offloaded checkpointing

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update gemma.py

* Revert "Update gemma.py"

This reverts commit c68b59bbfd.

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Saving

* sentencepiece_model_pb2

* Update llama.py

* Update save.py

* Update llama.py

* padding side

* Update tokenizer_utils.py

* cache dir

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update pyproject.toml

* Update pyproject.toml

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py
2024-04-07 03:44:45 +10:00
Daniel Han
920f0ae6e5 Bug fixes (#306)
* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py

* Patch tokenizer

* Update chat_templates.py

* Heal tokenizers

* Update chat_templates.py

* Update mapper.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* tokenizer patching

* patch_tokenizer

* Update chat_templates.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Edit

* Update mistral.py

* Update mistral.py

* Stats

* Update mistral.py

* attention_mask

* Update llama.py

* Update llama.py

* batch

* Temp fix batch inference

* Update llama.py

* Update gemma.py

* Fix inference

* swiglu

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* fast inference

* model

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update utils.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* overhead

* Update llama.py

* Update llama.py

* compile

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* lora matmul

* Update llama.py

* Update llama.py

* Update llama.py

* offloaded checkpointing

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update gemma.py

* Revert "Update gemma.py"

This reverts commit c68b59bbfd.

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Saving

* sentencepiece_model_pb2

* Update llama.py

* Update save.py

* Update llama.py

* padding side
2024-04-06 04:31:24 +11:00
Daniel Han-Chen
db4a24e602 Gemma inference fix
2024-04-05 03:51:53 +11:00
Daniel Han-Chen
b122b76b26 Update gemma.py
eabdullin
2024-04-04 22:46:32 +11:00
Daniel Han
7e1c6a62e2 Nightly (#299)
* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py

* Patch tokenizer

* Update chat_templates.py

* Heal tokenizers

* Update chat_templates.py

* Update mapper.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* tokenizer patching

* patch_tokenizer

* Update chat_templates.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Edit

* Update mistral.py

* Update mistral.py

* Stats

* Update mistral.py

* attention_mask

* Update llama.py

* Update llama.py

* batch

* Temp fix batch inference

* Update llama.py

* Update gemma.py

* Fix inference

* swiglu

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* fast inference

* model

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update utils.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* overhead

* Update llama.py

* Update llama.py

* compile

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* lora matmul
2024-04-03 05:38:31 +11:00
Daniel Han
537577720c Fix batched inference (#298)
* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py

* Patch tokenizer

* Update chat_templates.py

* Heal tokenizers

* Update chat_templates.py

* Update mapper.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* tokenizer patching

* patch_tokenizer

* Update chat_templates.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Edit

* Update mistral.py

* Update mistral.py

* Stats

* Update mistral.py

* attention_mask

* Update llama.py

* Update llama.py

* batch

* Temp fix batch inference

* Update llama.py

* Update gemma.py

* Fix inference

* swiglu

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* fast inference

* model

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update llama.py

* Update utils.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* overhead

* Update llama.py

* Update llama.py

* compile

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py
2024-04-03 04:56:11 +11:00
Daniel Han-Chen
8921867157 Revert "Temp fix batch inference (#294)"
This reverts commit e209991ba1.
2024-04-02 13:18:31 +11:00
Daniel Han
e209991ba1 Temp fix batch inference (#294)
* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py

* Patch tokenizer

* Update chat_templates.py

* Heal tokenizers

* Update chat_templates.py

* Update mapper.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* tokenizer patching

* patch_tokenizer

* Update chat_templates.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Edit

* Update mistral.py

* Update mistral.py

* Stats

* Update mistral.py

* attention_mask

* Update llama.py

* Update llama.py

* batch

* Temp fix batch inference

* Update llama.py

* Update gemma.py
2024-04-02 04:35:28 +11:00
Daniel Han
8e263b8b7d Nightly (#293)
Env checking
2024-04-01 04:38:12 +11:00
Daniel Han
74f79684da Auto Healing Tokenizer (#283)
* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py

* Patch tokenizer

* Update chat_templates.py

* Heal tokenizers

* Update chat_templates.py

* Update mapper.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* tokenizer patching

* patch_tokenizer

* Update chat_templates.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update tokenizer_utils.py

* Edit
2024-03-28 04:16:50 +11:00
Daniel Han-Chen
bb45fdabb6 Update mapper.py
2024-03-24 14:00:04 +11:00
Daniel Han
b9d5ca53dc lm_head issue (#266)
* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py

* Update llama.py
2024-03-20 04:48:15 +11:00
Daniel Han
e269b0dc59
Fix Saving (#264)
* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* state_dict

* Update save.py

* whoami

* Update llama.py

* Update save.py
2024-03-19 19:55:02 +11:00
Daniel Han
696e8817ea
Fix GGUF and saving (#261)
* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code

* lm_head

* Update llama.py

* save_pretrained_settings

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py
2024-03-19 04:32:28 +11:00
Daniel Han
36473e2d6e
Fix lm_head, embed_tokens (#258)
* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py

* Update __init__.py

* Update fast_lora.py

* dtype

* Update llama.py

* Update llama.py

* Update llama.py

* dtype

* Update mistral.py

* trust_remote_code
2024-03-18 04:18:15 +11:00
Daniel Han-Chen
16b3dd86b7 Update fast_lora.py 2024-03-17 22:46:44 +11:00
Daniel Han-Chen
4b9aaee6b1 Update fast_lora.py 2024-03-17 22:46:17 +11:00
Daniel Han-Chen
3685ec607d Squashed commit of the following:
commit 61d45c60db
Merge: f024ce2 64d847b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 17 22:12:44 2024 +1100

    Merge branch 'main' into nightly

commit f024ce2821
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 17 22:12:18 2024 +1100

    Update __init__.py

commit d38cf5387c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 17 22:09:23 2024 +1100

    Update fast_lora.py

commit 9c35a2c4b0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 17 22:08:33 2024 +1100

    Update pyproject.toml

commit 6edc35f686
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 17 20:18:00 2024 +1100

    Update fast_lora.py

commit 2a9d4fb947
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 17 20:10:30 2024 +1100

    Bugs

commit 14717be070
Merge: 5c24a3b c599ae0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 17 20:10:26 2024 +1100

    Merge branch 'main' into nightly

commit 5c24a3bc2e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 17 02:46:35 2024 +1100

    Update README.md

commit fd729a7131
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 17 02:44:58 2024 +1100

    Update README.md

commit 7e9f092e9f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 16 23:17:35 2024 +1100

    Update save.py

commit e3efca8778
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 16 22:36:57 2024 +1100

    Update save.py

commit d58fa31e0c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 16 22:31:02 2024 +1100

    Update save.py

commit 64d954bd13
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 16 20:32:12 2024 +1100

    Update save.py

commit 815202f832
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 16 20:03:51 2024 +1100

    GGUF

commit 338b2c928b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 16 04:40:21 2024 +1100

    Update README.md

commit f342425c1e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 16 04:34:51 2024 +1100

    Update README.md

commit cef733a420
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 16 04:10:54 2024 +1100

    Update fast_lora.py

commit e5bcab2a74
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 16 03:43:16 2024 +1100

    Update fast_lora.py

commit 80cfe132f6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 16 03:38:13 2024 +1100

    Fix bugs

commit d8e98be90d
Merge: 51c2484 39713e6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 16 00:14:02 2024 +1100

    Merge branch 'main' into nightly

commit 51c2484ffd
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 22:35:44 2024 +1100

    Update rope_embedding.py

commit 3e93a78794
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 20:14:56 2024 +1100

    Update rope_embedding.py

commit 82d80e9e98
Merge: 718a6a1 990c7a8
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 19:50:20 2024 +1100

    Merge branch 'main' into nightly

commit 718a6a1c84
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 19:50:03 2024 +1100

    Update pyproject.toml

commit 385c6d44a8
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 19:49:09 2024 +1100

    Update pyproject.toml

commit 9c9ede4680
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 19:48:39 2024 +1100

    Update pyproject.toml

commit 9d6c9c9ebc
Merge: a12e4ea 2c5c5bb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 05:07:09 2024 +1100

    Merge branch 'main' into nightly

commit a12e4ea3e0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 04:44:54 2024 +1100

    Update chat_templates.py

commit c3e0e518d9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 04:30:03 2024 +1100

    Update chat_templates.py

commit fe0e2d7baa
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 03:50:32 2024 +1100

    Update chat_templates.py

commit 5dfe582c96
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 03:37:37 2024 +1100

    Update chat_templates.py

commit ec73a776ec
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 15 03:34:32 2024 +1100

    Update chat_templates.py

commit 5c4241a240
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 14 20:27:20 2024 +1100

    Update pyproject.toml

commit 6841a303d9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 14 20:20:16 2024 +1100

    Update pyproject.toml

commit 341b5f46e6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 14 20:12:54 2024 +1100

    Update pyproject.toml

commit 6809846464
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 14 20:12:02 2024 +1100

    Update pyproject.toml

commit 77edfb10b7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 14 20:11:24 2024 +1100

    Update pyproject.toml

commit 0ed19ec170
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 14 20:10:10 2024 +1100

    Update pyproject.toml

commit 1f4c625f2a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 14 20:06:05 2024 +1100

    Update pyproject.toml

commit 8fa0aab61c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 14 20:04:01 2024 +1100

    Update pyproject.toml

commit eb61377632
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 14 19:02:29 2024 +1100

    Fix Colab

commit d6ac9b56c1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Mar 12 20:12:55 2024 +1100

    upcasting

commit b7c3190e97
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Mar 12 00:46:57 2024 +1100

    Update pyproject.toml

commit 4d77c32425
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 23:28:03 2024 +1100

    Update pyproject.toml

commit 98573e8618
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 23:21:27 2024 +1100

    kaggle new

commit 4a794dd880
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 20:28:09 2024 +1100

    Update pyproject.toml

commit a0c18c9880
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 20:05:06 2024 +1100

    Update save.py

commit 684eaae9ea
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 19:52:18 2024 +1100

    GGUF incorrect

commit 029d588603
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 19:13:42 2024 +1100

    Update save.py

commit 252a38a2df
Merge: 63b1f58 3222377
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 19:13:29 2024 +1100

    Merge branch 'main' into nightly

commit 63b1f5879e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 04:05:43 2024 +1100

    Update llama.py

commit cb0d937469
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 03:53:23 2024 +1100

    Account for DoRA

commit 2e87755a5c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 03:19:20 2024 +1100

    Update llama.py

commit 93d88ad68d
Merge: aba595d 8bea94c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 03:15:30 2024 +1100

    Merge branch 'main' into nightly

commit aba595de9c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 03:07:44 2024 +1100

    Update llama.py

commit 133097d995
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 02:25:10 2024 +1100

    Update save.py

commit b5d3d63df5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 02:15:33 2024 +1100

    Update save.py

commit 8be91b050f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 11 02:10:51 2024 +1100

    Update chat_templates.py

commit 23b7a5764c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 19:32:02 2024 +1100

    Update fast_lora.py

commit c1728a9904
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 19:11:00 2024 +1100

    Update fast_lora.py

commit c192ce3ed4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 19:09:10 2024 +1100

    Update fast_lora.py

commit 7e3abd19ba
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 18:36:35 2024 +1100

    Update fast_lora.py

commit c1f3e70394
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 18:33:52 2024 +1100

    Update fast_lora.py

commit 08da057f04
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 18:09:58 2024 +1100

    Update save.py

commit 1e8922af2b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 17:42:41 2024 +1100

    Revert

commit 74fc5caa60
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 16:20:20 2024 +1100

    Accuracy

commit 35c6d776c4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 14:13:53 2024 +1100

    Update save.py

commit 6d2bc97117
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 14:08:38 2024 +1100

    Update llama.py

commit baf8e4c0a8
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 13:11:42 2024 +1100

    Update llama.py

commit c0d9516255
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 13:10:03 2024 +1100

    Update llama.py

commit 91877f5506
Merge: f887080 1fcf9d4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 13:08:23 2024 +1100

    Merge branch 'main' into nightly

commit f887080a6d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 04:31:40 2024 +1100

    Tokenizer overwritten

commit 1c1461ae09
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 04:08:02 2024 +1100

    Update loader.py

commit 14f063819a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 03:54:18 2024 +1100

    model_name

commit daba749ee1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 03:48:05 2024 +1100

    Update llama.py

commit 457b7ba6d6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 02:57:41 2024 +1100

    Update chat_templates.py

commit 5c9629f5fe
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 10 02:55:07 2024 +1100

    Update save.py

commit e6c3cdfc2d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 9 23:22:14 2024 +1100

    Update save.py

commit f0a3c05b07
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 9 23:19:02 2024 +1100

    Update save.py

commit 58d1f1e03c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 9 20:05:52 2024 +1100

    Update chat_templates.py

commit 9b3dd3e9f8
Merge: b321ada 70f271b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 8 19:40:25 2024 +1100

    Merge branch 'main' into nightly

commit b321adac2c
Merge: 03232df fedcafe
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 7 04:32:51 2024 +1100

    Merge branch 'main' into nightly

commit 03232dff4e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 7 02:48:15 2024 +1100

    Update chat_templates.py

commit e67c6b4ec9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Mar 7 02:05:38 2024 +1100

    Fix warning

commit 5a7f52819e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Mar 6 19:21:44 2024 +1100

    Update rms_layernorm.py

commit 837ba610cf
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Mar 6 18:18:07 2024 +1100

    RoPE and Gemma precision

commit 1e41fa0c8c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Mar 6 05:07:37 2024 +1100

    Update save.py

commit 333d3d9a51
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Mar 6 04:52:59 2024 +1100

    Update gemma.py

commit 85c052d8fb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Mar 6 04:48:26 2024 +1100

    sqrt

commit ed1aa0007a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Mar 6 04:24:10 2024 +1100

    Update gemma.py

commit be81c07d43
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Mar 6 04:14:11 2024 +1100

    Gemma precision

commit 5a693a107d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Mar 5 23:58:00 2024 +1100

    Layernorms

commit e0a2463738
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Mar 5 20:07:44 2024 +1100

    Update pyproject.toml

commit 160320b3d9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Mar 5 18:55:50 2024 +1100

    Update gemma.py

commit 9f7f205f67
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Mar 5 18:27:11 2024 +1100

    Update rms_layernorm.py

commit 43e710b064
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Mar 5 18:21:59 2024 +1100

    Fix Gemma merging

commit 3137392ccf
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Mar 5 02:41:56 2024 +1100

    Update gemma.py

commit fe35b127fc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Mar 5 02:40:20 2024 +1100

    Update gemma.py

commit 5fae81b628
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Mar 5 02:22:31 2024 +1100

    Update gemma.py

commit 961813b35f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Mar 5 01:45:08 2024 +1100

    Update gemma.py

commit c027dacb18
Merge: cb193f7 7b7665d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 4 16:19:13 2024 +1100

    Merge branch 'main' into nightly

commit cb193f7e0a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 4 16:14:50 2024 +1100

    Update gemma.py

commit 440c29273f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 4 16:11:12 2024 +1100

    Update rms_layernorm.py

commit c31b27b7dc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 4 16:04:29 2024 +1100

    Update rms_layernorm.py

commit 6fa081ae77
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Mar 4 16:00:35 2024 +1100

    Update rms_layernorm.py

commit aa2fb63048
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 3 19:32:18 2024 +1100

    Update gemma.py

commit ac23e4b612
Merge: 1fea4ff fa2a43b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 3 19:29:41 2024 +1100

    Merge branch 'main' into nightly

commit 1fea4ffcf2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 3 18:07:55 2024 +1100

    Update geglu.py

commit 245fe4716c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 3 03:39:36 2024 +1100

    Update _utils.py

commit e27523cd5e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 3 03:39:10 2024 +1100

    Update __init__.py

commit 65fbde6cd9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 3 03:37:43 2024 +1100

    Update __init__.py

commit 9a2e791e6c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 3 03:35:21 2024 +1100

    Update llama.py

commit 786885c38c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 3 03:17:59 2024 +1100

    Approx gelu

commit 1c7f0d21ee
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Mar 3 02:24:39 2024 +1100

    Update geglu.py

commit 393e53b016
Merge: c88ab10 307f2da
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 2 18:29:02 2024 +1100

    Merge branch 'main' into nightly

commit c88ab10a5c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 2 18:28:39 2024 +1100

    Approx gelu

commit e032445694
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 2 02:39:11 2024 +1100

    Update pyproject.toml

commit c970a2b3be
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Mar 2 02:38:40 2024 +1100

    Small fixes

commit db87262625
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Mar 1 04:16:25 2024 +1100

    Update pyproject.toml

commit d44bbf5f2e
Merge: 0866020 d0c15bb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 29 00:17:47 2024 +1100

    Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly

commit 0866020037
Merge: 54b26c0 2561964
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 29 00:17:12 2024 +1100

    Merge branch 'main' into nightly

commit d0c15bb508
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Thu Feb 29 00:17:03 2024 +1100

    Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

    * Update save.py

    * saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update __init__.py

    * Update save.py

    * Update save.py

    * Update save.py

    * save

    * trainer

    * spaces

    * original

    * Gemma

    * Update pyproject.toml

    * Update mapper.py

    * Update fast_lora.py

    * FastGemmaModel

    * model_type

    * Update llama.py

    * Update llama.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update llama.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update llama.py

    * Update cross_entropy_loss.py

    * Update llama.py

    * Update llama.py

    * gemma

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Fast CE Loss

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * CE

    * Update llama.py

    * Update llama.py

    * Update cross_entropy_loss.py

    * Update geglu.py

    * Update cross_entropy_loss.py

    * revert

    * Update llama.py

    * Update llama.py

    * norm

    * Update gemma.py

    * Update gemma.py

    * position_ids

    * Update gemma.py

    * Update gemma.py

    * pos

    * Update llama.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update cross_entropy_loss.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update llama.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update llama.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * revert

    * revert

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update llama.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update cross_entropy_loss.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * rope

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * llama

    * Update llama.py

    * gemma

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update gemma.py

    * Update save.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update gemma.py

    * correct_dtype

    * Update gemma.py

    * Update cross_entropy_loss.py

    * Update cross_entropy_loss.py

    * Chat Templates

    * Update README.md

    * Update README.md

    * Update llama.py

    * DoRA

    * Update _utils.py

    * Update chat_templates.py

commit 54b26c0466
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 29 00:16:19 2024 +1100

    Update llama.py

commit 074aa737ce
Merge: b1892b5 e7c53fb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 29 00:14:42 2024 +1100

    Merge branch 'main' into nightly

commit b1892b5511
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 29 00:09:00 2024 +1100

    Update chat_templates.py

commit 072dc0c447
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 29 00:02:40 2024 +1100

    Update _utils.py

commit d37f284eaa
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 28 23:59:49 2024 +1100

    DoRA

commit 967fed83c7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 28 22:02:18 2024 +1100

    Update llama.py

commit a4cd4a6e41
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 27 01:40:45 2024 +1100

    Update README.md

commit 90a5c2c121
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 27 01:39:52 2024 +1100

    Update README.md

commit 4b7df80abd
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 27 00:16:57 2024 +1100

    Chat Templates

commit 54ff6eb169
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 17:27:42 2024 +1100

    Update cross_entropy_loss.py

commit 7cf69dd9f2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 16:19:39 2024 +1100

    Update cross_entropy_loss.py

commit 1f450cd6ee
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 14:44:22 2024 +1100

    Update gemma.py

commit 7a5db6758c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 14:39:43 2024 +1100

    correct_dtype

commit cf16811ab9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 14:38:27 2024 +1100

    Update gemma.py

commit 637d01643c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 14:37:48 2024 +1100

    Update llama.py

commit 3d73560df3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 14:37:09 2024 +1100

    Update llama.py

commit 0d24acf0f5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 14:36:39 2024 +1100

    Update llama.py

commit 1aeca6f00c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 14:35:44 2024 +1100

    RoPE

commit 4ec1b216e0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 14:15:37 2024 +1100

    Update save.py

commit 11c9ea4d1b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 05:06:04 2024 +1100

    Update gemma.py

commit 24e9d0af28
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 05:05:41 2024 +1100

    Update gemma.py

commit 0b693df646
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 04:37:12 2024 +1100

    Update gemma.py

commit ae336e9663
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 04:36:10 2024 +1100

    Update gemma.py

commit 1d53b366a0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 04:32:49 2024 +1100

    Update gemma.py

commit b1864710c5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 04:30:29 2024 +1100

    Update gemma.py

commit 686d9fa4c4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 04:27:55 2024 +1100

    Update gemma.py

commit 40169f9fd3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 04:23:38 2024 +1100

    Update gemma.py

commit b0b38f770b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 04:20:19 2024 +1100

    Update gemma.py

commit 13e06a062b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:56:34 2024 +1100

    Update gemma.py

commit fd5389a485
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:55:15 2024 +1100

    Update cross_entropy_loss.py

commit a5abe39ded
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:53:49 2024 +1100

    Update cross_entropy_loss.py

commit 78c96ef83c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:50:34 2024 +1100

    Update cross_entropy_loss.py

commit 88cfe5eeb2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:48:16 2024 +1100

    Update cross_entropy_loss.py

commit dba4c0355d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:46:18 2024 +1100

    Update cross_entropy_loss.py

commit 199460d5ef
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:43:22 2024 +1100

    Update cross_entropy_loss.py

commit 13718b9847
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:36:38 2024 +1100

    Update cross_entropy_loss.py

commit 48736a0755
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:35:45 2024 +1100

    Update cross_entropy_loss.py

commit c1dbf6710e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:29:19 2024 +1100

    Update cross_entropy_loss.py

commit b55d7ad4b2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:29:11 2024 +1100

    Update cross_entropy_loss.py

commit bc3ff0f3a6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:28:22 2024 +1100

    Update cross_entropy_loss.py

commit 7bbce701c8
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:26:42 2024 +1100

    Update cross_entropy_loss.py

commit 9c0b3cd431
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:19:26 2024 +1100

    Update cross_entropy_loss.py

commit e696487e84
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:18:01 2024 +1100

    Update cross_entropy_loss.py

commit c675220373
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:16:31 2024 +1100

    Update cross_entropy_loss.py

commit bb6b6f238e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:15:19 2024 +1100

    Update cross_entropy_loss.py

commit 096544eff4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:14:04 2024 +1100

    Update cross_entropy_loss.py

commit b5eff3bfbc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:12:23 2024 +1100

    Update cross_entropy_loss.py

commit fdaa9493be
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 03:09:37 2024 +1100

    Update cross_entropy_loss.py

commit 8259b16a5d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 26 02:49:02 2024 +1100

    gemma

commit df8034d185
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:48:34 2024 +1100

    Update llama.py

commit db2b387b22
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:47:45 2024 +1100

    llama

commit 36d29078cd
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:35:39 2024 +1100

    Update gemma.py

commit 4cc14b2074
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:30:59 2024 +1100

    Update gemma.py

commit bc428f4e7b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:29:01 2024 +1100

    Update gemma.py

commit 7c958ab9e7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:25:57 2024 +1100

    Update gemma.py

commit d530f95135
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:25:15 2024 +1100

    Update gemma.py

commit 6ad44835d2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:22:35 2024 +1100

    Update gemma.py

commit 46723124f7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:21:03 2024 +1100

    Update gemma.py

commit ad1ce483db
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:20:54 2024 +1100

    Update gemma.py

commit 47d4a33f41
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:20:09 2024 +1100

    Update gemma.py

commit 6b531eff93
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:18:23 2024 +1100

    Update gemma.py

commit 147d129772
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:17:16 2024 +1100

    Update gemma.py

commit 884fadc744
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:14:57 2024 +1100

    Update gemma.py

commit 9d648cbccf
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:14:38 2024 +1100

    Update gemma.py

commit 6238f16625
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:08:04 2024 +1100

    Update gemma.py

commit 9a20dff5d2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 04:07:47 2024 +1100

    Update gemma.py

commit 94833602da
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:55:44 2024 +1100

    Update gemma.py

commit 40c244a7bc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:53:40 2024 +1100

    Update gemma.py

commit 0e1578dedb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:51:32 2024 +1100

    Update gemma.py

commit 33a72ba122
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:50:59 2024 +1100

    Update gemma.py

commit 890d73e6d0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:49:55 2024 +1100

    Update gemma.py

commit 33eeb7add2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:44:22 2024 +1100

    Update gemma.py

commit 096a4192fb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:39:40 2024 +1100

    Update gemma.py

commit f270f377ab
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:39:31 2024 +1100

    Update gemma.py

commit 765e54fe55
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:38:18 2024 +1100

    Update gemma.py

commit 29721a86d1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:37:30 2024 +1100

    Update gemma.py

commit 208e2c1189
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:32:23 2024 +1100

    Update gemma.py

commit aff1db7a4c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:22:24 2024 +1100

    Update gemma.py

commit 98903e72c2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:21:37 2024 +1100

    Update gemma.py

commit 20da5547f5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:19:34 2024 +1100

    Update gemma.py

commit 20e9ca2fac
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:16:55 2024 +1100

    Update gemma.py

commit 842b310767
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:16:34 2024 +1100

    Update gemma.py

commit 11067cf849
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:14:02 2024 +1100

    Update gemma.py

commit 6725dc93f1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:13:18 2024 +1100

    Update gemma.py

commit baf11e0e34
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:12:07 2024 +1100

    Update gemma.py

commit e0ada344ac
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:11:20 2024 +1100

    Update gemma.py

commit 39a58f7d2e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:07:20 2024 +1100

    Update gemma.py

commit c96e8c28f8
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:03:31 2024 +1100

    Update gemma.py

commit 10d03d8315
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:03:23 2024 +1100

    Update gemma.py

commit 45a33bae1c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:01:56 2024 +1100

    Update gemma.py

commit 77fea80a1b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:01:30 2024 +1100

    Update gemma.py

commit e3517f0248
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 03:00:22 2024 +1100

    Update gemma.py

commit 03c7f211a6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:49:14 2024 +1100

    Update gemma.py

commit 105a0325f1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:42:56 2024 +1100

    Update gemma.py

commit 9d9e38a74a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:41:24 2024 +1100

    Update gemma.py

commit 49a124243e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:37:48 2024 +1100

    Update gemma.py

commit 5c31fd1e5b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:34:18 2024 +1100

    Update gemma.py

commit c0f6b6c0cc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:31:22 2024 +1100

    Update gemma.py

commit 911745377a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:30:06 2024 +1100

    Update gemma.py

commit f73698859e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:28:01 2024 +1100

    Update gemma.py

commit e56eb3f406
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:26:57 2024 +1100

    Update gemma.py

commit 045bce2fa5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:25:39 2024 +1100

    Update gemma.py

commit a847719499
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:18:43 2024 +1100

    Update gemma.py

commit c058d1aa96
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:17:45 2024 +1100

    Update gemma.py

commit dfb2e250f6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:17:12 2024 +1100

    Update gemma.py

commit f2fca2d6c1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:10:35 2024 +1100

    Update gemma.py

commit ae473e0f2b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:09:08 2024 +1100

    rope

commit ce2d74732d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:07:29 2024 +1100

    Update llama.py

commit ed5042aa07
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:04:11 2024 +1100

    Update llama.py

commit 42840647d0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:02:59 2024 +1100

    Update llama.py

commit ecd21b375b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 02:02:02 2024 +1100

    Update llama.py

commit 5bf8cacebd
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 01:58:33 2024 +1100

    Update gemma.py

commit 66a4380d08
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 01:50:42 2024 +1100

    Update gemma.py

commit b85b45e7a0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 01:49:39 2024 +1100

    Update gemma.py

commit 72870a1ed3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 01:48:02 2024 +1100

    Update gemma.py

commit 7b903368ab
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 01:45:44 2024 +1100

    Update gemma.py

commit 0c2d7e503b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 01:39:25 2024 +1100

    Update cross_entropy_loss.py

commit 9ae5abc83f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 01:12:44 2024 +1100

    Update gemma.py

commit 3907ea9003
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 01:10:12 2024 +1100

    Update gemma.py

commit c4c8558a90
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 01:04:09 2024 +1100

    Update gemma.py

commit e0e96ef5ce
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 01:03:04 2024 +1100

    Update gemma.py

commit df22d0cb3a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 00:46:58 2024 +1100

    Update gemma.py

commit b9d9aa1b07
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 25 00:44:44 2024 +1100

    Update gemma.py

commit 785760ba0d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 22:43:00 2024 +1100

    Update gemma.py

commit 103b1cdbed
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 22:42:09 2024 +1100

    Update gemma.py

commit b482ce13e9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 22:40:46 2024 +1100

    Update gemma.py

commit 7be57334cb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 22:39:28 2024 +1100

    Update gemma.py

commit 2242bdf0a3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 22:30:14 2024 +1100

    Update gemma.py

commit a066c348f2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 22:24:43 2024 +1100

    Update llama.py

commit b3d7e61608
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 20:20:37 2024 +1100

    Update gemma.py

commit 5b5652de36
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 20:17:24 2024 +1100

    Update gemma.py

commit 0f5cc839f3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 20:15:25 2024 +1100

    Update gemma.py

commit 6011a490c6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 20:13:24 2024 +1100

    Update gemma.py

commit 6d94cf88a7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 20:11:23 2024 +1100

    Update gemma.py

commit c706447fd4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 20:04:31 2024 +1100

    Update gemma.py

commit ed3f139a9c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 19:55:22 2024 +1100

    Update gemma.py

commit b7ba95857f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 19:47:10 2024 +1100

    Update gemma.py

commit f0b21f9b7b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 19:45:20 2024 +1100

    Update gemma.py

commit 762790d6a9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 19:35:03 2024 +1100

    Update gemma.py

commit b174e54507
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 19:33:04 2024 +1100

    Update gemma.py

commit 47c6feaf89
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 19:12:52 2024 +1100

    Update gemma.py

commit b2b658cbee
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 19:10:36 2024 +1100

    Update gemma.py

commit 407205dd87
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 18:58:31 2024 +1100

    Update gemma.py

commit 82e45c02a5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 18:25:25 2024 +1100

    revert

commit 94201755db
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 18:17:38 2024 +1100

    revert

commit d5e625b674
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 18:04:40 2024 +1100

    Update cross_entropy_loss.py

commit ba19344fb9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 18:03:45 2024 +1100

    Update cross_entropy_loss.py

commit 7916872408
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:48:19 2024 +1100

    Update llama.py

commit 49129bc66c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:43:19 2024 +1100

    Update gemma.py

commit ebbd4756d4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:41:53 2024 +1100

    Update gemma.py

commit 68e280405c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:40:21 2024 +1100

    Update gemma.py

commit 9af090f6ea
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:32:04 2024 +1100

    Update gemma.py

commit 61add47f37
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:31:10 2024 +1100

    Update gemma.py

commit de96285442
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:29:51 2024 +1100

    Update gemma.py

commit 0abbcdc15a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:29:09 2024 +1100

    Update gemma.py

commit 17036a6e54
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:27:49 2024 +1100

    Update llama.py

commit 8b7de591c1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:26:19 2024 +1100

    Update gemma.py

commit 041aa7d909
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:18:09 2024 +1100

    Update gemma.py

commit 71483e6c8d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:17:26 2024 +1100

    Update gemma.py

commit 8cb12078f9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:15:44 2024 +1100

    Update gemma.py

commit 23e6ebf14d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:13:40 2024 +1100

    Update gemma.py

commit a6752f3f16
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:12:00 2024 +1100

    Update gemma.py

commit 9c565580fb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:09:03 2024 +1100

    Update gemma.py

commit cd479b4374
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:05:59 2024 +1100

    Update cross_entropy_loss.py

commit 0080357a88
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:03:54 2024 +1100

    Update gemma.py

commit e8de606be9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:02:54 2024 +1100

    Update gemma.py

commit 35adcbf83c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 17:02:14 2024 +1100

    Update gemma.py

commit 7e33aa2520
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 16:59:24 2024 +1100

    Update gemma.py

commit 4c7d21e41b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 16:57:16 2024 +1100

    Update gemma.py

commit 3feae56451
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 16:55:21 2024 +1100

    Update gemma.py

commit c060bad7ce
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 16:47:39 2024 +1100

    Update gemma.py

commit 8ffaf5f109
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 15:44:10 2024 +1100

    Update gemma.py

commit be098122f1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 15:42:12 2024 +1100

    Update llama.py

commit 8a618c60ed
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 14:29:15 2024 +1100

    pos

commit ef235c3508
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 14:27:39 2024 +1100

    Update gemma.py

commit a48adb0a83
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 14:27:31 2024 +1100

    Update gemma.py

commit 8920cda9a7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 14:13:31 2024 +1100

    position_ids

commit a1eab801fa
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 14:06:48 2024 +1100

    Update gemma.py

commit c07aff5188
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 14:05:18 2024 +1100

    Update gemma.py

commit 4c6e122caa
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 14:02:26 2024 +1100

    norm

commit 5c5bb53241
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 13:59:30 2024 +1100

    Update llama.py

commit 0e1826dd8c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 13:55:37 2024 +1100

    Update llama.py

commit 893b5dfe2b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 13:54:00 2024 +1100

    revert

commit 9ab87f3648
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 13:33:05 2024 +1100

    Update cross_entropy_loss.py

commit d2dc658077
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 13:22:17 2024 +1100

    Update geglu.py

commit 1fcbd61f76
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 03:40:56 2024 +1100

    Update cross_entropy_loss.py

commit fadcb311c3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 03:40:43 2024 +1100

    Update llama.py

commit 1a2a10d028
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 03:38:25 2024 +1100

    Update llama.py

commit 17a1a855e0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 03:33:46 2024 +1100

    CE

commit 097629108f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 03:30:43 2024 +1100

    Update llama.py

commit 30cc4ffd67
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 03:27:21 2024 +1100

    Update llama.py

commit 20227ba5c7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 03:14:29 2024 +1100

    Update llama.py

commit 73a8616f99
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 03:03:00 2024 +1100

    Update llama.py

commit bd4fd22f34
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 02:58:05 2024 +1100

    Update cross_entropy_loss.py

commit 2814f02087
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 02:33:28 2024 +1100

    Update cross_entropy_loss.py

commit dd321363f2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 02:30:49 2024 +1100

    Update cross_entropy_loss.py

commit 574b2f787f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 02:29:33 2024 +1100

    Update cross_entropy_loss.py

commit 6184f770e7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 02:10:38 2024 +1100

    Update cross_entropy_loss.py

commit f5f3d6794e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 02:08:26 2024 +1100

    Update cross_entropy_loss.py

commit 88bf684c61
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 02:06:11 2024 +1100

    Update cross_entropy_loss.py

commit 603c71c7f0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 24 02:02:11 2024 +1100

    Fast CE Loss

commit 25339b71f7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 18:12:15 2024 +1100

    Update fast_lora.py

commit ebfc8f8a55
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 17:57:42 2024 +1100

    Update fast_lora.py

commit 6ab5eb75f4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 17:55:58 2024 +1100

    Update llama.py

commit f999bcd52d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 17:46:33 2024 +1100

    Update llama.py

commit 13d0cee4f6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 17:37:17 2024 +1100

    Update llama.py

commit b66f6dbfd3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 17:28:29 2024 +1100

    Update llama.py

commit 5866b08f71
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 04:11:08 2024 +1100

    gemma

commit dd478be2fa
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 03:59:09 2024 +1100

    Update llama.py

commit 2ff33fb270
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 03:52:24 2024 +1100

    Update llama.py

commit 4c9f366688
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 03:27:42 2024 +1100

    Update cross_entropy_loss.py

commit cfebc8d979
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 03:27:03 2024 +1100

    Update llama.py

commit 4b009aad33
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 03:16:50 2024 +1100

    Update llama.py

commit 6f340f5417
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 02:38:16 2024 +1100

    Update fast_lora.py

commit ff27a824fa
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 02:33:37 2024 +1100

    Update llama.py

commit 5d728270d3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 02:28:05 2024 +1100

    Update llama.py

commit e743e58cb6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 02:23:25 2024 +1100

    Update gemma.py

commit 879bdd2efb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 02:20:26 2024 +1100

    Update gemma.py

commit 9052a85f92
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 02:19:06 2024 +1100

    Update gemma.py

commit bad295a413
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 02:15:12 2024 +1100

    Update llama.py

commit b30d2502cc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 02:12:02 2024 +1100

    Update llama.py

commit 11233cb3d4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 02:10:11 2024 +1100

    model_type

commit 76de9c1fe3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 02:05:31 2024 +1100

    FastGemmaModel

commit c0d32de795
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 02:02:39 2024 +1100

    Update fast_lora.py

commit bd2fa264c9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 01:57:34 2024 +1100

    Update mapper.py

commit 9a1b28d691
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 01:55:25 2024 +1100

    Update pyproject.toml

commit 0beaf18908
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 23 01:50:51 2024 +1100

    Gemma

commit 0659d90d95
Merge: 7372768 1b7bf71
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 21 17:49:57 2024 +1100

    Merge branch 'main' into nightly

commit 7372768d14
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 21 03:29:49 2024 +1100

    original

commit 10d9d56434
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 21 03:25:46 2024 +1100

    spaces

commit edd03f66fb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 21 03:14:27 2024 +1100

    trainer

commit 028ee5ca06
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 21 02:05:33 2024 +1100

    save

commit 917f791ab7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 21 00:53:17 2024 +1100

    Update save.py

commit bf3e10b26e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 21 00:48:18 2024 +1100

    Update save.py

commit d266332141
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 23:28:37 2024 +1100

    Update save.py

commit 4d1e575047
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 20:01:33 2024 +1100

    Update __init__.py

commit 6aac6c4be8
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 19:50:29 2024 +1100

    Update save.py

commit 83d906a2c0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 18:35:11 2024 +1100

    Update save.py

commit 5d88ffefd5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 18:00:51 2024 +1100

    Update save.py

commit 4c9be6d057
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 17:12:18 2024 +1100

    Update save.py

commit d3eac595a1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 17:11:33 2024 +1100

    Update save.py

commit 632705b9b0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 17:00:19 2024 +1100

    Update save.py

commit 8f60bf57be
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 15:59:38 2024 +1100

    Update save.py

commit 164fa807fe
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 04:44:17 2024 +1100

    Update save.py

commit 11e04a7057
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 04:42:47 2024 +1100

    Update save.py

commit d2e76580ae
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 04:41:40 2024 +1100

    Update save.py

commit 3e3dc37640
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 04:41:34 2024 +1100

    Update save.py

commit d9751e6553
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 04:35:07 2024 +1100

    Update save.py

commit 44659a05a3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 04:13:58 2024 +1100

    Update save.py

commit 34998914ac
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 03:35:19 2024 +1100

    saving

commit 3728b8e876
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 02:19:36 2024 +1100

    Update save.py

commit c20bb30710
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 02:07:01 2024 +1100

    Update save.py

commit 2c207b4989
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 20 00:10:04 2024 +1100

    llama.cpp bugs

commit 0ffd7b46f3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 19 19:39:47 2024 +1100

    linking

commit ad0c2bcd87
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 19 13:08:03 2024 +1100

    Update save.py

commit 0c1b71ac63
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 19 12:51:25 2024 +1100

    Update save.py

commit 14db89f4fc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 19 04:13:40 2024 +1100

    Update save.py

commit e31071f368
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 19 04:01:27 2024 +1100

    Update save.py

commit aa45208f5a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 19 03:39:09 2024 +1100

    Update save.py

commit 28de27dd8d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 19 03:22:21 2024 +1100

    PeftModel token + saving

commit 99cdf0bd5a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 19 02:45:54 2024 +1100

    Update save.py

commit 71555c7678
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 19 02:37:43 2024 +1100

    Update save.py

commit 89d2418cd7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 23:35:18 2024 +1100

    Update save.py

commit 3e89d4e7ec
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 23:31:24 2024 +1100

    Update save.py

commit 457c044473
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 23:25:36 2024 +1100

    install

commit 22378e95cc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 23:12:37 2024 +1100

    Update pyproject.toml

commit 3d789cb09e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 22:59:37 2024 +1100

    Update save.py

commit 31a62c52d3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 22:54:38 2024 +1100

    trainer

commit 870edf3534
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 22:52:51 2024 +1100

    Update save.py

commit b6a6e90b7a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 22:03:22 2024 +1100

    Update save.py

commit 02b7b7f3f2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 20:28:56 2024 +1100

    Update save.py

commit 53f6a07b92
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 20:22:52 2024 +1100

    Update save.py

commit 7ee8243e8b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 20:19:28 2024 +1100

    Update save.py

commit 94b5c58c2e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 20:11:51 2024 +1100

    Update save.py

commit c6ad5f97a3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 20:09:11 2024 +1100

    Update loader.py

commit cc2764cddd
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 19:54:34 2024 +1100

    Update save.py

commit 0af3a1b143
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 19:44:59 2024 +1100

    Update save.py

commit ef8abf4c6e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 19:25:56 2024 +1100

    apache

commit 164b950cad
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 19:24:35 2024 +1100

    spaces

commit 7aedc92637
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 19:21:02 2024 +1100

    slashes

commit 7573ee2c22
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 19:15:41 2024 +1100

    slash

commit e51a381305
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 19:08:04 2024 +1100

    globals

commit c660879cdc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 18:30:03 2024 +1100

    spaces

commit dcbbab3c22
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 18:25:19 2024 +1100

    spaces

commit 5edae1cd79
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 18:21:54 2024 +1100

    readme

commit 22f1f52513
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 18:18:12 2024 +1100

    Update llama.py

commit 2e65e63678
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 18:09:38 2024 +1100

    saving bugs

commit c9a524a99a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 18 04:15:20 2024 +1100

    Bugs

commit 6d3a3b4286
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 15 19:24:07 2024 +1100

    Fix RoPE precision issues

commit 84b37f8548
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 15 03:53:58 2024 +1100

    Update mapper.py

commit bd4a701e7a
Merge: 629c39d 0439b85
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 15 03:35:47 2024 +1100

    Merge branch 'main' into nightly

commit 629c39d2e5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 14 23:17:38 2024 +1100

    Update mistral.py

commit 227c26ca44
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 14 23:15:45 2024 +1100

    Update llama.py

commit b660132380
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 14 23:11:06 2024 +1100

    Saving, LlamaRotaryEmbedding issues

commit 91a3c43468
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 14 17:58:41 2024 +1100

    Update chat_templates.py

commit efbb1e6049
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 14 17:56:14 2024 +1100

    patch tokenizer

commit 5f5910ffee
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 14 17:45:02 2024 +1100

    Update chat_templates.py

commit d40a12852e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 14 17:31:33 2024 +1100

    Update chat_templates.py

commit 7c713cb58c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 14 17:27:03 2024 +1100

    Update chat_templates.py

commit b28a383d1c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 14 04:20:41 2024 +1100

    Update chat_templates.py

commit 4f20e20e28
Merge: 2cdf43d 474fd32
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 14 03:25:40 2024 +1100

    Merge branch 'main' into nightly

commit 2cdf43d8b7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 12 04:28:41 2024 +1100

    Chat Templates

commit acd635aa0d
Merge: b7c5296 3d5cf37
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 11 16:42:37 2024 +1100

    Merge branch 'main' into nightly

commit b7c52963ad
Merge: 868fb27 99b8d23
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 9 03:52:36 2024 +1100

    Merge branch 'main' into nightly

commit 868fb27e11
Merge: 601dc9e b7f24e8
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 8 16:50:19 2024 +1100

    Merge branch 'main' into nightly

commit 601dc9ec4b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 8 03:39:45 2024 +1100

    Update llama.py

commit 31de486f1c
Merge: 81128a4 25cfc7f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 8 03:39:19 2024 +1100

    revert

commit 81128a4504
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 8 03:00:53 2024 +1100

    Update llama.py

commit 998097394a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 8 02:57:09 2024 +1100

    Update llama.py

commit 277ca9eecf
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 8 02:52:43 2024 +1100

    Update llama.py

commit d6ab9c92d7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 8 02:49:26 2024 +1100

    Update llama.py

commit e094914b0e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 8 02:30:49 2024 +1100

    Update llama.py

commit 9a54a6f05e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 20:15:19 2024 +1100

    Update llama.py

commit e94647c2dc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 19:25:46 2024 +1100

    Update llama.py

commit e8593b7103
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 18:25:08 2024 +1100

    Update llama.py

commit 1065936ecb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 18:21:36 2024 +1100

    Update llama.py

commit 60b47f6130
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 18:18:43 2024 +1100

    Update llama.py

commit e43e819a57
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 18:02:58 2024 +1100

    Update llama.py

commit 1e77ab25b6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 17:43:51 2024 +1100

    Update llama.py

commit b36e0bfc89
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 17:34:59 2024 +1100

    Update llama.py

commit ab0a3c9976
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 04:40:19 2024 +1100

    Update mistral.py

commit 2b346dc498
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 03:45:07 2024 +1100

    Update save.py

commit 17a6b12ee3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 03:41:32 2024 +1100

    Update save.py

commit 7da7afcc78
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 03:20:55 2024 +1100

    __version__

commit 9e2b00e167
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 03:19:33 2024 +1100

    __version__

commit 9c3849fc1d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 03:18:02 2024 +1100

    Update pyproject.toml

commit bfb3ea7179
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 02:57:54 2024 +1100

    Update save.py

commit 213cfee903
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 02:57:24 2024 +1100

    Update save.py

commit 8b52dc027e
Merge: 8e9d9c3 bb66faa
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 02:50:35 2024 +1100

    Merge branch 'main' into nightly

commit 8e9d9c38b1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Feb 7 01:48:31 2024 +1100

    SWA inference

commit d0b1144423
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 6 18:53:32 2024 +1100

    Fix llm_int8_skip_modules

commit 39a2a7c57d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 6 18:50:48 2024 +1100

    Fix SWA inference

commit aa0427f898
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 6 02:11:06 2024 +1100

    Update save.py

commit 262289d5a5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 6 01:54:22 2024 +1100

    Update save.py

commit 53ca91f744
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Feb 6 01:30:48 2024 +1100

    Update save.py

commit e487abd94a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 5 19:07:00 2024 +1100

    Update save.py

commit 797a87a3e3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 5 18:53:27 2024 +1100

    Update save.py

commit e9031ceabe
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 5 18:48:40 2024 +1100

    mistral swa

commit 33192d6dc8
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 5 18:05:25 2024 +1100

    Update save.py

commit 03ed3a83d5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 5 17:38:01 2024 +1100

    Torch 2.2.0

commit a50daa19d8
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 5 17:32:43 2024 +1100

    Update save.py

commit d69cef9f23
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 5 17:16:20 2024 +1100

    Update save.py

commit 1750b13a63
Merge: 63ed23a efa0d23
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Feb 5 02:29:56 2024 +1100

    Merge branch 'main' into nightly

commit 63ed23ae98
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 14:02:26 2024 +1100

    Update utils.py

commit 990068b977
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 13:50:21 2024 +1100

    Update utils.py

commit 6ab4019e4a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 04:07:47 2024 +1100

    Update llama.py

commit 7ab3426608
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:57:29 2024 +1100

    Update llama.py

commit 201d90c3ba
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:54:12 2024 +1100

    Update llama.py

commit 00242d5f89
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:52:08 2024 +1100

    Update llama.py

commit 3bfb0ebf9d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:50:31 2024 +1100

    Update llama.py

commit 71899d7060
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:48:18 2024 +1100

    Update llama.py

commit 9cd2517949
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:46:30 2024 +1100

    Update llama.py

commit 24431b4347
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:45:25 2024 +1100

    Update llama.py

commit 63816fc119
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:42:41 2024 +1100

    Update llama.py

commit 7166b11599
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:41:15 2024 +1100

    Update llama.py

commit d867b9bbdf
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:27:15 2024 +1100

    Update llama.py

commit 9a5ebef148
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:23:26 2024 +1100

    Update llama.py

commit 65270cec60
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:13:42 2024 +1100

    Update llama.py

commit 80fa8e93c9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:12:07 2024 +1100

    Update llama.py

commit 36b400e598
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:08:49 2024 +1100

    Update llama.py

commit 4a5d3b1de7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 03:05:49 2024 +1100

    Update llama.py

commit 88a695da3a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 02:49:51 2024 +1100

    Update llama.py

commit ac9bc79251
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 02:42:30 2024 +1100

    Update llama.py

commit 8c8685eeef
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 02:40:30 2024 +1100

    Update llama.py

commit 607dfa1d0e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 02:35:17 2024 +1100

    Update llama.py

commit 54802ecbb9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 02:32:21 2024 +1100

    New version

commit 711e5c0922
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 02:19:42 2024 +1100

    attention_mask

commit 74d7fc65c6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 02:03:23 2024 +1100

    SDPA

commit fcb884643b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 02:02:51 2024 +1100

    Update llama.py

commit aa032fc80d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 01:46:04 2024 +1100

    Update mistral.py

commit 31578f2010
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Feb 4 01:42:46 2024 +1100

    fast inference

commit 257cd7d531
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 23:22:21 2024 +1100

    Update llama.py

commit e500b785f7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 23:12:16 2024 +1100

    Update llama.py

commit cf0fae9a55
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 23:01:56 2024 +1100

    Update llama.py

commit 381b991c45
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 22:42:11 2024 +1100

    Update llama.py

commit 665908eb70
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 22:28:20 2024 +1100

    Update llama.py

commit 5534f8a8f1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 20:22:42 2024 +1100

    more temp matrices

commit 68db1c7af7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 19:44:36 2024 +1100

    fast inference again

commit d76f583349
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 19:30:22 2024 +1100

    Update mistral.py

commit 9225dd6708
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 19:19:13 2024 +1100

    Update llama.py

commit 8270821269
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 19:08:42 2024 +1100

    Update llama.py

commit 1c6e1f18b4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 18:32:18 2024 +1100

    Update llama.py

commit dd03abedd6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 18:20:06 2024 +1100

    Update llama.py

commit 522f6dbfd2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 18:07:55 2024 +1100

    Update llama.py

commit ad357dec89
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 03:52:25 2024 +1100

    fast inference + saving config.json

commit a78d6fba7e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 03:34:02 2024 +1100

    Update llama.py

commit fa3d23406e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 03:10:39 2024 +1100

    Update utils.py

commit 8e3f0296be
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 02:55:22 2024 +1100

    Update utils.py

commit 404e177adb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 02:30:47 2024 +1100

    Update utils.py

commit a81d193830
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 02:20:00 2024 +1100

    Update utils.py

commit 4436a1099d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 01:50:48 2024 +1100

    Update llama.py

commit a15ffc95e0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 01:48:03 2024 +1100

    Update llama.py

commit 0be46406d8
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 01:39:39 2024 +1100

    Update llama.py

commit e911047188
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 01:35:45 2024 +1100

    Update llama.py

commit c934c16e21
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 01:29:50 2024 +1100

    Update llama.py

commit 705bbba5c5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Feb 3 01:22:53 2024 +1100

    Update llama.py

commit 9c2bed35b9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 23:45:02 2024 +1100

    Update llama.py

commit ea9b4eea0c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 23:40:52 2024 +1100

    Update llama.py

commit c6ad936f88
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 23:32:58 2024 +1100

    Update llama.py

commit 03df291110
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 23:32:36 2024 +1100

    Update llama.py

commit dc2740416b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 22:26:47 2024 +1100

    Update llama.py

commit b33c92d1bd
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 22:22:42 2024 +1100

    Update llama.py

commit 923c6bad14
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 22:20:45 2024 +1100

    Update llama.py

commit b3703926ca
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 22:17:02 2024 +1100

    Update llama.py

commit 1b274212e1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 20:35:00 2024 +1100

    Update llama.py

commit 9df19aeb9f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 20:32:21 2024 +1100

    past_key_values

commit 82eea75cde
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 20:29:35 2024 +1100

    torch compile

commit e748078095
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 20:24:44 2024 +1100

    Update llama.py

commit 9146fa443d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 20:24:28 2024 +1100

    Update llama.py

commit a5a123db08
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 20:20:14 2024 +1100

    Update llama.py

commit 82ead808d5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 20:19:15 2024 +1100

    Update llama.py

commit 1497521e5c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 20:06:52 2024 +1100

    Update mistral.py

commit c3c454a729
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 20:06:31 2024 +1100

    Update llama.py

commit f299c9cde1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 19:44:52 2024 +1100

    Update llama.py

commit 0e1d67d4e1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 19:07:41 2024 +1100

    fast inference

commit 168ded977e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 18:28:05 2024 +1100

    Update llama.py

commit bf441055ca
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 18:14:31 2024 +1100

    faster inference

commit 7c2b04254c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 16:59:02 2024 +1100

    Update llama.py

commit ca99d7c194
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 16:46:30 2024 +1100

    Update mistral.py

commit 40e8848c4e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 15:56:56 2024 +1100

    Update llama.py

commit 6cc9835021
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 13:52:12 2024 +1100

    Update llama.py

commit 7ad1a1fa62
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 04:18:34 2024 +1100

    Update llama.py

commit 0b661a23b9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Feb 2 03:55:36 2024 +1100

    Update llama.py

commit da64d3403c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 23:54:01 2024 +1100

    Update llama.py

commit abc47836dc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 23:41:32 2024 +1100

    Update llama.py

commit f231c4f395
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 23:41:23 2024 +1100

    Update llama.py

commit 5f3c51b394
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 23:41:11 2024 +1100

    Update llama.py

commit e0ea238256
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 23:40:36 2024 +1100

    Update llama.py

commit 73f63d6884
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 23:38:07 2024 +1100

    Update llama.py

commit e4b5e38800
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 23:17:53 2024 +1100

    inference

commit 334c5ed1f0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 22:48:54 2024 +1100

    Update llama.py

commit cd39f6108f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 22:34:33 2024 +1100

    lm_head

commit 24c4c37b7c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 22:21:35 2024 +1100

    revert

commit e791db9b06
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 20:12:59 2024 +1100

    Update llama.py

commit 1793a16c8f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 20:03:54 2024 +1100

    faster inference

commit 19fb50e244
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 19:56:54 2024 +1100

    Update utils.py

commit b83cea7cb4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 19:56:06 2024 +1100

    Update llama.py

commit 8920cafbe7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 19:46:38 2024 +1100

    inference

commit 38b59825b1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 19:36:36 2024 +1100

    Update llama.py

commit e8ec80a4c2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 19:19:44 2024 +1100

    Update llama.py

commit a2cb7a113b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 19:07:30 2024 +1100

    Update llama.py

commit a5ee70b63f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 18:34:43 2024 +1100

    Update llama.py

commit e2f0dd8683
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 18:16:46 2024 +1100

    Update llama.py

commit 3f0ddf0c7c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 18:16:15 2024 +1100

    Update llama.py

commit e90b3bf192
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 17:59:21 2024 +1100

    Update llama.py

commit 57044509ad
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 17:44:22 2024 +1100

    Update llama.py

commit cf4b58eeb6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 17:20:02 2024 +1100

    inference

commit 648c79ec1b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 03:47:51 2024 +1100

    faster inference

commit 20f19391e6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 03:05:02 2024 +1100

    Update mistral.py

commit e2f72fe52f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 03:04:46 2024 +1100

    Revert

commit 329f80ac4c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 02:47:57 2024 +1100

    Update llama.py

commit 713a95ca0e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Feb 1 02:46:26 2024 +1100

    Inference

commit acbdef7ff5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 20:16:08 2024 +1100

    padding

commit 0dc26ed98a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 20:06:28 2024 +1100

    Update llama.py

commit 9f31254539
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 20:02:21 2024 +1100

    Fix SDPA

commit 7227de48cf
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 19:48:33 2024 +1100

    Update llama.py

commit 5edad55b5d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 19:44:19 2024 +1100

    Update llama.py

commit c928c57aa9
Merge: 5da0555 2f55935
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 19:38:27 2024 +1100

    Merge branch 'main' into nightly

commit 5da05558a0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 03:50:47 2024 +1100

    past_key_value

commit d347db0944
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 03:46:11 2024 +1100

    Update loader.py

commit c0edaa46db
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 03:40:55 2024 +1100

    revert inference

commit 55fe6052ca
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 03:36:40 2024 +1100

    if past_key_value is not None and q_len == 1:

commit 248887b51c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 03:35:24 2024 +1100

    LlamaAttention_fast_forward_inference

commit c68a3bc9ec
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 03:28:30 2024 +1100

    Update loader.py

commit 44168377f3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 31 01:32:56 2024 +1100

    Update rope_embedding.py

commit 270df81b60
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 30 17:33:08 2024 +1100

    Remove fast path

commit b8f665bf22
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 30 17:26:45 2024 +1100

    fast lm_head

commit ed5a653ecf
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 30 04:16:39 2024 +1100

    Update mistral.py

commit e0bad0eec5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 30 04:10:14 2024 +1100

    Fix inference

commit 71725aeea5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 30 03:23:42 2024 +1100

    Update __init__.py

commit 3ddda6f492
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 30 03:08:46 2024 +1100

    Update mistral.py

commit 6f74c98fbc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 23:19:43 2024 +1100

    Update utils.py

commit a7bfeec919
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 23:14:51 2024 +1100

    Update utils.py

commit 7c87d60bc1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 23:07:40 2024 +1100

    Update utils.py

commit bb364204cc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 20:09:59 2024 +1100

    Update llama.py

commit 4700d51916
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 19:55:40 2024 +1100

    Fast inference repatch

commit 58cabcbc06
Merge: 01b8162 a3a2ad9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 17:59:34 2024 +1100

    Merge branch 'main' into nightly

commit 01b8162244
Merge: 5cfea20 90309ca
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 17:49:57 2024 +1100

    Merge branch 'main' into nightly

commit 5cfea20129
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 17:35:19 2024 +1100

    Update llama.py

commit 25a88ea003
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 16:52:04 2024 +1100

    Update llama.py

commit 03ca52dc94
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 03:43:39 2024 +1100

    saving

commit 01e5c305f9
Merge: e10e488 a16bc73
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 03:43:19 2024 +1100

    Merge branch 'main' into nightly

commit e10e48893b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 03:43:17 2024 +1100

    Update save.py

commit 5bd916b91b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 03:43:05 2024 +1100

    Update mistral.py

commit 11ba2c520b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 03:34:27 2024 +1100

    Mistral patch

commit 498dfb8acc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 02:42:08 2024 +1100

    print

commit a3d2b9b778
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 02:15:52 2024 +1100

    Update save.py

commit 9e00cc287a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 00:59:29 2024 +1100

    Update save.py

commit 460de24ea2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 00:57:11 2024 +1100

    Update save.py

commit b060d7b621
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 00:56:44 2024 +1100

    Update save.py

commit 74b69775c2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 00:51:57 2024 +1100

    Update save.py

commit fef0589d6e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 00:51:45 2024 +1100

    Update save.py

commit 9dec4b3f08
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 29 00:49:31 2024 +1100

    Update save.py

commit ee0bf6fe66
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 20:07:17 2024 +1100

    Update save.py

commit c69c166b82
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 18:12:11 2024 +1100

    patch_saving_functions

commit ac02ba6c38
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 18:08:52 2024 +1100

    Update save.py

commit 20e524a596
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 18:04:35 2024 +1100

    Update save.py

commit 788e695180
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 17:48:34 2024 +1100

    Patch saving

commit 31222ced74
Merge: 893aab0 af33224
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 17:48:28 2024 +1100

    Merge branch 'main' into nightly

commit 893aab0e57
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 17:20:49 2024 +1100

    Update dpo.py

commit d5c852e711
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 16:47:33 2024 +1100

    Update llama.py

commit ddb48efd33
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 16:43:10 2024 +1100

    Update llama.py

commit 5fe166d32e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 16:40:59 2024 +1100

    Update llama.py

commit 2d64e0a904
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 16:37:21 2024 +1100

    Update mistral.py

commit 2f73cb4049
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 16:31:01 2024 +1100

    Update llama.py

commit 6aa46ffff6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 15:13:54 2024 +1100

    Update llama.py

commit ee6f5096ed
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 13:53:06 2024 +1100

    attention mask

commit f1b0fd0848
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 13:37:38 2024 +1100

    Update mistral.py

commit 7663a32753
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:29:52 2024 +1100

    Update save.py

commit a158003625
Merge: 36829b7 e2bbd38
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:29:35 2024 +1100

    Merge branch 'nightly' of https://github.com/unslothai/unsloth into nightly

commit 36829b7ec4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:27:10 2024 +1100

    Update save.py

commit 917ce15861
Merge: 166f8c8 a81aff2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:20:15 2024 +1100

    Merge branch 'main' into nightly

commit 166f8c812e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:18:19 2024 +1100

    attention mask

commit 6c7f0dbcb4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:08:19 2024 +1100

    Update llama.py

commit c836ed7f3d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:08:03 2024 +1100

    Update mistral.py

commit 12d57e5308
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 03:59:23 2024 +1100

    labels

commit 4c5ebcc960
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 03:56:54 2024 +1100

    Update llama.py

commit 6a027a8292
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 03:55:23 2024 +1100

    Update llama.py

commit 9f9739cbac
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 03:49:54 2024 +1100

    attention_mask

commit 2bd77e7277
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 02:34:03 2024 +1100

    Update fast_lora.py

commit a1e5aca7cc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 01:27:46 2024 +1100

    Update fast_lora.py

commit a2f705d65b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 01:17:27 2024 +1100

    Update fast_lora.py

commit d01ba458df
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 28 00:45:55 2024 +1100

    Update fast_lora.py

commit 6fa0635971
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 23:20:41 2024 +1100

    Update fast_lora.py

commit e094af81a1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 20:39:19 2024 +1100

    Update fast_lora.py

commit c74e1af85c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 20:37:43 2024 +1100

    Update fast_lora.py

commit 363ffba1c2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 20:17:08 2024 +1100

    Update fast_lora.py

commit 510c85f412
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 20:04:55 2024 +1100

    Update swiglu.py

commit 35daafdd6e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 19:53:57 2024 +1100

    Update fast_lora.py

commit 6201f7681f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 19:38:28 2024 +1100

    Update fast_lora.py

commit 86a1c9788b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 19:29:53 2024 +1100

    Update fast_lora.py

commit 8e0e4ccdc6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 19:18:59 2024 +1100

    Update fast_lora.py

commit 2e4c59deaf
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 19:05:02 2024 +1100

    Update swiglu.py

commit 85e87d9ba2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 19:03:04 2024 +1100

    Swiglu

commit 83b6937285
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 18:21:29 2024 +1100

    Update fast_lora.py

commit 3d3e7f5b42
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 18:15:34 2024 +1100

    Update fast_lora.py

commit d3f3b6fc49
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 18:00:41 2024 +1100

    Update fast_lora.py

commit e77d7c069f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 18:00:22 2024 +1100

    Update fast_lora.py

commit f7d11d10f8
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 17:47:54 2024 +1100

    Update fast_lora.py

commit 8ed03f5f45
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 17:38:30 2024 +1100

    Update pyproject.toml

commit af65cb0d3d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:50:47 2024 +1100

    Works?

commit fb5333726a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:49:36 2024 +1100

    Update llama.py

commit a59ec7903c
Merge: 704e36a 7da0c50
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:48:40 2024 +1100

    Merge branch 'main' into nightly

commit 704e36a64e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:48:34 2024 +1100

    Revert "Update llama.py"

    This reverts commit a208ec46e0.

commit a208ec46e0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:48:03 2024 +1100

    Update llama.py

commit bd2ff90817
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:47:42 2024 +1100

    Update llama.py

commit a3d892a15b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:47:01 2024 +1100

    Update llama.py

commit b89599a3f7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:46:47 2024 +1100

    Update llama.py

commit baeea64917
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:20:58 2024 +1100

    Update save.py

commit ecdbb28dcb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:19:59 2024 +1100

    Update save.py

commit 393341f211
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:18:25 2024 +1100

    Update swiglu.py

commit 47babc780a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:16:38 2024 +1100

    Update fast_lora.py

commit 9edc309f59
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:15:37 2024 +1100

    Update llama.py

commit a27ac6175e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:03:47 2024 +1100

    Update utils.py

commit 7fb64e0e31
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 03:58:21 2024 +1100

    Update fast_lora.py

commit e3bd0bb74f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 03:51:53 2024 +1100

    Update save.py

commit ba847f5411
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 03:38:33 2024 +1100

    Update fast_lora.py

commit c57495df6f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 03:38:19 2024 +1100

    Update fast_lora.py

commit d379bb8cf5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 03:36:27 2024 +1100

    Update fast_lora.py

commit 421ed33b42
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 03:23:50 2024 +1100

    Update fast_lora.py

commit 83ceb11cc9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 03:12:53 2024 +1100

    Update fast_lora.py

commit f3da0d2e4a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 02:52:54 2024 +1100

    Update fast_lora.py

commit 4ae3ad3bce
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 02:34:21 2024 +1100

    Update fast_lora.py

commit c74d3dd3b9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 02:22:15 2024 +1100

    Update fast_lora.py

commit 0a1aa98d3c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 02:19:00 2024 +1100

    Update fast_lora.py

commit f7760719f0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 01:28:03 2024 +1100

    Update fast_lora.py

commit 38bff800b6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 27 01:16:31 2024 +1100

    Update fast_lora.py

commit 485d54fd70
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 23:48:17 2024 +1100

    Update fast_lora.py

commit ce08a15cee
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 23:33:17 2024 +1100

    Update fast_lora.py

commit b4492eb29c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 23:14:44 2024 +1100

    Update fast_lora.py

commit 3865c8cd68
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 23:06:06 2024 +1100

    Update fast_lora.py

commit e9e34f1d4d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 22:29:38 2024 +1100

    Update fast_lora.py

commit d96f665510
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 22:21:54 2024 +1100

    Update swiglu.py

commit f8aa20d2db
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 22:09:32 2024 +1100

    Update fast_lora.py

commit ae0d9380c1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 22:07:10 2024 +1100

    Update swiglu.py

commit 694da2dba3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 20:06:52 2024 +1100

    Update fast_lora.py

commit 4f573494da
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 20:04:51 2024 +1100

    Update fast_lora.py

commit cb06ce849b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 19:41:48 2024 +1100

    Update fast_lora.py

commit e0a36b356b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 19:30:18 2024 +1100

    Update fast_lora.py

commit c4f0de58a9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 19:21:00 2024 +1100

    Update fast_lora.py

commit a0a409d5af
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 19:20:53 2024 +1100

    Update fast_lora.py

commit 97658f9a74
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 19:12:08 2024 +1100

    Update fast_lora.py

commit 979de52220
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 19:00:22 2024 +1100

    Update fast_lora.py

commit 39d251757f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 18:19:36 2024 +1100

    Update fast_lora.py

commit 77365cc740
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 18:11:43 2024 +1100

    Update llama.py

commit 4bfbccec4b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 18:04:59 2024 +1100

    Update fast_lora.py

commit 56dffca23f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 17:44:10 2024 +1100

    Update llama.py

commit 259097bc61
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 17:36:40 2024 +1100

    Update fast_lora.py

commit b8de6b6a9b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 17:25:36 2024 +1100

    Update fast_lora.py

commit 796aa4d0ef
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 17:13:09 2024 +1100

    Update fast_lora.py

commit 9ecb3bc869
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 17:04:33 2024 +1100

    Update fast_lora.py

commit 7bed11a8ba
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 16:43:50 2024 +1100

    Update fast_lora.py

commit 317bbc57fe
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 16:16:32 2024 +1100

    Update fast_lora.py

commit c4ac728d4c
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 16:08:10 2024 +1100

    Update fast_lora.py

commit ab44945336
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 14:06:25 2024 +1100

    Update fast_lora.py

commit 84772de40f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 13:59:57 2024 +1100

    Update fast_lora.py

commit 653f1036ae
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 03:49:48 2024 +1100

    Update fast_lora.py

commit 46ec8bbc3d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 03:15:23 2024 +1100

    Repatch

commit 99eeebf72a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 02:41:15 2024 +1100

    Update swiglu.py

commit ae156d9154
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 02:07:35 2024 +1100

    Update llama.py

commit 29945bd8c1
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 26 01:33:23 2024 +1100

    Update llama.py

commit 194093e375
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Jan 25 23:35:56 2024 +1100

    remove patching

commit 4f60055f9e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Jan 25 23:25:17 2024 +1100

    Update fast_lora.py

commit 2d25facc7b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Jan 25 23:14:37 2024 +1100

    Update fast_lora.py

commit 380f2fd6e4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Jan 25 19:05:43 2024 +1100

    Fix saving and bnb-4bit

commit 0474451a6a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Jan 25 03:55:26 2024 +1100

    Update pyproject.toml

commit 5a5d34b8ae
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Jan 25 03:47:23 2024 +1100

    Update mapper.py

commit bbf5ef65af
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Jan 25 03:46:23 2024 +1100

    Graceful FA2 error + torch 2.1.1

commit 5ea888c578
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Thu Jan 25 02:29:37 2024 +1100

    Update to transformers 4.37

commit d2f1521a49
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 23:23:05 2024 +1100

    incorrect inference

commit f186fe9bc9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 19:44:45 2024 +1100

    Update mistral.py

commit e184b06c4b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 19:28:17 2024 +1100

    Update mistral.py

commit b314500993
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 19:23:58 2024 +1100

    q_len issue

commit d6e85d7731
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 19:17:42 2024 +1100

    q_len == 1

commit 5d9e68181e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 19:11:17 2024 +1100

    hidden_states

commit f311a8efcc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 18:04:20 2024 +1100

    Update llama.py

commit eeee6333ec
Merge: 20d8f22 04f8771
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 18:04:10 2024 +1100

    Merge branch 'main' into nightly

commit 20d8f223d7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 03:47:28 2024 +1100

    Fast LoRA saving

commit 1bb1c3c2b9
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 03:36:50 2024 +1100

    LoRA

commit 24c7a67556
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 02:44:52 2024 +1100

    Update llama.py

commit d87ef86991
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 02:22:37 2024 +1100

    Update llama.py

commit a77c448939
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 02:18:37 2024 +1100

    Update llama.py

commit 5260ec2a0a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 02:13:01 2024 +1100

    Update llama.py

commit 716e03fe1b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 02:09:35 2024 +1100

    Update llama.py

commit 00f50876f7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 02:03:25 2024 +1100

    Update llama.py

commit e5b5333137
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 02:00:10 2024 +1100

    Update llama.py

commit 9a5062e6c4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 01:44:10 2024 +1100

    Update llama.py

commit 1289ae825b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 01:41:20 2024 +1100

    RoPE

commit 7e4140ebfc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 01:37:36 2024 +1100

    Update llama.py

commit 1ba28d8e26
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 01:33:51 2024 +1100

    Update llama.py

commit 3e1b244d5e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 01:29:03 2024 +1100

    Fast inference RoPE

commit 8647f0e86e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 01:06:34 2024 +1100

    Update llama.py

commit 085a8e944a
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 23 00:32:57 2024 +1100

    inference

commit f41a437540
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 22:34:33 2024 +1100

    Update utils.py

commit 31f0c9c08d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 22:31:38 2024 +1100

    Update utils.py

commit 4220f6a6dc
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 22:26:55 2024 +1100

    No print

commit e1cbc9e423
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 22:21:29 2024 +1100

    Update utils.py

commit 050c61bc0b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 20:04:27 2024 +1100

    Update utils.py

commit 7c3b647ba7
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 20:01:59 2024 +1100

    fast_linear_forward

commit 8c613e851f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 19:36:17 2024 +1100

    Apache 2

commit a18f982e67
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 19:31:34 2024 +1100

    Max sequence lengths

commit e38485feaa
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 18:18:23 2024 +1100

    Mistral correct RoPE scaling

commit 278de9f375
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 04:33:44 2024 +1100

    Update llama.py

commit 0666589ace
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 04:28:24 2024 +1100

    Update save.py

commit e34393f975
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 04:25:37 2024 +1100

    Update llama.py

commit 2a3d4f3a8d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 04:18:42 2024 +1100

    fast inference

commit a5ab4dc21a
Merge: 8828ece 3a9b2de
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Mon Jan 22 01:58:00 2024 +1100

    Merge branch 'main' into nightly

commit 8828eceb5b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 22:18:53 2024 +1100

    Update llama.py

commit fcde58859b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 20:06:14 2024 +1100

    Update llama.py

commit 2f80890578
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 19:21:56 2024 +1100

    Update llama.py

commit 7be801ff8d
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 19:10:48 2024 +1100

    Update llama.py

commit 92512e670e
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 19:01:45 2024 +1100

    Update llama.py

commit 5c927e4930
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 19:01:28 2024 +1100

    Update mistral.py

commit fe2bc30987
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 18:52:25 2024 +1100

    Update llama.py

commit ac99a47a45
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 18:52:15 2024 +1100

    Update llama.py

commit 5bf108ebda
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 18:15:55 2024 +1100

    Update llama.py

commit b591f33a37
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 18:06:44 2024 +1100

    Update llama.py

commit da7b4f59ee
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 17:32:37 2024 +1100

    Update save.py

commit 196ab974d6
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 16:53:58 2024 +1100

    Update llama.py

commit cbc1c69e29
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 16:22:28 2024 +1100

    faster saving & inference
2024-03-17 22:21:36 +11:00
Daniel Han
64d847bede Bug fixes (#257)
* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md

* Bugs

* Update fast_lora.py

* Update pyproject.toml

* Update fast_lora.py
2024-03-17 22:09:50 +11:00
Daniel Han
c599ae0f27 Bug fixes (#249)
* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update rope_embedding.py

* Update rope_embedding.py

* Fix bugs

* Update fast_lora.py

* Update fast_lora.py

* Update README.md

* Update README.md

* GGUF

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update README.md

* Update README.md
2024-03-17 02:47:05 +11:00
Qubitium
39713e66ed Fix single GPU limit code overriding the wrong CUDA GPU ID via env (#228) 2024-03-16 00:12:16 +11:00
HuyNguyen-hust
e29a630cd3 10% faster RoPE embedding from HuyNguyen-hust (#238) 2024-03-16 00:09:59 +11:00
Daniel Han
990c7a809c Gemma GGUF chat templates work! (#246)
* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py
2024-03-15 05:09:45 +11:00
Daniel Han
2c5c5bb4bb Fix Gemma GGUF (#234)
* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py

* Update save.py

* GGUF incorrect

* Update save.py

* Update pyproject.toml

* kaggle new

* Update pyproject.toml

* Update pyproject.toml

* upcasting

* Fix Colab

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml

* Update pyproject.toml
2024-03-14 20:32:04 +11:00
Daniel Han
32223779c4 Fix more bugs (#232)
* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Account for DoRA

* Update llama.py
2024-03-11 04:31:03 +11:00
Daniel Han
8bea94c137 Saving fixes (#231)
* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten

* Update llama.py

* Update llama.py

* Update llama.py

* Update save.py

* Accuracy

* Revert

* Update save.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py
2024-03-10 20:09:34 +11:00
Daniel Han
1fcf9d4577 Fix bugs (#230)
* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update llama.py

* model_name

* Update loader.py

* Tokenizer overwritten
2024-03-10 04:54:23 +11:00
Daniel Han
70f271b1d3 Fix Gemma (#223)
* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Fix Gemma merging

* Update rms_layernorm.py

* Update gemma.py

* Update pyproject.toml

* Layernorms

* Gemma precision

* Update gemma.py

* sqrt

* Update gemma.py

* Update save.py

* RoPE and Gemma precision

* Update rms_layernorm.py

* Fix warning

* Update chat_templates.py
2024-03-07 04:34:06 +11:00
Daniel Han
fedcafe281 Fix Gemma norm float32 (#217)
* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update gemma.py
2024-03-04 16:19:55 +11:00
Daniel Han
7b7665d9d6 Fix Gemma fast inference (#215)
* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py

* Update gemma.py
2024-03-03 19:36:06 +11:00
Daniel Han
fa2a43baf3 Fix Gemma activation function (#214)
* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update pyproject.toml

* Small fixes

* Update pyproject.toml

* Approx gelu

* Update geglu.py

* Approx gelu

* Update llama.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update geglu.py
2024-03-03 18:21:44 +11:00
Daniel Han
307f2da353 Nightly (#204)
* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py

* Update llama.py

* Hotfix - fix DoRA, Gemma prompt template (#202) (#203)

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py
2024-02-29 00:18:38 +11:00
Daniel Han
25619645dd Hotfix - fix DoRA, Gemma prompt template (#202)
* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md

* Update llama.py

* DoRA

* Update _utils.py

* Update chat_templates.py
2024-02-29 00:15:22 +11:00
Daniel Han
e7c53fb370 2.4x faster Gemma (#197)
* Update save.py

* Update save.py

* linking

* llama.cpp bugs

* Update save.py

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original

* Gemma

* Update pyproject.toml

* Update mapper.py

* Update fast_lora.py

* FastGemmaModel

* model_type

* Update llama.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* gemma

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Fast CE Loss

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* CE

* Update llama.py

* Update llama.py

* Update cross_entropy_loss.py

* Update geglu.py

* Update cross_entropy_loss.py

* revert

* Update llama.py

* Update llama.py

* norm

* Update gemma.py

* Update gemma.py

* position_ids

* Update gemma.py

* Update gemma.py

* pos

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* revert

* revert

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* rope

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* llama

* Update llama.py

* gemma

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update save.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update gemma.py

* correct_dtype

* Update gemma.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Chat Templates

* Update README.md

* Update README.md
2024-02-27 01:42:10 +11:00
Daniel Han
1b7bf718cc Feb 2024 Release (#187)
* Fast inference repatch

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mistral.py

* Update __init__.py

* Fix inference

* Update mistral.py

* fast lm_head

* Remove fast path

* Update rope_embedding.py

* Update loader.py

* LlamaAttention_fast_forward_inference

* if past_key_value is not None and q_len == 1:

* revert inference

* Update loader.py

* past_key_value

* Update llama.py

* Update llama.py

* Fix SDPA

* Update llama.py

* padding

* Inference

* Update llama.py

* Revert

* Update mistral.py

* faster inference

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* inference

* Update llama.py

* Update utils.py

* faster inference

* Update llama.py

* revert

* lm_head

* Update llama.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* faster inference

* Update llama.py

* fast inference

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* torch compile

* past_key_values

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* fast inference + saving config.json

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* fast inference again

* more temp matrices

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update mistral.py

* Update llama.py

* SDPA

* attention_mask

* New version

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update save.py

* Update save.py

* Torch 2.2.0

* Update save.py

* mistral swa

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Fix SWA inference

* Fix llm_int8_skip_modules

* SWA inference

* Update save.py

* Update save.py

* Update pyproject.toml

* __version__

* __version__

* Update save.py

* Update save.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Chat Templates

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer

* Update chat_templates.py

* Saving, LlamaRotaryEmbedding issues

* Update llama.py

* Update mistral.py

* Update mapper.py

* Fix RoPE precision issues

* Bugs

* saving bugs

* Update llama.py

* readme

* spaces

* spaces

* globals

* slash

* slashes

* spaces

* apache

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* trainer

* Update save.py

* Update pyproject.toml

* install

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* PeftModel token + saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* linking

* llama.cpp bugs

* Update save.py

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original
2024-02-21 03:58:59 +11:00
Daniel Han
0439b8508d Prelim Feb release (#173)
* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py

* Update mistral.py

* attention mask

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update dpo.py

* Patch saving

* Update save.py

* Update save.py

* patch_saving_functions

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* print

* Mistral patch

* Update mistral.py

* Update save.py

* saving

* Update llama.py

* Update llama.py

* Fast inference repatch

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mistral.py

* Update __init__.py

* Fix inference

* Update mistral.py

* fast lm_head

* Remove fast path

* Update rope_embedding.py

* Update loader.py

* LlamaAttention_fast_forward_inference

* if past_key_value is not None and q_len == 1:

* revert inference

* Update loader.py

* past_key_value

* Update llama.py

* Update llama.py

* Fix SDPA

* Update llama.py

* padding

* Inference

* Update llama.py

* Revert

* Update mistral.py

* faster inference

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* inference

* Update llama.py

* Update utils.py

* faster inference

* Update llama.py

* revert

* lm_head

* Update llama.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* faster inference

* Update llama.py

* fast inference

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* torch compile

* past_key_values

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* fast inference + saving config.json

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* fast inference again

* more temp matrices

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update mistral.py

* Update llama.py

* SDPA

* attention_mask

* New version

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update save.py

* Update save.py

* Torch 2.2.0

* Update save.py

* mistral swa

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Fix SWA inference

* Fix llm_int8_skip_modules

* SWA inference

* Update save.py

* Update save.py

* Update pyproject.toml

* __version__

* __version__

* Update save.py

* Update save.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Chat Templates

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer

* Update chat_templates.py

* Saving, LlamaRotaryEmbedding issues

* Update llama.py

* Update mistral.py
2024-02-15 00:07:42 +11:00
Younes Belkada
474fd32f91 add HF tagging in unsloth (#170) 2024-02-13 18:28:42 +11:00
Daniel Han
3d5cf373bc Update README.md (#165) 2024-02-09 15:59:17 +11:00
Daniel Han
2bc34566c4 Update README.md (#164) 2024-02-09 15:49:09 +11:00
Daniel Han-Chen
99b8d231ce Update mapper.py 2024-02-09 03:51:59 +11:00
Daniel Han
b7f24e804c Update README.md (#162) 2024-02-08 13:11:54 +11:00
Daniel Han
0b01dcb655 Nightly (#161)
* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec46e0.

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py

* Update mistral.py

* attention mask

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update dpo.py

* Patch saving

* Update save.py

* Update save.py

* patch_saving_functions

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* print

* Mistral patch

* Update mistral.py

* Update save.py

* saving

* Update llama.py

* Update llama.py

* Fast inference repatch

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mistral.py

* Update __init__.py

* Fix inference

* Update mistral.py

* fast lm_head

* Remove fast path

* Update rope_embedding.py

* Update loader.py

* LlamaAttention_fast_forward_inference

* if past_key_value is not None and q_len == 1:

* revert inference

* Update loader.py

* past_key_value

* Update llama.py

* Update llama.py

* Fix SDPA

* Update llama.py

* padding

* Inference

* Update llama.py

* Revert

* Update mistral.py

* faster inference

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* inference

* Update llama.py

* Update utils.py

* faster inference

* Update llama.py

* revert

* lm_head

* Update llama.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* faster inference

* Update llama.py

* fast inference

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* torch compile

* past_key_values

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* fast inference + saving config.json

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* fast inference again

* more temp matrices

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update mistral.py

* Update llama.py

* SDPA

* attention_mask

* New version

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update save.py

* Update save.py

* Torch 2.2.0

* Update save.py

* mistral swa

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Fix SWA inference

* Fix llm_int8_skip_modules

* SWA inference

* Update save.py

* Update save.py

* Update pyproject.toml

* __version__

* __version__

* Update save.py

* Update save.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py
2024-02-08 03:40:28 +11:00
Daniel Han
25cfc7f590 Torch 2.2 (#157)
* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update save.py

* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec46e0.

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py

* Update mistral.py

* attention mask

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update dpo.py

* Patch saving

* Update save.py

* Update save.py

* patch_saving_functions

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* print

* Mistral patch

* Update mistral.py

* Update save.py

* saving

* Update llama.py

* Update llama.py

* Fast inference repatch

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mistral.py

* Update __init__.py

* Fix inference

* Update mistral.py

* fast lm_head

* Remove fast path

* Update rope_embedding.py

* Update loader.py

* LlamaAttention_fast_forward_inference

* if past_key_value is not None and q_len == 1:

* revert inference

* Update loader.py

* past_key_value

* Update llama.py

* Update llama.py

* Fix SDPA

* Update llama.py

* padding

* Inference

* Update llama.py

* Revert

* Update mistral.py

* faster inference

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* inference

* Update llama.py

* Update utils.py

* faster inference

* Update llama.py

* revert

* lm_head

* Update llama.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* faster inference

* Update llama.py

* fast inference

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* torch compile

* past_key_values

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* fast inference + saving config.json

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* fast inference again

* more temp matrices

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update mistral.py

* Update llama.py

* SDPA

* attention_mask

* New version

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update save.py

* Update save.py

* Torch 2.2.0

* Update save.py

* mistral swa

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Fix SWA inference

* Fix llm_int8_skip_modules

* SWA inference

* Update save.py

* Update save.py

* Update pyproject.toml

* __version__

* __version__

* Update save.py

* Update save.py

* Update mistral.py
2024-02-07 04:40:50 +11:00
Michael Han
bb66faaa33 ReadMe Revamp (#156)
* HF Perf Button

* Update README.md

Adding new buttons cleanup

* Update README.md

* Delete images/Discord.png

* Delete images/try live demo green.png

* new transparent logos

* Revamping page

* Revamp mainpage

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* finetune button

* Delete start free finetune button.png

* free finetune button

* Add files via upload

* Update README.md

* Update README.md

* Add files via upload

* Add files via upload

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Squashed commit of the following:

commit efa0d2332e
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Feb 4 17:35:56 2024 +1100

    2x faster inference (#151)

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e0.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

    * Fast inference repatch

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update mistral.py

    * Update __init__.py

    * Fix inference

    * Update mistral.py

    * fast lm_head

    * Remove fast path

    * Update rope_embedding.py

    * Update loader.py

    * LlamaAttention_fast_forward_inference

    * if past_key_value is not None and q_len == 1:

    * revert inference

    * Update loader.py

    * past_key_value

    * Update llama.py

    * Update llama.py

    * Fix SDPA

    * Update llama.py

    * padding

    * Inference

    * Update llama.py

    * Revert

    * Update mistral.py

    * faster inference

    * inference

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * inference

    * Update llama.py

    * Update utils.py

    * faster inference

    * Update llama.py

    * revert

    * lm_head

    * Update llama.py

    * inference

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * faster inference

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * torch compile

    * past_key_values

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update llama.py

    * fast inference + saving config.json

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * fast inference again

    * more temp matrices

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update mistral.py

    * Update llama.py

    * SDPA

    * attention_mask

    * New version

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update utils.py

    * Update utils.py

commit 2f55935f94
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 31 04:03:37 2024 +1100

    Hotfix - fix inference (#146)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e0.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

    * Fast inference repatch

    * Update llama.py

    * Update utils.py

    * Update utils.py

    * Update utils.py

    * Update mistral.py

    * Update __init__.py

    * Fix inference

    * Update mistral.py

    * fast lm_head

    * Remove fast path

    * Update rope_embedding.py

    * Update loader.py

    * LlamaAttention_fast_forward_inference

    * if past_key_value is not None and q_len == 1:

    * revert inference

    * Update loader.py

    * past_key_value

commit a3a2ad9382
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 17:49:54 2024 +1100

    Fix inference attention mask (#142)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e0.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

    * Update llama.py

    * Update llama.py

commit 90309ca8dc
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 03:45:07 2024 +1100

    Nightly (#140)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e0.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

commit a16bc73e80
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 02:52:39 2024 +1100

    Fix saving issues (#139)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e0.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

commit af33224554
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:30:29 2024 +1100

    1 more bug (#138)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e0.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

commit e2bbd3819e
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:20:06 2024 +1100

    Fix bugs + more accurate Swiglu (#137)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e0.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

commit a81aff286f
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:50:22 2024 +1100

    Inference bug fix (#134)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e0.

    * Update llama.py

commit 7da0c50f75
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:47:54 2024 +1100

    More bug fixes (#133)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit 62fae3aa74
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 26 04:19:17 2024 +1100

    Fix bugs (#129)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

commit 04f8771821
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Tue Jan 23 03:55:24 2024 +1100

    2-4x faster native HF inference (#119)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

commit 3a9b2dee98
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 22:20:22 2024 +1100

    Hotfix (#118)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit a6f4fb0075
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 05:00:37 2024 +1100

    Update save.py

commit 705cac0357
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:21:54 2024 +1100

    Update save.py

commit 16edcb3be2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:13:03 2024 +1100

    Update save.py

commit 3d05a74b12
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 03:43:49 2024 +1100

    Fixed saving! (#113)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

    * upload_to_huggingface

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

commit bb05d6b6e2
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 23:23:00 2024 +1100

    Hotfix for Jan 2024 Release (#110)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

commit 12e75c93d0
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 04:25:06 2024 +1100

    Quick fixes (#106)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

commit 52b5ef31e0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 20 02:30:31 2024 +1100

    Update _utils.py

commit 1a19c38675
Merge: 0a52390 0d6e52b
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:38 2024 +1100

    Merge branch 'main' of https://github.com/unslothai/unsloth

commit 0a52390ac2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:20 2024 +1100

    Revert quantization methods

commit 0d6e52b5c7
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:57:22 2024 +1100

    getattr issues (#103)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

commit b3fcea6421
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:52:30 2024 +1100

    Quick fixes (#101)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

commit d691516ab9
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 04:51:19 2024 +1100

    2024 Release (#96)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

commit 9e2dec16fb
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:41:00 2024 +1100

    Update pyproject.toml

commit 396c7245dd
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:35:17 2024 +1100

    Update pyproject.toml

commit 738e91591f
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Thu Jan 11 04:08:03 2024 +1100

    Fix some bugs (#83)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

commit a1da50b5ce
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 23:10:48 2024 +1100

    Update README.md (#81)

commit 606e8a9284
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:10:23 2024 +1100

    Discord button redo (#80)

commit 0169294ffb
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:02:20 2024 +1100

    Update logos (#79)

    * HF Perf Button

    * Update README.md

    Adding new buttons cleanup

    * Update README.md

    * Delete images/Discord.png

    * Delete images/try live demo green.png

    * new transparent logos

    * Revamping page

    * Revamp mainpage

    * Update README.md

    * Update README.md

commit b2a8c33430
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 20:03:01 2024 +1100

    Create FUNDING.yml (#78)

commit c9c1abf290
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 10 01:02:44 2024 +1100

    fix_tokenizer

commit 6efffb46e4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 9 23:40:43 2024 +1100

    check_tokenizer

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-02-07 02:00:12 +11:00
Daniel Han
efa0d2332e 2x faster inference (#151)
* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update save.py

* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec46e0.

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py

* Update mistral.py

* attention mask

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update dpo.py

* Patch saving

* Update save.py

* Update save.py

* patch_saving_functions

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* print

* Mistral patch

* Update mistral.py

* Update save.py

* saving

* Update llama.py

* Update llama.py

* Fast inference repatch

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mistral.py

* Update __init__.py

* Fix inference

* Update mistral.py

* fast lm_head

* Remove fast path

* Update rope_embedding.py

* Update loader.py

* LlamaAttention_fast_forward_inference

* if past_key_value is not None and q_len == 1:

* revert inference

* Update loader.py

* past_key_value

* Update llama.py

* Update llama.py

* Fix SDPA

* Update llama.py

* padding

* Inference

* Update llama.py

* Revert

* Update mistral.py

* faster inference

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* inference

* Update llama.py

* Update utils.py

* faster inference

* Update llama.py

* revert

* lm_head

* Update llama.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* faster inference

* Update llama.py

* fast inference

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* torch compile

* past_key_values

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* fast inference + saving config.json

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* fast inference again

* more temp matrices

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update mistral.py

* Update llama.py

* SDPA

* attention_mask

* New version

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py
2024-02-04 17:35:56 +11:00
Daniel Han
2f55935f94 Hotfix - fix inference (#146)
* faster saving & inference

* Update llama.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update llama.py

* Update save.py

* Update llama.py

* Mistral correct RoPE scaling

* Max sequence lengths

* Apache 2

* fast_linear_forward

* Update utils.py

* Update utils.py

* No print

* Update utils.py

* Update utils.py

* inference

* Update llama.py

* Fast inference RoPE

* Update llama.py

* Update llama.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* LoRA

* Fast LoRA saving

* Update llama.py

* hidden_states

* q_len == 1

* q_len issue

* Update mistral.py

* Update mistral.py

* incorrect inference

* Update to transformers 4.37

* Graceful FA2 error + torch 2.1.1

* Update mapper.py

* Update pyproject.toml

* Fix saving and bnb-4bit

* Update fast_lora.py

* Update fast_lora.py

* remove patching

* Update llama.py

* Update llama.py

* Update swiglu.py

* Repatch

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update save.py

* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec46e0.

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py

* Update mistral.py

* attention mask

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update dpo.py

* Patch saving

* Update save.py

* Update save.py

* patch_saving_functions

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* print

* Mistral patch

* Update mistral.py

* Update save.py

* saving

* Update llama.py

* Update llama.py

* Fast inference repatch

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mistral.py

* Update __init__.py

* Fix inference

* Update mistral.py

* fast lm_head

* Remove fast path

* Update rope_embedding.py

* Update loader.py

* LlamaAttention_fast_forward_inference

* if past_key_value is not None and q_len == 1:

* revert inference

* Update loader.py

* past_key_value
2024-01-31 04:03:37 +11:00
Daniel Han
a3a2ad9382 Fix inference attention mask (#142)
* faster saving & inference

* Update llama.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update llama.py

* Update save.py

* Update llama.py

* Mistral correct RoPE scaling

* Max sequence lengths

* Apache 2

* fast_linear_forward

* Update utils.py

* Update utils.py

* No print

* Update utils.py

* Update utils.py

* inference

* Update llama.py

* Fast inference RoPE

* Update llama.py

* Update llama.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* LoRA

* Fast LoRA saving

* Update llama.py

* hidden_states

* q_len == 1

* q_len issue

* Update mistral.py

* Update mistral.py

* incorrect inference

* Update to transformers 4.37

* Graceful FA2 error + torch 2.1.1

* Update mapper.py

* Update pyproject.toml

* Fix saving and bnb-4bit

* Update fast_lora.py

* Update fast_lora.py

* remove patching

* Update llama.py

* Update llama.py

* Update swiglu.py

* Repatch

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update save.py

* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec46e0.

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py

* Update mistral.py

* attention mask

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update dpo.py

* Patch saving

* Update save.py

* Update save.py

* patch_saving_functions

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* print

* Mistral patch

* Update mistral.py

* Update save.py

* saving

* Update llama.py

* Update llama.py
2024-01-29 17:49:54 +11:00
Daniel Han
90309ca8dc Nightly (#140)
* faster saving & inference

* Update llama.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update llama.py

* Update save.py

* Update llama.py

* Mistral correct RoPE scaling

* Max sequence lengths

* Apache 2

* fast_linear_forward

* Update utils.py

* Update utils.py

* No print

* Update utils.py

* Update utils.py

* inference

* Update llama.py

* Fast inference RoPE

* Update llama.py

* Update llama.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* LoRA

* Fast LoRA saving

* Update llama.py

* hidden_states

* q_len == 1

* q_len issue

* Update mistral.py

* Update mistral.py

* incorrect inference

* Update to transformers 4.37

* Graceful FA2 error + torch 2.1.1

* Update mapper.py

* Update pyproject.toml

* Fix saving and bnb-4bit

* Update fast_lora.py

* Update fast_lora.py

* remove patching

* Update llama.py

* Update llama.py

* Update swiglu.py

* Repatch

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update save.py

* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec46e0.

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py

* Update mistral.py

* attention mask

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update dpo.py

* Patch saving

* Update save.py

* Update save.py

* patch_saving_functions

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* print

* Mistral patch

* Update mistral.py

* Update save.py

* saving
2024-01-29 03:45:07 +11:00
Daniel Han
a16bc73e80 Fix saving issues (#139)
* faster saving & inference

* Update llama.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update llama.py

* Update save.py

* Update llama.py

* Mistral correct RoPE scaling

* Max sequence lengths

* Apache 2

* fast_linear_forward

* Update utils.py

* Update utils.py

* No print

* Update utils.py

* Update utils.py

* inference

* Update llama.py

* Fast inference RoPE

* Update llama.py

* Update llama.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* LoRA

* Fast LoRA saving

* Update llama.py

* hidden_states

* q_len == 1

* q_len issue

* Update mistral.py

* Update mistral.py

* incorrect inference

* Update to transformers 4.37

* Graceful FA2 error + torch 2.1.1

* Update mapper.py

* Update pyproject.toml

* Fix saving and bnb-4bit

* Update fast_lora.py

* Update fast_lora.py

* remove patching

* Update llama.py

* Update llama.py

* Update swiglu.py

* Repatch

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update save.py

* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec46e0.

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py

* Update mistral.py

* attention mask

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update dpo.py

* Patch saving

* Update save.py

* Update save.py

* patch_saving_functions

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* print
2024-01-29 02:52:39 +11:00
Daniel Han
af33224554 1 more bug (#138)
* faster saving & inference

* Update llama.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update llama.py

* Update save.py

* Update llama.py

* Mistral correct RoPE scaling

* Max sequence lengths

* Apache 2

* fast_linear_forward

* Update utils.py

* Update utils.py

* No print

* Update utils.py

* Update utils.py

* inference

* Update llama.py

* Fast inference RoPE

* Update llama.py

* Update llama.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* LoRA

* Fast LoRA saving

* Update llama.py

* hidden_states

* q_len == 1

* q_len issue

* Update mistral.py

* Update mistral.py

* incorrect inference

* Update to transformers 4.37

* Graceful FA2 error + torch 2.1.1

* Update mapper.py

* Update pyproject.toml

* Fix saving and bnb-4bit

* Update fast_lora.py

* Update fast_lora.py

* remove patching

* Update llama.py

* Update llama.py

* Update swiglu.py

* Repatch

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update save.py

* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec46e0.

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py
2024-01-28 04:30:29 +11:00
Daniel Han
e2bbd3819e Fix bugs + more accurate Swiglu (#137)
* faster saving & inference

* Update llama.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update llama.py

* Update save.py

* Update llama.py

* Mistral correct RoPE scaling

* Max sequence lengths

* Apache 2

* fast_linear_forward

* Update utils.py

* Update utils.py

* No print

* Update utils.py

* Update utils.py

* inference

* Update llama.py

* Fast inference RoPE

* Update llama.py

* Update llama.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* LoRA

* Fast LoRA saving

* Update llama.py

* hidden_states

* q_len == 1

* q_len issue

* Update mistral.py

* Update mistral.py

* incorrect inference

* Update to transformers 4.37

* Graceful FA2 error + torch 2.1.1

* Update mapper.py

* Update pyproject.toml

* Fix saving and bnb-4bit

* Update fast_lora.py

* Update fast_lora.py

* remove patching

* Update llama.py

* Update llama.py

* Update swiglu.py

* Repatch

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update save.py

* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec46e0.

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask
2024-01-28 04:20:06 +11:00
Daniel Han
a81aff286f Inference bug fix (#134)
* faster saving & inference

* Update llama.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update llama.py

* Update save.py

* Update llama.py

* Mistral correct RoPE scaling

* Max sequence lengths

* Apache 2

* fast_linear_forward

* Update utils.py

* Update utils.py

* No print

* Update utils.py

* Update utils.py

* inference

* Update llama.py

* Fast inference RoPE

* Update llama.py

* Update llama.py

* RoPE

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* LoRA

* Fast LoRA saving

* Update llama.py

* hidden_states

* q_len == 1

* q_len issue

* Update mistral.py

* Update mistral.py

* incorrect inference

* Update to transformers 4.37

* Graceful FA2 error + torch 2.1.1

* Update mapper.py

* Update pyproject.toml

* Fix saving and bnb-4bit

* Update fast_lora.py

* Update fast_lora.py

* remove patching

* Update llama.py

* Update llama.py

* Update swiglu.py

* Repatch

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update llama.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update save.py

* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec46e0.

* Update llama.py
2024-01-27 04:50:22 +11:00
994 changed files with 236633 additions and 3096 deletions

2
.gitattributes vendored Normal file

@ -0,0 +1,2 @@
# Normalize Python files to LF line endings
*.py text eol=lf

55
.github/CODEOWNERS vendored Normal file

@ -0,0 +1,55 @@
# Inspired from https://github.com/vllm-project/vllm/blob/main/.github/CODEOWNERS
/unsloth/models/loader.py @danielhanchen @mmathew23
/unsloth/models/llama.py @Datta0 @danielhanchen @mmathew23
/unsloth/models/rl.py @Datta0 @pluesclues @danielhanchen
/unsloth/models/rl_replacements.py @Datta0 @pluesclues @danielhanchen
/unsloth/trainer.py @danielhanchen
/unsloth/models/sentence_transformer.py @Etherll @danielhanchen
/unsloth/save.py @rolandtannous @danielhanchen
/unsloth/tokenizer_utils.py @mmathew23 @danielhanchen
/unsloth/chat_templates.py @rolandtannous @danielhanchen
/unsloth/ollama_template_mappers.py @rolandtannous @danielhanchen
/unsloth/kernels/moe/*.py @Datta0
/unsloth/import_fixes.py @danielhanchen
/unsloth/device_type.py @danielhanchen
/unsloth/_auto_install.py @danielhanchen
/unsloth/dataprep/*.py @danielhanchen
/unsloth/kernels/cross_entropy_loss.py @danielhanchen
/unsloth/kernels/fast_lora.py @danielhanchen
/unsloth/kernels/flex_attention.py @danielhanchen
/unsloth/kernels/fp8.py @Datta0
/unsloth/kernels/geglu.py @danielhanchen
/unsloth/kernels/layernorm.py @danielhanchen
/unsloth/kernels/rms_layernorm.py @danielhanchen
/unsloth/kernels/rope_embedding.py @danielhanchen
/unsloth/kernels/swiglu.py @danielhanchen
/unsloth/kernels/utils.py @danielhanchen @Datta0
/unsloth/models/_utils.py @danielhanchen @mmathew23
/unsloth/models/cohere.py @danielhanchen
/unsloth/models/dpo.py @danielhanchen
/unsloth/models/falcon_h1.py @danielhanchen
/unsloth/models/gemma.py @danielhanchen
/unsloth/models/gemma2.py @danielhanchen
/unsloth/models/glm4_moe.py @Datta0
/unsloth/models/granite.py @danielhanchen
/unsloth/models/llama4.py @danielhanchen
/unsloth/models/loader_utils.py @Datta0 @danielhanchen
/unsloth/models/mapper.py @danielhanchen
/unsloth/models/mistral.py @danielhanchen
/unsloth/models/qwen2.py @danielhanchen
/unsloth/models/qwen3.py @Datta0
/unsloth/models/qwen3_moe.py @Datta0
/unsloth/models/vision.py @mmathew23 @danielhanchen
/unsloth/utils/attention_dispatch.py @mmathew23
/unsloth/utils/hf_hub.py @mmathew23
/unsloth/utils/packing.py @mmathew23
/cli/ @rolandtannous @Manan17
/studio/frontend/ @Shine1i @rolandtannous @Manan17
/studio/frontend/public/ @Shine1i
/studio/backend/ @rolandtannous
/studio/backend/core/data_recipe/ @rolandtannous
/studio/backend/tests/ @rolandtannous @danielhanchen
/tests/ @rolandtannous @danielhanchen
/scripts/ @rolandtannous @danielhanchen

4
.github/FUNDING.yml vendored

@ -1,9 +1,9 @@
# These are supported funding model platforms
-github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
+github: unslothai
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
-ko_fi: unsloth
+ko_fi: # unsloth
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username

22
.github/ISSUE_TEMPLATE/bug---issue.md vendored Normal file

@ -0,0 +1,22 @@
---
name: Bug / Issue
about: Bug / Issue
title: "[Bug] Please fill in your issue title here."
labels: bug
assignees: ''
---
Note: Please do not remove the questions. Answer beside them.
1. Did you update? `pip install --upgrade unsloth unsloth_zoo`
2. `Colab` or `Kaggle` or local / cloud
3. Number of GPUs used (check with `nvidia-smi`)
4. Which notebook? Please link!
5. Which Unsloth version, TRL version, transformers version, PyTorch version?
6. Which trainer? `SFTTrainer`, `GRPOTrainer` etc
```python
Put Minimal code to reproduce error here ###Remove Hugging Face token###
###Please make sure to check formatting properly, edit if needed.###
```
🦥 You can also ask via our Reddit page: https://reddit.com/r/unsloth/


@ -0,0 +1,21 @@
---
name: Feature Request
about: New features, model support, ideas
title: "[Feature]"
labels: feature request
assignees: ''
---
For new models, have you tried:
```python
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    "microsoft/Phi-4-multimodal-instruct",
    trust_remote_code = True,
)
from transformers import AutoModelForSequenceClassification
model, tokenizer = FastModel.from_pretrained(
    auto_model = AutoModelForSequenceClassification,
)
```

27
.github/dependabot.yml vendored Normal file

@ -0,0 +1,27 @@
---
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    groups:
      actions:
        patterns: ["*"]
  - package-ecosystem: "bun"
    directory: "/studio/frontend"
    schedule:
      interval: "weekly"
    groups:
      bun-frontend:
        patterns: ["*"]
  - package-ecosystem: "npm"
    directory: "/studio/backend/core/data_recipe/oxc-validator"
    schedule:
      interval: "weekly"
    groups:
      npm-oxc-validator:
        patterns: ["*"]
...

37
.github/workflows/stale.yml vendored Normal file

@ -0,0 +1,37 @@
name: 'Inactive Issue Pinger'
on:
  schedule:
    - cron: '30 5 * * *' # Runs at 5:30 UTC every day
jobs:
  stale:
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - uses: actions/stale@v10
        with:
          # The message to post on stale issues.
          # This message will ping the issue author.
          # Note: The stale bot action does not currently support a direct placeholder for the last commenter.
          # As a workaround, this message encourages any participant to reply.
          stale-issue-message: >
            Is this issue still important to you?
            Apologies in advance we might have missed this issue as well.
            For faster response times, please post on our Reddit server - https://www.reddit.com/r/unsloth or our Discord - https://discord.com/invite/unsloth
          # The number of days of inactivity before an issue is considered stale.
          days-before-issue-stale: 9999
          # Set to -1 to never close stale issues.
          days-before-issue-close: -1
          # A label to apply to stale issues.
          stale-issue-label: 'inactive'
          # The number of operations to perform per run to avoid rate limiting.
          operations-per-run: 500
          enable-statistics: false
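
The schedule `30 5 * * *` in the workflow above fires once a day at 05:30 UTC. As a rough illustration of that timing only, the sketch below computes the next matching instant for a given moment. The function name `next_daily_run` is hypothetical and not part of any GitHub tooling, and the sketch does not model any delay in when a scheduled workflow actually starts.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour=5, minute=30):
    """Return the next UTC datetime strictly after `now` that matches '30 5 * * *'."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's 05:30 UTC slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


# Example: a check at 06:00 UTC rolls over to the following day.
print(next_daily_run(datetime(2025, 1, 1, 6, 0, tzinfo=timezone.utc)))
```

With `days-before-issue-stale: 9999` (roughly 27 years of inactivity), the daily run would in practice label almost nothing as inactive.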

62
.gitignore vendored

@ -1,7 +1,17 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
*.class
unsloth_compiled_cache/
# ML artifacts (large files)
feature/
outputs/
exports/
/datasets/
studio/backend/assets/datasets/
unsloth_training_checkpoints/
*.gguf
*.safetensors
# C extensions
*.so
@ -94,6 +104,12 @@ ipython_config.py
# install all needed dependencies.
#Pipfile.lock
# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
#uv.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
@ -106,8 +122,10 @@ ipython_config.py
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
@ -127,6 +145,9 @@ venv/
ENV/
env.bak/
venv.bak/
.venv_overlay/
.venv_t5/
environment.yaml
# Spyder project settings
.spyderproject
@ -158,3 +179,40 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
# Ruff stuff:
.ruff_cache/
.pre-commit-cache/
# PyPI configuration file and IDE/Editors
.pypirc
.vscode
.idea/
.claude/
*.swp
*.swo
# oh-my-codex
.omx/
# Firebase
firebase-debug.log
# Other
resources/
tmp/
**/node_modules/
auth.db
# Local working docs
**/CLAUDE.md
**/claude.md
**/AGENT.md
**/agent.md
docs/canvas-lab-architecture.md
log_rtx.txt
log.txt
setup_leo.sh
server.pid
*.log
package-lock.json
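
Several of the ignore rules above are plain filename globs (`*.py[cod]`, `*.gguf`, `*.safetensors`, `*.log`). The sketch below approximates how such globs match a file's basename, using Python's `fnmatch` module. It is an approximation only: real gitignore matching also covers directory rules, `**`, anchoring with a leading `/`, and negation, none of which are modelled here. The helper name `is_ignored_basename` is hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the basename globs from the ignore list above.
PATTERNS = ["*.py[cod]", "*.gguf", "*.safetensors", "*.log", "*.swp"]


def is_ignored_basename(filename, patterns=PATTERNS):
    """Return True if the bare filename matches any of the glob patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


for name in ["module.pyc", "module.py", "model.gguf", "train.log"]:
    print(name, is_ignored_basename(name))
```

To check a real path against the full rule set, `git check-ignore -v <path>` reports which rule, if any, applies.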

6
.pre-commit-ci.yaml Normal file

@ -0,0 +1,6 @@
ci:
  autofix_prs: true
  autofix_prs_limit: 5
  autoupdate_schedule: monthly
  autoupdate_commit_msg: "chore: pre-commit autoupdate"
  skip: []

18
.pre-commit-config.yaml Normal file

@ -0,0 +1,18 @@
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.10
    hooks:
      - id: ruff
        args:
          - --fix
          - --exit-non-zero-on-fix
        exclude: '\.ipynb$'
  - repo: local
    hooks:
      - id: ruff-format-with-kwargs
        name: Ruff format with kwarg spacing
        entry: scripts/run_ruff_format.py
        language: python
        types: [python]
        additional_dependencies:
          - ruff==0.6.9

132
CODE_OF_CONDUCT.md Normal file

@ -0,0 +1,132 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of
any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at support@unsloth.ai.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of
actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the
community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

29
CONTRIBUTING.md Normal file

@ -0,0 +1,29 @@
# 🦥 Contributing to Unsloth
Thank you for not only using Unsloth but also for being interested in helping out! We value all contributions, whether they come in the form of code, ideas, support for others or just by simply spreading the word of Unsloth! 💕
- **[Support the Community](https://github.com/unslothai/unsloth/issues)**: Answer questions, review pull requests, or assist others in discussions.
- **Fix Bugs**: Identify and resolve issues with the existing codebase.
- **Submit Ideas**: Request new features or share enhancements you'd like to see.
- **Develop Features**: Implement new functionality or improve existing tools via pull requests.
- **[Improve Documentation](https://docs.unsloth.ai/)**: Help by creating guides, FAQs, or enhancing clarity.
One of the best ways to support us is by spreading the word about Unsloth! Share how it's powering your amazing projects in blog posts or on social media, and inspire others to explore its potential. Even a simple star on our repo goes a long way in showing your support and helping the community grow. 🌟
## Submitting Issues
If you find a bug or have a feature idea, we'd love to hear from you! Here's how to make your submission stand out:
### Reporting Bugs
1. **Search First**: Check if the issue has already been reported using GitHub's search bar under Issues.
2. **Details Matter**: Is this on Google Colab, Kaggle, or another platform? Are you using Unsloth's official notebook? Include your OS, Python version, and other relevant details. For bugs, a concise code snippet that reproduces the issue is incredibly helpful.
3. **Be Thorough**: Attach screenshots, traceback logs, or any additional information that might speed up resolution.
## Spread the Word
Your support extends beyond code:
- Spread the word by writing about Unsloth in blogs or social media.
- Share how Unsloth powers your projects.
- Star our repository to show your appreciation.
Finally, please be mindful of our [Code of Conduct](https://github.com/unslothai/unsloth/blob/main/CODE_OF_CONDUCT.md) to ensure a welcoming and inclusive environment for everyone.
Thank you so much for reading and we hope you have lots of fun using Unsloth! 🦥

664
COPYING Normal file

@ -0,0 +1,664 @@
GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU Affero General Public License is a free, copyleft license for
software and other kinds of works, specifically designed to ensure
cooperation with the community in the case of network server software.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
our General Public Licenses are intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
Developers that use our General Public Licenses protect your rights
with two steps: (1) assert copyright on the software, and (2) offer
you this License which gives you legal permission to copy, distribute
and/or modify the software.
A secondary benefit of defending all users' freedom is that
improvements made in alternate versions of the program, if they
receive widespread use, become available for other developers to
incorporate. Many developers of free software are heartened and
encouraged by the resulting cooperation. However, in the case of
software used on network servers, this result may fail to come about.
The GNU General Public License permits making a modified version and
letting the public access it on a server without ever releasing its
source code to the public.
The GNU Affero General Public License is designed specifically to
ensure that, in such cases, the modified source code becomes available
to the community. It requires the operator of a network server to
provide the source code of the modified version running there to the
users of that server. Therefore, public use of a modified version, on
a publicly accessible server, gives the public access to the source
code of the modified version.
An older license, called the Affero General Public License and
published by Affero, was designed to accomplish similar goals. This is
a different license, not a version of the Affero GPL, but Affero has
released a new version of the Affero GPL which permits relicensing under
this license.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU Affero General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Remote Network Interaction; Use with the GNU General Public License.
Notwithstanding any other provision of this License, if you modify the
Program, your modified version must prominently offer all users
interacting with it remotely through a computer network (if your version
supports such interaction) an opportunity to receive the Corresponding
Source of your version by providing access to the Corresponding Source
from a network server at no charge, through some standard or customary
means of facilitating copying of software. This Corresponding Source
shall include the Corresponding Source for any work covered by version 3
of the GNU General Public License that is incorporated pursuant to the
following paragraph.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the work with which it is combined will remain governed by version
3 of the GNU General Public License.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU Affero General Public License from time to time. Such new versions
will be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU Affero General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU Affero General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU Affero General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If your software can interact with users remotely through a computer
network, you should also make sure that it provides a way for users to
get its source. For example, if your program is a web application, its
interface could display a "Source" link that leads users to an archive
of the code. There are many ways you could offer source, and different
solutions will be better for different programs; see section 13 for the
specific requirements.
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU AGPL, see
<https://www.gnu.org/licenses/>.
Files under unsloth/*, tests/*, scripts/* are Apache 2.0 licensed.
Files under studio/*, unsloth_cli/* which is optional to install are AGPLv3 licensed.


@ -186,7 +186,9 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Copyright [2024-] [Unsloth AI. Inc team, Daniel Han-Chen & Michael Han-Chen]
Files under unsloth/*, tests/*, scripts/* are Apache 2.0 licensed.
Files under studio/*, unsloth_cli/* which is optional to install are AGPLv3 licensed.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

625
README.md

@ -1,441 +1,266 @@
<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/shimmyshimmer/unsloth/main/images/unsloth%20logo%20white%20text.png">
<source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/shimmyshimmer/unsloth/main/images/unsloth%20logo%20black%20text.png">
<img alt="unsloth logo" src="./images/unsloth%20logo%20black%20text.png" height="120" style="max-width: 100%;">
</picture>
</p>
<p align="center">
<a href="https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing"><img src="./images/Free version button.png" height="50"></a>
<a href="https://discord.gg/u54VK8m8tk"><img src="./images/Discord button.png" height="50"></a>
<a href="https://ko-fi.com/unsloth"><img src="./images/Kofi button.png" height="50"></a>
</p>
<h1 align="center" style="margin:0;">
<a href="https://unsloth.ai/docs"><picture>
<source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20logo%20white%20text.png">
<source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20logo%20black%20text.png">
<img alt="Unsloth logo" src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20logo%20black%20text.png" height="80" style="max-width:100%;">
</picture></a>
</h1>
<h3 align="center" style="margin: 0; margin-top: 0;">
Unsloth Studio lets you run and train models locally.
</h3>
<h2 align="center">
Finetune Mistral, Llama 2-5x faster with 50% less memory!
</h2>
<p align="center">
<a href="#-features">Features</a>
<a href="#-install">Quickstart</a>
<a href="#-free-notebooks">Notebooks</a>
<a href="https://unsloth.ai/docs">Documentation</a>
</p>
<br>
<a href="https://unsloth.ai/docs/new/studio">
<img alt="unsloth studio ui homepage" src="https://github.com/user-attachments/assets/53ae17a9-d975-44ef-9686-efb4ebd0454d" style="max-width: 100%; margin-bottom: 0;"></a>
| Llama 2 7b | Mistral 7b | CodeLlama 34b | Llama 7b Kaggle 2x T4 |
|-----------------------------|-----------------------------|-------------------------|------------------------|
| **2.2x faster 43% less VRAM** | **2.2x faster 62% less VRAM** | **1.9x faster 27% less VRAM** | **5.5x faster 44% less VRAM** |
| [⭐Llama **free** Colab notebook](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing) | [⭐Mistral **free** Colab notebook](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing) | [CodeLlama A100 Colab notebook](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing) | [⭐Kaggle **free** Alpaca notebook](https://www.kaggle.com/danielhanchen/unsloth-alpaca-t4-ddp) |
| [Llama A100 Colab notebook](https://colab.research.google.com/drive/1YIPY_18xm-K0iJDgvNkRoJsgkPMPAO3G?usp=sharing) | [Mistral A100 Colab notebook](https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing) | 50+ more examples below! | [⭐Kaggle **free** Slim Orca notebook](https://www.kaggle.com/danielhanchen/unsloth-slimorca-t4-ddp) |
## ⚡ Get started
* **NEW!** [DPO](https://arxiv.org/abs/2305.18290) support. ⭐**Free!** DPO Zephyr, Mistral example! <a href="https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing"><img src="./images/Colab.png" height="20"></a> [More info](#DPO) on DPO
* **NEW!** [TinyLlama 1.1b](https://github.com/jzhang38/TinyLlama) on 3T tokens! ⭐**Free!** example <a href="https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing"><img src="./images/Colab.png" height="20"></a>
* **NEW!** We're in 🤗 Huggingface's official docs! We're on the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and the [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)!
* Supports Llama, Yi, Mistral, CodeLlama, Qwen (llamafied), Deepseek and their derived models (Open Hermes etc).
* All kernels written in [OpenAI's Triton](https://openai.com/research/triton) language. **Manual backprop engine**.
* **0% loss in accuracy** - no approximation methods - all exact.
* No change of hardware. Supports NVIDIA GPUs since 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc) [Check your GPU!](https://developer.nvidia.com/cuda-gpus) GTX 1070, 1080 works, but is slow.
* Works on **Linux** and **Windows** via WSL.
* **NEW!** Download 4 bit models 4x faster from 🤗 Huggingface! Eg: `unsloth/mistral-7b-bnb-4bit`
* Supports 4bit and 16bit QLoRA / LoRA finetuning via [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
* **NEW!** Want a UI for finetuning? Try [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory) and use `--use_unsloth`!
* Open source trains 5x faster - see [Unsloth Pro](https://unsloth.ai/) for **30x faster training**!
| 1 A100 40GB | 🤗 Hugging Face | Flash Attention | 🦥 Unsloth Open Source | [🦥 Unsloth Pro](https://unsloth.ai/pricing) |
|--------------|--------------|-----------------|---------------------|-----------------|
| Alpaca | 1x | 1.04x | 1.98x | **15.64x** |
| LAION Chip2 | 1x | 0.92x | 1.61x | **20.73x** |
| OASST | 1x | 1.19x | 2.17x | **14.83x** |
| Slim Orca | 1x | 1.18x | 2.22x | **14.82x** |
Join our [Discord](https://discord.gg/nsS4V5Z6ge)!
<img src="./images/unsloth made with love.png" width="200" />
If you trained a model with 🦥 Unsloth, we made a cool sticker if you want to use it!
# Installation Instructions - Conda
Select either `pytorch-cuda=11.8` for CUDA 11.8 or `pytorch-cuda=12.1` for CUDA 12.1.
#### macOS, Linux, WSL:
```bash
conda install cudatoolkit xformers bitsandbytes pytorch pytorch-cuda=12.1 \
-c pytorch -c nvidia -c xformers -c conda-forge -y
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"
curl -fsSL https://unsloth.ai/install.sh | sh
```
# Installation Instructions - Pip
Do **NOT** use this if you have Anaconda. You must use the Conda install method, or else stuff will BREAK.
1. Find your CUDA version via
```python
import torch; torch.version.cuda
#### Windows:
```powershell
irm https://unsloth.ai/install.ps1 | iex
```
2. For Pytorch 2.1.0: You can update Pytorch via Pip (interchange `cu121` / `cu118`). Go to https://pytorch.org/ to learn more. Select either `cu118` for CUDA 11.8 or `cu121` for CUDA 12.1. If you have an RTX 3060 or higher (A100, H100 etc), use the `"ampere"` path. For Pytorch 2.1.1: go to step 3.
#### Community:
- [Discord](https://discord.gg/unsloth)
- [𝕏 (Twitter)](https://x.com/UnslothAI)
- [Reddit](https://reddit.com/r/unsloth)
## ⭐ Features
Unsloth Studio (Beta) lets you run and train text, [audio](https://unsloth.ai/docs/basics/text-to-speech-tts-fine-tuning), [embedding](https://unsloth.ai/docs/new/embedding-finetuning), [vision](https://unsloth.ai/docs/basics/vision-fine-tuning) models on Windows, Linux and macOS.
### Inference
* **Search + download + run models** including GGUF, LoRA adapters, safetensors
* **Export models**: [Save or export](https://unsloth.ai/docs/new/studio/export) models to GGUF, 16-bit safetensors and other formats.
* **Tool calling**: Support for [self-healing tool calling](https://unsloth.ai/docs/new/studio/chat#auto-healing-tool-calling) and web search
* **[Code execution](https://unsloth.ai/docs/new/studio/chat#code-execution)**: lets LLMs test code in Claude artifacts and sandbox environments
* [Auto-tune inference parameters](https://unsloth.ai/docs/new/studio/chat#auto-parameter-tuning) and customize chat templates.
* We work directly with teams behind [gpt-oss](https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune#unsloth-fixes-for-gpt-oss), [Qwen3](https://www.reddit.com/r/LocalLLaMA/comments/1kaodxu/qwen3_unsloth_dynamic_ggufs_128k_context_bug_fixes/), [Llama 4](https://github.com/ggml-org/llama.cpp/pull/12889), [Mistral](models/tutorials/devstral-how-to-run-and-fine-tune.md), [Gemma 1-3](https://news.ycombinator.com/item?id=39671146), and [Phi-4](https://unsloth.ai/blog/phi4), where we've fixed bugs that improve model accuracy.
* Upload images, audio, PDFs, code, DOCX and more file types to chat with.
### Training
* Train and RL **500+ models** up to **2x faster** with up to **70% less VRAM**, with no accuracy loss.
* Custom Triton and mathematical **kernels**. See some collabs we did with [PyTorch](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning) and [Hugging Face](https://unsloth.ai/docs/new/faster-moe).
* **Data Recipes**: [Auto-create datasets](https://unsloth.ai/docs/new/studio/data-recipe) from **PDF, CSV, DOCX** etc. Edit data in a visual-node workflow.
* **[Reinforcement Learning](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide)** (RL): The most efficient [RL](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide) library, using **80% less VRAM** for GRPO, [FP8](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning) etc.
* Supports full fine-tuning, RL, pretraining, 4-bit, 16-bit, and FP8 training.
* **Observability**: Monitor training live, track loss and GPU usage and customize graphs.
* [Multi-GPU](https://unsloth.ai/docs/basics/multi-gpu-training-with-unsloth) training is supported, with major improvements coming soon.
## 📥 Install
Unsloth can be used in two ways: through **[Unsloth Studio](https://unsloth.ai/docs/new/studio/)**, the web UI, or through **Unsloth Core**, the code-based version. Each has different requirements.
### Unsloth Studio (web UI)
Unsloth Studio (Beta) works on **Windows, Linux, WSL** and **macOS**.
* **CPU:** Supported for Chat and Data Recipes currently
* **NVIDIA:** Training works on RTX 30/40/50, Blackwell, DGX Spark, Station and more
* **macOS:** Currently supports chat and Data Recipes. **MLX training** is coming very soon
* **AMD:** Chat + Data works. Train with [Unsloth Core](#unsloth-core-code-based). Studio support is out soon.
* **Coming soon:** Training support for Apple MLX, AMD, and Intel.
* **Multi-GPU:** Available now, with a major upgrade on the way
#### macOS, Linux, WSL:
```bash
pip install --upgrade --force-reinstall --no-cache-dir torch==2.1.0 triton \
--index-url https://download.pytorch.org/whl/cu121
curl -fsSL https://unsloth.ai/install.sh | sh
```
#### Windows:
```powershell
irm https://unsloth.ai/install.ps1 | iex
```
#### Launch
```bash
pip install "unsloth[cu118] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118_ampere] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121_ampere] @ git+https://github.com/unslothai/unsloth.git"
unsloth studio -H 0.0.0.0 -p 8888
```
3. For Pytorch 2.1.1: Use the `"ampere"` path for newer RTX 30xx GPUs or higher.
#### Update
To update, use the same install commands as above. Or run (does not work on Windows):
```bash
pip install --upgrade --force-reinstall --no-cache-dir torch==2.1.1 triton \
--index-url https://download.pytorch.org/whl/cu121
unsloth studio update
```
#### Docker
Use our [Docker image](https://hub.docker.com/r/unsloth/unsloth) ```unsloth/unsloth``` container. Run:
```bash
pip install "unsloth[cu118_torch211] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121_torch211] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118_ampere_torch211] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121_ampere_torch211] @ git+https://github.com/unslothai/unsloth.git"
```
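The extras names in the legacy pip commands above follow a pattern: a CUDA tag (`cu118` or `cu121`), an optional `_ampere` suffix, and an optional `_torch211` suffix for Pytorch 2.1.1. A minimal sketch of that selection logic, assuming the "RTX 3060 or higher" rule can be read as CUDA compute capability 8.0 or above; the helper names `pick_unsloth_extra` and `pip_command` are hypothetical and not part of Unsloth:

```python
def pick_unsloth_extra(cuda_version, compute_capability, torch_version="2.1.0"):
    """Return the extras name used in the legacy pip commands, e.g. 'cu121_ampere'.

    Hypothetical helper for illustration only; it just mirrors the naming
    pattern of the commands listed in this README diff.
    """
    cuda_tags = {"11.8": "cu118", "12.1": "cu121"}
    torch_suffixes = {"2.1.0": "", "2.1.1": "_torch211"}

    if cuda_version not in cuda_tags:
        raise ValueError(f"Unsupported CUDA version: {cuda_version!r}")
    if torch_version not in torch_suffixes:
        raise ValueError(f"Unsupported Pytorch version: {torch_version!r}")

    # Assumption: "RTX 3060 or higher (A100, H100 etc)" is treated as
    # compute capability >= 8.0 (the Ampere generation and newer).
    ampere = "_ampere" if tuple(compute_capability) >= (8, 0) else ""

    return cuda_tags[cuda_version] + ampere + torch_suffixes[torch_version]


def pip_command(extra):
    """Build the install command in the same form as the README lines above."""
    return f'pip install "unsloth[{extra}] @ git+https://github.com/unslothai/unsloth.git"'


if __name__ == "__main__":
    # Example: an RTX 3090 (capability 8.6) with CUDA 12.1 and Pytorch 2.1.1.
    print(pip_command(pick_unsloth_extra("12.1", (8, 6), "2.1.1")))
```

With Pytorch already installed, the inputs could be read from `torch.version.cuda` and `torch.cuda.get_device_capability()` rather than typed by hand.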
4. We're working on Pytorch 2.1.2 support.
5. If you get errors, try the below first, then go back to step 1:
docker run -d -e JUPYTER_PASSWORD="mypassword" \
-p 8888:8888 -p 8000:8000 -p 2222:22 \
-v $(pwd)/work:/workspace/work \
--gpus all \
unsloth/unsloth
```
#### Developer, Nightly, Uninstall
For developer, nightly, and uninstall instructions, see [advanced installation](#-advanced-installation).
### Unsloth Core (code-based)
#### Linux, WSL:
```bash
pip install --upgrade pip
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv unsloth_env --python 3.13
source unsloth_env/bin/activate
uv pip install unsloth --torch-backend=auto
```
# Documentation
We support Huggingface's TRL, Trainer, Seq2SeqTrainer or even Pytorch code!
We're in 🤗 Huggingface's official docs! We're on the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and the [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)!
```python
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset
max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!
# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")
# 4bit pre quantized models we support - 4x faster downloading!
fourbit_models = [
"unsloth/mistral-7b-bnb-4bit",
"unsloth/llama-2-7b-bnb-4bit",
"unsloth/llama-2-13b-bnb-4bit",
"unsloth/codellama-34b-bnb-4bit",
"unsloth/tinyllama-bnb-4bit",
]
# Load Llama model
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "unsloth/mistral-7b-bnb-4bit", # Supports Llama, Mistral - replace this!
max_seq_length = max_seq_length,
dtype = None,
load_in_4bit = True,
)
# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
model,
r = 16,
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 16,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
use_gradient_checkpointing = True,
random_state = 3407,
max_seq_length = max_seq_length,
)
trainer = SFTTrainer(
model = model,
train_dataset = dataset,
dataset_text_field = "text",
max_seq_length = max_seq_length,
tokenizer = tokenizer,
args = TrainingArguments(
per_device_train_batch_size = 2,
gradient_accumulation_steps = 4,
warmup_steps = 10,
max_steps = 60,
fp16 = not torch.cuda.is_bf16_supported(),
bf16 = torch.cuda.is_bf16_supported(),
logging_steps = 1,
output_dir = "outputs",
optim = "adamw_8bit",
seed = 3407,
),
)
trainer.train()
#### Windows:
```powershell
winget install -e --id Python.Python.3.13
winget install --id=astral-sh.uv -e
uv venv unsloth_env --python 3.13
.\unsloth_env\Scripts\activate
uv pip install unsloth --torch-backend=auto
```
For Windows, `pip install unsloth` works only if you have PyTorch installed. Read our [Windows Guide](https://unsloth.ai/docs/get-started/install/windows-installation).
You can use the same Docker image as Unsloth Studio.
<a name="DPO"></a>
# DPO (Direct Preference Optimization) Support
DPO, PPO, Reward Modelling all seem to work as per 3rd party independent testing from [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory). We have a preliminary Google Colab notebook for reproducing Zephyr on Tesla T4 here: [notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing).
#### AMD, Intel:
For RTX 50x, B200, 6000 GPUs: `uv pip install unsloth --torch-backend=auto`. Read our guides for: [Blackwell](https://unsloth.ai/docs/blog/fine-tuning-llms-with-blackwell-rtx-50-series-and-unsloth) and [DGX Spark](https://unsloth.ai/docs/blog/fine-tuning-llms-with-nvidia-dgx-spark-and-unsloth). <br>
To install Unsloth on **AMD** and **Intel** GPUs, follow our [AMD Guide](https://unsloth.ai/docs/get-started/install/amd) and [Intel Guide](https://unsloth.ai/docs/get-started/install/intel).
We're in 🤗 Huggingface's official docs! We're on the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and the [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)!
## 📒 Free Notebooks
```python
from unsloth import FastLanguageModel, PatchDPOTrainer
PatchDPOTrainer()
import torch
from transformers import TrainingArguments
from trl import DPOTrainer
Train for free with our notebooks. You can use our new [free Unsloth Studio notebook](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb) to run and train models for free in a web UI.
Read our [guide](https://unsloth.ai/docs/get-started/fine-tuning-llms-guide). Add dataset, run, then deploy your trained model.
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "unsloth/zephyr-sft-bnb-4bit",
max_seq_length = max_seq_length,
dtype = None,
load_in_4bit = True,
)
| Model | Free Notebooks | Performance | Memory use |
|-----------|---------|--------|----------|
| **Gemma 4 (E2B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma4_(E2B)-Vision.ipynb) | 1.5x faster | 50% less |
| **Qwen3.5 (4B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_(4B)_Vision.ipynb) | 1.5x faster | 60% less |
| **gpt-oss (20B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-Fine-tuning.ipynb) | 2x faster | 70% less |
| **Qwen3.5 GSPO** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_(4B)_Vision_GRPO.ipynb) | 2x faster | 70% less |
| **gpt-oss (20B): GRPO** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-GRPO.ipynb) | 2x faster | 80% less |
| **Qwen3: Advanced GRPO** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb) | 2x faster | 70% less |
| **embeddinggemma (300M)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/EmbeddingGemma_(300M).ipynb) | 2x faster | 20% less |
| **Mistral Ministral 3 (3B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Ministral_3_VL_(3B)_Vision.ipynb) | 1.5x faster | 60% less |
| **Llama 3.1 (8B) Alpaca** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb) | 2x faster | 70% less |
| **Llama 3.2 Conversational** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2x faster | 70% less |
| **Orpheus-TTS (3B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Orpheus_(3B)-TTS.ipynb) | 1.5x faster | 50% less |
# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
model,
r = 64,
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 64,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
use_gradient_checkpointing = True,
random_state = 3407,
max_seq_length = max_seq_length,
)
- See all our notebooks for: [Kaggle](https://github.com/unslothai/notebooks?tab=readme-ov-file#-kaggle-notebooks), [GRPO](https://unsloth.ai/docs/get-started/unsloth-notebooks#grpo-reasoning-rl-notebooks), [TTS](https://unsloth.ai/docs/get-started/unsloth-notebooks#text-to-speech-tts-notebooks), [embedding](https://unsloth.ai/docs/new/embedding-finetuning) & [Vision](https://unsloth.ai/docs/get-started/unsloth-notebooks#vision-multimodal-notebooks)
- See [all our models](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [all our notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks)
- See detailed documentation for Unsloth [here](https://unsloth.ai/docs)
dpo_trainer = DPOTrainer(
model = model,
ref_model = None,
args = TrainingArguments(
per_device_train_batch_size = 4,
gradient_accumulation_steps = 8,
warmup_ratio = 0.1,
num_train_epochs = 3,
fp16 = not torch.cuda.is_bf16_supported(),
bf16 = torch.cuda.is_bf16_supported(),
logging_steps = 1,
optim = "adamw_8bit",
seed = 42,
output_dir = "outputs",
),
beta = 0.1,
train_dataset = YOUR_DATASET_HERE,
# eval_dataset = YOUR_DATASET_HERE,
tokenizer = tokenizer,
max_length = 1024,
max_prompt_length = 512,
)
dpo_trainer.train()
```
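The `train_dataset = YOUR_DATASET_HERE` placeholder above expects preference pairs. Here is a minimal sketch of what such data could look like, assuming the `prompt` / `chosen` / `rejected` column names that TRL's DPO trainer conventionally uses. The example rows and the `validate_pairs` helper are hypothetical and only illustrate the shape of the data:

```python
# Hypothetical preference pairs; the three column names are assumed from TRL's DPO convention.
REQUIRED_COLUMNS = ("prompt", "chosen", "rejected")

raw_pairs = [
    {
        "prompt": "Explain LoRA in one sentence.",
        "chosen": "LoRA fine-tunes a model by training small low-rank adapter matrices.",
        "rejected": "LoRA is a kind of GPU.",
    },
    {
        "prompt": "What does 4-bit loading change?",
        "chosen": "It stores the base weights in 4-bit precision to reduce memory use.",
        "rejected": "It makes the model four times larger.",
    },
]


def validate_pairs(pairs):
    """Return the pairs unchanged if every row has usable DPO fields, else raise ValueError."""
    for i, row in enumerate(pairs):
        for col in REQUIRED_COLUMNS:
            value = row.get(col)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"row {i}: column {col!r} must be a non-empty string")
        if row["chosen"] == row["rejected"]:
            raise ValueError(f"row {i}: chosen and rejected answers are identical")
    return pairs


checked = validate_pairs(raw_pairs)
print(f"{len(checked)} preference pairs look usable")
```

With the `datasets` library installed, `Dataset.from_list(checked)` should produce an object that can be passed as `train_dataset`. Note that `max_seq_length` is used but not defined in the snippet above, so set it to your chosen context length first.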
## 🦥 Unsloth News
- **Qwen3.6**: Qwen3.6-35B-A3B can now be trained and run in Unsloth Studio. [Blog](https://unsloth.ai/docs/models/qwen3.6)
- **Gemma 4**: Run and train Google's new models directly in Unsloth. [Blog](https://unsloth.ai/docs/models/gemma-4)
- **Introducing Unsloth Studio**: our new web UI for running and training LLMs. [Blog](https://unsloth.ai/docs/new/studio)
- **Qwen3.5** - 0.8B, 2B, 4B, 9B, 27B, 35-A3B, 112B-A10B are now supported. [Guide + notebooks](https://unsloth.ai/docs/models/qwen3.5/fine-tune)
- Train **MoE LLMs 12x faster** with 35% less VRAM - DeepSeek, GLM, Qwen and gpt-oss. [Blog](https://unsloth.ai/docs/new/faster-moe)
- **Embedding models**: Unsloth now supports ~1.8-3.3x faster embedding fine-tuning. [Blog](https://unsloth.ai/docs/new/embedding-finetuning) • [Notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks#embedding-models)
- New **7x longer context RL** vs. all other setups, via our new batching algorithms. [Blog](https://unsloth.ai/docs/new/grpo-long-context)
- New RoPE & MLP **Triton Kernels** & **Padding Free + Packing**: 3x faster training & 30% less VRAM. [Blog](https://unsloth.ai/docs/new/3x-faster-training-packing)
- **500K Context**: Training a 20B model with >500K context is now possible on an 80GB GPU. [Blog](https://unsloth.ai/docs/blog/500k-context-length-fine-tuning)
- **FP8 & Vision RL**: You can now do FP8 & VLM GRPO on consumer GPUs. [FP8 Blog](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/vision-reinforcement-learning-vlm-rl)
- **gpt-oss** by OpenAI: Read our [RL blog](https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune/gpt-oss-reinforcement-learning), [Flex Attention](https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune/long-context-gpt-oss-training) blog and [Guide](https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune).
# Support us!
We're currently 2 brothers trying to make LLMs for everyone! It'll be super cool if you can support our work!!
<a href="https://ko-fi.com/unsloth"><img src="./images/Kofi button.png" height="50"></a>
# Future Milestones and limitations
1. Support Mixtral.
2. Supports all Mistral, Llama type models, but some are unoptimized (Qwen with biases)
3. Dropout, bias in LoRA matrices are supported, just not optimized.
# Performance comparisons on 1 Tesla T4 GPU:
**Time taken for 1 epoch**
One Tesla T4 on Google Colab
`bsz = 2, ga = 4, max_grad_norm = 0.3, num_train_epochs = 1, seed = 3047, lr = 2e-4, wd = 0.01, optim = "adamw_8bit", schedule = "linear", schedule_steps = 10`
| System | GPU | Alpaca (52K) | LAION OIG (210K) | Open Assistant (10K) | SlimOrca (518K) |
| --- | --- | --- | --- | --- | --- |
| Huggingface | 1 T4 | 23h 15m | 56h 28m | 8h 38m | 391h 41m |
| Unsloth Open | 1 T4 | 13h 7m (1.8x) | 31h 47m (1.8x) | 4h 27m (1.9x) | 240h 4m (1.6x) |
| Unsloth Pro | 1 T4 | 3h 6m (7.5x) | 5h 17m (10.7x) | 1h 7m (7.7x) | 59h 53m (6.5x) |
| Unsloth Max | 1 T4 | 2h 39m (8.8x) | 4h 31m (12.5x) | 0h 58m (8.9x) | 51h 30m (7.6x) |
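The speedup factors in parentheses appear to be the Hugging Face time divided by the Unsloth time. Here is a small sketch that recomputes them from the table's `Hh Mm` strings. The helper names are hypothetical and this is not part of the benchmark code:

```python
import re


def to_minutes(duration: str) -> int:
    """Convert a duration such as '13h 7m' into whole minutes."""
    match = re.fullmatch(r"(\d+)h\s+(\d+)m", duration.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


def speedup(baseline: str, candidate: str) -> float:
    """Return how many times faster the candidate run is than the baseline run."""
    return to_minutes(baseline) / to_minutes(candidate)


# Alpaca (52K) column: Hugging Face 23h 15m vs. Unsloth Open 13h 7m
print(f"{speedup('23h 15m', '13h 7m'):.1f}x")  # 1.8x
```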
**Peak Memory Usage**
| System | GPU | Alpaca (52K) | LAION OIG (210K) | Open Assistant (10K) | SlimOrca (518K) |
| --- | --- | --- | --- | --- | --- |
| Huggingface | 1 T4 | 7.3GB | 5.9GB | 14.0GB | 13.3GB |
| Unsloth Open | 1 T4 | 6.8GB | 5.7GB | 7.8GB | 7.7GB |
| Unsloth Pro | 1 T4 | 6.4GB | 6.4GB | 6.4GB | 6.4GB |
| Unsloth Max | 1 T4 | 11.4GB | 12.4GB | 11.9GB | 14.4GB |
# Performance comparisons on 2 Tesla T4 GPUs via DDP:
**Time taken for 1 epoch**
Two Tesla T4s on Kaggle
`bsz = 2, ga = 4, max_grad_norm = 0.3, num_train_epochs = 1, seed = 3047, lr = 2e-4, wd = 0.01, optim = "adamw_8bit", schedule = "linear", schedule_steps = 10`
| System | GPU | Alpaca (52K) | LAION OIG (210K) | Open Assistant (10K) | SlimOrca (518K) * |
| --- | --- | --- | --- | --- | --- |
| Huggingface | 2 T4 | 84h 47m | 163h 48m | 30h 51m | 1301h 24m * |
| Unsloth Pro | 2 T4 | 3h 20m (25.4x) | 5h 43m (28.7x) | 1h 12m (25.7x) | 71h 40m (18.1x) * |
| Unsloth Max | 2 T4 | 3h 4m (27.6x) | 5h 14m (31.3x) | 1h 6m (28.1x) | 54h 20m (23.9x) * |
**Peak Memory Usage on a Multi GPU System (2 GPUs)**
| System | GPU | Alpaca (52K) | LAION OIG (210K) | Open Assistant (10K) | SlimOrca (518K) * |
| --- | --- | --- | --- | --- | --- |
| Huggingface | 2 T4 | 8.4GB \| 6GB | 7.2GB \| 5.3GB | 14.3GB \| 6.6GB | 10.9GB \| 5.9GB * |
| Unsloth Pro | 2 T4 | 7.7GB \| 4.9GB | 7.5GB \| 4.9GB | 8.5GB \| 4.9GB | 6.2GB \| 4.7GB * |
| Unsloth Max | 2 T4 | 10.5GB \| 5GB | 10.6GB \| 5GB | 10.6GB \| 5GB | 10.5GB \| 5GB * |
* Slim Orca `bsz=1` for all benchmarks since `bsz=2` OOMs. We can handle `bsz=2`, but we benchmark it with `bsz=1` for consistency.
# Llama-Factory 3rd party benchmarking
| Method | Bits | TGS | GRAM | Speed |
| --- | --- | --- | --- | --- |
| HF | 16 | 2392 | 18GB | 100% |
| HF+FA2 | 16 | 2954 | 17GB | 123% |
| Unsloth+FA2 | 16 | 4007 | 16GB | **168%** |
| HF | 4 | 2415 | 9GB | 101% |
| Unsloth+FA2 | 4 | 3726 | 7GB | **160%** |
[Link](https://github.com/hiyouga/LLaMA-Factory/wiki/Performance-Comparison) to performance table. TGS: tokens per GPU per second. Model: LLaMA2-7B. GPU: NVIDIA A100 * 1. Batch size: 4. Gradient accumulation: 2. LoRA rank: 8. Max length: 1024.
# How did we make it faster?
Manual autograd, Triton kernels etc. See our [Benchmark Breakdown](https://unsloth.ai/blog/mistral-benchmark) for more info!
# Troubleshooting
1. Sometimes `bitsandbytes` or `xformers` does not link properly. Try running:
## 📥 Advanced Installation
The below advanced instructions are for Unsloth Studio. For Unsloth Core advanced installation, [view our docs](https://unsloth.ai/docs/get-started/install/pip-install#advanced-pip-installation).
#### Developer installs: macOS, Linux, WSL:
```bash
!ldconfig /usr/lib64-nvidia
git clone https://github.com/unslothai/unsloth
cd unsloth
./install.sh --local
unsloth studio -H 0.0.0.0 -p 8888
```
Then to update:
```bash
unsloth studio update
```
2. Windows is not supported yet. We rely on Xformers and Triton, so Unsloth will support Windows once both packages officially support it.
3. If it doesn't install - maybe try updating `pip`.
#### Developer installs: Windows PowerShell:
```powershell
git clone https://github.com/unslothai/unsloth.git
cd unsloth
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\install.ps1 --local
unsloth studio -H 0.0.0.0 -p 8888
```
Then to update:
```bash
unsloth studio update
```
#### Nightly: macOS, Linux, WSL:
```bash
git clone https://github.com/unslothai/unsloth
cd unsloth
git checkout nightly
./install.sh --local
unsloth studio -H 0.0.0.0 -p 8888
```
Then to launch every time:
```bash
unsloth studio -H 0.0.0.0 -p 8888
```
# Full benchmarking tables
Click "Code" for a fully reproducible example.
"Unsloth Equal" is a preview of our PRO version, with code stripped out. All settings and the loss curve remain identical.
| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|
| Alpaca | 1x | 1.04x | 1.98x | 2.48x | 5.32x | **15.64x** |
| code | [Code](https://colab.research.google.com/drive/1u4dBeM-0vGNVmmO6X7cScAut-Hyt4KDF?usp=sharing) | [Code](https://colab.research.google.com/drive/1fgTOxpMbVjloQBvZyz4lF4BacKSZOB2A?usp=sharing) | [Code](https://colab.research.google.com/drive/1YIPY_18xm-K0iJDgvNkRoJsgkPMPAO3G?usp=sharing) | [Code](https://colab.research.google.com/drive/1ANW8EFL3LVyTD7Gq4TkheC1Z7Rxw-rHp?usp=sharing) | | |
| seconds| 1040 | 1001 | 525 | 419 | 196 | 67 |
| memory MB| 18235 | 15365 | 9631 | 8525 | | |
| % saved| | 15.74 | 47.18 | 53.25 | | |
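The `% saved` row appears to be the peak-memory reduction relative to the Hugging Face baseline. Here is a short sketch that reproduces the Alpaca figures from the `memory MB` row above. The function name is hypothetical:

```python
def percent_saved(baseline_mb: int, candidate_mb: int) -> float:
    """Peak-memory reduction of a run relative to the baseline, in percent."""
    return 100.0 * (baseline_mb - candidate_mb) / baseline_mb


HF_BASELINE_MB = 18235  # Hugging Face, Alpaca, 1 A100 40GB (from the table)

for name, memory_mb in [
    ("Flash Attention 2", 15365),
    ("Unsloth Open", 9631),
    ("Unsloth Equal", 8525),
]:
    print(f"{name}: {percent_saved(HF_BASELINE_MB, memory_mb):.2f}% saved")
```

A negative value, as in the LAION Chip2 Flash Attention 2 column, means the run used more memory than the baseline.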
#### Nightly: Windows:
Run in Windows PowerShell:
```powershell
git clone https://github.com/unslothai/unsloth.git
cd unsloth
git checkout nightly
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\install.ps1 --local
unsloth studio -H 0.0.0.0 -p 8888
```
Then to launch every time:
```bash
unsloth studio -H 0.0.0.0 -p 8888
```
#### Uninstall
You can uninstall Unsloth Studio by deleting its install folder, usually located at `$HOME/.unsloth/studio` on macOS/Linux/WSL and `%USERPROFILE%\.unsloth\studio` on Windows. The commands below **delete everything** in that folder, including your history and cache:
| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|
| LAION Chip2 | 1x | 0.92x | 1.61x | 1.84x | 7.05x | **20.73x** |
| code |[Code](https://colab.research.google.com/drive/1gjL1TaKwc_xv2TcxJC8QWEWBG1msh3g2?usp=sharing) | [Code](https://colab.research.google.com/drive/15vlPjMr8xDj5BFhGdqunGaOQSMqXPEXU?usp=sharing) | [Code](https://colab.research.google.com/drive/1zPwvf-BmHyHlPMBxDsY8zS0BnQ-KKbCc?usp=sharing) | [Code](https://colab.research.google.com/drive/1X2uHy-arRsZxqWHvKHwwW102JaMwChD2?usp=sharing) | | |
| seconds| 581 | 631 | 361 | 315 | 82 | 28 |
| memory MB| 7763 | 8047 | 7763 | 6441 | | |
| % saved| | -3.66 | 0.00 | 17.03 | | |
* **macOS, WSL, Linux:** `rm -rf ~/.unsloth/studio`
* **Windows (PowerShell):** `Remove-Item -Recurse -Force "$HOME\.unsloth\studio"`
For more info, [see our docs](https://unsloth.ai/docs/new/studio/install#uninstall).
| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|
| OASST | 1x | 1.19x | 2.17x | 2.66x | 5.04x | **14.83x** |
| code |[Code](https://colab.research.google.com/drive/10NzDreFbuWELGUuBv0MOoC7y3MBewaNx?usp=sharing) | [Code](https://colab.research.google.com/drive/1TwdkJ1sHsuEH-kgeCPqSFeCpOnCfz6Ou?usp=sharing) | [Code](https://colab.research.google.com/drive/1AkwjUkOF0XeRBMT_S8Uhh74kitEsZHla?usp=sharing) | [Code](https://colab.research.google.com/drive/1roMkp2UjbeK2t3DkNz50cRs1MT92RPFT?usp=sharing) | | |
| seconds| 1852 | 1558 | 852 | 696 | 367 | 125 |
| memory MB| 26431 | 16565 | 12267| 11223| | |
| % saved| | 37.33 | 53.59 | 57.54 | | |
#### Deleting model files
| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|
| Slim Orca | 1x | 1.18x | 2.22x | 2.64x | 5.04x | **14.82x** |
| code |[Code](https://colab.research.google.com/drive/1UNo1xsMl8YH7xnWnIVjDFnCAPfc0RGgu?usp=sharing) | [Code](https://colab.research.google.com/drive/1zbphER-SKhbSWGjHTfnBLPFyTgIVvaeH?usp=sharing) | [Code](https://colab.research.google.com/drive/156si33585iv4Uh-VILFglUmIMrNCNuc2?usp=sharing) | [Code](https://colab.research.google.com/drive/1_mhZy7dfl9jEnJRuJBZJ5y3OwW06jgQA?usp=sharing) | | |
| seconds| 1824 | 1545 | 821 | 691 | 362 | 123 |
| memory MB| 24557 | 15681 | 10595| 9007 | | |
| % saved| | 36.14 | 56.86 | 63.32 | | |
You can delete old model files either from the bin icon in model search or by removing the relevant cached model folder from the default Hugging Face cache directory. By default, HF uses:
### Mistral 7b
| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|
| Mistral 7B Slim Orca | 1x | 1.15x | 2.15x | 2.53x | 4.61x | **13.69x** |
| code | [Code](https://colab.research.google.com/drive/1mePk3KzwTD81hr5mcNcs_AX3Kbg_Ha0x?usp=sharing) | [Code](https://colab.research.google.com/drive/1dgHxjvTmX6hb0bPcLp26RXSE6_n9DKj7?usp=sharing) | [Code](https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing) | [Code](https://colab.research.google.com/drive/18yOiyX0T81mTwZqOALFSCX_tSAqju6aD?usp=sharing) | |
| seconds | 1813 | 1571 | 842 | 718 | 393 | 132 |
| memory MB | 32853 | 19385 | 12465 | 10271 | | |
| % saved| | 40.99 | 62.06 | 68.74 | | |
* **macOS, Linux, WSL:** `~/.cache/huggingface/hub/`
* **Windows:** `%USERPROFILE%\.cache\huggingface\hub\`
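If you are unsure where the cache lives on your machine, the sketch below resolves the default location and lists the per-model folders inside it. It assumes the documented `HF_HUB_CACHE` and `HF_HOME` environment variables from `huggingface_hub` and the `models--org--name` folder layout. The function names are hypothetical, and other overrides such as `XDG_CACHE_HOME` are not handled:

```python
import os
from pathlib import Path


def default_hf_hub_cache(env=None, home=None) -> Path:
    """Resolve the Hugging Face hub cache directory from the environment."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if env.get("HF_HUB_CACHE"):  # explicit cache override
        return Path(env["HF_HUB_CACHE"])
    hf_home = env.get("HF_HOME")  # overrides the ~/.cache/huggingface base
    base = Path(hf_home) if hf_home else home / ".cache" / "huggingface"
    return base / "hub"


def cached_model_dirs(cache_dir: Path) -> list:
    """List cached model folders, which are named like 'models--org--name'."""
    if not cache_dir.is_dir():
        return []
    return sorted(
        p for p in cache_dir.iterdir()
        if p.is_dir() and p.name.startswith("models--")
    )


cache = default_hf_hub_cache()
print(f"Cache directory: {cache}")
for model_dir in cached_model_dirs(cache):
    print(" ", model_dir.name)
```

This only lists folders. Deleting one is left to you, for example by removing the listed directory by hand.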
### CodeLlama 34b
| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|
| Code Llama 34B | OOM ❌ | 0.99x | 1.87x | 2.61x | 4.27x | 12.82x |
| code | [Code](https://colab.research.google.com/drive/1ykfz3BqrtC_AUFegCzUQjjfUNlxp6Otc?usp=sharing) | [Code](https://colab.research.google.com/drive/12ZypxQh7OC6kBXvWZI-5d05I4m-B_hoR?usp=sharing) | [Code](https://colab.research.google.com/drive/1gdHyAx8XJsz2yNV-DHvbHjR1iCef5Qmh?usp=sharing) | [Code](https://colab.research.google.com/drive/1fm7wqx9MJ0kRrwKOfmLkK1Rmw-pySahB?usp=sharing) | |
| seconds | 1953 | 1982 | 1043 | 748 | 458 | 152 |
| memory MB | 40000 | 33217 | 27413 | 22161 | | |
| % saved| | 16.96| 31.47 | 44.60 | | |
## 💚 Community and Links
| Type | Links |
| ----------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
| <img width="16" src="https://cdn.prod.website-files.com/6257adef93867e50d84d30e2/66e3d80db9971f10a9757c99_Symbol.svg" />  **Discord** | [Join Discord server](https://discord.com/invite/unsloth) |
| <img width="15" src="https://redditinc.com/hs-fs/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png" />  **r/unsloth Reddit** | [Join Reddit community](https://reddit.com/r/unsloth) |
| 📚 **Documentation & Wiki** | [Read Our Docs](https://unsloth.ai/docs) |
| <img width="13" src="https://upload.wikimedia.org/wikipedia/commons/0/09/X_(formerly_Twitter)_logo_late_2025.svg" />  **Twitter (aka X)** | [Follow us on X](https://twitter.com/unslothai) |
| 🔮 **Our Models** | [Unsloth Catalog](https://unsloth.ai/docs/get-started/unsloth-model-catalog) |
| ✍️ **Blog** | [Read our Blogs](https://unsloth.ai/blog) |
### 1 Tesla T4
### Citation
| 1 T4 16GB | Hugging Face | Flash Attention | Unsloth Open | Unsloth Pro Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-----------------|-----------------|---------------|---------------|-------------|
| Alpaca | 1x | 1.09x | 1.69x | 1.79x | 2.93x | **8.3x** |
| code | [Code](https://colab.research.google.com/drive/1XpLIV4s8Bj5uryB-X2gqM88oRGHEGdaB?usp=sharing) | [Code](https://colab.research.google.com/drive/1LyXu6CjuymQg6ddHX8g1dpUvrMa1nn4L?usp=sharing) | [Code](https://colab.research.google.com/drive/1gsv4LpY7C32otl1rgRo5wXTk4HIitXoM?usp=sharing) | [Code](https://colab.research.google.com/drive/1VtULwRQwhEnVdNryjm27zXfdSM1tNfFK?usp=sharing) | | |
| seconds | 1599 | 1468 | 942 | 894 | 545 | 193 |
| memory MB | 7199 | 7059 | 6459 | 5443 | | |
| % saved | | 1.94 | 10.28 | 24.39 | | |
You can cite the Unsloth repo as follows:
```bibtex
@software{unsloth,
author = {Daniel Han and Michael Han and {Unsloth team}},
title = {Unsloth},
url = {https://github.com/unslothai/unsloth},
year = {2023}
}
```
If you trained a model with 🦥Unsloth, you can use this cool sticker!   <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/made with unsloth.png" width="200" align="center" />
| 1 T4 16GB | Hugging Face | Flash Attention | Unsloth Open | Unsloth Pro Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-----------------|-----------------|---------------|---------------|-------------|
| LAION Chip2 | 1x | 0.99x | 1.80x | 1.75x | 4.15x | **11.75x** |
| code | [Code](https://colab.research.google.com/drive/1EtdStADehE4FVJnU2Cu6O8p9jDYdqG2L?usp=sharing) | [Code](https://colab.research.google.com/drive/1Ik4jO68odUiQIJ_szZ3xok5fk58WpA5Q?usp=sharing) | [Code](https://colab.research.google.com/drive/1E2nR4V3bXIWBQIUE7uR39lYPr3UikzqH?usp=sharing) | [Code](https://colab.research.google.com/drive/13jbj8D8FOt9KyXwZt9Yf2MsYkD8CyCVR?usp=sharing) | | |
| seconds | 952 | 955 | 529 | 543 | 229 | 81 |
| memory MB | 6037 | 6033 | 5797 | 4855 | | |
| % saved | | 0.07 | 3.98 | 19.58 | | |
### License
Unsloth uses a dual-licensing model of Apache 2.0 and AGPL-3.0. The core Unsloth package remains licensed under **[Apache 2.0](https://github.com/unslothai/unsloth?tab=Apache-2.0-1-ov-file)**, while certain optional components, such as the Unsloth Studio UI, are licensed under the open-source license **[AGPL-3.0](https://github.com/unslothai/unsloth?tab=AGPL-3.0-2-ov-file)**.
| 1 T4 16GB | Hugging Face | Flash Attention | Unsloth Open | Unsloth Pro Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-----------------|-----------------|---------------|---------------|-------------|
| OASST | 1x | 1.19x | 1.95x | 1.86x | 2.58x | **7.3x** |
| code | [Code](https://colab.research.google.com/drive/1aXzGgEM3yYB6SWy_XR81nQFWME40ksSy?usp=sharing) | [Code](https://colab.research.google.com/drive/1-5MdIOp0cM0scC-CdRZhh8OYhnGHqct4?usp=sharing) | [Code](https://colab.research.google.com/drive/1n-fgduZhRUsSjgpqNtVkXA3rSfE7iBdg?usp=sharing) | [Code](https://colab.research.google.com/drive/1z_GlHr2M_bB4lQrPhdWC7dseZv23cBIy?usp=sharing) | | |
| seconds | 2640 | 2222 | 1355 | 1421 | 1024 | 362 |
| memory MB | 14827 | 10391 | 8413 | 7031 | | |
| % saved | | 29.92 | 43.26 | 52.58 | | |
This structure helps support ongoing Unsloth development while keeping the project open source and enabling the broader ecosystem to continue growing.
| 1 T4 16GB | Hugging Face | Flash Attention | Unsloth Open | Unsloth Pro Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-----------------|-----------------|---------------|---------------|-------------|
| Slim Orca | 1x | 1.21x | 1.77x | 1.85x | 2.71x | **7.67x** |
| code | [Code](https://colab.research.google.com/drive/15yLlJx9IE84kzx7ikky45pRcarPyUtEs?usp=sharing) | [Code](https://colab.research.google.com/drive/16IShIBmjKULWy87I-xURpj4nztTkAF13?usp=sharing) | [Code](https://colab.research.google.com/drive/1CJG3XLg_OQpCz71eB7Uqx7wuK_n2b-a8?usp=sharing) | [Code](https://colab.research.google.com/drive/1UmwuWHtlrC6MAfl9mX7A_TRfo5iSHDa-?usp=sharing) | | |
| seconds | 2735 | 2262 | 1545 | 1478 | 1009 | 356 |
| memory MB | 13933 | 10489 | 7661 | 6563 | | |
| % saved | | 24.72 | 45.02 | 52.90 | | |
### 2 Tesla T4s via DDP
| 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|----------|-------------|-----------------|--------------|---------------|-------------|
| Alpaca | 1x | 0.99x | 4.95x | 4.44x | 7.28x | **20.61x** |
| code | [Code](https://www.kaggle.com/danielhanchen/hf-original-alpaca-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-alpaca-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/unsloth-alpaca-t4-ddp) | | |
| seconds | 9882 | 9946 | 1996 | 2227 | 1357 | 480 |
| memory MB| 9176 | 9128 | 6904 | 6782 | | |
| % saved | | 0.52 | 24.76 | 26.09 | | |
| 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|----------|-------------|-----------------|--------------|---------------|-------------|
| LAION Chip2 | 1x | 1.12x | 5.28x | 4.21x | 10.01x | **28.32x** |
| code | [Code](https://www.kaggle.com/danielhanchen/hf-original-laion-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-laion-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/unsloth-laion-t4-ddp) | | |
| seconds | 5418 | 4854 | 1027 | 1286 | 541 | 191 |
| memory MB| 7316 | 7316 | 5732 | 5934 | | |
| % saved | | 0.00 | 21.65 | 18.89 | | |
| 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|----------|-------------|-----------------|--------------|---------------|-------------|
| OASST (bsz=1) | 1x | 1.14x | 5.56x | 5.09x | 5.64x | **15.97x** |
| code | [Code](https://www.kaggle.com/danielhanchen/hf-original-oasst-bsz1-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-oasst-bsz1-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/unsloth-oasst-bsz1-t4-ddp) | | | |
| seconds | 4503 | 3955 | 811 | 885 | 798 | 282 |
| memory MB | 11896 | 11628 | 6616 | 7105 | | |
| % saved | | 2.25 | 44.38 | 40.27 | | |
| 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|----------|-------------|-----------------|--------------|---------------|-------------|
| Slim Orca (bsz=1) | 1x | 0.97x | 5.54x | 4.68x | 6.88x | **19.46x** |
| code | [Code](https://www.kaggle.com/danielhanchen/hf-original-slimorca-bsz1-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-slimorca-bsz1-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/unsloth-slimorca-bsz1-t4-ddp) | | |
| seconds | 4042 | 4158 | 729 | 863 | 588 | 208 |
| memory MB| 11010 | 11042 | 6492 | 7410 | | |
| % saved | | -0.29| 41.04 | 32.70 | | |
| 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|----------|-------------|-----------------|--------------|---------------|-------------|
| OASST (bsz=2) | OOM ❌ | OOM ❌ | ✓ | ✓ | ✓ | ✓ |
| code | [Code](https://www.kaggle.com/danielhanchen/hf-original-oasst-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-oasst-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/unsloth-oasst-t4-ddp) | | | |
| seconds | OOM | OOM | 2719 | 3391 | 2794 | 987 |
| memory MB| OOM | OOM | 8134 | 9600 | | |
| % saved | OOM | OOM | | | | |
| 2 T4 DDP | Hugging Face | Flash Attention | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|----------|-------------|-----------------|--------------|---------------|-------------|
| Slim Orca (bsz=2) | OOM ❌ | OOM ❌ | ✓ | ✓ | ✓ |✓ |
| code | [Code](https://www.kaggle.com/danielhanchen/hf-original-slimorca-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/hf-sdpa-slimorca-t4-ddp) | [Code](https://www.kaggle.com/danielhanchen/unsloth-slimorca-t4-ddp) | | |
| seconds | OOM | OOM | 2990 | 3444 | 2351 | 831 |
| memory MB| OOM | OOM | 7594 | 8881 | | |
| % saved | OOM | OOM | | | | |
# Credits
1. [RandomInternetPreson](https://github.com/RandomInternetPreson) for confirming WSL support
2. [152334H](https://github.com/152334H) for experimental DPO support
3. [atgctg](https://github.com/atgctg) for syntax highlighting
<img src="./images/unsloth loading page render.png" width="300" />
### Thank You to
- The [llama.cpp library](https://github.com/ggml-org/llama.cpp) that lets users run and save models with Unsloth
- The Hugging Face team and their libraries: [transformers](https://github.com/huggingface/transformers) and [TRL](https://github.com/huggingface/trl)
- The PyTorch and [Torch AO](https://github.com/unslothai/unsloth/pull/3391) team for their contributions
- NVIDIA for their [NeMo DataDesigner](https://github.com/NVIDIA-NeMo/DataDesigner) library and their contributions
- And of course every single person who has contributed to or used Unsloth!

79
build.sh Normal file

@ -0,0 +1,79 @@
#!/usr/bin/env bash
set -euo pipefail

# 1. Build frontend (Vite outputs to dist/)
cd studio/frontend

# Clean stale dist to force a full rebuild
rm -rf dist

# Tailwind v4's oxide scanner respects .gitignore in parent directories.
# Python venvs create a .gitignore with "*" (ignore everything), which
# prevents Tailwind from scanning .tsx source files for class names.
# Temporarily hide any such .gitignore during the build, then restore it.
_HIDDEN_GITIGNORES=()
_dir="$(pwd)"
while [ "$_dir" != "/" ]; do
    _dir="$(dirname "$_dir")"
    if [ -f "$_dir/.gitignore" ] && grep -qx '\*' "$_dir/.gitignore" 2>/dev/null; then
        mv "$_dir/.gitignore" "$_dir/.gitignore._twbuild"
        _HIDDEN_GITIGNORES+=("$_dir/.gitignore")
    fi
done

_restore_gitignores() {
    for _gi in "${_HIDDEN_GITIGNORES[@]+"${_HIDDEN_GITIGNORES[@]}"}"; do
        mv "${_gi}._twbuild" "$_gi" 2>/dev/null || true
    done
}
trap _restore_gitignores EXIT

# Use bun for install if available (faster), fall back to npm.
_install_ok=false
if command -v bun &>/dev/null; then
    if bun install; then
        _install_ok=true
    else
        echo "⚠ bun install failed, falling back to npm"
        rm -rf node_modules
    fi
fi
if [ "$_install_ok" != "true" ]; then
    if ! npm install; then
        echo "❌ ERROR: package install failed" >&2
        exit 1
    fi
fi
npm run build # outputs to studio/frontend/dist/

_restore_gitignores
trap - EXIT

# Validate CSS output -- catch truncated Tailwind builds before packaging
MAX_CSS_SIZE=$(find dist/assets -name '*.css' -exec wc -c {} + 2>/dev/null | sort -n | tail -1 | awk '{print $1}')
if [ -z "$MAX_CSS_SIZE" ]; then
    echo "❌ ERROR: No CSS files were emitted into dist/assets."
    echo " The frontend build may have failed silently."
    exit 1
fi
if [ "$MAX_CSS_SIZE" -lt 100000 ]; then
    echo "❌ ERROR: Largest CSS file is only $((MAX_CSS_SIZE / 1024))KB (expected >100KB)."
    echo " Tailwind may not have scanned all source files."
    echo " Check for .gitignore files blocking the Tailwind oxide scanner."
    exit 1
fi
echo "✅ Frontend CSS validated (${MAX_CSS_SIZE} bytes)"

cd ../..

# 2. Clean old artifacts
rm -rf build dist *.egg-info

# 3. Build wheel
python -m build

# 4. Optionally publish
if [ "${1:-}" = "publish" ]; then
    python -m twine upload dist/*
fi
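The two safeguards in `build.sh` above (finding blanket `.gitignore` files in parent directories, and checking that the emitted CSS is not truncated) can be sketched in Python for readers who want to test the logic in isolation. This is an illustrative sketch, not part of the commit. The function names are hypothetical, and only the 100000-byte threshold is taken from the script:

```python
from pathlib import Path
from typing import List, Optional

MIN_CSS_BYTES = 100_000  # same threshold as the shell check


def blanket_gitignores(start: Path) -> List[Path]:
    """Return .gitignore files in parent directories that contain a bare '*' line."""
    hits = []
    for parent in start.resolve().parents:  # parents only, like the shell loop
        candidate = parent / ".gitignore"
        try:
            if candidate.is_file() and "*" in candidate.read_text(errors="ignore").splitlines():
                hits.append(candidate)
        except OSError:
            continue  # unreadable file: skip it, as grep's errors are silenced
    return hits


def largest_css_size(assets_dir: Path) -> Optional[int]:
    """Size in bytes of the largest *.css file under assets_dir, or None if there is none."""
    sizes = [p.stat().st_size for p in assets_dir.rglob("*.css") if p.is_file()]
    return max(sizes) if sizes else None


def css_build_looks_complete(assets_dir: Path, min_bytes: int = MIN_CSS_BYTES) -> bool:
    """True when at least one CSS file was emitted and the largest reaches min_bytes."""
    size = largest_css_size(assets_dir)
    return size is not None and size >= min_bytes
```

One difference worth knowing: when several CSS files are passed to a single `wc -c` call, `wc` also prints a `total` line, so the shell pipeline may report the combined size rather than the largest file. The sketch takes the per-file maximum.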

7
cli.py Normal file

@ -0,0 +1,7 @@
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0
from unsloth_cli import app
if __name__ == "__main__":
    app()

BIN
images/Assistant.png Normal file
Binary file not shown. After: 81 KiB

Binary file not shown. After: 12 KiB

BIN
images/Merge.png Normal file
Binary file not shown. After: 31 KiB

BIN
images/Run.png Normal file
Binary file not shown. After: 11 KiB

Binary file not shown. After: 162 KiB

Binary file not shown. After: 159 KiB

BIN
images/Terminal_Type.png Normal file
Binary file not shown. After: 68 KiB

BIN
images/Where_Terminal.png Normal file
Binary file not shown. After: 175 KiB

Binary file not shown. After: 18 KiB

Binary file not shown. After: 12 KiB

Binary file not shown. After: 12 KiB

Binary file not shown. After: 12 KiB

Binary file not shown. After: 11 KiB

Binary file not shown. After: 69 KiB

BIN
images/ollama.png Normal file
Binary file not shown. After: 66 KiB

Binary file not shown. After: 11 KiB

BIN
images/unsloth end.png Normal file
Binary file not shown. After: 871 KiB

Binary file not shown. Before: 57 KiB, After: 354 KiB

Binary file not shown. Before: 56 KiB, After: 59 KiB

Binary file not shown. Before: 58 KiB, After: 351 KiB

Binary file not shown. Before: 59 KiB, After: 59 KiB

BIN
images/unsloth sticker.png Normal file
Binary file not shown. After: 1.2 MiB

1125
install.ps1 Normal file

File diff suppressed because it is too large Load diff

1671
install.sh Executable file

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

179
scripts/enforce_kwargs_spacing.py Executable file

@ -0,0 +1,179 @@
#!/usr/bin/env python3
"""Ensure keyword arguments use spaces around '=', prune redundant pass statements."""
from __future__ import annotations
import ast
import argparse
import io
import sys
import tokenize
from collections import defaultdict
from pathlib import Path
def enforce_spacing(text: str) -> tuple[str, bool]:
    """Return updated text with keyword '=' padded by spaces, plus change flag."""
    lines = text.splitlines(keepends=True)
    if not lines:
        return text, False
    offsets: dict[int, int] = defaultdict(int)
    changed = False
    reader = io.StringIO(text).readline
    for token in tokenize.generate_tokens(reader):
        if token.type != tokenize.OP or token.string != "=":
            continue
        line_index = token.start[0] - 1
        col = token.start[1] + offsets[line_index]
        if line_index < 0 or line_index >= len(lines):
            continue
        line = lines[line_index]
        if col >= len(line) or line[col] != "=":
            continue
        line_changed = False
        # Insert a space before '=' when missing and not preceded by whitespace.
        if col > 0 and line[col - 1] not in {" ", "\t"}:
            line = f"{line[:col]} {line[col:]}"
            offsets[line_index] += 1
            col += 1
            line_changed = True
            changed = True
        # Insert a space after '=' when missing and not followed by whitespace or newline.
        next_index = col + 1
        if next_index < len(line) and line[next_index] not in {" ", "\t", "\n", "\r"}:
            line = f"{line[:next_index]} {line[next_index:]}"
            offsets[line_index] += 1
            line_changed = True
            changed = True
        if line_changed:
            lines[line_index] = line
    if not changed:
        return text, False
    return "".join(lines), True
def remove_redundant_passes(text: str) -> tuple[str, bool]:
    """Drop pass statements that share a block with other executable code."""
    try:
        tree = ast.parse(text)
    except SyntaxError:
        return text, False
    redundant: list[ast.Pass] = []
    def visit(node: ast.AST) -> None:
        for attr in ("body", "orelse", "finalbody"):
            value = getattr(node, attr, None)
            if not isinstance(value, list) or len(value) <= 1:
                continue
            for stmt in value:
                if isinstance(stmt, ast.Pass):
                    redundant.append(stmt)
            for stmt in value:
                if isinstance(stmt, ast.AST):
                    visit(stmt)
        handlers = getattr(node, "handlers", None)
        if handlers:
            for handler in handlers:
                visit(handler)
    visit(tree)
    if not redundant:
        return text, False
    lines = text.splitlines(keepends=True)
    changed = False
    for node in sorted(
        redundant, key=lambda item: (item.lineno, item.col_offset), reverse=True
    ):
        start = node.lineno - 1
        end = (node.end_lineno or node.lineno) - 1
        if start >= len(lines):
            continue
        changed = True
        if start == end:
            line = lines[start]
            col_start = node.col_offset
            col_end = node.end_col_offset or (col_start + 4)
            segment = line[:col_start] + line[col_end:]
            lines[start] = segment if segment.strip() else ""
            continue
        # Defensive fall-back for unexpected multi-line 'pass'.
        prefix = lines[start][: node.col_offset]
        lines[start] = prefix if prefix.strip() else ""
        for idx in range(start + 1, end):
            lines[idx] = ""
        suffix = lines[end][(node.end_col_offset or 0) :]
        lines[end] = suffix
    # Normalise to ensure lines end with newlines except at EOF.
    result_lines: list[str] = []
    for index, line in enumerate(lines):
        if not line:
            continue
        if index < len(lines) - 1 and not line.endswith("\n"):
            result_lines.append(f"{line}\n")
        else:
            result_lines.append(line)
    return "".join(result_lines), changed
def process_file(path: Path) -> bool:
    try:
        with tokenize.open(path) as handle:
            original = handle.read()
            encoding = handle.encoding
    except (OSError, SyntaxError) as exc:  # SyntaxError from tokenize on invalid python
        print(f"Failed to read {path}: {exc}", file=sys.stderr)
        return False
    updated, changed = enforce_spacing(original)
    updated, removed = remove_redundant_passes(updated)
    if changed or removed:
        path.write_text(updated, encoding=encoding)
        return True
    return False
def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("files", nargs="+", help="Python files to fix")
    args = parser.parse_args(argv)
    touched: list[Path] = []
    self_path = Path(__file__).resolve()
    for entry in args.files:
        path = Path(entry)
        # Skip modifying this script to avoid self-edit loops.
        if path.resolve() == self_path:
            continue
        if not path.exists() or path.is_dir():
            continue
        if process_file(path):
            touched.append(path)
    if touched:
        for path in touched:
            print(f"Adjusted kwarg spacing in {path}")
    return 0
if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
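Reviewer note: the token-based detection used by `enforce_spacing` can be sketched in isolation. The helper name `equals_columns` is hypothetical and not part of the script; it relies only on the standard `io` and `tokenize` modules. The sketch suggests why the approach leaves `==` and any `=` inside a string literal alone, and also that plain assignments are matched, not only keyword arguments.

```python
import io
import tokenize


def equals_columns(source: str) -> list[tuple[int, int]]:
    """Return (line, column) for every bare '=' operator token in source.

    Hypothetical helper for illustration; it mirrors the filter used in
    enforce_spacing (type OP, string exactly "=").
    """
    reader = io.StringIO(source).readline
    return [
        token.start
        for token in tokenize.generate_tokens(reader)
        if token.type == tokenize.OP and token.string == "="
    ]


sample = 'f(a=1, b="x=y")\nok = a == b\n'
print(equals_columns(sample))  # [(1, 3), (1, 8), (2, 3)]
```

The `=` inside `"x=y"` is part of a STRING token and `==` is a single operator token, so neither is reported. The assignment on the second line is reported, which matches the script's behaviour of padding every `=` operator it sees.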

169
scripts/install_gemma4_mlx.sh Executable file

@ -0,0 +1,169 @@
#!/bin/bash
set -e
# ============================================================
# Gemma 4 MLX — One-command setup + inference
#
# Usage:
# bash install_gemma4_mlx.sh [--venv-dir DIR]
#
# This script:
# 1. Creates a Python virtual environment
# 2. Installs uv, mlx-vlm, transformers
# ============================================================
# ── Output style (inspired by unsloth/install.sh) ─────────────
RULE=""
_rule_i=0
while [ "$_rule_i" -lt 52 ]; do
RULE="${RULE}─"
_rule_i=$((_rule_i + 1))
done
if [ -n "${NO_COLOR:-}" ]; then
C_TITLE= C_DIM= C_OK= C_WARN= C_ERR= C_RST=
elif [ -t 1 ] || [ -n "${FORCE_COLOR:-}" ]; then
_ESC="$(printf '\033')"
C_TITLE="${_ESC}[38;5;117m"
C_DIM="${_ESC}[38;5;245m"
C_OK="${_ESC}[38;5;108m"
C_WARN="${_ESC}[38;5;136m"
C_ERR="${_ESC}[91m"
C_RST="${_ESC}[0m"
else
C_TITLE= C_DIM= C_OK= C_WARN= C_ERR= C_RST=
fi
step() { printf " ${C_DIM}%-18.18s${C_RST}${3:-$C_OK}%s${C_RST}\n" "$1" "$2"; }
substep() { printf " ${C_DIM}%-18s${2:-$C_DIM}%s${C_RST}\n" "" "$1"; }
fail() { step "error" "$1" "$C_ERR"; exit 1; }
# ── Parse flags ───────────────────────────────────────────────
VENV_DIR=""
_next_is_venv=false
for arg in "$@"; do
if [ "$_next_is_venv" = true ]; then
VENV_DIR="$arg"
_next_is_venv=false
continue
fi
case "$arg" in
--venv-dir) _next_is_venv=true ;;
esac
done
# Default venv location
if [ -z "$VENV_DIR" ]; then
VENV_DIR="$HOME/.unsloth/unsloth_gemma4_mlx"
fi
# ── Banner ────────────────────────────────────────────────────
echo ""
printf " ${C_TITLE}%s${C_RST}\n" "💎 Gemma 4 MLX Installer"
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
echo ""
# ── Platform check ────────────────────────────────────────────
if [ "$(uname)" != "Darwin" ]; then
fail "MLX requires macOS with Apple Silicon. Detected: $(uname)"
fi
_ARCH=$(uname -m)
if [ "$_ARCH" != "arm64" ]; then
step "warning" "Apple Silicon recommended (detected: $_ARCH)" "$C_WARN"
fi
step "platform" "macOS ($_ARCH)"
# ── Detect Python ─────────────────────────────────────────────
PYTHON=""
for _candidate in python3.12 python3.11 python3.13 python3; do
if command -v "$_candidate" >/dev/null 2>&1; then
PYTHON="$_candidate"
break
fi
done
if [ -z "$PYTHON" ]; then
fail "Python 3 not found. Install via: brew install python@3.12"
fi
_PY_VERSION=$("$PYTHON" -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}')")
step "python" "$PYTHON ($_PY_VERSION)"
# ── Create virtual environment ────────────────────────────────
if [ -x "$VENV_DIR/bin/python" ]; then
step "venv" "using existing environment"
substep "$VENV_DIR"
else
step "venv" "creating virtual environment"
substep "$VENV_DIR"
mkdir -p "$(dirname "$VENV_DIR")"
"$PYTHON" -m venv "$VENV_DIR"
fi
# ── Install uv ───────────────────────────────────────────────
if ! command -v uv >/dev/null 2>&1; then
step "uv" "installing uv package manager..."
_uv_tmp=$(mktemp)
curl -LsSf "https://astral.sh/uv/install.sh" -o "$_uv_tmp"
sh "$_uv_tmp" </dev/null >/dev/null 2>&1
rm -f "$_uv_tmp"
if [ -f "$HOME/.local/bin/env" ]; then
. "$HOME/.local/bin/env"
fi
export PATH="$HOME/.local/bin:$PATH"
substep "done"
else
step "uv" "found $(uv --version 2>/dev/null || echo 'uv')"
fi
_VENV_PY="$VENV_DIR/bin/python"
# ── Install dependencies ──────────────────────────────────────
step "install" "installing mlx-vlm..."
uv pip install --python "$_VENV_PY" -q mlx-vlm
substep "done"
step "install" "installing transformers>=5.5.0..."
if uv pip install --python "$_VENV_PY" -q "transformers>=5.5.0" 2>/dev/null; then
substep "installed from PyPI"
else
substep "PyPI install failed (Python <3.10?), trying GitHub..."
if uv pip install --python "$_VENV_PY" -q "git+https://github.com/huggingface/transformers.git@v5.5-release" 2>/dev/null; then
substep "installed from huggingface/transformers v5.5-release"
else
step "warning" "could not install transformers>=5.5.0" "$C_WARN"
substep "tried: PyPI, huggingface/transformers v5.5-release"
fi
fi
# ── Verify installation ──────────────────────────────────────
if "$_VENV_PY" -c "import mlx_vlm"; then
substep "mlx-vlm verified"
else
fail "Installation verification failed."
fi
# ── Done ──────────────────────────────────────────────────────
echo ""
printf " ${C_TITLE}%s${C_RST}\n" "Gemma 4 MLX installed!"
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
echo ""
step "available models" "unsloth/gemma-4-E2B-it-UD-MLX-4bit"
substep "unsloth/gemma-4-E4B-it-UD-MLX-4bit"
substep "unsloth/gemma-4-26b-a4b-it-UD-MLX-4bit"
substep "unsloth/gemma-4-31b-it-UD-MLX-4bit"
echo ""
step "venv activate" "source ${VENV_DIR}/bin/activate"
echo ""
step "text chat" "python -m mlx_vlm.chat --model unsloth/gemma-4-E2B-it-UD-MLX-4bit"
echo ""
step "vision chat" "python -m mlx_vlm.chat --model unsloth/gemma-4-31b-it-UD-MLX-4bit"
substep "Use /image path/to/image.jpg to load an image"
echo ""
step "gradio UI" "python -m mlx_vlm.chat_ui --model unsloth/gemma-4-31b-it-UD-MLX-4bit"
echo ""
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
echo ""
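Reviewer note: the flag-parsing loop in this script can be modelled in a few lines of Python to make its edge cases visible. This is a sketch under the assumption that it mirrors the shell loop faithfully; the function name `parse_venv_dir` is hypothetical.

```python
def parse_venv_dir(argv: list[str], default: str) -> str:
    """Model of the shell loop: take the value following --venv-dir, else default.

    Hypothetical helper for illustration only.
    """
    venv_dir = ""
    next_is_venv = False
    for arg in argv:
        if next_is_venv:
            # The previous argument was --venv-dir, so this one is its value.
            venv_dir = arg
            next_is_venv = False
            continue
        if arg == "--venv-dir":
            next_is_venv = True
        # Any other argument is silently ignored, as in the shell `case`.
    # An empty value falls back to the default, like the `[ -z "$VENV_DIR" ]` check.
    return venv_dir or default


print(parse_venv_dir(["--venv-dir", "/tmp/venv"], "/home/me/.unsloth/default"))
print(parse_venv_dir(["--venv-dir"], "/home/me/.unsloth/default"))
```

Under this model a trailing `--venv-dir` with no value does not raise an error; the default location is used instead, and unknown flags are ignored. Whether that is intended is a question for the author.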


@ -0,0 +1,191 @@
#!/bin/bash
set -e
# ============================================================
# Qwen3.6 MLX — One-command setup + inference
#
# Usage:
# bash install_qwen3_6_mlx.sh [--venv-dir DIR]
#
# This script:
# 1. Creates a Python virtual environment
# 2. Installs uv, mlx-vlm, transformers, torch, torchvision
# ============================================================
# ── Output style (inspired by unsloth/install.sh) ─────────────
RULE=""
_rule_i=0
while [ "$_rule_i" -lt 52 ]; do
RULE="${RULE}─"
_rule_i=$((_rule_i + 1))
done
if [ -n "${NO_COLOR:-}" ]; then
C_TITLE= C_DIM= C_OK= C_WARN= C_ERR= C_RST=
elif [ -t 1 ] || [ -n "${FORCE_COLOR:-}" ]; then
_ESC="$(printf '\033')"
C_TITLE="${_ESC}[38;5;117m"
C_DIM="${_ESC}[38;5;245m"
C_OK="${_ESC}[38;5;108m"
C_WARN="${_ESC}[38;5;136m"
C_ERR="${_ESC}[91m"
C_RST="${_ESC}[0m"
else
C_TITLE= C_DIM= C_OK= C_WARN= C_ERR= C_RST=
fi
step() { printf " ${C_DIM}%-18.18s${C_RST}${3:-$C_OK}%s${C_RST}\n" "$1" "$2"; }
substep() { printf " ${C_DIM}%-18s${2:-$C_DIM}%s${C_RST}\n" "" "$1"; }
fail() { step "error" "$1" "$C_ERR"; exit 1; }
# ── Parse flags ───────────────────────────────────────────────
VENV_DIR=""
_next_is_venv=false
for arg in "$@"; do
if [ "$_next_is_venv" = true ]; then
VENV_DIR="$arg"
_next_is_venv=false
continue
fi
case "$arg" in
--venv-dir) _next_is_venv=true ;;
esac
done
# Default venv location
if [ -z "$VENV_DIR" ]; then
VENV_DIR="$HOME/.unsloth/unsloth_qwen3_6_mlx"
fi
# ── Banner ────────────────────────────────────────────────────
echo ""
printf " ${C_TITLE}%s${C_RST}\n" "Qwen3.6 MLX Installer"
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
echo ""
# ── Platform check ────────────────────────────────────────────
if [ "$(uname)" != "Darwin" ]; then
fail "MLX requires macOS with Apple Silicon. Detected: $(uname)"
fi
_ARCH=$(uname -m)
if [ "$_ARCH" != "arm64" ]; then
step "warning" "Apple Silicon recommended (detected: $_ARCH)" "$C_WARN"
fi
step "platform" "macOS ($_ARCH)"
# ── Detect Python ─────────────────────────────────────────────
PYTHON=""
for _candidate in python3.12 python3.11 python3.13 python3; do
if command -v "$_candidate" >/dev/null 2>&1; then
PYTHON="$_candidate"
break
fi
done
if [ -z "$PYTHON" ]; then
fail "Python 3 not found. Install via: brew install python@3.12"
fi
_PY_VERSION=$("$PYTHON" -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}')")
step "python" "$PYTHON ($_PY_VERSION)"
# ── Create virtual environment ────────────────────────────────
if [ -x "$VENV_DIR/bin/python" ]; then
step "venv" "using existing environment"
substep "$VENV_DIR"
else
step "venv" "creating virtual environment"
substep "$VENV_DIR"
mkdir -p "$(dirname "$VENV_DIR")"
"$PYTHON" -m venv "$VENV_DIR"
fi
# ── Install uv ───────────────────────────────────────────────
if ! command -v uv >/dev/null 2>&1; then
step "uv" "installing uv package manager..."
_uv_tmp=$(mktemp)
curl -LsSf "https://astral.sh/uv/install.sh" -o "$_uv_tmp"
sh "$_uv_tmp" </dev/null
rm -f "$_uv_tmp"
if [ -f "$HOME/.local/bin/env" ]; then
. "$HOME/.local/bin/env"
fi
export PATH="$HOME/.local/bin:$PATH"
substep "done"
else
step "uv" "found $(uv --version 2>/dev/null || echo 'uv')"
fi
_VENV_PY="$VENV_DIR/bin/python"
# ── Install dependencies ──────────────────────────────────────
step "install" "installing mlx-vlm..."
uv pip install --python "$_VENV_PY" -q mlx-vlm
substep "done"
step "install" "installing transformers>=5.2.0..."
if uv pip install --python "$_VENV_PY" -q "transformers>=5.2.0"; then
substep "installed from PyPI"
else
substep "PyPI install failed, trying GitHub..."
if uv pip install --python "$_VENV_PY" -q "git+https://github.com/huggingface/transformers.git"; then
substep "installed from huggingface/transformers main"
else
fail "Could not install transformers>=5.2.0 (required for Qwen3.5/3.6 model support). Please check your Python version (>=3.10 required) and network connection, then try again."
fi
fi
step "install" "installing torch + torchvision (needed for Qwen3 VL processor)..."
uv pip install --python "$_VENV_PY" -q torch torchvision
substep "done"
# ── Verify installation ──────────────────────────────────────
if "$_VENV_PY" -c "import mlx_vlm; import torch; import torchvision; import transformers"; then
substep "mlx-vlm + torch + transformers verified"
else
fail "Installation verification failed. Please ensure Python >=3.10 and try again."
fi
# ── Apply patches for multi-turn image chat ──────────────────
_PATCH_BASE="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/fix/ui-fix/unsloth/models/patches/mlx_vlm_qwen3_5"
_SITE_PKGS=$("$_VENV_PY" -c "import site; print(site.getsitepackages()[0])")
step "patch" "fixing multi-turn image chat..."
if curl -sSLf "${_PATCH_BASE}/qwen3_5.py" -o "${_SITE_PKGS}/mlx_vlm/models/qwen3_5/qwen3_5.py"; then
substep "patched qwen3_5.py (MRoPE position reset)"
else
step "warning" "failed to download qwen3_5.py patch — multi-turn image chat may not work" "$C_WARN"
fi
if curl -sSLf "${_PATCH_BASE}/generate.py" -o "${_SITE_PKGS}/mlx_vlm/generate.py"; then
substep "patched generate.py (mask trim on cache reuse)"
else
step "warning" "failed to download generate.py patch — multi-turn image chat may not work" "$C_WARN"
fi
# Clear pycache so patches take effect
find "${_SITE_PKGS}/mlx_vlm" -name "__pycache__" -type d -exec rm -rf {} + 2>/dev/null || true
substep "cleared bytecode cache"
# ── Done ──────────────────────────────────────────────────────
echo ""
printf " ${C_TITLE}%s${C_RST}\n" "Qwen3.6 MLX installed!"
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
echo ""
step "available models" "unsloth/Qwen3.6-35B-A3B-UD-MLX-3bit"
substep "unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit"
substep "unsloth/Qwen3.6-35B-A3B-MLX-8bit"
echo ""
step "venv activate" "source ${VENV_DIR}/bin/activate"
echo ""
step "vision chat" "python -m mlx_vlm.chat --model unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit"
substep "Use /image path/to/image.jpg to load an image"
echo ""
step "gradio UI" "python -m mlx_vlm.chat_ui --model unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit"
echo ""
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
echo ""
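Reviewer note: the patch step writes into `site.getsitepackages()[0]` joined with `mlx_vlm/...`. A different technique, shown here only as a sketch, is to ask the import system where a package actually lives. The helper name `package_dir` is hypothetical, and the script does not do this.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory of an importable top-level package, or None.

    Hypothetical helper: uses importlib.util.find_spec rather than
    site.getsitepackages(), so it reflects where the package was found.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a plain module rather than a package.
        return None
    return Path(next(iter(spec.submodule_search_locations)))


# Demonstrated on a standard-library package so the sketch runs anywhere.
print(package_dir("json"))
print(package_dir("surely_not_installed_pkg"))
```

Run with the venv interpreter, `package_dir("mlx_vlm")` would presumably give the directory the patched files should land in, and a `None` result could be used to skip the patch step with a warning.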

30
scripts/run_ruff_format.py Executable file

@ -0,0 +1,30 @@
#!/usr/bin/env python3
"""Run `ruff format` followed by kwarg spacing enforcement."""
from __future__ import annotations
import subprocess
import sys
from pathlib import Path
HERE = Path(__file__).resolve().parent
def main(argv: list[str]) -> int:
    files = [arg for arg in argv if Path(arg).exists()]
    if not files:
        return 0
    ruff_cmd = [sys.executable, "-m", "ruff", "format", *files]
    ruff_proc = subprocess.run(ruff_cmd)
    if ruff_proc.returncode != 0:
        return ruff_proc.returncode
    spacing_script = HERE / "enforce_kwargs_spacing.py"
    spacing_cmd = [sys.executable, str(spacing_script), *files]
    spacing_proc = subprocess.run(spacing_cmd)
    return spacing_proc.returncode
if __name__ == "__main__":
    raise SystemExit(main(sys.argv[1:]))
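Reviewer note: the wrapper above runs two commands in sequence and returns the first failing exit code. The same pattern can be sketched for any number of steps. The function name `run_steps` is hypothetical; the commands in the usage lines are stand-ins so the sketch runs without `ruff` installed.

```python
import subprocess
import sys


def run_steps(steps: list[list[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in steps:
        code = subprocess.run(cmd).returncode
        if code != 0:
            # Stop early so later steps never run on a failed result.
            return code
    return 0


# Stand-in commands: the second one fails, so the third is never started.
ok = [sys.executable, "-c", "raise SystemExit(0)"]
bad = [sys.executable, "-c", "raise SystemExit(3)"]
never = [sys.executable, "-c", "raise SystemExit(5)"]
print(run_steps([ok, bad, never]))  # 3
```

Returning the child's exit code unchanged lets a caller such as a pre-commit hook see exactly which status the failing formatter reported.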

661
studio/LICENSE.AGPL-3.0 Normal file
View file

@ -0,0 +1,661 @@
GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU Affero General Public License is a free, copyleft license for
software and other kinds of works, specifically designed to ensure
cooperation with the community in the case of network server software.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
our General Public Licenses are intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
Developers that use our General Public Licenses protect your rights
with two steps: (1) assert copyright on the software, and (2) offer
you this License which gives you legal permission to copy, distribute
and/or modify the software.
A secondary benefit of defending all users' freedom is that
improvements made in alternate versions of the program, if they
receive widespread use, become available for other developers to
incorporate. Many developers of free software are heartened and
encouraged by the resulting cooperation. However, in the case of
software used on network servers, this result may fail to come about.
The GNU General Public License permits making a modified version and
letting the public access it on a server without ever releasing its
source code to the public.
The GNU Affero General Public License is designed specifically to
ensure that, in such cases, the modified source code becomes available
to the community. It requires the operator of a network server to
provide the source code of the modified version running there to the
users of that server. Therefore, public use of a modified version, on
a publicly accessible server, gives the public access to the source
code of the modified version.
An older license, called the Affero General Public License and
published by Affero, was designed to accomplish similar goals. This is
a different license, not a version of the Affero GPL, but Affero has
released a new version of the Affero GPL which permits relicensing under
this license.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU Affero General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Remote Network Interaction; Use with the GNU General Public License.
Notwithstanding any other provision of this License, if you modify the
Program, your modified version must prominently offer all users
interacting with it remotely through a computer network (if your version
supports such interaction) an opportunity to receive the Corresponding
Source of your version by providing access to the Corresponding Source
from a network server at no charge, through some standard or customary
means of facilitating copying of software. This Corresponding Source
shall include the Corresponding Source for any work covered by version 3
of the GNU General Public License that is incorporated pursuant to the
following paragraph.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the work with which it is combined will remain governed by version
3 of the GNU General Public License.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU Affero General Public License from time to time. Such new versions
will be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU Affero General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU Affero General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU Affero General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If your software can interact with users remotely through a computer
network, you should also make sure that it provides a way for users to
get its source. For example, if your program is a web application, its
interface could display a "Source" link that leads users to an archive
of the code. There are many ways you could offer source, and different
solutions will be better for different programs; see section 13 for the
specific requirements.
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU AGPL, see
<https://www.gnu.org/licenses/>.


@ -0,0 +1,153 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"id": "6b87de59",
"metadata": {
"id": "6b87de59"
},
"source": [
"To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n",
"<div class=\"align-center\">\n",
"<a href=\"https://unsloth.ai/\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"115\"></a>\n",
"<a href=\"https://discord.gg/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord button.png\" width=\"145\"></a>\n",
"<a href=\"https://unsloth.ai/docs/\"><img src=\"https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true\" width=\"125\"></a> Join Discord if you need help + ⭐ <i>Star us on <a href=\"https://github.com/unslothai/unsloth\">Github</a> </i> ⭐\n",
"</div>\n",
"\n",
"To install Unsloth Studio on your local device, follow [our guide](https://unsloth.ai/docs/new/unsloth-studio/install). Unsloth Studio is licensed [AGPL-3.0](https://github.com/unslothai/unsloth/blob/main/studio/LICENSE.AGPL-3.0).\n",
"\n",
"### Unsloth Studio\n",
"\n",
"Train and run open models with [**Unsloth Studio**](https://unsloth.ai/docs/new/unsloth-studio/start). NEW! Installation should now only take 2 mins!\n",
"\n",
"\n",
"We are actively working on making Unsloth Studio install on Colab T4 GPUs faster.\n",
"\n",
"[Features](https://unsloth.ai/docs/new/unsloth-studio#features) • [Quickstart](https://unsloth.ai/docs/new/unsloth-studio/start) • [Data Recipes](https://unsloth.ai/docs/new/unsloth-studio/data-recipe) • [Studio Chat](https://unsloth.ai/docs/new/unsloth-studio/chat) • [Export](https://unsloth.ai/docs/new/unsloth-studio/export)"
]
},
{
"cell_type": "markdown",
"id": "e4206349",
"metadata": {
"id": "e4206349"
},
"source": [
"<p align=\"left\"><img src=\"https://github.com/unslothai/unsloth/raw/main/studio/frontend/public/studio%20github%20landscape%20colab%20display.png\" width=\"600\"></p>"
]
},
{
"cell_type": "markdown",
"id": "27da2957",
"metadata": {
"id": "27da2957"
},
"source": [
"### Setup: Clone repo and run setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "27e68f91",
"metadata": {
"id": "27e68f91"
},
"outputs": [],
"source": "!git clone --depth 1 --branch main https://github.com/unslothai/unsloth.git\n%cd /content/unsloth\n!chmod +x studio/setup.sh && ./studio/setup.sh"
},
{
"cell_type": "markdown",
"id": "3e1771a9",
"metadata": {
"id": "3e1771a9"
},
"source": [
"### Start Unsloth Studio"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "277e431e",
"metadata": {
"id": "277e431e"
},
"outputs": [],
"source": [
"import sys, time\n",
"sys.path.insert(0, \"/content/unsloth/studio/backend\")\n",
"from colab import start\n",
"start()"
]
},
{
"cell_type": "code",
"source": [
"from google.colab import output\n",
"output.serve_kernel_port_as_iframe(8888, height = 1200, width = \"100%\")\n",
"for _ in range(10000): time.sleep(300), print(\"=\", end = \"\")"
],
"metadata": {
"id": "wb9UELh--XzX"
},
"id": "wb9UELh--XzX",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "f2b0c6a1",
"metadata": {
"id": "f2b0c6a1"
},
"source": [
"And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n",
"\n",
"Some other resources:\n",
"1. Looking to use Unsloth locally? Read our [Installation Guide](https://unsloth.ai/docs/get-started/install) for details on installing Unsloth on Windows, Docker, AMD, Intel GPUs.\n",
"2. Learn how to do Reinforcement Learning with our [RL Guide and notebooks](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide).\n",
"3. Read our guides and notebooks for [Text-to-speech (TTS)](https://unsloth.ai/docs/basics/text-to-speech-tts-fine-tuning) and [vision](https://unsloth.ai/docs/basics/vision-fine-tuning) model support.\n",
"4. Explore our [LLM Tutorials Directory](https://unsloth.ai/docs/models/tutorials-how-to-fine-tune-and-run-llms) to find dedicated guides for each model.\n",
"5. Need help with Inference? Read our [Inference & Deployment page](https://unsloth.ai/docs/basics/inference-and-deployment) for details on using vLLM, llama.cpp, Ollama etc.\n",
"\n",
"<div class=\"align-center\">\n",
" <a href=\"https://unsloth.ai\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"115\"></a>\n",
" <a href=\"https://discord.gg/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord.png\" width=\"145\"></a>\n",
" <a href=\"https://unsloth.ai/docs/\"><img src=\"https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true\" width=\"125\"></a>\n",
"\n",
" Join Discord if you need help + ⭐️ <i>Star us on <a href=\"https://github.com/unslothai/unsloth\">Github</a> </i> ⭐️\n",
"\n",
" <b>This notebook is licensed <a href=\"https://github.com/unslothai/unsloth/blob/main/studio/LICENSE.AGPL-3.0\">AGPL-3.0</a></b>\n",
"</div>"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

studio/__init__.py Normal file

@ -0,0 +1,2 @@
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0


@ -0,0 +1,2 @@
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0


@ -0,0 +1,59 @@
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0
"""
Compatibility shim for Anaconda/conda-forge Python builds.
Anaconda modifies sys.version to include distributor metadata between pipe
characters, e.g. '3.12.4 | packaged by Anaconda, Inc. | (main, ...) [MSC ...]'.
Python's platform._sys_version() has a hardcoded regex that cannot parse this,
raising ValueError. CPython closed this as "not planned" (cpython#102396).
This module seeds platform._sys_version_cache so the stdlib parser never sees
the problematic string, fixing the import chain:
structlog -> rich.pretty -> attrs._compat -> platform.python_implementation()
Import this module before any library imports that may trigger the above chain.
Safe to import multiple times (no-op if cache is already seeded or no pipes).
"""
import platform
import re
import sys


def _seed_sys_version_cache() -> None:
    """One-shot cache prime: parse a cleaned sys.version and seed the cache."""
    raw = sys.version
    # Strip paired |...| segments (Anaconda, conda-forge metadata)
    cleaned = re.sub(r"\s*\|[^|]*\|\s*", " ", raw).strip()
    # Format B: "ver (build) | label | (build_dup) \n[compiler]"
    # After pipe-strip, two consecutive (...) groups remain; drop the second.
    cleaned = re.sub(r"(\([^)]*\))\s+\([^)]*\)", r"\1", cleaned)
    if "|" in cleaned:
        # Unpaired pipe remaining -- keep version + everything from "(" onward
        m = re.match(r"([\w.+]+)\s*", cleaned)
        p = cleaned.find("(")
        if m and p > 0:
            cleaned = m.group(0) + cleaned[p:]
    if cleaned == raw:
        return  # Nothing to fix
    # Parse the cleaned string through the real stdlib parser
    try:
        result = platform._sys_version(cleaned)
    except ValueError:
        return  # Cleaning didn't produce a parseable string; don't make things worse
    # Seed the cache so future calls with the raw string skip parsing entirely
    cache = getattr(platform, "_sys_version_cache", None)
    if isinstance(cache, dict):
        cache[raw] = result


if "|" in sys.version:
    _seed_sys_version_cache()
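
To make the pipe-stripping step concrete, the sketch below applies the same two substitutions to a sample Anaconda-style version string. The `clean_sys_version` helper and the sample string are illustrative assumptions and are not part of the module; only the two regular expressions are copied from the function above.

```python
import re


def clean_sys_version(raw: str) -> str:
    """Hypothetical helper: apply the shim's two regex substitutions to a version string."""
    # Drop a paired "| ... |" segment such as "| packaged by Anaconda, Inc. |".
    cleaned = re.sub(r"\s*\|[^|]*\|\s*", " ", raw).strip()
    # If two parenthesised groups end up adjacent, keep only the first.
    cleaned = re.sub(r"(\([^)]*\))\s+\([^)]*\)", r"\1", cleaned)
    return cleaned


# Assumed sample, modelled on the format quoted in the module docstring.
sample = (
    "3.12.4 | packaged by Anaconda, Inc. | "
    "(main, Jun 18 2024, 15:03:56) [MSC v.1929 64 bit (AMD64)]"
)

print(clean_sys_version(sample))
```

A string without pipes passes through unchanged, which matches the module's "nothing to fix" early return.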


@ -0,0 +1,2 @@
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0


@ -0,0 +1,2 @@
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0


@ -0,0 +1,42 @@
model: unsloth/Qwen2.5-0.5B
data:
  dataset: tatsu-lab/alpaca
  format_type: auto
training:
  training_type: full
  max_seq_length: 2048
  load_in_4bit: false
  output_dir: outputs
  num_epochs: 1
  learning_rate: 2e-5
  batch_size: 1
  gradient_accumulation_steps: 4
  warmup_steps: 5
  max_steps: 0
  save_steps: 0
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: false
  gradient_checkpointing: "unsloth"
lora:
  lora_r: 64
  lora_alpha: 16
  lora_dropout: 0.0
  target_modules: ""
  vision_all_linear: false
  use_rslora: false
  use_loftq: false
  finetune_vision_layers: true
  finetune_language_layers: true
  finetune_attention_modules: true
  finetune_mlp_modules: true
logging:
  enable_wandb: false
  wandb_project: unsloth-training
  enable_tensorboard: false
  tensorboard_dir: runs
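
As a rough illustration of how `batch_size` and `gradient_accumulation_steps` combine, the sketch below computes the number of samples that contribute to one optimizer step. The `effective_batch_size` helper and the `num_devices` parameter are assumptions made for the example; how Studio consumes these fields is not shown in this diff.

```python
# Values copied from the training section of the config above.
training = {"batch_size": 1, "gradient_accumulation_steps": 4}


def effective_batch_size(cfg: dict, num_devices: int = 1) -> int:
    """Hypothetical helper: samples per optimizer step, assuming num_devices GPUs."""
    return cfg["batch_size"] * cfg["gradient_accumulation_steps"] * num_devices


print(effective_batch_size(training))  # 1 * 4 * 1 = 4
```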


@ -0,0 +1,398 @@
{
"_comment": "Per-model-family inference parameter defaults. Sources: (1) Ollama params blobs, (2) Existing Unsloth Studio YAML configs. Patterns ordered longest-match-first.",
"families": {
"qwen3.6": {
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"min_p": 0.0,
"repetition_penalty": 1.0,
"presence_penalty": 1.5
},
"qwen3.5": {
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"min_p": 0.0,
"repetition_penalty": 1.0,
"presence_penalty": 1.5
},
"qwen3-coder": {
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"qwen3-next": {
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"qwen3-vl": {
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"qwen3": {
"temperature": 0.6,
"top_p": 0.95,
"top_k": 20,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"qwen2.5-coder": {
"temperature": 1.5,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.1,
"repetition_penalty": 1.0
},
"qwen2.5-vl": {
"temperature": 1.5,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.1,
"repetition_penalty": 1.0
},
"qwen2.5-omni": {
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"qwen2.5-math": {
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"qwen2.5": {
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"qwen2-vl": {
"temperature": 1.5,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.1,
"repetition_penalty": 1.0
},
"qwen2": {
"temperature": 0.7,
"top_p": 0.8,
"top_k": 20,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"qwq": {
"temperature": 0.6,
"top_p": 0.95,
"top_k": 40,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"gemma-4": {
"temperature": 1.0,
"top_p": 0.95,
"top_k": 64,
"min_p": 0.0,
"repetition_penalty": 1.0,
"presence_penalty": 0.0
},
"gemma-3n": {
"temperature": 1.0,
"top_p": 0.95,
"top_k": 64,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"gemma-3": {
"temperature": 1.0,
"top_p": 0.95,
"top_k": 64,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"medgemma": {
"temperature": 1.0,
"top_p": 0.95,
"top_k": 64,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"gemma-2": {
"temperature": 1.0,
"top_p": 0.95,
"top_k": 64,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"llama-4": {
"temperature": 1.0,
"top_p": 0.9,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"llama-3.3": {
"temperature": 1.5,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.1,
"repetition_penalty": 1.0
},
"llama-3.2": {
"temperature": 1.5,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.1,
"repetition_penalty": 1.0
},
"llama-3.1": {
"temperature": 1.5,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.1,
"repetition_penalty": 1.0
},
"llama-3": {
"temperature": 1.5,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.1,
"repetition_penalty": 1.0
},
"phi-4": {
"temperature": 0.8,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.0,
"repetition_penalty": 1.0
},
"phi-3": {
"temperature": 0.7,
"top_p": 0.9,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"mistral-nemo": {
"temperature": 0.7,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"mistral-small": {
"temperature": 0.15,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"mistral-large": {
"temperature": 0.7,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"magistral": {
"temperature": 0.7,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"ministral": {
"temperature": 0.15,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"devstral": {
"temperature": 0.7,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"pixtral": {
"temperature": 1.5,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.1,
"repetition_penalty": 1.0
},
"deepseek-r1": {
"temperature": 0.6,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"deepseek-v3": {
"temperature": 0.6,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"deepseek-ocr": {
"temperature": 0.0,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"glm-5": {
"temperature": 1.0,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"glm-4": {
"temperature": 1.0,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"nemotron": {
"temperature": 1.0,
"top_p": 1.0,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"minimax-m2.5": {
"temperature": 1.0,
"top_p": 0.95,
"top_k": 40,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"minimax": {
"temperature": 1.0,
"top_p": 0.95,
"top_k": 40,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"gpt-oss": {
"temperature": 1.0,
"top_p": 1.0,
"top_k": 0,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"granite-4": {
"temperature": 0.0,
"top_p": 1.0,
"top_k": 0,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"kimi-k2": {
"temperature": 0.6,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"kimi": {
"temperature": 0.6,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"lfm2": {
"temperature": 0.1,
"top_p": 0.1,
"top_k": 50,
"min_p": 0.15,
"repetition_penalty": 1.05
},
"smollm": {
"temperature": 0.7,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"olmo": {
"temperature": 0.7,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"falcon": {
"temperature": 0.7,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"ernie": {
"temperature": 0.7,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"seed": {
"temperature": 0.7,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"grok": {
"temperature": 1.0,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
},
"mimo": {
"temperature": 0.7,
"top_p": 0.95,
"top_k": -1,
"min_p": 0.01,
"repetition_penalty": 1.0
}
},
"patterns": [
"qwen3.6", "qwen3.5",
"qwen3-coder", "qwen3-next", "qwen3-vl", "qwen3",
"qwen2.5-coder", "qwen2.5-vl", "qwen2.5-omni", "qwen2.5-math", "qwen2.5",
"qwen2-vl", "qwen2",
"qwq",
"gemma-4", "gemma-3n", "gemma-3", "medgemma", "gemma-2",
"llama-4", "llama-3.3", "llama-3.2", "llama-3.1", "llama-3",
"phi-4", "phi-3",
"mistral-nemo", "mistral-small", "mistral-large", "magistral", "ministral",
"devstral", "pixtral",
"deepseek-r1", "deepseek-v3", "deepseek-ocr",
"glm-5", "glm-4",
"nemotron",
"minimax-m2.5", "minimax",
"gpt-oss", "granite-4",
"kimi-k2", "kimi",
"lfm2", "smollm", "olmo", "falcon", "ernie", "seed", "grok", "mimo"
]
}
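The `patterns` list above places more specific names before their prefixes (`qwen3-coder` before `qwen3`, `minimax-m2.5` before `minimax`, `kimi-k2` before `kimi`). The sketch below shows one way such an ordered list could be consumed, assuming a first-match substring lookup on the lowercased model id. The function name `resolve_family`, the matching rule, and the example model ids are assumptions for illustration; only a few entries from the file are reproduced.

```python
# Hypothetical first-match lookup; the loader in this repository may work differently.
# Excerpt of the ordered "patterns" list from the JSON above.
PATTERNS = ["mistral-small", "minimax-m2.5", "minimax", "kimi-k2", "kimi", "lfm2"]

# Matching presets, copied from the JSON above.
PRESETS = {
    "mistral-small": {"temperature": 0.15, "top_p": 0.95, "top_k": -1, "min_p": 0.01, "repetition_penalty": 1.0},
    "minimax-m2.5": {"temperature": 1.0, "top_p": 0.95, "top_k": 40, "min_p": 0.01, "repetition_penalty": 1.0},
    "minimax": {"temperature": 1.0, "top_p": 0.95, "top_k": 40, "min_p": 0.01, "repetition_penalty": 1.0},
    "kimi-k2": {"temperature": 0.6, "top_p": 0.95, "top_k": -1, "min_p": 0.01, "repetition_penalty": 1.0},
    "kimi": {"temperature": 0.6, "top_p": 0.95, "top_k": -1, "min_p": 0.01, "repetition_penalty": 1.0},
    "lfm2": {"temperature": 0.1, "top_p": 0.1, "top_k": 50, "min_p": 0.15, "repetition_penalty": 1.05},
}


def resolve_family(model_id):
    """Return the first pattern contained in the model id, or None if nothing matches."""
    name = model_id.lower()
    for pattern in PATTERNS:  # list order decides which pattern wins
        if pattern in name:
            return pattern
    return None


def resolve_preset(model_id):
    """Return the preset for the matched family, or None if no pattern matches."""
    family = resolve_family(model_id)
    return PRESETS[family] if family is not None else None
```

Under this assumed rule the ordering matters: if `minimax` were listed before `minimax-m2.5`, the more specific entry would never be reached.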

@@ -0,0 +1,42 @@
model: unsloth/Qwen2.5-0.5B
data:
dataset: tatsu-lab/alpaca
format_type: auto
training:
training_type: lora
max_seq_length: 2048
load_in_4bit: true
output_dir: outputs
num_epochs: 1
learning_rate: 0.0002
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 0
save_steps: 0
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: false
gradient_checkpointing: "unsloth"
lora:
lora_r: 64
lora_alpha: 16
lora_dropout: 0.0
target_modules: "q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj"
vision_all_linear: false
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: unsloth-training
enable_tensorboard: false
tensorboard_dir: runs
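This file gives `target_modules` as one comma-separated string, while the per-model files later in this diff use a YAML list. A minimal sketch of normalizing both shapes to a list follows; the helper name `normalize_target_modules` is hypothetical and not taken from this diff.

```python
def normalize_target_modules(value):
    """Accept either a comma-separated string or a list and return a list of module names."""
    if isinstance(value, str):
        # Split on commas and drop empty fragments and surrounding whitespace.
        return [part.strip() for part in value.split(",") if part.strip()]
    return list(value)


# The string form used in this file.
from_string = normalize_target_modules(
    "q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj"
)

# The list form used by the per-model defaults.
from_list = normalize_target_modules(["all-linear"])
```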

@@ -0,0 +1,56 @@
# Default model training parameters
# Used for models without specific configurations
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_ratio: 0.1
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 16
lora_alpha: 16
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 0.7
top_p: 0.95
top_k: -1
min_p: 0.01
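The header comment says these defaults are used for models without specific configurations. How the loader combines files is not shown in this diff, so the following is only a sketch, assuming a recursive merge in which model-specific values override the defaults section by section. `deep_merge` is a hypothetical helper; the override values are taken from the Qwen3-Embedding file in this diff.

```python
import copy


def deep_merge(defaults, overrides):
    """Return a new dict: nested sections are merged, scalar values are overridden."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Subset of the default file above.
DEFAULTS = {
    "training": {"max_seq_length": 2048, "learning_rate": 2e-4, "batch_size": 2},
    "lora": {"lora_r": 16, "lora_alpha": 16},
    "inference": {"temperature": 0.7, "top_p": 0.95, "top_k": -1, "min_p": 0.01},
}

# Subset of the Qwen3-Embedding defaults, which define no inference section.
MODEL_OVERRIDES = {
    "training": {"max_seq_length": 512, "learning_rate": 3e-5, "batch_size": 256},
    "lora": {"lora_r": 32, "lora_alpha": 32},
}

effective = deep_merge(DEFAULTS, MODEL_OVERRIDES)
```

With this merge rule, a model file that omits `inference` would inherit the default sampling values, and `DEFAULTS` itself is left unmodified.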

@@ -0,0 +1,43 @@
# Model defaults for unsloth/Qwen3-Embedding-0.6B
# Based on Qwen3_Embedding_(0_6B).py embedding notebook
# Also applies to: unsloth/Qwen3-Embedding-4B
training:
max_seq_length: 512
# num_epochs: 2
num_epochs: 0
learning_rate: 3e-5
batch_size: 256
gradient_accumulation_steps: 1
warmup_ratio: 0.03
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: false
gradient_checkpointing: false
optim: "adamw_8bit"
lr_scheduler_type: "constant_with_warmup"
lora:
lora_r: 32
lora_alpha: 32
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "embedding-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 50
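The embedding defaults use a large `batch_size` with `gradient_accumulation_steps: 1` and express warmup as `warmup_ratio`, while most other files use a fixed `warmup_steps`. A short sketch of the arithmetic follows, assuming the ratio is applied to the total number of optimizer steps and rounded up. The helper names and the `num_devices` parameter are illustrative, not part of these configs.

```python
import math


def effective_batch_size(batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples contributing to one optimizer step."""
    return batch_size * gradient_accumulation_steps * num_devices


def warmup_steps_from_ratio(warmup_ratio, total_steps):
    """Convert a warmup ratio into a step count (rounding up is an assumption)."""
    return math.ceil(warmup_ratio * total_steps)


# Values from the Qwen3-Embedding file above.
embedding_batch = effective_batch_size(256, 1)
embedding_warmup = warmup_steps_from_ratio(0.03, 30)

# Values from the generic default file (batch_size 2, gradient_accumulation_steps 4).
default_batch = effective_batch_size(2, 4)
```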

@@ -0,0 +1,39 @@
# Model defaults for unsloth/all-MiniLM-L6-v2
# Based on All_MiniLM_L6_v2.py embedding notebook
training:
max_seq_length: 512
# num_epochs: 2
num_epochs: 0
learning_rate: 2e-4
batch_size: 256
gradient_accumulation_steps: 1
warmup_ratio: 0.03
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: false
gradient_checkpointing: false
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 64
lora_alpha: 128
lora_dropout: 0.0
target_modules:
- "value"
- "key"
- "dense"
- "query"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "embedding-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 50

@@ -0,0 +1,39 @@
# Model defaults for unsloth/bge-m3
# Based on BGE_M3.py embedding notebook
training:
max_seq_length: 512
# num_epochs: 2
num_epochs: 0
learning_rate: 3e-5
batch_size: 256
gradient_accumulation_steps: 1
warmup_ratio: 0.03
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: false
gradient_checkpointing: false
optim: "adamw_8bit"
lr_scheduler_type: "constant_with_warmup"
lora:
lora_r: 32
lora_alpha: 64
lora_dropout: 0.0
target_modules:
- "key"
- "query"
- "dense"
- "value"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "embedding-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 50

@@ -0,0 +1,42 @@
# Model defaults for unsloth/embeddinggemma-300m
# Based on EmbeddingGemma_(300M).py embedding notebook
training:
max_seq_length: 1024
# num_epochs: 1
num_epochs: 0
learning_rate: 2e-5
batch_size: 64
gradient_accumulation_steps: 2
warmup_ratio: 0.03
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: false
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 32
lora_alpha: 64
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "embedding-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 5

@@ -0,0 +1,38 @@
# Model defaults for unsloth/gte-modernbert-base
# Based on ModernBert.py embedding notebook
training:
max_seq_length: 512
# num_epochs: 2
num_epochs: 0
learning_rate: 3e-5
batch_size: 256
gradient_accumulation_steps: 1
warmup_ratio: 0.03
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: false
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "constant_with_warmup"
lora:
lora_r: 64
lora_alpha: 128
lora_dropout: 0.0
target_modules:
- "Wi"
- "Wo"
- "Wqkv"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "embedding-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 50

@@ -0,0 +1,47 @@
# Model defaults for unsloth/ERNIE-4.5-21B-A3B-PT
# Based on ERNIE_4_5_21B_A3B_PT-Conversational.ipynb
# Also applies to: unsloth/ERNIE-4.5-21B-A3B-PT
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 4
gradient_accumulation_steps: 2
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 16
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false

@@ -0,0 +1,55 @@
# Model defaults for unsloth/ERNIE-4.5-VL-28B-A3B-PT
# Based on ERNIE_4_5_VL_28B_A3B_PT_Vision.ipynb
# Also applies to: unsloth/ERNIE-4.5-VL-28B-A3B-PT
# added inference parameters from unsloth notebook
training:
trust_remote_code: true
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 2
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 16
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: true
temperature: 1.5
min_p: 0.1
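The `inference` block above pairs a high `temperature` (1.5) with `min_p: 0.1`. The sketch below illustrates min-p filtering, assuming temperature is applied to the logits first and that tokens whose probability falls below `min_p` times the top probability are dropped before renormalizing. Real samplers may order these steps differently; `min_p_filter` and the example logits are illustrative.

```python
import math


def min_p_filter(logits, temperature, min_p):
    """Return renormalized probabilities after temperature scaling and min-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; temperature 0 means greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Softmax with the maximum subtracted for numerical stability.
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only tokens at least min_p times as likely as the most likely token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]


# Three candidate tokens; the last is far less likely than the top one and gets dropped.
filtered = min_p_filter([3.0, 2.0, -3.0], temperature=1.5, min_p=0.1)
```

The threshold scales with the top probability, so the filter prunes more when the model is confident and less when the distribution is flat, which is why it is often combined with a higher temperature.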

@@ -0,0 +1,47 @@
# Model defaults for tiiuae/Falcon-H1-0.5B-Instruct
# Based on Falcon_H1_(0.5B)-Alpaca.ipynb
# Also applies to: tiiuae/Falcon-H1-0.5B-Instruct, unsloth/Falcon-H1-0.5B-Instruct
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 8
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: false
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 16
lora_alpha: 16
lora_dropout: 0.1
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false

@@ -0,0 +1,50 @@
# Model defaults for unsloth/codegemma-7b-bnb-4bit
# Based on CodeGemma_(7B)-Conversational.ipynb
# Also applies to: unsloth/codegemma-7b, google/codegemma-7b
# added inference parameters from Ollama
training:
trust_remote_code: false
max_seq_length: 4096
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 1
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 16
lora_alpha: 16
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 0
top_p: 0.9

@@ -0,0 +1,53 @@
# Model defaults for unsloth/functiongemma-270m-it
# Based on FunctionGemma_(270M).ipynb
# Also applies to: unsloth/functiongemma-270m-it-unsloth-bnb-4bit, google/functiongemma-270m-it
# added inference parameters from unsloth guides
training:
trust_remote_code: false
max_seq_length: 4096
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 4
gradient_accumulation_steps: 2
warmup_steps: 10
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 128
lora_alpha: 256
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_k: 64
top_p: 0.95
min_p: 0.0

@@ -0,0 +1,46 @@
# Model defaults for unsloth/gemma-2-27b-bnb-4bit
# Based on Gemma2_(9B)-Alpaca.ipynb (same defaults for larger models)
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 16
lora_alpha: 16
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
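Most per-model files set `num_epochs: 0` with `max_steps: 30`, while the first YAML file in this diff does the opposite (`num_epochs: 1`, `max_steps: 0`). The sketch below shows one plausible reading, assuming a positive `max_steps` takes precedence and `0` means "not set". That precedence rule, the helper name, and the dataset size of 1000 examples are assumptions for illustration.

```python
import math


def planned_optimizer_steps(num_examples, batch_size, gradient_accumulation_steps,
                            num_epochs, max_steps):
    """Return the number of optimizer steps a run would take under the assumed rule."""
    if max_steps > 0:
        # Assumed: an explicit step budget overrides the epoch count.
        return max_steps
    examples_per_step = batch_size * gradient_accumulation_steps
    steps_per_epoch = math.ceil(num_examples / examples_per_step)
    return steps_per_epoch * num_epochs


# Step-limited run, as in the per-model defaults.
capped = planned_optimizer_steps(1000, 2, 4, num_epochs=0, max_steps=30)

# Epoch-limited run, as in the first YAML file.
full_epoch = planned_optimizer_steps(1000, 2, 4, num_epochs=1, max_steps=0)
```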

@@ -0,0 +1,47 @@
# Model defaults for unsloth/gemma-2-2b
# Based on Gemma2_(2B)-Alpaca.ipynb
# Also applies to: unsloth/gemma-2-2b-bnb-4bit, google/gemma-2-2b
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 16
lora_alpha: 16
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
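The LLM defaults set `train_on_completions: true`, whereas the embedding defaults set it to `false`. This diff does not show how the flag is implemented; a common way to train only on the response is to mask prompt tokens in the labels so they do not contribute to the loss. The sketch below assumes that approach. The helper name and the token ids are illustrative; `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_prompt_labels(input_ids, prompt_length):
    """Copy input_ids into labels and hide the first prompt_length tokens from the loss."""
    labels = list(input_ids)
    for i in range(min(prompt_length, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels


# Two prompt tokens followed by two completion tokens (made-up ids).
labels = mask_prompt_labels([5, 6, 7, 8], prompt_length=2)
```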

@@ -0,0 +1,53 @@
# Model defaults for unsloth/gemma-3-270m-it
# Based on Gemma3_(270M).ipynb
# Also applies to: unsloth/gemma-3-270m-it-unsloth-bnb-4bit, google/gemma-3-270m-it, unsloth/gemma-3-270m-it-bnb-4bit
# added inference parameters from unsloth guides
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 5e-5
batch_size: 4
gradient_accumulation_steps: 1
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 128
lora_alpha: 128
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_k: 64
top_p: 0.95
min_p: 0.0

@@ -0,0 +1,51 @@
# Model defaults for unsloth/gemma-3-27b-it
# Based on Gemma3_(27B)_A100-Conversational.ipynb
# Also applies to: unsloth/gemma-3-27b-it-unsloth-bnb-4bit, google/gemma-3-27b-it, unsloth/gemma-3-27b-it-bnb-4bit
# added inference parameters from unsloth guides
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 8
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_k: 64
top_p: 0.95
min_p: 0.0

@@ -0,0 +1,51 @@
# Model defaults for unsloth/gemma-3-4b-it
# Based on Gemma3_(4B).ipynb
# Also applies to: unsloth/gemma-3-4b-it-unsloth-bnb-4bit, google/gemma-3-4b-it, unsloth/gemma-3-4b-it-bnb-4bit
# added inference parameters from unsloth guides
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 8
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_k: 64
top_p: 0.95
min_p: 0.0

@@ -0,0 +1,51 @@
# Model defaults for unsloth/gemma-3-4b-pt
# Based on Gemma3_(4B)-Vision.ipynb
# Also applies to: unsloth/gemma-3-4b-pt-unsloth-bnb-4bit, google/gemma-3-4b-pt, unsloth/gemma-3-4b-pt-bnb-4bit
# added inference parameters from unsloth guides
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 2
num_epochs: 0
learning_rate: 2e-4
batch_size: 1
gradient_accumulation_steps: 4
warmup_ratio: 0.03
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: true
optim: "adamw_torch_fused"
lr_scheduler_type: "cosine"
lora:
lora_r: 16
lora_alpha: 16
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_k: 64
top_p: 0.95
min_p: 0.0

@@ -0,0 +1,53 @@
# Model defaults for unsloth/gemma-3n-E4B-it
# Based on Gemma3N_(4B)-Conversational.ipynb
# Also applies to: unsloth/gemma-3n-E4B-it-unsloth-bnb-4bit, google/gemma-3n-E4B-it
# added inference parameters from unsloth guides
training:
trust_remote_code: false
max_seq_length: 1024
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 1
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 8
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
audio_input: true
inference:
trust_remote_code: false
temperature: 1.0
top_k: 64
top_p: 0.95
min_p: 0.0

@@ -0,0 +1,53 @@
# Model defaults for unsloth/gemma-3n-E4B
# Based on Gemma3N_(4B)-Vision.ipynb
# Also applies to: unsloth/gemma-3n-E4B-unsloth-bnb-4bit, google/gemma-3n-E4B
# added inference parameters from unsloth guides
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 2
num_epochs: 0
learning_rate: 2e-4
batch_size: 1
gradient_accumulation_steps: 4
warmup_ratio: 0.03
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: true
optim: "adamw_torch_fused"
lr_scheduler_type: "cosine"
lora:
lora_r: 32
lora_alpha: 32
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
audio_input: true
inference:
trust_remote_code: false
temperature: 1.0
top_k: 64
top_p: 0.95
min_p: 0.0

@@ -0,0 +1,47 @@
# Model defaults for unsloth/gemma-4-26B-A4B-it
# Also applies to: google/gemma-4-26B-A4B-it, unsloth/gemma-4-26B-A4B-it-GGUF
training:
trust_remote_code: false
max_seq_length: 2048
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 8
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_p: 0.95
top_k: 64
min_p: 0.0

@@ -0,0 +1,47 @@
# Model defaults for unsloth/gemma-4-26B-A4B (base/pretrained)
# Also applies to: google/gemma-4-26B-A4B
training:
trust_remote_code: false
max_seq_length: 2048
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 8
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_p: 0.95
top_k: 64
min_p: 0.0

@@ -0,0 +1,47 @@
# Model defaults for unsloth/gemma-4-31B-it
# Also applies to: google/gemma-4-31B-it, unsloth/gemma-4-31B-it-GGUF
training:
trust_remote_code: false
max_seq_length: 2048
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 8
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_p: 0.95
top_k: 64
min_p: 0.0

@@ -0,0 +1,47 @@
# Model defaults for unsloth/gemma-4-31B (base/pretrained)
# Also applies to: google/gemma-4-31B
training:
trust_remote_code: false
max_seq_length: 2048
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 8
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_p: 0.95
top_k: 64
min_p: 0.0

@@ -0,0 +1,47 @@
# Model defaults for unsloth/gemma-4-E2B-it
# Also applies to: google/gemma-4-E2B-it, unsloth/gemma-4-E2B-it-GGUF
training:
trust_remote_code: false
max_seq_length: 2048
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 8
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_p: 0.95
top_k: 64
min_p: 0.0

@@ -0,0 +1,47 @@
# Model defaults for unsloth/gemma-4-E2B (base/pretrained)
# Also applies to: google/gemma-4-E2B
training:
trust_remote_code: false
max_seq_length: 2048
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 8
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_p: 0.95
top_k: 64
min_p: 0.0

@@ -0,0 +1,47 @@
# Model defaults for unsloth/gemma-4-E4B-it
# Also applies to: google/gemma-4-E4B-it, unsloth/gemma-4-E4B-it-GGUF
training:
trust_remote_code: false
max_seq_length: 2048
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 8
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_p: 0.95
top_k: 64
min_p: 0.0

@@ -0,0 +1,47 @@
# Model defaults for unsloth/gemma-4-E4B (base/pretrained)
# Also applies to: google/gemma-4-E4B
training:
trust_remote_code: false
max_seq_length: 2048
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 8
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_p: 0.95
top_k: 64
min_p: 0.0

@@ -0,0 +1,52 @@
# Model defaults for unsloth/gpt-oss-120b
# Based on gpt-oss-(120B)_A100-Fine-tuning.ipynb
# Also applies to: openai/gpt-oss-120b, unsloth/gpt-oss-120b-unsloth-bnb-4bit
# added inference parameters from unsloth guides
training:
trust_remote_code: false
max_seq_length: 4096
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 4
gradient_accumulation_steps: 1
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_p: 1.0
top_k: 0

@@ -0,0 +1,52 @@
# Model defaults for unsloth/gpt-oss-20b
# Based on gpt-oss-(20B)-Fine-tuning.ipynb
# Also applies to: openai/gpt-oss-20b, unsloth/gpt-oss-20b-unsloth-bnb-4bit, unsloth/gpt-oss-20b-BF16
# added inference parameters from unsloth guides
training:
trust_remote_code: false
max_seq_length: 1024
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 1
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 8
lora_alpha: 16
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.0
top_p: 1.0
top_k: 0

@@ -0,0 +1,54 @@
# Model defaults for unsloth/granite-4.0-350m
# Based on Granite4.0_350M.ipynb
# Also applies to: ibm-granite/granite-4.0-350m, unsloth/granite-4.0-350m-bnb-4bit
# added inference parameters from unsloth guides
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 32
lora_alpha: 32
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
- "shared_mlp.input_linear"
- "shared_mlp.output_linear"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 0.0
top_p: 1.0
top_k: 0

@@ -0,0 +1,54 @@
# Model defaults for unsloth/granite-4.0-h-micro
# Based on Granite4.0.ipynb
# Also applies to: ibm-granite/granite-4.0-h-micro, unsloth/granite-4.0-h-micro-bnb-4bit, unsloth/granite-4.0-h-micro-unsloth-bnb-4bit
# added inference parameters from unsloth guides
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 32
lora_alpha: 32
lora_dropout: 0.0
target_modules:
- "q_proj"
- "k_proj"
- "v_proj"
- "o_proj"
- "gate_proj"
- "up_proj"
- "down_proj"
- "shared_mlp.input_linear"
- "shared_mlp.output_linear"
use_rslora: false
use_loftq: false
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 0.0
top_p: 1.0
top_k: 0

@@ -0,0 +1,49 @@
# Model defaults for unsloth/Llama-3.2-11B-Vision-Instruct
# Based on Llama3.2_(11B)-Vision.ipynb
# Also applies to: unsloth/Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit, meta-llama/Llama-3.2-11B-Vision-Instruct, unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit
# added inference parameters from unsloth notebook
training:
trust_remote_code: false
max_seq_length: 2048
# num_epochs: 4
num_epochs: 0
learning_rate: 2e-4
batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 5
max_steps: 30
save_steps: 30
weight_decay: 0.001
random_seed: 3407
packing: false
train_on_completions: true
gradient_checkpointing: "unsloth"
optim: "adamw_8bit"
lr_scheduler_type: "linear"
lora:
lora_r: 16
lora_alpha: 16
lora_dropout: 0.0
target_modules:
- "all-linear"
use_rslora: false
use_loftq: false
finetune_vision_layers: true
finetune_language_layers: true
finetune_attention_modules: true
finetune_mlp_modules: true
logging:
enable_wandb: false
wandb_project: "llm-finetuning"
enable_tensorboard: false
tensorboard_dir: "runs"
log_frequency: 10
inference:
trust_remote_code: false
temperature: 1.5
min_p: 0.1

@@ -0,0 +1,47 @@
# Model defaults for unsloth/Llama-3.2-1B-Instruct
# Based on Llama3.2_(1B)-RAFT.ipynb
# Also applies to: unsloth/Llama-3.2-1B-Instruct-unsloth-bnb-4bit, meta-llama/Llama-3.2-1B-Instruct, unsloth/Llama-3.2-1B-Instruct-bnb-4bit, RedHatAI/Llama-3.2-1B-Instruct-FP8, unsloth/Llama-3.2-1B-Instruct-FP8-Block, unsloth/Llama-3.2-1B-Instruct-FP8-Dynamic
training:
  trust_remote_code: false
  max_seq_length: 2048
  # num_epochs: 5
  num_epochs: 0
  learning_rate: 2e-5
  batch_size: 1
  gradient_accumulation_steps: 8
  warmup_steps: 0
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: true
  optim: "adamw_torch"
  lr_scheduler_type: "cosine"
lora:
  lora_r: 16
  lora_alpha: 16
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  use_rslora: false
  use_loftq: false
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false
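
For orientation, this file trades a smaller per-device batch (1) for more accumulation steps (8), so each optimizer step still covers 8 sequences, the same as the 2 x 4 used by most other files in this diff. The sketch below works through that arithmetic. It assumes a single device and that `max_steps` counts optimizer steps (the Hugging Face Trainer convention); `effective_batch` is a hypothetical helper, not something defined in the diff.

```python
def effective_batch(batch_size: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step (hypothetical helper)."""
    return batch_size * grad_accum * num_devices


# Values copied from the training block of this file.
training = {"batch_size": 1, "gradient_accumulation_steps": 8, "max_steps": 30}

per_step = effective_batch(
    training["batch_size"], training["gradient_accumulation_steps"]
)
# Assumption: max_steps counts optimizer steps, not micro-batches.
total_sequences = per_step * training["max_steps"]

print(per_step, total_sequences)  # 8 240
```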

@@ -0,0 +1,51 @@
# Model defaults for unsloth/Llama-3.2-3B-Instruct
# Based on Llama3.2_(1B_and_3B)-Conversational.ipynb
# Also applies to: unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit, meta-llama/Llama-3.2-3B-Instruct, unsloth/Llama-3.2-3B-Instruct-bnb-4bit, RedHatAI/Llama-3.2-3B-Instruct-FP8, unsloth/Llama-3.2-3B-Instruct-FP8-Block, unsloth/Llama-3.2-3B-Instruct-FP8-Dynamic
# added inference parameters from unsloth notebook
training:
  trust_remote_code: false
  max_seq_length: 2048
  # num_epochs: 4
  num_epochs: 0
  learning_rate: 2e-4
  batch_size: 2
  gradient_accumulation_steps: 4
  warmup_steps: 5
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: "unsloth"
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
lora:
  lora_r: 16
  lora_alpha: 16
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  use_rslora: false
  use_loftq: false
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false
  temperature: 1.5
  min_p: 0.1

@@ -0,0 +1,51 @@
# Model defaults for unsloth/Llama-3.3-70B-Instruct
# Based on Llama3.3_(70B)_A100-Conversational.ipynb
# Also applies to: unsloth/Llama-3.3-70B-Instruct-unsloth-bnb-4bit, meta-llama/Llama-3.3-70B-Instruct, unsloth/Llama-3.3-70B-Instruct-bnb-4bit, RedHatAI/Llama-3.3-70B-Instruct-FP8, unsloth/Llama-3.3-70B-Instruct-FP8-Block, unsloth/Llama-3.3-70B-Instruct-FP8-Dynamic
# added inference parameters from unsloth notebook
training:
  trust_remote_code: false
  max_seq_length: 2048
  # num_epochs: 4
  num_epochs: 0
  learning_rate: 2e-4
  batch_size: 2
  gradient_accumulation_steps: 4
  warmup_steps: 5
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: "unsloth"
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
lora:
  lora_r: 16
  lora_alpha: 16
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  use_rslora: false
  use_loftq: false
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false
  temperature: 1.5
  min_p: 0.1
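
One thing to watch when loading these files: YAML 1.1 resolvers, including PyYAML's default loaders, only recognise a float in scientific notation when the mantissa contains a decimal point, so `learning_rate: 2e-4` comes back as the string `"2e-4"` rather than a number. Whether Studio's loader already coerces the value is not shown in this diff. The sketch below assumes it does not and uses a hypothetical helper, `coerce_float`, applied to an already-parsed `training` mapping.

```python
def coerce_float(value, field: str) -> float:
    """Return `value` as a float, accepting numeric strings such as "2e-4"."""
    if isinstance(value, bool):
        # bool is a subclass of int; reject it so `true` never becomes 1.0.
        raise TypeError(f"{field} must be a number, got a boolean")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        return float(value)  # raises ValueError if the text is not numeric
    raise TypeError(f"{field} must be a number, got {type(value).__name__}")


# What a YAML 1.1 loader would typically hand back for this file.
training = {"learning_rate": "2e-4", "weight_decay": 0.001}

learning_rate = coerce_float(training["learning_rate"], "learning_rate")
weight_decay = coerce_float(training["weight_decay"], "weight_decay")
print(learning_rate, weight_decay)  # 0.0002 0.001
```

Writing the value as `2.0e-4` in the YAML avoids the issue at the source, at the cost of diverging from the notebooks these defaults were copied from.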

@@ -0,0 +1,47 @@
# Model defaults for unsloth/Meta-Llama-3.1-70B-bnb-4bit
# Based on Llama3.1_(8B)-Alpaca.ipynb
# Also applies to: unsloth/Meta-Llama-3.1-8B-bnb-4bit, unsloth/Meta-Llama-3.1-8B-unsloth-bnb-4bit, meta-llama/Meta-Llama-3.1-8B, unsloth/Meta-Llama-3.1-8B, unsloth/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3.1-70B, unsloth/Meta-Llama-3.1-405B-bnb-4bit, meta-llama/Meta-Llama-3.1-405B
training:
  trust_remote_code: false
  max_seq_length: 2048
  # num_epochs: 4
  num_epochs: 0
  learning_rate: 2e-4
  batch_size: 2
  gradient_accumulation_steps: 4
  warmup_steps: 5
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: "unsloth"
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
lora:
  lora_r: 16
  lora_alpha: 16
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  use_rslora: false
  use_loftq: false
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false

@@ -0,0 +1,47 @@
# Model defaults for unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
# Based on Llama3.1_(8B)-Inference.ipynb
# Also applies to: unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit, meta-llama/Meta-Llama-3.1-8B-Instruct, unsloth/Meta-Llama-3.1-8B-Instruct, RedHatAI/Llama-3.1-8B-Instruct-FP8, unsloth/Llama-3.1-8B-Instruct-FP8-Block, unsloth/Llama-3.1-8B-Instruct-FP8-Dynamic
training:
  trust_remote_code: false
  max_seq_length: 8192
  # num_epochs: 4
  num_epochs: 0
  learning_rate: 2e-4
  batch_size: 2
  gradient_accumulation_steps: 4
  warmup_steps: 5
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: "unsloth"
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
lora:
  lora_r: 16
  lora_alpha: 16
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  use_rslora: false
  use_loftq: false
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false

@@ -0,0 +1,47 @@
# Model defaults for unsloth/llama-3-8b-Instruct-bnb-4bit
# Based on Llama3_(8B)-Conversational.ipynb
# Also applies to: unsloth/llama-3-8b-Instruct, meta-llama/Meta-Llama-3-8B-Instruct
training:
  trust_remote_code: false
  max_seq_length: 2048
  # num_epochs: 4
  num_epochs: 0
  learning_rate: 2e-4
  batch_size: 2
  gradient_accumulation_steps: 4
  warmup_steps: 5
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: "unsloth"
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
lora:
  lora_r: 16
  lora_alpha: 16
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  use_rslora: false
  use_loftq: false
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false

@@ -0,0 +1,47 @@
# Model defaults for unsloth/llama-3-8b-bnb-4bit
# Based on Llama3_(8B)-Alpaca.ipynb
# Also applies to: unsloth/llama-3-8b, meta-llama/Meta-Llama-3-8B
training:
  trust_remote_code: false
  max_seq_length: 2048
  # num_epochs: 4
  num_epochs: 0
  learning_rate: 2e-4
  batch_size: 2
  gradient_accumulation_steps: 4
  warmup_steps: 5
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: "unsloth"
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
lora:
  lora_r: 16
  lora_alpha: 16
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  use_rslora: false
  use_loftq: false
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false

@@ -0,0 +1,46 @@
# Model defaults for unsloth/Llasa-3B
# Based on Llasa_TTS_(3B).ipynb and Llasa_TTS_(1B).ipynb
# Also applies to: HKUSTAudio/Llasa-1B
# added inference parameters from unsloth notebook
training:
  trust_remote_code: false
  max_seq_length: 2048
  # num_epochs: 4
  num_epochs: 0
  learning_rate: 5e-4
  batch_size: 2
  gradient_accumulation_steps: 4
  warmup_steps: 5
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: "unsloth"
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
lora:
  lora_r: 128
  lora_alpha: 128
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "v_proj"
  use_rslora: false
  use_loftq: false
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false
  temperature: 1.2
  top_p: 1.2
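
Every file in this diff sets `lora_alpha` equal to `lora_r`, including the rank-128 adapter here. Under the standard LoRA formulation the adapter update is scaled by `lora_alpha / lora_r`, so these defaults all give a scale of 1.0 regardless of rank; with rank-stabilised LoRA (`use_rslora: true`) the divisor becomes `sqrt(lora_r)` instead. A minimal sketch of that arithmetic, using a hypothetical `lora_scaling` helper rather than any function from the diff:

```python
import math


def lora_scaling(r: int, alpha: float, use_rslora: bool = False) -> float:
    """Scale applied to the low-rank update (hypothetical helper)."""
    if r <= 0:
        raise ValueError("LoRA rank must be positive")
    # Standard LoRA divides by r; rank-stabilised LoRA divides by sqrt(r).
    return alpha / math.sqrt(r) if use_rslora else alpha / r


# Values copied from the lora block of this file.
standard = lora_scaling(r=128, alpha=128)
rank_stabilised = lora_scaling(r=128, alpha=128, use_rslora=True)

print(standard)                   # 1.0
print(round(rank_stabilised, 2))  # 11.31
```

Flipping `use_rslora` without lowering `lora_alpha` would therefore make the rank-128 update roughly eleven times larger, which is worth keeping in mind before changing that flag on its own.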

@@ -0,0 +1,56 @@
# Model defaults for unsloth/Magistral-Small-2509
# Based on Magistral_(24B)-Reasoning-Conversational.ipynb
# Also applies to: mistralai/Magistral-Small-2509, unsloth/Magistral-Small-2509-bnb-4bit
# added inference parameters from unsloth guides
training:
  trust_remote_code: false
  max_seq_length: 2048
  # num_epochs: 4
  num_epochs: 0
  learning_rate: 2e-4
  batch_size: 2
  gradient_accumulation_steps: 2
  warmup_steps: 5
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: "unsloth"
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
lora:
  lora_r: 32
  lora_alpha: 32
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  use_rslora: false
  use_loftq: false
  finetune_vision_layers: true
  finetune_language_layers: true
  finetune_attention_modules: true
  finetune_mlp_modules: true
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false
  temperature: 0.7
  min_p: 0.01
  top_p: 0.95

@@ -0,0 +1,55 @@
# Model defaults for unsloth/Ministral-3-3B-Instruct-2512
# Based on Ministral_3_VL_(3B)_Vision.ipynb
# Also applies to: unsloth/Ministral-3-3B-Instruct-2512
# added inference parameters from unsloth guides
training:
  trust_remote_code: false
  max_seq_length: 2048
  # num_epochs: 4
  num_epochs: 0
  learning_rate: 2e-4
  batch_size: 4
  gradient_accumulation_steps: 2
  warmup_steps: 5
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: "unsloth"
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
lora:
  lora_r: 32
  lora_alpha: 32
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  use_rslora: false
  use_loftq: false
  finetune_vision_layers: true
  finetune_language_layers: true
  finetune_attention_modules: true
  finetune_mlp_modules: true
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false
  temperature: 0.15
  top_p: 0.95

@@ -0,0 +1,47 @@
# Model defaults for unsloth/Mistral-Nemo-Base-2407-bnb-4bit
# Based on Mistral_Nemo_(12B)-Alpaca.ipynb
# Also applies to: unsloth/Mistral-Nemo-Base-2407, mistralai/Mistral-Nemo-Base-2407, unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit, unsloth/Mistral-Nemo-Instruct-2407, mistralai/Mistral-Nemo-Instruct-2407
training:
  trust_remote_code: false
  max_seq_length: 2048
  # num_epochs: 4
  num_epochs: 0
  learning_rate: 2e-4
  batch_size: 2
  gradient_accumulation_steps: 4
  warmup_steps: 5
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: "unsloth"
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
lora:
  lora_r: 16
  lora_alpha: 16
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  use_rslora: false
  use_loftq: false
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false

@@ -0,0 +1,47 @@
# Model defaults for unsloth/Mistral-Small-Instruct-2409
# Based on Mistral_Small_(22B)-Alpaca.ipynb
# Also applies to: unsloth/Mistral-Small-Instruct-2409-bnb-4bit, mistralai/Mistral-Small-Instruct-2409
training:
  trust_remote_code: false
  max_seq_length: 2048
  # num_epochs: 4
  num_epochs: 0
  learning_rate: 2e-4
  batch_size: 1
  gradient_accumulation_steps: 4
  warmup_steps: 5
  max_steps: 30
  save_steps: 30
  weight_decay: 0.001
  random_seed: 3407
  packing: false
  train_on_completions: true
  gradient_checkpointing: "unsloth"
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
lora:
  lora_r: 16
  lora_alpha: 16
  lora_dropout: 0.0
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  use_rslora: false
  use_loftq: false
logging:
  enable_wandb: false
  wandb_project: "llm-finetuning"
  enable_tensorboard: false
  tensorboard_dir: "runs"
  log_frequency: 10
inference:
  trust_remote_code: false

Some files were not shown because too many files have changed in this diff.