* feat: Add cactus QAT scheme support
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* test(qat): add tests for cactus QAT scheme and fix missing import
* Fix cactus QAT scheme: correct MappingType import, tighten PerGroup filter
- Drop the broken `from torchao.dtypes import MappingType` import. `MappingType`
lives in `torchao.quantization` (and `torchao.quantization.quant_primitives`);
it is not exported from `torchao.dtypes` in any supported torchao release
(verified on 0.14, 0.16, 0.17). The previous code raised `ImportError` on
every cactus call, which surfaced as a misleading 'torchao not found' error.
- Since `IntxWeightOnlyConfig` already defaults `mapping_type` to
`MappingType.SYMMETRIC`, drop the explicit kwarg entirely and remove the
import. Behavior is unchanged.
- Introduce a named `group_size = 32` constant (matches the int4 / fp8-int4
pattern in the surrounding branches) and add a `% group_size == 0`
divisibility guard to the filter. `PerGroup(32)` requires
`in_features % 32 == 0` at `quantize_()` time, otherwise torchao raises
`ValueError: in_features (N) % group_size (32) must be == 0`. The old
`in_features >= 32` filter would admit non-aligned widths (e.g. 33, 48, 65,
127) and crash `_prepare_model_for_qat` for those shapes.
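The tightened filter condition boils down to a simple alignment predicate. A minimal sketch (the function name is illustrative; the real filter also checks that the module is an `nn.Linear`):

```python
group_size = 32  # named constant, matching the int4 / fp8-int4 branches

def is_group_aligned(in_features: int) -> bool:
    # PerGroup(group_size) requires in_features % group_size == 0 at
    # quantize_() time; the old `in_features >= 32` check alone would
    # admit non-aligned widths like 33, 48, 65, or 127.
    return in_features >= group_size and in_features % group_size == 0
```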
* Warn when cactus QAT skips non-divisible Linear layers
Multiple reviewers flagged that the divisibility guard added in the
previous commit can silently leave Linear layers in full precision when
their in_features is not a multiple of 32. For currently supported
Unsloth models (Qwen, Llama, Gemma, Mistral, Phi) every Linear width is
already a multiple of 32/64/128 so this never triggers, but surfacing
the coverage gap is cheap and avoids users assuming 100% QAT coverage
when they bring a custom model with unusual shapes.
Emit a UserWarning listing up to the first 8 skipped layers whenever
the cactus filter excludes any Linear due to the modulo guard. This
keeps the lenient skip behavior (consistent with int4 / fp8-int4), but
no longer does so silently.
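The warning described above can be sketched roughly as follows (helper and constant names are hypothetical, not the actual Unsloth API):

```python
import warnings

MAX_LISTED = 8  # list at most the first 8 skipped layers

def warn_skipped_layers(skipped: list) -> None:
    # Emit a single UserWarning naming the Linear layers left in full
    # precision because in_features was not a multiple of 32.
    if not skipped:
        return
    shown = ", ".join(skipped[:MAX_LISTED])
    more = f" (+{len(skipped) - MAX_LISTED} more)" if len(skipped) > MAX_LISTED else ""
    warnings.warn(
        f"cactus QAT skipped {len(skipped)} Linear layer(s) whose in_features "
        f"is not divisible by 32; left in full precision: {shown}{more}",
        UserWarning,
    )
```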
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* feat: Add support for OLMo-3 model in mapping and tests
* Update unsloth/models/mapper.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update tests/test_get_model_name.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Fix casing, add Think variants, and align version gate for OLMo-3 PR 4678
Mapper: switch slugs from OLMo-3 to canonical Olmo-3 mixed case, drop the
non-existent unsloth/Olmo-3-7B-Instruct-bnb-4bit dead alias, and add the
already-published Olmo-3-7B-Think and Olmo-3-32B-Think Unsloth mirrors.
Loader: change the olmo3 transformers version gate from Version("4.57.0")
to Version("4.57.0.dev0") so nightly/source builds that already contain
olmo3 are not blocked, matching the OLMo-2, Gemma 3 and Cohere patterns.
* Use canonical Olmo-3 casing and cover Think variants in OLMo-3 tests
Mirrors the mapper.py fixes on pr-4678-code: HuggingFace canonical slugs
for the OLMo-3 family use mixed-case Olmo-3 (not the all-caps OLMo-3 style of OLMo-2), and
Unsloth already hosts Olmo-3-7B-Think and Olmo-3-32B-Think mirrors, so
the resolution matrix now covers all three published Olmo-3 families.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var
When set, UNSLOTH_PYTORCH_MIRROR overrides the default
https://download.pytorch.org/whl base URL in all four install scripts
(install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py).
When unset or empty, the official URL is used. This lets users behind
corporate proxies or in regions with poor connectivity to pytorch.org
point at a local mirror without patching scripts.
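The override logic is the same in all four scripts; a minimal sketch of the Python variant (the helper name is illustrative; a later commit additionally strips a trailing slash):

```python
import os

_OFFICIAL_BASE = "https://download.pytorch.org/whl"

def pytorch_whl_base() -> str:
    # A set, non-empty UNSLOTH_PYTORCH_MIRROR wins; when unset or empty,
    # fall back to the official download.pytorch.org base URL.
    mirror = os.environ.get("UNSLOTH_PYTORCH_MIRROR", "").strip()
    return mirror or _OFFICIAL_BASE
```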
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py
Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back
to the official URL when unset or empty, and preserves the value as-is
(including trailing slashes).
* Remove stale test assertions for missing install.sh messages
* Fix GPU mocking in test_get_torch_index_url.sh
Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside
get_torch_index_url so the GPU-presence checks work in tests.
Add -L flag handling to mock nvidia-smi so it passes the GPU listing
check. All 26 tests now pass on CPU-only machines.
* Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* studio: stream export worker output into the export dialog
The Export Model dialog only showed a spinner on the "Exporting..."
button while the worker subprocess was doing the actual heavy lifting.
For Merged to 16bit and GGUF / Llama.cpp exports this meant several
minutes (or more, for large models) of opaque silence, with no way to
tell whether save_pretrained_merged, convert_hf_to_gguf.py, or
llama-quantize was making progress.
This adds a live terminal-style output panel inside the export dialog,
rendered just above the Cancel / Start Export buttons and scrollable
with auto-follow-tail. It shows stdout and stderr from both the worker
process itself and any child process it spawns (GGUF converter,
llama-quantize), coloured by stream.
Backend
- core/export/worker.py: new _setup_log_capture(resp_queue) installed
before LogConfig.setup_logging. It saves the original stdout/stderr
fds, creates pipes, os.dup2's the write ends onto fds 1 and 2 (so
every child process inherits the redirected fds), and spins up two
daemon reader threads. Each thread reads bytes from a pipe, echoes
them back to the original fd (so the server console keeps working),
splits on \n and \r, and forwards each line to the resp queue as
{"type":"log","stream":"stdout|stderr","line":...,"ts":...}.
PYTHONUNBUFFERED=1 is set so nested Python converters flush
immediately.
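The fd-level capture can be sketched as follows (heavily simplified: one fd, no timestamps, and `setup_log_capture` plus its restore hook are illustrative names rather than the real worker.py API):

```python
import os
import queue
import threading

def setup_log_capture(resp_queue, fd=1, stream="stdout"):
    saved = os.dup(fd)                       # keep a handle on the real console
    read_end, write_end = os.pipe()
    os.dup2(write_end, fd)                   # children now inherit the redirected fd
    os.close(write_end)

    def reader():
        buf = b""
        while True:
            chunk = os.read(read_end, 4096)
            if not chunk:
                break
            os.write(saved, chunk)           # echo so the original console keeps working
            buf += chunk
            parts = buf.replace(b"\r", b"\n").split(b"\n")  # split on \n and \r
            buf = parts.pop()                # keep any trailing partial line
            for line in parts:
                resp_queue.put({"type": "log", "stream": stream,
                                "line": line.decode(errors="replace")})
        if buf:                              # flush the final partial line on EOF
            resp_queue.put({"type": "log", "stream": stream,
                            "line": buf.decode(errors="replace")})
        os.close(read_end)

    t = threading.Thread(target=reader, daemon=True)
    t.start()

    def restore():
        os.dup2(saved, fd)                   # closes the pipe's last write end -> EOF
        t.join(timeout=5)
        os.close(saved)

    return restore
```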
- core/export/orchestrator.py:
- Thread-safe ring buffer (collections.deque, maxlen 4000) with a
monotonically increasing seq counter. clear_logs(),
get_logs_since(cursor), get_current_log_seq(), is_export_active().
- _wait_response handles rtype == "log" by appending to the buffer
and continuing the wait loop. Status messages are also surfaced as
a "status" stream so users see high level progress alongside raw
subprocess output.
- load_checkpoint, _run_export, and cleanup_memory now wrap their
bodies with the existing self._lock (previously unused), clear the
log buffer at the start of each op, and flip _export_active in a
try/finally so the SSE endpoint can detect idle.
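The ring-buffer bookkeeping can be sketched as (class and method names are illustrative; the real orchestrator stores richer entries and more state):

```python
from collections import deque
from threading import Lock

class LogBuffer:
    def __init__(self, maxlen=4000):
        self._lock = Lock()
        self._buf = deque(maxlen=maxlen)   # (seq, entry) pairs
        self._seq = 0                      # monotonically increasing, never reset

    def append(self, entry):
        with self._lock:
            self._seq += 1
            self._buf.append((self._seq, entry))
            return self._seq

    def clear(self):
        # Drop old lines at the start of each op; the seq counter is NOT
        # reset, so reconnecting clients with an old cursor still progress.
        with self._lock:
            self._buf.clear()

    def get_since(self, cursor):
        with self._lock:
            return [(s, e) for s, e in self._buf if s > cursor]
```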
- routes/export.py:
- Wrapped every sync orchestrator call (load_checkpoint,
cleanup_memory, export_merged_model, export_base_model,
export_gguf, export_lora_adapter) in asyncio.to_thread so the
FastAPI event loop stays free during long exports. Without this
the new SSE endpoint could not be served concurrently with the
blocking export POST.
- New GET /api/export/logs/stream SSE endpoint. Honors
Last-Event-ID and a since query param for reconnect, emits log /
heartbeat / complete / error events, uses the id field to carry
the log seq so clients can resume cleanly. On first connect
without an explicit cursor it starts from the current seq so old
lines from a previous run are not replayed.
Frontend
- features/export/api/export-api.ts: streamExportLogs() helper that
authFetches the SSE endpoint and parses id / event / data fields
manually (same pattern as streamTrainingProgress in train-api.ts).
- features/export/components/export-dialog.tsx:
- Local useExportLogs(exporting) hook that opens the SSE stream when
exporting transitions to true, accumulates up to 4000 lines in
component state, and aborts on cleanup.
- New scrollable output panel rendered above DialogFooter, only
shown for Merged to 16bit and GGUF / Llama.cpp (LoRA adapter is
a fast disk write with nothing to show). Dark terminal styling
(bg-black/85, emerald text, rose for stderr, sky for status),
max-height 14rem, auto-scrolls to the bottom on new output but
stops following if the user scrolls up. A small streaming / idle
indicator is shown next to the panel title.
- DialogContent widens from sm:max-w-lg to sm:max-w-2xl when the
output panel is visible so the logs have room to breathe.
Verified
- Python smoke test (tests/smoke_export_log_capture.py): spawns a
real mp.get_context("spawn") process, installs _setup_log_capture,
confirms that parent stdout prints, parent stderr prints, AND a
child subprocess invoked via subprocess.run (both its stdout and
stderr) are all captured in the resp queue. Passes.
- Orchestrator log helpers tested in isolation: _append_log,
get_logs_since (with and without a cursor), clear_logs not
resetting seq so reconnecting clients still progress. Passes.
- routes.export imports cleanly in the studio venv and /logs/stream
shows up in router.routes.
- bun run build: tsc -b plus vite build, no TypeScript errors.
No existing export behavior is changed. If the subprocess, the SSE
endpoint, or the frontend hook fails, the export itself still runs to
completion the same way it did before, with or without logs visible.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* export dialog: trim bootstrap noise, scope logs per screen, show realpath
Several follow-ups to the live export log work:
1. Worker bootstrap noise (transformers venv activation, Unsloth banner,
"Top GGUF/hub models" lists, vision detection, 2k-step weight load
bar) is dropped from the export-dialog stream. A threading.Event
gate in worker.py defaults closed and only opens once _handle_export
actually starts; until then the reader thread still echoes lines to
the saved console fd for debugging but does not push them onto the
resp_queue. The orchestrator already spawns a fresh subprocess for
every checkpoint load, so the gate is naturally reset between runs.
2. tqdm in non-tty mode defaults to a 10s mininterval, which makes
multi-step bars look frozen in the panel. Set TQDM_MININTERVAL=0.5
in the worker env so any tqdm-driven progress emits more often.
3. The dialog's useExportLogs hook now also clears its line buffer
when exportMethod or open changes, so re-opening the dialog into a
different action's screen no longer shows the previous action's
saved output. A useElapsedSeconds tick + "Working Xs" badge in the
log header gives users a visible sign that long single-step phases
(cache copies, GGUF conversion) are still running when no new lines
are arriving.
4. ExportBackend.export_{merged,base,gguf,lora} now return
(success, message, output_path); the worker forwards output_path on
each export_*_done response, the orchestrator's _run_export passes
it to routes/export.py, which surfaces it via
ExportOperationResponse.details.output_path. The dialog's Export
Complete screen renders the resolved on-disk realpath under "Saved
to" so users can find their exported model directly.
* fix(cli): unpack 3-tuple return from export backend
ExportOrchestrator.export_{merged,base,gguf,lora} now return
(success, message, output_path) so the studio dialog can show
the on-disk realpath. The CLI still unpacked 2 values, so every
`unsloth export --format ...` crashed with ValueError before
reporting completion. Update the four call sites and surface
output_path via a "Saved to:" echo.
* fix(studio): anchor export log SSE cursor at run start
The export dialog SSE defaulted its cursor to get_current_log_seq()
at connect time, so any line emitted between the POST that kicks
off the export and the client opening the stream was buffered with
seqs 1..k and then skipped (seq <= cursor). Long-running exports
looked silent during their first seconds.
Snapshot _log_seq into _run_start_seq inside clear_logs() and
expose it via get_run_start_seq(). The SSE default cursor now uses
that snapshot, so every line emitted since the current run began
is reachable regardless of when the client connects. Old runs
still can't leak in because their seqs are <= the snapshot.
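The cursor fix amounts to something like the following (names hypothetical, locking omitted):

```python
class LogCursor:
    def __init__(self):
        self._log_seq = 0
        self._run_start_seq = 0

    def append(self):
        self._log_seq += 1
        return self._log_seq

    def clear_logs(self):
        # Snapshot where the current run starts; never reset _log_seq itself.
        self._run_start_seq = self._log_seq

    def default_cursor(self, client_cursor=None):
        # On SSE connect, an explicit client cursor wins; otherwise start
        # from the run-start snapshot, so lines emitted between the export
        # POST and the stream connect are replayed, while seqs from older
        # runs (all <= the snapshot) cannot leak in.
        return client_cursor if client_cursor is not None else self._run_start_seq
```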
* fix(studio): reconnect export log SSE on stream drop
useExportLogs launched streamExportLogs once per exporting
transition and recorded any drop in .catch(). Long GGUF exports
behind a proxy with an idle kill-timeout would silently lose the
stream for the rest of the run even though the backend already
supports Last-Event-ID resume. The "retry: 3000" directive emitted
by the backend is only meaningful to native EventSource; this
hook uses a manual fetch + ReadableStream parse so it had no
effect.
Wrap streamExportLogs in a retry loop that tracks lastSeq from
ExportLogEvent.id and passes it as since on reconnect. Backoff is
exponential with jitter, capped at 5s, reset on successful open.
The loop stops on explicit backend `complete` event or on effect
cleanup.
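The real retry loop lives in the TypeScript hook; as a language-neutral sketch of just the backoff policy (parameter names are illustrative):

```python
import random

def backoff_delays(max_delay=5.0, base=0.5):
    # Exponential backoff with jitter, capped at max_delay seconds.
    # The caller resets by creating a fresh generator on a successful open.
    attempt = 0
    while True:
        delay = min(max_delay, base * (2 ** attempt))
        yield delay * (0.5 + random.random() / 2)   # jitter in [0.5x, 1x)
        attempt += 1
```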
* fix(studio): register a second command so Typer keeps `export` as a subcommand
The CLI export unpacking tests wrap `unsloth_cli.commands.export.export`
in a fresh Typer app with a single registered command. Typer flattens a
single-command app into that command, so the test's
`runner.invoke(cli_app, ["export", ckpt, out, ...])` treats the leading
`"export"` token as an unexpected extra positional argument -- every
parametrized case failed with:
Got unexpected extra argument (.../out)
Register a harmless `noop` second command so Typer preserves subcommand
routing and the tests actually exercise the 3-tuple unpack path they
were written to guard.
Before: 4 failed
After: 4 passed
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: studio-install <studio@local.install>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
* [Studio] Install flash attn at setup time for linux
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* cleanup changes
Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Test cases
* wheel_utils: narrow url_exists exceptions and log at debug level
---------
Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
* fix: check find() return value before adding offset in try_fix_tokenizer
The `str.find()` result was checked for -1 only after adding
`len(find_text)`, turning the guard into dead code. When the substring
is absent, `start` becomes `len(find_text) - 1` (a positive number),
so the `if start == -1: continue` never triggers and the subsequent
slice extracts garbage from the tokenizer string.
Split the find and offset into two steps so the -1 check works correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add defensive guards for token_id None and end find() returning -1
- Skip loop iteration early when token_id is None to avoid constructing
a find_text that can never match valid JSON
- Guard end = tokenizer_string.find('",', start) against -1 to prevent
silent garbage extraction from malformed tokenizer strings
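The corrected control flow from both commits, as a standalone sketch (the helper name is hypothetical; the real code iterates over token ids inside try_fix_tokenizer):

```python
def extract_token_value(tokenizer_string, find_text):
    # Check find() for -1 BEFORE adding the offset; the old code added
    # len(find_text) first, so `start` was always positive and the -1
    # guard was dead code.
    start = tokenizer_string.find(find_text)
    if start == -1:                  # substring absent: skip, don't slice garbage
        return None
    start += len(find_text)          # only now move past the matched text
    end = tokenizer_string.find('",', start)
    if end == -1:                    # malformed string: no closing '",'
        return None
    return tokenizer_string[start:end]
```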
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix raw text paragraph break normalization
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Normalize horizontal whitespace before stripping non-ASCII and collapse leftover doubles
Run the [^\S\n]+ horizontal-whitespace collapse before the non-ASCII strip
so that Unicode whitespace (\u00A0, \u202F, \u2009, \u3000, \v, \f, etc.)
becomes a single ASCII space instead of being deleted outright. The prior
ordering silently merged adjacent words on HTML/PDF/OCR-sourced text:
"hello\u00a0world" used to produce "helloworld" after this PR; it now
produces "hello world".
Also drop \t from the allow-list since the horizontal-whitespace collapse
already normalizes tabs to a single space, and add a targeted [ ]{2,} pass
right after the non-ASCII strip so that a non-whitespace non-ASCII character
sitting between two spaces ("word1 © word2") does not leave an interior
double space. Without this extra pass, clean_text was not idempotent on
such inputs: the first call produced "word1  word2" (two spaces) and only
the second call collapsed it to "word1 word2". Fuzz testing over 10000 random inputs
now satisfies the idempotence invariant in every case.
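A minimal sketch of the regex ordering (this is not the full clean_text; paragraph-break handling and the rest of the pipeline are omitted):

```python
import re

def clean_text(text):
    # 1. Collapse horizontal whitespace (incl. NBSP, thin space, \v, \f)
    #    to one ASCII space BEFORE stripping non-ASCII, so Unicode spaces
    #    become word boundaries instead of being deleted.
    text = re.sub(r"[^\S\n]+", " ", text)
    # 2. Strip remaining non-ASCII characters.
    text = re.sub(r"[^\x00-\x7F]+", "", text)
    # 3. Targeted pass: a removed non-ASCII char between two spaces would
    #    leave an interior double space; collapse it so the function is
    #    idempotent.
    text = re.sub(r"[ ]{2,}", " ", text)
    return text
```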
* Add regression tests for Unicode/control whitespace and non-ASCII edge cases
Cover:
- Unicode horizontal whitespace separators (NBSP, narrow NBSP, thin space,
en/em space, ideographic space, vertical tab, form feed) normalizing to
a single ASCII space instead of being deleted.
- Mixed paragraph + Unicode whitespace realistic input ("Section\u00a01\r\n\r\nBody\ftext\u202Fhere").
- Tab collapsing and space trimming around newlines.
- Non-whitespace non-ASCII characters (copyright, accented letters, emoji)
sitting between spaces: must not leave an interior double space, and
clean_text must be idempotent on these inputs.
- Non-ASCII characters adjacent to a newline: stripping must not leave
stray leading or trailing spaces on the neighbouring line, and must not
swallow an adjacent paragraph break.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* fix windows llama.cpp compile from source issue
* undo local repo usage
* fix llama.cpp install
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix windows
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: route resolve-source-build call through Invoke-LlamaHelper
The --resolve-source-build call at the source-build resolution path
was still calling install_llama_prebuilt.py directly instead of going
through Invoke-LlamaHelper. On PS7+ with ErrorActionPreference=Stop,
stderr from the 422 response (when tag is "master") would trigger a
terminating NativeCommandError and crash setup.
* fix: suppress stderr error records from Invoke-LlamaHelper
ErrorActionPreference=Continue prevents termination but PowerShell
still displays stderr lines as visible ErrorRecord objects. Capture
all output via 2>&1 and split stdout from stderr manually so that
stderr lines never appear on the console. When StderrPath is given
the stderr content is written to that file for diagnostics.
* fix: always rebuild llama.cpp on Windows when tag is master
When the requested llama.cpp tag is "master" (a moving target), skip
the "already built" early exit so the build path runs and syncs to
the latest commit. Without this, existing llama-server binaries from
an older build (e.g. b8635 which lacks Gemma 4 support) are reused
and model loading fails.
Pinned tags (e.g. b8635) still skip the rebuild when the binary
already exists, since the tag is immutable.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
* Expand test coverage for install_llama_prebuilt.py:
- Add tests for source build plan resolution with custom repos
- Add tests for branch/commit/PR ref matching and normalization
- Add tests for manifest checksum validation
- Add tests for Windows CUDA upstream asset name patterns
- Update capsys checks to capture stderr after log() redirect
* Fix script unbound variable error
* remove stale test script, add llama.cpp metal source builds, update tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix Metal precedence, test sync, and add behavioral tests
- Move macOS arm64 Metal check before CUDA/ROCm in GPU backend
decision chain so Metal is not bypassed when nvcc is in PATH
- Remove RPATH flags from CPU fallback CMAKE_ARGS (only needed
for Metal library linking)
- Update test_llama_pr_force_and_source.py to match _CLONE_ARGS
rename from _CLONE_BRANCH_ARGS in setup.sh
- Add confirm_install_tree guard test for
existing_install_matches_choice
- Add TestMacOSMetalBuildLogic bash subprocess tests verifying
Metal flag selection, nvcc precedence, and CPU fallback behavior
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix Metal CPU fallback to also cover cmake build failures and update tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* 1. _GPU_BACKEND_FRAGMENT synced -- removed dead CPU_FALLBACK_CMAKE_ARGS= init (6/8)
2. RPATH assertion replaced -- new test_macos_arm64_cpu_fallback_args_exclude_rpath checks the actual runtime CPU_FALLBACK_CMAKE_ARGS output for @loader_path and -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON (6/8)
3. _TRY_METAL_CPU_FALLBACK=false reset after both configure-failure and build-failure fallback branches in setup.sh (4/8)
4. macOS test now removes libmtmd.0.dylib instead of the platform-agnostic convert_hf_to_gguf.py (3/8)
5. Empty-string tag test added -- test_empty_tag_omits_branch_flag for resolved_tag= (2/8)
6. RPATH checks on cmake call logs -- both fallback tests now assert @loader_path and -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON are absent from CPU fallback cmake calls, plus baseline flag preservation (multiple)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* tests clean up
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* fix: add tokenizers to no-torch runtime deps and add TORCH_CONSTRAINT for arm64 macOS py313+
Two installer fixes:
1. Add `tokenizers` to `no-torch-runtime.txt` before `transformers`.
Without it, `from transformers import AutoConfig` crashes on startup
because `--no-deps` skips transitive dependencies.
2. Add `TORCH_CONSTRAINT` variable to `install.sh`. On arm64 macOS with
Python 3.13+, tighten the torch requirement to `>=2.6` since torch
<2.6 has no cp313 arm64 wheels. The variable replaces the previously
hard-coded constraint in the uv pip install line.
Includes 66 tests (42 pytest + 24 bash) covering:
- Structural checks on install.sh, install.ps1, no-torch-runtime.txt
- Shell snippet tests with mocked python for 13 platform/version combos
- Mock uv integration verifying correct constraint string
- E2E venv tests on Python 3.12 and 3.13 confirming AutoConfig works
- Negative control proving AutoConfig fails without tokenizers
- Full no-torch sandbox regression guards (safetensors, huggingface_hub)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix incomplete no-torch manifest and align E2E tests with real --no-deps path
- Add missing transitive deps to no-torch-runtime.txt that are required
under --no-deps: regex, typing_extensions, filelock, httpx, httpcore,
certifi, idna, anyio, sniffio, h11. Without these, `from transformers
import AutoConfig` still fails after install.sh --no-torch.
- Change all E2E tests to use --no-deps (matching what install.sh does)
instead of normal dep resolution. Previous tests passed even with an
incomplete manifest because uv backfilled transitive deps.
- Rewrite negative control to derive from the real no-torch-runtime.txt
with tokenizers stripped, proving the specific fix matters.
- Replace GNU-only sed -i with heredoc in shell test for macOS compat.
- Remove unused os/sys imports from Python test file.
- Quote SKIP_TORCH and mock uv paths in bash -c strings.
* Assert install succeeds before checking import results in E2E tests
Address review feedback: test_torch_not_importable and
test_tokenizers_directly_importable in Group 3 now assert that
uv pip install returns 0 before checking import behavior. This
prevents false positives when the install itself fails silently.
* Assert install succeeds in negative control and tighten error check
- Add missing install-success assertion in test_negative_control_no_tokenizers
to prevent false positives from network/install failures.
- Tighten error message check to look for "tokenizers" in stderr or
ModuleNotFoundError, rather than the generic "No module" substring
which could match unrelated import failures.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Split out from #4741 to keep the main PR focused on installer logic.
- New test_install_llama_prebuilt_logic.py: tests for resolve logic,
fallback behavior, env_int, busy/lock handling
- New test_validate_llama_prebuilt.py: validator tests for staged
release_tag/upstream_tag handling
- New test_llama_pr_force_and_source.py: tests for PR_FORCE and
LLAMA_SOURCE maintainer defaults
- Updated test_selection_logic.py: expanded selection/fallback coverage
- Updated test_pr4562_bugfixes.py: updated bugfix tests for new logic
- Updated smoke_test_llama_prebuilt.py: minor update
* Use prebuilt llama.cpp for unsloth studio setup
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix 3 issues that cause unnecessary fallback to source build
1. Make filelock import optional -- environments without filelock
(e.g. minimal installs) crashed at import time instead of
gracefully skipping the lock.
2. Use already-verified converter script from the hydrated source
tree instead of re-downloading from raw.githubusercontent.com
with no checksum. Adds symlink with copy fallback for the
legacy filename.
3. Initialize $SkipPrebuiltInstall in setup.ps1 before first use
to prevent potential uninitialized variable errors.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Keep network fallback in ensure_converter_scripts
Prefer the local verified copy from the hydrated source tree, but
retain the original network download as a fallback if the file is
missing. Create the legacy hyphenated filename as a symlink with a
copy fallback instead of writing a second full copy.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix 4 bugs in source-build fallback and binary_env paths
- setup.ps1: Replace git pull + checkout FETCH_HEAD with fetch + checkout -B
to avoid detached HEAD state that breaks re-runs. Use pinned tag in both
fetch and clone paths.
- setup.sh: Move rm -rf after cmake/git prerequisite checks so a missing
tool no longer deletes the existing install. Add --branch tag to clone.
- install_llama_prebuilt.py: Add binary_path.parent to Linux LD_LIBRARY_PATH
in binary_env() so bundled .so files in build/bin are found even without
RPATH, matching the existing Windows PATH logic.
- Add test for binary_env LD_LIBRARY_PATH on Linux.
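The Linux branch of binary_env can be sketched as follows (signature and dedupe details are illustrative, not the exact install_llama_prebuilt.py code):

```python
import os
from pathlib import Path

def binary_env(binary_path: Path, install_dir: Path, base_env=None) -> dict:
    # Prepend the binary's own directory to LD_LIBRARY_PATH so bundled
    # .so files in build/bin resolve even without RPATH, then dedupe
    # while preserving order and avoid a trailing separator.
    env = dict(base_env if base_env is not None else os.environ)
    dirs = [str(binary_path.parent), str(install_dir)]
    existing = env.get("LD_LIBRARY_PATH", "")
    if existing:
        dirs.append(existing)
    seen, ordered = set(), []
    for d in ":".join(dirs).split(":"):
        if d and d not in seen:
            seen.add(d)
            ordered.append(d)
    env["LD_LIBRARY_PATH"] = ":".join(ordered)
    return env
```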
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Handle unresolved "latest" tag in source-build fallback clone
When tag resolution fails and the requested tag is "latest", both
setup scripts now omit --branch from git clone so the default branch
is cloned instead of failing on a nonexistent "latest" branch/tag.
Similarly, the PS1 fetch path fetches the default ref when the tag
is "latest".
* Resolve actual latest ggml-org tag instead of using literal "latest"
When both Python tag resolution attempts fail and the requested tag
is "latest", query the GitHub API for the actual latest release tag
from ggml-org/llama.cpp (e.g. b8508) instead of passing the literal
string "latest" to git clone --branch, which would fail since no
such branch/tag exists.
setup.sh uses curl + python json parsing; setup.ps1 uses
Invoke-RestMethod. Both fall back to the raw requested tag if the
API call also fails.
* Try Unsloth release repo before ggml-org when resolving latest tag
When falling back to the GitHub API to resolve "latest", query the
Unsloth release repo (unslothai/llama.cpp) first since it has the
prebuilt binaries pinned to tested tags. Only fall back to
ggml-org/llama.cpp if the Unsloth repo query fails.
* Add comprehensive sandbox tests for PR #4562 bug fixes
35 tests covering all fixes across platforms:
- binary_env cross-platform (Linux LD_LIBRARY_PATH, Windows PATH,
macOS DYLD_LIBRARY_PATH) with edge cases (dedup, ordering, existing paths)
- resolve_requested_llama_tag (concrete, latest, None, empty)
- setup.sh logic via subprocess: prereq check ordering (cmake/git missing
preserves install), pinned tag in clone, fetch+checkout -B pattern,
fetch failure warns instead of aborting
- "latest" tag resolution fallback chain (Unsloth API -> ggml-org ->
raw) with mock curl: success, failure, malformed JSON, empty body,
empty tag_name, env overrides
- Source code pattern verification for both .sh and .ps1 files
All 138 tests pass in isolated uv venv.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add binary_path.parent to macOS DYLD_LIBRARY_PATH in binary_env
macOS prebuilt .dylib files are overlaid into build/bin (same as
Linux), but binary_env only added install_dir to DYLD_LIBRARY_PATH.
Add binary_path.parent so the loader can find sibling dylibs even
without embedded loader paths.
Mirrors the existing fix for Linux LD_LIBRARY_PATH and the Windows
PATH pattern.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Guard --branch when resolved tag is "latest"; fix broken test assertion
When all API fallbacks fail and the tag stays as literal "latest",
omit --branch from git clone (clones default branch instead of
failing). Both setup.sh and setup.ps1 now check for "latest" before
passing --branch to git clone/fetch.
Also fix test_setup_ps1_clone_uses_branch_tag which used Python
tuple syntax (assert "x", "y" in z) that always passes. Changed to
assert "x" in z and "y" in z.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix macOS DYLD trailing colon, install_lock no-op, and debug log
- binary_env macOS: use dedupe_existing_dirs instead of raw string
concatenation. Eliminates trailing colon in DYLD_LIBRARY_PATH
(which causes dyld to search CWD for libraries) and deduplicates
when binary_path.parent == install_dir. Now consistent with the
Linux and Windows branches.
- install_lock: when filelock is not installed, use os.O_CREAT|O_EXCL
as a fallback exclusive file lock with timeout, instead of yielding
with no locking. Prevents concurrent installs from corrupting each
other's staging directories.
- setup.ps1: remove [DEBUG] log line that printed to every user on
every Windows setup run.
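The filelock-free fallback can be sketched as follows (simplified: no stale-PID handling, and the timeout/poll values are illustrative):

```python
import os
import time
from contextlib import contextmanager

@contextmanager
def install_lock(lock_path, timeout=60.0):
    # O_CREAT|O_EXCL makes create-or-fail atomic, so at most one process
    # holds the lock; others poll until the file disappears or they time out.
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(0.25)
    try:
        os.write(fd, str(os.getpid()).encode())
        os.fsync(fd)        # make the PID visible before anyone checks it
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```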
* Add stale-lock detection and atomic clone-then-swap
install_lock fallback (no filelock): write PID to lock file and
check if the holder process is still alive on contention. Dead PIDs
(ProcessLookupError) and unreadable lock files trigger immediate
cleanup. Live processes owned by other users (PermissionError) are
correctly recognized as alive -- the lock is not removed.
setup.sh/setup.ps1 source-build: clone into a temporary directory
first, then swap into place only on success. If git clone fails,
the existing install is preserved instead of being deleted by the
premature rm -rf.
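The clone-then-swap pattern can be sketched in Python as follows (the function name and backup strategy are illustrative; the actual shell scripts build in a `$$`-suffixed temp directory and use `mv`):

```python
import os
import shutil
import tempfile

def build_then_swap(dest, build_fn):
    """Run build_fn in a temp dir next to `dest`; swap into place only on
    success, so a failed clone/build never destroys an existing install."""
    parent = os.path.dirname(os.path.abspath(dest))
    tmp = tempfile.mkdtemp(prefix = os.path.basename(dest) + ".build.", dir = parent)
    try:
        build_fn(tmp)                      # e.g. git clone + cmake + build
    except Exception:
        shutil.rmtree(tmp, ignore_errors = True)
        raise                              # existing `dest` is untouched
    old = None
    if os.path.exists(dest):
        old = dest + ".old"
        os.rename(dest, old)               # move the old install aside
    os.rename(tmp, dest)                   # swap the new build into place
    if old:
        shutil.rmtree(old, ignore_errors = True)
```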
* Remove redundant upstream_tag != release_tag check
load_approved_release_checksums compared checksums.upstream_tag
against the Unsloth release_tag, which are different namespaces
(upstream ggml-org tag vs Unsloth published tag). This only worked
because both happened to be "b8508" by convention. Would break if
Unsloth ever uses a different release naming scheme.
The existing check at parse_approved_release_checksums (line 950)
already validates the release_tag field correctly.
* Fix lock TOCTOU race and build-in-temp-dir swap
install_lock fallback: add os.fsync(fd) after writing PID to ensure
the PID is visible to racing processes before they check. Treat
empty lock files (PID not yet written) as "wait and retry" instead
of stale, closing the window where two processes could both see an
empty file, both unlink it, and both acquire the lock.
setup.sh/setup.ps1 source-build: clone AND build in a temp directory
(LLAMA_CPP_DIR.build.$$). Only swap into the final LLAMA_CPP_DIR
after the build succeeds. If clone or cmake or build fails, the temp
dir is cleaned up and the existing working install is preserved.
Previously, rm -rf ran after clone but before build, destroying the
existing install even if the build later failed.
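Putting the pieces of the fallback lock together, a hedged Python sketch (names, timeout, and poll interval are illustrative, not the actual `install_lock` implementation):

```python
import os
import time

def acquire_pid_lock(lock_path, timeout = 60.0, poll = 0.5):
    """Acquire an exclusive lock file or raise TimeoutError.

    - O_CREAT|O_EXCL makes creation atomic across racing processes.
    - The holder's PID is written and fsync'd so racers see it before
      they check the file.
    - An empty file means the holder has not written its PID yet:
      wait and retry rather than treating the lock as stale.
    - A dead holder (ProcessLookupError from kill(pid, 0)) means a
      stale lock: remove it and retry immediately.
    - PermissionError means the holder is alive but owned by another
      user: keep waiting.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            pass
        else:
            os.write(fd, str(os.getpid()).encode())
            os.fsync(fd)  # make the PID visible before racers check it
            os.close(fd)
            return
        try:
            with open(lock_path) as f:
                pid_text = f.read().strip()
        except FileNotFoundError:
            continue  # holder released between our open() attempts
        if pid_text:
            try:
                os.kill(int(pid_text), 0)  # signal 0: existence probe only
            except (ProcessLookupError, ValueError):
                # dead holder or garbage content: clean up and retry
                try:
                    os.remove(lock_path)
                except OSError:
                    pass
                continue
            except PermissionError:
                pass  # alive but owned by another user: keep waiting
        # empty file (PID not yet written) or live holder: wait and retry
        if time.monotonic() >= deadline:
            raise TimeoutError(f"could not acquire lock {lock_path!r}")
        time.sleep(poll)

def release_pid_lock(lock_path):
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```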
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* refactor: consolidate dual venvs into single ~/.unsloth/studio/unsloth_studio
* refactor: separate install.sh (first-time) from setup.sh (smart update with PyPI version check)
* fix: install.sh calls setup.sh directly, keep both setup and update CLI commands
* fix: use importlib.resources.files() directly without _path attribute
* fix: bootstrap uv before pip upgrade to handle uv venvs without pip
* fix: frontend 404 when launched via CLI, add global symlink to ~/.local/bin
* feat: add --local flag to install.sh and unsloth studio update for branch testing
* fix: resolve repo root from script location for --local installs
* feat: add --package flag to install.sh for testing with custom package names
* feat: add --package flag to unsloth studio update
* fix: always nuke venv in install.sh for clean installs
* revert: remove Windows changes, will handle in separate PR
* fix: error when --package is passed without an argument
* revert: restore Windows scripts to current main
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: always explicitly set STUDIO_LOCAL_INSTALL and STUDIO_PACKAGE_NAME env vars
* fix: pass explicit STUDIO_LOCAL_REPO env var for --local installs
* fix: align banner box for Setup vs Update labels
* deprecate: hide 'unsloth studio setup' command, point users to update/install.sh
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: check stdout not stdin for auto-launch detection (curl pipe fix)
* fix: update install URL to unsloth.ai/install.sh
* fix: update install.sh usage comments to unsloth.ai/install.sh
* fix: use --upgrade-package for base deps to preserve existing torch/CUDA installs
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: --local install now also installs unsloth-zoo via base.txt before editable overlay
* fix: don't skip base packages for --local installs (editable needs unsloth-zoo)
* refactor: move --local full dep install to install.sh, keep SKIP_STUDIO_BASE for all paths
* feat: add migration support for old .venv and CWD-based installs in setup.sh
* Revert "feat: add migration support for old .venv and CWD-based installs in setup.sh"
This reverts commit 301291d002.
* feat: migrate old .venv layout in install.sh instead of always nuking
* feat: validate old .venv with torch CUDA test before migration, recovery message on launch failure
* fix: try CUDA then fall back to CPU for migration validation
* fix: upgrade unsloth/unsloth-zoo with --reinstall-package on migration to preserve torch
* remove: delete unused unsloth ui command (use unsloth studio instead)
* Fix Windows venv path mismatch between install.ps1, setup.ps1, and studio.py
install.ps1 was creating the venv CWD-relative ($VenvName = "unsloth_studio"),
setup.ps1 was using an absolute path to ".unsloth\studio\.venv", and studio.py
looks for ".unsloth\studio\unsloth_studio". All three paths were different, so
the Windows installer would never produce a working Studio setup.
install.ps1:
- Use absolute $StudioHome + $VenvDir matching the Linux install.sh layout
- Add 3-way migration: old .venv at STUDIO_HOME, CWD-relative ~/unsloth_studio
from the previous install.ps1, or fresh creation with torch validation
- For migrated envs, upgrade unsloth while preserving existing torch/CUDA wheels
- Set SKIP_STUDIO_BASE=1 before calling setup.ps1 (matches install.sh behavior)
- Fix launch instructions to use the absolute venv path
setup.ps1:
- Change $VenvDir from ".unsloth\studio\.venv" to ".unsloth\studio\unsloth_studio"
- Add SKIP_STUDIO_BASE guard: error out if venv is missing when called from
install.ps1 (which should have already created it)
- Differentiate "Setup" vs "Update" in banners based on SKIP_STUDIO_BASE
* setup.ps1: unconditionally error if venv missing, matching setup.sh
setup.sh always errors out if the venv does not exist (line 224-228),
telling the user to run install.sh first. setup.ps1 was conditionally
creating a bare venv with python -m venv when SKIP_STUDIO_BASE was not
set, which would produce an empty venv with no torch or unsloth. Now
setup.ps1 matches setup.sh: always error, always point to install.ps1.
* Fix --torch-backend=auto CPU solver dead-end on Linux, macOS, and Windows
On CPU-only machines, `uv pip install unsloth --torch-backend=auto`
falls back to unsloth==2024.8 because the CPU solver cannot satisfy
newer unsloth's dependencies. install.ps1 already solved this with a
two-step approach; this applies the same fix to install.sh and
install_python_stack.py.
install.sh: add get_torch_index_url() that detects GPU via nvidia-smi
and maps CUDA versions to PyTorch index URLs (matching install.ps1's
Get-TorchIndexUrl). Fresh installs now install torch first via explicit
--index-url, then install unsloth with --upgrade-package to preserve
the pre-installed torch. All 5 --torch-backend=auto removed from
primary paths.
install.ps1: add fallback else-branch when TorchIndexUrl is empty,
using --torch-backend=auto as last resort (matching install.sh).
install_python_stack.py: remove unconditional --torch-backend=auto
from _build_uv_cmd. Torch is pre-installed by install.sh/setup.ps1
by the time this runs. Callers that need it can set UV_TORCH_BACKEND.
Both install.sh and install.ps1 now share the same three-branch logic:
migrated env (upgrade-package only), normal (torch-first + index-url),
and fallback (--torch-backend=auto if URL detection fails).
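The shared URL-detection logic can be sketched in Python (the version table below is illustrative; the real scripts maintain their own CUDA-to-wheel-tag mapping and parse nvidia-smi output first):

```python
def get_torch_index_url(cuda_version):
    """Map an nvidia-smi-reported CUDA version to a PyTorch wheel index.
    Returning None signals the caller to fall back to --torch-backend=auto."""
    if not cuda_version:
        return None          # no GPU detected
    try:
        major, minor = (int(x) for x in cuda_version.split(".")[:2])
    except ValueError:
        return None          # unparseable output: warn and fall back
    # Pick the newest wheel index the driver can serve (illustrative table).
    if (major, minor) >= (12, 6):
        tag = "cu126"
    elif (major, minor) >= (12, 1):
        tag = "cu121"
    elif (major, minor) >= (11, 8):
        tag = "cu118"
    else:
        return None
    return f"https://download.pytorch.org/whl/{tag}"
```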
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Use --reinstall-package for migrated envs on both Linux and Windows
For migrated environments (moved from legacy venv location),
--reinstall-package is better than --upgrade-package because it forces
a clean reinstall even if the same version is already installed. This
ensures proper .dist-info and .pyc state in the new venv location.
--upgrade-package remains correct for the fresh install path where
torch is already installed and we just want to add unsloth without
re-resolving torch.
* Address review findings: portability, parity, and stale comments
- Replace grep -oP (GNU Perl regex) with POSIX sed in
get_torch_index_url() so the script works on BSD grep (macOS is
already guarded by the Darwin early-return, but Alpine/BusyBox
would silently get the wrong CUDA tag)
- Add LC_ALL=C before nvidia-smi invocation to prevent locale-dependent
output parsing issues
- Add warning on stderr when nvidia-smi output is unparseable, matching
install.ps1's [WARN] message
- Add explicit unsloth-zoo positional arg to install.ps1 migrated path,
matching install.sh (--reinstall-package alone won't install it if it
was never present in the migrated env)
- Fix stale comment in install_python_stack.py line 392 that still
claimed --torch-backend=auto is added by _build_uv_cmd
- Add sed to test tools directory (function now uses sed instead of grep)
* Add --index-url to migrated env path to prevent CPU torch resolution
The migrated path runs uv pip install with --reinstall-package for
unsloth/unsloth-zoo. While uv should keep existing torch as satisfied,
the resolver could still re-resolve torch as a transitive dependency.
Without --index-url pointing at the correct CUDA wheel index, the
resolver would fall back to plain PyPI and potentially pull CPU-only
torch. Adding --index-url $TORCH_INDEX_URL ensures CUDA wheels are
available if the resolver needs them.
Applied to both install.sh and install.ps1.
* Revert --index-url on migrated env path
The original install.ps1 on main already handles the migrated path
without --index-url and it works correctly. --reinstall-package only
forces reinstall of the named packages while uv keeps existing torch
as satisfied. No need for the extra flag.
* Fix unsloth studio update --local not installing local checkout
studio.py sets STUDIO_LOCAL_REPO when --local is passed, but
install_python_stack.py never read it. The update path always
installed from PyPI regardless of the --local flag.
Add a local_repo branch that first updates deps from base.txt
(with --upgrade-package to preserve torch), then overlays the
local checkout as an editable install with --no-deps.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* feat: Implement Q-GaLore optimizer and custom embedding learning rate in the Unsloth trainer.
* feat: Implement QGaLoreAdamW8bit optimizer with 8-bit states, GaLore low-rank gradient projection, and optional INT8 weight quantization, along with supporting projector and tests.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* feat: Introduce Q-GaLore AdamW optimizer with low-rank quantized gradient projection and integrate into the trainer, along with dedicated tests.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* feat: Implement Q-GaLore AdamW optimizer with gradient projection and quantization, including trainer integration and corresponding tests.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix 3 bugs in Q-GaLore optimizer and add weight_quant forward hooks
1. Fix use-after-delete crash: move `del p._saved_data` after the
weight decay block so decoupled weight decay can reference the
current weights correctly (p.data).
2. Fix substring matching in make_q_galore_param_groups: split
parameter names on "." and check exact component matches to
prevent false positives (e.g. "not_q_proj" matching "q_proj").
3. Implement forward pre-hooks for weight_quant: after the optimizer
quantizes weights to INT8, replace p.data with a 1-element
placeholder to free float memory. A register_forward_pre_hook
dequantizes back to float before each forward pass. The trainer
calls install_weight_quant_hooks() when weight_quant is enabled.
4. Update test_weight_decay_uses_saved_data to match the fixed code
path (decoupled decay uses p.data, expected value 2.7). Add
test_weight_quant_hook_restores_float to verify the INT8-to-float
hook round-trip.
All 24/24 Q-GaLore tests pass. Benchmarked on Llama-3.2-1B-Instruct
FFT: Q-GaLore saves 32% VRAM (10.63 -> 7.24 GB) with better loss
convergence (1.3 vs 2.0 at step 100). No regressions in 31-notebook
sweep across Llama, Qwen, Mistral, Phi, Gemma, vision, and GRPO.
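The exact-component matching from fix 2 can be sketched as follows (the target list is illustrative, not the project's actual defaults):

```python
def matches_target(param_name, targets = ("q_proj", "k_proj", "v_proj")):
    # Split the dot-separated parameter name and require an exact component
    # match, so e.g. "not_q_proj" no longer matches "q_proj" by substring.
    parts = param_name.split(".")
    return any(t in parts for t in targets)
```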
* Default weight_quant to False in QGaloreConfig
Benchmarks show weight_quant=True adds ~1 GB on Llama-3.2-1B due to
INT8 copy/scale overhead exceeding savings from the placeholder trick.
Users can still opt in explicitly. The optimizer logic is unchanged.
* Optimize Q-GaLore projector and optimizer step performance
Projector (q_galore_projector.py):
- Use torch.svd_lowrank with oversampling p=10 (Halko et al. 2009) instead
of full SVD for large matrices. Falls back to full SVD when min(m,n) <= 2*rank.
SVD steps are 6-8x faster on Llama-3.2-1B (22s -> 3s for first step).
- Cache the dequantized ortho matrix between project() and project_back() to
avoid redundant dequantization when quant=True.
- Replace F.cosine_similarity with torch.dot for 1-D unit vectors in the
adaptive schedule. Remove unused torch.nn.functional import.
- Use collections.deque(maxlen=queue_size) instead of list with manual pop(0).
Optimizer (q_galore_adamw.py):
- Remove redundant .clone() on dequantized weights (line 151) and on float
data before re-quantization (line 211). _dequantize already returns a fresh
tensor and _quantize/_quantize_stochastic only reads its input.
- Consolidate per-group torch.cuda.synchronize() into a single call after
all param groups complete.
- Use torch.empty instead of torch.zeros for the scalar placeholder tensor
that is never read.
Verified: 24/24 unit tests pass. Llama-3.2-1B 61-step training produces
losses within 0.24% relative diff (correlation >0.9999) of the original.
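A NumPy sketch of the randomized low-rank SVD with oversampling (Halko et al. 2009) that `torch.svd_lowrank` implements, including the small-matrix full-SVD fallback described above (the threshold and p=10 come from the commit; this is not the project's code):

```python
import numpy as np

def lowrank_svd(A, rank, p = 10, seed = 0):
    m, n = A.shape
    if min(m, n) <= 2 * rank:
        # Small matrix: full SVD is cheap and exact.
        U, s, Vt = np.linalg.svd(A, full_matrices = False)
        return U[:, :rank], s[:rank], Vt[:rank]
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, rank + p))   # oversampled test matrix
    Q, _ = np.linalg.qr(A @ Omega)               # approximate range of A
    B = Q.T @ A                                  # small (rank+p) x n problem
    Ub, s, Vt = np.linalg.svd(B, full_matrices = False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank]
```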
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Fixup mapper issues and resolve properly
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Refactor Ollama template wiring and harden packing helpers
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
* Fix Qwen3 and Gemma3n template bindings and tidy packing test helper
* Fix gptoss Ollama comment and tinyllama stop parameter
- Fix wrong comment referencing gemma3n for gptoss_ollama in chat_templates.py
- Add missing stop keyword to tinyllama PARAMETER in ollama_template_mappers.py
* Fix _DummyTrainer compatibility across TRL versions
The try/except only handled the removal of return_position_ids
(TRL v0.24+) but not the absence of padding_free (TRL v0.18.2).
Gracefully degrade through all optional collator flags so the
test works from trl>=0.18.2 through v0.27+.
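The graceful degradation can be sketched as trying progressively smaller kwarg sets (the flag names come from the commit above; the helper name and candidate ordering are illustrative):

```python
def make_collator(cls, tokenizer):
    # Try the richest signature first, then drop optional flags that newer
    # or older TRL releases do not accept, down to the bare constructor.
    candidates = [
        {"padding_free": True, "return_position_ids": True},  # pre-0.24 TRL
        {"padding_free": True},                               # TRL >= 0.24
        {},                                                   # TRL 0.18.x
    ]
    for kwargs in candidates:
        try:
            return cls(tokenizer, **kwargs)
        except TypeError:
            continue
    raise TypeError("no supported collator signature found")
```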
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add int8 weight-only QAT scheme, add test, fix tests for current torchao version
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change quantization to PerAxis
* lambda =/
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add torchao messages, remove group_size from int8
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* raise exception on missing torchao
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* touch up the torchao imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- Fix test file: use return_tokenized instead of return_tensors
- Fix test file: use text_dataset instead of undefined dataset variable
- Move parameter validation to constructor (fail fast on invalid params)
- Add labels field in tokenized output for causal LM training
- Add empty file handling with clear error message
- Add tests for constructor validation and labels field
* vllm sampling params fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* do not patch base_trainer
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* separate vllm fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Apply suggestion from @danielhanchen
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit 58b483dc0d1790f99580665801d3fa0d7267c533.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit b2497519659a9f301e7a633795d9efdafdc2b277.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit de3daaf429f81aceb6632932b0cb1af5149652a8.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
**Summary:** The existing QAT + LoRA path only applied fake
quantization to the original slow path, but the default is the
fast path that calls unsloth's fast LoRA primitives. This commit
integrates fake quantization into these fast primitives as well,
and adds unit tests to assert that fake quantization is actually
taking place.
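For context, "fake quantization" means quantize-then-dequantize in the forward pass so training sees quantization error while everything stays in float. A minimal per-tensor sketch of that mechanism (the real path uses torchao's per-group fake quantizers with straight-through gradients; this toy version is for illustration only):

```python
def fake_quantize_symmetric(values, bits = 4):
    # Symmetric per-tensor quantize-then-dequantize: round each value to
    # the nearest representable level, then map back to float.
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for int4
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) * scale for v in values]
```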
**Test Plan:**
Unit tests:
```
pytest tests/utils/test_qat.py
```
End-to-end test: https://gist.github.com/andrewor14/6360dd69b5784c71c46e80c14f53e6b6
Full fine-tuning Llama3.1-8B with and without QAT + LoRA on yahma/alpaca-cleaned for 1 epoch:
- Batch size = 8 (no grad accum)
- Learning rate = 2e-4
- Quantization scheme = int4 weight only (with bf16 activations)
Wikitext perplexity:
- Baseline = int4 quantized model finetuned without QAT
- QAT int4 quantized model (with this PR) achieved 33% lower perplexity than the int4 baseline
- QAT int4 quantized model without this PR was worse than the int4 baseline
```
==> unsloth_model_lora_baseline_output/lm_eval_float.log <==
| | |none | 0|word_perplexity|↓ |7.5551|± | N/A|
==> unsloth_model_lora_baseline_output/lm_eval_quantized.log <==
| | |none | 0|word_perplexity|↓ |8.7655|± | N/A|
==> unsloth_model_lora_qat_int4_output/lm_eval_quantized.log <==
| | |none | 0|word_perplexity|↓ |8.3548|± | N/A|
```
Summary:
Previously the test was not run correctly, and saving to a local path was not tested;
this PR adds support for that and tests it properly.
Note: `python tests/saving/test_unsloth_save.py` doesn't run the test
Test Plan:
pytest tests/saving/test_unsloth_save.py -k test_save_torchao
Summary:
Allow users to merge the LoRA weights and then run post-training quantization with torchao
Usage:
```
from torchao.quantization import Int8DynamicActivationInt8WeightConfig

torchao_config = Int8DynamicActivationInt8WeightConfig()
model.save_pretrained_torchao(
    save_path,
    tokenizer=tokenizer,
    torchao_config=torchao_config,
)
```
Test Plan:
python tests/saving/test_unsloth_save.py