* feat: Add cactus QAT scheme support
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* test(qat): add tests for cactus QAT scheme and fix missing import
* Fix cactus QAT scheme: correct MappingType import, tighten PerGroup filter
- Drop the broken `from torchao.dtypes import MappingType` import. `MappingType`
lives in `torchao.quantization` (and `torchao.quantization.quant_primitives`);
it is not exported from `torchao.dtypes` in any supported torchao release
(verified on 0.14, 0.16, 0.17). The previous code raised `ImportError` on
every cactus call, which surfaced as a misleading 'torchao not found' error.
- Since `IntxWeightOnlyConfig` already defaults `mapping_type` to
`MappingType.SYMMETRIC`, drop the explicit kwarg entirely and remove the
import. Behavior is unchanged.
- Introduce a named `group_size = 32` constant (matches the int4 / fp8-int4
pattern in the surrounding branches) and add a `% group_size == 0`
divisibility guard to the filter. `PerGroup(32)` requires
`in_features % 32 == 0` at `quantize_()` time, otherwise torchao raises
`ValueError: in_features (N) % group_size (32) must be == 0`. The old
`in_features >= 32` filter would admit non-aligned widths (e.g. 33, 48, 65,
127) and crash `_prepare_model_for_qat` for those shapes.
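The tightened filter condition boils down to a simple alignment predicate. A minimal sketch (the function name is illustrative; the real filter also checks that the module is an `nn.Linear`):

```python
group_size = 32  # named constant, matching the int4 / fp8-int4 branches

def is_group_aligned(in_features: int) -> bool:
    # PerGroup(group_size) requires in_features % group_size == 0 at
    # quantize_() time; the old `in_features >= 32` check alone would
    # admit non-aligned widths like 33, 48, 65, or 127.
    return in_features >= group_size and in_features % group_size == 0
```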
* Warn when cactus QAT skips non-divisible Linear layers
Multiple reviewers flagged that the divisibility guard added in the
previous commit can silently leave Linear layers in full precision when
their in_features is not a multiple of 32. For currently supported
Unsloth models (Qwen, Llama, Gemma, Mistral, Phi) every Linear width is
already a multiple of 32/64/128 so this never triggers, but surfacing
the coverage gap is cheap and avoids users assuming 100% QAT coverage
when they bring a custom model with unusual shapes.
Emit a UserWarning listing up to the first 8 skipped layers whenever
the cactus filter excludes any Linear due to the modulo guard. This
keeps the lenient skip behavior (consistent with int4 / fp8-int4), but
no longer does so silently.
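The warning described above can be sketched roughly as follows (helper and constant names are hypothetical, not the actual Unsloth API):

```python
import warnings

MAX_LISTED = 8  # list at most the first 8 skipped layers

def warn_skipped_layers(skipped: list) -> None:
    # Emit a single UserWarning naming the Linear layers left in full
    # precision because in_features was not a multiple of 32.
    if not skipped:
        return
    shown = ", ".join(skipped[:MAX_LISTED])
    more = f" (+{len(skipped) - MAX_LISTED} more)" if len(skipped) > MAX_LISTED else ""
    warnings.warn(
        f"cactus QAT skipped {len(skipped)} Linear layer(s) whose in_features "
        f"is not divisible by 32; left in full precision: {shown}{more}",
        UserWarning,
    )
```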
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* feat: Add support for OLMo-3 model in mapping and tests
* Update unsloth/models/mapper.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update tests/test_get_model_name.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Fix casing, add Think variants, and align version gate for OLMo-3 PR 4678
Mapper: switch slugs from OLMo-3 to canonical Olmo-3 mixed case, drop the
non-existent unsloth/Olmo-3-7B-Instruct-bnb-4bit dead alias, and add the
already-published Olmo-3-7B-Think and Olmo-3-32B-Think Unsloth mirrors.
Loader: change the olmo3 transformers version gate from Version("4.57.0")
to Version("4.57.0.dev0") so nightly/source builds that already contain
olmo3 are not blocked, matching the OLMo-2, Gemma 3 and Cohere patterns.
* Use canonical Olmo-3 casing and cover Think variants in OLMo-3 tests
Mirrors the mapper.py fixes on pr-4678-code: HuggingFace canonical slugs
for the OLMo-3 family use mixed-case Olmo-3 (not the all-caps OLMo-3 style of OLMo-2), and
Unsloth already hosts Olmo-3-7B-Think and Olmo-3-32B-Think mirrors, so
the resolution matrix now covers all three published Olmo-3 families.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var
When set, UNSLOTH_PYTORCH_MIRROR overrides the default
https://download.pytorch.org/whl base URL in all four install scripts
(install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py).
When unset or empty, the official URL is used. This lets users behind
corporate proxies or in regions with poor connectivity to pytorch.org
point at a local mirror without patching scripts.
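The override logic is the same in all four scripts; a minimal sketch of the Python variant (the helper name is illustrative; a later commit additionally strips a trailing slash):

```python
import os

_OFFICIAL_BASE = "https://download.pytorch.org/whl"

def pytorch_whl_base() -> str:
    # A set, non-empty UNSLOTH_PYTORCH_MIRROR wins; when unset or empty,
    # fall back to the official download.pytorch.org base URL.
    mirror = os.environ.get("UNSLOTH_PYTORCH_MIRROR", "").strip()
    return mirror or _OFFICIAL_BASE
```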
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py
Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back
to the official URL when unset or empty, and preserves the value as-is
(including trailing slashes).
* Remove stale test assertions for missing install.sh messages
* Fix GPU mocking in test_get_torch_index_url.sh
Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside
get_torch_index_url so the GPU-presence checks work in tests.
Add -L flag handling to mock nvidia-smi so it passes the GPU listing
check. All 26 tests now pass on CPU-only machines.
* Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* studio: stream export worker output into the export dialog
The Export Model dialog only showed a spinner on the "Exporting..."
button while the worker subprocess was doing the actual heavy lifting.
For Merged to 16bit and GGUF / Llama.cpp exports this meant several
minutes (or more, for large models) of opaque silence, with no way to
tell whether save_pretrained_merged, convert_hf_to_gguf.py, or
llama-quantize was making progress.
This adds a live terminal-style output panel inside the export dialog,
rendered just above the Cancel / Start Export buttons and scrollable
with auto-follow-tail. It shows stdout and stderr from both the worker
process itself and any child process it spawns (GGUF converter,
llama-quantize), coloured by stream.
Backend
- core/export/worker.py: new _setup_log_capture(resp_queue) installed
before LogConfig.setup_logging. It saves the original stdout/stderr
fds, creates pipes, os.dup2's the write ends onto fds 1 and 2 (so
every child process inherits the redirected fds), and spins up two
daemon reader threads. Each thread reads bytes from a pipe, echoes
them back to the original fd (so the server console keeps working),
splits on \n and \r, and forwards each line to the resp queue as
{"type":"log","stream":"stdout|stderr","line":...,"ts":...}.
PYTHONUNBUFFERED=1 is set so nested Python converters flush
immediately.
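The fd-level capture can be sketched as follows (heavily simplified: one fd, no timestamps, and `setup_log_capture` plus its restore hook are illustrative names rather than the real worker.py API):

```python
import os
import queue
import threading

def setup_log_capture(resp_queue, fd=1, stream="stdout"):
    saved = os.dup(fd)                       # keep a handle on the real console
    read_end, write_end = os.pipe()
    os.dup2(write_end, fd)                   # children now inherit the redirected fd
    os.close(write_end)

    def reader():
        buf = b""
        while True:
            chunk = os.read(read_end, 4096)
            if not chunk:
                break
            os.write(saved, chunk)           # echo so the original console keeps working
            buf += chunk
            parts = buf.replace(b"\r", b"\n").split(b"\n")  # split on \n and \r
            buf = parts.pop()                # keep any trailing partial line
            for line in parts:
                resp_queue.put({"type": "log", "stream": stream,
                                "line": line.decode(errors="replace")})
        if buf:                              # flush the final partial line on EOF
            resp_queue.put({"type": "log", "stream": stream,
                            "line": buf.decode(errors="replace")})
        os.close(read_end)

    t = threading.Thread(target=reader, daemon=True)
    t.start()

    def restore():
        os.dup2(saved, fd)                   # closes the pipe's last write end -> EOF
        t.join(timeout=5)
        os.close(saved)

    return restore
```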
- core/export/orchestrator.py:
- Thread-safe ring buffer (collections.deque, maxlen 4000) with a
monotonically increasing seq counter. clear_logs(),
get_logs_since(cursor), get_current_log_seq(), is_export_active().
- _wait_response handles rtype == "log" by appending to the buffer
and continuing the wait loop. Status messages are also surfaced as
a "status" stream so users see high level progress alongside raw
subprocess output.
- load_checkpoint, _run_export, and cleanup_memory now wrap their
bodies with the existing self._lock (previously unused), clear the
log buffer at the start of each op, and flip _export_active in a
try/finally so the SSE endpoint can detect idle.
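The ring-buffer bookkeeping can be sketched as (class and method names are illustrative; the real orchestrator stores richer entries and more state):

```python
from collections import deque
from threading import Lock

class LogBuffer:
    def __init__(self, maxlen=4000):
        self._lock = Lock()
        self._buf = deque(maxlen=maxlen)   # (seq, entry) pairs
        self._seq = 0                      # monotonically increasing, never reset

    def append(self, entry):
        with self._lock:
            self._seq += 1
            self._buf.append((self._seq, entry))
            return self._seq

    def clear(self):
        # Drop old lines at the start of each op; the seq counter is NOT
        # reset, so reconnecting clients with an old cursor still progress.
        with self._lock:
            self._buf.clear()

    def get_since(self, cursor):
        with self._lock:
            return [(s, e) for s, e in self._buf if s > cursor]
```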
- routes/export.py:
- Wrapped every sync orchestrator call (load_checkpoint,
cleanup_memory, export_merged_model, export_base_model,
export_gguf, export_lora_adapter) in asyncio.to_thread so the
FastAPI event loop stays free during long exports. Without this
the new SSE endpoint could not be served concurrently with the
blocking export POST.
- New GET /api/export/logs/stream SSE endpoint. Honors
Last-Event-ID and a since query param for reconnect, emits log /
heartbeat / complete / error events, uses the id field to carry
the log seq so clients can resume cleanly. On first connect
without an explicit cursor it starts from the current seq so old
lines from a previous run are not replayed.
Frontend
- features/export/api/export-api.ts: streamExportLogs() helper that
authFetches the SSE endpoint and parses id / event / data fields
manually (same pattern as streamTrainingProgress in train-api.ts).
- features/export/components/export-dialog.tsx:
- Local useExportLogs(exporting) hook that opens the SSE stream when
exporting transitions to true, accumulates up to 4000 lines in
component state, and aborts on cleanup.
- New scrollable output panel rendered above DialogFooter, only
shown for Merged to 16bit and GGUF / Llama.cpp (LoRA adapter is
a fast disk write with nothing to show). Dark terminal styling
(bg-black/85, emerald text, rose for stderr, sky for status),
max-height 14rem, auto-scrolls to the bottom on new output but
stops following if the user scrolls up. A small streaming / idle
indicator is shown next to the panel title.
- DialogContent widens from sm:max-w-lg to sm:max-w-2xl when the
output panel is visible so the logs have room to breathe.
Verified
- Python smoke test (tests/smoke_export_log_capture.py): spawns a
real mp.get_context("spawn") process, installs _setup_log_capture,
confirms that parent stdout prints, parent stderr prints, AND a
child subprocess invoked via subprocess.run (both its stdout and
stderr) are all captured in the resp queue. Passes.
- Orchestrator log helpers tested in isolation: _append_log,
get_logs_since (with and without a cursor), clear_logs not
resetting seq so reconnecting clients still progress. Passes.
- routes.export imports cleanly in the studio venv and /logs/stream
shows up in router.routes.
- bun run build: tsc -b plus vite build, no TypeScript errors.
No existing export behavior is changed. If the subprocess, the SSE
endpoint, or the frontend hook fails, the export itself still runs to
completion the same way it did before, with or without logs visible.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* export dialog: trim bootstrap noise, scope logs per screen, show realpath
Several follow-ups to the live export log work:
1. Worker bootstrap noise (transformers venv activation, Unsloth banner,
"Top GGUF/hub models" lists, vision detection, 2k-step weight load
bar) is dropped from the export-dialog stream. A threading.Event
gate in worker.py defaults closed and only opens once _handle_export
actually starts; until then the reader thread still echoes lines to
the saved console fd for debugging but does not push them onto the
resp_queue. The orchestrator already spawns a fresh subprocess for
every checkpoint load, so the gate is naturally reset between runs.
2. tqdm in non-tty mode defaults to a 10s mininterval, which makes
multi-step bars look frozen in the panel. Set TQDM_MININTERVAL=0.5
in the worker env so any tqdm-driven progress emits more often.
3. The dialog's useExportLogs hook now also clears its line buffer
when exportMethod or open changes, so re-opening the dialog into a
different action's screen no longer shows the previous action's
saved output. A useElapsedSeconds tick + "Working Xs" badge in the
log header gives users a visible sign that long single-step phases
(cache copies, GGUF conversion) are still running when no new lines
are arriving.
4. ExportBackend.export_{merged,base,gguf,lora} now return
(success, message, output_path); the worker forwards output_path on
each export_*_done response, the orchestrator's _run_export passes
it to routes/export.py, which surfaces it via
ExportOperationResponse.details.output_path. The dialog's Export
Complete screen renders the resolved on-disk realpath under "Saved
to" so users can find their exported model directly.
* fix(cli): unpack 3-tuple return from export backend
ExportOrchestrator.export_{merged,base,gguf,lora} now return
(success, message, output_path) so the studio dialog can show
the on-disk realpath. The CLI still unpacked 2 values, so every
`unsloth export --format ...` crashed with ValueError before
reporting completion. Update the four call sites and surface
output_path via a "Saved to:" echo.
* fix(studio): anchor export log SSE cursor at run start
The export dialog SSE defaulted its cursor to get_current_log_seq()
at connect time, so any line emitted between the POST that kicks
off the export and the client opening the stream was buffered with
seqs 1..k and then skipped (seq <= cursor). Long-running exports
looked silent during their first seconds.
Snapshot _log_seq into _run_start_seq inside clear_logs() and
expose it via get_run_start_seq(). The SSE default cursor now uses
that snapshot, so every line emitted since the current run began
is reachable regardless of when the client connects. Old runs
still can't leak in because their seqs are <= the snapshot.
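The cursor fix amounts to something like the following (names hypothetical, locking omitted):

```python
class LogCursor:
    def __init__(self):
        self._log_seq = 0
        self._run_start_seq = 0

    def append(self):
        self._log_seq += 1
        return self._log_seq

    def clear_logs(self):
        # Snapshot where the current run starts; never reset _log_seq itself.
        self._run_start_seq = self._log_seq

    def default_cursor(self, client_cursor=None):
        # On SSE connect, an explicit client cursor wins; otherwise start
        # from the run-start snapshot, so lines emitted between the export
        # POST and the stream connect are replayed, while seqs from older
        # runs (all <= the snapshot) cannot leak in.
        return client_cursor if client_cursor is not None else self._run_start_seq
```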
* fix(studio): reconnect export log SSE on stream drop
useExportLogs launched streamExportLogs once per exporting
transition and recorded any drop in .catch(). Long GGUF exports
behind a proxy with an idle kill-timeout would silently lose the
stream for the rest of the run even though the backend already
supports Last-Event-ID resume. The "retry: 3000" directive emitted
by the backend is only meaningful to native EventSource; this
hook uses a manual fetch + ReadableStream parse so it had no
effect.
Wrap streamExportLogs in a retry loop that tracks lastSeq from
ExportLogEvent.id and passes it as since on reconnect. Backoff is
exponential with jitter, capped at 5s, reset on successful open.
The loop stops on explicit backend `complete` event or on effect
cleanup.
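The real retry loop lives in the TypeScript hook; as a language-neutral sketch of just the backoff policy (parameter names are illustrative):

```python
import random

def backoff_delays(max_delay=5.0, base=0.5):
    # Exponential backoff with jitter, capped at max_delay seconds.
    # The caller resets by creating a fresh generator on a successful open.
    attempt = 0
    while True:
        delay = min(max_delay, base * (2 ** attempt))
        yield delay * (0.5 + random.random() / 2)   # jitter in [0.5x, 1x)
        attempt += 1
```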
* fix(studio): register a second command so Typer keeps `export` as a subcommand
The CLI export unpacking tests wrap `unsloth_cli.commands.export.export`
in a fresh Typer app with a single registered command. Typer flattens a
single-command app into that command, so the test's
`runner.invoke(cli_app, ["export", ckpt, out, ...])` treats the leading
`"export"` token as an unexpected extra positional argument -- every
parametrized case failed with:
Got unexpected extra argument (.../out)
Register a harmless `noop` second command so Typer preserves subcommand
routing and the tests actually exercise the 3-tuple unpack path they
were written to guard.
Before: 4 failed
After: 4 passed
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: studio-install <studio@local.install>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
* [Studio] Install flash attn at setup time for linux
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* cleanup changes
Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Test cases
* wheel_utils: narrow url_exists exceptions and log at debug level
---------
Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
* fix: check find() return value before adding offset in try_fix_tokenizer
The `str.find()` result was checked for -1 only after adding
`len(find_text)`, turning the guard into dead code. When the substring
is absent, `start` becomes `len(find_text) - 1` (a positive number),
so the `if start == -1: continue` never triggers and the subsequent
slice extracts garbage from the tokenizer string.
Split the find and offset into two steps so the -1 check works correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add defensive guards for token_id None and end find() returning -1
- Skip loop iteration early when token_id is None to avoid constructing
a find_text that can never match valid JSON
- Guard end = tokenizer_string.find('",', start) against -1 to prevent
silent garbage extraction from malformed tokenizer strings
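The corrected control flow from both commits, as a standalone sketch (the helper name is hypothetical; the real code iterates over token ids inside try_fix_tokenizer):

```python
def extract_token_value(tokenizer_string, find_text):
    # Check find() for -1 BEFORE adding the offset; the old code added
    # len(find_text) first, so `start` was always positive and the -1
    # guard was dead code.
    start = tokenizer_string.find(find_text)
    if start == -1:                  # substring absent: skip, don't slice garbage
        return None
    start += len(find_text)          # only now move past the matched text
    end = tokenizer_string.find('",', start)
    if end == -1:                    # malformed string: no closing '",'
        return None
    return tokenizer_string[start:end]
```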
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix raw text paragraph break normalization
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Normalize horizontal whitespace before stripping non-ASCII and collapse leftover doubles
Run the [^\S\n]+ horizontal-whitespace collapse before the non-ASCII strip
so that Unicode whitespace (\u00A0, \u202F, \u2009, \u3000, \v, \f, etc.)
becomes a single ASCII space instead of being deleted outright. The prior
ordering silently merged adjacent words on HTML/PDF/OCR-sourced text:
"hello\u00a0world" used to produce "helloworld" after this PR; it now
produces "hello world".
Also drop \t from the allow-list since the horizontal-whitespace collapse
already normalizes tabs to a single space, and add a targeted [ ]{2,} pass
right after the non-ASCII strip so that a non-whitespace non-ASCII character
sitting between two spaces ("word1 © word2") does not leave an interior
double space. Without this extra pass, clean_text was not idempotent on
such inputs: the first call produced "word1  word2" (two spaces) and only
the second call collapsed it to "word1 word2". Fuzz testing over 10000 random inputs
now satisfies the idempotence invariant in every case.
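A minimal sketch of the regex ordering (this is not the full clean_text; paragraph-break handling and the rest of the pipeline are omitted):

```python
import re

def clean_text(text):
    # 1. Collapse horizontal whitespace (incl. NBSP, thin space, \v, \f)
    #    to one ASCII space BEFORE stripping non-ASCII, so Unicode spaces
    #    become word boundaries instead of being deleted.
    text = re.sub(r"[^\S\n]+", " ", text)
    # 2. Strip remaining non-ASCII characters.
    text = re.sub(r"[^\x00-\x7F]+", "", text)
    # 3. Targeted pass: a removed non-ASCII char between two spaces would
    #    leave an interior double space; collapse it so the function is
    #    idempotent.
    text = re.sub(r"[ ]{2,}", " ", text)
    return text
```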
* Add regression tests for Unicode/control whitespace and non-ASCII edge cases
Cover:
- Unicode horizontal whitespace separators (NBSP, narrow NBSP, thin space,
en/em space, ideographic space, vertical tab, form feed) normalizing to
a single ASCII space instead of being deleted.
- Mixed paragraph + Unicode whitespace realistic input ("Section\u00a01\r\n\r\nBody\ftext\u202Fhere").
- Tab collapsing and space trimming around newlines.
- Non-whitespace non-ASCII characters (copyright, accented letters, emoji)
sitting between spaces: must not leave an interior double space, and
clean_text must be idempotent on these inputs.
- Non-ASCII characters adjacent to a newline: stripping must not leave
stray leading or trailing spaces on the neighbouring line, and must not
swallow an adjacent paragraph break.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* fix windows llama.cpp compile from source issue
* undo local repo usage
* fix llama.cpp install
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix windows
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: route resolve-source-build call through Invoke-LlamaHelper
The --resolve-source-build call at the source-build resolution path
was still calling install_llama_prebuilt.py directly instead of going
through Invoke-LlamaHelper. On PS7+ with ErrorActionPreference=Stop,
stderr from the 422 response (when tag is "master") would trigger a
terminating NativeCommandError and crash setup.
* fix: suppress stderr error records from Invoke-LlamaHelper
ErrorActionPreference=Continue prevents termination but PowerShell
still displays stderr lines as visible ErrorRecord objects. Capture
all output via 2>&1 and split stdout from stderr manually so that
stderr lines never appear on the console. When StderrPath is given
the stderr content is written to that file for diagnostics.
* fix: always rebuild llama.cpp on Windows when tag is master
When the requested llama.cpp tag is "master" (a moving target), skip
the "already built" early exit so the build path runs and syncs to
the latest commit. Without this, existing llama-server binaries from
an older build (e.g. b8635 which lacks Gemma 4 support) are reused
and model loading fails.
Pinned tags (e.g. b8635) still skip the rebuild when the binary
already exists, since the tag is immutable.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
* Expand test coverage for install_llama_prebuilt.py:
- Add tests for source build plan resolution with custom repos
- Add tests for branch/commit/PR ref matching and normalization
- Add tests for manifest checksum validation
- Add tests for Windows CUDA upstream asset name patterns
- Update capsys checks to capture stderr after log() redirect
* Fix script unbound variable error
* remove stale test script, add llama.cpp metal source builds, update tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix Metal precedence, test sync, and add behavioral tests
- Move macOS arm64 Metal check before CUDA/ROCm in GPU backend
decision chain so Metal is not bypassed when nvcc is in PATH
- Remove RPATH flags from CPU fallback CMAKE_ARGS (only needed
for Metal library linking)
- Update test_llama_pr_force_and_source.py to match _CLONE_ARGS
rename from _CLONE_BRANCH_ARGS in setup.sh
- Add confirm_install_tree guard test for
existing_install_matches_choice
- Add TestMacOSMetalBuildLogic bash subprocess tests verifying
Metal flag selection, nvcc precedence, and CPU fallback behavior
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix Metal CPU fallback to also cover cmake build failures and update tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* 1. _GPU_BACKEND_FRAGMENT synced -- removed dead CPU_FALLBACK_CMAKE_ARGS= init (6/8)
2. RPATH assertion replaced -- new test_macos_arm64_cpu_fallback_args_exclude_rpath checks the actual runtime CPU_FALLBACK_CMAKE_ARGS output for @loader_path and -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON (6/8)
3. _TRY_METAL_CPU_FALLBACK=false reset after both configure-failure and build-failure fallback branches in setup.sh (4/8)
4. macOS test now removes libmtmd.0.dylib instead of the platform-agnostic convert_hf_to_gguf.py (3/8)
5. Empty-string tag test added -- test_empty_tag_omits_branch_flag for resolved_tag= (2/8)
6. RPATH checks on cmake call logs -- both fallback tests now assert @loader_path and -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON are absent from CPU fallback cmake calls, plus baseline flag preservation (multiple)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* tests clean up
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* fix: add tokenizers to no-torch runtime deps and add TORCH_CONSTRAINT for arm64 macOS py313+
Two installer fixes:
1. Add `tokenizers` to `no-torch-runtime.txt` before `transformers`.
Without it, `from transformers import AutoConfig` crashes on startup
because `--no-deps` skips transitive dependencies.
2. Add `TORCH_CONSTRAINT` variable to `install.sh`. On arm64 macOS with
Python 3.13+, tighten the torch requirement to `>=2.6` since torch
<2.6 has no cp313 arm64 wheels. The variable replaces the previously
hard-coded constraint in the uv pip install line.
Includes 66 tests (42 pytest + 24 bash) covering:
- Structural checks on install.sh, install.ps1, no-torch-runtime.txt
- Shell snippet tests with mocked python for 13 platform/version combos
- Mock uv integration verifying correct constraint string
- E2E venv tests on Python 3.12 and 3.13 confirming AutoConfig works
- Negative control proving AutoConfig fails without tokenizers
- Full no-torch sandbox regression guards (safetensors, huggingface_hub)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix incomplete no-torch manifest and align E2E tests with real --no-deps path
- Add missing transitive deps to no-torch-runtime.txt that are required
under --no-deps: regex, typing_extensions, filelock, httpx, httpcore,
certifi, idna, anyio, sniffio, h11. Without these, `from transformers
import AutoConfig` still fails after install.sh --no-torch.
- Change all E2E tests to use --no-deps (matching what install.sh does)
instead of normal dep resolution. Previous tests passed even with an
incomplete manifest because uv backfilled transitive deps.
- Rewrite negative control to derive from the real no-torch-runtime.txt
with tokenizers stripped, proving the specific fix matters.
- Replace GNU-only sed -i with heredoc in shell test for macOS compat.
- Remove unused os/sys imports from Python test file.
- Quote SKIP_TORCH and mock uv paths in bash -c strings.
* Assert install succeeds before checking import results in E2E tests
Address review feedback: test_torch_not_importable and
test_tokenizers_directly_importable in Group 3 now assert that
uv pip install returns 0 before checking import behavior. This
prevents false positives when the install itself fails silently.
* Assert install succeeds in negative control and tighten error check
- Add missing install-success assertion in test_negative_control_no_tokenizers
to prevent false positives from network/install failures.
- Tighten error message check to look for "tokenizers" in stderr or
ModuleNotFoundError, rather than the generic "No module" substring
which could match unrelated import failures.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Split out from #4741 to keep the main PR focused on installer logic.
- New test_install_llama_prebuilt_logic.py: tests for resolve logic,
fallback behavior, env_int, busy/lock handling
- New test_validate_llama_prebuilt.py: validator tests for staged
release_tag/upstream_tag handling
- New test_llama_pr_force_and_source.py: tests for PR_FORCE and
LLAMA_SOURCE maintainer defaults
- Updated test_selection_logic.py: expanded selection/fallback coverage
- Updated test_pr4562_bugfixes.py: updated bugfix tests for new logic
- Updated smoke_test_llama_prebuilt.py: minor update
* Use prebuilt llama.cpp for unsloth studio setup
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix 3 issues that cause unnecessary fallback to source build
1. Make filelock import optional -- environments without filelock
(e.g. minimal installs) crashed at import time instead of
gracefully skipping the lock.
2. Use already-verified converter script from the hydrated source
tree instead of re-downloading from raw.githubusercontent.com
with no checksum. Adds symlink with copy fallback for the
legacy filename.
3. Initialize $SkipPrebuiltInstall in setup.ps1 before first use
to prevent potential uninitialized variable errors.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Keep network fallback in ensure_converter_scripts
Prefer the local verified copy from the hydrated source tree, but
retain the original network download as a fallback if the file is
missing. Create the legacy hyphenated filename as a symlink with a
copy fallback instead of writing a second full copy.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix 4 bugs in source-build fallback and binary_env paths
- setup.ps1: Replace git pull + checkout FETCH_HEAD with fetch + checkout -B
to avoid detached HEAD state that breaks re-runs. Use pinned tag in both
fetch and clone paths.
- setup.sh: Move rm -rf after cmake/git prerequisite checks so a missing
tool no longer deletes the existing install. Add --branch tag to clone.
- install_llama_prebuilt.py: Add binary_path.parent to Linux LD_LIBRARY_PATH
in binary_env() so bundled .so files in build/bin are found even without
RPATH, matching the existing Windows PATH logic.
- Add test for binary_env LD_LIBRARY_PATH on Linux.
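The Linux branch of binary_env can be sketched as follows (signature and dedupe details are illustrative, not the exact install_llama_prebuilt.py code):

```python
import os
from pathlib import Path

def binary_env(binary_path: Path, install_dir: Path, base_env=None) -> dict:
    # Prepend the binary's own directory to LD_LIBRARY_PATH so bundled
    # .so files in build/bin resolve even without RPATH, then dedupe
    # while preserving order and avoid a trailing separator.
    env = dict(base_env if base_env is not None else os.environ)
    dirs = [str(binary_path.parent), str(install_dir)]
    existing = env.get("LD_LIBRARY_PATH", "")
    if existing:
        dirs.append(existing)
    seen, ordered = set(), []
    for d in ":".join(dirs).split(":"):
        if d and d not in seen:
            seen.add(d)
            ordered.append(d)
    env["LD_LIBRARY_PATH"] = ":".join(ordered)
    return env
```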
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Handle unresolved "latest" tag in source-build fallback clone
When tag resolution fails and the requested tag is "latest", both
setup scripts now omit --branch from git clone so the default branch
is cloned instead of failing on a nonexistent "latest" branch/tag.
Similarly, the PS1 fetch path fetches the default ref when the tag
is "latest".
* Resolve actual latest ggml-org tag instead of using literal "latest"
When both Python tag resolution attempts fail and the requested tag
is "latest", query the GitHub API for the actual latest release tag
from ggml-org/llama.cpp (e.g. b8508) instead of passing the literal
string "latest" to git clone --branch, which would fail since no
such branch/tag exists.
setup.sh uses curl + python json parsing; setup.ps1 uses
Invoke-RestMethod. Both fall back to the raw requested tag if the
API call also fails.
* Try Unsloth release repo before ggml-org when resolving latest tag
When falling back to the GitHub API to resolve "latest", query the
Unsloth release repo (unslothai/llama.cpp) first since it has the
prebuilt binaries pinned to tested tags. Only fall back to
ggml-org/llama.cpp if the Unsloth repo query fails.
* Add comprehensive sandbox tests for PR #4562 bug fixes
35 tests covering all fixes across platforms:
- binary_env cross-platform (Linux LD_LIBRARY_PATH, Windows PATH,
macOS DYLD_LIBRARY_PATH) with edge cases (dedup, ordering, existing paths)
- resolve_requested_llama_tag (concrete, latest, None, empty)
- setup.sh logic via subprocess: prereq check ordering (cmake/git missing
preserves install), pinned tag in clone, fetch+checkout -B pattern,
fetch failure warns instead of aborting
- "latest" tag resolution fallback chain (Unsloth API -> ggml-org ->
raw) with mock curl: success, failure, malformed JSON, empty body,
empty tag_name, env overrides
- Source code pattern verification for both .sh and .ps1 files
All 138 tests pass in isolated uv venv.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add binary_path.parent to macOS DYLD_LIBRARY_PATH in binary_env
macOS prebuilt .dylib files are overlaid into build/bin (same as
Linux), but binary_env only added install_dir to DYLD_LIBRARY_PATH.
Add binary_path.parent so the loader can find sibling dylibs even
without embedded loader paths.
Mirrors the existing fix for Linux LD_LIBRARY_PATH and the Windows
PATH pattern.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Guard --branch when resolved tag is "latest"; fix broken test assertion
When all API fallbacks fail and the tag stays as literal "latest",
omit --branch from git clone (clones default branch instead of
failing). Both setup.sh and setup.ps1 now check for "latest" before
passing --branch to git clone/fetch.
Also fix test_setup_ps1_clone_uses_branch_tag which used Python
tuple syntax (assert "x", "y" in z) that always passes. Changed to
assert "x" in z and "y" in z.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix macOS DYLD trailing colon, install_lock no-op, and debug log
- binary_env macOS: use dedupe_existing_dirs instead of raw string
concatenation. Eliminates trailing colon in DYLD_LIBRARY_PATH
(which causes dyld to search CWD for libraries) and deduplicates
when binary_path.parent == install_dir. Now consistent with the
Linux and Windows branches.
- install_lock: when filelock is not installed, use os.O_CREAT|O_EXCL
as a fallback exclusive file lock with timeout, instead of yielding
with no locking. Prevents concurrent installs from corrupting each
other's staging directories.
- setup.ps1: remove [DEBUG] log line that printed to every user on
every Windows setup run.
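The filelock-free fallback can be sketched as follows (simplified: no stale-PID handling, and the timeout/poll values are illustrative):

```python
import os
import time
from contextlib import contextmanager

@contextmanager
def install_lock(lock_path, timeout=60.0):
    # O_CREAT|O_EXCL makes create-or-fail atomic, so at most one process
    # holds the lock; others poll until the file disappears or they time out.
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(0.25)
    try:
        os.write(fd, str(os.getpid()).encode())
        os.fsync(fd)        # make the PID visible before anyone checks it
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```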
* Add stale-lock detection and atomic clone-then-swap
install_lock fallback (no filelock): write PID to lock file and
check if the holder process is still alive on contention. Dead PIDs
(ProcessLookupError) and unreadable lock files trigger immediate
cleanup. Live processes owned by other users (PermissionError) are
correctly recognized as alive -- the lock is not removed.
setup.sh/setup.ps1 source-build: clone into a temporary directory
first, then swap into place only on success. If git clone fails,
the existing install is preserved instead of being deleted by the
premature rm -rf.
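The clone-then-swap pattern can be sketched in Python as follows (the function name and backup strategy are illustrative; the actual shell scripts build in a `$$`-suffixed temp directory and use `mv`):

```python
import os
import shutil
import tempfile

def build_then_swap(dest, build_fn):
    """Run build_fn in a temp dir next to `dest`; swap into place only on
    success, so a failed clone/build never destroys an existing install."""
    parent = os.path.dirname(os.path.abspath(dest))
    tmp = tempfile.mkdtemp(prefix = os.path.basename(dest) + ".build.", dir = parent)
    try:
        build_fn(tmp)                      # e.g. git clone + cmake + build
    except Exception:
        shutil.rmtree(tmp, ignore_errors = True)
        raise                              # existing `dest` is untouched
    old = None
    if os.path.exists(dest):
        old = dest + ".old"
        os.rename(dest, old)               # move the old install aside
    os.rename(tmp, dest)                   # swap the new build into place
    if old:
        shutil.rmtree(old, ignore_errors = True)
```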
* Remove redundant upstream_tag != release_tag check
load_approved_release_checksums compared checksums.upstream_tag
against the Unsloth release_tag, which are different namespaces
(upstream ggml-org tag vs Unsloth published tag). This only worked
because both happened to be "b8508" by convention. Would break if
Unsloth ever uses a different release naming scheme.
The existing check at parse_approved_release_checksums (line 950)
already validates the release_tag field correctly.
* Fix lock TOCTOU race and build-in-temp-dir swap
install_lock fallback: add os.fsync(fd) after writing PID to ensure
the PID is visible to racing processes before they check. Treat
empty lock files (PID not yet written) as "wait and retry" instead
of stale, closing the window where two processes could both see an
empty file, both unlink it, and both acquire the lock.
setup.sh/setup.ps1 source-build: clone AND build in a temp directory
(LLAMA_CPP_DIR.build.$$). Only swap into the final LLAMA_CPP_DIR
after the build succeeds. If clone or cmake or build fails, the temp
dir is cleaned up and the existing working install is preserved.
Previously, rm -rf ran after clone but before build, destroying the
existing install even if the build later failed.
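Putting the pieces of the fallback lock together, a hedged Python sketch (names, timeout, and poll interval are illustrative, not the actual `install_lock` implementation):

```python
import os
import time

def acquire_pid_lock(lock_path, timeout = 60.0, poll = 0.5):
    """Acquire an exclusive lock file or raise TimeoutError.

    - O_CREAT|O_EXCL makes creation atomic across racing processes.
    - The holder's PID is written and fsync'd so racers see it before
      they check the file.
    - An empty file means the holder has not written its PID yet:
      wait and retry rather than treating the lock as stale.
    - A dead holder (ProcessLookupError from kill(pid, 0)) means a
      stale lock: remove it and retry immediately.
    - PermissionError means the holder is alive but owned by another
      user: keep waiting.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            pass
        else:
            os.write(fd, str(os.getpid()).encode())
            os.fsync(fd)  # make the PID visible before racers check it
            os.close(fd)
            return
        try:
            with open(lock_path) as f:
                pid_text = f.read().strip()
        except FileNotFoundError:
            continue  # holder released between our open() attempts
        if pid_text:
            try:
                os.kill(int(pid_text), 0)  # signal 0: existence probe only
            except (ProcessLookupError, ValueError):
                # dead holder or garbage content: clean up and retry
                try:
                    os.remove(lock_path)
                except OSError:
                    pass
                continue
            except PermissionError:
                pass  # alive but owned by another user: keep waiting
        # empty file (PID not yet written) or live holder: wait and retry
        if time.monotonic() >= deadline:
            raise TimeoutError(f"could not acquire lock {lock_path!r}")
        time.sleep(poll)

def release_pid_lock(lock_path):
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```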
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* refactor: consolidate dual venvs into single ~/.unsloth/studio/unsloth_studio
* refactor: separate install.sh (first-time) from setup.sh (smart update with PyPI version check)
* fix: install.sh calls setup.sh directly, keep both setup and update CLI commands
* fix: use importlib.resources.files() directly without _path attribute
* fix: bootstrap uv before pip upgrade to handle uv venvs without pip
* fix: frontend 404 when launched via CLI, add global symlink to ~/.local/bin
* feat: add --local flag to install.sh and unsloth studio update for branch testing
* fix: resolve repo root from script location for --local installs
* feat: add --package flag to install.sh for testing with custom package names
* feat: add --package flag to unsloth studio update
* fix: always nuke venv in install.sh for clean installs
* revert: remove Windows changes, will handle in separate PR
* fix: error when --package is passed without an argument
* revert: restore Windows scripts to current main
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: always explicitly set STUDIO_LOCAL_INSTALL and STUDIO_PACKAGE_NAME env vars
* fix: pass explicit STUDIO_LOCAL_REPO env var for --local installs
* fix: align banner box for Setup vs Update labels
* deprecate: hide 'unsloth studio setup' command, point users to update/install.sh
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: check stdout not stdin for auto-launch detection (curl pipe fix)
* fix: update install URL to unsloth.ai/install.sh
* fix: update install.sh usage comments to unsloth.ai/install.sh
* fix: use --upgrade-package for base deps to preserve existing torch/CUDA installs
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: --local install now also installs unsloth-zoo via base.txt before editable overlay
* fix: don't skip base packages for --local installs (editable needs unsloth-zoo)
* refactor: move --local full dep install to install.sh, keep SKIP_STUDIO_BASE for all paths
* feat: add migration support for old .venv and CWD-based installs in setup.sh
* Revert "feat: add migration support for old .venv and CWD-based installs in setup.sh"
This reverts commit 301291d002.
* feat: migrate old .venv layout in install.sh instead of always nuking
* feat: validate old .venv with torch CUDA test before migration, recovery message on launch failure
* fix: try CUDA then fall back to CPU for migration validation
* fix: upgrade unsloth/unsloth-zoo with --reinstall-package on migration to preserve torch
* remove: delete unused unsloth ui command (use unsloth studio instead)
* Fix Windows venv path mismatch between install.ps1, setup.ps1, and studio.py
install.ps1 was creating the venv CWD-relative ($VenvName = "unsloth_studio"),
setup.ps1 was using an absolute path to ".unsloth\studio\.venv", and studio.py
looks for ".unsloth\studio\unsloth_studio". All three paths were different, so
the Windows installer would never produce a working Studio setup.
install.ps1:
- Use absolute $StudioHome + $VenvDir matching the Linux install.sh layout
- Add 3-way migration: old .venv at STUDIO_HOME, CWD-relative ~/unsloth_studio
from the previous install.ps1, or fresh creation with torch validation
- For migrated envs, upgrade unsloth while preserving existing torch/CUDA wheels
- Set SKIP_STUDIO_BASE=1 before calling setup.ps1 (matches install.sh behavior)
- Fix launch instructions to use the absolute venv path
setup.ps1:
- Change $VenvDir from ".unsloth\studio\.venv" to ".unsloth\studio\unsloth_studio"
- Add SKIP_STUDIO_BASE guard: error out if venv is missing when called from
install.ps1 (which should have already created it)
- Differentiate "Setup" vs "Update" in banners based on SKIP_STUDIO_BASE
* setup.ps1: unconditionally error if venv missing, matching setup.sh
setup.sh always errors out if the venv does not exist (line 224-228),
telling the user to run install.sh first. setup.ps1 was conditionally
creating a bare venv with python -m venv when SKIP_STUDIO_BASE was not
set, which would produce an empty venv with no torch or unsloth. Now
setup.ps1 matches setup.sh: always error, always point to install.ps1.
* Fix --torch-backend=auto CPU solver dead-end on Linux, macOS, and Windows
On CPU-only machines, `uv pip install unsloth --torch-backend=auto`
falls back to unsloth==2024.8 because the CPU solver cannot satisfy
newer unsloth's dependencies. install.ps1 already solved this with a
two-step approach; this applies the same fix to install.sh and
install_python_stack.py.
install.sh: add get_torch_index_url() that detects GPU via nvidia-smi
and maps CUDA versions to PyTorch index URLs (matching install.ps1's
Get-TorchIndexUrl). Fresh installs now install torch first via explicit
--index-url, then install unsloth with --upgrade-package to preserve
the pre-installed torch. All 5 --torch-backend=auto removed from
primary paths.
install.ps1: add fallback else-branch when TorchIndexUrl is empty,
using --torch-backend=auto as last resort (matching install.sh).
install_python_stack.py: remove unconditional --torch-backend=auto
from _build_uv_cmd. Torch is pre-installed by install.sh/setup.ps1
by the time this runs. Callers that need it can set UV_TORCH_BACKEND.
Both install.sh and install.ps1 now share the same three-branch logic:
migrated env (upgrade-package only), normal (torch-first + index-url),
and fallback (--torch-backend=auto if URL detection fails).
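The shared URL-detection logic can be sketched in Python (the version table below is illustrative; the real scripts maintain their own CUDA-to-wheel-tag mapping and parse nvidia-smi output first):

```python
def get_torch_index_url(cuda_version):
    """Map an nvidia-smi-reported CUDA version to a PyTorch wheel index.
    Returning None signals the caller to fall back to --torch-backend=auto."""
    if not cuda_version:
        return None          # no GPU detected
    try:
        major, minor = (int(x) for x in cuda_version.split(".")[:2])
    except ValueError:
        return None          # unparseable output: warn and fall back
    # Pick the newest wheel index the driver can serve (illustrative table).
    if (major, minor) >= (12, 6):
        tag = "cu126"
    elif (major, minor) >= (12, 1):
        tag = "cu121"
    elif (major, minor) >= (11, 8):
        tag = "cu118"
    else:
        return None
    return f"https://download.pytorch.org/whl/{tag}"
```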
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Use --reinstall-package for migrated envs on both Linux and Windows
For migrated environments (moved from legacy venv location),
--reinstall-package is better than --upgrade-package because it forces
a clean reinstall even if the same version is already installed. This
ensures proper .dist-info and .pyc state in the new venv location.
--upgrade-package remains correct for the fresh install path where
torch is already installed and we just want to add unsloth without
re-resolving torch.
* Address review findings: portability, parity, and stale comments
- Replace grep -oP (GNU Perl regex) with POSIX sed in
get_torch_index_url() so the script works on BSD grep (macOS is
already guarded by the Darwin early-return, but Alpine/BusyBox
would silently get the wrong CUDA tag)
- Add LC_ALL=C before nvidia-smi invocation to prevent locale-dependent
output parsing issues
- Add warning on stderr when nvidia-smi output is unparseable, matching
install.ps1's [WARN] message
- Add explicit unsloth-zoo positional arg to install.ps1 migrated path,
matching install.sh (--reinstall-package alone won't install it if it
was never present in the migrated env)
- Fix stale comment in install_python_stack.py line 392 that still
claimed --torch-backend=auto is added by _build_uv_cmd
- Add sed to test tools directory (function now uses sed instead of grep)
* Add --index-url to migrated env path to prevent CPU torch resolution
The migrated path runs uv pip install with --reinstall-package for
unsloth/unsloth-zoo. While uv should keep existing torch as satisfied,
the resolver could still re-resolve torch as a transitive dependency.
Without --index-url pointing at the correct CUDA wheel index, the
resolver would fall back to plain PyPI and potentially pull CPU-only
torch. Adding --index-url $TORCH_INDEX_URL ensures CUDA wheels are
available if the resolver needs them.
Applied to both install.sh and install.ps1.
* Revert --index-url on migrated env path
The original install.ps1 on main already handles the migrated path
without --index-url and it works correctly. --reinstall-package only
forces reinstall of the named packages while uv keeps existing torch
as satisfied. No need for the extra flag.
* Fix unsloth studio update --local not installing local checkout
studio.py sets STUDIO_LOCAL_REPO when --local is passed, but
install_python_stack.py never read it. The update path always
installed from PyPI regardless of the --local flag.
Add a local_repo branch that first updates deps from base.txt
(with --upgrade-package to preserve torch), then overlays the
local checkout as an editable install with --no-deps.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* feat: Implement Q-GaLore optimizer and custom embedding learning rate in the Unsloth trainer.
* feat: Implement QGaLoreAdamW8bit optimizer with 8-bit states, GaLore low-rank gradient projection, and optional INT8 weight quantization, along with supporting projector and tests.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* feat: Introduce Q-GaLore AdamW optimizer with low-rank quantized gradient projection and integrate into the trainer, along with dedicated tests.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* feat: Implement Q-GaLore AdamW optimizer with gradient projection and quantization, including trainer integration and corresponding tests.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix 3 bugs in Q-GaLore optimizer and add weight_quant forward hooks
1. Fix use-after-delete crash: move `del p._saved_data` after the
weight decay block so decoupled weight decay can reference the
current weights correctly (p.data).
2. Fix substring matching in make_q_galore_param_groups: split
parameter names on "." and check exact component matches to
prevent false positives (e.g. "not_q_proj" matching "q_proj").
3. Implement forward pre-hooks for weight_quant: after the optimizer
quantizes weights to INT8, replace p.data with a 1-element
placeholder to free float memory. A register_forward_pre_hook
dequantizes back to float before each forward pass. The trainer
calls install_weight_quant_hooks() when weight_quant is enabled.
4. Update test_weight_decay_uses_saved_data to match the fixed code
path (decoupled decay uses p.data, expected value 2.7). Add
test_weight_quant_hook_restores_float to verify the INT8-to-float
hook round-trip.
All 24/24 Q-GaLore tests pass. Benchmarked on Llama-3.2-1B-Instruct
FFT: Q-GaLore saves 32% VRAM (10.63 -> 7.24 GB) with better loss
convergence (1.3 vs 2.0 at step 100). No regressions in 31-notebook
sweep across Llama, Qwen, Mistral, Phi, Gemma, vision, and GRPO.
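The exact-component matching from fix 2 can be sketched as follows (the target list is illustrative, not the project's actual defaults):

```python
def matches_target(param_name, targets = ("q_proj", "k_proj", "v_proj")):
    # Split the dot-separated parameter name and require an exact component
    # match, so e.g. "not_q_proj" no longer matches "q_proj" by substring.
    parts = param_name.split(".")
    return any(t in parts for t in targets)
```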
* Default weight_quant to False in QGaloreConfig
Benchmarks show weight_quant=True adds ~1 GB on Llama-3.2-1B due to
INT8 copy/scale overhead exceeding savings from the placeholder trick.
Users can still opt in explicitly. The optimizer logic is unchanged.
* Optimize Q-GaLore projector and optimizer step performance
Projector (q_galore_projector.py):
- Use torch.svd_lowrank with oversampling p=10 (Halko et al. 2009) instead
of full SVD for large matrices. Falls back to full SVD when min(m,n) <= 2*rank.
SVD steps are 6-8x faster on Llama-3.2-1B (22s -> 3s for first step).
- Cache the dequantized ortho matrix between project() and project_back() to
avoid redundant dequantization when quant=True.
- Replace F.cosine_similarity with torch.dot for 1-D unit vectors in the
adaptive schedule. Remove unused torch.nn.functional import.
- Use collections.deque(maxlen=queue_size) instead of list with manual pop(0).
Optimizer (q_galore_adamw.py):
- Remove redundant .clone() on dequantized weights (line 151) and on float
data before re-quantization (line 211). _dequantize already returns a fresh
tensor and _quantize/_quantize_stochastic only reads its input.
- Consolidate per-group torch.cuda.synchronize() into a single call after
all param groups complete.
- Use torch.empty instead of torch.zeros for the scalar placeholder tensor
that is never read.
Verified: 24/24 unit tests pass. Llama-3.2-1B 61-step training produces
losses within 0.24% relative diff (correlation >0.9999) of the original.
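A NumPy sketch of the randomized low-rank SVD with oversampling (Halko et al. 2009) that `torch.svd_lowrank` implements, including the small-matrix full-SVD fallback described above (the threshold and p=10 come from the commit; this is not the project's code):

```python
import numpy as np

def lowrank_svd(A, rank, p = 10, seed = 0):
    m, n = A.shape
    if min(m, n) <= 2 * rank:
        # Small matrix: full SVD is cheap and exact.
        U, s, Vt = np.linalg.svd(A, full_matrices = False)
        return U[:, :rank], s[:rank], Vt[:rank]
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, rank + p))   # oversampled test matrix
    Q, _ = np.linalg.qr(A @ Omega)               # approximate range of A
    B = Q.T @ A                                  # small (rank+p) x n problem
    Ub, s, Vt = np.linalg.svd(B, full_matrices = False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank]
```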
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Fixup mapper issues and resolve properly
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Refactor Ollama template wiring and harden packing helpers
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
* Fix Qwen3 and Gemma3n template bindings and tidy packing test helper
* Fix gptoss Ollama comment and tinyllama stop parameter
- Fix wrong comment referencing gemma3n for gptoss_ollama in chat_templates.py
- Add missing stop keyword to tinyllama PARAMETER in ollama_template_mappers.py
* Fix _DummyTrainer compatibility across TRL versions
The try/except only handled the removal of return_position_ids
(TRL v0.24+) but not the absence of padding_free (TRL v0.18.2).
Gracefully degrade through all optional collator flags so the
test works from trl>=0.18.2 through v0.27+.
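The graceful degradation can be sketched as trying progressively smaller kwarg sets (the flag names come from the commit above; the helper name and candidate ordering are illustrative):

```python
def make_collator(cls, tokenizer):
    # Try the richest signature first, then drop optional flags that newer
    # or older TRL releases do not accept, down to the bare constructor.
    candidates = [
        {"padding_free": True, "return_position_ids": True},  # pre-0.24 TRL
        {"padding_free": True},                               # TRL >= 0.24
        {},                                                   # TRL 0.18.x
    ]
    for kwargs in candidates:
        try:
            return cls(tokenizer, **kwargs)
        except TypeError:
            continue
    raise TypeError("no supported collator signature found")
```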
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add int8 weight-only QAT scheme, add test, fix tests for current torchao version
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change quantization to PerAxis
* lambda =/
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add torchao messages, remove group_size from int8
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* raise exception on missing torchao
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* touch up the torchao imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- Fix test file: use return_tokenized instead of return_tensors
- Fix test file: use text_dataset instead of undefined dataset variable
- Move parameter validation to constructor (fail fast on invalid params)
- Add labels field in tokenized output for causal LM training
- Add empty file handling with clear error message
- Add tests for constructor validation and labels field
* vllm sampling params fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* do not patch base_trainer
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* separate vllm fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Apply suggestion from @danielhanchen
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit 58b483dc0d1790f99580665801d3fa0d7267c533.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit b2497519659a9f301e7a633795d9efdafdc2b277.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit de3daaf429f81aceb6632932b0cb1af5149652a8.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
**Summary:** The existing QAT + LoRA path only applied fake
quantization to the original slow path, but the default is the
fast path that calls unsloth's fast LoRA primitives. This commit
integrates fake quantization into these fast primitives as well,
and adds unit tests to assert that fake quantization is actually
taking place.
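For context, "fake quantization" means quantize-then-dequantize in the forward pass so training sees quantization error while everything stays in float. A minimal per-tensor sketch of that mechanism (the real path uses torchao's per-group fake quantizers with straight-through gradients; this toy version is for illustration only):

```python
def fake_quantize_symmetric(values, bits = 4):
    # Symmetric per-tensor quantize-then-dequantize: round each value to
    # the nearest representable level, then map back to float.
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for int4
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) * scale for v in values]
```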
**Test Plan:**
Unit tests:
```
pytest tests/utils/test_qat.py
```
End-to-end test: https://gist.github.com/andrewor14/6360dd69b5784c71c46e80c14f53e6b6
Full fine-tuning Llama3.1-8B with and without QAT + LoRA on yahma/alpaca-cleaned for 1 epoch:
- Batch size = 8 (no grad accum)
- Learning rate = 2e-4
- Quantization scheme = int4 weight only (with bf16 activations)
Wikitext perplexity:
- Baseline = int4 quantized model finetuned without QAT
- QAT int4 quantized model (with this PR) achieved 33% lower perplexity than the int4 baseline
- QAT int4 quantized model without this PR was worse than the int4 baseline
```
==> unsloth_model_lora_baseline_output/lm_eval_float.log <==
| | |none | 0|word_perplexity|↓ |7.5551|± | N/A|
==> unsloth_model_lora_baseline_output/lm_eval_quantized.log <==
| | |none | 0|word_perplexity|↓ |8.7655|± | N/A|
==> unsloth_model_lora_qat_int4_output/lm_eval_quantized.log <==
| | |none | 0|word_perplexity|↓ |8.3548|± | N/A|
```
Summary:
Previously the test was not run correctly, and saving to a local path was not tested;
this PR adds support for that and tests it properly.
Note: `python tests/saving/test_unsloth_save.py` doesn't run the test
Test Plan:
pytest tests/saving/test_unsloth_save.py -k test_save_torchao
Summary:
Allow users to merge the LoRA weights and then run post-training quantization with torchao
Usage:
```
from torchao.quantization import Int8DynamicActivationInt8WeightConfig

torchao_config = Int8DynamicActivationInt8WeightConfig()
model.save_pretrained_torchao(
    save_path,
    tokenizer=tokenizer,
    torchao_config=torchao_config,
)
```
Test Plan:
python tests/saving/test_unsloth_save.py