mirror of
https://github.com/unslothai/unsloth
synced 2026-04-21 13:37:39 +00:00
* feat: Add cactus QAT scheme support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* test(qat): add tests for cactus QAT scheme and fix missing import

* Fix cactus QAT scheme: correct MappingType import, tighten PerGroup filter

  - Drop the broken `from torchao.dtypes import MappingType` import. `MappingType` lives in `torchao.quantization` (and `torchao.quantization.quant_primitives`); it is not exported from `torchao.dtypes` in any supported torchao release (verified on 0.14, 0.16, 0.17). The previous code raised `ImportError` on every cactus call, masked as a misleading 'torchao not found' error.
  - Since `IntxWeightOnlyConfig` already defaults `mapping_type` to `MappingType.SYMMETRIC`, drop the explicit kwarg entirely and remove the import. Behavior is unchanged.
  - Introduce a named `group_size = 32` constant (matching the int4 / fp8-int4 pattern in the surrounding branches) and add a `% group_size == 0` divisibility guard to the filter. `PerGroup(32)` requires `in_features % 32 == 0` at `quantize_()` time; otherwise torchao raises `ValueError: in_features (N) % group_size (32) must be == 0`. The old `in_features >= 32` filter would admit non-aligned widths (e.g. 33, 48, 65, 127) and crash `_prepare_model_for_qat` for those shapes.

* Warn when cactus QAT skips non-divisible Linear layers

  Multiple reviewers flagged that the divisibility guard added in the previous commit can silently leave Linear layers in full precision when their in_features is not a multiple of 32. For currently supported Unsloth models (Qwen, Llama, Gemma, Mistral, Phi), every Linear width is already a multiple of 32/64/128, so this never triggers; but surfacing the coverage gap is cheap and avoids users assuming 100% QAT coverage when they bring a custom model with unusual shapes.

  Emit a UserWarning listing up to the first 8 skipped layers whenever the cactus filter excludes any Linear due to the modulo guard. This keeps the lenient silent-skip behavior (consistent with int4 / fp8-int4), but stops making it silent.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
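The guard-plus-warning behavior described above can be sketched as a standalone filter, assuming torchao's `filter_fn(module, name)` calling convention for `quantize_()`. The function names (`make_cactus_filter`, `warn_if_skipped`) are hypothetical, and the Linear check is duck-typed on `in_features` so the sketch stays self-contained; the real code would use `isinstance(module, torch.nn.Linear)`.

```python
import warnings

GROUP_SIZE = 32     # matches the PerGroup(32) granularity of the cactus scheme
MAX_REPORTED = 8    # cap on layer names listed in the warning, per the commit


def make_cactus_filter(skipped):
    """Build a filter_fn-style predicate for quantize_().

    Admits a Linear-like module only when its in_features is divisible by
    GROUP_SIZE; records the names of modules rejected by the modulo guard.
    """
    def filter_fn(module, name):
        # Duck-typed stand-in for isinstance(module, torch.nn.Linear).
        if not hasattr(module, "in_features"):
            return False
        if module.in_features % GROUP_SIZE != 0:
            # PerGroup(GROUP_SIZE) would raise ValueError at quantize_() time,
            # so skip the layer but remember it for the coverage warning.
            skipped.append(name)
            return False
        return True
    return filter_fn


def warn_if_skipped(skipped):
    """Surface the coverage gap instead of silently leaving layers in fp."""
    if skipped:
        shown = ", ".join(skipped[:MAX_REPORTED])
        warnings.warn(
            f"cactus QAT skipped {len(skipped)} Linear layer(s) whose "
            f"in_features is not divisible by {GROUP_SIZE}: {shown}",
            UserWarning,
        )
```

The filter stays lenient (returns `False` rather than raising), so non-aligned custom models still train; the warning only makes the previously silent skip visible.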
Directory listing:

- python
- qlora
- saving
- sh
- studio/install
- utils
- __init__.py
- run_all.sh
- test_cli_export_unpacking.py
- test_get_model_name.py
- test_loader_glob_skip.py
- test_model_registry.py
- test_raw_text.py