unsloth/tests
Avaya Aggarwal 7c5464ad71
feat: Add cactus QAT scheme support (#4679)
* feat: Add cactus QAT scheme support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test(qat): add tests for cactus QAT scheme and fix missing import

* Fix cactus QAT scheme: correct MappingType import, tighten PerGroup filter

- Drop the broken `from torchao.dtypes import MappingType` import. `MappingType`
  lives in `torchao.quantization` (and `torchao.quantization.quant_primitives`);
  it is not exported from `torchao.dtypes` in any supported torchao release
  (verified on 0.14, 0.16, 0.17). The previous code raised `ImportError` on
  every cactus call, which was then masked as a misleading 'torchao not found' error.
- Since `IntxWeightOnlyConfig` already defaults `mapping_type` to
  `MappingType.SYMMETRIC`, drop the explicit kwarg entirely and remove the
  import. Behavior is unchanged.
- Introduce a named `group_size = 32` constant (matches the int4 / fp8-int4
  pattern in the surrounding branches) and add a `% group_size == 0`
  divisibility guard to the filter. `PerGroup(32)` requires
  `in_features % 32 == 0` at `quantize_()` time, otherwise torchao raises
  `ValueError: in_features (N) % group_size (32) must be == 0`. The old
  `in_features >= 32` filter would admit non-aligned widths (e.g. 33, 48, 65,
  127) and crash `_prepare_model_for_qat` for those shapes.
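
A minimal sketch of the filter change described above, reduced to plain integers so it runs without torch or torchao. The names `GROUP_SIZE`, `old_filter`, and `new_filter` are hypothetical and chosen for illustration. The real filter in the PR presumably inspects Linear modules rather than bare widths.

```python
# Hypothetical stand-in for the `group_size = 32` constant named in the commit.
GROUP_SIZE = 32


def old_filter(in_features: int) -> bool:
    """Old behavior: admit any width of at least one group."""
    return in_features >= GROUP_SIZE


def new_filter(in_features: int) -> bool:
    """New behavior: additionally require alignment to the group size."""
    return in_features >= GROUP_SIZE and in_features % GROUP_SIZE == 0


# The non-aligned widths are the examples from the commit message;
# 64 and 4096 are added here as aligned widths for contrast.
widths = [33, 48, 65, 127, 64, 4096]
admitted_old = [w for w in widths if old_filter(w)]
admitted_new = [w for w in widths if new_filter(w)]
```

Under these assumptions the old filter admits every width in the list, while the new one keeps only the aligned widths, so the non-aligned shapes never reach `quantize_()`.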

* Warn when cactus QAT skips non-divisible Linear layers

Multiple reviewers flagged that the divisibility guard added in the
previous commit can silently leave Linear layers in full precision when
their in_features is not a multiple of 32. For currently supported
Unsloth models (Qwen, Llama, Gemma, Mistral, Phi) every Linear width is
already a multiple of 32/64/128 so this never triggers, but surfacing
the coverage gap is cheap and avoids users assuming 100% QAT coverage
when they bring a custom model with unusual shapes.

Emit a UserWarning listing up to the first 8 skipped layers whenever
the cactus filter excludes any Linear due to the modulo guard. This
keeps the lenient skip behavior (consistent with int4 /
fp8-int4) but makes the skip visible instead of silent.
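
The warning behavior described above could be sketched roughly as follows, using only the standard `warnings` module. The function name `warn_skipped_layers`, its parameters, and the message wording are assumptions for illustration, not the text emitted by the actual PR.

```python
import warnings


def warn_skipped_layers(skipped_names, group_size=32, max_listed=8):
    """Emit one UserWarning naming up to `max_listed` skipped layers.

    Hypothetical helper: `skipped_names` is assumed to be a list of layer
    names that the modulo guard excluded from QAT.
    """
    if not skipped_names:
        return  # nothing was skipped, so stay quiet
    shown = ", ".join(skipped_names[:max_listed])
    extra = len(skipped_names) - max_listed
    suffix = f" (and {extra} more)" if extra > 0 else ""
    warnings.warn(
        f"cactus QAT skipped {len(skipped_names)} Linear layer(s) whose "
        f"in_features is not a multiple of {group_size}: {shown}{suffix}",
        UserWarning,
        stacklevel=2,
    )
```

Capping the list keeps the message readable for models with many odd-shaped layers while still reporting the total count.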

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-15 07:40:03 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| `python` | Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024) | 2026-04-15 11:39:11 +04:00 |
| `qlora` | Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" | 2025-12-01 07:24:58 -08:00 |
| `saving` | Add regression test for shell injection fix in GGML conversion (#4773) | 2026-04-02 00:10:47 -07:00 |
| `sh` | Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024) | 2026-04-15 11:39:11 +04:00 |
| `studio/install` | Add ROCm test suite for PR #4720 (#4824) | 2026-04-11 04:44:13 -07:00 |
| `utils` | feat: Add cactus QAT scheme support (#4679) | 2026-04-15 07:40:03 -07:00 |
| `__init__.py` | Qwen 3, Bug Fixes (#2445) | 2025-04-30 22:38:39 -07:00 |
| `run_all.sh` | fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748) | 2026-04-01 06:12:17 -07:00 |
| `test_cli_export_unpacking.py` | studio: stream export worker output into the export dialog (#4897) | 2026-04-14 08:55:43 -07:00 |
| `test_get_model_name.py` | feat: Add support for OLMo-3 model (#4678) | 2026-04-15 07:39:11 -07:00 |
| `test_loader_glob_skip.py` | Add unit tests for HfFileSystem glob skip guard (#4854) | 2026-04-06 08:54:36 -07:00 |
| `test_model_registry.py` | Revert "[FIX] Vllm guided decoding params (#3662)" | 2025-12-01 05:43:45 -08:00 |
| `test_raw_text.py` | fix: check find() return value before adding offset in try_fix_tokenizer (#4923) | 2026-04-09 06:15:46 -07:00 |