unsloth

mirror of https://github.com/unslothai/unsloth synced 2026-04-21 13:37:39 +00:00

Ricardo-M-L d5525e8bbb fix: check find() return value before adding offset in try_fix_tokenizer (#4923)	2026-04-09 06:15:46 -07:00

* fix: check find() return value before adding offset in try_fix_tokenizer

  The `str.find()` result was checked for -1 only after adding `len(find_text)`, turning the guard into dead code. When the substring is absent, `start` becomes `len(find_text) - 1` (a positive number), so the `if start == -1: continue` never triggers and the subsequent slice extracts garbage from the tokenizer string.

  Split the find and offset into two steps so the -1 check works correctly.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add defensive guards for token_id None and end find() returning -1

  - Skip loop iteration early when token_id is None to avoid constructing a find_text that can never match valid JSON
  - Guard end = tokenizer_string.find('",', start) against -1 to prevent silent garbage extraction from malformed tokenizer strings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
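The dead-code guard described in this commit message can be illustrated with a minimal sketch. This is not the actual `try_fix_tokenizer` code: the function names `extract_value_buggy` and `extract_value_fixed`, the sample string, and the return-`None` convention are hypothetical and chosen only to show why the -1 check must happen before the offset is added.

```python
def extract_value_buggy(tokenizer_string, find_text):
    # Hypothetical reproduction of the bug: the offset is added first,
    # so a missing substring yields len(find_text) - 1, never -1.
    start = tokenizer_string.find(find_text) + len(find_text)
    if start == -1:  # dead code for any non-empty find_text
        return None
    end = tokenizer_string.find('",', start)
    return tokenizer_string[start:end]


def extract_value_fixed(tokenizer_string, find_text):
    # Check the raw find() result before adding the offset.
    start = tokenizer_string.find(find_text)
    if start == -1:
        return None
    start += len(find_text)
    # Also guard the closing delimiter, as the second bullet describes.
    end = tokenizer_string.find('",', start)
    if end == -1:
        return None
    return tokenizer_string[start:end]


# Assumed sample input, not taken from a real tokenizer file.
sample = '{"id": 5, "content": "<pad>", "x": 1}'
present = '"id": 5, "content": "'
absent = '"id": 9, "content": "'

print(extract_value_fixed(sample, present))
print(extract_value_fixed(sample, absent))
print(repr(extract_value_buggy(sample, absent)))
```

With the absent key, the fixed variant reports a miss, while the buggy variant slices from an unrelated position and returns a meaningless fragment of the input.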
python	fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748)	2026-04-01 06:12:17 -07:00
qlora	Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"	2025-12-01 07:24:58 -08:00
saving	Add regression test for shell injection fix in GGML conversion (#4773)	2026-04-02 00:10:47 -07:00
sh	fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748)	2026-04-01 06:12:17 -07:00
studio/install	Update test_pr4562_bugfixes.py for simplified install policy (#4817)	2026-04-03 04:06:14 -07:00
utils	feat: Implement Q-GaLore optimizer and custom embedding learning rate… (#4511)	2026-03-25 01:03:10 -07:00
__init__.py	Qwen 3, Bug Fixes (#2445)	2025-04-30 22:38:39 -07:00
run_all.sh	fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748)	2026-04-01 06:12:17 -07:00
test_get_model_name.py	Fixup mapper issues and resolve properly (#4124)	2026-03-03 06:57:25 -08:00
test_loader_glob_skip.py	Add unit tests for HfFileSystem glob skip guard (#4854)	2026-04-06 08:54:36 -07:00
test_model_registry.py	Revert "[FIX] Vllm guided decoding params (#3662)"	2025-12-01 05:43:45 -08:00
test_raw_text.py	fix: check find() return value before adding offset in try_fix_tokenizer (#4923)	2026-04-09 06:15:46 -07:00