unsloth/tests
Ricardo-M-L d5525e8bbb
fix: check find() return value before adding offset in try_fix_tokenizer (#4923)
* fix: check find() return value before adding offset in try_fix_tokenizer

The `str.find()` result was checked for -1 only after adding
`len(find_text)`, turning the guard into dead code. When the substring
is absent, `start` becomes `len(find_text) - 1` (non-negative for any
non-empty `find_text`), so the `if start == -1: continue` never triggers
and the subsequent slice extracts garbage from the tokenizer string.

Split the find and offset into two steps so the -1 check works correctly.
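
A minimal sketch of the before/after pattern, for illustration only. The
helper names `extract_buggy` and `extract_fixed` are hypothetical and are
not the actual body of `try_fix_tokenizer`; only the find-then-offset
logic mirrors the description above.

```python
def extract_buggy(tokenizer_string, find_text):
    # Offset is added before the check, so a miss (-1) is masked:
    # -1 + len(find_text) is >= 0 for any non-empty find_text.
    start = tokenizer_string.find(find_text) + len(find_text)
    if start == -1:  # dead code for non-empty find_text
        return None
    end = tokenizer_string.find('",', start)
    return tokenizer_string[start:end]


def extract_fixed(tokenizer_string, find_text):
    # Check the raw find() result first, then apply the offset.
    start = tokenizer_string.find(find_text)
    if start == -1:
        return None
    start += len(find_text)
    end = tokenizer_string.find('",', start)
    return tokenizer_string[start:end]
```

With a missing `find_text`, `extract_fixed` returns `None`, while
`extract_buggy` falls through and returns an unrelated slice.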

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add defensive guards for token_id None and end find() returning -1

- Skip loop iteration early when token_id is None to avoid constructing
  a find_text that can never match valid JSON
- Guard end = tokenizer_string.find('",', start) against -1 to prevent
  silent garbage extraction from malformed tokenizer strings
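
The two guards above could look roughly like the sketch below. The
function name `extract_token_text` is hypothetical, and the exact
`find_text` format is an assumption made for the example; the real loop
in `try_fix_tokenizer` may differ.

```python
def extract_token_text(tokenizer_string, token_id):
    # Guard 1: a None token_id would render as "None" in find_text,
    # which can never match valid JSON, so bail out early.
    if token_id is None:
        return None

    # Assumed key layout, for illustration only.
    find_text = f'"id":{token_id},"content":"'
    start = tokenizer_string.find(find_text)
    if start == -1:
        return None
    start += len(find_text)

    # Guard 2: without this check, end == -1 would make the slice
    # [start:-1] silently drop the last character instead of failing.
    end = tokenizer_string.find('",', start)
    if end == -1:
        return None
    return tokenizer_string[start:end]
```

In the loop form described in this commit, each `return None` corresponds
to a `continue`.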

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-09 06:15:46 -07:00
python fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748) 2026-04-01 06:12:17 -07:00
qlora Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" 2025-12-01 07:24:58 -08:00
saving Add regression test for shell injection fix in GGML conversion (#4773) 2026-04-02 00:10:47 -07:00
sh fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748) 2026-04-01 06:12:17 -07:00
studio/install Update test_pr4562_bugfixes.py for simplified install policy (#4817) 2026-04-03 04:06:14 -07:00
utils feat: Implement Q-GaLore optimizer and custom embedding learning rate… (#4511) 2026-03-25 01:03:10 -07:00
__init__.py Qwen 3, Bug Fixes (#2445) 2025-04-30 22:38:39 -07:00
run_all.sh fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748) 2026-04-01 06:12:17 -07:00
test_get_model_name.py Fixup mapper issues and resolve properly (#4124) 2026-03-03 06:57:25 -08:00
test_loader_glob_skip.py Add unit tests for HfFileSystem glob skip guard (#4854) 2026-04-06 08:54:36 -07:00
test_model_registry.py Revert "[FIX] Vllm guided decoding params (#3662)" 2025-12-01 05:43:45 -08:00
test_raw_text.py fix: check find() return value before adding offset in try_fix_tokenizer (#4923) 2026-04-09 06:15:46 -07:00