Mirror of https://github.com/unslothai/unsloth, synced 2026-04-21 13:37:39 +00:00
* fix: check find() return value before adding offset in try_fix_tokenizer
The `str.find()` result was compared against -1 only after
`len(find_text)` had been added to it, which turned the guard into dead
code. When the substring is absent, `start` becomes `len(find_text) - 1`
(non-negative for any non-empty `find_text`), so `if start == -1: continue`
never triggers and the subsequent slice extracts garbage from the
tokenizer string.
Split the find and the offset into two steps so the -1 check works correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
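The failure mode described in this commit can be reproduced in isolation. The sketch below is illustrative only and is not the code from `try_fix_tokenizer`: the function names and the sample strings are hypothetical, chosen to show why adding the offset before the check hides the -1 sentinel.

```python
def value_start_buggy(haystack: str, find_text: str) -> int:
    """Hypothetical helper: offset is added BEFORE the -1 check."""
    start = haystack.find(find_text) + len(find_text)
    if start == -1:  # unreachable for any non-empty find_text
        return -1
    return start


def value_start_fixed(haystack: str, find_text: str) -> int:
    """Hypothetical helper: check the find() result first, then add the offset."""
    start = haystack.find(find_text)
    if start == -1:
        return -1
    return start + len(find_text)


# Made-up sample data, not a real tokenizer dump.
sample = '{"id":5,"content":"<s>",'
missing = '"id":7,"content":"'
present = '"id":5,"content":"'

# Absent substring: the buggy version returns a plausible-looking index.
print(value_start_buggy(sample, missing))  # len(missing) - 1
print(value_start_fixed(sample, missing))  # -1

# Present substring: both versions agree.
print(value_start_fixed(sample, present) == value_start_buggy(sample, present))
```

With the buggy helper, a caller has no way to tell a miss from a hit, because the returned index is always non-negative.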
* Add defensive guards for token_id None and end find() returning -1
- Skip loop iteration early when token_id is None to avoid constructing
  a find_text that can never match valid JSON
- Guard end = tokenizer_string.find('",', start) against -1 to prevent
  silent garbage extraction from malformed tokenizer strings
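The two guards listed above can be sketched together in one small function. This is a minimal illustration under stated assumptions, not the repository's implementation: `extract_token_content` is a hypothetical name, and the `"id":<n>,"content":"` layout of `find_text` is assumed for the example. Only the `tokenizer_string.find('",', start)` call is taken from the commit message.

```python
from typing import Optional


def extract_token_content(tokenizer_string: str, token_id: Optional[int]) -> Optional[str]:
    """Hypothetical helper: return the content string for token_id, or None."""
    # Guard 1: a None id would produce '"id":None,...', which never matches valid JSON.
    if token_id is None:
        return None

    # Assumed layout of the search text (an illustration, not confirmed by the source).
    find_text = f'"id":{token_id},"content":"'

    # Check the find() result before adding the offset.
    start = tokenizer_string.find(find_text)
    if start == -1:
        return None
    start += len(find_text)

    # Guard 2: without this check, end == -1 would slice [start:-1]
    # and silently drop the last character of a malformed string.
    end = tokenizer_string.find('",', start)
    if end == -1:
        return None

    return tokenizer_string[start:end]


# Made-up sample data, not a real tokenizer dump.
well_formed = '{"id":1,"content":"<s>","id":2,"content":"</s>",'
truncated = '{"id":1,"content":"<s>'

print(extract_token_content(well_formed, 2))
print(extract_token_content(well_formed, None))
print(extract_token_content(truncated, 1))
```

Returning `None` on every failure path keeps the caller's handling uniform; in a loop, each of these returns corresponds to a `continue`.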
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
| Name |
|---|
| python |
| qlora |
| saving |
| sh |
| studio/install |
| utils |
| __init__.py |
| run_all.sh |
| test_get_model_name.py |
| test_loader_glob_skip.py |
| test_model_registry.py |
| test_raw_text.py |