unsloth

mirror of https://github.com/unslothai/unsloth synced 2026-04-21 13:37:39 +00:00

Daniel Han 44082cf88e Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM (#5014)

* Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM

The chat settings sheet's ctx slider reads `max_context_length` from `/api/inference/status` and renders

    Exceeds estimated VRAM capacity (N tokens). The model may use system RAM.

when the user drags the slider above that value. For models whose weights fit on some GPU subset, `_max_context_length` was already set to the binary-search cap and the warning fired correctly.

For models whose weights exceed 90% of every GPU subset's free memory (e.g. MiniMax-M2.7-GGUF at 131 GB on a 97 GB GPU), the ceiling-probe loop never matched a subset, so `max_available_ctx` stayed at the native context (e.g. 196608). The slider ran all the way to native with no indication that any value above the 4096 spec default would trigger `--fit on` and degrade performance.

Anchor `max_available_ctx` at `min(4096, native_context_length)` when no subset fits, so the warning fires at the right threshold and the user sees the correct safe-zone / warning-zone split:

    Before (MiniMax-M2.7 on 97 GB GPU):
      slider 0 .. 196608, warning threshold = 196608 (never fires)
    After:
      slider 0 .. 196608, warning threshold = 4096 (fires correctly)

No frontend changes required: `chat-settings-sheet.tsx` already consumes `ggufMaxContextLength` (= status.max_context_length) as the warning threshold and `ggufNativeContextLength` as the slider max.

Adds tests/test_llama_cpp_max_context_threshold.py covering weights-exceed-VRAM (single / multi-GPU), a native-ctx below the 4096 fallback case (don't lie about supported ctx), fittable-model regressions (small / multi-GPU / tiny on huge GPU), and the `max_context_length` property's fallback semantics.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

2026-04-14 08:53:49 -07:00
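
The fallback described in the commit message can be sketched roughly as follows. This is a simplified illustration, not the code from the Studio backend: the function name `warning_threshold` and the parameter `fitted_ctx_cap` are hypothetical, and the probing of GPU subsets is reduced to a single optional value. Only the `min(4096, native_context_length)` anchor and the example numbers come from the commit message.

```python
from typing import Optional

# The "4096 spec default" named in the commit message.
SPEC_DEFAULT_CTX = 4096


def warning_threshold(native_context_length: int,
                      fitted_ctx_cap: Optional[int]) -> int:
    """Return the context length above which the slider should warn.

    fitted_ctx_cap is a hypothetical stand-in for the binary-search cap
    found when the weights fit on some GPU subset. None means that no
    subset could hold the weights.
    """
    if fitted_ctx_cap is not None:
        # Weights fit somewhere: keep the cap that was already computed.
        return fitted_ctx_cap
    # No subset fits: anchor at the spec default, but never report more
    # context than the model natively supports.
    return min(SPEC_DEFAULT_CTX, native_context_length)


# Weights exceed VRAM, native context 196608 (the MiniMax-M2.7 example).
print(warning_threshold(196608, None))   # 4096
# Native context below the 4096 fallback.
print(warning_threshold(2048, None))     # 2048
# Fittable model: the existing cap is left untouched (32768 is illustrative).
print(warning_threshold(196608, 32768))  # 32768
```

Taking the minimum with the native context covers the "don't lie about supported ctx" case listed among the tests: a model with a native context below 4096 reports its own limit rather than the fallback.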
backend	Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM (#5014)	2026-04-14 08:53:49 -07:00
frontend	[Studio] Show non exported models in chat UI (#4892)	2026-04-14 15:03:58 +04:00
__init__.py	Final cleanup	2026-03-12 18:28:04 +00:00
install_llama_prebuilt.py	Add AMD ROCm/HIP support across installer and hardware detection (#4720)	2026-04-10 01:56:12 -07:00
install_python_stack.py	[Studio] Install flash attn at setup time for linux (#4979)	2026-04-14 16:40:17 +04:00
LICENSE.AGPL-3.0	Add AGPL-3.0 license to studio folder	2026-03-09 19:36:25 +00:00
setup.bat	Final cleanup	2026-03-12 18:28:04 +00:00
setup.ps1	split venv_t5 into tiered 5.3.0/5.5.0 and fix trust_remote_code (#4878)	2026-04-07 20:05:01 +04:00
setup.sh	Add AMD ROCm/HIP support across installer and hardware detection (#4720)	2026-04-10 01:56:12 -07:00
Unsloth_Studio_Colab.ipynb	Allow install_python_stack to run on Colab (#4633)	2026-03-27 00:29:27 +04:00