unsloth/studio
Daniel Han 44082cf88e
Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM (#5014)
* Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM

The chat settings sheet's ctx slider reads `max_context_length` from
`/api/inference/status` and renders

    Exceeds estimated VRAM capacity (N tokens). The model may use
    system RAM.

when the user drags the slider above that value. For models whose
weights fit on some GPU subset, `_max_context_length` was already set
to the binary-search cap and the warning fired correctly.

For models whose weights exceed 90% of every GPU subset's free memory
(e.g. MiniMax-M2.7-GGUF at 131 GB on a 97 GB GPU), the ceiling-probe
loop never matched a subset, so `max_available_ctx` stayed at the
native context (e.g. 196608). The slider ran all the way to native
with no indication that any value above the 4096 spec default would
trigger `--fit on` and degrade performance.

Anchor `max_available_ctx` at `min(4096, native_context_length)` when
no subset fits, so the warning fires at the right threshold and the
user sees the correct safe-zone / warning-zone split:

    Before (MiniMax-M2.7 on 97 GB GPU):
      slider 0 .. 196608, warning threshold = 196608  (never fires)

    After:
      slider 0 .. 196608, warning threshold = 4096    (fires correctly)

No frontend changes required: `chat-settings-sheet.tsx` already
consumes `ggufMaxContextLength` (= status.max_context_length) as the
warning threshold and `ggufNativeContextLength` as the slider max.

Adds tests/test_llama_cpp_max_context_threshold.py covering
weights-exceed-VRAM (single / multi-GPU), a native-ctx below the 4096
fallback case (don't lie about supported ctx), fittable-model
regressions (small / multi-GPU / tiny on huge GPU), and the
`max_context_length` property's fallback semantics.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:49 -07:00
..
backend Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM (#5014) 2026-04-14 08:53:49 -07:00
frontend [Studio] Show non exported models in chat UI (#4892) 2026-04-14 15:03:58 +04:00
__init__.py Final cleanup 2026-03-12 18:28:04 +00:00
install_llama_prebuilt.py Add AMD ROCm/HIP support across installer and hardware detection (#4720) 2026-04-10 01:56:12 -07:00
install_python_stack.py [Studio] Install flash attn at setup time for linux (#4979) 2026-04-14 16:40:17 +04:00
LICENSE.AGPL-3.0 Add AGPL-3.0 license to studio folder 2026-03-09 19:36:25 +00:00
setup.bat Final cleanup 2026-03-12 18:28:04 +00:00
setup.ps1 split venv_t5 into tiered 5.3.0/5.5.0 and fix trust_remote_code (#4878) 2026-04-07 20:05:01 +04:00
setup.sh Add AMD ROCm/HIP support across installer and hardware detection (#4720) 2026-04-10 01:56:12 -07:00
Unsloth_Studio_Colab.ipynb Allow install_python_stack to run on Colab (#4633) 2026-03-27 00:29:27 +04:00