mirror of
https://github.com/unslothai/unsloth
synced 2026-04-21 13:37:39 +00:00
* Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM

  The load-time auto-fit in `LlamaCppBackend.load_model` had two issues for models whose weights do not fit on any GPU subset (the common case for large MoE GGUFs such as MiniMax-M2.7, Qwen3.5-397B-A17B, etc.):

  1. Auto mode (`max_seq_length=0`) left `effective_ctx` at the model's native context when no subset passed the 90% fit check. The UI slider then landed on e.g. 196608 for MiniMax-M2.7, far above anything usable. Default the auto-pick to 4096 so the UI starts at a sane value; the slider ceiling stays at the native context so the user can still opt in to longer contexts and receive the "might be slower" warning.

  2. An explicit ctx was silently shrunk when the weights fit but the requested KV cache overflowed the 90% budget. The shrink loop emitted `-c <capped> -ngl -1` without informing the caller, so a user who had opted into a longer context via the UI never actually got it. Drop the shrink loop on the explicit path and emit `-c <user_ctx> --fit on` instead, letting llama-server flex `-ngl` (CPU layer offload).

  Adds `tests/test_llama_cpp_context_fit.py` covering both paths, the file-size-only fallback when KV metadata is missing, non-regression on the fittable auto-pick, and platform-agnostic input shape.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  For more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
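The two fixed paths can be sketched as follows. This is a minimal illustration of the behavior the commit message describes, not the actual `LlamaCppBackend` code: the function name `pick_gguf_args` and its parameters are hypothetical, while the `4096` default, the `-c`/`-ngl -1` args, and `--fit on` come from the description above.

```python
# Hypothetical sketch of the fixed context-pick logic; pick_gguf_args and its
# signature are illustrative, not the real LlamaCppBackend API.
AUTO_DEFAULT_CTX = 4096  # sane UI starting point when weights don't fit on GPU


def pick_gguf_args(native_ctx: int, user_ctx: int, weights_fit_gpu: bool) -> list[str]:
    """Return llama-server args for the two paths described in the commit.

    user_ctx == 0 means auto mode (max_seq_length=0 in the UI).
    weights_fit_gpu: whether some GPU subset passed the 90% fit check.
    """
    if user_ctx == 0:
        # Auto path: when no subset fits, default to 4096 instead of the
        # native context; the UI slider ceiling stays at native_ctx so the
        # user can still opt in to longer contexts.
        ctx = native_ctx if weights_fit_gpu else min(AUTO_DEFAULT_CTX, native_ctx)
        return ["-c", str(ctx), "-ngl", "-1"]
    # Explicit path: never silently shrink the user's choice; pass it through
    # and let llama-server flex -ngl (CPU layer offload) via --fit on.
    return ["-c", str(user_ctx), "--fit", "on"]
```

For MiniMax-M2.7's 196608 native context with no fitting GPU subset, the auto path now yields `-c 4096`, while a user who explicitly requests 65536 gets `-c 65536 --fit on` rather than a silently capped value.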
Repository contents:

- backend
- frontend
- __init__.py
- install_llama_prebuilt.py
- install_python_stack.py
- LICENSE.AGPL-3.0
- setup.bat
- setup.ps1
- setup.sh
- Unsloth_Studio_Colab.ipynb