Commit graph

44 commits

Author SHA1 Message Date
Daniel Han
6e87bade25 Trim verbose comments in PATH helpers
Reduce inline comments from ~160 lines to ~25 across both files.
Keep one-line summaries of the "why"; drop multi-paragraph rationale
blocks that repeated information already captured in commit messages
and PR discussion.
2026-04-16 12:01:01 +00:00
Etherll
ec32ce2e82
fix: use direct registry API for PATH writes instead of SetEnvironmentVariable (#4961)
* fix: replace SetEnvironmentVariable with direct registry API

* apply reviews

* Use CreateSubKey for HKCU\Environment

* Store PATH backup under HKCU\Software\Unsloth

* Fix $backupKey registry handle leak in PATH backup block

Wrap $backupKey operations in try/finally so the handle is closed even
if GetValue or SetValue throws. The Add-ToUserPath helper already uses
this pattern for its registry key -- the backup block was the only
place missing it.

* Isolate WM_SETTINGCHANGE broadcast from PATH write error handling

Wrap the broadcast dummy-variable calls in their own try/catch so a
broadcast failure does not mask a successful registry PATH write.
Previously, if SetEnvironmentVariable threw after SetValue already
committed the new PATH, Add-ToUserPath would return $false and the
caller would skip Refresh-SessionPath.

* PATH helper polish: venv precedence, quoted entries, raw/expanded dedup

Three small follow-ups surfaced by a 10-reviewer pass against the rebased
PR head. None fix a regression vs main; each strictly improves the new
helpers.

Refresh-SessionPath / Refresh-Environment:
- Move $env:Path to the front of the merge so an activated venv keeps
  precedence over machine/user PATH after a refresh. Pre-PR dropped
  process-only entries entirely; post-PR kept them but at the back.
- Dedup on both raw and expanded forms so %USERPROFILE%\foo and the
  already-expanded C:\Users\me\foo do not both survive.

Add-ToUserPath:
- Trim whitespace and surrounding double-quotes from each compared entry
  so quoted PATH entries like "C:\Program Files\CMake\bin" deduplicate
  against an unquoted directory of the same path.
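
The dedup rules above (raw vs. expanded forms, plus quote/whitespace trimming) can be sketched in Python for illustration; the real helpers are PowerShell, and `expand_win_vars` here is a hypothetical stand-in for .NET's ExpandEnvironmentVariables:

```python
import os
import re

def expand_win_vars(s: str) -> str:
    # Hypothetical stand-in for ExpandEnvironmentVariables: replace
    # %NAME% with the env value, leaving unknown names untouched.
    return re.sub(r"%([^%]+)%",
                  lambda m: os.environ.get(m.group(1), m.group(0)), s)

def normalize(entry: str) -> str:
    # Trim whitespace and surrounding double-quotes, then expand and
    # lowercase (Windows PATH comparison is case-insensitive).
    return expand_win_vars(entry.strip().strip('"')).lower()

def dedup(entries):
    # Keep the first occurrence of each directory, comparing on the
    # normalized form while preserving the original raw spelling.
    seen, kept = set(), []
    for e in entries:
        key = normalize(e)
        if key and key not in seen:
            seen.add(key)
            kept.append(e)
    return kept
```

With this, a quoted `"C:\Program Files\CMake\bin"` and a `%USERPROFILE%`-form entry both collapse onto their expanded, unquoted counterparts.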

* Back up User PATH inside Add-ToUserPath, before first mutation

Previously only studio/setup.ps1 took a one-time PATH backup, at script
top (line ~547). install.ps1 (the irm | iex entry point) had no backup,
so users who installed via that path had no recovery surface if anything
clobbered their PATH. The PR description's "one-time backup before any
modifications" promise only held for the studio installer flow.

Move the backup into Add-ToUserPath itself: just before the first actual
SetValue mutation, write the pristine raw PATH to
HKCU\Software\Unsloth\PathBackup if no backup already exists. This:

- Covers both entry points (install.ps1 and studio/setup.ps1).
- Captures the TRUE pristine PATH even when install.ps1 runs first and
  studio/setup.ps1 runs afterwards (the script-top backup in setup.ps1
  would otherwise see an already-modified PATH).
- Is idempotent: once a backup exists, subsequent calls preserve it.
- Skips when nothing would mutate (dedup match) or PATH is empty.

The script-top backup in studio/setup.ps1 is kept for defense in depth.

* Refresh PATH: venv-aware merge order

Reconcile two competing concerns about Refresh-SessionPath /
Refresh-Environment surfaced by separate review rounds:

  - venv at the back -> activated venv loses precedence to system Python
  - process at the front -> stale shims (old node, old python, etc.)
    still on $env:Path can beat a freshly installed tool

New merge order:
  1. Activated venv Scripts dir, only if $env:VIRTUAL_ENV is set
  2. Machine PATH freshly read from registry
  3. User PATH freshly read from registry
  4. Current $env:Path as fallback

This way an explicitly-activated venv keeps priority while a tool the
script just installed wins over any stale entry that was already on
the inherited shell PATH. When no venv is active, fresh registry
entries take precedence as expected.

* Append to User PATH by default, close $envKey in finally

Add-ToUserPath gains a -Position Append|Prepend parameter defaulting to
Append so installing unsloth no longer prepends the bundled venv Scripts
directory ahead of the user's existing python / pip on new shells. The
four current call sites (install.ps1 launcher, studio/setup.ps1 CMake,
nvcc, Python user Scripts) all take the Append default because each one
that needs in-session precedence already does an inline $env:Path prepend
independently. This matches rustup / cargo / nvm / pyenv / uv behavior.

Also wrap the script-top $envKey.GetValue in a try/finally so the
registry handle is released even if the read throws. Matches the pattern
already used for $backupKey five lines below.

* Prepend cmake, nvcc, Python Scripts; keep venv Scripts appended

The previous commit switched Add-ToUserPath to append by default so that
installing unsloth would not silently hijack the user's system python /
pip. That was correct for the venv Scripts dir (which contains python.exe
and pip.exe alongside unsloth.exe), but wrong for the three studio/setup
call sites. Those persist cmake, the driver-compatible nvcc, and the
Python user Scripts dir for future shells, and in all three cases an
older tool already earlier in the user PATH would keep winning after the
install finished. The nvcc case is especially load-bearing: setup selects
a driver-compatible CUDA toolkit, then llama.cpp builds against whatever
wins PATH resolution, so a stale older nvcc produces broken builds.

Pass -Position 'Prepend' explicitly at the three setup.ps1 call sites
(cmake at line 754, nvcc bin at line 1025, Python user Scripts at line
1191). None of those directories holds python.exe, so prepending them
does not re-introduce the original hijack problem. Leave the install.ps1
venv Scripts call on the default Append with a comment explaining why.

* Symmetric dedup, Prepend reorders duplicates, unsloth shim dir

Address three separate findings surfaced by review:

1. Dedup asymmetry (Gemini high-priority): the existing dedup expanded
   registry entries via ExpandEnvironmentVariables but did NOT expand the
   new directory. Passing "%USERPROFILE%\foo" when "C:\Users\me\foo" was
   already in PATH produced a duplicate. Expand both sides so the check
   is symmetric.

2. -Position Prepend no-op on existing duplicates: the dedup loop
   returned $false as soon as it saw a match, regardless of position.
   That left a late-position duplicate in place instead of moving it to
   the front, so "prepend the newly selected cmake/nvcc" did not always
   beat an older copy earlier in PATH. Partition entries into kept and
   dropped lists, then reinsert a single copy at the requested position.
   Append still returns $false on any match so user-curated orderings
   are not reshuffled. Prepend also returns $false when the only copy
   is already at position 0 so we preserve the user's casing.

3. Stop adding the venv Scripts dir to User PATH entirely. That dir
   holds python.exe and pip.exe alongside unsloth.exe, so neither
   Prepend nor Append worked: prepend hijacked the user's system python
   and pip, append made the freshly-installed unsloth.exe lose to any
   older unsloth.exe earlier on PATH. Replace the Scripts-dir PATH add
   with a dedicated shim directory that contains only unsloth.cmd, and
   prepend that dir. The shim calls the venv's unsloth.exe by absolute
   path so future pip upgrades inside the venv propagate automatically.
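
The partition-and-reinsert behavior in point 2 can be sketched like this (Python for illustration; the comparison is simplified here to a trimmed, unquoted, case-insensitive match rather than the full expansion logic):

```python
def add_to_path(entries, new_dir, position="Append"):
    # Partition into kept and dropped, then reinsert a single copy at
    # the requested position. Returns (new_entries, changed).
    norm = lambda e: e.strip().strip('"').lower()
    key = norm(new_dir)
    kept = [e for e in entries if norm(e) != key]
    matches = len(entries) - len(kept)
    if position == "Append":
        if matches:                      # any existing copy wins; do not
            return entries, False        # reshuffle user-curated ordering
        return entries + [new_dir], True
    # Prepend: no-op when the only copy is already first (preserves the
    # user's casing); otherwise move/insert a single copy at the front.
    if matches == 1 and norm(entries[0]) == key:
        return entries, False
    return [new_dir] + kept, True
```

So a Prepend of the newly selected cmake/nvcc dir relocates a late duplicate to the front instead of silently leaving it behind an older copy.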

* Shim via hardlink, Append user Scripts, drop venv sysconfig fallback

Three follow-ups to the c0ab1ab shim commit, targeting concerns raised in
the second 20-reviewer pass:

1. Shim uses unsloth.exe (hardlink, copy fallback) instead of unsloth.cmd.
   The batch-file approach had three distinct regressions:
   - cmd.exe expanded %...% sequences inside user arguments, so prompts
     like "What does 50% mean?" got mangled before reaching the CLI
   - Git Bash / MSYS2 / POSIX-style shells on Windows do not resolve
     bare-name lookups to .cmd files, so `unsloth` stopped working there
   - Set-Content -Encoding ASCII replaced non-ASCII profile characters
     with '?', so installs under C:\Users\Jörg\... wrote a broken shim
   A hardlink (fallback: copy) of unsloth.exe is a native Windows
   executable with no shell indirection. PATHEXT picks .exe before .cmd
   in cmd.exe and PowerShell, Git Bash honors .exe natively, subprocess
   callers hit it directly, and a hardlink stays in sync with the venv
   on pip upgrades because both names point at the same inode.

2. studio/setup.ps1 Python user Scripts dir is added with default Append
   instead of -Position Prepend. That directory holds every pip-installed
   user console script (pip, pytest, huggingface-cli, and so on), not
   just unsloth, so reordering it silently changed resolution order for
   unrelated tools. The new install.ps1 shim at PATH position 0 already
   guarantees `unsloth` resolves to the freshly installed copy, so the
   Python user Scripts entry only needs to be present, not at the front.

3. The sysconfig lookup in studio/setup.ps1 no longer falls back to
   sysconfig.get_path('scripts') when the nt_user scheme dir does not
   exist. When setup.ps1 is invoked from an activated venv (a flow the
   linked issue actually hits) that fallback returns the venv's Scripts
   directory, which would then be added to the persisted User PATH and
   re-introduce the python / pip hijack the shim dir is meant to avoid.
   Stick strictly to the nt_user scheme; skip the block if it does not
   exist on disk.
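
The hardlink-with-copy-fallback strategy in point 1 amounts to the following sketch (Python for illustration; the real sequence is PowerShell in install.ps1):

```python
import os
import shutil

def install_shim(venv_exe: str, shim_exe: str) -> None:
    # Hard-link the venv's unsloth.exe into the shim dir so both names
    # point at the same file; fall back to a plain copy when hardlinks
    # are unsupported (different volume, FAT32, restrictive policy).
    if os.path.exists(shim_exe):
        os.remove(shim_exe)
    try:
        os.link(venv_exe, shim_exe)      # native .exe, no shell indirection
    except OSError:
        shutil.copy2(venv_exe, shim_exe) # degraded but functional fallback
```

Either way the shim directory ends up holding only a real executable, sidestepping the cmd.exe `%...%` mangling, the Git Bash `.cmd` lookup gap, and the ASCII-encoding issue the batch-file shim had.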

* Do not crash installer when unsloth.exe shim is locked

The shim update sequence at install.ps1:1095 did a bare Remove-Item /
New-Item HardLink / Copy-Item. Under the script's $ErrorActionPreference
a locked target (most commonly 'unsloth studio' still running while the
user re-invokes the installer) turns the Remove-Item failure into a
terminating error that aborts the install with no actionable message.

The existing shim is perfectly usable in that state, so there is no
reason to abort. Wrap the whole remove/link/copy sequence in a try/catch
that logs the probable cause (Studio still running), points at the fix
(close Studio and re-run), and lets the installer finish with the old
launcher still serving the command.

Also only emit the "added unsloth launcher to PATH" step line when the
launcher was actually (re)created AND the PATH entry was newly added --
previously the message fired even when the shim refresh silently failed,
which was confusing.

* Guard shim PATH entry on existence, use NullString for broadcast delete

Two follow-ups surfaced by the latest review pass:

1. Do not add the shim directory to User PATH when the launcher was not
   actually created. Antivirus blocking unsloth.exe, a disk-full volume,
   or restrictive filesystem permissions can make both the hardlink and
   the copy fallback fail on a fresh install. In that case the existing
   sequence would report "added unsloth launcher to PATH" warnings but
   still prepend the empty $ShimDir to User PATH -- the user sees an
   install that claims success but then cannot resolve `unsloth` in a
   new shell. Gate Add-ToUserPath on Test-Path $ShimExe so the PATH
   entry is only persisted when the launcher is really there.

2. Pass [NullString]::Value instead of $null to the broadcast-delete
   call in Add-ToUserPath. On PowerShell 7.5 and later (running on .NET
   9), a bare $null going into [Environment]::SetEnvironmentVariable
   can be coerced to an empty string rather than a true .NET null,
   which sets the dummy UnslothPathRefresh_XXXXXXXX variable to "" in
   HKCU\Environment instead of deleting it. The leaked variable is
   visible in System Properties and accumulates one entry per install
   run. [NullString]::Value is a PowerShell-specific sentinel that
   crosses the interop boundary as a real null and works on both PS 5.1
   and PS 7.x. See PowerShell/PowerShell#24637 for the underlying issue.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-16 04:49:51 -07:00
Roland Tannous
13928b5f0e
Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024)
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var

When set, UNSLOTH_PYTORCH_MIRROR overrides the default
https://download.pytorch.org/whl base URL in all four install scripts
(install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py).
When unset or empty, the official URL is used. This lets users behind
corporate proxies or in regions with poor connectivity to pytorch.org
point at a local mirror without patching scripts.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py

Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back
to the official URL when unset or empty, and preserves the value as-is
(including trailing slashes).

* Remove stale test assertions for missing install.sh messages

* Fix GPU mocking in test_get_torch_index_url.sh

Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside
get_torch_index_url so the GPU-presence checks work in tests.
Add -L flag handling to mock nvidia-smi so it passes the GPU listing
check. All 26 tests now pass on CPU-only machines.

* Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs
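
The resulting behavior (env var override, empty/unset fallback, trailing-slash strip) can be sketched as below; the real helper lives in install_python_stack.py, and the function name here is illustrative:

```python
import os

_OFFICIAL_WHL_BASE = "https://download.pytorch.org/whl"

def pytorch_whl_base() -> str:
    # Honor UNSLOTH_PYTORCH_MIRROR when set and non-empty; strip a
    # trailing slash to avoid double-slash URLs when paths are joined;
    # otherwise fall back to the official PyTorch wheel index.
    mirror = os.environ.get("UNSLOTH_PYTORCH_MIRROR", "").strip()
    return mirror.rstrip("/") if mirror else _OFFICIAL_WHL_BASE
```
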

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 11:39:11 +04:00
Roland Tannous
f801e59c29
split venv_t5 into tiered 5.3.0/5.5.0 and fix trust_remote_code (#4878)
* split venv_t5 into venv_t5_530 and venv_t5_550 for tiered transformers 5.x support

* fix bfloat16 crash on T4 for FORCE_FLOAT32 models and disable trust_remote_code auto-enable for native t5 models

* revert FORCE_FLOAT32 dtype change

* restrict trust_remote_code auto-enable to Nemotron models only

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use config.json model_type for tier detection, add unsloth/nvidia namespace guard

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit fb43d468e2.

* Revert "use config.json model_type for tier detection, add unsloth/nvidia namespace guard"

This reverts commit fc49ae2453.

* add unsloth/nvidia namespace guard to Nemotron trust_remote_code auto-enable

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reorder tier checks: all substring matches before config.json fetches

* extract shared activate_transformers_for_subprocess into transformers_version.py

* narrow Nemotron trust_remote_code to nemotron_h/nemotron-3-nano, add to export worker

* clean venv_t5 dirs before re-install in setup.sh, clarify version alias comment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* run venv_t5 migration outside deps fast-path gate in both setup scripts

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-07 20:05:01 +04:00
DoubleMathew
ac562bac66
Fix/llama.cppbuilding (#4804)
* Simplify llama.cpp install logic

* print release tag

* Retry failed json decode

* don't pull all ggml releases

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove test file changes from main PR

Test changes for test_pr4562_bugfixes.py will be submitted in a separate PR to keep this PR focused on the install path simplification.

* Fix setup.sh executable bit and direct tag lookup for pinned releases

- Restore setup.sh file mode to 100755 (was accidentally changed to 100644)
- Add direct GitHub API tag lookup in iter_release_payloads_by_time for
  non-latest requested tags (e.g. b7879) instead of relying on paginated
  release scans that may miss older releases beyond the 5-page limit
- Update stale DEFAULT_PUBLISHED_REPO comment to match new value

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix force-compile default ref and remove dead code in setup.ps1

- Change FORCE_COMPILE_DEFAULT_REF from "main" to "master" in all three
  files (install_llama_prebuilt.py, setup.sh, setup.ps1) since
  ggml-org/llama.cpp uses "master" as its default branch, not "main".
  Using "main" would cause git clone --branch to fail when
  UNSLOTH_LLAMA_FORCE_COMPILE=1 with UNSLOTH_LLAMA_TAG=latest.
- Remove dead if ($SkipPrebuiltInstall) block inside the else branch of
  setup.ps1 that could never be reached (the outer elseif already
  handles $SkipPrebuiltInstall=true).
- Maintain setup.sh executable bit (100755).

* Improve iter_release_payloads_by_time error handling for direct tag lookup

When a pinned release tag is not found (HTTP 404), fall through to the
paginated release scan instead of silently returning empty results.
Non-404 errors (network failures, rate limits) are propagated to the
caller so users get actionable error messages.
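
The lookup order described above (direct tag endpoint first, paginated scan only on 404, everything else propagated) can be sketched as follows; `fetch` and `scan` are injected here for illustration, while the real module talks to the GitHub releases API directly:

```python
import urllib.error

def release_payloads(repo: str, tag: str, fetch, scan):
    # Try the direct releases/tags/{tag} endpoint first so pinned tags
    # beyond the 5-page scan limit are still found.
    try:
        return [fetch(f"repos/{repo}/releases/tags/{tag}")]
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise          # network failures / rate limits stay actionable
    # Tag not found: fall through to the bounded paginated scan.
    return scan(repo)
```
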

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-03 00:34:20 -07:00
Daniel Han
934478ae31
fix(studio): revert llama.cpp default tag to latest (#4797)
* fix(studio): revert llama.cpp default tag to latest

The latest ggml-org/llama.cpp release (b8637) now includes Gemma 4
support. Revert the temporary "b8637" pin from #4796 to "latest" so
the prebuilt resolver always picks the newest release automatically
without needing manual tag bumps.

* docs: add comment explaining latest vs master for llama.cpp tag

Document in all three files why "latest" is preferred over "master"
and when "master" should be used as a temporary override.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-02 11:52:37 -07:00
Daniel Han
8d1712b4ea
fix(studio): pin llama.cpp to b8637 release (Gemma 4 support) (#4796)
ggml-org/llama.cpp b8637 includes Gemma 4 support (ggml-org/llama.cpp#21309).
Revert the temporary "master" default back to a pinned release tag.

This eliminates the HTTP 422 errors from the prebuilt resolver (which
could not find a release matching "master"), avoids unnecessary source
builds, and restores prebuilt binary downloads on all platforms.

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-02 11:43:53 -07:00
DoubleMathew
7ae9b7f45f
fix windows llama.cpp compile from source issue (#4793)
* fix windows llama.cpp compile from source issue

* undo local repo usage

* fix llama.cpp install

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix windows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: route resolve-source-build call through Invoke-LlamaHelper

The --resolve-source-build call at the source-build resolution path
was still calling install_llama_prebuilt.py directly instead of going
through Invoke-LlamaHelper. On PS7+ with ErrorActionPreference=Stop,
stderr from the 422 response (when tag is "master") would trigger a
terminating NativeCommandError and crash setup.

* fix: suppress stderr error records from Invoke-LlamaHelper

ErrorActionPreference=Continue prevents termination but PowerShell
still displays stderr lines as visible ErrorRecord objects. Capture
all output via 2>&1 and split stdout from stderr manually so that
stderr lines never appear on the console. When StderrPath is given
the stderr content is written to that file for diagnostics.

* fix: always rebuild llama.cpp on Windows when tag is master

When the requested llama.cpp tag is "master" (a moving target), skip
the "already built" early exit so the build path runs and syncs to
the latest commit. Without this, existing llama-server binaries from
an older build (e.g. b8635 which lacks Gemma 4 support) are reused
and model loading fails.

Pinned tags (e.g. b8635) still skip the rebuild when the binary
already exists, since the tag is immutable.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-02 11:43:46 -07:00
Daniel Han
1ce83c40aa
fix(studio): build llama.cpp from master instead of latest release tag (#4790)
The latest ggml-org/llama.cpp release (b8635) does not include Gemma 4
support (ggml-org/llama.cpp#21309 merged after the release was cut).
This causes `llama-server` to fail with "unknown model architecture:
gemma4" when loading Gemma 4 GGUFs.

Temporarily default _DEFAULT_LLAMA_TAG to "master" so all new installs
build from the llama.cpp master branch which includes Gemma 4 support.
Once a new upstream release is cut with Gemma 4, this can be reverted
back to "latest".

Changes:
- setup.sh: add _DEFAULT_LLAMA_TAG="master" maintainer default
- setup.ps1: add $DefaultLlamaTag="master" maintainer default
- install_llama_prebuilt.py: change DEFAULT_LLAMA_TAG fallback to "master"

Users can still override via UNSLOTH_LLAMA_TAG env var.
2026-04-02 09:45:56 -07:00
Daniel Han
a241c58d84
Use transformers v5.5-release branch and pin to 5.5.0 (#4786)
The v5.5-release branch now exists on huggingface/transformers.
Use transformers==5.5.0 for all install paths and
git+transformers.git@v5.5-release for the MLX installer.

Also bumps huggingface_hub from 1.7.1 to 1.8.0 in setup.sh and
setup.ps1 to stay consistent.
2026-04-02 09:10:02 -07:00
Daniel Han
a353557249
Force llama.cpp to always use mainline ggml-org (#4785)
Hardcode the release repo to ggml-org/llama.cpp and remove the
UNSLOTH_LLAMA_RELEASE_REPO and UNSLOTH_LLAMA_SOURCE env var overrides
so that all users always build/download from mainline llama.cpp.
2026-04-02 09:03:00 -07:00
DoubleMathew
1ce8a8e7cd
Feat/custom llama prebuilt (#4771)
* update logic to incorporate custom prebuilt installs

* bug fixes

* update for review comments

* fix tags

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate test changes from main PR

Move test file changes out of this PR to keep the diff focused on
the install_llama_prebuilt.py and setup script changes. Test updates
will be submitted in a follow-up PR.

* Fix branch ref normalization and harden JSON parsing

- Add checkout_friendly_ref() to strip refs/heads/ prefix from branch
  refs before emitting them in SourceBuildPlan. git clone --branch does
  not accept fully qualified refs like refs/heads/main.
- Apply normalization in source_build_plan_for_release() and the
  direct-ref fallback in resolve_source_build_plan().
- Allow validated_checksums_for_bundle() to accept releases that carry
  only an exact-commit source archive without the legacy upstream-tag
  source tarball.
- Add 2>/dev/null || true guards to all inline python -c JSON parsing
  in setup.sh so a malformed payload does not abort the script under
  set -e.

* Fix Windows CUDA asset ordering and tag ref normalization

- Reorder windows_cuda_upstream_asset_names to prefer the main binary
  archive (llama-{tag}-bin-win-cuda-*) over the cudart sidecar archive
  (cudart-llama-bin-win-cuda-*). The cudart ZIP only contains CUDA
  runtime DLLs, not llama-server or llama-quantize binaries.
- Extend checkout_friendly_ref to also strip refs/tags/ prefix for tag
  refs, matching the refs/heads/ handling for branch refs.

* Simplify JSON parsing consistency in setup.sh

Use json.load(sys.stdin) consistently for all inline JSON parsing
in setup.sh, instead of the more complex json.loads(raw) pattern
on the install-tag resolution path. The 2>/dev/null || true guard
already handles empty/malformed input gracefully.

* Fix source build plan fallback for commit ref kind in PR #4771

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <daniel@unsloth.ai>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-02 04:52:26 -07:00
DoubleMathew
428efc7d95
Resolve latest usable published llama.cpp release instead of fixed pinned tag (#4741)
Replaces the fixed prebuilt llama.cpp tag with dynamic published-release
resolution, adds bounded fallback across older published releases, and
introduces maintainer-editable defaults for PR/source overrides.

Changes:
- Resolve latest from the latest usable published release in unslothai/llama.cpp
- Use the selected release upstream_tag as the authoritative llama.cpp version
- Prefer Unsloth-published platform assets when available
- Fall back to same-tag upstream ggml-org/llama.cpp assets where allowed
- Keep Linux CUDA anchored to Unsloth-published CUDA bundles only
- Add bounded fallback across older Unsloth published releases
- Add separate busy/in-use install handling (exit code 3)
- Skip reinstall when the installed bundle already matches the selected candidate
- Add maintainer-editable _DEFAULT_LLAMA_PR_FORCE and _DEFAULT_LLAMA_SOURCE
- Harden env parsing so malformed installer env vars do not crash import-time fallback logic
- Honor UNSLOTH_LLAMA_RELEASE_TAG in all resolve steps
- Always sync git remote URL in existing-checkout path
2026-04-01 06:06:17 -07:00
Lee Jackson
2cac3e8e4d
studio: Polish Windows installer/setup logs (#4736)
* style(windows): clean installer/setup log output and remove seeded credential banner

* Keep startup credential hint without exposing plaintext password

Print the username and .bootstrap_password file path on first-run
admin creation instead of the raw password. Headless / Docker / SSH
operators still get a startup-time hint for initial sign-in, and the
plaintext credential no longer appears in terminal output or logs.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-31 23:12:42 -07:00
Etherll
34272a796f
Fix/bun windows bin detection (#4703)
* fix(studio): detect bun .exe shims in Windows binary check

* Update setup.sh

* add .bunx checking
2026-03-30 21:58:33 +04:00
Daniel Han
6d83ad9a28
fix(studio): avoid UnicodeEncodeError on Windows cp1252 consoles (#4699)
* fix(studio): replace unicode emoji in print() to avoid cp1252 crash on Windows

On Windows the default console encoding is cp1252, which cannot encode
unicode emoji like U+2705 or U+26A0. Bare print() calls with these
characters cause a UnicodeEncodeError at runtime.

- run.py: replace emoji with ASCII status prefixes [OK] and [WARNING]
- format_conversion.py: remove duplicate print() that mirrors the
  logger.info() call on the next line, and drop the emoji from the
  log message since loggers handle encoding separately

* fix(studio): apply same emoji/print cleanup to parallel VLM conversion path

The parallel URL-based conversion logic has the same duplicate print()
with emoji that was fixed in the sequential path. Remove the bare
print() and drop the emoji from the logger.info() call.

* Treat install_python_stack.py failure as fatal in setup.ps1

On Linux/Mac, setup.sh runs under set -euo pipefail so a non-zero
exit from install_python_stack.py aborts the installer. On Windows,
setup.ps1 had no exit code check -- if the Python script crashed
(eg from the cp1252 UnicodeEncodeError), the installer silently
continued past the dependency loop and reported success. Studio
would then fail at launch with ModuleNotFoundError for structlog,
fastapi, and other deps that were never installed.

Capture $LASTEXITCODE and exit 1 if the dependency installer fails,
matching the error handling pattern already used for PyTorch install.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 06:40:47 -07:00
Lee Jackson
5557e1fd27
studio: unify Windows installer/setup logging style, verbosity controls, and startup messaging (#4651)
* refactor(studio): unify setup terminal output style and add verbose setup mode

* studio(windows): align setup.ps1 banner/steps with setup.sh (ANSI, verbose)

* studio(setup): revert nvcc path reordering to match main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio(setup): restore fail-fast llama.cpp setup flow

* studio(banner): use IPv6 loopback URL when binding :: or ::1

* Fix IPv6 URL bracketing, try_quiet stderr, _step label clamp

- Bracket IPv6 display_host in external_url to produce clickable URLs
- Redirect try_quiet failure log to stderr instead of stdout
- Clamp _step label to column width to prevent negative padding

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add sandbox integration tests for PR #4494 UX fixes

Simulation harness (tests/simulate_pr4494.py) creates an isolated uv
venv, copies the real source files into it, and runs subprocess tests
for all three fixes with visual before/after demos and edge cases.

Standalone bash test (tests/test_try_quiet.sh) validates try_quiet
stderr redirect across 8 scenarios including broken-version contrast.

39 integration tests total (14 IPv6 + 15 try_quiet + 10 _step), all
existing 75 unit tests still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Truncate step() labels in setup.sh to match PS1 and Python

The %-15s printf format pads short labels but does not truncate long
ones. Change to %-15.15s so labels wider than 15 chars are clipped,
matching the PowerShell .Substring(0,15) and Python label[:15] logic.

* Remove sandbox integration tests from PR

These test files are not part of the styling fix and should not
ship with this PR.

* Show error output on failure instead of suppressing it

- install_python_stack.py: restore _red for patch_package_file
  warnings (was downgraded to _dim)
- setup.ps1: capture winget output and show on failure for CUDA,
  Node, Python, and OpenSSL installs (was piped to Out-Null)
- setup.ps1: always show git pull failure warning, not just in
  verbose mode

* Show winget error output for Git and CMake installs on failure

Same capture-and-print-on-failure pattern already used for
Node, Python, CUDA, and OpenSSL winget installs.

* fix: preserve stderr for _run_quiet error messages in setup.sh

The step() helper writes to stdout, but _run_quiet's error header
was originally sent to stderr (>&2). Without the redirect, callers
that separate stdout/stderr would miss the failure headline while
still seeing the log body on stderr. Add >&2 to both step calls
inside _run_quiet to match main's behavior.

* feat: add --verbose flag to setup and update commands

Wire UNSLOTH_VERBOSE=1 through _run_setup_script() so that
'unsloth studio update --verbose' (and the deprecated 'setup')
passes the flag to setup.sh / setup.ps1 / install_python_stack.py.

* fix(studio): honor verbose logging and keep llama.cpp failures non-blocking

* fix(studio): switch installer to 'studio update' and normalize Windows setup logs

* chore(studio): refine localhost tip and remove skip-base setup noise

* fix(studio): align Windows setup logs with Linux style and improve startup tips

* fix(studio): align Windows setup logs with Linux style

* refactor(windows-installer): align install/setup logs with Linux style and silence auto-launch output

* refactor(windows): align installer/setup output with Linux style and reduce default verbosity

* refactor(windows): match install.ps1 output style/colors to setup and quiet default logs

* fix(studio-banner): update personal-computer localhost tip

* fix(setup.sh): restore verbose llama.cpp build output while keeping default quiet mode

* fix(install.sh): align installer logging with setup style and restore POSIX-safe color output

* fix(install.sh): preserve installer reliability and launch visibility

Export verbose mode for child setup processes, harden install command handling under set -e, and keep first-run studio launch non-silent so users can always see URL and port fallback output.

* fix(windows installer): keep exit semantics and degrade status accurate

Use quiet command redirection that preserves native exit codes, keep startup output visible on first launch, and report limited install status when llama.cpp is unavailable.

* fix(setup.sh): improve log clarity and enforce GGUF degraded signaling

Restore clean default setup output, add verbose-only diagnostics, fail fast on Colab dependency install errors, and return non-zero when GGUF prerequisites or llama.cpp artifacts are unavailable.

* fix(installer): harden bash preflight and PowerShell GPU checks

Fail fast when bash is unavailable before invoking setup.sh, and replace remaining nvidia-smi pipeline checks with stream redirection patterns that preserve reliable native exit-code handling.

* fix(windows): keep verbose output visible while preserving exit codes

Ensure PowerShell wrapper helpers in install/update stream native command output to host without returning it as function output, so npm logs no longer corrupt exit-code checks in verbose mode.

* fix(windows): avoid sticky UNSLOTH_VERBOSE and gate studio update verbosity

* Fix degraded llama.cpp exit code, PS verbose stderr, banner URLs, npm verbose

- setup.sh: Do not exit non-zero when llama.cpp is unavailable; the footer
  already reports the limitation, and install.sh runs under set -e so a
  non-zero exit aborts the entire install including PATH/shortcuts/launch.
- setup.ps1: Remove $? check in Invoke-SetupCommand verbose path; PS 5.1
  sets $? = $false when native commands write to stderr even with exit 0.
  Merge stderr into stdout with 2>&1 and rely solely on $LASTEXITCODE.
- startup_banner.py: Show the actual bound address when Studio is bound to
  a non-loopback interface instead of always showing 127.0.0.1/localhost.
- setup.sh: Use run_quiet_no_exit instead of run_quiet_no_exit_always for
  npm install steps so --verbose correctly surfaces npm output.

* Fix install.ps1 verbose stderr, propagate UNSLOTH_VERBOSE, fix git clone verbose

- install.ps1: Apply same Invoke-InstallCommand fix as setup.ps1 -- merge
  stderr into stdout with 2>&1 and drop the $? check that misclassifies
  successful native commands on PS 5.1.
- install.ps1 + setup.ps1: Export UNSLOTH_VERBOSE=1 to the process env
  when --verbose is passed so child processes like install_python_stack.py
  also run in verbose mode.
- setup.sh: Use run_quiet_no_exit for git clone llama.cpp so --verbose
  correctly surfaces clone diagnostics during source-build fallback.

* Surface prebuilt llama.cpp output in verbose mode, remove dead code, fix banner

- setup.sh: Use tee in verbose mode for prebuilt llama.cpp installer so
  users can see download/validation progress while still capturing the log
  for structured error reporting on failure.
- setup.ps1: Same fix for Windows -- use Tee-Object in verbose mode.
- setup.sh: Remove run_quiet_no_exit_always() which has no remaining callers.
- startup_banner.py: Avoid printing the same URL twice when Studio is
  bound to a specific non-loopback address that matches the display host.

* Fix run_install_cmd exit code after failed if-statement

The previous pattern 'if "$@"; then return 0; fi; _rc=$?' always captured
$? = 0 because $? reflects the if-statement result, not the command's exit
code. Switch to '"$@" && return 0; _rc=$?' which preserves the actual
command exit code on failure. Applies to both verbose and quiet branches.
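The pitfall this commit fixes is easy to reproduce. A minimal sketch (helper names are hypothetical, not the real run_install_cmd):

```shell
#!/bin/sh
# Buggy pattern: after `if cmd; then ...; fi` with no else, $? reflects
# the if-statement itself (0 when the condition was false), not cmd.
buggy_run() {
    if "$@"; then return 0; fi
    _rc=$?               # always 0 here
    return "$_rc"
}

# Fixed pattern: the && list short-circuits, so $? still holds the
# failed command's own exit code.
fixed_run() {
    "$@" && return 0
    _rc=$?               # real exit code of the failed command
    return "$_rc"
}

fail_with_3() { return 3; }

buggy_run fail_with_3; echo "buggy: $?"   # prints: buggy: 0
fixed_run fail_with_3; echo "fixed: $?"   # prints: fixed: 3
```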

* Fix _run_quiet exit code, double uv install, missing --local flag

- setup.sh: Fix _run_quiet verbose path that always captured exit code 0
  due to $? resetting after if-then-fi with no else. Switch to the same
  '"$@" && return 0; exit_code=$?' pattern used in install.sh.
- setup.sh: Consolidate the two uv install branches (verbose + quiet)
  into a single attempt with conditional output. Previously, when verbose
  mode was on and the install failed, a second silent attempt was made.
- install.ps1: Pass --local flag to 'unsloth studio update' when
  $StudioLocalInstall is true. Without this, studio.py's update() command
  overwrites STUDIO_LOCAL_INSTALL to "0", which could cause issues if
  setup.ps1 or install_python_stack.py later checks that variable.

* Revert SKIP_STUDIO_BASE change for --no-torch, restore install banners

- Revert SKIP_STUDIO_BASE from 0 to 1 for --no-torch. install.sh already
  installs unsloth+unsloth-zoo and no-torch-runtime.txt before calling
  setup.sh, so letting install_python_stack.py redo it was redundant and
  slowed down --no-torch installs for no benefit.
- Restore the "Unsloth Studio installed!" success banner and "starting
  Unsloth Studio..." launch message so users get clear install completion
  feedback before the server starts.

* Make llama.cpp build failure a hard error with proper cleanup

- setup.sh: Restore exit 1 when _LLAMA_CPP_DEGRADED is true. GGUF
  inference requires a working llama.cpp build, so this should be a
  hard failure, not a silent degradation.
- install.sh: Catch setup.sh's non-zero exit with '|| _SETUP_EXIT=$?'
  instead of letting set -e abort immediately. This ensures PATH setup,
  symlinks, and shortcuts still get created so the user can fix the
  build deps and retry with 'unsloth studio update'. After post-install
  steps, propagate the failure with a clear error message.

* Revert install.ps1 to 'studio setup' to preserve SKIP_STUDIO_BASE

'studio update' pops SKIP_STUDIO_BASE from the environment, which
defeats the fast-path version check added in PR #4667. When called
from install.ps1 (which already installed packages), SKIP_STUDIO_BASE=1
must survive into setup.ps1 so it skips the redundant PyPI check and
package reinstallation. 'studio setup' does not modify env vars.

* Remove deprecation message from 'studio setup' command

install.ps1 uses 'studio setup' (not 'studio update') to preserve
SKIP_STUDIO_BASE. The deprecation message was confusing during first
install since the user never typed the command.

* Fix stale env vars, scope degraded exit, generic error message for PR #4651

- install.ps1: Always set STUDIO_LOCAL_INSTALL and clear STUDIO_LOCAL_REPO
  when not using --local, to prevent stale values from a previous --local
  run in the same PowerShell session. Fix log messages to say 'setup' not
  'update' since we call 'studio setup'.
- setup.sh: Only exit non-zero for degraded llama.cpp when called from the
  installer (SKIP_STUDIO_BASE=1). Direct 'unsloth studio update' keeps
  degraded installs successful since Studio is still usable for non-GGUF
  workflows and the footer already reports the limitation.
- install.sh: Make the setup failure error message generic instead of
  GGUF-specific, so unrelated failures (npm, Python deps) do not show
  misleading cmake/git recovery advice.

* Show captured output on failure in quiet mode for PR #4651

Both Invoke-InstallCommand (install.ps1) and Invoke-SetupCommand
(setup.ps1) now capture command output in quiet mode and display it
in red when the command fails. This matches the behavior of
run_install_cmd in install.sh where failure output is surfaced even
in quiet mode, making cross-platform error debugging consistent.

* Match degraded llama.cpp exit on Windows, fix --local recovery hint for PR #4651

- setup.ps1: Exit non-zero for degraded llama.cpp when called from
  install.ps1 (SKIP_STUDIO_BASE=1), matching setup.sh behavior. Direct
  'unsloth studio update' keeps degraded installs successful.
- install.sh: Show 'unsloth studio update --local' in the recovery
  message when the install was run with --local, so users retry with
  the correct flag instead of losing local checkout context.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-30 00:53:23 -07:00
Roland Tannous
5bbfabb151
fix: [Studio] setup.ps1 update-flow for windows (#4667)
* fix: add PyPI version check to setup.ps1 for fast update path

Port the update-flow logic from setup.sh to setup.ps1 so that
`unsloth studio update` on Windows skips Python dependency reinstall
when the installed version already matches PyPI latest.

* fix: clear SKIP_STUDIO_BASE in update command

install.ps1 sets SKIP_STUDIO_BASE=1 which persists in the PowerShell
session. If the user runs `unsloth studio update` in the same terminal,
the env var causes the version check to be skipped. Clear it explicitly
in the update command.

* fix: harden version check and clear stale env vars in update flow

- Normalize $InstalledVer with Out-String + Trim() to avoid array/whitespace
  comparison issues in PowerShell 5.1 (python output can be captured as
  string[] instead of scalar string)
- Move Fast-Install --upgrade pip inside if (-not $SkipPythonDeps) so the
  fast path avoids unnecessary network round-trips
- Clear STUDIO_LOCAL_REPO when --local is not passed to prevent a previous
  --local session from leaking into a plain update

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-29 21:14:36 -07:00
Lee Jackson
0233fe7f9c
studio: setup log styling (#4494)
* refactor(studio): unify setup terminal output style and add verbose setup mode

* studio(windows): align setup.ps1 banner/steps with setup.sh (ANSI, verbose)

* studio(setup): revert nvcc path reordering to match main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio(setup): restore fail-fast llama.cpp setup flow

* studio(banner): use IPv6 loopback URL when binding :: or ::1

* Fix IPv6 URL bracketing, try_quiet stderr, _step label clamp

- Bracket IPv6 display_host in external_url to produce clickable URLs
- Redirect try_quiet failure log to stderr instead of stdout
- Clamp _step label to column width to prevent negative padding

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add sandbox integration tests for PR #4494 UX fixes

Simulation harness (tests/simulate_pr4494.py) creates an isolated uv
venv, copies the real source files into it, and runs subprocess tests
for all three fixes with visual before/after demos and edge cases.

Standalone bash test (tests/test_try_quiet.sh) validates try_quiet
stderr redirect across 8 scenarios including broken-version contrast.

39 integration tests total (14 IPv6 + 15 try_quiet + 10 _step), all
existing 75 unit tests still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Truncate step() labels in setup.sh to match PS1 and Python

The %-15s printf format pads short labels but does not truncate long
ones.  Change to %-15.15s so labels wider than 15 chars are clipped,
matching the PowerShell .Substring(0,15) and Python label[:15] logic.
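The padding-vs-truncation difference is visible directly in printf (labels here are made up for illustration):

```shell
# %-15s pads short labels to the column width but lets long ones
# overflow; %-15.15s both pads and clips to exactly 15 characters.
printf '[%-15s]\n'    "CUDA"                      # [CUDA           ]
printf '[%-15s]\n'    "llama.cpp prebuilt setup"  # overflows the column
printf '[%-15.15s]\n' "llama.cpp prebuilt setup"  # [llama.cpp prebu]
```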

* Remove sandbox integration tests from PR

These test files are not part of the styling fix and should not
ship with this PR.

* Show error output on failure instead of suppressing it

- install_python_stack.py: restore _red for patch_package_file
  warnings (was downgraded to _dim)
- setup.ps1: capture winget output and show on failure for CUDA,
  Node, Python, and OpenSSL installs (was piped to Out-Null)
- setup.ps1: always show git pull failure warning, not just in
  verbose mode

* Show winget error output for Git and CMake installs on failure

Same capture-and-print-on-failure pattern already used for
Node, Python, CUDA, and OpenSSL winget installs.

* fix: preserve stderr for _run_quiet error messages in setup.sh

The step() helper writes to stdout, but _run_quiet's error header
was originally sent to stderr (>&2). Without the redirect, callers
that separate stdout/stderr would miss the failure headline while
still seeing the log body on stderr. Add >&2 to both step calls
inside _run_quiet to match main's behavior.

* feat: add --verbose flag to setup and update commands

Wire UNSLOTH_VERBOSE=1 through _run_setup_script() so that
'unsloth studio update --verbose' (and the deprecated 'setup')
passes the flag to setup.sh / setup.ps1 / install_python_stack.py.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-27 03:12:48 -07:00
Daniel Han
23eb7fc0a7
Fix Colab Studio launch and setup.ps1 box alignment (#4601)
* Fix Colab Studio launch and setup.ps1 box alignment

- colab.py: when the Studio venv is missing on Colab, pip-install
  backend dependencies (structlog, fastapi, etc.) from studio.txt
  into the current Python instead of failing with ModuleNotFoundError
- setup.sh: on Colab without a venv, install backend deps into system
  Python and skip venv-dependent sections (Python stack update,
  llama.cpp build) that would otherwise fail
- setup.ps1: use PadRight(47) for the done-line so "Setup Complete!"
  and "Update Complete!" both align with the box border

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-25 09:00:08 -07:00
Daniel Han
366fb048d4
fix(studio): add bun cache validation to Windows setup.ps1 (#4596)
Port the bun cache corruption fix from setup.sh to setup.ps1.

bun's package cache can become corrupt, storing only package metadata
without actual content. This causes bun install to exit 0 but leave
binaries like tsc missing from node_modules/.bin/.

Changes:
- After bun install, verify tsc and vite exist in node_modules\.bin\
- Check for both bare names and .cmd wrappers (Windows creates both)
- If missing, clear the bun cache and retry once
- Only fall back to npm if the retry also fails
2026-03-25 07:27:08 -07:00
Daniel Han
3efea63e2f
fix(studio): source-build fallback prefers Unsloth's tested tag over upstream latest (#4593)
* fix(studio): source-build fallback prefers Unsloth's tested tag over upstream latest

When the prebuilt install fails and falls back to source build,
--resolve-llama-tag now queries the Unsloth release repo
(unslothai/llama.cpp) first to get the latest tested/approved tag
(e.g. b8508), instead of going straight to ggml-org/llama.cpp which
may return a newer untested tag (e.g. b8514).

This ensures the source-build fallback compiles the same version that
the prebuilt path would have installed, rather than a potentially
incompatible bleeding-edge release.

Resolution order for "latest":
  1. Unsloth release repo (tested/approved)
  2. ggml-org upstream (bleeding-edge)
  3. Raw requested tag string (last resort)

Changes:
- resolve_requested_llama_tag() accepts optional published_repo param
  with docstring explaining the resolution order
- CLI --resolve-llama-tag passes --published-repo through
- setup.sh and setup.ps1 pass --published-repo to --resolve-llama-tag
  with inline comments explaining the preference

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-25 07:25:47 -07:00
DoubleMathew
f4d8a246bf
Use prebuilt llama.cpp for unsloth studio setup (#4562)
* Use prebuilt llama.cpp for unsloth studio setup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix 3 issues that cause unnecessary fallback to source build

1. Make filelock import optional -- environments without filelock
   (e.g. minimal installs) crashed at import time instead of
   gracefully skipping the lock.

2. Use already-verified converter script from the hydrated source
   tree instead of re-downloading from raw.githubusercontent.com
   with no checksum. Adds symlink with copy fallback for the
   legacy filename.

3. Initialize $SkipPrebuiltInstall in setup.ps1 before first use
   to prevent potential uninitialized variable errors.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Keep network fallback in ensure_converter_scripts

Prefer the local verified copy from the hydrated source tree, but
retain the original network download as a fallback if the file is
missing. Create the legacy hyphenated filename as a symlink with a
copy fallback instead of writing a second full copy.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix 4 bugs in source-build fallback and binary_env paths

- setup.ps1: Replace git pull + checkout FETCH_HEAD with fetch + checkout -B
  to avoid detached HEAD state that breaks re-runs. Use pinned tag in both
  fetch and clone paths.
- setup.sh: Move rm -rf after cmake/git prerequisite checks so a missing
  tool no longer deletes the existing install. Add --branch tag to clone.
- install_llama_prebuilt.py: Add binary_path.parent to Linux LD_LIBRARY_PATH
  in binary_env() so bundled .so files in build/bin are found even without
  RPATH, matching the existing Windows PATH logic.
- Add test for binary_env LD_LIBRARY_PATH on Linux.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Handle unresolved "latest" tag in source-build fallback clone

When tag resolution fails and the requested tag is "latest", both
setup scripts now omit --branch from git clone so the default branch
is cloned instead of failing on a nonexistent "latest" branch/tag.
Similarly, the PS1 fetch path fetches the default ref when the tag
is "latest".

* Resolve actual latest ggml-org tag instead of using literal "latest"

When both Python tag resolution attempts fail and the requested tag
is "latest", query the GitHub API for the actual latest release tag
from ggml-org/llama.cpp (e.g. b8508) instead of passing the literal
string "latest" to git clone --branch, which would fail since no
such branch/tag exists.

setup.sh uses curl + python json parsing; setup.ps1 uses
Invoke-RestMethod. Both fall back to the raw requested tag if the
API call also fails.
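The curl-plus-Python parsing half of that fallback chain can be sketched like this. `parse_latest_tag` is an assumed helper shape, not the literal setup.sh code; it returns nothing on malformed or empty input so the caller can move on to the next source.

```shell
# Extract tag_name from a GitHub /releases/latest JSON body; print
# nothing on any parse failure so callers can detect it with [ -n ... ].
parse_latest_tag() {
    python3 -c '
import json, sys
try:
    tag = json.load(sys.stdin).get("tag_name", "")
except Exception:
    tag = ""
if tag:
    print(tag)
'
}

# Caller side (network calls commented out): Unsloth repo first, then
# ggml-org upstream, then the raw requested string as a last resort.
# tag=$(curl -fsSL https://api.github.com/repos/unslothai/llama.cpp/releases/latest | parse_latest_tag)
# [ -n "$tag" ] || tag=$(curl -fsSL https://api.github.com/repos/ggml-org/llama.cpp/releases/latest | parse_latest_tag)
# [ -n "$tag" ] || tag="latest"
```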

* Try Unsloth release repo before ggml-org when resolving latest tag

When falling back to the GitHub API to resolve "latest", query the
Unsloth release repo (unslothai/llama.cpp) first since it has the
prebuilt binaries pinned to tested tags. Only fall back to
ggml-org/llama.cpp if the Unsloth repo query fails.

* Add comprehensive sandbox tests for PR #4562 bug fixes

35 tests covering all fixes across platforms:
- binary_env cross-platform (Linux LD_LIBRARY_PATH, Windows PATH,
  macOS DYLD_LIBRARY_PATH) with edge cases (dedup, ordering, existing paths)
- resolve_requested_llama_tag (concrete, latest, None, empty)
- setup.sh logic via subprocess: prereq check ordering (cmake/git missing
  preserves install), pinned tag in clone, fetch+checkout -B pattern,
  fetch failure warns instead of aborting
- "latest" tag resolution fallback chain (Unsloth API -> ggml-org ->
  raw) with mock curl: success, failure, malformed JSON, empty body,
  empty tag_name, env overrides
- Source code pattern verification for both .sh and .ps1 files

All 138 tests pass in an isolated uv venv.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add binary_path.parent to macOS DYLD_LIBRARY_PATH in binary_env

macOS prebuilt .dylib files are overlaid into build/bin (same as
Linux), but binary_env only added install_dir to DYLD_LIBRARY_PATH.
Add binary_path.parent so the loader can find sibling dylibs even
without embedded loader paths.

Mirrors the existing fix for Linux LD_LIBRARY_PATH and the Windows
PATH pattern.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard --branch when resolved tag is "latest"; fix broken test assertion

When all API fallbacks fail and the tag stays as literal "latest",
omit --branch from git clone (clones default branch instead of
failing). Both setup.sh and setup.ps1 now check for "latest" before
passing --branch to git clone/fetch.

Also fix test_setup_ps1_clone_uses_branch_tag which used Python
tuple syntax (assert "x", "y" in z) that always passes. Changed to
assert "x" in z and "y" in z.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix macOS DYLD trailing colon, install_lock no-op, and debug log

- binary_env macOS: use dedupe_existing_dirs instead of raw string
  concatenation. Eliminates trailing colon in DYLD_LIBRARY_PATH
  (which causes dyld to search CWD for libraries) and deduplicates
  when binary_path.parent == install_dir. Now consistent with the
  Linux and Windows branches.
- install_lock: when filelock is not installed, use os.O_CREAT|O_EXCL
  as a fallback exclusive file lock with timeout, instead of yielding
  with no locking. Prevents concurrent installs from corrupting each
  other's staging directories.
- setup.ps1: remove [DEBUG] log line that printed to every user on
  every Windows setup run.
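The trailing-colon hazard (an empty entry makes the loader also search the CWD) and the dedup concern generalize to any colon-separated path variable. A minimal sketch with a hypothetical helper, not the actual dedupe_existing_dirs:

```shell
# Prepend a directory to a colon-separated path value, skipping it if
# already present and never emitting a trailing colon (empty entry).
prepend_path() {
    dir=$1 cur=$2
    case ":$cur:" in
        *":$dir:"*) printf '%s' "$cur" ;;           # already present: dedup
        *) if [ -n "$cur" ]; then printf '%s:%s' "$dir" "$cur"
           else printf '%s' "$dir"                   # no trailing colon
           fi ;;
    esac
}

prepend_path /opt/llama/bin "/usr/lib"   # -> /opt/llama/bin:/usr/lib
```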

* Add stale-lock detection and atomic clone-then-swap

install_lock fallback (no filelock): write PID to lock file and
check if the holder process is still alive on contention. Dead PIDs
(ProcessLookupError) and unreadable lock files trigger immediate
cleanup. Live processes owned by other users (PermissionError) are
correctly recognized as alive -- the lock is not removed.

setup.sh/setup.ps1 source-build: clone into a temporary directory
first, then swap into place only on success. If git clone fails,
the existing install is preserved instead of being deleted by the
premature rm -rf.

* Remove redundant upstream_tag != release_tag check

load_approved_release_checksums compared checksums.upstream_tag
against the Unsloth release_tag, which are different namespaces
(upstream ggml-org tag vs Unsloth published tag). This only worked
because both happened to be "b8508" by convention. Would break if
Unsloth ever uses a different release naming scheme.

The existing check at parse_approved_release_checksums (line 950)
already validates the release_tag field correctly.

* Fix lock TOCTOU race and build-in-temp-dir swap

install_lock fallback: add os.fsync(fd) after writing PID to ensure
the PID is visible to racing processes before they check. Treat
empty lock files (PID not yet written) as "wait and retry" instead
of stale, closing the window where two processes could both see an
empty file, both unlink it, and both acquire the lock.

setup.sh/setup.ps1 source-build: clone AND build in a temp directory
(LLAMA_CPP_DIR.build.$$). Only swap into the final LLAMA_CPP_DIR
after the build succeeds. If clone or cmake or build fails, the temp
dir is cleaned up and the existing working install is preserved.
Previously, rm -rf ran after clone but before build, destroying the
existing install even if the build later failed.
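The build-in-temp-then-swap idea reduces to a small pattern. This is a hedged sketch: the helper name and argument shape are illustrative, not the actual setup.sh code, where the `"$@"` slot would be the git clone + cmake build.

```shell
# Run a build command against a unique temp dir; only replace the
# existing install after the command succeeds, so a failed clone or
# build leaves the previous working install untouched.
swap_in_build() {
    target=$1; shift
    tmp="$target.build.$$"          # unique temp dir per process
    mkdir -p "$tmp"
    if "$@" "$tmp"; then            # e.g. clone + cmake build into $tmp
        rm -rf "$target"            # remove old install only on success
        mv "$tmp" "$target"
    else
        rm -rf "$tmp"               # failure: existing install survives
        return 1
    fi
}
```

Compare with the premature `rm -rf` the commit describes: deleting `$target` before the build finishes destroys the working install even when the build later fails.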

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 05:42:43 -07:00
Roland Tannous
19e9c60a8e
Consolidate dual venvs and separate install from update (#4530)
* refactor: consolidate dual venvs into single ~/.unsloth/studio/unsloth_studio

* refactor: separate install.sh (first-time) from setup.sh (smart update with PyPI version check)

* fix: install.sh calls setup.sh directly, keep both setup and update CLI commands

* fix: use importlib.resources.files() directly without _path attribute

* fix: bootstrap uv before pip upgrade to handle uv venvs without pip

* fix: frontend 404 when launched via CLI, add global symlink to ~/.local/bin

* feat: add --local flag to install.sh and unsloth studio update for branch testing

* fix: resolve repo root from script location for --local installs

* feat: add --package flag to install.sh for testing with custom package names

* feat: add --package flag to unsloth studio update

* fix: always nuke venv in install.sh for clean installs

* revert: remove Windows changes, will handle in separate PR

* fix: error when --package is passed without an argument

* revert: restore Windows scripts to current main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: always explicitly set STUDIO_LOCAL_INSTALL and STUDIO_PACKAGE_NAME env vars

* fix: pass explicit STUDIO_LOCAL_REPO env var for --local installs

* fix: align banner box for Setup vs Update labels

* deprecate: hide 'unsloth studio setup' command, point users to update/install.sh

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: check stdout not stdin for auto-launch detection (curl pipe fix)

* fix: update install URL to unsloth.ai/install.sh

* fix: update install.sh usage comments to unsloth.ai/install.sh

* fix: use --upgrade-package for base deps to preserve existing torch/CUDA installs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: --local install now also installs unsloth-zoo via base.txt before editable overlay

* fix: don't skip base packages for --local installs (editable needs unsloth-zoo)

* refactor: move --local full dep install to install.sh, keep SKIP_STUDIO_BASE for all paths

* feat: add migration support for old .venv and CWD-based installs in setup.sh

* Revert "feat: add migration support for old .venv and CWD-based installs in setup.sh"

This reverts commit 301291d002.

* feat: migrate old .venv layout in install.sh instead of always nuking

* feat: validate old .venv with torch CUDA test before migration, recovery message on launch failure

* fix: try CUDA then fall back to CPU for migration validation

* fix: upgrade unsloth/unsloth-zoo with --reinstall-package on migration to preserve torch

* remove: delete unused unsloth ui command (use unsloth studio instead)

* Fix Windows venv path mismatch between install.ps1, setup.ps1, and studio.py

install.ps1 was creating the venv CWD-relative ($VenvName = "unsloth_studio"),
setup.ps1 was using an absolute path to ".unsloth\studio\.venv", and studio.py
looks for ".unsloth\studio\unsloth_studio". All three paths were different, so
the Windows installer would never produce a working Studio setup.

install.ps1:
- Use absolute $StudioHome + $VenvDir matching the Linux install.sh layout
- Add 3-way migration: old .venv at STUDIO_HOME, CWD-relative ~/unsloth_studio
  from the previous install.ps1, or fresh creation with torch validation
- For migrated envs, upgrade unsloth while preserving existing torch/CUDA wheels
- Set SKIP_STUDIO_BASE=1 before calling setup.ps1 (matches install.sh behavior)
- Fix launch instructions to use the absolute venv path

setup.ps1:
- Change $VenvDir from ".unsloth\studio\.venv" to ".unsloth\studio\unsloth_studio"
- Add SKIP_STUDIO_BASE guard: error out if venv is missing when called from
  install.ps1 (which should have already created it)
- Differentiate "Setup" vs "Update" in banners based on SKIP_STUDIO_BASE

* setup.ps1: unconditionally error if venv missing, matching setup.sh

setup.sh always errors out if the venv does not exist (line 224-228),
telling the user to run install.sh first. setup.ps1 was conditionally
creating a bare venv with python -m venv when SKIP_STUDIO_BASE was not
set, which would produce an empty venv with no torch or unsloth. Now
setup.ps1 matches setup.sh: always error, always point to install.ps1.

* Fix --torch-backend=auto CPU solver dead-end on Linux, macOS, and Windows

On CPU-only machines, `uv pip install unsloth --torch-backend=auto`
falls back to unsloth==2024.8 because the CPU solver cannot satisfy
newer unsloth's dependencies. install.ps1 already solved this with a
two-step approach; this applies the same fix to install.sh and
install_python_stack.py.

install.sh: add get_torch_index_url() that detects GPU via nvidia-smi
and maps CUDA versions to PyTorch index URLs (matching install.ps1's
Get-TorchIndexUrl). Fresh installs now install torch first via explicit
--index-url, then install unsloth with --upgrade-package to preserve
the pre-installed torch. All 5 --torch-backend=auto removed from
primary paths.

install.ps1: add fallback else-branch when TorchIndexUrl is empty,
using --torch-backend=auto as last resort (matching install.sh).

install_python_stack.py: remove unconditional --torch-backend=auto
from _build_uv_cmd. Torch is pre-installed by install.sh/setup.ps1
by the time this runs. Callers that need it can set UV_TORCH_BACKEND.

Both install.sh and install.ps1 now share the same three-branch logic:
migrated env (upgrade-package only), normal (torch-first + index-url),
and fallback (--torch-backend=auto if URL detection fails).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use --reinstall-package for migrated envs on both Linux and Windows

For migrated environments (moved from legacy venv location),
--reinstall-package is better than --upgrade-package because it forces
a clean reinstall even if the same version is already installed. This
ensures proper .dist-info and .pyc state in the new venv location.

--upgrade-package remains correct for the fresh install path where
torch is already installed and we just want to add unsloth without
re-resolving torch.

* Address review findings: portability, parity, and stale comments

- Replace grep -oP (GNU Perl regex) with POSIX sed in
  get_torch_index_url() so the script works on BSD grep (macOS is
  already guarded by the Darwin early-return, but Alpine/BusyBox
  would silently get the wrong CUDA tag)
- Add LC_ALL=C before nvidia-smi invocation to prevent locale-dependent
  output parsing issues
- Add warning on stderr when nvidia-smi output is unparseable, matching
  install.ps1's [WARN] message
- Add explicit unsloth-zoo positional arg to install.ps1 migrated path,
  matching install.sh (--reinstall-package alone won't install it if it
  was never present in the migrated env)
- Fix stale comment in install_python_stack.py line 392 that still
  claimed --torch-backend=auto is added by _build_uv_cmd
- Add sed to test tools directory (function now uses sed instead of grep)

* Add --index-url to migrated env path to prevent CPU torch resolution

The migrated path runs uv pip install with --reinstall-package for
unsloth/unsloth-zoo. While uv should keep existing torch as satisfied,
the resolver could still re-resolve torch as a transitive dependency.
Without --index-url pointing at the correct CUDA wheel index, the
resolver would fall back to plain PyPI and potentially pull CPU-only
torch. Adding --index-url $TORCH_INDEX_URL ensures CUDA wheels are
available if the resolver needs them.

Applied to both install.sh and install.ps1.

* Revert --index-url on migrated env path

The original install.ps1 on main already handles the migrated path
without --index-url and it works correctly. --reinstall-package only
forces reinstall of the named packages while uv keeps existing torch
as satisfied. No need for the extra flag.

* Fix unsloth studio update --local not installing local checkout

studio.py sets STUDIO_LOCAL_REPO when --local is passed, but
install_python_stack.py never read it. The update path always
installed from PyPI regardless of the --local flag.

Add a local_repo branch that first updates deps from base.txt
(with --upgrade-package to preserve torch), then overlays the
local checkout as an editable install with --no-deps.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 05:24:21 -07:00
Etherll
d69d60ff19
perf(studio): upgrade to Vite 8 + auto-install bun for faster frontend builds (#4522)
* perf(studio): upgrade to Vite 8 + auto-install bun for 3x faster frontend builds

* fix(studio): make bun-to-npm fallback actually reachable

setup.sh used run_quiet() for the bun install attempt, but run_quiet
calls exit on failure. This killed the script before the npm fallback
could run, making the "falling back to npm" branch dead code.

Replace the run_quiet call with a direct bun invocation that captures
output to a temp file (same pattern, but returns instead of exiting).

Also clean up partial node_modules left by a failed bun install before
falling back to npm, in both setup.sh and build.sh. Without this, npm
inherits a corrupted node_modules tree from the failed bun run.

* fix(studio): restore commonjsOptions for dagre CJS interop

The previous commit removed build.commonjsOptions, assuming Vite 8's
Rolldown handles CJS natively. While optimizeDeps.include covers the
dev server (pre-bundling), it does NOT apply to production builds.

The resolve.alias still points @dagrejs/dagre to its .cjs.js entry,
so without commonjsOptions the production bundle fails to resolve
the CJS default export. This causes "TypeError: e is not a function"
on /chat after build (while dev mode works fine).

Restore the original commonjsOptions block to fix production builds.

* fix(studio): use motion/react instead of legacy framer-motion import

* fix(studio): address PR review findings for Vite 8 + bun upgrade

Fixes:
  - Remove bun.lock from repo and add to .gitignore (npm is source of truth)
  - Use & bun install *> $null pattern in setup.ps1 for reliable $LASTEXITCODE
  - Add Remove-Item node_modules before npm fallback in setup.ps1
  - Print bun install failure log in setup.sh before discarding
  - Add Refresh-Environment after npm install -g bun in setup.ps1
  - Tighten Node version check to ^20.19.0 || >=22.12.0 (Vite 8 requirement)
  - Add engines field to package.json
  - Use string comparison for _install_ok in build.sh
  - Remove explicit framer-motion ^11.18.2 from package.json (motion pulls
    framer-motion ^12.38.0 as its own dependency — the old pin caused a
    version conflict)

* Fix Colab Node bypass and bun.lock stale-build trigger

Gate the Colab Node shortcut on NODE_OK=true so Colab
environments with a Node version too old for Vite 8 fall
through to the nvm install path instead of silently proceeding.

Exclude bun.lock from the stale-build probe in both setup.sh
and setup.ps1 so it does not force unnecessary frontend rebuilds
on every run.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Shine1i <wasimysdev@gmail.com>
2026-03-25 04:27:41 -07:00
Daniel Han
797ddd201e
Fix Studio silently exiting on Windows without error output (#4527)
* Fix Studio silently exiting on Windows without error output

On Windows, `unsloth studio` launches a child process via
subprocess.Popen to run the server in the studio venv. If the child
crashes (e.g. due to a missing package), the parent just calls
typer.Exit(rc) with no message -- the user sees "Launching Unsloth
Studio... Please wait..." and then the prompt returns with zero
feedback.

Root cause: `data_designer_unstructured_seed` is imported at the top
level in seed.py. If this package is not installed in the studio venv,
the entire import chain (seed.py -> routes/__init__.py -> main.py ->
run_server()) crashes with ModuleNotFoundError. Since run.py has no
try/except around run_server() and studio.py does not report nonzero
exit codes, the failure is completely silent.

Changes:
- run.py: wrap run_server() in try/except, print clear error with
  traceback to stderr. Also reconfigure stderr encoding on Windows so
  tracebacks with non-ASCII paths do not cause secondary failures.
- studio.py: print an error message when the child process exits with
  a nonzero code on Windows, so the user knows something went wrong.
- seed.py: make data_designer_unstructured_seed import optional with
  a try/except fallback. The server starts normally and only returns
  HTTP 500 if the unstructured seed endpoints are actually called.
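The run.py/seed.py changes combine two small Python patterns; a minimal sketch (names simplified, not the actual Studio code):

```python
import sys
import traceback

# seed.py pattern: make the plugin import optional so a missing package
# no longer kills the whole import chain at server startup.
try:
    import data_designer_unstructured_seed  # optional plugin
except ImportError:
    data_designer_unstructured_seed = None  # endpoints return 500 if called

def main(run_server):
    """run.py pattern: surface crashes instead of exiting silently."""
    if sys.platform == "win32" and hasattr(sys.stderr, "reconfigure"):
        # Avoid a secondary UnicodeEncodeError when the traceback itself
        # contains non-ASCII paths on a Windows console.
        sys.stderr.reconfigure(encoding="utf-8", errors="replace")
    try:
        run_server()
    except Exception:
        print("[ERROR] Unsloth Studio server crashed:", file=sys.stderr)
        traceback.print_exc()
        raise SystemExit(1)
```

The nonzero `SystemExit` is what lets the parent process (studio.py) detect the failure and print its own error instead of returning a bare prompt.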

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Skip Anaconda/Miniconda Python when creating Studio venv on Windows

Conda-bundled CPython ships modified DLL search paths that prevent
torch from loading c10.dll on Windows. The Studio server fails
silently at startup because the venv was created with conda's Python.

Standalone CPython (python.org, winget, uv) does not have this issue.

Both install.ps1 and setup.ps1 now skip any Python binary whose path
contains conda, miniconda, anaconda, miniforge, or mambaforge when
selecting the interpreter for the studio venv. If only conda Python
is available, the scripts print an error with instructions to install
standalone CPython.

* Fix multi-file preview crash and improve setup.ps1 Python discovery

Addresses review findings [10/10] and [8/10]:

1. seed.py: _read_preview_rows_from_multi_files() had a hard import
   of build_multi_file_preview_rows inside the function body, bypassing
   the optional-plugin guard. Moved it into the top-level try/except
   block and added a None guard matching the other functions.

2. setup.ps1: Python discovery now probes py.exe (Python Launcher)
   first, uses Get-Command -All to look past conda entries that shadow
   standalone CPython further down PATH, skips WindowsApps stubs, and
   resolves the actual executable path so venv creation does not
   re-resolve back to a conda interpreter.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Check sys.base_prefix to catch venvs created from conda Python

A venv created from conda Python (e.g. C:\Users\danie\.venv) has a
path that does not contain "conda", but sys.base_prefix still points
to the conda install (e.g. C:\Users\danie\miniconda3). The previous
path-only check missed this case entirely.

Both install.ps1 and setup.ps1 now use a Test-IsConda helper that
checks both the executable path AND sys.base_prefix against the
conda/miniconda/anaconda/miniforge/mambaforge pattern. This catches:
- Direct conda Python executables
- Venvs created from conda Python (base_prefix reveals the origin)
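The two-pronged Test-IsConda check translates to roughly this logic (a Python stand-in for the PowerShell helper; the pattern list is taken from the commit message above):

```python
import re

_CONDA_RE = re.compile(r"conda|miniconda|anaconda|miniforge|mambaforge",
                       re.IGNORECASE)

def is_conda_python(exe_path, base_prefix):
    """True if the interpreter is conda-based, either directly (the
    executable path matches) or indirectly (a venv created from conda
    Python, revealed by sys.base_prefix pointing at the conda install).
    """
    return bool(_CONDA_RE.search(exe_path) or _CONDA_RE.search(base_prefix))
```

In the installers, `base_prefix` is obtained by running the candidate interpreter with `-c "import sys; print(sys.base_prefix)"`; the path-only check alone misses the venv-from-conda case entirely.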

* Fix install.ps1 passing version string to uv venv instead of resolved path

Find-CompatiblePython returned a bare version string (e.g. "3.13")
which was passed to `uv venv --python 3.13`. uv performs its own
interpreter discovery and can resolve that version string back to a
conda Python, defeating the entire conda-skip logic.

Now Find-CompatiblePython returns a hashtable with both .Version (for
display) and .Path (the resolved absolute executable path). The venv
is created with `uv venv --python <absolute-path>`, ensuring uv uses
the exact interpreter we validated.

* Quote resolved Python path in uv venv call for paths with spaces

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-22 08:23:03 -07:00
Leo Borcherding
71c77d4e96
fix(install.ps1): fix non-NVIDIA package resolution — split torch+unsloth install (#4515)
* fix(install.ps1): split torch+unsloth install to fix non-NVIDIA package resolution

--torch-backend=auto on a non-NVIDIA Windows machine causes uv to resolve
unsloth==2024.8 (pre-CLI, no unsloth.exe). Fix: detect GPU robustly (PATH +
hardcoded fallback paths, mirrors setup.ps1), install torch first with an
explicit --index-url (CUDA variant for NVIDIA, CPU for everyone else), then
install unsloth separately without --torch-backend so the solver always picks
a modern release that ships the Studio CLI.

Closes the remaining gap flagged in #4478.

* fix(install.ps1): align warning with setup.ps1, add --upgrade, handle CUDA 11.x

- Match the no-GPU warning message to studio/setup.ps1 wording
  (chat-only GGUF mode, driver download link)
- Add CUDA 11.x floor check in Get-TorchIndexUrl so old drivers
  fall back to CPU wheels instead of silently getting cu124
- Log a warning when nvidia-smi output cannot be parsed
- Add --upgrade to both uv pip install calls so re-runs pick up
  newer package versions

* revert --upgrade from uv pip install calls

uv pip install already resolves to the latest satisfying version;
--upgrade is unnecessary and could force unwanted re-installs.

* fix: replace frozen cu124 fallbacks with cu126, guard CUDA 11.x

cu124 wheels are frozen at torch 2.6.0 -- falling back to them pins
users to an outdated PyTorch.  Three issues fixed in both install.ps1
and setup.ps1:

1. CUDA 12.0-12.5 now maps to cu126 (was cu124).
2. CUDA 11.x and older now falls back to cpu (was cu124, which would
   silently install incompatible GPU wheels).
3. Parse-failure and no-nvidia-smi fallbacks updated to cu126/cpu.

Adds tests/test_cuda_wheel_mapping.py covering the mapping logic,
nvidia-smi parsing, PS1 file sync, PyTorch index URL validation,
and sandbox torch installs.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove test file from PR branch

Test file kept locally, not needed in the PR.

* fix: map CUDA 11.x to cu118 instead of cpu

PyTorch still publishes cu118 wheels (up to torch 2.7.1), so CUDA 11.x
users get GPU-accelerated torch rather than being forced to CPU-only.
Only CUDA 10.x and older fall back to cpu.

* fix: revert CUDA 12.0-12.5 to cu124, handle cpu tag in setup.ps1

CUDA 12.0-12.5 drivers only support up to their reported CUDA version,
so cu126 wheels (built with CUDA 12.6) fail to load. Revert the
catch-all for 12.0-12.5 back to cu124.

Also fix setup.ps1 caller: when Get-PytorchCudaTag returns "cpu" (e.g.
CUDA 10.x driver), the installer now correctly skips Triton and prints
"CPU-only" instead of "CUDA support (cpu)".
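Pulling the rules from this commit sequence together, the driver-version to wheel-tag mapping looks roughly like this (an illustrative subset: the cu130 branch for CUDA 13 drivers is an assumption from a later commit's mention of cu130, and the real scripts may cover more ranges):

```python
def cuda_tag_for_driver(cuda_version):
    """Map the driver-reported CUDA version to a PyTorch wheel tag.

    Drivers only support wheels built with an equal-or-older CUDA, so
    12.0-12.5 maps to cu124, 11.x to cu118 (still published, up to
    torch 2.7.1), and 10.x or a missing/unparseable version falls back
    to CPU wheels.
    """
    if not cuda_version:
        return "cpu"              # nvidia-smi missing or unparseable
    major, minor = (int(p) for p in cuda_version.split(".")[:2])
    if major >= 13:
        return "cu130"            # assumed tag for CUDA 13 drivers
    if major == 12:
        return "cu126" if minor >= 6 else "cu124"
    if major == 11:
        return "cu118"
    return "cpu"                  # CUDA 10.x and older: CPU-only wheels
```

The caller then handles the "cpu" tag specially (skip Triton, print "CPU-only") as described above.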

* fix: add --upgrade to unsloth install for stale venv repair

On reruns against an existing venv, uv pip install unsloth makes no
changes if unsloth==2024.8 is already installed (it satisfies the
constraint). Adding --upgrade only to the unsloth install ensures
stale installs get repaired without forcing a multi-GB torch
re-download.

* fix: use --upgrade-package to avoid clobbering torch CUDA wheels

`--upgrade unsloth` re-resolves torch from default PyPI, stripping the
+cuXXX suffix installed in step 1.  `--upgrade-package unsloth unsloth`
upgrades only unsloth (and pulls missing deps like transformers, trl)
while preserving the pinned torch from the CUDA-specific index.

* docs: explain why split-install and --upgrade-package are needed

Expand the inline comment block to document both design decisions:
1. Why torch is installed separately (solver fallback to 2024.8)
2. Why --upgrade-package is used instead of --upgrade (preserves CUDA wheels)

---------

Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-22 05:41:58 -07:00
Leo Borcherding
96edad9c95
PR: Fix/cuda minimum check and abort (#4517)
* fix: add CUDA minimum version check and abort for llama.cpp (>= 12.4)

- setup.ps1/setup.sh: abort with clear error if CUDA toolkit < 12.4
  (llama.cpp requirement); link to cuda-toolkit-archive for upgrade
- setup.ps1: promote CUDA VS integration copy failure from WARN to
  ERROR + exit 1; remove manual-copy hack instructions per Roland —
  correct fix is re-installing CUDA/MSBuild, not a manual workaround

Fixes: https://github.com/unslothai/unsloth/issues/4437
Reported by: Sebastien

* fix: wipe stale studio venv when torch CUDA tag changes

When the NVIDIA driver is updated, the required PyTorch CUDA tag changes
(e.g. cu124 -> cu130) but setup.ps1 was silently reusing the existing
.venv, leaving the old torch wheel in place and breaking the UI for
everyone on the next setup run.

Before creating/reusing the venv, inspect the installed torch version
string. If its CUDA tag does not match what the current driver requires,
wipe the venv so we always get a clean, correct install.
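The torch-version inspection can be sketched as follows (a Python stand-in for the setup.ps1 probe; tag extraction assumes PyTorch's standard local-version format like "2.6.0+cu124"):

```python
import re

def installed_cuda_tag(torch_version):
    """Extract the wheel tag from torch.__version__.

    '2.6.0+cu124' -> 'cu124', '2.6.0+cpu' -> 'cpu'. An untagged
    version string is treated as 'cpu' so it also triggers a rebuild
    whenever a CUDA build is expected.
    """
    m = re.search(r"\+(cu\d+|cpu)", torch_version)
    return m.group(1) if m else "cpu"

def venv_is_stale(torch_version, expected_tag):
    # Rebuild whenever the installed wheel does not match what the
    # current driver requires (cu124 vs cu130, cpu vs cuXXX, ...).
    return installed_cuda_tag(torch_version) != expected_tag
```

The later hardening commit extends this with a failed-import case: if the probe cannot even report a version, the venv is likewise treated as stale.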

* Fix CUDA version check: portability, non-fatal fallback, stale venv detection

- setup.sh: Replace grep -oP with POSIX sed for macOS compatibility
- setup.sh: Replace exit 1 with NVCC_PATH="" to fall back to CPU-only build
- setup.sh: Move version check before -DGGML_CUDA=ON append
- setup.sh: Add else branch warning when nvcc version is unparseable
- setup.ps1: Replace exit 1 with $NvccPath=$null for non-fatal CUDA fallback
- setup.ps1: Add driver vs toolkit guidance in version warning
- setup.ps1: Guard CUDA env/VS integration setup with if ($NvccPath)
- setup.ps1: VS integration catch: downgrade to WARN, restore source/dest paths
- setup.ps1: Stale venv: detect CPU torch and untagged wheels, not just +cuNNN
- setup.ps1: Stale venv: rebuild on failed torch import
- setup.ps1: Stale venv: wrap Remove-Item in try/catch for locked files

* Remove incorrect CUDA >= 12.4 check, keep only stale venv detection

llama.cpp has no hard minimum CUDA version -- it builds with CUDA as old
as 11.2 and degrades features gracefully via #if CUDART_VERSION guards.
The 12.4 figure was the default Docker/CI baseline, not a build requirement.

Reverted:
- CUDA version check in setup.sh (entirely removed)
- CUDA version check in setup.ps1 (entirely removed)
- VS integration catch block cosmetic changes (restored to main)
- if ($NvccPath) guard around CUDA env setup (not needed without version check)

Kept:
- Stale venv detection in setup.ps1: detects torch CUDA tag mismatch
  (cu124 vs cu130, cpu vs cuXXX, broken torch import) and rebuilds venv

* Fix stale venv detection: incomplete venvs, timeout, fatal delete failure

- Add 30s timeout for torch import probe via ProcessStartInfo/WaitForExit
- Use Test-Path -PathType Container to reject files masquerading as venv dir
- Trigger rebuild when python.exe is missing (incomplete venv)
- Make Remove-Item failure fatal ([ERROR] + exit 1) instead of warn-and-continue
- Move $expectedTorchTag computation inside -not $shouldRebuild guard
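The timed import probe has a direct Python equivalent of the ProcessStartInfo/WaitForExit pattern (a sketch: a hung import and a failing import both count as a broken venv):

```python
import subprocess

def probe_import(python_exe, module, timeout_s=30):
    """Return True only if `module` imports cleanly within the timeout.

    A hung import (TimeoutExpired) and a nonzero exit code both report
    failure, so the caller treats the venv as broken and rebuilds it
    instead of waiting forever or trusting a crashed interpreter.
    """
    try:
        proc = subprocess.run(
            [python_exe, "-c", f"import {module}"],
            capture_output=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0
```

The timeout matters because a torch wheel with a mismatched CUDA runtime can stall during DLL loading rather than failing fast.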

---------

Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-22 04:46:36 -07:00
Manan Shah
6f129a214b
Fix Install commands for Windows + 1 line installs (#4447)
* One liner setup for unsloth studio

* Fix install scripts: system deps, activation bugs, curl/wget support

- install.sh: detect platform (macOS/Linux/WSL) and check for missing
  system dependencies (cmake, git, build-essential, libcurl4-openssl-dev).
  Prompt user once for permission to install all missing packages via
  brew (macOS) or sudo apt-get (Linux/WSL). Add wget fallback via
  download() helper since curl is not always present on minimal Linux
  installs. Fix nested curl|sh stdin stealing by downloading uv installer
  to a tempfile first. Replace venv activation (no-op in a pipe subshell)
  with explicit --python flag for uv pip install and direct venv binary
  invocation. Add idempotency guard for venv creation. Redirect stdin
  on unsloth studio setup to prevent pipe consumption. On macOS, check
  for Xcode Command Line Tools and trigger install if missing.

- install.ps1: wrap script body in Install-UnslothStudio function so
  that errors use return instead of exit (exit kills the terminal when
  run via irm|iex). Remove activate.ps1 invocation entirely -- use
  explicit --python path for uv pip install and & $UnslothExe for
  studio setup. This avoids both the child-scope activation bug (& vs
  dot-source) and the execution policy error on default Windows systems.
  Add winget availability check with clear error message. Fix PATH
  refresh to append registry paths instead of replacing the session PATH.
  Add uv installer fallback via astral.sh PowerShell script if winget
  install does not put uv on PATH. Broaden Python version check to
  accept 3.11-3.13. Add idempotency guard for venv creation.

- README.md: add wget one-liner alternative for systems without curl.

* Fix Tailwind CSS v4 .gitignore bug on Windows (#4444)

- Add .gitignore hiding workaround to setup.ps1 (matching existing
  setup.sh logic) so venv .gitignore files containing "*" don't prevent
  Tailwind's oxide scanner from finding .tsx source files
- Add CSS size validation to setup.sh, setup.ps1, and build.sh to catch
  truncated Tailwind builds early
- Remove stray force-rebuild overrides that made the "skip build if
  current" cache check dead code in both setup scripts
- Add rm -rf dist to build.sh to force clean rebuilds for wheel packaging

* Change default port 8000 to 8888, fix installer bugs, improve UX

- Change default Studio port from 8000 to 8888 across all entry points
  (run.py, studio.py, ui.py, colab.py, vite.config.ts, setup scripts)
- Update launch banner: "Launching with studio venv..." to
  "Launching Unsloth Studio... Please wait..."
- Add "Open your web browser" banner and rename labels
  (Local -> Local Access, External -> Worldwide Web Address)
- Fix venv idempotency: check for bin/python instead of just directory
  existence, clean up partial venvs on retry
- Fix build.sh CSS validation: handle empty CSS case that silently
  bypassed the check with "integer expression expected"
- Fix install.sh sudo handling: try apt-get without sudo first (works
  when root), then escalate with per-package tracking and user prompt
- Fix install.ps1: check exit code from studio setup, fail on error
- Add pciutils to WSL GGUF build dependencies
- Apply same smart apt-get escalation pattern to studio/setup.sh

* Use detected Python version for venv, abort on non-apt Linux

- install.ps1: detect existing Python 3.11/3.12/3.13 and use that
  version for venv creation instead of always forcing 3.13
- install.sh: exit with error on non-apt Linux distros when required
  packages cannot be auto-installed, instead of silently continuing

* Make sudo permission prompt more prominent with warning banner

* Add Accept [Y/n] sudo prompt to studio/setup.sh for consistency

* Fix native command exit code handling and sudo decline flow

install.ps1: Add $LASTEXITCODE checks after winget (Python), uv venv,
and uv pip install calls. $ErrorActionPreference only catches PowerShell
cmdlet errors, not native executable failures. The Python check also
handles winget returning non-zero for "already installed".

setup.sh: Skip llama-server build when user declines sudo or sudo is
unavailable. Previously the script continued to section 8 which would
fail with confusing errors (e.g. "gcc: command not found") since
build-essential was never installed.

* Move rm -rf llama.cpp inside build branch to preserve existing install

When _SKIP_GGUF_BUILD is set (user declined sudo or sudo unavailable),
the previous rm -rf would destroy an already-working llama-server before
the skip check ran. Move it inside the else branch so existing builds
are preserved when the rebuild is skipped.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-19 02:09:09 -07:00
Daniel Han
7ddb660b0c
revert: always rebuild frontend, override caching with _NEED_FRONTEND_BUILD=true (#4427)
* revert: remove frontend build caching from setup scripts

The mtime-based caching introduced in #4404/#4413 can incorrectly skip
frontend builds -- e.g. after git pull when filesystem timestamps are
not preserved, or after our Tailwind v4 discovery that the site-packages
.gitignore must be hidden before vite build (which the cached path
doesn't handle).

Always rebuild the frontend on setup. The build takes ~15s and is
safer than risking a stale dist/.

* revert: disable frontend build caching, keep code commented out

Caching disabled by always setting _NEED_FRONTEND_BUILD=true.
The mtime-based logic is preserved in comments for future re-enabling.

Reasons for disabling:
- Git does not preserve file timestamps, so cached dist/ can appear
  newer than freshly checked-out source after a pull
- Tailwind v4 requires hiding site-packages/.gitignore before vite
  build; the cache path bypasses this, producing broken CSS

* revert: always rebuild frontend, remove mtime caching

* revert: always rebuild frontend, override caching with _NEED_FRONTEND_BUILD=true
2026-03-18 07:37:53 -07:00
Daniel Han
1f12ba16df
Combine studio setup fixes: frontend caching, venv isolation, Windows CPU support (#4413)
* Allow Windows setup to complete without NVIDIA GPU

setup.ps1 previously hard-exited if nvidia-smi was not found, blocking
setup entirely on CPU-only or non-NVIDIA machines. The backend already
supports CPU and MLX (Apple Silicon) in chat-only GGUF mode, and the
Linux/Mac setup.sh handles missing GPUs gracefully.

Changes:
- Convert the GPU check from a hard exit to a warning
- Guard CUDA toolkit installation behind $HasNvidiaSmi
- Install CPU-only PyTorch when no GPU is detected
- Build llama.cpp without CUDA flags when no GPU is present
- Update doc comment to reflect CPU support

* Cache frontend build across setup runs

Skip the frontend npm install + build if frontend/dist already exists.
Previously setup.ps1 nuked node_modules and package-lock.json on every
run, and both scripts always rebuilt even when dist/ was already present.

On a git clone editable install, the first setup run still builds the
frontend as before. Subsequent runs skip it, saving several minutes.
To force a rebuild, delete frontend/dist and re-run setup.

* Show pip progress for PyTorch download on Windows

The torch CUDA wheel is ~2.8 GB and the CPU wheel is ~300 MB. With
| Out-Null suppressing all output, the install appeared completely
frozen with no feedback. Remove | Out-Null for the torch install
lines so pip's download progress bar is visible. Add a size hint
so users know the download is expected to take a while.

Also moves the Triton success message inside the GPU branch so it
only prints when Triton was actually installed.

* Guard CUDA env re-sanitization behind GPU check in llama.cpp build

The CUDA_PATH re-sanitization block (lines 1020-1033) references
$CudaToolkitRoot which is only set when $HasNvidiaSmi is true and
the CUDA Toolkit section runs. On CPU-only machines, $CudaToolkitRoot
is null, causing Split-Path to throw:

  Split-Path : Cannot bind argument to parameter 'Path' because it is null.

Wrap the entire block in `if ($HasNvidiaSmi -and $CudaToolkitRoot)`.

* Rebuild frontend when source files are newer than dist/

Instead of only checking if dist/ exists, compare source file timestamps
against the dist/ directory. If any file in frontend/src/ is newer than
dist/, trigger a rebuild. This handles the case where a developer pulls
new frontend changes and re-runs setup -- stale assets get rebuilt
automatically.

* Fix cmake not found on Windows after winget install

Two issues fixed:

1. After winget installs cmake, Refresh-Environment may not pick up the
   new PATH entry (MSI PATH changes sometimes need a new shell). Added a
   fallback that probes cmake's default install locations (Program Files,
   LocalAppData) and adds the directory to PATH explicitly if found.

2. If cmake is still unavailable when the llama.cpp build starts (e.g.
   winget failed silently or PATH was not updated), the build now skips
   gracefully with a [SKIP] warning instead of crashing with
   "cmake : The term 'cmake' is not recognized".

* Fix frontend rebuild detection and decouple oxc-validator install

Address review feedback:

- Check entire frontend/ directory for changes, not just src/.
  The build also depends on package.json, vite.config.ts,
  tailwind.config.ts, public/, and other config files. A change
  to any of these now triggers a rebuild.
- Move oxc-validator npm install outside the frontend build gate
  in setup.sh so it always runs on setup, matching setup.ps1
  which already had it outside the gate.

* Show cmake errors on failure and retry CUDA VS integration with elevation

Two fixes for issue #4405 (Windows setup fails at cmake configure):

1. cmake configure: capture output and display it on failure instead of
   piping to Out-Null. When the error mentions "No CUDA toolset found",
   print a hint about the CUDA VS integration files.

2. CUDA VS integration copy: when the direct Copy-Item fails (needs
   admin access to write to Program Files), retry with Start-Process
   -Verb RunAs to prompt for elevation. This is the root cause of the
   "No CUDA toolset found" cmake failure -- the .targets files that let
   MSBuild compile .cu files are missing from the VS BuildCustomizations
   directory.

* Address reviewer feedback: cmake PATH persistence, stale cache, torch error check

1. Persist cmake PATH to user registry so Refresh-Environment cannot
   drop it later in the same setup run. Previously the process-only
   PATH addition at phase 1 could vanish when Refresh-Environment
   rebuilt PATH from registry during phase 2/3 installs.

2. Clean stale CMake cache before configure. If a previous run built
   with CUDA and the user reruns without a GPU (or vice versa), the
   cached GGML_CUDA value would persist. Now the build dir is removed
   before configure.

3. Explicitly set -DGGML_CUDA=OFF for CPU-only builds instead of just
   omitting CUDA flags. This prevents cmake from auto-detecting a
   partial CUDA installation.

4. Fix CUDA cmake flag indentation -- was misaligned from the original
   PR, now consistently indented inside the if/else block.

5. Fail hard if pip install torch returns a non-zero exit code instead
   of silently continuing with a broken environment.

* Remove extra CUDA cmake flags to align Windows with Linux build

Drop GGML_CUDA_FA_ALL_QUANTS, GGML_CUDA_F16, GGML_CUDA_GRAPHS,
GGML_CUDA_FORCE_CUBLAS, and GGML_CUDA_PEER_MAX_BATCH_SIZE flags.
The Linux build in setup.sh only sets GGML_CUDA=ON and lets llama.cpp
use its defaults for everything else. Keep Windows consistent.

* Address reviewer round 2: GPU probe fallback, Triton check, stale binary rebuild

1. GPU detection: fallback to default nvidia-smi install locations
   (Program Files\NVIDIA Corporation\NVSMI, System32) when nvidia-smi
   is not on PATH. Prevents silent CPU-only provisioning on machines
   that have a GPU but a broken PATH.

2. Triton: check $LASTEXITCODE after pip install and print [WARN]
   on failure instead of unconditional [OK].

3. Stale llama-server: check CMakeCache.txt for GGML_CUDA setting
   and rebuild if the existing binary does not match the current GPU
   mode (e.g. CUDA binary on a now-CPU-only rerun, or vice versa).

* Fix frontend rebuild detection and npm dependency issues

Addresses reviewer feedback on the frontend caching logic:

1. setup.sh: Fix broken find command that caused exit under pipefail.
   The piped `find | xargs find -newer` had paths after the expression
   which GNU find rejects. Replaced with a simpler `find -maxdepth 1
   -type f -newer dist/` that checks ALL top-level files (catches
   index.html, bun.lock, etc. that the extension allowlist missed).

2. setup.sh: Guard oxc-validator npm install behind `command -v npm`
   check. When the frontend build is skipped (dist/ is cached), Node
   bootstrap is also skipped, so npm may not be available.

3. setup.ps1: Replace Get-ChildItem -Include with explicit path
   probing for src/ and public/. PowerShell's -Include without a
   trailing wildcard silently returns nothing, so src/public changes
   were never detected. Also check ALL top-level files instead of
   just .json/.ts/.js/.mjs extensions.

* Fix studio setup: venv isolation, centralized .venv_t5, uv targeting

- All platforms (including Colab) now create ~/.unsloth/studio/.venv
  with --without-pip fallback for broken ensurepip environments
- Add --python sys.executable to uv pip install in install_python_stack.py
  so uv targets the correct venv instead of system Python
- Centralize .venv_t5 bootstrap in transformers_version.py with proper
  validation (checks required packages exist, not just non-empty dir)
- Replace ~150 lines of duplicated install code across 3 worker files
  with calls to the shared _ensure_venv_t5_exists() helper
- Use uv-if-present with pip fallback; do not install uv at runtime
- Add site.addsitedir() shim in colab.py so notebook cells can import
  studio packages from the venv without system-Python double-install
- Update .venv_t5 packages: huggingface_hub 1.3.0->1.7.1, add hf_xet
- Bump transformers pin 4.57.1->4.57.6 in requirements + constraints
- Add Fast-Install helper to setup.ps1 with uv+pip fallback
- Keep Colab-specific completion banner in setup.sh

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix nvidia-smi PATH persistence and cmake requirement for CPU-only

1. Store nvidia-smi as an absolute path ($NvidiaSmiExe) on first
   detection. All later calls (Get-CudaComputeCapability,
   Get-PytorchCudaTag, CUDA toolkit detection) use this absolute
   path instead of relying on PATH. This survives Refresh-Environment
   which rebuilds PATH from the registry and drops process-only
   additions.

2. Make cmake fatal for CPU-only installs. CPU-only machines depend
   entirely on llama-server for GGUF chat mode, so reporting "Setup
   Complete!" without it is misleading. GPU machines can still skip
   the llama-server build since they have other inference paths.

* Fix broken frontend freshness detection in setup scripts

- setup.sh: Replace broken `find | xargs find -newer` pipeline with
  single `find ... -newer` call. The old pipeline produced "paths must
  precede expression" errors (silently suppressed by 2>/dev/null),
  causing top-level config changes to never trigger a rebuild.
- setup.sh: Add `command -v npm` guard to oxc-validator block so it
  does not fail when Node was not installed (build-skip path).
- setup.ps1: Replace `Get-ChildItem -Include` (unreliable without
  -Recurse on PS 5.1) with explicit directory paths for src/ and
  public/ scanning.
- Both: Add *.html to tracked file patterns so index.html (Vite
  entry point) changes trigger a rebuild.
- Both: Use -print -quit instead of piping to head -1 for efficiency.
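The freshness probe reduces to a timestamp comparison; a Python equivalent of the fixed `find ... -newer dist/` logic (file selection simplified relative to the real scripts):

```python
import os

def needs_rebuild(frontend_dir, dist_dir):
    """True when any tracked frontend file is newer than dist/.

    Checks ALL top-level files (catching index.html, lockfiles, ...)
    plus everything under src/ and public/, mirroring the corrected
    shell/PowerShell logic described above.
    """
    if not os.path.isdir(dist_dir):
        return True                       # no build output yet
    dist_mtime = os.path.getmtime(dist_dir)
    candidates = [e.path for e in os.scandir(frontend_dir) if e.is_file()]
    for sub in ("src", "public"):
        root = os.path.join(frontend_dir, sub)
        for dirpath, _dirs, files in os.walk(root):
            candidates.extend(os.path.join(dirpath, f) for f in files)
    return any(os.path.getmtime(p) > dist_mtime for p in candidates)
```

Note the caveat the revert commit (#4427) later acts on: git does not preserve timestamps, so mtime-based caching can still skip a needed rebuild after a pull.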

* Fix bugs found during review of PRs #4404, #4400, #4399

- setup.sh: Add || true guard to find command that checks frontend/src
  and frontend/public dirs, preventing script abort under set -euo
  pipefail when either directory is missing

- colab.py: Use sys.path.insert(0, ...) instead of site.addsitedir()
  so Studio venv packages take priority over system copies. Add warning
  when venv is missing instead of silently failing.

- transformers_version.py: _venv_t5_is_valid() now checks installed
  package versions via .dist-info metadata, not just directory presence.
  Prevents false positives from stale or wrong-version packages.

- transformers_version.py: _install_to_venv_t5() now passes --upgrade
  so pip replaces existing stale packages in the target directory.

- setup.ps1: CPU-only PyTorch install uses --index-url for cpu wheel
  and all install commands use Fast-Install (uv with pip fallback).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix _venv_t5_is_valid dist-info loop exiting after first directory

Remove premature break that caused the loop over .dist-info directories
to exit after the first match even if it had no METADATA file. Now
continues iterating until a valid METADATA is found or all dirs are
exhausted.

* Capture error output on failure instead of discarding with Out-Null

setup.ps1: 6 locations changed from `| Out-Null` to `| Out-String` with
output shown on failure -- PyTorch GPU/CPU install, Triton install,
venv_t5 package loop, cmake llama-server and llama-quantize builds.

transformers_version.py: clean stale .venv_t5 directory before reinstall
when validation detects missing or version-mismatched packages.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix ModuleNotFoundError when CLI imports studio.backend.core

The backend uses bare "from utils.*" imports everywhere, relying on
backend/ being on sys.path. Workers and routes add it at startup, but
the CLI imports studio.backend.core as a package -- backend/ was never
added. Add sys.path setup at the top of core/__init__.py so lazy
imports resolve correctly regardless of entry point.

Fixes: `unsloth inference unsloth/Qwen3-8B "who are you"` crashing with
"No module named 'utils'"

* Fix frontend freshness check to detect all top-level file changes

The extension allowlist (*.json, *.ts, *.js, *.mjs, *.html) missed
files like bun.lock, so lockfile-only dependency changes could skip
the frontend rebuild. Check all top-level files instead.

* Add tiktoken to .venv_t5 for Qwen-family tokenizers

Qwen models use tiktoken-based tokenizers which fail when routed through
the transformers 5.x overlay without tiktoken installed. Add it to the
setup scripts (with deps for Windows) and runtime fallback list.

Integrates PR #4418.

* Fix tiktoken crash in _venv_t5_is_valid and stray brace in setup.ps1

_venv_t5_is_valid() crashed with ValueError on unpinned packages like
"tiktoken" (no ==version). Handle by splitting safely and skipping
version check for unpinned packages (existence check only).

Also remove stray closing brace in setup.ps1 tiktoken install block.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-18 03:52:25 -07:00
Daniel Han
0acd1c7eec
studio: improve onboarding UX, tooltips, and training defaults (#4355)
* studio: improve onboarding UX, tooltips, and training defaults

- Change splash text to "Train and run LLMs locally"
- Add "Chat Only" card with BubbleChatIcon to skip directly to chat
- Add Skip/Skip to Chat buttons in sidebar and footer
- Back button on step 1 returns to splash screen instead of being disabled
- Change "Watch video guide" to "Get started with our guide" with new URL
- Update intro text to mention all model types + chat
- Make all tooltips clickable (in addition to hover) via React context
- Strip surrounding quotes from pasted HF tokens
- Rename "Eval Split" to "Evaluation Split"
- Add SparklesIcon to "Auto Detect" format option
- Change step 4 heading to "Choose your training parameters"
- Default max_steps to 60
- Learning rate displayed in scientific notation with +/- stepper
- Context length options capped by model's max_position_embeddings (via AutoConfig)
- Fix "QLORA"/"LORA" to "QLoRA"/"LoRA" in summary step
- Backend: add max_position_embeddings to model config endpoint

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* compare two different models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve Gemini review comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: disable thinking for Qwen3.5 <9B and always for AI Assist

- Change Qwen3.5 thinking threshold from <=2B to <9B (0.8B, 2B, 4B
  all disable thinking by default; 9B+ enables it)
- Always pass enable_thinking=False in AI Assist helper calls
  (_run_with_helper and _generate_with_backend) regardless of chat
  thinking settings

* studio: address PR review comments

- Extract _get_max_position_embeddings helper to DRY config extraction
- Fix "Skip to Chat" to navigate to /chat on step 1 (was /studio)

* fix: comment out debug print statements

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: skip Shiki highlighting for incomplete SVG code fences

While streaming SVG content, the syntax highlighter (Shiki) re-parses
the entire growing SVG on every token, blocking the main thread and
freezing the code area until the fence closes. Show a plain-text
preview for incomplete SVG fences instead, similar to how Mermaid
diagrams show a placeholder while streaming.

* studio: fix default top_k from 50/40 to 20 for chat inference

Per Qwen3.5 docs (unsloth.ai/docs/models/qwen3.5), top_k should be 20
for both thinking and non-thinking modes. The model-specific config in
inference_defaults.json already had top_k=20 for Qwen3.5, but the
generic fallback defaults were wrong:
- Frontend DEFAULT_INFERENCE_PARAMS.topK: 50 -> 20
- Backend generate_chat_completion top_k: 40 -> 20
- Backend generate_chat_completion_with_tools top_k: 40 -> 20
- Frontend title generation top_k: 40 -> 20

* studio: set universal inference defaults for unknown models

Default params for any model without specific config:
  temperature=0.6, top_p=0.95, top_k=20, min_p=0.01,
  presence_penalty=0.0, repetition_penalty=1.0

Models with entries in inference_defaults.json (Qwen3.5, Gemma-3,
Llama, etc.) override these with their recommended values.

Updated in: frontend DEFAULT_INFERENCE_PARAMS, backend Pydantic
request models, and backend generate_chat_completion defaults.

* studio: only trust_remote_code for unsloth/ models in AutoConfig

Only set trust_remote_code=True when the model name starts with
"unsloth/". All other models default to False for safety.

* studio: move Generating spinner above the composer

The "Generating" spinner was below the send message bar, causing
the bar to jump up and down. Move it above the composer in both
the regular thread view and the welcome/empty view.

* studio: adjust toast close button position away from edge

Move the X close button on toasts (like "Starting model...") from
top-1.5 to top-3 and add right-3, giving more breathing room from
the top-right corner.

* studio: make Think button smaller with tighter icon-text gap

Reduce gap from 1.5 to 0.5, padding from px-2.5/py-1 to px-2/py-0.5,
and icon from size-3.5 to size-3.

* studio: multiple onboarding and chat UX improvements

- Move Generating spinner above composer (fixes jumping send bar)
- Make Think button smaller with tighter icon-text gap
- Chat card now inside grid (same size as Audio/Embeddings cards)
- Rename "Chat Only" to "Chat"
- Chat card requires Continue to proceed (no auto-advance)
- Continue on Chat selection skips onboarding and goes to /chat
- Tooltip (i) click on Chat card doesn't trigger navigation
- Step 1 footer Back button goes back to splash (label is "Back")
- Splash "Skip Onboarding" renamed to "Skip to Chat", navigates to /chat
- Toast close button moved away from edge

* studio: align Skip to Chat button, add Skip to footer

- Sidebar "Skip to Chat" now uses primary (green) Button style with
  arrow icon, full width, aligned like step items. Shows on all steps.
- Footer: added "Skip" outline button next to Continue that goes
  directly to /studio with progress saved (markOnboardingDone)

* studio: change default max steps from 30 to 60 in toggle hook

The DEFAULT_MAX_STEPS in use-max-steps-epochs-toggle.ts was still 30,
used as fallback when toggling from epochs back to max steps.

* studio: extend context length options to 262K

CONTEXT_LENGTHS now includes 65536, 131072, 262144 in addition to
the existing 512-32768 range. The onboarding step filters these by
the model's max_position_embeddings (e.g. Nemotron-3-Nano-4B has
262144), showing powers of 2 up to the model's maximum.

* studio: auto-select LoRA vs QLoRA based on model size and GPU memory

After selecting a model in onboarding, detect the total model weight
file size from HF Hub (safetensors/bin files). Then estimate memory
needed: model_size_gb * 1.5 * context_scale, where context_scale is:
  - <=8192 tokens: 1.0x
  - 8193-16383 tokens: 1.7x
  - 16384-32767 tokens: 2.0x
  - >=32768 tokens: 4.0x

If the estimate fits in free GPU VRAM, default to LoRA (16-bit).
Otherwise default to QLoRA (4-bit).
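The heuristic above can be sketched as follows (function name is hypothetical; the scale factors are the commit's tiers, not a general rule):

```python
def pick_training_method(model_size_gb: float, context_len: int,
                         vram_free_gb: float) -> str:
    """Default to LoRA (16-bit) if the estimate fits in free VRAM,
    otherwise fall back to QLoRA (4-bit)."""
    if context_len >= 32768:
        scale = 4.0
    elif context_len >= 16384:
        scale = 2.0
    elif context_len > 8192:
        scale = 1.7
    else:
        scale = 1.0
    needed_gb = model_size_gb * 1.5 * scale
    return "lora" if needed_gb <= vram_free_gb else "qlora"
```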

Backend changes:
- Add model_size_bytes to ModelDetails (models.py)
- Add _get_model_size_bytes() using HfApi.repo_info (routes/models.py)
- Add vram_free_gb to get_gpu_summary (hardware.py)

Frontend changes:
- Add autoSelectTrainingMethod() in training-config-store.ts
- Called after model defaults are loaded
- Add model_size_bytes to ModelConfigResponse type
- Add vramFreeGb to HardwareInfo hook

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: rename "Importing ML libraries..." to "Importing Unsloth..."

* studio: show model/dataset in training status, fix LoRA/QLoRA casing

- Training status now shows 'Training "model_name"' and 'Dataset = ...'
  instead of generic "Starting training..."
- Fix Studio progress section to show QLoRA/LoRA instead of QLORA/LORA

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: rename 'Skip to Chat' to 'Skip Onboarding' on splash screen

* studio: add presence_penalty support for chat inference

Add presence_penalty as a parameter across the full stack:
- Backend: llama_cpp.py generate_chat_completion/with_tools, Pydantic
  models (inference.py), routes/inference.py pass-through
- Frontend: InferenceParams type, DEFAULT_INFERENCE_PARAMS (0.0),
  chat-adapter.ts payload, chat-settings-sheet.tsx slider (0-2),
  model defaults loading from inference_defaults.json
- Set Qwen3.5 default presence_penalty to 1.5 per official docs
- Default for unknown models is 0.0 (off)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: fix Chat card deselecting Text and aligning with other cards

* studio: fix presence_penalty not loading from inference defaults

The inference_config.py load_inference_config() was not including
presence_penalty in the returned config dict, so the Qwen3.5
default of 1.5 from inference_defaults.json never reached the
frontend. Added it to the config builder.

* studio: add delete button for cached models in model selector

Add trash icon on each downloaded model row (GGUF and safetensors) with
confirmation dialog. Backend DELETE /api/models/delete-cached endpoint
uses huggingface_hub scan_cache_dir + delete_revisions to cleanly remove
cached repos, refusing if the model is currently loaded.

* studio: restore inference defaults, reasoning, and tools on page refresh

On page refresh with a model already loaded, the frontend was not
re-applying model-specific inference defaults (presence_penalty,
temperature, etc.) or restoring reasoning/tools support flags.

Backend: Add inference config, supports_reasoning, supports_tools,
and context_length to InferenceStatusResponse.

Frontend: In the refresh callback, when an active model is detected,
apply mergeRecommendedInference and restore reasoning/tools flags
with proper Qwen3.5 size-based defaults.

* studio: fix delete dialog closing before async completes

Prevent AlertDialogAction's default close behavior with
e.preventDefault() so the dialog stays open during deletion.
Also block onOpenChange dismiss while deleting is in progress.

* fix: add Dict and Any imports to inference models

* studio: fix Qwen3.5 reasoning threshold in frontend load path

The frontend loadModel handler had the old threshold (<=2) for
disabling reasoning on small Qwen3.5 models. Changed to <9 to
match the backend. This was causing 4B to not properly disable
thinking by default when auto-loaded.

* studio: move GGUF delete to per-variant level

For GGUF repos, the trash icon now appears on each downloaded variant
row inside the quantization expander instead of on the repo-level row.
Backend accepts optional variant param to delete specific GGUF files
(blob + symlink) rather than the entire repo cache.

* studio: restore ggufContextLength on page refresh

The Max Tokens slider was capped at 32768 on page refresh because
ggufContextLength was not restored from the status response.
Now set it from statusRes.context_length on reconnect.

* fix: remove <think> from Qwen3.5 response template marker

The train-on-responses-only feature uses template markers to find
where the assistant response starts. The Qwen3.5 response marker
included '<think>\n' which is only present when thinking mode is
enabled. With thinking disabled (default for <9B), the marker
never matched, causing 100% of samples to be dropped.

Changed response marker from '<|im_start|>assistant\n<think>\n'
to '<|im_start|>assistant\n' which works regardless of thinking mode.
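A quick illustration of why the old marker dropped every sample (the rendered turn below is a hypothetical example of thinking-disabled output):

```python
# Hypothetical rendered chat turn with thinking disabled (no <think> block):
rendered = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\nHello!<|im_end|>"
)

old_marker = "<|im_start|>assistant\n<think>\n"  # only matches thinking-mode output
new_marker = "<|im_start|>assistant\n"           # matches either mode
```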

* studio: fix sloth ASCII art alignment in training overlay

* fix: correct sloth ASCII art alignment to match Unsloth banner

* studio: add Python and terminal tool calling to chat

Register python and terminal tools alongside web search. Python
executor validates imports (stdlib only) via unsloth_zoo
rl_environments, runs code in a subprocess sandbox with 5-min
timeout and cancel support. Terminal executor blocks dangerous
commands (rm, sudo, etc.) and runs in a temp directory.

Update llama_cpp tool loop to show tool-specific status messages
and pass cancel_event through to executors. Rename composer
toggle from "Search" to "Tools" and show TerminalIcon for
execution status pills.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: fix Nemotron/transformers 5.x support, onboarding navigation, port binding

Backend:
- Dynamic transformers 5.x detection via tokenizer_config.json fetch
  (checks for TokenizersBackend class, cached per-model)
- Bump transformers 5.x version from 5.2.0 to 5.3.0 across all workers,
  setup scripts (setup.sh, setup.ps1)
- Auto-enable trust_remote_code for unsloth/* models needing transformers 5.x
  (workaround for NemotronH config parsing bug in transformers)
- Auto-install mamba-ssm/causal-conv1d for SSM models (NemotronH, Falcon-H1)
  with --no-build-isolation --no-deps to avoid torch version conflicts
- Add SO_REUSEADDR to port check in run.py (fixes Colab proxy stale connection
  falsely reporting port as in-use)
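The SO_REUSEADDR port check can be sketched like this (the helper name is hypothetical; run.py's actual check may differ):

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Bind-test a port with SO_REUSEADDR so a stale TIME_WAIT
    connection (e.g. left by a proxy) does not make the port look
    busy; an actively listening socket still reports it as in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False
```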

Frontend:
- Fix "Skip to Chat" navigation: use window.location.href instead of React
  Router navigate() to bypass useEffect redirect race
- Fix "Skip Onboarding" on splash: navigates to /studio (not /chat)
- Fix onboarding guard: only check isOnboardingDone() on initial mount
- Fix Chat card on step 1: add sr-only spacer for consistent alignment
- Fix Chat+Text both selected: clear RadioGroup value when Chat is selected

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: split tools toggle into Search and Code buttons

Replace the single "Tools" toggle with two independent toggles:
- "Search" (globe icon) enables web search only
- "Code" (terminal icon) enables Python and terminal execution

Add enabled_tools list field to the inference payload so the
backend only registers the tools the user has toggled on. Both
toggles appear in the main composer and the compare composer.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: fix tool calling import validation and error logging

Replace unsloth_zoo-dependent import checker with a standalone
ast-based validator using sys.stdlib_module_names. This properly
blocks non-stdlib imports (numpy, requests, etc.) and returns a
clear error message to the model so it can rewrite using only
stdlib.

Add full traceback to tool streaming error logs for debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: parse gpt-oss harmony channels for clean safetensors chat output

gpt-oss models emit multi-channel output via harmony protocol tokens
(<|channel|>analysis<|message|>... and <|channel|>final<|message|>...).
TextIteratorStreamer with skip_special_tokens=True strips the special
tokens but leaves channel names concatenated with content, producing
garbled output like "analysisWe need to...assistantfinalHello!".

Add HarmonyTextStreamer that decodes with skip_special_tokens=False,
parses harmony markup via regex, and emits <think>analysis</think>
for the analysis channel and plain text for the final channel --
reusing the existing frontend reasoning UI.

Also expose supports_reasoning=True for non-GGUF gpt-oss models in
the /status endpoint so the frontend enables the Think toggle.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: use unsloth_zoo for Python sandbox validation

Set UNSLOTH_IS_PRESENT=1 and import check_python_modules and
check_signal_escape_patterns directly from unsloth_zoo instead
of a standalone fallback. This gives us the full Unsloth
validation including stdlib-only import checks and signal/timeout
escape pattern detection.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: allow all imports in Python tool sandbox

Remove stdlib-only import restriction. Keep signal escape
pattern detection via unsloth_zoo for safety.

* studio: fix ReadTimeout on tool streaming final pass

The 0.5s read timeout used for cancel-checking during streaming
also fires when waiting for the first response from llama-server
(e.g. reasoning model thinking for 15+ seconds). Add
_stream_with_retry() context manager that retries on ReadTimeout
while checking cancel_event, so the model has unlimited time to
think before producing the first token. Applied to both the
regular streaming path and the tool-calling final pass.
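A simplified sketch of the retry idea (the real helper is a context manager around the HTTP stream and catches requests' ReadTimeout; the stand-in exception and plain loop below are assumptions for illustration):

```python
import threading

class ReadTimeout(Exception):
    """Stand-in for requests.exceptions.ReadTimeout in this sketch."""

def stream_with_retry(start_stream, cancel_event: threading.Event):
    """Retry the initial request on ReadTimeout until the stream starts
    or the user cancels, so a reasoning model may think arbitrarily
    long before producing its first token."""
    while not cancel_event.is_set():
        try:
            return start_stream()
        except ReadTimeout:
            continue  # short read timeout fired; poll cancel and retry
    return None
```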

* fix: rewrite HarmonyTextStreamer with stateful incremental parsing

The delta-on-transformed approach had two critical bugs:

1. Before the full <|channel|>X<|message|> pattern was complete, the
   strip-tokens fallback emitted "analysis" as plain text. Then when
   the regex matched, _transform returned a completely different format
   (<think>...</think>) and the delta was computed against the wrong
   base string, producing fragments like "think>", "nk>", ">".

2. Even with full matches, the closing </think> tag shifted position
   as content grew, so text[prev_len:] produced garbled deltas.

Replace with stateful incremental parsing that:
- Buffers until a complete channel+message pair is seen
- Emits <think> once when analysis channel first appears
- Streams analysis content deltas (computed on channel content directly)
- Emits </think> once when final channel first appears
- Streams final content deltas
- Closes open think tags in end()

Also skip the generic all_special_tokens stripping in
_clean_generated_text for gpt-oss since HarmonyTextStreamer already
produces clean output and the generic stripping was mangling <think>
tags.
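The buffering scheme above can be sketched as a small state machine (this is an illustrative sketch, not the actual HarmonyTextStreamer; it assumes output starts with a channel header, as gpt-oss harmony output does, and ignores tokens split mid-"<|"):

```python
import re

# Matches a complete channel header; partial headers stay buffered.
_MARKER = re.compile(r"<\|channel\|>(analysis|final)<\|message\|>")

class HarmonyParser:
    def __init__(self):
        self.buf = ""
        self.channel = None       # None | "analysis" | "final"
        self.think_open = False

    def feed(self, raw: str) -> str:
        self.buf += raw
        out = []
        while self.buf:
            if self.channel is None:
                m = _MARKER.search(self.buf)
                if not m:
                    break  # header may still be arriving; keep buffering
                self.channel = m.group(1)
                self.buf = self.buf[m.end():]
                if self.channel == "analysis" and not self.think_open:
                    out.append("<think>")
                    self.think_open = True
                elif self.channel == "final" and self.think_open:
                    out.append("</think>")
                    self.think_open = False
            else:
                cut = self.buf.find("<|")
                if cut == -1:          # pure content: stream it all
                    out.append(self.buf)
                    self.buf = ""
                elif cut > 0:          # content up to the next token
                    out.append(self.buf[:cut])
                    self.buf = self.buf[cut:]
                else:                  # token boundary: look for a header
                    self.channel = None
        return "".join(out)

    def end(self) -> str:
        return "</think>" if self.think_open else ""
```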

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: strip all <|...|> tokens in gpt-oss cleanup, not just harmony subset

The gpt-oss tokenizer has added tokens like <|return|> (id=200002) that
are not part of the harmony channel protocol but can leak into output.
The previous regex only stripped channel|message|start|end tokens.

Broaden the _clean_generated_text regex for gpt-oss to <\|[a-z_]+\|>
which catches all pipe-delimited tokens (return, constrain, reserved,
etc.) without matching <think>/<\/think> tags.
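The broadened cleanup pass amounts to a one-line substitution (helper name is illustrative; note this is only a final sweep over already-parsed text, since channel names themselves are plain text once the pipes are stripped):

```python
import re

# Strips any lowercase pipe-delimited token (<|return|>, <|constrain|>, ...)
# without touching <think>/</think> tags.
PIPE_TOKEN = re.compile(r"<\|[a-z_]+\|>")

def clean_gpt_oss(text: str) -> str:
    return PIPE_TOKEN.sub("", text)
```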

Verified: gpt-oss all_special_tokens are only <|return|>,
<|reserved_200017|>, <|startoftext|> -- none overlap with <think>.
The harmony tokens (channel, message, start, end) are added_tokens
but not in all_special_tokens.

* fix: hide config-only model repos from cached models list

Repos that only have metadata/config files cached (no .safetensors or
.bin weight files) were showing up in the Downloaded list with tiny
sizes like "1.8 KB" or "24 KB". These are just leftover config
snapshots from architecture checks, not usable models.

Filter the cached-models endpoint to only include repos that contain
actual model weight files (.safetensors or .bin).

* studio: fix toast description text contrast in dark mode

Add explicit !text-muted-foreground to toast description classNames
so secondary text (e.g. "Releases VRAM and resets inference state.")
is readable in dark mode.

* studio: fix Chat card icon alignment with size-4 spacer

Replace sr-only span (takes no space) with a size-4 shrink-0 div
matching the RadioGroupItem dimensions in other cards, so the Chat
icon aligns vertically with Text/Audio/Vision/Embeddings icons.

---------

Co-authored-by: workspace <user@workspace.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Manan17 <shahmanan170602@gmail.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-03-17 07:46:07 -07:00
Daniel Han
eeffa4c065
studio: web search, KV cache dtype, training progress, inference fixes
## Summary
- Add web search tool calling for GGUF models (Search toggle, DuckDuckGo via ddgs)
- Add KV cache dtype dropdown (f16/bf16/q8_0/q5_1/q4_1) in Chat Settings
- Fix Qwen3/3.5 inference defaults per official docs (thinking on/off params)
- Enable reasoning by default for Qwen3.5 4B and 9B
- Replace "Generating" toast with inline spinner
- Fix stop button via asyncio.to_thread (event loop no longer blocked)
- Fix CUDA 12 compat lib paths for llama-server on CUDA 13 systems
- Fix auto-load model name not appearing in selector
- Training progress messages + dataset_num_proc fix

Integrated PRs:
- #4327 (imagineer99): BETA badge alignment (already in tree)
- #4340 (Manan Shah): prioritize training models in model selection
- #4344 (Roland Tannous): setup.sh macOS python version compatibility
- #4345 (Manan Shah): revamp model+dataset checking logic
2026-03-17 00:30:01 -07:00
Datta Nimmaturi
bbf6414caf
Fix formatting of launch command in setup.ps1 2026-03-17 10:19:16 +05:30
Roland Tannous
46f9be3dd1
fix: Resolve CUDA toolkit mismatch on multi-CUDA Windows systems (#4324)
* fix: prefer existing CUDA_PATH toolkit to avoid version mismatch on multi-CUDA systems

* fix: validate GPU arch support before accepting CUDA toolkit (sm_120 + CUDA 12.4 fallback)

* debug: add temporary CUDA compatibility check print

* fix: auto-copy CUDA VS integration files when missing (No CUDA toolset found)

* fix: return false when nvcc --list-gpu-arch unavailable (reject old toolkit, scan for newer)

* fix: re-sanitize CUDA env vars before cmake build (survives Refresh-Environment)

* fix: use --list-gpu-code (sm_*) instead of --list-gpu-arch (compute_*) for arch probing
2026-03-16 18:16:16 +04:00
Roland Tannous
f44857b2df
PR: Windows Setup Improvements (#4299)
* quiet llama.cpp build, smarter CUDA install via winget, accept Python 3.11-3.13

* studio: hide Python traceback when setup script exits with error

* setup.ps1: auto-add Python Scripts dir to PATH so 'unsloth' command works in new terminals

* setup.ps1: fix GPU check to run nvidia-smi instead of just checking command existence

* setup.ps1: fix PATH check to use exact entry comparison instead of substring match

* setup.ps1: validate Python probe exit code before persisting Scripts PATH
2026-03-14 23:59:49 +04:00
Daniel Han
6dda8c4c23 studio: revert combined targets, keep separate builds
Restore separate cmake --build calls for llama-server and
llama-quantize on both setup.sh and setup.ps1. The combined
approach made llama-quantize failure fatal, but it was originally
best-effort (|| true on Linux, [WARN] on Windows). The timing
savings from combining was only ~2.7s, not worth the semantic
change.

The Ninja + arch detection speedups are preserved (55s vs 1m 37s).
2026-03-14 00:54:09 -07:00
Daniel Han
e4a5da8d96 studio: combine llama.cpp build targets in setup.ps1
Build llama-server and llama-quantize in a single cmake --build
invocation on Windows, matching the same optimization done in
setup.sh. This allows MSBuild to better parallelize the two targets.

The Visual Studio generator is kept as-is (not switching to Ninja on
Windows since VS generator is the standard approach and interacts
with MSBuild).
2026-03-14 00:54:09 -07:00
Roland Tannous
47654cb91c Final cleanup 2026-03-12 18:28:04 +00:00
Roland Tannous
400b6ecede Update setup.ps1 2026-03-12 02:44:25 +04:00
Roland Tannous
1087216cb5 Merge branch 'fix/pre-merge-cleanup' into feature/merge-build-final 2026-03-11 20:56:49 +00:00
Manan17
fbccac8cee shifting setup & co inside studio 2026-03-11 20:19:52 +00:00
Roland Tannous
daa50d0756 Revert "Merge pull request #347 from unslothai/feature/studio-storage-roots"
This reverts commit 6b43e33ff1, reversing
changes made to 9edadaf21f.
2026-03-10 01:52:47 +00:00
Manan17
32569fc8a8 shifting setup & co inside studio 2026-03-09 23:48:31 +00:00
Renamed from setup.ps1