Commit graph

44 commits

Author SHA1 Message Date
Daniel Han
6e87bade25 Trim verbose comments in PATH helpers
Reduce inline comments from ~160 lines to ~25 across both files.
Keep one-line summaries of the "why"; drop multi-paragraph rationale
blocks that repeated information already captured in commit messages
and PR discussion.
2026-04-16 12:01:01 +00:00
Etherll
ec32ce2e82
fix: use direct registry API for PATH writes instead of SetEnvironmentVariable (#4961)
* fix: replace SetEnvironmentVariable with direct registry API

* apply reviews

* Use CreateSubKey for HKCU\Environment

* Store PATH backup under HKCU\Software\Unsloth

* Fix $backupKey registry handle leak in PATH backup block

Wrap $backupKey operations in try/finally so the handle is closed even
if GetValue or SetValue throws. The Add-ToUserPath helper already uses
this pattern for its registry key -- the backup block was the only
place missing it.

* Isolate WM_SETTINGCHANGE broadcast from PATH write error handling

Wrap the broadcast dummy-variable calls in their own try/catch so a
broadcast failure does not mask a successful registry PATH write.
Previously, if SetEnvironmentVariable threw after SetValue already
committed the new PATH, Add-ToUserPath would return $false and the
caller would skip Refresh-SessionPath.

* PATH helper polish: venv precedence, quoted entries, raw/expanded dedup

Three small follow-ups surfaced by a 10-reviewer pass against the rebased
PR head. None fix a regression vs main; each strictly improves the new
helpers.

Refresh-SessionPath / Refresh-Environment:
- Move $env:Path to the front of the merge so an activated venv keeps
  precedence over machine/user PATH after a refresh. Pre-PR dropped
  process-only entries entirely; post-PR kept them but at the back.
- Dedup on both raw and expanded forms so %USERPROFILE%\foo and the
  already-expanded C:\Users\me\foo do not both survive.

Add-ToUserPath:
- Trim whitespace and surrounding double-quotes from each compared entry
  so quoted PATH entries like "C:\Program Files\CMake\bin" deduplicate
  against an unquoted directory of the same path.
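
The dedup rules above (raw vs. expanded forms, plus quote/whitespace trimming) can be sketched in Python for illustration; the real helpers are PowerShell, and `expand_win_vars` here is a hypothetical stand-in for .NET's ExpandEnvironmentVariables:

```python
import os
import re

def expand_win_vars(s: str) -> str:
    # Hypothetical stand-in for ExpandEnvironmentVariables: replace
    # %NAME% with the env value, leaving unknown names untouched.
    return re.sub(r"%([^%]+)%",
                  lambda m: os.environ.get(m.group(1), m.group(0)), s)

def normalize(entry: str) -> str:
    # Trim whitespace and surrounding double-quotes, then expand and
    # lowercase (Windows PATH comparison is case-insensitive).
    return expand_win_vars(entry.strip().strip('"')).lower()

def dedup(entries):
    # Keep the first occurrence of each directory, comparing on the
    # normalized form while preserving the original raw spelling.
    seen, kept = set(), []
    for e in entries:
        key = normalize(e)
        if key and key not in seen:
            seen.add(key)
            kept.append(e)
    return kept
```

With this, a quoted `"C:\Program Files\CMake\bin"` and a `%USERPROFILE%`-form entry both collapse onto their expanded, unquoted counterparts.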

* Back up User PATH inside Add-ToUserPath, before first mutation

Previously only studio/setup.ps1 took a one-time PATH backup, at script
top (line ~547). install.ps1 (the irm | iex entry point) had no backup,
so users who installed via that path had no recovery surface if anything
clobbered their PATH. The PR description's "one-time backup before any
modifications" promise only held for the studio installer flow.

Move the backup into Add-ToUserPath itself: just before the first actual
SetValue mutation, write the pristine raw PATH to
HKCU\Software\Unsloth\PathBackup if no backup already exists. This:

- Covers both entry points (install.ps1 and studio/setup.ps1).
- Captures the TRUE pristine PATH even when install.ps1 runs first and
  studio/setup.ps1 runs afterwards (the script-top backup in setup.ps1
  would otherwise see an already-modified PATH).
- Is idempotent: once a backup exists, subsequent calls preserve it.
- Skips when nothing would mutate (dedup match) or PATH is empty.

The script-top backup in studio/setup.ps1 is kept for defense in depth.

* Refresh PATH: venv-aware merge order

Reconcile two competing concerns about Refresh-SessionPath /
Refresh-Environment surfaced by separate review rounds:

  - venv at the back -> activated venv loses precedence to system Python
  - process at the front -> stale shims (old node, old python, etc.)
    still on $env:Path can beat a freshly installed tool

New merge order:
  1. Activated venv Scripts dir, only if $env:VIRTUAL_ENV is set
  2. Machine PATH freshly read from registry
  3. User PATH freshly read from registry
  4. Current $env:Path as fallback

This way an explicitly-activated venv keeps priority while a tool the
script just installed wins over any stale entry that was already on
the inherited shell PATH. When no venv is active, fresh registry
entries take precedence as expected.

* Append to User PATH by default, close $envKey in finally

Add-ToUserPath gains a -Position Append|Prepend parameter defaulting to
Append so installing unsloth no longer prepends the bundled venv Scripts
directory ahead of the user's existing python / pip on new shells. The
four current call sites (install.ps1 launcher, studio/setup.ps1 CMake,
nvcc, Python user Scripts) all take the Append default because each one
that needs in-session precedence already does an inline $env:Path prepend
independently. This matches rustup / cargo / nvm / pyenv / uv behavior.

Also wrap the script-top $envKey.GetValue in a try/finally so the
registry handle is released even if the read throws. Matches the pattern
already used for $backupKey five lines below.

* Prepend cmake, nvcc, Python Scripts; keep venv Scripts appended

The previous commit switched Add-ToUserPath to append by default so that
installing unsloth would not silently hijack the user's system python /
pip. That was correct for the venv Scripts dir (which contains python.exe
and pip.exe alongside unsloth.exe), but wrong for the three studio/setup
call sites. Those persist cmake, the driver-compatible nvcc, and the
Python user Scripts dir for future shells, and in all three cases an
older tool already earlier in the user PATH would keep winning after the
install finished. The nvcc case is especially load-bearing: setup selects
a driver-compatible CUDA toolkit, then llama.cpp builds against whatever
wins PATH resolution, so a stale older nvcc produces broken builds.

Pass -Position 'Prepend' explicitly at the three setup.ps1 call sites
(cmake at line 754, nvcc bin at line 1025, Python user Scripts at line
1191). None of those directories holds python.exe, so prepending them
does not re-introduce the original hijack problem. Leave the install.ps1
venv Scripts call on the default Append with a comment explaining why.

* Symmetric dedup, Prepend reorders duplicates, unsloth shim dir

Address three separate findings surfaced by review:

1. Dedup asymmetry (Gemini high-priority): the existing dedup expanded
   registry entries via ExpandEnvironmentVariables but did NOT expand the
   new directory. Passing "%USERPROFILE%\foo" when "C:\Users\me\foo" was
   already in PATH produced a duplicate. Expand both sides so the check
   is symmetric.

2. -Position Prepend no-op on existing duplicates: the dedup loop
   returned $false as soon as it saw a match, regardless of position.
   That left a late-position duplicate in place instead of moving it to
   the front, so "prepend the newly selected cmake/nvcc" did not always
   beat an older copy earlier in PATH. Partition entries into kept and
   dropped lists, then reinsert a single copy at the requested position.
   Append still returns $false on any match so user-curated orderings
   are not reshuffled. Prepend also returns $false when the only copy
   is already at position 0 so we preserve the user's casing.

3. Stop adding the venv Scripts dir to User PATH entirely. That dir
   holds python.exe and pip.exe alongside unsloth.exe, so neither
   Prepend nor Append worked: prepend hijacked the user's system python
   and pip, append made the freshly-installed unsloth.exe lose to any
   older unsloth.exe earlier on PATH. Replace the Scripts-dir PATH add
   with a dedicated shim directory that contains only unsloth.cmd, and
   prepend that dir. The shim calls the venv's unsloth.exe by absolute
   path so future pip upgrades inside the venv propagate automatically.
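
The partition-and-reinsert behavior in point 2 can be sketched like this (Python for illustration; the comparison is simplified here to a trimmed, unquoted, case-insensitive match rather than the full expansion logic):

```python
def add_to_path(entries, new_dir, position="Append"):
    # Partition into kept and dropped, then reinsert a single copy at
    # the requested position. Returns (new_entries, changed).
    norm = lambda e: e.strip().strip('"').lower()
    key = norm(new_dir)
    kept = [e for e in entries if norm(e) != key]
    matches = len(entries) - len(kept)
    if position == "Append":
        if matches:                      # any existing copy wins; do not
            return entries, False        # reshuffle user-curated ordering
        return entries + [new_dir], True
    # Prepend: no-op when the only copy is already first (preserves the
    # user's casing); otherwise move/insert a single copy at the front.
    if matches == 1 and norm(entries[0]) == key:
        return entries, False
    return [new_dir] + kept, True
```

So a Prepend of the newly selected cmake/nvcc dir relocates a late duplicate to the front instead of silently leaving it behind an older copy.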

* Shim via hardlink, Append user Scripts, drop venv sysconfig fallback

Three follow-ups to the c0ab1ab shim commit, targeting concerns raised in
the second 20-reviewer pass:

1. Shim uses unsloth.exe (hardlink, copy fallback) instead of unsloth.cmd.
   The batch-file approach had three distinct regressions:
   - cmd.exe expanded %...% sequences inside user arguments, so prompts
     like "What does 50% mean?" got mangled before reaching the CLI
   - Git Bash / MSYS2 / POSIX-style shells on Windows do not resolve
     bare-name lookups to .cmd files, so `unsloth` stopped working there
   - Set-Content -Encoding ASCII replaced non-ASCII profile characters
     with '?', so installs under C:\Users\Jörg\... wrote a broken shim
   A hardlink (fallback: copy) of unsloth.exe is a native Windows
   executable with no shell indirection. PATHEXT picks .exe before .cmd
   in cmd.exe and PowerShell, Git Bash honors .exe natively, subprocess
   callers hit it directly, and a hardlink stays in sync with the venv
   on pip upgrades because both names point at the same inode.

2. studio/setup.ps1 Python user Scripts dir is added with default Append
   instead of -Position Prepend. That directory holds every pip-installed
   user console script (pip, pytest, huggingface-cli, and so on), not
   just unsloth, so reordering it silently changed resolution order for
   unrelated tools. The new install.ps1 shim at PATH position 0 already
   guarantees `unsloth` resolves to the freshly installed copy, so the
   Python user Scripts entry only needs to be present, not at the front.

3. The sysconfig lookup in studio/setup.ps1 no longer falls back to
   sysconfig.get_path('scripts') when the nt_user scheme dir does not
   exist. When setup.ps1 is invoked from an activated venv (a flow the
   linked issue actually hits) that fallback returns the venv's Scripts
   directory, which would then be added to the persisted User PATH and
   re-introduce the python / pip hijack the shim dir is meant to avoid.
   Stick strictly to the nt_user scheme; skip the block if it does not
   exist on disk.
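
The hardlink-with-copy-fallback strategy in point 1 amounts to the following sketch (Python for illustration; the real sequence is PowerShell in install.ps1):

```python
import os
import shutil

def install_shim(venv_exe: str, shim_exe: str) -> None:
    # Hard-link the venv's unsloth.exe into the shim dir so both names
    # point at the same file; fall back to a plain copy when hardlinks
    # are unsupported (different volume, FAT32, restrictive policy).
    if os.path.exists(shim_exe):
        os.remove(shim_exe)
    try:
        os.link(venv_exe, shim_exe)      # native .exe, no shell indirection
    except OSError:
        shutil.copy2(venv_exe, shim_exe) # degraded but functional fallback
```

Either way the shim directory ends up holding only a real executable, sidestepping the cmd.exe `%...%` mangling, the Git Bash `.cmd` lookup gap, and the ASCII-encoding issue the batch-file shim had.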

* Do not crash installer when unsloth.exe shim is locked

The shim update sequence at install.ps1:1095 did a bare Remove-Item /
New-Item HardLink / Copy-Item. Under the script's $ErrorActionPreference
a locked target (most commonly 'unsloth studio' still running while the
user re-invokes the installer) turns the Remove-Item failure into a
terminating error that aborts the install with no actionable message.

The existing shim is perfectly usable in that state, so there is no
reason to abort. Wrap the whole remove/link/copy sequence in a try/catch
that logs the probable cause (Studio still running), points at the fix
(close Studio and re-run), and lets the installer finish with the old
launcher still serving the command.

Also only emit the "added unsloth launcher to PATH" step line when the
launcher was actually (re)created AND the PATH entry was newly added --
previously the message fired even when the shim refresh silently failed,
which was confusing.

* Guard shim PATH entry on existence, use NullString for broadcast delete

Two follow-ups surfaced by the latest review pass:

1. Do not add the shim directory to User PATH when the launcher was not
   actually created. Antivirus blocking unsloth.exe, a disk-full volume,
   or restrictive filesystem permissions can make both the hardlink and
   the copy fallback fail on a fresh install. In that case the existing
   sequence would report "added unsloth launcher to PATH" warnings but
   still prepend the empty $ShimDir to User PATH -- the user sees an
   install that claims success but then cannot resolve `unsloth` in a
   new shell. Gate Add-ToUserPath on Test-Path $ShimExe so the PATH
   entry is only persisted when the launcher is really there.

2. Pass [NullString]::Value instead of $null to the broadcast-delete
   call in Add-ToUserPath. On PowerShell 7.5 and later (running on .NET
   9), a bare $null going into [Environment]::SetEnvironmentVariable
   can be coerced to an empty string rather than a true .NET null,
   which sets the dummy UnslothPathRefresh_XXXXXXXX variable to "" in
   HKCU\Environment instead of deleting it. The leaked variable is
   visible in System Properties and accumulates one entry per install
   run. [NullString]::Value is a PowerShell-specific sentinel that
   crosses the interop boundary as a real null and works on both PS 5.1
   and PS 7.x. See PowerShell/PowerShell#24637 for the underlying issue.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-16 04:49:51 -07:00
Roland Tannous
13928b5f0e
Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024)
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var

When set, UNSLOTH_PYTORCH_MIRROR overrides the default
https://download.pytorch.org/whl base URL in all four install scripts
(install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py).
When unset or empty, the official URL is used. This lets users behind
corporate proxies or in regions with poor connectivity to pytorch.org
point at a local mirror without patching scripts.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py

Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back
to the official URL when unset or empty, and preserves the value as-is
(including trailing slashes).

* Remove stale test assertions for missing install.sh messages

* Fix GPU mocking in test_get_torch_index_url.sh

Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside
get_torch_index_url so the GPU-presence checks work in tests.
Add -L flag handling to mock nvidia-smi so it passes the GPU listing
check. All 26 tests now pass on CPU-only machines.

* Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs
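
The resulting behavior (env var override, empty/unset fallback, trailing-slash strip) can be sketched as below; the real helper lives in install_python_stack.py, and the function name here is illustrative:

```python
import os

_OFFICIAL_WHL_BASE = "https://download.pytorch.org/whl"

def pytorch_whl_base() -> str:
    # Honor UNSLOTH_PYTORCH_MIRROR when set and non-empty; strip a
    # trailing slash to avoid double-slash URLs when paths are joined;
    # otherwise fall back to the official PyTorch wheel index.
    mirror = os.environ.get("UNSLOTH_PYTORCH_MIRROR", "").strip()
    return mirror.rstrip("/") if mirror else _OFFICIAL_WHL_BASE
```
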

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 11:39:11 +04:00
Roland Tannous
f801e59c29
split venv_t5 into tiered 5.3.0/5.5.0 and fix trust_remote_code (#4878)
* split venv_t5 into venv_t5_530 and venv_t5_550 for tiered transformers 5.x support

* fix bfloat16 crash on T4 for FORCE_FLOAT32 models and disable trust_remote_code auto-enable for native t5 models

* revert FORCE_FLOAT32 dtype change

* restrict trust_remote_code auto-enable to Nemotron models only

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use config.json model_type for tier detection, add unsloth/nvidia namespace guard

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit fb43d468e2.

* Revert "use config.json model_type for tier detection, add unsloth/nvidia namespace guard"

This reverts commit fc49ae2453.

* add unsloth/nvidia namespace guard to Nemotron trust_remote_code auto-enable

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reorder tier checks: all substring matches before config.json fetches

* extract shared activate_transformers_for_subprocess into transformers_version.py

* narrow Nemotron trust_remote_code to nemotron_h/nemotron-3-nano, add to export worker

* clean venv_t5 dirs before re-install in setup.sh, clarify version alias comment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* run venv_t5 migration outside deps fast-path gate in both setup scripts

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-07 20:05:01 +04:00
DoubleMathew
ac562bac66
Fix/llama.cppbuilding (#4804)
* Simplify llama.cpp install logic

* print release tag

* Retry failed json decode

* don't pull all ggml releases

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove test file changes from main PR

Test changes for test_pr4562_bugfixes.py will be submitted in a separate PR to keep this PR focused on the install path simplification.

* Fix setup.sh executable bit and direct tag lookup for pinned releases

- Restore setup.sh file mode to 100755 (was accidentally changed to 100644)
- Add direct GitHub API tag lookup in iter_release_payloads_by_time for
  non-latest requested tags (e.g. b7879) instead of relying on paginated
  release scans that may miss older releases beyond the 5-page limit
- Update stale DEFAULT_PUBLISHED_REPO comment to match new value

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix force-compile default ref and remove dead code in setup.ps1

- Change FORCE_COMPILE_DEFAULT_REF from "main" to "master" in all three
  files (install_llama_prebuilt.py, setup.sh, setup.ps1) since
  ggml-org/llama.cpp uses "master" as its default branch, not "main".
  Using "main" would cause git clone --branch to fail when
  UNSLOTH_LLAMA_FORCE_COMPILE=1 with UNSLOTH_LLAMA_TAG=latest.
- Remove dead if ($SkipPrebuiltInstall) block inside the else branch of
  setup.ps1 that could never be reached (the outer elseif already
  handles $SkipPrebuiltInstall=true).
- Maintain setup.sh executable bit (100755).

* Improve iter_release_payloads_by_time error handling for direct tag lookup

When a pinned release tag is not found (HTTP 404), fall through to the
paginated release scan instead of silently returning empty results.
Non-404 errors (network failures, rate limits) are propagated to the
caller so users get actionable error messages.
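
The lookup order described above (direct tag endpoint first, paginated scan only on 404, everything else propagated) can be sketched as follows; `fetch` and `scan` are injected here for illustration, while the real module talks to the GitHub releases API directly:

```python
import urllib.error

def release_payloads(repo: str, tag: str, fetch, scan):
    # Try the direct releases/tags/{tag} endpoint first so pinned tags
    # beyond the 5-page scan limit are still found.
    try:
        return [fetch(f"repos/{repo}/releases/tags/{tag}")]
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise          # network failures / rate limits stay actionable
    # Tag not found: fall through to the bounded paginated scan.
    return scan(repo)
```
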

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-03 00:34:20 -07:00
Daniel Han
934478ae31
fix(studio): revert llama.cpp default tag to latest (#4797)
* fix(studio): revert llama.cpp default tag to latest

The latest ggml-org/llama.cpp release (b8637) now includes Gemma 4
support. Revert the temporary "b8637" pin from #4796 to "latest" so
the prebuilt resolver always picks the newest release automatically
without needing manual tag bumps.

* docs: add comment explaining latest vs master for llama.cpp tag

Document in all three files why "latest" is preferred over "master"
and when "master" should be used as a temporary override.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-02 11:52:37 -07:00
Daniel Han
8d1712b4ea
fix(studio): pin llama.cpp to b8637 release (Gemma 4 support) (#4796)
ggml-org/llama.cpp b8637 includes Gemma 4 support (ggml-org/llama.cpp#21309).
Revert the temporary "master" default back to a pinned release tag.

This eliminates the HTTP 422 errors from the prebuilt resolver (which
could not find a release matching "master"), avoids unnecessary source
builds, and restores prebuilt binary downloads on all platforms.

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-02 11:43:53 -07:00
DoubleMathew
7ae9b7f45f
fix windows llama.cpp compile from source issue (#4793)
* fix windows llama.cpp compile from source issue

* undo local repo usage

* fix llama.cpp install

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix windows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: route resolve-source-build call through Invoke-LlamaHelper

The --resolve-source-build call at the source-build resolution path
was still calling install_llama_prebuilt.py directly instead of going
through Invoke-LlamaHelper. On PS7+ with ErrorActionPreference=Stop,
stderr from the 422 response (when tag is "master") would trigger a
terminating NativeCommandError and crash setup.

* fix: suppress stderr error records from Invoke-LlamaHelper

ErrorActionPreference=Continue prevents termination but PowerShell
still displays stderr lines as visible ErrorRecord objects. Capture
all output via 2>&1 and split stdout from stderr manually so that
stderr lines never appear on the console. When StderrPath is given
the stderr content is written to that file for diagnostics.

* fix: always rebuild llama.cpp on Windows when tag is master

When the requested llama.cpp tag is "master" (a moving target), skip
the "already built" early exit so the build path runs and syncs to
the latest commit. Without this, existing llama-server binaries from
an older build (e.g. b8635 which lacks Gemma 4 support) are reused
and model loading fails.

Pinned tags (e.g. b8635) still skip the rebuild when the binary
already exists, since the tag is immutable.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-02 11:43:46 -07:00
Daniel Han
1ce83c40aa
fix(studio): build llama.cpp from master instead of latest release tag (#4790)
The latest ggml-org/llama.cpp release (b8635) does not include Gemma 4
support (ggml-org/llama.cpp#21309 merged after the release was cut).
This causes `llama-server` to fail with "unknown model architecture:
gemma4" when loading Gemma 4 GGUFs.

Temporarily default _DEFAULT_LLAMA_TAG to "master" so all new installs
build from the llama.cpp master branch which includes Gemma 4 support.
Once a new upstream release is cut with Gemma 4, this can be reverted
back to "latest".

Changes:
- setup.sh: add _DEFAULT_LLAMA_TAG="master" maintainer default
- setup.ps1: add $DefaultLlamaTag="master" maintainer default
- install_llama_prebuilt.py: change DEFAULT_LLAMA_TAG fallback to "master"

Users can still override via UNSLOTH_LLAMA_TAG env var.
2026-04-02 09:45:56 -07:00
Daniel Han
a241c58d84
Use transformers v5.5-release branch and pin to 5.5.0 (#4786)
The v5.5-release branch now exists on huggingface/transformers.
Use transformers==5.5.0 for all install paths and
git+transformers.git@v5.5-release for the MLX installer.

Also bumps huggingface_hub from 1.7.1 to 1.8.0 in setup.sh and
setup.ps1 to stay consistent.
2026-04-02 09:10:02 -07:00
Daniel Han
a353557249
Force llama.cpp to always use mainline ggml-org (#4785)
Hardcode the release repo to ggml-org/llama.cpp and remove the
UNSLOTH_LLAMA_RELEASE_REPO and UNSLOTH_LLAMA_SOURCE env var overrides
so that all users always build/download from mainline llama.cpp.
2026-04-02 09:03:00 -07:00
DoubleMathew
1ce8a8e7cd
Feat/custom llama prebuilt (#4771)
* update logic to incorporate custom prebuilt installs

* bug fixes

* update for review comments

* fix tags

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate test changes from main PR

Move test file changes out of this PR to keep the diff focused on
the install_llama_prebuilt.py and setup script changes. Test updates
will be submitted in a follow-up PR.

* Fix branch ref normalization and harden JSON parsing

- Add checkout_friendly_ref() to strip refs/heads/ prefix from branch
  refs before emitting them in SourceBuildPlan. git clone --branch does
  not accept fully qualified refs like refs/heads/main.
- Apply normalization in source_build_plan_for_release() and the
  direct-ref fallback in resolve_source_build_plan().
- Allow validated_checksums_for_bundle() to accept releases that carry
  only an exact-commit source archive without the legacy upstream-tag
  source tarball.
- Add 2>/dev/null || true guards to all inline python -c JSON parsing
  in setup.sh so a malformed payload does not abort the script under
  set -e.

* Fix Windows CUDA asset ordering and tag ref normalization

- Reorder windows_cuda_upstream_asset_names to prefer the main binary
  archive (llama-{tag}-bin-win-cuda-*) over the cudart sidecar archive
  (cudart-llama-bin-win-cuda-*). The cudart ZIP only contains CUDA
  runtime DLLs, not llama-server or llama-quantize binaries.
- Extend checkout_friendly_ref to also strip refs/tags/ prefix for tag
  refs, matching the refs/heads/ handling for branch refs.

* Simplify JSON parsing consistency in setup.sh

Use json.load(sys.stdin) consistently for all inline JSON parsing
in setup.sh, instead of the more complex json.loads(raw) pattern
on the install-tag resolution path. The 2>/dev/null || true guard
already handles empty/malformed input gracefully.

* Fix source build plan fallback for commit ref kind in PR #4771

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <daniel@unsloth.ai>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-02 04:52:26 -07:00
DoubleMathew
428efc7d95
Resolve latest usable published llama.cpp release instead of fixed pinned tag (#4741)
Replaces the fixed prebuilt llama.cpp tag with dynamic published-release
resolution, adds bounded fallback across older published releases, and
introduces maintainer-editable defaults for PR/source overrides.

Changes:
- Resolve latest from the latest usable published release in unslothai/llama.cpp
- Use the selected release upstream_tag as the authoritative llama.cpp version
- Prefer Unsloth-published platform assets when available
- Fall back to same-tag upstream ggml-org/llama.cpp assets where allowed
- Keep Linux CUDA anchored to Unsloth-published CUDA bundles only
- Add bounded fallback across older Unsloth published releases
- Add separate busy/in-use install handling (exit code 3)
- Skip reinstall when the installed bundle already matches the selected candidate
- Add maintainer-editable _DEFAULT_LLAMA_PR_FORCE and _DEFAULT_LLAMA_SOURCE
- Harden env parsing so malformed installer env vars do not crash import-time fallback logic
- Honor UNSLOTH_LLAMA_RELEASE_TAG in all resolve steps
- Always sync git remote URL in existing-checkout path
2026-04-01 06:06:17 -07:00
Lee Jackson
2cac3e8e4d
studio: Polish Windows installer/setup logs (#4736)
* style(windows): clean installer/setup log output and remove seeded credential banner

* Keep startup credential hint without exposing plaintext password

Print the username and .bootstrap_password file path on first-run
admin creation instead of the raw password. Headless / Docker / SSH
operators still get a startup-time hint for initial sign-in, and the
plaintext credential no longer appears in terminal output or logs.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-03-31 23:12:42 -07:00
Etherll
34272a796f
Fix/bun windows bin detection (#4703)
* fix(studio): detect bun .exe shims in Windows binary check

* Update setup.sh

* add .bunx checking
2026-03-30 21:58:33 +04:00
Daniel Han
6d83ad9a28
fix(studio): avoid UnicodeEncodeError on Windows cp1252 consoles (#4699)
* fix(studio): replace unicode emoji in print() to avoid cp1252 crash on Windows

On Windows the default console encoding is cp1252, which cannot encode
unicode emoji like U+2705 or U+26A0. Bare print() calls with these
characters cause a UnicodeEncodeError at runtime.

- run.py: replace emoji with ASCII status prefixes [OK] and [WARNING]
- format_conversion.py: remove duplicate print() that mirrors the
  logger.info() call on the next line, and drop the emoji from the
  log message since loggers handle encoding separately

* fix(studio): apply same emoji/print cleanup to parallel VLM conversion path

The parallel URL-based conversion logic has the same duplicate print()
with emoji that was fixed in the sequential path. Remove the bare
print() and drop the emoji from the logger.info() call.

* Treat install_python_stack.py failure as fatal in setup.ps1

On Linux/Mac, setup.sh runs under set -euo pipefail so a non-zero
exit from install_python_stack.py aborts the installer. On Windows,
setup.ps1 had no exit code check -- if the Python script crashed
(eg from the cp1252 UnicodeEncodeError), the installer silently
continued past the dependency loop and reported success. Studio
would then fail at launch with ModuleNotFoundError for structlog,
fastapi, and other deps that were never installed.

Capture $LASTEXITCODE and exit 1 if the dependency installer fails,
matching the error handling pattern already used for PyTorch install.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 06:40:47 -07:00
Lee Jackson
5557e1fd27
studio: unify Windows installer/setup logging style, verbosity controls, and startup messaging (#4651)
* refactor(studio): unify setup terminal output style and add verbose setup mode

* studio(windows): align setup.ps1 banner/steps with setup.sh (ANSI, verbose)

* studio(setup): revert nvcc path reordering to match main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio(setup): restore fail-fast llama.cpp setup flow

* studio(banner): use IPv6 loopback URL when binding :: or ::1

* Fix IPv6 URL bracketing, try_quiet stderr, _step label clamp

- Bracket IPv6 display_host in external_url to produce clickable URLs
- Redirect try_quiet failure log to stderr instead of stdout
- Clamp _step label to column width to prevent negative padding

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add sandbox integration tests for PR #4494 UX fixes

Simulation harness (tests/simulate_pr4494.py) creates an isolated uv
venv, copies the real source files into it, and runs subprocess tests
for all three fixes with visual before/after demos and edge cases.

Standalone bash test (tests/test_try_quiet.sh) validates try_quiet
stderr redirect across 8 scenarios including broken-version contrast.

39 integration tests total (14 IPv6 + 15 try_quiet + 10 _step), all
existing 75 unit tests still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Truncate step() labels in setup.sh to match PS1 and Python

The %-15s printf format pads short labels but does not truncate long
ones. Change to %-15.15s so labels wider than 15 chars are clipped,
matching the PowerShell .Substring(0,15) and Python label[:15] logic.

* Remove sandbox integration tests from PR

These test files are not part of the styling fix and should not
ship with this PR.

* Show error output on failure instead of suppressing it

- install_python_stack.py: restore _red for patch_package_file
  warnings (was downgraded to _dim)
- setup.ps1: capture winget output and show on failure for CUDA,
  Node, Python, and OpenSSL installs (was piped to Out-Null)
- setup.ps1: always show git pull failure warning, not just in
  verbose mode

* Show winget error output for Git and CMake installs on failure

Same capture-and-print-on-failure pattern already used for
Node, Python, CUDA, and OpenSSL winget installs.

* fix: preserve stderr for _run_quiet error messages in setup.sh

The step() helper writes to stdout, but _run_quiet's error header
was originally sent to stderr (>&2). Without the redirect, callers
that separate stdout/stderr would miss the failure headline while
still seeing the log body on stderr. Add >&2 to both step calls
inside _run_quiet to match main's behavior.

* feat: add --verbose flag to setup and update commands

Wire UNSLOTH_VERBOSE=1 through _run_setup_script() so that
'unsloth studio update --verbose' (and the deprecated 'setup')
passes the flag to setup.sh / setup.ps1 / install_python_stack.py.

* fix(studio): honor verbose logging and keep llama.cpp failures non-blocking

* fix(studio): switch installer to 'studio update' and normalize Windows setup logs

* chore(studio): refine localhost tip and remove skip-base setup noise

* fix(studio): align Windows setup logs with Linux style and improve startup tips

* fix(studio): align Windows setup logs with Linux style

* refactor(windows-installer): align install/setup logs with Linux style and silence auto-launch output

* refactor(windows): align installer/setup output with Linux style and reduce default verbosity

* refactor(windows): match install.ps1 output style/colors to setup and quiet default logs

* fix(studio-banner): update personal-computer localhost tip

* fix(setup.sh): restore verbose llama.cpp build output while keeping default quiet mode

* fix(install.sh): align installer logging with setup style and restore POSIX-safe color output

* fix(install.sh): preserve installer reliability and launch visibility

Export verbose mode for child setup processes, harden install command handling under set -e, and keep first-run studio launch non-silent so users can always see URL and port fallback output.

* fix(windows installer): keep exit semantics and degrade status accurate

Use quiet command redirection that preserves native exit codes, keep startup output visible on first launch, and report limited install status when llama.cpp is unavailable.

* fix(setup.sh): improve log clarity and enforce GGUF degraded signaling

Restore clean default setup output, add verbose-only diagnostics, fail fast on Colab dependency install errors, and return non-zero when GGUF prerequisites or llama.cpp artifacts are unavailable.

* fix(installer): harden bash preflight and PowerShell GPU checks

Fail fast when bash is unavailable before invoking setup.sh, and replace remaining nvidia-smi pipeline checks with stream redirection patterns that preserve reliable native exit-code handling.

* fix(windows): keep verbose output visible while preserving exit codes

Ensure PowerShell wrapper helpers in install/update stream native command output to host without returning it as function output, so npm logs no longer corrupt exit-code checks in verbose mode.

* fix(windows): avoid sticky UNSLOTH_VERBOSE and gate studio update verbosity

* Fix degraded llama.cpp exit code, PS verbose stderr, banner URLs, npm verbose

- setup.sh: Do not exit non-zero when llama.cpp is unavailable; the footer
  already reports the limitation, and install.sh runs under set -e so a
  non-zero exit aborts the entire install including PATH/shortcuts/launch.
- setup.ps1: Remove $? check in Invoke-SetupCommand verbose path; PS 5.1
  sets $? = $false when native commands write to stderr even with exit 0.
  Merge stderr into stdout with 2>&1 and rely solely on $LASTEXITCODE.
- startup_banner.py: Show the actual bound address when Studio is bound to
  a non-loopback interface instead of always showing 127.0.0.1/localhost.
- setup.sh: Use run_quiet_no_exit instead of run_quiet_no_exit_always for
  npm install steps so --verbose correctly surfaces npm output.

* Fix install.ps1 verbose stderr, propagate UNSLOTH_VERBOSE, fix git clone verbose

- install.ps1: Apply same Invoke-InstallCommand fix as setup.ps1 -- merge
  stderr into stdout with 2>&1 and drop the $? check that misclassifies
  successful native commands on PS 5.1.
- install.ps1 + setup.ps1: Export UNSLOTH_VERBOSE=1 to the process env
  when --verbose is passed so child processes like install_python_stack.py
  also run in verbose mode.
- setup.sh: Use run_quiet_no_exit for git clone llama.cpp so --verbose
  correctly surfaces clone diagnostics during source-build fallback.

* Surface prebuilt llama.cpp output in verbose mode, remove dead code, fix banner

- setup.sh: Use tee in verbose mode for prebuilt llama.cpp installer so
  users can see download/validation progress while still capturing the log
  for structured error reporting on failure.
- setup.ps1: Same fix for Windows -- use Tee-Object in verbose mode.
- setup.sh: Remove run_quiet_no_exit_always() which has no remaining callers.
- startup_banner.py: Avoid printing the same URL twice when Studio is
  bound to a specific non-loopback address that matches the display host.

* Fix run_install_cmd exit code after failed if-statement

The previous pattern 'if "$@"; then return 0; fi; _rc=$?' always captured
$? = 0 because $? reflects the if-statement result, not the command's exit
code. Switch to '"$@" && return 0; _rc=$?' which preserves the actual
command exit code on failure. Applies to both verbose and quiet branches.
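The pitfall this commit fixes is easy to reproduce. A minimal sketch (helper names are hypothetical, not the real run_install_cmd):

```shell
#!/bin/sh
# Buggy pattern: after `if cmd; then ...; fi` with no else, $? reflects
# the if-statement itself (0 when the condition was false), not cmd.
buggy_run() {
    if "$@"; then return 0; fi
    _rc=$?               # always 0 here
    return "$_rc"
}

# Fixed pattern: the && list short-circuits, so $? still holds the
# failed command's own exit code.
fixed_run() {
    "$@" && return 0
    _rc=$?               # real exit code of the failed command
    return "$_rc"
}

fail_with_3() { return 3; }

buggy_run fail_with_3; echo "buggy: $?"   # prints: buggy: 0
fixed_run fail_with_3; echo "fixed: $?"   # prints: fixed: 3
```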

* Fix _run_quiet exit code, double uv install, missing --local flag

- setup.sh: Fix _run_quiet verbose path that always captured exit code 0
  due to $? resetting after if-then-fi with no else. Switch to the same
  '"$@" && return 0; exit_code=$?' pattern used in install.sh.
- setup.sh: Consolidate the two uv install branches (verbose + quiet)
  into a single attempt with conditional output. Previously, when verbose
  mode was on and the install failed, a second silent attempt was made.
- install.ps1: Pass --local flag to 'unsloth studio update' when
  $StudioLocalInstall is true. Without this, studio.py's update() command
  overwrites STUDIO_LOCAL_INSTALL to "0", which could cause issues if
  setup.ps1 or install_python_stack.py later checks that variable.

* Revert SKIP_STUDIO_BASE change for --no-torch, restore install banners

- Revert SKIP_STUDIO_BASE from 0 to 1 for --no-torch. install.sh already
  installs unsloth+unsloth-zoo and no-torch-runtime.txt before calling
  setup.sh, so letting install_python_stack.py redo it was redundant and
  slowed down --no-torch installs for no benefit.
- Restore the "Unsloth Studio installed!" success banner and "starting
  Unsloth Studio..." launch message so users get clear install completion
  feedback before the server starts.

* Make llama.cpp build failure a hard error with proper cleanup

- setup.sh: Restore exit 1 when _LLAMA_CPP_DEGRADED is true. GGUF
  inference requires a working llama.cpp build, so this should be a
  hard failure, not a silent degradation.
- install.sh: Catch setup.sh's non-zero exit with '|| _SETUP_EXIT=$?'
  instead of letting set -e abort immediately. This ensures PATH setup,
  symlinks, and shortcuts still get created so the user can fix the
  build deps and retry with 'unsloth studio update'. After post-install
  steps, propagate the failure with a clear error message.

* Revert install.ps1 to 'studio setup' to preserve SKIP_STUDIO_BASE

'studio update' pops SKIP_STUDIO_BASE from the environment, which
defeats the fast-path version check added in PR #4667. When called
from install.ps1 (which already installed packages), SKIP_STUDIO_BASE=1
must survive into setup.ps1 so it skips the redundant PyPI check and
package reinstallation. 'studio setup' does not modify env vars.

* Remove deprecation message from 'studio setup' command

install.ps1 uses 'studio setup' (not 'studio update') to preserve
SKIP_STUDIO_BASE. The deprecation message was confusing during first
install since the user never typed the command.

* Fix stale env vars, scope degraded exit, generic error message for PR #4651

- install.ps1: Always set STUDIO_LOCAL_INSTALL and clear STUDIO_LOCAL_REPO
  when not using --local, to prevent stale values from a previous --local
  run in the same PowerShell session. Fix log messages to say 'setup' not
  'update' since we call 'studio setup'.
- setup.sh: Only exit non-zero for degraded llama.cpp when called from the
  installer (SKIP_STUDIO_BASE=1). Direct 'unsloth studio update' keeps
  degraded installs successful since Studio is still usable for non-GGUF
  workflows and the footer already reports the limitation.
- install.sh: Make the setup failure error message generic instead of
  GGUF-specific, so unrelated failures (npm, Python deps) do not show
  misleading cmake/git recovery advice.

* Show captured output on failure in quiet mode for PR #4651

Both Invoke-InstallCommand (install.ps1) and Invoke-SetupCommand
(setup.ps1) now capture command output in quiet mode and display it
in red when the command fails. This matches the behavior of
run_install_cmd in install.sh where failure output is surfaced even
in quiet mode, making cross-platform error debugging consistent.

* Match degraded llama.cpp exit on Windows, fix --local recovery hint for PR #4651

- setup.ps1: Exit non-zero for degraded llama.cpp when called from
  install.ps1 (SKIP_STUDIO_BASE=1), matching setup.sh behavior. Direct
  'unsloth studio update' keeps degraded installs successful.
- install.sh: Show 'unsloth studio update --local' in the recovery
  message when the install was run with --local, so users retry with
  the correct flag instead of losing local checkout context.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-30 00:53:23 -07:00
Roland Tannous
5bbfabb151
fix: [Studio] setup.ps1 update-flow for windows (#4667)
* fix: add PyPI version check to setup.ps1 for fast update path

Port the update-flow logic from setup.sh to setup.ps1 so that
`unsloth studio update` on Windows skips Python dependency reinstall
when the installed version already matches PyPI latest.

* fix: clear SKIP_STUDIO_BASE in update command

install.ps1 sets SKIP_STUDIO_BASE=1 which persists in the PowerShell
session. If the user runs `unsloth studio update` in the same terminal,
the env var causes the version check to be skipped. Clear it explicitly
in the update command.

* fix: harden version check and clear stale env vars in update flow

- Normalize $InstalledVer with Out-String + Trim() to avoid array/whitespace
  comparison issues in PowerShell 5.1 (python output can be captured as
  string[] instead of scalar string)
- Move Fast-Install --upgrade pip inside if (-not $SkipPythonDeps) so the
  fast path avoids unnecessary network round-trips
- Clear STUDIO_LOCAL_REPO when --local is not passed to prevent a previous
  --local session from leaking into a plain update

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-29 21:14:36 -07:00
Lee Jackson
0233fe7f9c
studio: setup log styling (#4494)
* refactor(studio): unify setup terminal output style and add verbose setup mode

* studio(windows): align setup.ps1 banner/steps with setup.sh (ANSI, verbose)

* studio(setup): revert nvcc path reordering to match main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio(setup): restore fail-fast llama.cpp setup flow

* studio(banner): use IPv6 loopback URL when binding :: or ::1

* Fix IPv6 URL bracketing, try_quiet stderr, _step label clamp

- Bracket IPv6 display_host in external_url to produce clickable URLs
- Redirect try_quiet failure log to stderr instead of stdout
- Clamp _step label to column width to prevent negative padding

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add sandbox integration tests for PR #4494 UX fixes

Simulation harness (tests/simulate_pr4494.py) creates an isolated uv
venv, copies the real source files into it, and runs subprocess tests
for all three fixes with visual before/after demos and edge cases.

Standalone bash test (tests/test_try_quiet.sh) validates try_quiet
stderr redirect across 8 scenarios including broken-version contrast.

39 integration tests total (14 IPv6 + 15 try_quiet + 10 _step), all
existing 75 unit tests still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Truncate step() labels in setup.sh to match PS1 and Python

The %-15s printf format pads short labels but does not truncate long
ones.  Change to %-15.15s so labels wider than 15 chars are clipped,
matching the PowerShell .Substring(0,15) and Python label[:15] logic.
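The padding-vs-truncation difference is visible directly in printf (labels here are made up for illustration):

```shell
# %-15s pads short labels to the column width but lets long ones
# overflow; %-15.15s both pads and clips to exactly 15 characters.
printf '[%-15s]\n'    "CUDA"                      # [CUDA           ]
printf '[%-15s]\n'    "llama.cpp prebuilt setup"  # overflows the column
printf '[%-15.15s]\n' "llama.cpp prebuilt setup"  # [llama.cpp prebu]
```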

* Remove sandbox integration tests from PR

These test files are not part of the styling fix and should not
ship with this PR.

* Show error output on failure instead of suppressing it

- install_python_stack.py: restore _red for patch_package_file
  warnings (was downgraded to _dim)
- setup.ps1: capture winget output and show on failure for CUDA,
  Node, Python, and OpenSSL installs (was piped to Out-Null)
- setup.ps1: always show git pull failure warning, not just in
  verbose mode

* Show winget error output for Git and CMake installs on failure

Same capture-and-print-on-failure pattern already used for
Node, Python, CUDA, and OpenSSL winget installs.

* fix: preserve stderr for _run_quiet error messages in setup.sh

The step() helper writes to stdout, but _run_quiet's error header
was originally sent to stderr (>&2). Without the redirect, callers
that separate stdout/stderr would miss the failure headline while
still seeing the log body on stderr. Add >&2 to both step calls
inside _run_quiet to match main's behavior.

* feat: add --verbose flag to setup and update commands

Wire UNSLOTH_VERBOSE=1 through _run_setup_script() so that
'unsloth studio update --verbose' (and the deprecated 'setup')
passes the flag to setup.sh / setup.ps1 / install_python_stack.py.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-27 03:12:48 -07:00
Daniel Han
23eb7fc0a7
Fix Colab Studio launch and setup.ps1 box alignment (#4601)
* Fix Colab Studio launch and setup.ps1 box alignment

- colab.py: when the Studio venv is missing on Colab, pip-install
  backend dependencies (structlog, fastapi, etc.) from studio.txt
  into the current Python instead of failing with ModuleNotFoundError
- setup.sh: on Colab without a venv, install backend deps into system
  Python and skip venv-dependent sections (Python stack update,
  llama.cpp build) that would otherwise fail
- setup.ps1: use PadRight(47) for the done-line so "Setup Complete!"
  and "Update Complete!" both align with the box border

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-25 09:00:08 -07:00
Daniel Han
366fb048d4
fix(studio): add bun cache validation to Windows setup.ps1 (#4596)
Port the bun cache corruption fix from setup.sh to setup.ps1.

bun's package cache can become corrupt, storing only package metadata
without actual content. This causes bun install to exit 0 but leave
binaries like tsc missing from node_modules/.bin/.

Changes:
- After bun install, verify tsc and vite exist in node_modules\.bin\
- Check for both bare names and .cmd wrappers (Windows creates both)
- If missing, clear the bun cache and retry once
- Only fall back to npm if the retry also fails
2026-03-25 07:27:08 -07:00
Daniel Han
3efea63e2f
fix(studio): source-build fallback prefers Unsloth's tested tag over upstream latest (#4593)
* fix(studio): source-build fallback prefers Unsloth's tested tag over upstream latest

When the prebuilt install fails and falls back to source build,
--resolve-llama-tag now queries the Unsloth release repo
(unslothai/llama.cpp) first to get the latest tested/approved tag
(e.g. b8508), instead of going straight to ggml-org/llama.cpp which
may return a newer untested tag (e.g. b8514).

This ensures the source-build fallback compiles the same version that
the prebuilt path would have installed, rather than a potentially
incompatible bleeding-edge release.

Resolution order for "latest":
  1. Unsloth release repo (tested/approved)
  2. ggml-org upstream (bleeding-edge)
  3. Raw requested tag string (last resort)

Changes:
- resolve_requested_llama_tag() accepts optional published_repo param
  with docstring explaining the resolution order
- CLI --resolve-llama-tag passes --published-repo through
- setup.sh and setup.ps1 pass --published-repo to --resolve-llama-tag
  with inline comments explaining the preference

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-25 07:25:47 -07:00
DoubleMathew
f4d8a246bf
Use prebuilt llama.cpp for unsloth studio setup (#4562)
* Use prebuilt llama.cpp for unsloth studio setup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix 3 issues that cause unnecessary fallback to source build

1. Make filelock import optional -- environments without filelock
   (e.g. minimal installs) crashed at import time instead of
   gracefully skipping the lock.

2. Use already-verified converter script from the hydrated source
   tree instead of re-downloading from raw.githubusercontent.com
   with no checksum. Adds symlink with copy fallback for the
   legacy filename.

3. Initialize $SkipPrebuiltInstall in setup.ps1 before first use
   to prevent potential uninitialized variable errors.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Keep network fallback in ensure_converter_scripts

Prefer the local verified copy from the hydrated source tree, but
retain the original network download as a fallback if the file is
missing. Create the legacy hyphenated filename as a symlink with a
copy fallback instead of writing a second full copy.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix 4 bugs in source-build fallback and binary_env paths

- setup.ps1: Replace git pull + checkout FETCH_HEAD with fetch + checkout -B
  to avoid detached HEAD state that breaks re-runs. Use pinned tag in both
  fetch and clone paths.
- setup.sh: Move rm -rf after cmake/git prerequisite checks so a missing
  tool no longer deletes the existing install. Add --branch tag to clone.
- install_llama_prebuilt.py: Add binary_path.parent to Linux LD_LIBRARY_PATH
  in binary_env() so bundled .so files in build/bin are found even without
  RPATH, matching the existing Windows PATH logic.
- Add test for binary_env LD_LIBRARY_PATH on Linux.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Handle unresolved "latest" tag in source-build fallback clone

When tag resolution fails and the requested tag is "latest", both
setup scripts now omit --branch from git clone so the default branch
is cloned instead of failing on a nonexistent "latest" branch/tag.
Similarly, the PS1 fetch path fetches the default ref when the tag
is "latest".

* Resolve actual latest ggml-org tag instead of using literal "latest"

When both Python tag resolution attempts fail and the requested tag
is "latest", query the GitHub API for the actual latest release tag
from ggml-org/llama.cpp (e.g. b8508) instead of passing the literal
string "latest" to git clone --branch, which would fail since no
such branch/tag exists.

setup.sh uses curl + python json parsing; setup.ps1 uses
Invoke-RestMethod. Both fall back to the raw requested tag if the
API call also fails.
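The curl-plus-Python parsing half of that fallback chain can be sketched like this. `parse_latest_tag` is an assumed helper shape, not the literal setup.sh code; it returns nothing on malformed or empty input so the caller can move on to the next source.

```shell
# Extract tag_name from a GitHub /releases/latest JSON body; print
# nothing on any parse failure so callers can detect it with [ -n ... ].
parse_latest_tag() {
    python3 -c '
import json, sys
try:
    tag = json.load(sys.stdin).get("tag_name", "")
except Exception:
    tag = ""
if tag:
    print(tag)
'
}

# Caller side (network calls commented out): Unsloth repo first, then
# ggml-org upstream, then the raw requested string as a last resort.
# tag=$(curl -fsSL https://api.github.com/repos/unslothai/llama.cpp/releases/latest | parse_latest_tag)
# [ -n "$tag" ] || tag=$(curl -fsSL https://api.github.com/repos/ggml-org/llama.cpp/releases/latest | parse_latest_tag)
# [ -n "$tag" ] || tag="latest"
```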

* Try Unsloth release repo before ggml-org when resolving latest tag

When falling back to the GitHub API to resolve "latest", query the
Unsloth release repo (unslothai/llama.cpp) first since it has the
prebuilt binaries pinned to tested tags. Only fall back to
ggml-org/llama.cpp if the Unsloth repo query fails.

* Add comprehensive sandbox tests for PR #4562 bug fixes

35 tests covering all fixes across platforms:
- binary_env cross-platform (Linux LD_LIBRARY_PATH, Windows PATH,
  macOS DYLD_LIBRARY_PATH) with edge cases (dedup, ordering, existing paths)
- resolve_requested_llama_tag (concrete, latest, None, empty)
- setup.sh logic via subprocess: prereq check ordering (cmake/git missing
  preserves install), pinned tag in clone, fetch+checkout -B pattern,
  fetch failure warns instead of aborting
- "latest" tag resolution fallback chain (Unsloth API -> ggml-org ->
  raw) with mock curl: success, failure, malformed JSON, empty body,
  empty tag_name, env overrides
- Source code pattern verification for both .sh and .ps1 files

All 138 tests pass in an isolated uv venv.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add binary_path.parent to macOS DYLD_LIBRARY_PATH in binary_env

macOS prebuilt .dylib files are overlaid into build/bin (same as
Linux), but binary_env only added install_dir to DYLD_LIBRARY_PATH.
Add binary_path.parent so the loader can find sibling dylibs even
without embedded loader paths.

Mirrors the existing fix for Linux LD_LIBRARY_PATH and the Windows
PATH pattern.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard --branch when resolved tag is "latest"; fix broken test assertion

When all API fallbacks fail and the tag stays as literal "latest",
omit --branch from git clone (clones default branch instead of
failing). Both setup.sh and setup.ps1 now check for "latest" before
passing --branch to git clone/fetch.

Also fix test_setup_ps1_clone_uses_branch_tag which used Python
tuple syntax (assert "x", "y" in z) that always passes. Changed to
assert "x" in z and "y" in z.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix macOS DYLD trailing colon, install_lock no-op, and debug log

- binary_env macOS: use dedupe_existing_dirs instead of raw string
  concatenation. Eliminates trailing colon in DYLD_LIBRARY_PATH
  (which causes dyld to search CWD for libraries) and deduplicates
  when binary_path.parent == install_dir. Now consistent with the
  Linux and Windows branches.
- install_lock: when filelock is not installed, use os.O_CREAT|O_EXCL
  as a fallback exclusive file lock with timeout, instead of yielding
  with no locking. Prevents concurrent installs from corrupting each
  other's staging directories.
- setup.ps1: remove [DEBUG] log line that printed to every user on
  every Windows setup run.
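The trailing-colon hazard (an empty entry makes the loader also search the CWD) and the dedup concern generalize to any colon-separated path variable. A minimal sketch with a hypothetical helper, not the actual dedupe_existing_dirs:

```shell
# Prepend a directory to a colon-separated path value, skipping it if
# already present and never emitting a trailing colon (empty entry).
prepend_path() {
    dir=$1 cur=$2
    case ":$cur:" in
        *":$dir:"*) printf '%s' "$cur" ;;           # already present: dedup
        *) if [ -n "$cur" ]; then printf '%s:%s' "$dir" "$cur"
           else printf '%s' "$dir"                   # no trailing colon
           fi ;;
    esac
}

prepend_path /opt/llama/bin "/usr/lib"   # -> /opt/llama/bin:/usr/lib
```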

* Add stale-lock detection and atomic clone-then-swap

install_lock fallback (no filelock): write PID to lock file and
check if the holder process is still alive on contention. Dead PIDs
(ProcessLookupError) and unreadable lock files trigger immediate
cleanup. Live processes owned by other users (PermissionError) are
correctly recognized as alive -- the lock is not removed.

setup.sh/setup.ps1 source-build: clone into a temporary directory
first, then swap into place only on success. If git clone fails,
the existing install is preserved instead of being deleted by the
premature rm -rf.

* Remove redundant upstream_tag != release_tag check

load_approved_release_checksums compared checksums.upstream_tag
against the Unsloth release_tag, which are different namespaces
(upstream ggml-org tag vs Unsloth published tag). This only worked
because both happened to be "b8508" by convention. Would break if
Unsloth ever uses a different release naming scheme.

The existing check at parse_approved_release_checksums (line 950)
already validates the release_tag field correctly.

* Fix lock TOCTOU race and build-in-temp-dir swap

install_lock fallback: add os.fsync(fd) after writing PID to ensure
the PID is visible to racing processes before they check. Treat
empty lock files (PID not yet written) as "wait and retry" instead
of stale, closing the window where two processes could both see an
empty file, both unlink it, and both acquire the lock.

setup.sh/setup.ps1 source-build: clone AND build in a temp directory
(LLAMA_CPP_DIR.build.$$). Only swap into the final LLAMA_CPP_DIR
after the build succeeds. If clone or cmake or build fails, the temp
dir is cleaned up and the existing working install is preserved.
Previously, rm -rf ran after clone but before build, destroying the
existing install even if the build later failed.
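The build-in-temp-then-swap idea reduces to a small pattern. This is a hedged sketch: the helper name and argument shape are illustrative, not the actual setup.sh code, where the `"$@"` slot would be the git clone + cmake build.

```shell
# Run a build command against a unique temp dir; only replace the
# existing install after the command succeeds, so a failed clone or
# build leaves the previous working install untouched.
swap_in_build() {
    target=$1; shift
    tmp="$target.build.$$"          # unique temp dir per process
    mkdir -p "$tmp"
    if "$@" "$tmp"; then            # e.g. clone + cmake build into $tmp
        rm -rf "$target"            # remove old install only on success
        mv "$tmp" "$target"
    else
        rm -rf "$tmp"               # failure: existing install survives
        return 1
    fi
}
```

Compare with the premature `rm -rf` the commit describes: deleting `$target` before the build finishes destroys the working install even when the build later fails.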

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 05:42:43 -07:00
Roland Tannous
19e9c60a8e
Consolidate dual venvs and separate install from update (#4530)
* refactor: consolidate dual venvs into single ~/.unsloth/studio/unsloth_studio

* refactor: separate install.sh (first-time) from setup.sh (smart update with PyPI version check)

* fix: install.sh calls setup.sh directly, keep both setup and update CLI commands

* fix: use importlib.resources.files() directly without _path attribute

* fix: bootstrap uv before pip upgrade to handle uv venvs without pip

* fix: frontend 404 when launched via CLI, add global symlink to ~/.local/bin

* feat: add --local flag to install.sh and unsloth studio update for branch testing

* fix: resolve repo root from script location for --local installs

* feat: add --package flag to install.sh for testing with custom package names

* feat: add --package flag to unsloth studio update

* fix: always nuke venv in install.sh for clean installs

* revert: remove Windows changes, will handle in separate PR

* fix: error when --package is passed without an argument

* revert: restore Windows scripts to current main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: always explicitly set STUDIO_LOCAL_INSTALL and STUDIO_PACKAGE_NAME env vars

* fix: pass explicit STUDIO_LOCAL_REPO env var for --local installs

* fix: align banner box for Setup vs Update labels

* deprecate: hide 'unsloth studio setup' command, point users to update/install.sh

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: check stdout not stdin for auto-launch detection (curl pipe fix)

* fix: update install URL to unsloth.ai/install.sh

* fix: update install.sh usage comments to unsloth.ai/install.sh

* fix: use --upgrade-package for base deps to preserve existing torch/CUDA installs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: --local install now also installs unsloth-zoo via base.txt before editable overlay

* fix: don't skip base packages for --local installs (editable needs unsloth-zoo)

* refactor: move --local full dep install to install.sh, keep SKIP_STUDIO_BASE for all paths

* feat: add migration support for old .venv and CWD-based installs in setup.sh

* Revert "feat: add migration support for old .venv and CWD-based installs in setup.sh"

This reverts commit 301291d002.

* feat: migrate old .venv layout in install.sh instead of always nuking

* feat: validate old .venv with torch CUDA test before migration, recovery message on launch failure

* fix: try CUDA then fall back to CPU for migration validation

* fix: upgrade unsloth/unsloth-zoo with --reinstall-package on migration to preserve torch

* remove: delete unused unsloth ui command (use unsloth studio instead)

* Fix Windows venv path mismatch between install.ps1, setup.ps1, and studio.py

install.ps1 was creating the venv CWD-relative ($VenvName = "unsloth_studio"),
setup.ps1 was using an absolute path to ".unsloth\studio\.venv", and studio.py
looks for ".unsloth\studio\unsloth_studio". All three paths were different, so
the Windows installer would never produce a working Studio setup.

install.ps1:
- Use absolute $StudioHome + $VenvDir matching the Linux install.sh layout
- Add 3-way migration: old .venv at STUDIO_HOME, CWD-relative ~/unsloth_studio
  from the previous install.ps1, or fresh creation with torch validation
- For migrated envs, upgrade unsloth while preserving existing torch/CUDA wheels
- Set SKIP_STUDIO_BASE=1 before calling setup.ps1 (matches install.sh behavior)
- Fix launch instructions to use the absolute venv path

setup.ps1:
- Change $VenvDir from ".unsloth\studio\.venv" to ".unsloth\studio\unsloth_studio"
- Add SKIP_STUDIO_BASE guard: error out if venv is missing when called from
  install.ps1 (which should have already created it)
- Differentiate "Setup" vs "Update" in banners based on SKIP_STUDIO_BASE

* setup.ps1: unconditionally error if venv missing, matching setup.sh

setup.sh always errors out if the venv does not exist (line 224-228),
telling the user to run install.sh first. setup.ps1 was conditionally
creating a bare venv with python -m venv when SKIP_STUDIO_BASE was not
set, which would produce an empty venv with no torch or unsloth. Now
setup.ps1 matches setup.sh: always error, always point to install.ps1.

* Fix --torch-backend=auto CPU solver dead-end on Linux, macOS, and Windows

On CPU-only machines, `uv pip install unsloth --torch-backend=auto`
falls back to unsloth==2024.8 because the CPU solver cannot satisfy
newer unsloth's dependencies. install.ps1 already solved this with a
two-step approach; this applies the same fix to install.sh and
install_python_stack.py.

install.sh: add get_torch_index_url() that detects GPU via nvidia-smi
and maps CUDA versions to PyTorch index URLs (matching install.ps1's
Get-TorchIndexUrl). Fresh installs now install torch first via explicit
--index-url, then install unsloth with --upgrade-package to preserve
the pre-installed torch. All 5 --torch-backend=auto removed from
primary paths.

install.ps1: add fallback else-branch when TorchIndexUrl is empty,
using --torch-backend=auto as last resort (matching install.sh).

install_python_stack.py: remove unconditional --torch-backend=auto
from _build_uv_cmd. Torch is pre-installed by install.sh/setup.ps1
by the time this runs. Callers that need it can set UV_TORCH_BACKEND.

Both install.sh and install.ps1 now share the same three-branch logic:
migrated env (upgrade-package only), normal (torch-first + index-url),
and fallback (--torch-backend=auto if URL detection fails).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use --reinstall-package for migrated envs on both Linux and Windows

For migrated environments (moved from legacy venv location),
--reinstall-package is better than --upgrade-package because it forces
a clean reinstall even if the same version is already installed. This
ensures proper .dist-info and .pyc state in the new venv location.

--upgrade-package remains correct for the fresh install path where
torch is already installed and we just want to add unsloth without
re-resolving torch.

* Address review findings: portability, parity, and stale comments

- Replace grep -oP (GNU Perl regex) with POSIX sed in
  get_torch_index_url() so the script works on BSD grep (macOS is
  already guarded by the Darwin early-return, but Alpine/BusyBox
  would silently get the wrong CUDA tag)
- Add LC_ALL=C before nvidia-smi invocation to prevent locale-dependent
  output parsing issues
- Add warning on stderr when nvidia-smi output is unparseable, matching
  install.ps1's [WARN] message
- Add explicit unsloth-zoo positional arg to install.ps1 migrated path,
  matching install.sh (--reinstall-package alone won't install it if it
  was never present in the migrated env)
- Fix stale comment in install_python_stack.py line 392 that still
  claimed --torch-backend=auto is added by _build_uv_cmd
- Add sed to test tools directory (function now uses sed instead of grep)

* Add --index-url to migrated env path to prevent CPU torch resolution

The migrated path runs uv pip install with --reinstall-package for
unsloth/unsloth-zoo. While uv should keep existing torch as satisfied,
the resolver could still re-resolve torch as a transitive dependency.
Without --index-url pointing at the correct CUDA wheel index, the
resolver would fall back to plain PyPI and potentially pull CPU-only
torch. Adding --index-url $TORCH_INDEX_URL ensures CUDA wheels are
available if the resolver needs them.

Applied to both install.sh and install.ps1.

* Revert --index-url on migrated env path

The original install.ps1 on main already handles the migrated path
without --index-url and it works correctly. --reinstall-package only
forces reinstall of the named packages while uv keeps existing torch
as satisfied. No need for the extra flag.

* Fix unsloth studio update --local not installing local checkout

studio.py sets STUDIO_LOCAL_REPO when --local is passed, but
install_python_stack.py never read it. The update path always
installed from PyPI regardless of the --local flag.

Add a local_repo branch that first updates deps from base.txt
(with --upgrade-package to preserve torch), then overlays the
local checkout as an editable install with --no-deps.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-25 05:24:21 -07:00
Etherll
d69d60ff19
perf(studio): upgrade to Vite 8 + auto-install bun for faster frontend builds (#4522)
* perf(studio): upgrade to Vite 8 + auto-install bun for 3x faster frontend builds

* fix(studio): make bun-to-npm fallback actually reachable

setup.sh used run_quiet() for the bun install attempt, but run_quiet
calls exit on failure. This killed the script before the npm fallback
could run, making the "falling back to npm" branch dead code.

Replace the run_quiet call with a direct bun invocation that captures
output to a temp file (same pattern, but returns instead of exiting).

Also clean up partial node_modules left by a failed bun install before
falling back to npm, in both setup.sh and build.sh. Without this, npm
inherits a corrupted node_modules tree from the failed bun run.

* fix(studio): restore commonjsOptions for dagre CJS interop

The previous commit removed build.commonjsOptions, assuming Vite 8's
Rolldown handles CJS natively. While optimizeDeps.include covers the
dev server (pre-bundling), it does NOT apply to production builds.

The resolve.alias still points @dagrejs/dagre to its .cjs.js entry,
so without commonjsOptions the production bundle fails to resolve
the CJS default export. This causes "TypeError: e is not a function"
on /chat after build (while dev mode works fine).

Restore the original commonjsOptions block to fix production builds.

* fix(studio): use motion/react instead of legacy framer-motion import

* fix(studio): address PR review findings for Vite 8 + bun upgrade

Fixes:
  - Remove bun.lock from repo and add to .gitignore (npm is source of truth)
  - Use & bun install *> $null pattern in setup.ps1 for reliable $LASTEXITCODE
  - Add Remove-Item node_modules before npm fallback in setup.ps1
  - Print bun install failure log in setup.sh before discarding
  - Add Refresh-Environment after npm install -g bun in setup.ps1
  - Tighten Node version check to ^20.19.0 || >=22.12.0 (Vite 8 requirement)
  - Add engines field to package.json
  - Use string comparison for _install_ok in build.sh
  - Remove explicit framer-motion ^11.18.2 from package.json (motion pulls
    framer-motion ^12.38.0 as its own dependency — the old pin caused a
    version conflict)

* Fix Colab Node bypass and bun.lock stale-build trigger

Gate the Colab Node shortcut on NODE_OK=true so Colab
environments with a Node version too old for Vite 8 fall
through to the nvm install path instead of silently proceeding.

Exclude bun.lock from the stale-build probe in both setup.sh
and setup.ps1 so it does not force unnecessary frontend rebuilds
on every run.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Shine1i <wasimysdev@gmail.com>
2026-03-25 04:27:41 -07:00
Daniel Han
797ddd201e
Fix Studio silently exiting on Windows without error output (#4527)
* Fix Studio silently exiting on Windows without error output

On Windows, `unsloth studio` launches a child process via
subprocess.Popen to run the server in the studio venv. If the child
crashes (e.g. due to a missing package), the parent just calls
typer.Exit(rc) with no message -- the user sees "Launching Unsloth
Studio... Please wait..." and then the prompt returns with zero
feedback.

Root cause: `data_designer_unstructured_seed` is imported at the top
level in seed.py. If this package is not installed in the studio venv,
the entire import chain (seed.py -> routes/__init__.py -> main.py ->
run_server()) crashes with ModuleNotFoundError. Since run.py has no
try/except around run_server() and studio.py does not report nonzero
exit codes, the failure is completely silent.

Changes:
- run.py: wrap run_server() in try/except, print clear error with
  traceback to stderr. Also reconfigure stderr encoding on Windows so
  tracebacks with non-ASCII paths do not cause secondary failures.
- studio.py: print an error message when the child process exits with
  a nonzero code on Windows, so the user knows something went wrong.
- seed.py: make data_designer_unstructured_seed import optional with
  a try/except fallback. The server starts normally and only returns
  HTTP 500 if the unstructured seed endpoints are actually called.
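The run.py/seed.py changes combine two small Python patterns; a minimal sketch (names simplified, not the actual Studio code):

```python
import sys
import traceback

# seed.py pattern: make the plugin import optional so a missing package
# no longer kills the whole import chain at server startup.
try:
    import data_designer_unstructured_seed  # optional plugin
except ImportError:
    data_designer_unstructured_seed = None  # endpoints return 500 if called

def main(run_server):
    """run.py pattern: surface crashes instead of exiting silently."""
    if sys.platform == "win32" and hasattr(sys.stderr, "reconfigure"):
        # Avoid a secondary UnicodeEncodeError when the traceback itself
        # contains non-ASCII paths on a Windows console.
        sys.stderr.reconfigure(encoding="utf-8", errors="replace")
    try:
        run_server()
    except Exception:
        print("[ERROR] Unsloth Studio server crashed:", file=sys.stderr)
        traceback.print_exc()
        raise SystemExit(1)
```

The nonzero `SystemExit` is what lets the parent process (studio.py) detect the failure and print its own error instead of returning a bare prompt.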

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Skip Anaconda/Miniconda Python when creating Studio venv on Windows

Conda-bundled CPython ships modified DLL search paths that prevent
torch from loading c10.dll on Windows. The Studio server fails
silently at startup because the venv was created with conda's Python.

Standalone CPython (python.org, winget, uv) does not have this issue.

Both install.ps1 and setup.ps1 now skip any Python binary whose path
contains conda, miniconda, anaconda, miniforge, or mambaforge when
selecting the interpreter for the studio venv. If only conda Python
is available, the scripts print an error with instructions to install
standalone CPython.

* Fix multi-file preview crash and improve setup.ps1 Python discovery

Addresses review findings [10/10] and [8/10]:

1. seed.py: _read_preview_rows_from_multi_files() had a hard import
   of build_multi_file_preview_rows inside the function body, bypassing
   the optional-plugin guard. Moved it into the top-level try/except
   block and added a None guard matching the other functions.

2. setup.ps1: Python discovery now probes py.exe (Python Launcher)
   first, uses Get-Command -All to look past conda entries that shadow
   standalone CPython further down PATH, skips WindowsApps stubs, and
   resolves the actual executable path so venv creation does not
   re-resolve back to a conda interpreter.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Check sys.base_prefix to catch venvs created from conda Python

A venv created from conda Python (e.g. C:\Users\danie\.venv) has a
path that does not contain "conda", but sys.base_prefix still points
to the conda install (e.g. C:\Users\danie\miniconda3). The previous
path-only check missed this case entirely.

Both install.ps1 and setup.ps1 now use a Test-IsConda helper that
checks both the executable path AND sys.base_prefix against the
conda/miniconda/anaconda/miniforge/mambaforge pattern. This catches:
- Direct conda Python executables
- Venvs created from conda Python (base_prefix reveals the origin)
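The two-pronged Test-IsConda check translates to roughly this logic (a Python stand-in for the PowerShell helper; the pattern list is taken from the commit message above):

```python
import re

_CONDA_RE = re.compile(r"conda|miniconda|anaconda|miniforge|mambaforge",
                       re.IGNORECASE)

def is_conda_python(exe_path, base_prefix):
    """True if the interpreter is conda-based, either directly (the
    executable path matches) or indirectly (a venv created from conda
    Python, revealed by sys.base_prefix pointing at the conda install).
    """
    return bool(_CONDA_RE.search(exe_path) or _CONDA_RE.search(base_prefix))
```

In the installers, `base_prefix` is obtained by running the candidate interpreter with `-c "import sys; print(sys.base_prefix)"`; the path-only check alone misses the venv-from-conda case entirely.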

* Fix install.ps1 passing version string to uv venv instead of resolved path

Find-CompatiblePython returned a bare version string (e.g. "3.13")
which was passed to `uv venv --python 3.13`. uv performs its own
interpreter discovery and can resolve that version string back to a
conda Python, defeating the entire conda-skip logic.

Now Find-CompatiblePython returns a hashtable with both .Version (for
display) and .Path (the resolved absolute executable path). The venv
is created with `uv venv --python <absolute-path>`, ensuring uv uses
the exact interpreter we validated.

* Quote resolved Python path in uv venv call for paths with spaces

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-22 08:23:03 -07:00
Leo Borcherding
71c77d4e96
fix(install.ps1): fix non-NVIDIA package resolution — split torch+unsloth install (#4515)
* fix(install.ps1): split torch+unsloth install to fix non-NVIDIA package resolution

--torch-backend=auto on a non-NVIDIA Windows machine causes uv to resolve
unsloth==2024.8 (pre-CLI, no unsloth.exe). Fix: detect GPU robustly (PATH +
hardcoded fallback paths, mirrors setup.ps1), install torch first with an
explicit --index-url (CUDA variant for NVIDIA, CPU for everyone else), then
install unsloth separately without --torch-backend so the solver always picks
a modern release that ships the Studio CLI.

Closes the remaining gap flagged in #4478.

* fix(install.ps1): align warning with setup.ps1, add --upgrade, handle CUDA 11.x

- Match the no-GPU warning message to studio/setup.ps1 wording
  (chat-only GGUF mode, driver download link)
- Add CUDA 11.x floor check in Get-TorchIndexUrl so old drivers
  fall back to CPU wheels instead of silently getting cu124
- Log a warning when nvidia-smi output cannot be parsed
- Add --upgrade to both uv pip install calls so re-runs pick up
  newer package versions

* revert --upgrade from uv pip install calls

uv pip install already resolves to the latest satisfying version;
--upgrade is unnecessary and could force unwanted re-installs.

* fix: replace frozen cu124 fallbacks with cu126, guard CUDA 11.x

cu124 wheels are frozen at torch 2.6.0 -- falling back to them pins
users to an outdated PyTorch.  Three issues fixed in both install.ps1
and setup.ps1:

1. CUDA 12.0-12.5 now maps to cu126 (was cu124).
2. CUDA 11.x and older now falls back to cpu (was cu124, which would
   silently install incompatible GPU wheels).
3. Parse-failure and no-nvidia-smi fallbacks updated to cu126/cpu.

Adds tests/test_cuda_wheel_mapping.py covering the mapping logic,
nvidia-smi parsing, PS1 file sync, PyTorch index URL validation,
and sandbox torch installs.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove test file from PR branch

Test file kept locally, not needed in the PR.

* fix: map CUDA 11.x to cu118 instead of cpu

PyTorch still publishes cu118 wheels (up to torch 2.7.1), so CUDA 11.x
users get GPU-accelerated torch rather than being forced to CPU-only.
Only CUDA 10.x and older fall back to cpu.

* fix: revert CUDA 12.0-12.5 to cu124, handle cpu tag in setup.ps1

CUDA 12.0-12.5 drivers only support up to their reported CUDA version,
so cu126 wheels (built with CUDA 12.6) fail to load. Revert the
catch-all for 12.0-12.5 back to cu124.

Also fix setup.ps1 caller: when Get-PytorchCudaTag returns "cpu" (e.g.
CUDA 10.x driver), the installer now correctly skips Triton and prints
"CPU-only" instead of "CUDA support (cpu)".
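Pulling the rules from this commit sequence together, the driver-version to wheel-tag mapping looks roughly like this (an illustrative subset: the cu130 branch for CUDA 13 drivers is an assumption from a later commit's mention of cu130, and the real scripts may cover more ranges):

```python
def cuda_tag_for_driver(cuda_version):
    """Map the driver-reported CUDA version to a PyTorch wheel tag.

    Drivers only support wheels built with an equal-or-older CUDA, so
    12.0-12.5 maps to cu124, 11.x to cu118 (still published, up to
    torch 2.7.1), and 10.x or a missing/unparseable version falls back
    to CPU wheels.
    """
    if not cuda_version:
        return "cpu"              # nvidia-smi missing or unparseable
    major, minor = (int(p) for p in cuda_version.split(".")[:2])
    if major >= 13:
        return "cu130"            # assumed tag for CUDA 13 drivers
    if major == 12:
        return "cu126" if minor >= 6 else "cu124"
    if major == 11:
        return "cu118"
    return "cpu"                  # CUDA 10.x and older: CPU-only wheels
```

The caller then handles the "cpu" tag specially (skip Triton, print "CPU-only") as described above.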

* fix: add --upgrade to unsloth install for stale venv repair

On reruns against an existing venv, uv pip install unsloth makes no
changes if unsloth==2024.8 is already installed (it satisfies the
constraint). Adding --upgrade only to the unsloth install ensures
stale installs get repaired without forcing a multi-GB torch
re-download.

* fix: use --upgrade-package to avoid clobbering torch CUDA wheels

`--upgrade unsloth` re-resolves torch from default PyPI, stripping the
+cuXXX suffix installed in step 1.  `--upgrade-package unsloth unsloth`
upgrades only unsloth (and pulls missing deps like transformers, trl)
while preserving the pinned torch from the CUDA-specific index.

* docs: explain why split-install and --upgrade-package are needed

Expand the inline comment block to document both design decisions:
1. Why torch is installed separately (solver fallback to 2024.8)
2. Why --upgrade-package is used instead of --upgrade (preserves CUDA wheels)

---------

Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-22 05:41:58 -07:00
Leo Borcherding
96edad9c95
PR: Fix/cuda minimum check and abort (#4517)
* fix: add CUDA minimum version check and abort for llama.cpp (>= 12.4)

- setup.ps1/setup.sh: abort with clear error if CUDA toolkit < 12.4
  (llama.cpp requirement); link to cuda-toolkit-archive for upgrade
- setup.ps1: promote CUDA VS integration copy failure from WARN to
  ERROR + exit 1; remove manual-copy hack instructions per Roland —
  correct fix is re-installing CUDA/MSBuild, not a manual workaround

Fixes: https://github.com/unslothai/unsloth/issues/4437
Reported by: Sebastien

* fix: wipe stale studio venv when torch CUDA tag changes

When the NVIDIA driver is updated, the required PyTorch CUDA tag changes
(e.g. cu124 -> cu130) but setup.ps1 was silently reusing the existing
.venv, leaving the old torch wheel in place and breaking the UI for
everyone on the next setup run.

Before creating/reusing the venv, inspect the installed torch version
string. If its CUDA tag does not match what the current driver requires,
wipe the venv so we always get a clean, correct install.
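The torch-version inspection can be sketched as follows (a Python stand-in for the setup.ps1 probe; tag extraction assumes PyTorch's standard local-version format like "2.6.0+cu124"):

```python
import re

def installed_cuda_tag(torch_version):
    """Extract the wheel tag from torch.__version__.

    '2.6.0+cu124' -> 'cu124', '2.6.0+cpu' -> 'cpu'. An untagged
    version string is treated as 'cpu' so it also triggers a rebuild
    whenever a CUDA build is expected.
    """
    m = re.search(r"\+(cu\d+|cpu)", torch_version)
    return m.group(1) if m else "cpu"

def venv_is_stale(torch_version, expected_tag):
    # Rebuild whenever the installed wheel does not match what the
    # current driver requires (cu124 vs cu130, cpu vs cuXXX, ...).
    return installed_cuda_tag(torch_version) != expected_tag
```

The later hardening commit extends this with a failed-import case: if the probe cannot even report a version, the venv is likewise treated as stale.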

* Fix CUDA version check: portability, non-fatal fallback, stale venv detection

- setup.sh: Replace grep -oP with POSIX sed for macOS compatibility
- setup.sh: Replace exit 1 with NVCC_PATH="" to fall back to CPU-only build
- setup.sh: Move version check before -DGGML_CUDA=ON append
- setup.sh: Add else branch warning when nvcc version is unparseable
- setup.ps1: Replace exit 1 with $NvccPath=$null for non-fatal CUDA fallback
- setup.ps1: Add driver vs toolkit guidance in version warning
- setup.ps1: Guard CUDA env/VS integration setup with if ($NvccPath)
- setup.ps1: VS integration catch: downgrade to WARN, restore source/dest paths
- setup.ps1: Stale venv: detect CPU torch and untagged wheels, not just +cuNNN
- setup.ps1: Stale venv: rebuild on failed torch import
- setup.ps1: Stale venv: wrap Remove-Item in try/catch for locked files

* Remove incorrect CUDA >= 12.4 check, keep only stale venv detection

llama.cpp has no hard minimum CUDA version -- it builds with CUDA as old
as 11.2 and degrades features gracefully via #if CUDART_VERSION guards.
The 12.4 figure was the default Docker/CI baseline, not a build requirement.

Reverted:
- CUDA version check in setup.sh (entirely removed)
- CUDA version check in setup.ps1 (entirely removed)
- VS integration catch block cosmetic changes (restored to main)
- if ($NvccPath) guard around CUDA env setup (not needed without version check)

Kept:
- Stale venv detection in setup.ps1: detects torch CUDA tag mismatch
  (cu124 vs cu130, cpu vs cuXXX, broken torch import) and rebuilds venv

* Fix stale venv detection: incomplete venvs, timeout, fatal delete failure

- Add 30s timeout for torch import probe via ProcessStartInfo/WaitForExit
- Use Test-Path -PathType Container to reject files masquerading as venv dir
- Trigger rebuild when python.exe is missing (incomplete venv)
- Make Remove-Item failure fatal ([ERROR] + exit 1) instead of warn-and-continue
- Move $expectedTorchTag computation inside -not $shouldRebuild guard
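The timed import probe has a direct Python equivalent of the ProcessStartInfo/WaitForExit pattern (a sketch: a hung import and a failing import both count as a broken venv):

```python
import subprocess

def probe_import(python_exe, module, timeout_s=30):
    """Return True only if `module` imports cleanly within the timeout.

    A hung import (TimeoutExpired) and a nonzero exit code both report
    failure, so the caller treats the venv as broken and rebuilds it
    instead of waiting forever or trusting a crashed interpreter.
    """
    try:
        proc = subprocess.run(
            [python_exe, "-c", f"import {module}"],
            capture_output=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0
```

The timeout matters because a torch wheel with a mismatched CUDA runtime can stall during DLL loading rather than failing fast.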

---------

Co-authored-by: LeoBorcherding <LeoBorcherding@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-22 04:46:36 -07:00
Manan Shah
6f129a214b
Fix Install commands for Windows + 1 line installs (#4447)
* One liner setup for unsloth studio

* Fix install scripts: system deps, activation bugs, curl/wget support

- install.sh: detect platform (macOS/Linux/WSL) and check for missing
  system dependencies (cmake, git, build-essential, libcurl4-openssl-dev).
  Prompt user once for permission to install all missing packages via
  brew (macOS) or sudo apt-get (Linux/WSL). Add wget fallback via
  download() helper since curl is not always present on minimal Linux
  installs. Fix nested curl|sh stdin stealing by downloading uv installer
  to a tempfile first. Replace venv activation (no-op in a pipe subshell)
  with explicit --python flag for uv pip install and direct venv binary
  invocation. Add idempotency guard for venv creation. Redirect stdin
  on unsloth studio setup to prevent pipe consumption. On macOS, check
  for Xcode Command Line Tools and trigger install if missing.

- install.ps1: wrap script body in Install-UnslothStudio function so
  that errors use return instead of exit (exit kills the terminal when
  run via irm|iex). Remove activate.ps1 invocation entirely -- use
  explicit --python path for uv pip install and & $UnslothExe for
  studio setup. This avoids both the child-scope activation bug (& vs
  dot-source) and the execution policy error on default Windows systems.
  Add winget availability check with clear error message. Fix PATH
  refresh to append registry paths instead of replacing the session PATH.
  Add uv installer fallback via astral.sh PowerShell script if winget
  install does not put uv on PATH. Broaden Python version check to
  accept 3.11-3.13. Add idempotency guard for venv creation.

- README.md: add wget one-liner alternative for systems without curl.

* Fix Tailwind CSS v4 .gitignore bug on Windows (#4444)

- Add .gitignore hiding workaround to setup.ps1 (matching existing
  setup.sh logic) so venv .gitignore files containing "*" don't prevent
  Tailwind's oxide scanner from finding .tsx source files
- Add CSS size validation to setup.sh, setup.ps1, and build.sh to catch
  truncated Tailwind builds early
- Remove stray force-rebuild overrides that made the "skip build if
  current" cache check dead code in both setup scripts
- Add rm -rf dist to build.sh to force clean rebuilds for wheel packaging

* Change default port 8000 to 8888, fix installer bugs, improve UX

- Change default Studio port from 8000 to 8888 across all entry points
  (run.py, studio.py, ui.py, colab.py, vite.config.ts, setup scripts)
- Update launch banner: "Launching with studio venv..." to
  "Launching Unsloth Studio... Please wait..."
- Add "Open your web browser" banner and rename labels
  (Local -> Local Access, External -> Worldwide Web Address)
- Fix venv idempotency: check for bin/python instead of just directory
  existence, clean up partial venvs on retry
- Fix build.sh CSS validation: handle empty CSS case that silently
  bypassed the check with "integer expression expected"
- Fix install.sh sudo handling: try apt-get without sudo first (works
  when root), then escalate with per-package tracking and user prompt
- Fix install.ps1: check exit code from studio setup, fail on error
- Add pciutils to WSL GGUF build dependencies
- Apply same smart apt-get escalation pattern to studio/setup.sh

* Use detected Python version for venv, abort on non-apt Linux

- install.ps1: detect existing Python 3.11/3.12/3.13 and use that
  version for venv creation instead of always forcing 3.13
- install.sh: exit with error on non-apt Linux distros when required
  packages cannot be auto-installed, instead of silently continuing

* Make sudo permission prompt more prominent with warning banner

* Add Accept [Y/n] sudo prompt to studio/setup.sh for consistency

* Fix native command exit code handling and sudo decline flow

install.ps1: Add $LASTEXITCODE checks after winget (Python), uv venv,
and uv pip install calls. $ErrorActionPreference only catches PowerShell
cmdlet errors, not native executable failures. The Python check also
handles winget returning non-zero for "already installed".

setup.sh: Skip llama-server build when user declines sudo or sudo is
unavailable. Previously the script continued to section 8 which would
fail with confusing errors (e.g. "gcc: command not found") since
build-essential was never installed.

* Move rm -rf llama.cpp inside build branch to preserve existing install

When _SKIP_GGUF_BUILD is set (user declined sudo or sudo unavailable),
the previous rm -rf would destroy an already-working llama-server before
the skip check ran. Move it inside the else branch so existing builds
are preserved when the rebuild is skipped.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-03-19 02:09:09 -07:00
Daniel Han
7ddb660b0c
revert: always rebuild frontend, override caching with _NEED_FRONTEND_BUILD=true (#4427)
* revert: remove frontend build caching from setup scripts

The mtime-based caching introduced in #4404/#4413 can incorrectly skip
frontend builds -- e.g. after git pull when filesystem timestamps are
not preserved, or after our Tailwind v4 discovery that the site-packages
.gitignore must be hidden before vite build (which the cached path
doesn't handle).

Always rebuild the frontend on setup. The build takes ~15s and is
safer than risking a stale dist/.

* revert: disable frontend build caching, keep code commented out

Caching disabled by always setting _NEED_FRONTEND_BUILD=true.
The mtime-based logic is preserved in comments for future re-enabling.

Reasons for disabling:
- Git does not preserve file timestamps, so cached dist/ can appear
  newer than freshly checked-out source after a pull
- Tailwind v4 requires hiding site-packages/.gitignore before vite
  build; the cache path bypasses this, producing broken CSS

* revert: always rebuild frontend, remove mtime caching

* revert: always rebuild frontend, override caching with _NEED_FRONTEND_BUILD=true
2026-03-18 07:37:53 -07:00
Daniel Han
1f12ba16df
Combine studio setup fixes: frontend caching, venv isolation, Windows CPU support (#4413)
* Allow Windows setup to complete without NVIDIA GPU

setup.ps1 previously hard-exited if nvidia-smi was not found, blocking
setup entirely on CPU-only or non-NVIDIA machines. The backend already
supports CPU and MLX (Apple Silicon) in chat-only GGUF mode, and the
Linux/Mac setup.sh handles missing GPUs gracefully.

Changes:
- Convert the GPU check from a hard exit to a warning
- Guard CUDA toolkit installation behind $HasNvidiaSmi
- Install CPU-only PyTorch when no GPU is detected
- Build llama.cpp without CUDA flags when no GPU is present
- Update doc comment to reflect CPU support

* Cache frontend build across setup runs

Skip the frontend npm install + build if frontend/dist already exists.
Previously setup.ps1 nuked node_modules and package-lock.json on every
run, and both scripts always rebuilt even when dist/ was already present.

On a git clone editable install, the first setup run still builds the
frontend as before. Subsequent runs skip it, saving several minutes.
To force a rebuild, delete frontend/dist and re-run setup.

* Show pip progress for PyTorch download on Windows

The torch CUDA wheel is ~2.8 GB and the CPU wheel is ~300 MB. With
| Out-Null suppressing all output, the install appeared completely
frozen with no feedback. Remove | Out-Null for the torch install
lines so pip's download progress bar is visible. Add a size hint
so users know the download is expected to take a while.

Also moves the Triton success message inside the GPU branch so it
only prints when Triton was actually installed.

* Guard CUDA env re-sanitization behind GPU check in llama.cpp build

The CUDA_PATH re-sanitization block (lines 1020-1033) references
$CudaToolkitRoot which is only set when $HasNvidiaSmi is true and
the CUDA Toolkit section runs. On CPU-only machines, $CudaToolkitRoot
is null, causing Split-Path to throw:

  Split-Path : Cannot bind argument to parameter 'Path' because it is null.

Wrap the entire block in `if ($HasNvidiaSmi -and $CudaToolkitRoot)`.

* Rebuild frontend when source files are newer than dist/

Instead of only checking if dist/ exists, compare source file timestamps
against the dist/ directory. If any file in frontend/src/ is newer than
dist/, trigger a rebuild. This handles the case where a developer pulls
new frontend changes and re-runs setup -- stale assets get rebuilt
automatically.

* Fix cmake not found on Windows after winget install

Two issues fixed:

1. After winget installs cmake, Refresh-Environment may not pick up the
   new PATH entry (MSI PATH changes sometimes need a new shell). Added a
   fallback that probes cmake's default install locations (Program Files,
   LocalAppData) and adds the directory to PATH explicitly if found.

2. If cmake is still unavailable when the llama.cpp build starts (e.g.
   winget failed silently or PATH was not updated), the build now skips
   gracefully with a [SKIP] warning instead of crashing with
   "cmake : The term 'cmake' is not recognized".

* Fix frontend rebuild detection and decouple oxc-validator install

Address review feedback:

- Check entire frontend/ directory for changes, not just src/.
  The build also depends on package.json, vite.config.ts,
  tailwind.config.ts, public/, and other config files. A change
  to any of these now triggers a rebuild.
- Move oxc-validator npm install outside the frontend build gate
  in setup.sh so it always runs on setup, matching setup.ps1
  which already had it outside the gate.

* Show cmake errors on failure and retry CUDA VS integration with elevation

Two fixes for issue #4405 (Windows setup fails at cmake configure):

1. cmake configure: capture output and display it on failure instead of
   piping to Out-Null. When the error mentions "No CUDA toolset found",
   print a hint about the CUDA VS integration files.

2. CUDA VS integration copy: when the direct Copy-Item fails (needs
   admin access to write to Program Files), retry with Start-Process
   -Verb RunAs to prompt for elevation. This is the root cause of the
   "No CUDA toolset found" cmake failure -- the .targets files that let
   MSBuild compile .cu files are missing from the VS BuildCustomizations
   directory.

* Address reviewer feedback: cmake PATH persistence, stale cache, torch error check

1. Persist cmake PATH to user registry so Refresh-Environment cannot
   drop it later in the same setup run. Previously the process-only
   PATH addition at phase 1 could vanish when Refresh-Environment
   rebuilt PATH from registry during phase 2/3 installs.

2. Clean stale CMake cache before configure. If a previous run built
   with CUDA and the user reruns without a GPU (or vice versa), the
   cached GGML_CUDA value would persist. Now the build dir is removed
   before configure.

3. Explicitly set -DGGML_CUDA=OFF for CPU-only builds instead of just
   omitting CUDA flags. This prevents cmake from auto-detecting a
   partial CUDA installation.

4. Fix CUDA cmake flag indentation -- was misaligned from the original
   PR, now consistently indented inside the if/else block.

5. Fail hard if pip install torch returns a non-zero exit code instead
   of silently continuing with a broken environment.

* Remove extra CUDA cmake flags to align Windows with Linux build

Drop GGML_CUDA_FA_ALL_QUANTS, GGML_CUDA_F16, GGML_CUDA_GRAPHS,
GGML_CUDA_FORCE_CUBLAS, and GGML_CUDA_PEER_MAX_BATCH_SIZE flags.
The Linux build in setup.sh only sets GGML_CUDA=ON and lets llama.cpp
use its defaults for everything else. Keep Windows consistent.

* Address reviewer round 2: GPU probe fallback, Triton check, stale binary rebuild

1. GPU detection: fallback to default nvidia-smi install locations
   (Program Files\NVIDIA Corporation\NVSMI, System32) when nvidia-smi
   is not on PATH. Prevents silent CPU-only provisioning on machines
   that have a GPU but a broken PATH.

2. Triton: check $LASTEXITCODE after pip install and print [WARN]
   on failure instead of unconditional [OK].

3. Stale llama-server: check CMakeCache.txt for GGML_CUDA setting
   and rebuild if the existing binary does not match the current GPU
   mode (e.g. CUDA binary on a now-CPU-only rerun, or vice versa).

* Fix frontend rebuild detection and npm dependency issues

Addresses reviewer feedback on the frontend caching logic:

1. setup.sh: Fix broken find command that caused exit under pipefail.
   The piped `find | xargs find -newer` had paths after the expression
   which GNU find rejects. Replaced with a simpler `find -maxdepth 1
   -type f -newer dist/` that checks ALL top-level files (catches
   index.html, bun.lock, etc. that the extension allowlist missed).

2. setup.sh: Guard oxc-validator npm install behind `command -v npm`
   check. When the frontend build is skipped (dist/ is cached), Node
   bootstrap is also skipped, so npm may not be available.

3. setup.ps1: Replace Get-ChildItem -Include with explicit path
   probing for src/ and public/. PowerShell's -Include without a
   trailing wildcard silently returns nothing, so src/public changes
   were never detected. Also check ALL top-level files instead of
   just .json/.ts/.js/.mjs extensions.

* Fix studio setup: venv isolation, centralized .venv_t5, uv targeting

- All platforms (including Colab) now create ~/.unsloth/studio/.venv
  with --without-pip fallback for broken ensurepip environments
- Add --python sys.executable to uv pip install in install_python_stack.py
  so uv targets the correct venv instead of system Python
- Centralize .venv_t5 bootstrap in transformers_version.py with proper
  validation (checks required packages exist, not just non-empty dir)
- Replace ~150 lines of duplicated install code across 3 worker files
  with calls to the shared _ensure_venv_t5_exists() helper
- Use uv-if-present with pip fallback; do not install uv at runtime
- Add site.addsitedir() shim in colab.py so notebook cells can import
  studio packages from the venv without system-Python double-install
- Update .venv_t5 packages: huggingface_hub 1.3.0->1.7.1, add hf_xet
- Bump transformers pin 4.57.1->4.57.6 in requirements + constraints
- Add Fast-Install helper to setup.ps1 with uv+pip fallback
- Keep Colab-specific completion banner in setup.sh

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix nvidia-smi PATH persistence and cmake requirement for CPU-only

1. Store nvidia-smi as an absolute path ($NvidiaSmiExe) on first
   detection. All later calls (Get-CudaComputeCapability,
   Get-PytorchCudaTag, CUDA toolkit detection) use this absolute
   path instead of relying on PATH. This survives Refresh-Environment
   which rebuilds PATH from the registry and drops process-only
   additions.

2. Make cmake fatal for CPU-only installs. CPU-only machines depend
   entirely on llama-server for GGUF chat mode, so reporting "Setup
   Complete!" without it is misleading. GPU machines can still skip
   the llama-server build since they have other inference paths.

* Fix broken frontend freshness detection in setup scripts

- setup.sh: Replace broken `find | xargs find -newer` pipeline with
  single `find ... -newer` call. The old pipeline produced "paths must
  precede expression" errors (silently suppressed by 2>/dev/null),
  causing top-level config changes to never trigger a rebuild.
- setup.sh: Add `command -v npm` guard to oxc-validator block so it
  does not fail when Node was not installed (build-skip path).
- setup.ps1: Replace `Get-ChildItem -Include` (unreliable without
  -Recurse on PS 5.1) with explicit directory paths for src/ and
  public/ scanning.
- Both: Add *.html to tracked file patterns so index.html (Vite
  entry point) changes trigger a rebuild.
- Both: Use -print -quit instead of piping to head -1 for efficiency.
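The freshness probe reduces to a timestamp comparison; a Python equivalent of the fixed `find ... -newer dist/` logic (file selection simplified relative to the real scripts):

```python
import os

def needs_rebuild(frontend_dir, dist_dir):
    """True when any tracked frontend file is newer than dist/.

    Checks ALL top-level files (catching index.html, lockfiles, ...)
    plus everything under src/ and public/, mirroring the corrected
    shell/PowerShell logic described above.
    """
    if not os.path.isdir(dist_dir):
        return True                       # no build output yet
    dist_mtime = os.path.getmtime(dist_dir)
    candidates = [e.path for e in os.scandir(frontend_dir) if e.is_file()]
    for sub in ("src", "public"):
        root = os.path.join(frontend_dir, sub)
        for dirpath, _dirs, files in os.walk(root):
            candidates.extend(os.path.join(dirpath, f) for f in files)
    return any(os.path.getmtime(p) > dist_mtime for p in candidates)
```

Note the caveat the revert commit (#4427) later acts on: git does not preserve timestamps, so mtime-based caching can still skip a needed rebuild after a pull.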

* Fix bugs found during review of PRs #4404, #4400, #4399

- setup.sh: Add || true guard to find command that checks frontend/src
  and frontend/public dirs, preventing script abort under set -euo
  pipefail when either directory is missing

- colab.py: Use sys.path.insert(0, ...) instead of site.addsitedir()
  so Studio venv packages take priority over system copies. Add warning
  when venv is missing instead of silently failing.

- transformers_version.py: _venv_t5_is_valid() now checks installed
  package versions via .dist-info metadata, not just directory presence.
  Prevents false positives from stale or wrong-version packages.

- transformers_version.py: _install_to_venv_t5() now passes --upgrade
  so pip replaces existing stale packages in the target directory.

- setup.ps1: CPU-only PyTorch install uses --index-url for cpu wheel
  and all install commands use Fast-Install (uv with pip fallback).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix _venv_t5_is_valid dist-info loop exiting after first directory

Remove premature break that caused the loop over .dist-info directories
to exit after the first match even if it had no METADATA file. Now
continues iterating until a valid METADATA is found or all dirs are
exhausted.

* Capture error output on failure instead of discarding with Out-Null

setup.ps1: 6 locations changed from `| Out-Null` to `| Out-String` with
output shown on failure -- PyTorch GPU/CPU install, Triton install,
venv_t5 package loop, cmake llama-server and llama-quantize builds.

transformers_version.py: clean stale .venv_t5 directory before reinstall
when validation detects missing or version-mismatched packages.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix ModuleNotFoundError when CLI imports studio.backend.core

The backend uses bare "from utils.*" imports everywhere, relying on
backend/ being on sys.path. Workers and routes add it at startup, but
the CLI imports studio.backend.core as a package -- backend/ was never
added. Add sys.path setup at the top of core/__init__.py so lazy
imports resolve correctly regardless of entry point.

Fixes: `unsloth inference unsloth/Qwen3-8B "who are you"` crashing with
"No module named 'utils'"

* Fix frontend freshness check to detect all top-level file changes

The extension allowlist (*.json, *.ts, *.js, *.mjs, *.html) missed
files like bun.lock, so lockfile-only dependency changes could skip
the frontend rebuild. Check all top-level files instead.

* Add tiktoken to .venv_t5 for Qwen-family tokenizers

Qwen models use tiktoken-based tokenizers which fail when routed through
the transformers 5.x overlay without tiktoken installed. Add it to the
setup scripts (with deps for Windows) and runtime fallback list.

Integrates PR #4418.

* Fix tiktoken crash in _venv_t5_is_valid and stray brace in setup.ps1

_venv_t5_is_valid() crashed with ValueError on unpinned packages like
"tiktoken" (no ==version). Handle by splitting safely and skipping
version check for unpinned packages (existence check only).

Also remove stray closing brace in setup.ps1 tiktoken install block.

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-18 03:52:25 -07:00
Daniel Han
0acd1c7eec
studio: improve onboarding UX, tooltips, and training defaults (#4355)
* studio: improve onboarding UX, tooltips, and training defaults

- Change splash text to "Train and run LLMs locally"
- Add "Chat Only" card with BubbleChatIcon to skip directly to chat
- Add Skip/Skip to Chat buttons in sidebar and footer
- Back button on step 1 returns to splash screen instead of being disabled
- Change "Watch video guide" to "Get started with our guide" with new URL
- Update intro text to mention all model types + chat
- Make all tooltips clickable (in addition to hover) via React context
- Strip surrounding quotes from pasted HF tokens
- Rename "Eval Split" to "Evaluation Split"
- Add SparklesIcon to "Auto Detect" format option
- Change step 4 heading to "Choose your training parameters"
- Default max_steps to 60
- Learning rate displayed in scientific notation with +/- stepper
- Context length options capped by model's max_position_embeddings (via AutoConfig)
- Fix "QLORA"/"LORA" to "QLoRA"/"LoRA" in summary step
- Backend: add max_position_embeddings to model config endpoint

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* compare two different models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve Gemini review comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: disable thinking for Qwen3.5 <9B and always for AI Assist

- Change Qwen3.5 thinking threshold from <=2B to <9B (0.8B, 2B, 4B
  all disable thinking by default; 9B+ enables it)
- Always pass enable_thinking=False in AI Assist helper calls
  (_run_with_helper and _generate_with_backend) regardless of chat
  thinking settings

* studio: address PR review comments

- Extract _get_max_position_embeddings helper to DRY config extraction
- Fix "Skip to Chat" to navigate to /chat on step 1 (was /studio)

* fix: comment out debug print statements

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: skip Shiki highlighting for incomplete SVG code fences

While streaming SVG content, the syntax highlighter (Shiki) re-parses
the entire growing SVG on every token, blocking the main thread and
freezing the code area until the fence closes. Show a plain-text
preview for incomplete SVG fences instead, similar to how Mermaid
diagrams show a placeholder while streaming.

* studio: fix default top_k from 50/40 to 20 for chat inference

Per Qwen3.5 docs (unsloth.ai/docs/models/qwen3.5), top_k should be 20
for both thinking and non-thinking modes. The model-specific config in
inference_defaults.json already had top_k=20 for Qwen3.5, but the
generic fallback defaults were wrong:
- Frontend DEFAULT_INFERENCE_PARAMS.topK: 50 -> 20
- Backend generate_chat_completion top_k: 40 -> 20
- Backend generate_chat_completion_with_tools top_k: 40 -> 20
- Frontend title generation top_k: 40 -> 20

* studio: set universal inference defaults for unknown models

Default params for any model without specific config:
  temperature=0.6, top_p=0.95, top_k=20, min_p=0.01,
  presence_penalty=0.0, repetition_penalty=1.0

Models with entries in inference_defaults.json (Qwen3.5, Gemma-3,
Llama, etc.) override these with their recommended values.

Updated in: frontend DEFAULT_INFERENCE_PARAMS, backend Pydantic
request models, and backend generate_chat_completion defaults.

* studio: only trust_remote_code for unsloth/ models in AutoConfig

Only set trust_remote_code=True when the model name starts with
"unsloth/". All other models default to False for safety.

* studio: move Generating spinner above the composer

The "Generating" spinner was below the send message bar, causing
the bar to jump up and down. Move it above the composer in both
the regular thread view and the welcome/empty view.

* studio: adjust toast close button position away from edge

Move the X close button on toasts (like "Starting model...") from
top-1.5 to top-3 and add right-3, giving more breathing room from
the top-right corner.

* studio: make Think button smaller with tighter icon-text gap

Reduce gap from 1.5 to 0.5, padding from px-2.5/py-1 to px-2/py-0.5,
and icon from size-3.5 to size-3.

* studio: multiple onboarding and chat UX improvements

- Move Generating spinner above composer (fixes jumping send bar)
- Make Think button smaller with tighter icon-text gap
- Chat card now inside grid (same size as Audio/Embeddings cards)
- Rename "Chat Only" to "Chat"
- Chat card requires Continue to proceed (no auto-advance)
- Continue on Chat selection skips onboarding and goes to /chat
- Tooltip (i) click on Chat card doesn't trigger navigation
- Step 1 footer Back button goes back to splash (label is "Back")
- Splash "Skip Onboarding" renamed to "Skip to Chat", navigates to /chat
- Toast close button moved away from edge

* studio: align Skip to Chat button, add Skip to footer

- Sidebar "Skip to Chat" now uses primary (green) Button style with
  arrow icon, full width, aligned like step items. Shows on all steps.
- Footer: added "Skip" outline button next to Continue that goes
  directly to /studio with progress saved (markOnboardingDone)

* studio: change default max steps from 30 to 60 in toggle hook

The DEFAULT_MAX_STEPS in use-max-steps-epochs-toggle.ts was still 30,
used as fallback when toggling from epochs back to max steps.

* studio: extend context length options to 262K

CONTEXT_LENGTHS now includes 65536, 131072, 262144 in addition to
the existing 512-32768 range. The onboarding step filters these by
the model's max_position_embeddings (e.g. Nemotron-3-Nano-4B has
262144), showing powers of 2 up to the model's maximum.

* studio: auto-select LoRA vs QLoRA based on model size and GPU memory

After selecting a model in onboarding, detect the total model weight
file size from HF Hub (safetensors/bin files). Then estimate memory
needed: model_size_gb * 1.5 * context_scale, where context_scale is:
  - <=8192 tokens: 1.0x
  - 8193-16383 tokens: 1.7x
  - 16384-32767 tokens: 2.0x
  - >=32768 tokens: 4.0x

If the estimate fits in free GPU VRAM, default to LoRA (16-bit).
Otherwise default to QLoRA (4-bit).
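The heuristic above can be sketched as follows (function name is hypothetical; the scale factors are the commit's tiers, not a general rule):

```python
def pick_training_method(model_size_gb: float, context_len: int,
                         vram_free_gb: float) -> str:
    """Default to LoRA (16-bit) if the estimate fits in free VRAM,
    otherwise fall back to QLoRA (4-bit)."""
    if context_len >= 32768:
        scale = 4.0
    elif context_len >= 16384:
        scale = 2.0
    elif context_len > 8192:
        scale = 1.7
    else:
        scale = 1.0
    needed_gb = model_size_gb * 1.5 * scale
    return "lora" if needed_gb <= vram_free_gb else "qlora"
```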

Backend changes:
- Add model_size_bytes to ModelDetails (models.py)
- Add _get_model_size_bytes() using HfApi.repo_info (routes/models.py)
- Add vram_free_gb to get_gpu_summary (hardware.py)

Frontend changes:
- Add autoSelectTrainingMethod() in training-config-store.ts
- Called after model defaults are loaded
- Add model_size_bytes to ModelConfigResponse type
- Add vramFreeGb to HardwareInfo hook

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: rename "Importing ML libraries..." to "Importing Unsloth..."

* studio: show model/dataset in training status, fix LoRA/QLoRA casing

- Training status now shows 'Training "model_name"' and 'Dataset = ...'
  instead of generic "Starting training..."
- Fix Studio progress section to show QLoRA/LoRA instead of QLORA/LORA

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: rename 'Skip to Chat' to 'Skip Onboarding' on splash screen

* studio: add presence_penalty support for chat inference

Add presence_penalty as a parameter across the full stack:
- Backend: llama_cpp.py generate_chat_completion/with_tools, Pydantic
  models (inference.py), routes/inference.py pass-through
- Frontend: InferenceParams type, DEFAULT_INFERENCE_PARAMS (0.0),
  chat-adapter.ts payload, chat-settings-sheet.tsx slider (0-2),
  model defaults loading from inference_defaults.json
- Set Qwen3.5 default presence_penalty to 1.5 per official docs
- Default for unknown models is 0.0 (off)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: fix Chat card deselecting Text and aligning with other cards

* studio: fix presence_penalty not loading from inference defaults

The inference_config.py load_inference_config() was not including
presence_penalty in the returned config dict, so the Qwen3.5
default of 1.5 from inference_defaults.json never reached the
frontend. Added it to the config builder.

* studio: add delete button for cached models in model selector

Add trash icon on each downloaded model row (GGUF and safetensors) with
confirmation dialog. Backend DELETE /api/models/delete-cached endpoint
uses huggingface_hub scan_cache_dir + delete_revisions to cleanly remove
cached repos, refusing if the model is currently loaded.

* studio: restore inference defaults, reasoning, and tools on page refresh

On page refresh with a model already loaded, the frontend was not
re-applying model-specific inference defaults (presence_penalty,
temperature, etc.) or restoring reasoning/tools support flags.

Backend: Add inference config, supports_reasoning, supports_tools,
and context_length to InferenceStatusResponse.

Frontend: In the refresh callback, when an active model is detected,
apply mergeRecommendedInference and restore reasoning/tools flags
with proper Qwen3.5 size-based defaults.

* studio: fix delete dialog closing before async completes

Prevent AlertDialogAction's default close behavior with
e.preventDefault() so the dialog stays open during deletion.
Also block onOpenChange dismiss while deleting is in progress.

* fix: add Dict and Any imports to inference models

* studio: fix Qwen3.5 reasoning threshold in frontend load path

The frontend loadModel handler had the old threshold (<=2) for
disabling reasoning on small Qwen3.5 models. Changed to <9 to
match the backend. This was causing 4B to not properly disable
thinking by default when auto-loaded.

* studio: move GGUF delete to per-variant level

For GGUF repos, the trash icon now appears on each downloaded variant
row inside the quantization expander instead of on the repo-level row.
Backend accepts optional variant param to delete specific GGUF files
(blob + symlink) rather than the entire repo cache.

* studio: restore ggufContextLength on page refresh

The Max Tokens slider was capped at 32768 on page refresh because
ggufContextLength was not restored from the status response.
Now set it from statusRes.context_length on reconnect.

* fix: remove <think> from Qwen3.5 response template marker

The train-on-responses-only feature uses template markers to find
where the assistant response starts. The Qwen3.5 response marker
included '<think>\n' which is only present when thinking mode is
enabled. With thinking disabled (default for <9B), the marker
never matched, causing 100% of samples to be dropped.

Changed response marker from '<|im_start|>assistant\n<think>\n'
to '<|im_start|>assistant\n' which works regardless of thinking mode.
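A quick illustration of why the old marker dropped every sample (the rendered turn below is a hypothetical example of thinking-disabled output):

```python
# Hypothetical rendered chat turn with thinking disabled (no <think> block):
rendered = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\nHello!<|im_end|>"
)

old_marker = "<|im_start|>assistant\n<think>\n"  # only matches thinking-mode output
new_marker = "<|im_start|>assistant\n"           # matches either mode
```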

* studio: fix sloth ASCII art alignment in training overlay

* fix: correct sloth ASCII art alignment to match Unsloth banner

* studio: add Python and terminal tool calling to chat

Register python and terminal tools alongside web search. Python
executor validates imports (stdlib only) via unsloth_zoo
rl_environments, runs code in a subprocess sandbox with 5-min
timeout and cancel support. Terminal executor blocks dangerous
commands (rm, sudo, etc.) and runs in a temp directory.

Update llama_cpp tool loop to show tool-specific status messages
and pass cancel_event through to executors. Rename composer
toggle from "Search" to "Tools" and show TerminalIcon for
execution status pills.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: fix Nemotron/transformers 5.x support, onboarding navigation, port binding

Backend:
- Dynamic transformers 5.x detection via tokenizer_config.json fetch
  (checks for TokenizersBackend class, cached per-model)
- Bump transformers 5.x version from 5.2.0 to 5.3.0 across all workers,
  setup scripts (setup.sh, setup.ps1)
- Auto-enable trust_remote_code for unsloth/* models needing transformers 5.x
  (workaround for NemotronH config parsing bug in transformers)
- Auto-install mamba-ssm/causal-conv1d for SSM models (NemotronH, Falcon-H1)
  with --no-build-isolation --no-deps to avoid torch version conflicts
- Add SO_REUSEADDR to port check in run.py (fixes Colab proxy stale connection
  falsely reporting port as in-use)
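The SO_REUSEADDR port check can be sketched like this (the helper name is hypothetical; run.py's actual check may differ):

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Bind-test a port with SO_REUSEADDR so a stale TIME_WAIT
    connection (e.g. left by a proxy) does not make the port look
    busy; an actively listening socket still reports it as in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False
```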

Frontend:
- Fix "Skip to Chat" navigation: use window.location.href instead of React
  Router navigate() to bypass useEffect redirect race
- Fix "Skip Onboarding" on splash: navigates to /studio (not /chat)
- Fix onboarding guard: only check isOnboardingDone() on initial mount
- Fix Chat card on step 1: add sr-only spacer for consistent alignment
- Fix Chat+Text both selected: clear RadioGroup value when Chat is selected

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: split tools toggle into Search and Code buttons

Replace the single "Tools" toggle with two independent toggles:
- "Search" (globe icon) enables web search only
- "Code" (terminal icon) enables Python and terminal execution

Add enabled_tools list field to the inference payload so the
backend only registers the tools the user has toggled on. Both
toggles appear in the main composer and the compare composer.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: fix tool calling import validation and error logging

Replace unsloth_zoo-dependent import checker with a standalone
ast-based validator using sys.stdlib_module_names. This properly
blocks non-stdlib imports (numpy, requests, etc.) and returns a
clear error message to the model so it can rewrite using only
stdlib.

Add full traceback to tool streaming error logs for debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: parse gpt-oss harmony channels for clean safetensors chat output

gpt-oss models emit multi-channel output via harmony protocol tokens
(<|channel|>analysis<|message|>... and <|channel|>final<|message|>...).
TextIteratorStreamer with skip_special_tokens=True strips the special
tokens but leaves channel names concatenated with content, producing
garbled output like "analysisWe need to...assistantfinalHello!".

Add HarmonyTextStreamer that decodes with skip_special_tokens=False,
parses harmony markup via regex, and emits <think>analysis</think>
for the analysis channel and plain text for the final channel --
reusing the existing frontend reasoning UI.

Also expose supports_reasoning=True for non-GGUF gpt-oss models in
the /status endpoint so the frontend enables the Think toggle.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: use unsloth_zoo for Python sandbox validation

Set UNSLOTH_IS_PRESENT=1 and import check_python_modules and
check_signal_escape_patterns directly from unsloth_zoo instead
of a standalone fallback. This gives us the full Unsloth
validation including stdlib-only import checks and signal/timeout
escape pattern detection.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: allow all imports in Python tool sandbox

Remove stdlib-only import restriction. Keep signal escape
pattern detection via unsloth_zoo for safety.

* studio: fix ReadTimeout on tool streaming final pass

The 0.5s read timeout used for cancel-checking during streaming
also fires when waiting for the first response from llama-server
(e.g. reasoning model thinking for 15+ seconds). Add
_stream_with_retry() context manager that retries on ReadTimeout
while checking cancel_event, so the model has unlimited time to
think before producing the first token. Applied to both the
regular streaming path and the tool-calling final pass.
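A simplified sketch of the retry idea (the real helper is a context manager around the HTTP stream and catches requests' ReadTimeout; the stand-in exception and plain loop below are assumptions for illustration):

```python
import threading

class ReadTimeout(Exception):
    """Stand-in for requests.exceptions.ReadTimeout in this sketch."""

def stream_with_retry(start_stream, cancel_event: threading.Event):
    """Retry the initial request on ReadTimeout until the stream starts
    or the user cancels, so a reasoning model may think arbitrarily
    long before producing its first token."""
    while not cancel_event.is_set():
        try:
            return start_stream()
        except ReadTimeout:
            continue  # short read timeout fired; poll cancel and retry
    return None
```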

* fix: rewrite HarmonyTextStreamer with stateful incremental parsing

The delta-on-transformed approach had two critical bugs:

1. Before the full <|channel|>X<|message|> pattern was complete, the
   strip-tokens fallback emitted "analysis" as plain text. Then when
   the regex matched, _transform returned a completely different format
   (<think>...</think>) and the delta was computed against the wrong
   base string, producing fragments like "think>", "nk>", ">".

2. Even with full matches, the closing </think> tag shifted position
   as content grew, so text[prev_len:] produced garbled deltas.

Replace with stateful incremental parsing that:
- Buffers until a complete channel+message pair is seen
- Emits <think> once when analysis channel first appears
- Streams analysis content deltas (computed on channel content directly)
- Emits </think> once when final channel first appears
- Streams final content deltas
- Closes open think tags in end()

Also skip the generic all_special_tokens stripping in
_clean_generated_text for gpt-oss since HarmonyTextStreamer already
produces clean output and the generic stripping was mangling <think>
tags.
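The buffering scheme above can be sketched as a small state machine (this is an illustrative sketch, not the actual HarmonyTextStreamer; it assumes output starts with a channel header, as gpt-oss harmony output does, and ignores tokens split mid-"<|"):

```python
import re

# Matches a complete channel header; partial headers stay buffered.
_MARKER = re.compile(r"<\|channel\|>(analysis|final)<\|message\|>")

class HarmonyParser:
    def __init__(self):
        self.buf = ""
        self.channel = None       # None | "analysis" | "final"
        self.think_open = False

    def feed(self, raw: str) -> str:
        self.buf += raw
        out = []
        while self.buf:
            if self.channel is None:
                m = _MARKER.search(self.buf)
                if not m:
                    break  # header may still be arriving; keep buffering
                self.channel = m.group(1)
                self.buf = self.buf[m.end():]
                if self.channel == "analysis" and not self.think_open:
                    out.append("<think>")
                    self.think_open = True
                elif self.channel == "final" and self.think_open:
                    out.append("</think>")
                    self.think_open = False
            else:
                cut = self.buf.find("<|")
                if cut == -1:          # pure content: stream it all
                    out.append(self.buf)
                    self.buf = ""
                elif cut > 0:          # content up to the next token
                    out.append(self.buf[:cut])
                    self.buf = self.buf[cut:]
                else:                  # token boundary: look for a header
                    self.channel = None
        return "".join(out)

    def end(self) -> str:
        return "</think>" if self.think_open else ""
```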

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: strip all <|...|> tokens in gpt-oss cleanup, not just harmony subset

The gpt-oss tokenizer has added tokens like <|return|> (id=200002) that
are not part of the harmony channel protocol but can leak into output.
The previous regex only stripped channel|message|start|end tokens.

Broaden the _clean_generated_text regex for gpt-oss to <\|[a-z_]+\|>
which catches all pipe-delimited tokens (return, constrain, reserved,
etc.) without matching <think>/<\/think> tags.
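The broadened cleanup pass amounts to a one-line substitution (helper name is illustrative; note this is only a final sweep over already-parsed text, since channel names themselves are plain text once the pipes are stripped):

```python
import re

# Strips any lowercase pipe-delimited token (<|return|>, <|constrain|>, ...)
# without touching <think>/</think> tags.
PIPE_TOKEN = re.compile(r"<\|[a-z_]+\|>")

def clean_gpt_oss(text: str) -> str:
    return PIPE_TOKEN.sub("", text)
```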

Verified: gpt-oss all_special_tokens are only <|return|>,
<|reserved_200017|>, <|startoftext|> -- none overlap with <think>.
The harmony tokens (channel, message, start, end) are added_tokens
but not in all_special_tokens.

* fix: hide config-only model repos from cached models list

Repos that only have metadata/config files cached (no .safetensors or
.bin weight files) were showing up in the Downloaded list with tiny
sizes like "1.8 KB" or "24 KB". These are just leftover config
snapshots from architecture checks, not usable models.

Filter the cached-models endpoint to only include repos that contain
actual model weight files (.safetensors or .bin).

* studio: fix toast description text contrast in dark mode

Add explicit !text-muted-foreground to toast description classNames
so secondary text (e.g. "Releases VRAM and resets inference state.")
is readable in dark mode.

* studio: fix Chat card icon alignment with size-4 spacer

Replace sr-only span (takes no space) with a size-4 shrink-0 div
matching the RadioGroupItem dimensions in other cards, so the Chat
icon aligns vertically with Text/Audio/Vision/Embeddings icons.

---------

Co-authored-by: workspace <user@workspace.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Manan17 <shahmanan170602@gmail.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-03-17 07:46:07 -07:00
Daniel Han
eeffa4c065
studio: web search, KV cache dtype, training progress, inference fixes
## Summary
- Add web search tool calling for GGUF models (Search toggle, DuckDuckGo via ddgs)
- Add KV cache dtype dropdown (f16/bf16/q8_0/q5_1/q4_1) in Chat Settings
- Fix Qwen3/3.5 inference defaults per official docs (thinking on/off params)
- Enable reasoning by default for Qwen3.5 4B and 9B
- Replace "Generating" toast with inline spinner
- Fix stop button via asyncio.to_thread (event loop no longer blocked)
- Fix CUDA 12 compat lib paths for llama-server on CUDA 13 systems
- Fix auto-load model name not appearing in selector
- Training progress messages + dataset_num_proc fix

Integrated PRs:
- #4327 (imagineer99): BETA badge alignment (already in tree)
- #4340 (Manan Shah): prioritize training models in model selection
- #4344 (Roland Tannous): setup.sh macOS python version compatibility
- #4345 (Manan Shah): revamp model+dataset checking logic
2026-03-17 00:30:01 -07:00
Datta Nimmaturi
bbf6414caf
Fix formatting of launch command in setup.ps1 2026-03-17 10:19:16 +05:30
Roland Tannous
46f9be3dd1
fix: Resolve CUDA toolkit mismatch on multi-CUDA Windows systems (#4324)
* fix: prefer existing CUDA_PATH toolkit to avoid version mismatch on multi-CUDA systems

* fix: validate GPU arch support before accepting CUDA toolkit (sm_120 + CUDA 12.4 fallback)

* debug: add temporary CUDA compatibility check print

* fix: auto-copy CUDA VS integration files when missing (No CUDA toolset found)

* fix: return false when nvcc --list-gpu-arch unavailable (reject old toolkit, scan for newer)

* fix: re-sanitize CUDA env vars before cmake build (survives Refresh-Environment)

* fix: use --list-gpu-code (sm_*) instead of --list-gpu-arch (compute_*) for arch probing
2026-03-16 18:16:16 +04:00
Roland Tannous
f44857b2df
PR: Windows Setup Improvements (#4299)
* quiet llama.cpp build, smarter CUDA install via winget, accept Python 3.11-3.13

* studio: hide Python traceback when setup script exits with error

* setup.ps1: auto-add Python Scripts dir to PATH so 'unsloth' command works in new terminals

* setup.ps1: fix GPU check to run nvidia-smi instead of just checking command existence

* setup.ps1: fix PATH check to use exact entry comparison instead of substring match

* setup.ps1: validate Python probe exit code before persisting Scripts PATH
2026-03-14 23:59:49 +04:00
Daniel Han
6dda8c4c23 studio: revert combined targets, keep separate builds
Restore separate cmake --build calls for llama-server and
llama-quantize on both setup.sh and setup.ps1. The combined
approach made llama-quantize failure fatal, but it was originally
best-effort (|| true on Linux, [WARN] on Windows). The timing
savings from combining was only ~2.7s, not worth the semantic
change.

The Ninja + arch detection speedups are preserved (55s vs 1m 37s).
2026-03-14 00:54:09 -07:00
Daniel Han
e4a5da8d96 studio: combine llama.cpp build targets in setup.ps1
Build llama-server and llama-quantize in a single cmake --build
invocation on Windows, matching the same optimization done in
setup.sh. This allows MSBuild to better parallelize the two targets.

The Visual Studio generator is kept as-is (not switching to Ninja on
Windows since VS generator is the standard approach and interacts
with MSBuild).
2026-03-14 00:54:09 -07:00
Roland Tannous
47654cb91c Final cleanup 2026-03-12 18:28:04 +00:00
Roland Tannous
400b6ecede Update setup.ps1 2026-03-12 02:44:25 +04:00
Roland Tannous
1087216cb5 Merge branch 'fix/pre-merge-cleanup' into feature/merge-build-final 2026-03-11 20:56:49 +00:00
Manan17
fbccac8cee shifting setup & co inside studio 2026-03-11 20:19:52 +00:00
Roland Tannous
daa50d0756 Revert "Merge pull request #347 from unslothai/feature/studio-storage-roots"
This reverts commit 6b43e33ff1, reversing
changes made to 9edadaf21f.
2026-03-10 01:52:47 +00:00
Manan17
32569fc8a8 shifting setup & co inside studio 2026-03-09 23:48:31 +00:00
Renamed from setup.ps1