unsloth/studio/setup.sh
Daniel Han cad8c6ad05
Add AMD ROCm/HIP support across installer and hardware detection (#4720)
* Add ROCm detection to install.sh and expand shell tests

Add AMD ROCm GPU detection to get_torch_index_url() in install.sh.
When nvidia-smi is not found, probe for ROCm via amd-smi, /opt/rocm
version file, hipconfig, dpkg-query, and rpm.

Includes validation guard for malformed _rocm_tag, Debian epoch prefix
stripping, ROCm 7.2+ cap to rocm7.1 index, bitsandbytes AMD install,
and status messaging. Shell tests expanded to 23 cases.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add ROCm torch reinstall support to install_python_stack.py

Add _detect_rocm_version() and _ensure_rocm_torch() to detect when a
Linux host has ROCm but the venv received CPU-only torch, and reinstall
with the correct ROCm wheels. Covers ROCm 6.0 through 7.1 with a
30-second timeout on the torch GPU probe subprocess.
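
A minimal sketch of that probe, with assumed names (the real helper in
install_python_stack.py may be shaped differently):

    import subprocess, sys

    def probe_torch_backend(python_exe=sys.executable, timeout=30):
        """Ask a venv's torch which accelerator backend it was built for."""
        code = (
            "import torch; "
            "print(getattr(torch.version, 'hip', None) "
            "or getattr(torch.version, 'cuda', None) or 'cpu')"
        )
        # The 30-second timeout keeps a hung torch import from stalling the
        # installer; a later hardening commit catches TimeoutExpired/OSError.
        out = subprocess.run([python_exe, "-c", code],
                             capture_output=True, text=True, timeout=timeout)
        return out.stdout.strip() if out.returncode == 0 else None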

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add ROCm support to llama.cpp prebuilt installer

Add has_rocm field to HostInfo, extend detect_host() to probe for ROCm
via hipcc/amd-smi/rocm-smi/ROCM_PATH, and route ROCm hosts to upstream
prebuilts (Linux ROCm 7.2 prebuilt with source fallback, Windows HIP
prebuilt with CPU fallback). Add linux-rocm and windows-hip install
kinds to runtime_patterns_for_choice().

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add IS_ROCM hardware flag and fix AMD error message

Add IS_ROCM flag to hardware.py detect_hardware() (set when
torch.version.hip is present, DeviceType stays CUDA). Export IS_ROCM
from __init__.py. Add "rocm" key to get_package_versions().

Replace "We do not support AMD" error in tokenizer_utils.py with a
helpful message pointing to ROCm installation docs.
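
The flag itself reduces to roughly the following (a sketch; the real
detect_hardware() plumbing in hardware.py is omitted):

    import torch

    # ROCm builds of torch reuse the whole torch.cuda.* API surface, so
    # DeviceType stays CUDA; torch.version.hip is the discriminator.
    IS_ROCM = getattr(torch.version, "hip", None) is not None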

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add comprehensive ROCm support test suite (68 tests)

Add tests/studio/install/test_rocm_support.py covering all ROCm code
paths across install_llama_prebuilt.py, install_python_stack.py,
hardware.py, tokenizer_utils.py, and install.sh. All tests use mocks
and run without AMD hardware.

Covers: asset selection (11), runtime patterns (5), HostInfo (4),
ROCm version detection (9), torch reinstall (9), index mapping (8),
hardware flag (8), tokenizer message (2), install.sh structure (10),
and live regression (1).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden ROCm support: probe error handling, version cap, validation

Address review findings from 8 independent reviewers:

- Wrap _ensure_rocm_torch() torch probe in try/except for
  TimeoutExpired and OSError so a hung or broken torch import does not
  crash the installer (8/8 reviewers flagged this)
- Add torch>=2.4,<2.11.0 version cap to the ROCm reinstall path to
  prevent installing unsupported torch 2.11.0 from the rocm7.1 index
- Use with-statement for file reads in _detect_rocm_version() to avoid
  resource leaks
- Handle ROCM_PATH="" correctly (use `or "/opt/rocm"` instead of
  default parameter to avoid relative path resolution)
- Strengthen shell validation guard from rocm[0-9] to rocm[1-9] to
  reject rocm0.x tags that would produce nonexistent PyTorch index URLs
- Switch shell version cap from blocklist to allowlist (rocm6.*|rocm7.0*
  |rocm7.1* pass through, everything else caps to rocm7.1) so future
  ROCm 10+ does not fall through to a nonexistent index
- Add sorted() to _ROCM_TORCH_INDEX lookup for defensive ordering
- Fix test_probe_timeout_handled: replace zero-assertion test with
  proper assertions verifying reinstall proceeds after timeout

* Clean up rocm_paths list construction in detect_host()

Filter None from the ROCM_PATH env var lookup at list construction time
instead of relying on the inline `if p` guard in the any() call.

* Require actual AMD GPU presence before selecting ROCm paths

All 8 reviewers across 2 cycles independently flagged that ROCm
detection used toolkit/filesystem hints (hipcc, /opt/rocm, rocm-core)
as a proxy for GPU presence, which would misroute CPU-only or NVIDIA
hosts that happen to have ROCm tools installed.

Now all 3 detection points (install.sh, install_python_stack.py,
install_llama_prebuilt.py) probe for an actual AMD GPU before
entering the ROCm path:

- install.sh: check rocminfo for gfx* GPU names, or amd-smi list
  for device rows, before version detection
- install_python_stack.py: new _has_rocm_gpu() function probes
  rocminfo and amd-smi list before _ensure_rocm_torch() proceeds
- install_llama_prebuilt.py: detect_host() probes rocminfo/amd-smi
  list instead of just checking tool existence or directory paths

Also:
- Shell test mock amd-smi now handles "list" subcommand
- Python tests updated to mock _has_rocm_gpu where needed
- Added test_no_gpu_with_rocm_tools_skips to verify the new guard
- Test index lookups now use sorted() to match production code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden hipconfig version parsing and torch probe compatibility

- Add parts[1].isdigit() check in hipconfig version parsing to handle
  versions like "6.3-HIP" where the minor component has non-numeric
  suffix (strip "-" prefix before int() conversion)
- Use getattr() in torch probe subprocess to safely handle old or
  custom torch builds that may lack torch.version.hip/cuda attributes
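
A hedged sketch of the parsing guard (function name and exact splitting
are assumptions, not the PR's literal code):

    def parse_rocm_major_minor(text):
        """Parse 'major.minor' from outputs like '6.3.41134' or '6.3-HIP'."""
        parts = text.strip().split(".")
        if len(parts) < 2:
            return None
        minor = parts[1].split("-")[0]            # '3-HIP' -> '3'
        if parts[0].isdigit() and minor.isdigit():
            return (int(parts[0]), int(minor))
        return None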

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Strengthen AMD GPU detection and add NVIDIA precedence guard

- Change amd-smi list detection from any-non-empty-output to requiring
  "gpu" marker in output, matching the shell-side NR>1 check. Prevents
  false positives from header-only amd-smi list output.
- Add nvidia-smi check at the top of _ensure_rocm_torch() so mixed
  AMD+NVIDIA hosts preserve NVIDIA precedence (matching install.sh and
  install_llama_prebuilt.py behavior).
- Apply the same amd-smi marker fix to install_llama_prebuilt.py
  detect_host() for consistency.

* Add Windows-specific ROCm/HIP detection in detect_host()

The previous detect_host() ROCm check used rocminfo and amd-smi list
which are Linux-only tools. On Windows, has_rocm would always be False,
making the Windows HIP prebuilt path at line 1794 unreachable.

Now detect_host() uses platform-specific detection:
- Linux: rocminfo (check for gfx GPU names) or amd-smi list
- Windows: hipinfo.exe, amd-smi, or amdhip64.dll on PATH

This allows Windows AMD users to get the HIP prebuilt binary instead
of silently falling through to the CPU prebuilt.

* Add AMD ROCm gaps: Mamba/SSM source builds, GPU monitoring, Windows messaging, RDNA expansion

- worker.py: Add HIP detection to causal-conv1d/mamba-ssm probe, check
  for hipcc before ROCm source builds, improve status messages and error
  reporting, add timeout and uv support for the source build fallback
- amd.py: New AMD GPU monitoring module via amd-smi metric --json,
  mirroring nvidia.py structure (utilization, temperature, power, VRAM)
- hardware.py: Branch to amd.py when IS_ROCM is True for GPU utilization,
  visible GPU queries, and physical GPU count
- install_python_stack.py: Detect AMD GPUs on Windows and warn that
  ROCm-enabled PyTorch must be installed manually
- kernels/utils.py: Expand is_rdna() to cover RDNA2 (gfx1030-1032),
  RDNA3 (gfx1102-1103), RDNA3.5 (gfx1150-1152) alongside existing entries
- tests: Add 32 new tests covering all changes (95/95 pass)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden ROCm detection, fix VRAM heuristic, and expand RDNA2 coverage

- Windows ROCm detection: validate actual GPU presence via hipinfo/amd-smi
  output markers instead of just checking tool existence on PATH
- _ensure_rocm_torch: validate nvidia-smi actually reports a GPU before
  giving NVIDIA precedence (fixes AMD-only hosts with stale NVIDIA tools)
- amd.py _parse_numeric: handle dict-shaped metric objects from newer
  amd-smi versions ({"value": 10, "unit": "W"}) and strip MiB/GiB units
- amd.py VRAM heuristic: raise threshold from 100k to 10M to correctly
  handle MI300X (192 GB = 196608 MB) and other high-VRAM GPUs
- amd.py visible GPU: use AMD-reported GPU IDs instead of enumerate index
  so non-dense sets like CUDA_VISIBLE_DEVICES=1,3 report correctly
- install.sh: add ROCm <6.0 minimum version guard (no PyTorch wheels
  exist for older versions); fix rocm7.1* glob to not match rocm7.10+
- is_rdna: add gfx1033-1036 for RDNA2 mobile GPUs (RX 6600M etc.)
- worker.py: increase ROCm source build timeout from 600s to 1800s;
  fix success log message for ROCm source builds
- Tests: update mocks for _has_usable_nvidia_gpu, add RDNA2 target asserts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add HIP_VISIBLE_DEVICES support, unit-aware VRAM parsing, Windows GPU validation

- hardware.py: check HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES on ROCm
  before falling back to CUDA_VISIBLE_DEVICES, so multi-GPU AMD setups with
  HIP-specific env vars report the correct visible device set
- amd.py: add _parse_memory_mb() that reads "unit" from dict-shaped amd-smi
  JSON (e.g. {"value": 192, "unit": "GiB"}) and converts to MB correctly;
  fixes MI300X VRAM misreported as 0.19 GB instead of 192 GB
- install_python_stack.py: Windows AMD warning now validates actual GPU
  presence via hipinfo/amd-smi output markers before printing
- install_llama_prebuilt.py: restore amdhip64.dll fallback for Windows HIP
  detection after tool-based checks, so Windows HIP installs without CLI
  tools on PATH are still detected
- hardware.py: fix IS_ROCM comment to accurately describe its role

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix HIP_VISIBLE_DEVICES empty-string handling in GPU visibility spec

Use explicit None checks instead of Python `or` operator when reading
HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES, so that an empty string
("") is correctly honored as "no visible GPUs" rather than silently
falling through to CUDA_VISIBLE_DEVICES on mixed ROCm+CUDA systems.
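
Illustrative sketch of the distinction (assumed helper name):

    import os

    def visible_gpu_spec():
        """First visibility env var that is set, honoring empty strings."""
        for var in ("HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES",
                    "CUDA_VISIBLE_DEVICES"):
            value = os.environ.get(var)
            if value is not None:     # "" is meaningful: zero visible GPUs
                return value
        return None                   # unset everywhere: all GPUs visible

Using `os.environ.get(var) or ...` instead would treat "" the same as
unset and keep falling through, which is exactly the bug described above.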

* Fix IS_ROCM test assertion for multi-line formatting

* Cap torchvision/torchaudio versions, remove amdhip64.dll fallback, fix visible GPU count

- Cap torchvision<0.26.0 and torchaudio<2.11.0 alongside torch<2.11.0 in
  both install.sh and install_python_stack.py to prevent resolver from
  selecting incompatible companion packages from ROCm wheel index
- Remove amdhip64.dll fallback in Windows ROCm detection (DLL presence
  without hipinfo/amd-smi is not proof of GPU existence)
- Fix get_visible_gpu_count() to use _get_parent_visible_gpu_spec() which
  respects HIP_VISIBLE_DEVICES/ROCR_VISIBLE_DEVICES on ROCm hosts

* Attribute is_rdna() RDNA2/3/3.5/4 expansion to PR #4428

The is_rdna() expansion to cover RDNA2 (gfx1030-1036), RDNA3
(gfx1100-1103), RDNA3.5 (gfx1150-1152), and RDNA4 (gfx1200-1201)
architectures is based on the original work from PR #4428.

Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com>
Co-authored-by: billishyahao <bill.he@amd.com>

* Support AMD Radeon for studio (#4770)

Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>

* Remove ROCm test files from main PR

Move test_rocm_support.py and shell test additions to a separate PR
to keep the main ROCm support PR focused on implementation changes.

* Fix installer and hardware detection issues for PR #4720

- Fix empty _tri_arg passed to uv pip install in Radeon path (causes
  "Empty field is not allowed for PEP508" error)
- Fix Radeon fallback: use ROCm index instead of CPU-only when
  repo.radeon.com is unreachable (TORCH_INDEX_URL already has ROCm)
- Use $TORCH_CONSTRAINT in fallback paths instead of hardcoded strings
- Fix _pick_radeon_wheel: relax suffix to match manylinux_2_28_x86_64
  wheels (AMD Radeon repo does not use bare linux_x86_64 platform tag)
- Fix IS_ROCM export: use __getattr__ so callers always see the live
  value after detect_hardware() runs
- Fix apply_gpu_ids: set HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES
  on ROCm so _get_parent_visible_gpu_spec picks up narrowed GPU set
- Fix _parse_memory_mb: distinguish GB (1000 MB) from GiB (1024 MiB)
- Add amd-smi version as a fallback in _detect_rocm_version
- Fix trailing whitespace and missing newline at EOF in install.sh

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix GPU detection false positives and add missing health groups

- Fix _has_rocm_gpu() false positive: require "GPU: <number>" data rows
  from amd-smi list, not just header containing "gpu"
- Apply same fix in detect_host() in install_llama_prebuilt.py
- Add runtime_payload_health_groups for linux-rocm and windows-hip so
  partial/corrupt ROCm/HIP prebuilt installs are properly detected
- Add bitsandbytes install to Radeon fallback paths (was only in the
  success path, skipped when repo.radeon.com was unreachable)
- Keep DEVICE/CHAT_ONLY as direct imports in __init__.py (matching main)
  and only use __getattr__ for IS_ROCM

* Fix _ensure_rocm_torch and Windows AMD warning false positives

- _ensure_rocm_torch: only skip when HIP is already present, not for
  CUDA builds (which are unusable on AMD-only hosts). Fixes the case
  where a venv has a stale CUDA wheel and the repair step is skipped.
- Windows AMD warning: use GPU data row check (same as Linux fix) to
  avoid false positives from amd-smi list header-only output.

* Fix amd-smi GPU detection for GPU[N] output format

Older amd-smi versions output "GPU[0] : Card series: ..." instead of
"GPU: 0". The regex now matches both "GPU: <digit>" and "GPU[<digit>"
formats to detect actual GPU data rows.
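
Roughly (an illustrative pattern; the PR's exact regex may differ):

    import re

    # Matches "GPU: 0 ..." (newer amd-smi) and "GPU[0] : ..." (older
    # amd-smi), but not a bare "gpu" header row.
    _GPU_ROW = re.compile(r"^GPU(?::\s*\d|\[\d)")

    def has_amd_gpu_rows(amd_smi_list_output):
        return any(_GPU_ROW.match(line.strip())
                   for line in amd_smi_list_output.splitlines())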

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden AMD GPU detection against false positives

- install.sh: replace weak amd-smi list check (awk 'NR>1 && NF') with
  strict pattern matching GPU data rows (/^GPU[[:space:]]*[:\[]/)
- All files: reject rocminfo gfx000 (CPU HSA agent) by requiring
  gfx[1-9] instead of gfx[0-9] in the rocminfo GPU probe
- Fixes false positives on hosts with ROCm tools but no AMD GPU

* Remove duplicate comment from pre-commit merge

* Refactor: deduplicate AMD detection, consolidate bitsandbytes, clean up imports

- Extract _has_amd_rocm_gpu() shell function to avoid duplicating the
  rocminfo/amd-smi GPU detection logic in get_torch_index_url and
  the Radeon auto-detect block
- Consolidate bitsandbytes install into a single case block after torch
  install (was duplicated 4 times across Radeon success/fallback paths)
- Move math and re imports to top of amd.py (were inline in functions)
- Add _smi_query() helper in hardware.py to centralize IS_ROCM backend
  selection for get_gpu_utilization and get_visible_gpu_utilization

Addresses Gemini code review suggestions.

* Fix VRAM parsing for string values and GB/GiB consistency

- Extract unit from string-valued VRAM fields (e.g. "192 GiB") so
  _parse_memory_mb correctly applies the unit multiplier instead of
  treating the value as bare MB
- Treat GB and GiB identically (both as binary x1024) since GPU tools
  including amd-smi use binary units even when labeling them "GB"
- Fixes incorrect VRAM reporting on MI300-class cards (was showing
  ~0.19 GB instead of 192 GB for string-valued outputs)
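
A minimal sketch of the resulting unit handling, assuming the helper
accepts a bare number, a "192 GiB" string, or an amd-smi dict:

    def parse_memory_mb(raw):
        """Normalize amd-smi memory values to MB (illustrative sketch)."""
        value, unit = raw, "mb"
        if isinstance(raw, dict):             # {"value": 192, "unit": "GiB"}
            value, unit = raw.get("value"), str(raw.get("unit") or "mb")
        elif isinstance(raw, str):            # "192 GiB"
            fields = raw.split()
            value = fields[0] if fields else None
            if len(fields) > 1:
                unit = fields[1]
        try:
            value = float(value)
        except (TypeError, ValueError):
            return None
        if unit.strip().lower() in ("gib", "gb"):   # GPU tools label binary "GB"
            return value * 1024.0
        return value                          # bare numbers / MiB pass through

For the MI300X example: {"value": 192, "unit": "GiB"} -> 196608.0 MB.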

* Add --no-cache to uv for ROCm HIP source builds

Avoid stale cache artifacts from partial HIP source builds when
uv is used for causal-conv1d/mamba-ssm compilation on ROCm.
The pip path already uses --no-cache-dir; this adds the uv equivalent
(--no-cache) only when is_hip is True.

* Fix critical: initialize _amd_gpu_radeon before case block

_amd_gpu_radeon was only set inside the */rocm*) case arm, so on
NVIDIA/CPU/macOS paths where TORCH_INDEX_URL does not contain "rocm",
the variable was unbound. With set -u (nounset) enabled, this crashes
the installer for every non-AMD user.

Move initialization to before the case block so it is always defined.

* Fix Windows AMD: route has_rocm hosts to HIP prebuilt path

resolve_release_asset_choice was selecting windows-cpu for all Windows
x86_64 hosts including those with has_rocm=True. Windows AMD users
should fall through to resolve_upstream_asset_choice which tries the
HIP prebuilt first. Add "not host.has_rocm" guard to the published
windows-cpu selection.

* Harden ROCm detection, Radeon wheel fallback, and HIP visibility

Addresses review findings from parallel reviewers on PR #4720:

- install.sh: add _has_usable_nvidia_gpu() helper requiring nvidia-smi -L
  to actually list a GPU before treating the host as NVIDIA. Fixes the
  stale-nvidia-smi-on-PATH regression where AMD-only hosts fell into the
  CUDA branch.
- install.sh: fix hipconfig awk blocks to propagate a non-zero exit code
  when the output is not a recognisable version string, so the ||-chain
  continues to dpkg-query / rpm instead of terminating early.
- install.sh: fail-closed on Radeon wheel fallback. When torch,
  torchvision or torchaudio is missing from the Radeon repo for the
  active Python tag, fall back to the standard ROCm index instead of
  silently mixing Radeon wheels with PyPI defaults. Quote all wheel
  arguments individually so wheel filenames cannot be word-split or
  glob-expanded.
- install_llama_prebuilt.py: detect_host() now requires nvidia-smi -L to
  list a GPU before setting has_physical_nvidia. Routes AMD ROCm hosts
  with a broken leftover nvidia-smi to the ROCm path instead of
  misclassifying them as NVIDIA.
- install_llama_prebuilt.py: scan upstream assets for any rocm-<version>
  prebuilt instead of hard-coding rocm-7.2, so ROCm 6.x / 7.0 / 7.1 / 7.3+
  users pick up a matching upstream prebuilt when one exists.
- install_llama_prebuilt.py: validate_server() adds --n-gpu-layers 1 for
  linux-rocm and windows-hip hosts, so new HIP prebuilts are preflighted
  on the GPU path instead of passing validation on CPU only.
- install_llama_prebuilt.py: restore the published windows-cpu fallback
  for AMD Windows hosts without a HIP prebuilt so hash-approved bundles
  are still preferred over the raw upstream CPU asset.
- install_python_stack.py: drop the /opt/rocm / hipcc gate in
  _ensure_rocm_torch() and rely on _has_rocm_gpu(). Runtime-only ROCm
  installs (package-managed minimal installs, Radeon software) that ship
  amd-smi / rocminfo without hipcc can now repair a CPU-only venv via
  "unsloth studio update". Adds an explicit IS_WINDOWS / IS_MACOS guard.
- studio/backend/utils/hardware/amd.py: honour HIP_VISIBLE_DEVICES /
  ROCR_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES in
  get_primary_gpu_utilization(). A process restricted to GPU 2 now
  reports metrics for GPU 2 instead of physical GPU 0. Tighten the plain
  bytes unit detection to an explicit allowlist.
- studio/backend/utils/hardware/hardware.py: route
  get_backend_visible_gpu_info()'s backend_cuda_visible_devices field
  through a helper that reads HIP_VISIBLE_DEVICES on ROCm. Drop the
  unconditional "(rocm=False)" suffix in apply_gpu_ids() logs.

* Fix round 2 regressions: ROCm validate_server and Windows HIP routing

Follow-up to 810b833b addressing review findings on the first round of
hardening commits:

- install_llama_prebuilt.py validate_server: gate --n-gpu-layers on the
  resolved install_kind instead of host.has_rocm. AMD Windows hosts
  without a HIP prebuilt fall back to windows-cpu and must not be
  validated with GPU layers; thread install_kind through from the
  caller.
- install_llama_prebuilt.py resolve_release_asset_choice: reinstate the
  "not has_rocm" guard on the published windows-cpu bundle so AMD
  Windows hosts reach resolve_upstream_asset_choice() where the new
  HIP prebuilt path lives. Prefer a published windows-hip bundle first
  when one exists, fall through to upstream HIP + upstream CPU
  otherwise.
- install_llama_prebuilt.py detect_host: also set has_physical_nvidia
  when the secondary --query-gpu block confirms a working NVIDIA GPU,
  so older nvidia-smi versions without -L support do not silently skip
  the Linux diagnostics that key off has_physical_nvidia.
- install_llama_prebuilt.py: drop redundant "import re as _re" /
  "import re as _re_rocm" local aliases in favour of the existing
  top-level "import re".
- install_python_stack.py _ensure_rocm_torch: run the AMD
  bitsandbytes install unconditionally after the HIP-torch probe so
  "unsloth studio update" on venvs that already have ROCm torch still
  gains the AMD bitsandbytes build.
- install.sh: add a non-x86_64 early-exit to get_torch_index_url() so
  aarch64 / arm64 Linux hosts do not hit the ROCm wheel index
  (PyTorch only publishes ROCm wheels for linux_x86_64).
- install.sh: add bitsandbytes install to the migrated-environment
  branch so upgrades pick it up for ROCm hosts instead of only the
  fresh-install path.
- install.sh: in the Radeon wheel path, pass version constraints +
  --no-index --find-links to uv instead of explicit wheel URLs so a
  version-compatible torch / torchvision / torchaudio triple is
  resolved, rather than picking the highest-version wheel for each
  package independently.
- studio/backend/utils/hardware/amd.py _first_visible_amd_gpu_id: fall
  through to lower-priority visibility env vars when the first entry
  is malformed (leading comma, all-whitespace first token) instead of
  silently returning GPU 0.

* Fix round 3 findings: x86_64 guard, ROCm version clip, Radeon deps

Address issues surfaced by the round 3 reviewers on top of 8636fa63:

- install_python_stack.py _ensure_rocm_torch: add the same `x86_64`
  guard that install.sh already has. Linux aarch64 / arm64 ROCm hosts
  must skip the repair path entirely; PyTorch only publishes ROCm
  wheels for linux_x86_64, and without this guard
  `unsloth studio update` aborts with a missing-wheel error on non
  x86_64 hosts.
- install_llama_prebuilt.py resolve_upstream_asset_choice: add a
  best-effort _detect_host_rocm_version() helper (reading
  /opt/rocm/.info/version, amd-smi version, hipconfig --version) and
  filter rocm_candidates to entries whose major.minor is <= host
  version. Falls back to the newest candidate only when no compatible
  one exists, so a ROCm 6.4 host downloads rocm-6.4 instead of being
  handed the numerically newest rocm-7.2 bundle (which fails preflight
  and forces a source build).
- install.sh: remove the round 2 --no-index switch from the Radeon
  wheel branch. --no-index forced uv to ignore PyPI entirely, which
  broke transitive dependency resolution (filelock, sympy, networkx,
  jinja2, fsspec, setuptools, typing-extensions, ...) on a fresh venv.
  Restore the round 1 explicit wheel URL invocation but add a
  torch / torchvision / torchaudio version-pair sanity check so a
  mismatched trio (e.g. torch 2.9.1 + torchvision 0.23.0 + torchaudio
  2.9.0) falls back to the standard ROCm index instead of installing a
  broken combination.
- install_python_stack.py _ensure_rocm_torch: restructure the
  "tag is None" path so it no longer short-circuits the bitsandbytes
  install. On a ROCm runtime older than anything in
  _ROCM_TORCH_INDEX, print the "no wheel" warning but still run the
  AMD bitsandbytes install.
- studio/backend/core/training/worker.py: restore the pre-PR
  "no timeout" behaviour for non-HIP causal-conv1d / mamba-ssm source
  builds. The round 2 "timeout = 1800 if is_hip else 300" cap aborts
  slow non-HIP builds (Linux aarch64, unsupported torch/CUDA combos)
  after 5 minutes; omit timeout for the non-HIP branch so the cap
  only applies to ROCm source builds.

* Fix round 4 findings: apply_gpu_ids env inheritance, Radeon X.Y, bitsandbytes gate

Address remaining issues surfaced by the round 4 reviewers:

- studio/backend/utils/hardware/hardware.py apply_gpu_ids: mirror the
  selection into HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES whenever
  the caller already had a ROCm visibility env var set, not only when
  IS_ROCM has already been set by detect_hardware(). Training and
  inference workers call apply_gpu_ids() before detect_hardware()
  runs, so the old guard would leave a forked ROCm worker with a
  stale HIP_VISIBLE_DEVICES mask that no longer matched the
  narrowed CUDA_VISIBLE_DEVICES selection.
- install.sh get_radeon_wheel_url: accept X.Y ROCm versions in
  addition to X.Y.Z. The `/opt/rocm/.info/version` file and some
  hipconfig versions report only two components, and the Radeon
  repository publishes both rocm-rel-X.Y.Z/ and rocm-rel-X.Y/
  directories, so treating X.Y as invalid caused Radeon hosts to fall
  back to the generic ROCm index even when a matching AMD wheel set
  existed.
- install_python_stack.py _ensure_rocm_torch: only install the AMD
  bitsandbytes build when the venv actually has a ROCm-compatible
  torch (either already present or just installed by this function).
  Previously the bitsandbytes install ran unconditionally, which
  could leave an AMD bitsandbytes layered on top of a CPU/CUDA torch
  on hosts where the ROCm runtime is older than any entry in
  _ROCM_TORCH_INDEX. Also add --force-reinstall so an existing
  CPU/CUDA bitsandbytes is replaced by the AMD build during upgrades.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix gemini findings: amd-smi metric envelope validation and dict-wrapped GPU id

Two medium-severity defensive fixes from the gemini-code-assist review on
the AMD monitoring backend:

1. _extract_gpu_metrics may return a dict where every value is None when
   amd-smi succeeds (zero exit) but the JSON envelope contains no usable
   fields (error response, unsupported card). The new _has_real_metrics
   helper lets get_primary_gpu_utilization surface available:False and
   lets get_visible_gpu_utilization skip ghost device rows so the UI
   does not render placeholder cards with empty numbers.

2. Newer amd-smi versions wrap scalar fields as {"value": 0, "unit":
   "none"}, including the per-GPU id. The previous int(raw_id) call
   silently fell back to the enumeration index in that case, losing the
   real GPU id. Routing raw_id through the existing _parse_numeric
   helper handles bare ints, floats, strings, and the dict shape
   uniformly, with a debug log on parse failure.
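
A hedged sketch of a _parse_numeric-style helper covering those shapes
(the real helper in amd.py also emits a debug log on failure):

    def parse_numeric(raw):
        """Accept bare ints/floats, numeric strings, and {'value': ...} dicts."""
        if isinstance(raw, dict):             # {"value": 0, "unit": "none"}
            raw = raw.get("value")
        if isinstance(raw, bool):             # bool subclasses int; reject it
            return None
        if isinstance(raw, (int, float)):
            return float(raw)
        if isinstance(raw, str):
            token = raw.strip().split()[0] if raw.strip() else ""
            try:
                return float(token)
            except ValueError:
                return None                   # e.g. "N/A"
        return None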

* Fix gemini round 2 findings: explicit length guard on ROCm version file parser

Both _detect_rocm_version (install_python_stack.py) and
_detect_host_rocm_version (install_llama_prebuilt.py) read /opt/rocm/.info/version
or $ROCM_PATH/lib/rocm_version, split on "." and unconditionally accessed
parts[1]. The surrounding broad `except Exception: pass` already swallowed
the resulting IndexError, so a one-component file like "6\n" did fall
through to the next detection source -- but the control flow relied on
exception handling instead of an explicit check.

Add `if len(parts) >= 2:` guards in both helpers so the loop falls through
on its own without raising. Behaviour is unchanged for the common multi-
component case; the previously-silent IndexError path becomes an explicit
no-op.
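
Sketch of the guarded shape (illustrative names):

    def read_rocm_version_file(path):
        """Return (major, minor) from an 'X.Y[.Z]' version file, else None."""
        try:
            with open(path, encoding="utf-8") as fh:
                parts = fh.read().strip().split(".")
            if len(parts) >= 2:   # a bare "6\n" now falls through explicitly
                return (int(parts[0]), int(parts[1]))
        except Exception:
            pass                  # unreadable file: try the next source
        return None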

* Fix gemini round 3: include has_rocm in validate_server fallback path

When validate_server is called without an explicit install_kind (older
call sites that have not been updated), the fallback was only enabling
--n-gpu-layers for NVIDIA and macOS arm64 hosts. AMD ROCm Linux hosts
fell through to the CPU validation path even though the prebuilt being
exercised was a HIP binary.

Add host.has_rocm to the fallback expression so the GPU offload flag is
applied consistently with the install_kind=='linux-rocm' / 'windows-hip'
branches above.

* Fix gemini round 4: remove risky bytes-vs-MB heuristic in _parse_memory_mb

The previous heuristic divided any bare number above 10_000_000 by
1024*1024 on the assumption that large unit-less values were bytes.
This misclassified small VRAM allocations: 5 MB of used VRAM reported
as 5_242_880 bytes without a unit would be taken at face value and
render as 5_242_880 MB (~5 TB) in the monitoring UI.

Modern amd-smi always provides explicit units (MiB/GiB dict form),
and legacy amd-smi returns bare numbers in MB -- the heuristic never
had a real workload to handle. Drop it and default to MB for bare
numeric input, keeping the existing unit-aware branches for dict /
string inputs unchanged.

The unrelated gemini suggestion to "default minor to 0" in the
amd-smi version awk parser was intentionally NOT applied: rocm7.0
and rocm7.1 ship different wheel sets, so silently substituting 0
for a missing minor could install the wrong wheels. The existing
reject-and-fall-through behaviour is safer.

* Fix gemini round 5: POSIX compliance and leading-comma visibility parsing

Three medium findings from gemini-code-assist addressed in this commit:

1. _pick_radeon_wheel used grep -o and sort -V, both GNU extensions
   that are not in POSIX and break on BSD/BusyBox coreutils. install.sh
   has a #!/bin/sh shebang so the whole pipeline was rewritten as a
   single awk script that extracts all href="..." hits on each line,
   filters to wheels matching the package prefix and python tag, and
   picks the newest version via zero-padded lexical comparison. No
   external sort or grep is needed.

2. _first_visible_amd_gpu_id in the AMD monitoring backend treated a
   leading comma (e.g. HIP_VISIBLE_DEVICES=",1") as "fall through to
   the next env var", which is surprising given the clear intent to
   narrow to device 1. Filter empty tokens after the split and return
   the first real one. An all-commas value ("," / ",,,") still falls
   through because no real tokens exist; the empty-string and "-1"
   explicit-zero cases are unchanged.
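
Sketch of point 2 with assumed names:

    def first_visible_gpu_id(spec):
        """First real token of a visibility list like ',1' or '1,3'."""
        tokens = [t.strip() for t in spec.split(",")]
        tokens = [t for t in tokens if t]   # drop "" from leading/doubled commas
        if not tokens:
            return None                     # "," / ",,," -> fall through
        return tokens[0]                    # ",1" -> "1"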

The unrelated amd-smi version awk parser suggestion was not applied
(see round 4 commit message for rationale: defaulting a missing minor
to 0 could silently install the wrong ROCm wheel set).

* Fix 20-reviewer.py findings: base drift, Radeon %2B, dpkg/rpm fallback, bnb, backend label

Consolidated fix batch from a 20-parallel reviewer.py run on the current
head. Each fix is drawn from a high-consensus finding and addresses a
real bug or feature gap, not a stylistic preference.

1. install.sh: bump `unsloth>=2026.4.2` -> `unsloth>=2026.4.4` at five
   call sites so this branch no longer regresses main's version floor
   (main bumped to 2026.4.4 in #4876). Without this, merging 4720 would
   silently downgrade the minimum version pin for fresh installs.

2. install.sh: URL-decode Radeon wheel names before extracting the
   torch / torchvision / torchaudio version strings. Real wheel URLs
   from repo.radeon.com are percent-encoded ("torch-2.10.0%2Brocm7.2.0...")
   so the previous `[+-]` terminator in the sed regex never matched,
   `_torch_ver` stayed empty, `_radeon_versions_match` stayed false,
   and every Radeon consumer install silently fell back to the generic
   ROCm index. Now decode %2B -> + first, then extract, then validate.

3. install.sh: the two AMD bitsandbytes install lines were running
   `uv pip install "bitsandbytes>=0.49.1"` without `--force-reinstall`,
   so upgrades where the venv already has a CPU/CUDA bitsandbytes
   satisfying the constraint would keep the stale non-AMD wheel. Add
   `--force-reinstall --no-cache-dir` to both call sites, matching the
   pattern already used in install_python_stack.py::_ensure_rocm_torch.

4. install_python_stack.py and install_llama_prebuilt.py: add
   `dpkg-query -W rocm-core` and `rpm -q rocm-core` fallbacks to the
   Python-side ROCm version detectors so they match the chain in
   install.sh::get_torch_index_url. Package-managed ROCm installs
   (Debian/Ubuntu/RHEL/Fedora distro packages) can expose GPUs via
   rocminfo/amd-smi but still lack /opt/rocm/.info/version, hipconfig,
   or amd-smi `version` output -- without these fallbacks, `unsloth
   studio update` on such hosts returned None and skipped the ROCm
   torch repair. Also strip the dpkg epoch prefix ("1:6.3.0-1") before
   parsing so epoch-annotated packages parse correctly.

5. hardware.py: add a `_backend_label(device)` helper that returns
   "rocm" when IS_ROCM is set and the device is DeviceType.CUDA, and
   use it for every `"backend": ...` emission in JSON responses served
   to the Studio frontend. Internally we still represent ROCm hosts as
   DeviceType.CUDA (ROCm torch reuses the whole torch.cuda.* API
   surface), but the user-facing API now correctly reports "rocm" on
   AMD boxes instead of labeling them as "cuda".
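
A sketch of the helper named in point 5 (the real one presumably reads
the module-level IS_ROCM flag rather than taking a parameter):

    def _backend_label(device, is_rocm):
        # Internally ROCm hosts stay DeviceType.CUDA; only the user-facing
        # label is swapped so the frontend shows "rocm" on AMD boxes.
        if is_rocm and device.value == "cuda":
            return "rocm"
        return device.value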

All 250 simulation scenarios pass (was 233 before this batch: added 17
new regression tests covering the version pin, %2B decoding, bnb
force-reinstall flags, dpkg/rpm fallback presence, and the
_backend_label helper's four-way truth table).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix gemini round 6 + URL audit: amd.py defensive checks, rocm6.5+ clip to 6.4

Two rounds of fixes in one commit, plus a full URL audit of every PyPI /
download.pytorch.org / repo.radeon.com reference the PR introduces.

amd.py (4 medium gemini findings on commit b3627bc2):

1. _extract_gpu_metrics used `and vram_total_mb` as part of the vram_util
   gate. The follow-up `vram_total_mb > 0` already handles the division
   guard, but the truthiness check was redundant and slightly surprising
   for a 0.0 valid value. Replace with explicit `is not None and > 0`
   for both vram_util and power_util.

2. get_physical_gpu_count called `data.get("gpu", ...)` without guarding
   for non-dict envelopes. A scalar / string JSON response from amd-smi
   would raise AttributeError. Add an isinstance(data, dict) check and
   return None for unexpected shapes.

3. get_visible_gpu_utilization had the same .get() exposure on the outer
   envelope. Rewrite the gpu_list extraction as an explicit
   list/dict/else cascade so a malformed scalar envelope produces
   gpu_list=[data] and continues without raising.

4. The same function's per-entry loop also called gpu_data.get() on
   whatever was inside gpu_list. If a scalar ever leaks into the list
   (directly or via the previous fix's fallback), _extract_gpu_metrics
   would raise on the first .get() inside the helper. Skip non-dict
   entries in the loop before extracting metrics.

install.sh (URL audit finding, previously flagged by 20-reviewer as #13):

5. get_torch_index_url used `rocm6.*` in the rocm tag case statement,
   which matched rocm6.5 and rocm6.6 and emitted
   download.pytorch.org/whl/rocm6.5 -- which returns HTTP 403 because
   PyTorch only publishes rocm 5.7, 6.0-6.4, 7.0-7.2. Enumerate the
   supported 6.x minors explicitly and add a rocm6.* fallback branch
   that clips to rocm6.4 (the last supported 6.x wheel set).

URL audit results (all URLs PR 4720 references):
- 14/14 download.pytorch.org/whl/{cpu,cu118,cu124,cu126,cu128,cu130,
  rocm6.0..6.4,rocm7.0..7.2} return HTTP 200.
- 9/9 repo.radeon.com/rocm/manylinux/rocm-rel-{5.7,6.0,6.1,6.2,6.3,
  6.4,7.0,7.1,7.2}/ return HTTP 200.
- X.Y.Z patch directories exist for 7.0.2, 7.1.1, 7.2.1 but NOT for
  6.3.0, 6.4.0, 6.2.1 -- install.sh already handles this via the X.Y.Z
  -> X.Y fallback sed in the Radeon wheel install block.
- Docs links (rocm.docs.amd.com, docs.unsloth.ai AMD guide) and the
  llama.cpp GitHub releases API endpoint all return 200.

Test suite: 255 -> 258. New regression coverage:
- U17: get_physical_gpu_count tolerates scalar amd-smi envelope
- U18: get_visible_gpu_utilization tolerates scalar envelope
- U19a-c: vram_util / power_util return None on zero total, but
  vram_total_gb still echoes 0.0 (not None)
- A_rocm{6.5,6.6,6.9}_clips_to_rocm64: install.sh clips unsupported
  6.x minors to rocm6.4 instead of producing a 403 index URL

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix reviewer.py round 2: tokenizer AMD multi-GPU, --no-torch bnb, main.py backend label

Three high-confidence findings from a second 20-parallel reviewer.py run
on commit 7effb3ae. Triaged 15 total findings and applied the three that
were confirmed as real bugs; the rest were either false positives (e.g.
"migrated AMD venv not repaired" -- _ensure_rocm_torch runs downstream
via setup.sh regardless), design decisions (e.g. visibility mask env
vars not consulted in installer detection), or edge cases the existing
fallback logic already handles.

1. unsloth/tokenizer_utils.py [6/20]: the multi-GPU guard's shell probe
   runs `nvidia-smi --query-gpu=memory.used`, catches the failure, then
   only raises if `torch.cuda.is_available()` is False. On ROCm torch,
   torch.cuda.is_available() returns True (ROCm reuses the torch.cuda.*
   API), so the guard becomes dead code on AMD hosts and multi-GPU AMD
   setups slip through even though unsloth does not support them yet.
   Add a torch.cuda.device_count() > 1 fallback inside the except so
   AMD multi-visible-device setups are flagged consistently with the
   original CUDA memory check.

2. install.sh [1/20]: the fresh-install bitsandbytes block for AMD ROCm
   ran unconditionally when TORCH_INDEX_URL matched `*/rocm*`, even when
   SKIP_TORCH=true (from --no-torch or Intel Mac auto-detect). A user
   running `install.sh --no-torch` on an AMD host would still pull in
   bitsandbytes despite explicitly asking for GGUF-only mode. Wrap the
   case block in an outer `[ "$SKIP_TORCH" = false ]` guard.

3. studio/backend/main.py [3/20]: the /api/system endpoint returned
   `"device_backend": get_device().value`, which is "cuda" on ROCm
   hosts (because ROCm torch piggybacks on torch.cuda). Other endpoints
   (hardware.py) already use the _backend_label helper which swaps
   "cuda" -> "rocm" when IS_ROCM. Route /api/system through the same
   helper so the Studio UI reports the backend consistently across all
   endpoints.

4. studio/backend/tests/test_utils.py: update test_backend_matches_device
   to call _backend_label(get_device()) instead of raw get_device().value
   so the test matches the new contract and still passes on CUDA hosts.

Tests: 258 -> 261. New regression coverage:
- X08 main.py /api/system uses _backend_label
- X09 tokenizer multi-GPU guard has device_count() fallback
- X10 fresh-install bnb case block gated on SKIP_TORCH=false

* fix: prevent bitsandbytes from overwriting ROCm torch with CUDA wheels

During install, bitsandbytes was installed without --no-deps, causing
uv to resolve torch from PyPI (CUDA build) and silently overwrite the
ROCm wheels that were just installed in the previous step.

This happened in three places:
- install.sh: bitsandbytes install in both migrated and fresh paths
- install_python_stack.py: bitsandbytes install inside _ensure_rocm_torch()

Additionally, multiple install steps in install_python_stack.py (extras,
overrides, studio deps) can pull in CUDA torch via transitive
dependencies. A final _ensure_rocm_torch() call at the end of the
install sequence ensures ROCm torch is always in place at runtime.

All changes are gated behind ROCm-specific conditions and do not affect
NVIDIA, CPU-only, macOS, or Windows install paths.

Tested on AMD Instinct MI300X VF with ROCm 7.2.0 -- confirms
torch==2.10.0+rocm7.1 with HIP 7.1.25424 after install.

* fix: ROCm inference fallback -- skip Unsloth patching and bnb 4-bit on HIP

On AMD ROCm (HIP), two issues prevent the normal Unsloth inference path:

1. Unsloth's global monkey-patching of transformers model classes
   (LlamaRotaryEmbedding, attention modules) triggers
   _assert_async_cuda_kernel crashes on HIP during generation.
   Training uses different code paths and works fine.

2. bitsandbytes 4-bit matmul kernels also trigger HIP assertion
   failures on MI300X (CDNA3 / gfx942), even without Unsloth patching.

This commit adds a ROCm-specific inference fallback that:
- Skips importing Unsloth at module level (prevents global patching)
- Loads models in 16-bit with plain transformers + PEFT instead
- Resolves pre-quantized model names (e.g. "xxx-bnb-4bit" -> "xxx")
  since pre-quantized HF repos still trigger bnb codepaths
- Guards get_chat_template calls (unavailable without Unsloth import)
- Fixes max_seq_length=0 being passed to from_pretrained (GGUF
  semantics don't apply to transformers path)

The NVIDIA path is completely unchanged -- Unsloth import and
for_inference() optimization remain active. GGUF inference (via
llama-server/HIP) is unaffected since it never imports Python model
classes. AMD GPUs typically have large VRAM (e.g. 192GB on MI300X)
so 16-bit loading is practical for inference.

Tested on AMD Instinct MI300X VF (ROCm 7.2, HIP 7.1.25424):
- Simple generation: PASS
- Compare mode (base vs finetuned): PASS
- GGUF inference + tool calling: PASS (unaffected by this change)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: guard audio/vision inference on ROCm, remove unused import

- Add clear RuntimeError for audio/vision model inference on ROCm
  (these paths use Unsloth's FastModel/FastVisionModel which would
  crash on HIP; GGUF inference is the supported path on AMD)
- Remove unused `import os as _os` from the ROCm changes

* fix: amd-smi parsing for newer output format (gpu_data wrapper, mem_usage, temperature)

amd-smi on recent ROCm versions (7.x) wraps metric output in a
{"gpu_data": [...]} envelope instead of returning a raw list. This
caused get_primary_gpu_utilization() and get_visible_gpu_utilization()
to fail silently (returning available=False) because the GPU data
dict was never unwrapped.

Additionally:
- VRAM data moved from "vram" to "mem_usage" with "total_vram" /
  "used_vram" keys. Added fallback key lookup.
- Temperature "edge" sensor returns "N/A" on MI300X VF; the previous
  dict.get() chain returned the "N/A" string instead of falling
  through to "hotspot". Changed to a loop that checks each key until
  a parseable value is found.
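
Sketch of the loop-based fallback (names assumed):

    def first_parseable_temp(sensors, keys=("edge", "hotspot")):
        """First sensor reading that actually parses as a number.

        A dict.get() fallback chain returns the literal "N/A" string when
        the edge sensor is present but unsupported; validating each
        candidate in a loop keeps falling through to the next key.
        """
        for key in keys:
            raw = sensors.get(key)
            if isinstance(raw, dict):         # {"value": 44, "unit": "C"}
                raw = raw.get("value")
            try:
                return float(raw)
            except (TypeError, ValueError):   # None or "N/A": keep looking
                continue
        return None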

Tested on AMD Instinct MI300X VF (ROCm 7.2, amd-smi 24.x):
- GPU utilization: 0% (idle), up to 100% during training
- Temperature: 40-44C (from hotspot sensor)
- VRAM: 0.28/191.69 GB (idle)
- Power: 158-211W draw

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Bug fix detecting radeon (#4940)

* Bug fix detecting radeon

* Expanding GPU target for gfx1100*

* Generalize gfx family-prefix filter to cover gfx10/gfx12 as well

rocminfo on ROCm 6.1+ emits LLVM generic-family ISA lines alongside the
specific GPU (e.g. gfx11-generic next to gfx1100). The outer grep captures
the bare family prefix from the generic line, and passing that to
-DGPU_TARGETS breaks the HIP build because clang only accepts specific
gfxNNN ids.

The previous filter only special-cased gfx11. Generalize it so any bare
2-digit family prefix (gfx10, gfx11, gfx12, ...) is dropped whenever a
specific sibling target is present in the same list. No real AMD GPU has
a 2-digit gfx id, so the filter can only ever drop family prefixes and
never a real target.

Covers the existing gfx11 cases unchanged, and extends the same fix to
gfx10-1-generic / gfx10-3-generic (RDNA1/2) and gfx12-generic (RDNA4),
which would otherwise hit the same build failure on newer rocminfo.

---------

Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>

---------

Co-authored-by: Eda Z <eda.zhou@amd.com>
Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: billishyahao <bill.he@amd.com>
Co-authored-by: Iswarya Alex <47045679+iswaryaalex@users.noreply.github.com>
Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-10 01:56:12 -07:00

#!/usr/bin/env bash
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
RULE=$(printf '\342\224\200%.0s' {1..52})
# ── Maintainer-editable defaults ──────────────────────────────────────────
# Change these in the GitHub-hosted script so all users get updated defaults.
# User environment variables always override these baked-in values.
#
#   _DEFAULT_LLAMA_PR_FORCE : PR number to build by default ("" = normal path)
#   _DEFAULT_LLAMA_SOURCE   : git clone URL for source builds
#   _DEFAULT_LLAMA_TAG      : llama.cpp ref to build ("latest" = newest release,
#                             "master" = bleeding-edge, "bNNNN" = specific tag)
#                             Prefer "latest" over "master" -- "master" bypasses
#                             the prebuilt resolver (no matching GitHub release),
#                             forces a source build, and causes HTTP 422 errors.
#                             Only use "master" temporarily when the latest release
#                             is missing support for a new model architecture.
# ──────────────────────────────────────────────────────────────────────────
_DEFAULT_LLAMA_PR_FORCE=""
_DEFAULT_LLAMA_SOURCE="https://github.com/ggml-org/llama.cpp"
_DEFAULT_LLAMA_TAG="latest"
_DEFAULT_LLAMA_FORCE_COMPILE_REF="master"
# ── Colors (same palette as startup_banner / install_python_stack) ──
if [ -n "${NO_COLOR:-}" ]; then
C_TITLE= C_DIM= C_OK= C_WARN= C_ERR= C_RST=
elif [ -t 1 ] || [ -n "${FORCE_COLOR:-}" ]; then
C_TITLE=$'\033[38;5;150m'
C_DIM=$'\033[38;5;245m'
C_OK=$'\033[38;5;108m'
C_WARN=$'\033[38;5;136m'
C_ERR=$'\033[91m'
C_RST=$'\033[0m'
else
C_TITLE= C_DIM= C_OK= C_WARN= C_ERR= C_RST=
fi
# ── Output helpers ──
# Consistent column layout: 2-space indent, 15-char label (fits llama-quantize), then value.
# Usage: step <label> <message> [color] (color defaults to C_OK)
step() { printf "  ${C_DIM}%-15.15s${C_RST}${3:-$C_OK}%s${C_RST}\n" "$1" "$2"; }
substep() { printf "  ${C_DIM}%-15s%s${C_RST}\n" "" "$1"; }
_is_verbose() {
  [ "${UNSLOTH_VERBOSE:-0}" = "1" ]
}
verbose_substep() {
  if _is_verbose; then
    substep "$1"
  fi
  return 0
}
run_maybe_quiet() {
  if _is_verbose; then
    "$@"
  else
    "$@" > /dev/null 2>&1
  fi
}
# ── Helper: run command quietly, show output only on failure ──
_run_quiet() {
  local on_fail=$1
  local label=$2
  shift 2
  if _is_verbose; then
    local exit_code
    "$@" && return 0
    exit_code=$?
    step "error" "$label failed (exit code $exit_code)" "$C_ERR" >&2
    if [ "$on_fail" = "exit" ]; then
      exit "$exit_code"
    else
      return "$exit_code"
    fi
  fi
  local tmplog
  tmplog=$(mktemp) || {
    step "error" "Failed to create temporary file" "$C_ERR" >&2
    [ "$on_fail" = "exit" ] && exit 1 || return 1
  }
  if "$@" >"$tmplog" 2>&1; then
    rm -f "$tmplog"
    return 0
  else
    local exit_code=$?
    step "error" "$label failed (exit code $exit_code)" "$C_ERR" >&2
    cat "$tmplog" >&2
    rm -f "$tmplog"
    if [ "$on_fail" = "exit" ]; then
      exit "$exit_code"
    else
      return "$exit_code"
    fi
  fi
}
run_quiet() {
  _run_quiet exit "$@"
}
run_quiet_no_exit() {
  _run_quiet return "$@"
}
print_llama_error_log() {
  local log_file=$1
  [ -s "$log_file" ] || return 0
  substep "llama.cpp diagnostics (last 120 lines):"
  tail -n 120 "$log_file" | sed 's/^/ | /' >&2
}
installed_llama_prebuilt_release() {
  local install_dir=${1:-}
  local metadata_path="$install_dir/UNSLOTH_PREBUILT_INFO.json"
  [ -f "$metadata_path" ] || return 0
  python - "$metadata_path" <<'PY' 2>/dev/null || true
import json
import sys
from pathlib import Path

try:
    payload = json.loads(Path(sys.argv[1]).read_text(encoding="utf-8"))
except Exception:
    raise SystemExit(0)
if not isinstance(payload, dict):
    raise SystemExit(0)
repo = str(payload.get("published_repo") or "").strip()
release_tag = str(payload.get("release_tag") or "").strip()
llama_tag = str(payload.get("tag") or "").strip()
if not repo or not release_tag:
    raise SystemExit(0)
message = f"installed release: {repo}@{release_tag}"
if llama_tag and llama_tag != release_tag:
    message += f" (tag {llama_tag})"
print(message)
PY
}
print_installed_llama_prebuilt_release() {
  local install_dir=${1:-}
  local installed_release
  installed_release="$(installed_llama_prebuilt_release "$install_dir")"
  if [ -n "$installed_release" ]; then
    substep "$installed_release"
  fi
}
# ── Banner ──
echo ""
printf " ${C_TITLE}%s${C_RST}\n" "🦥 Unsloth Studio Setup"
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
verbose_substep "verbose diagnostics enabled"
_LLAMA_ONLY="${UNSLOTH_STUDIO_LLAMA_ONLY:-0}"
if [ "$_LLAMA_ONLY" = "1" ]; then
substep "llama.cpp only mode"
fi
# ── Clean up stale caches ──
rm -rf "$REPO_ROOT/unsloth_compiled_cache"
rm -rf "$SCRIPT_DIR/backend/unsloth_compiled_cache"
rm -rf "$SCRIPT_DIR/tmp/unsloth_compiled_cache"
# ── Detect Colab ──
IS_COLAB=false
keynames=$'\n'$(printenv | cut -d= -f1)
if [[ "$keynames" == *$'\nCOLAB_'* ]]; then
IS_COLAB=true
fi
if [ "$_LLAMA_ONLY" != "1" ]; then
# ── Frontend ──
_NEED_FRONTEND_BUILD=true
if [ -d "$SCRIPT_DIR/frontend/dist" ]; then
_changed=$(find "$SCRIPT_DIR/frontend" -maxdepth 1 -type f \
! -name 'bun.lock' \
-newer "$SCRIPT_DIR/frontend/dist" -print -quit 2>/dev/null)
if [ -z "$_changed" ]; then
_changed=$(find "$SCRIPT_DIR/frontend/src" "$SCRIPT_DIR/frontend/public" \
-type f -newer "$SCRIPT_DIR/frontend/dist" -print -quit 2>/dev/null) || true
fi
[ -z "$_changed" ] && _NEED_FRONTEND_BUILD=false
fi
if [ "$_NEED_FRONTEND_BUILD" = false ]; then
step "frontend" "up to date"
verbose_substep "frontend dist is newer than source inputs"
else
# ── Node ──
NEED_NODE=true
if command -v node &>/dev/null && command -v npm &>/dev/null; then
NODE_MAJOR=$(node -v | sed 's/v//' | cut -d. -f1)
NODE_MINOR=$(node -v | sed 's/v//' | cut -d. -f2)
NPM_MAJOR=$(npm -v | cut -d. -f1)
# Vite 8 requires Node ^20.19.0 || >=22.12.0
NODE_OK=false
if [ "$NODE_MAJOR" -eq 20 ] && [ "$NODE_MINOR" -ge 19 ]; then NODE_OK=true; fi
if [ "$NODE_MAJOR" -eq 22 ] && [ "$NODE_MINOR" -ge 12 ]; then NODE_OK=true; fi
if [ "$NODE_MAJOR" -ge 23 ]; then NODE_OK=true; fi
if [ "$NODE_OK" = true ] && [ "$NPM_MAJOR" -ge 11 ]; then
NEED_NODE=false
else
if [ "$IS_COLAB" = true ] && [ "$NODE_OK" = true ]; then
# In Colab, just upgrade npm directly - nvm doesn't work well
if [ "$NPM_MAJOR" -lt 11 ]; then
substep "upgrading npm..."
run_maybe_quiet npm install -g npm@latest
fi
NEED_NODE=false
fi
fi
fi
if [ "$NEED_NODE" = true ]; then
substep "installing nvm..."
export NODE_OPTIONS=--dns-result-order=ipv4first
if _is_verbose; then
curl -so- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
else
curl -so- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash > /dev/null 2>&1
fi
export NVM_DIR="$HOME/.nvm"
set +u
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
if [ -f "$HOME/.npmrc" ]; then
if grep -qE '^\s*(prefix|globalconfig)\s*=' "$HOME/.npmrc"; then
sed -i.bak '/^\s*\(prefix\|globalconfig\)\s*=/d' "$HOME/.npmrc"
fi
fi
substep "installing Node LTS..."
run_quiet "nvm install" nvm install --lts
if _is_verbose; then
nvm use --lts
else
nvm use --lts > /dev/null 2>&1
fi
set -u
NODE_MAJOR=$(node -v | sed 's/v//' | cut -d. -f1)
NPM_MAJOR=$(npm -v | cut -d. -f1)
if [ "$NODE_MAJOR" -lt 20 ]; then
step "node" "FAILED -- version must be >= 20 (got $(node -v))" "$C_ERR"
exit 1
fi
if [ "$NPM_MAJOR" -lt 11 ]; then
substep "upgrading npm..."
run_quiet "npm update" npm install -g npm@latest
fi
fi
step "node" "$(node -v) | npm $(npm -v)"
verbose_substep "node check: NEED_NODE=$NEED_NODE NODE_OK=${NODE_OK:-unknown} NPM_MAJOR=${NPM_MAJOR:-unknown}"
# ── Install bun (optional, faster package installs) ──
# Uses npm to install bun globally -- Node is already guaranteed above,
# avoids platform-specific installers, PATH issues, and admin requirements.
if ! command -v bun &>/dev/null; then
substep "installing bun..."
if run_maybe_quiet npm install -g bun && command -v bun &>/dev/null; then
substep "bun installed ($(bun --version))"
else
substep "bun install skipped (npm will be used instead)"
fi
else
substep "bun already installed ($(bun --version))"
fi
# ── Build frontend ──
substep "building frontend..."
cd "$SCRIPT_DIR/frontend"
_HIDDEN_GITIGNORES=()
_dir="$(pwd)"
while [ "$_dir" != "/" ]; do
_dir="$(dirname "$_dir")"
if [ -f "$_dir/.gitignore" ] && grep -qx '\*' "$_dir/.gitignore" 2>/dev/null; then
mv "$_dir/.gitignore" "$_dir/.gitignore._twbuild"
_HIDDEN_GITIGNORES+=("$_dir/.gitignore")
fi
done
_restore_gitignores() {
for _gi in "${_HIDDEN_GITIGNORES[@]+"${_HIDDEN_GITIGNORES[@]}"}"; do
mv "${_gi}._twbuild" "$_gi" 2>/dev/null || true
done
}
trap _restore_gitignores EXIT
# Use bun for install if available (faster), fall back to npm.
# Build always uses npm (Node runtime -- avoids bun runtime issues on some platforms).
# NOTE: We intentionally avoid run_quiet for the bun install attempt because
# run_quiet calls exit on failure, which would kill the script before the npm
# fallback can run. Instead we capture output manually and only show it on failure.
#
# IMPORTANT: bun's package cache can become corrupt -- packages get stored
# with only metadata (package.json, README) but no actual content (bin/,
# lib/). When this happens bun install exits 0 but leaves binaries missing.
# We verify critical binaries after install. If missing, we clear the cache
# and retry once before falling back to npm.
_try_bun_install() {
local _log _exit_code=0
_log=$(mktemp)
bun install >"$_log" 2>&1 || _exit_code=$?
# bun may create .exe shims on Windows (Git Bash / MSYS2) instead of plain scripts
if [ "$_exit_code" -eq 0 ] \
&& { [ -x node_modules/.bin/tsc ] || [ -f node_modules/.bin/tsc.exe ] || [ -f node_modules/.bin/tsc.bunx ]; } \
&& { [ -x node_modules/.bin/vite ] || [ -f node_modules/.bin/vite.exe ] || [ -f node_modules/.bin/vite.bunx ]; }; then
rm -f "$_log"
return 0
fi
# Either bun install failed or it exited 0 but left packages missing
if [ "$_exit_code" -ne 0 ]; then
echo " bun install failed (exit code $_exit_code):"
else
echo " bun install exited 0 but critical binaries are missing:"
fi
sed 's/^/ | /' "$_log" >&2
rm -f "$_log"
rm -rf node_modules
return 1
}
_bun_install_ok=false
if command -v bun &>/dev/null; then
substep "using bun for package install (faster)"
if _try_bun_install; then
_bun_install_ok=true
else
# First attempt failed, likely due to corrupt cache entries.
# Clear the cache and retry once.
echo " Clearing bun cache and retrying..."
run_maybe_quiet bun pm cache rm || true
if _try_bun_install; then
_bun_install_ok=true
fi
fi
fi
if [ "$_bun_install_ok" = false ]; then
run_quiet_no_exit "npm install" npm install --no-fund --no-audit --loglevel=error
_npm_install_rc=$?
if [ "$_npm_install_rc" -ne 0 ]; then
exit "$_npm_install_rc"
fi
fi
run_quiet "npm run build" npm run build
_restore_gitignores
trap - EXIT
_MAX_CSS=$(find "$SCRIPT_DIR/frontend/dist/assets" -name '*.css' -exec wc -c {} + 2>/dev/null | sort -n | tail -1 | awk '{print $1}')
if [ -z "$_MAX_CSS" ]; then
step "frontend" "built (warning: no CSS emitted)" "$C_WARN"
elif [ "$_MAX_CSS" -lt 100000 ]; then
step "frontend" "built (warning: CSS may be truncated)" "$C_WARN"
else
step "frontend" "built"
fi
cd "$SCRIPT_DIR"
fi # end frontend build check
# ── oxc-validator runtime ──
if [ -d "$SCRIPT_DIR/backend/core/data_recipe/oxc-validator" ] && command -v npm &>/dev/null; then
cd "$SCRIPT_DIR/backend/core/data_recipe/oxc-validator"
run_quiet_no_exit "npm install (oxc validator runtime)" npm install --no-fund --no-audit --loglevel=error
_oxc_install_rc=$?
if [ "$_oxc_install_rc" -ne 0 ]; then
exit "$_oxc_install_rc"
fi
cd "$SCRIPT_DIR"
fi
# ── Python venv + deps ──
STUDIO_HOME="$HOME/.unsloth/studio"
VENV_DIR="$STUDIO_HOME/unsloth_studio"
VENV_T5_530_DIR="$STUDIO_HOME/.venv_t5_530"
VENV_T5_550_DIR="$STUDIO_HOME/.venv_t5_550"
[ -d "$REPO_ROOT/.venv" ] && rm -rf "$REPO_ROOT/.venv"
[ -d "$REPO_ROOT/.venv_overlay" ] && rm -rf "$REPO_ROOT/.venv_overlay"
[ -d "$REPO_ROOT/.venv_t5" ] && rm -rf "$REPO_ROOT/.venv_t5"
[ -d "$REPO_ROOT/.venv_t5_530" ] && rm -rf "$REPO_ROOT/.venv_t5_530"
[ -d "$REPO_ROOT/.venv_t5_550" ] && rm -rf "$REPO_ROOT/.venv_t5_550"
# Note: do NOT delete $STUDIO_HOME/.venv here — install.sh handles migration
_COLAB_NO_VENV=false
if [ ! -x "$VENV_DIR/bin/python" ]; then
if [ "$IS_COLAB" = true ]; then
# On Colab there is no Studio venv -- install backend deps into system Python.
# Strip all version constraints so pip keeps Colab's pre-installed
# packages (huggingface-hub, datasets, transformers) and only pulls
# in genuinely missing ones (structlog, fastapi, etc.).
substep "Colab detected, installing Studio backend dependencies..."
_COLAB_REQS_TMP="$(mktemp)"
sed 's/[><=!~;].*//' "$SCRIPT_DIR/backend/requirements/studio.txt" \
| grep -v '^#' | grep -v '^$' > "$_COLAB_REQS_TMP"
if [ -s "$_COLAB_REQS_TMP" ]; then
if ! run_quiet_no_exit "install Colab backend deps" pip install -q -r "$_COLAB_REQS_TMP"; then
rm -f "$_COLAB_REQS_TMP"
step "python" "Colab backend dependency install failed" "$C_ERR"
exit 1
fi
else
step "python" "no Colab backend dependencies resolved from requirements file" "$C_WARN"
fi
rm -f "$_COLAB_REQS_TMP"
_COLAB_NO_VENV=true
else
step "python" "venv not found at $VENV_DIR" "$C_ERR"
substep "Run install.sh first to create the environment:"
substep "curl -fsSL https://unsloth.ai/install.sh | sh"
exit 1
fi
else
source "$VENV_DIR/bin/activate"
fi
install_python_stack() {
python "$SCRIPT_DIR/install_python_stack.py"
}
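# Bootstrap uv if it is not already on PATH. The astral.sh installer places
# the binary in ~/.local/bin (hence the PATH export below); if the bootstrap
# fails we silently stay on pip.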
USE_UV=false
if command -v uv &>/dev/null; then
USE_UV=true
elif {
if _is_verbose; then
curl -LsSf https://astral.sh/uv/install.sh | sh
else
curl -LsSf https://astral.sh/uv/install.sh | sh > /dev/null 2>&1
fi
}; then
export PATH="$HOME/.local/bin:$PATH"
command -v uv &>/dev/null && USE_UV=true
fi
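# fast_install: prefer uv's pip-compatible installer for speed; any uv
# failure (or uv being unavailable) falls through to stock pip.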
fast_install() {
if [ "$USE_UV" = true ]; then
uv pip install --python "$(command -v python)" "$@" && return 0
fi
python -m pip install "$@"
}
cd "$SCRIPT_DIR"
# On Colab without a venv, skip venv-dependent Python deps sections but
# continue to llama.cpp install so GGUF inference is available.
if [ "$_COLAB_NO_VENV" = true ]; then
step "python" "backend deps installed into system Python"
substep "continuing to llama.cpp install for GGUF inference support"
fi
# ── Check if Python deps need updating ──
# Compare installed package version against PyPI latest.
# Skip all Python dependency work if versions match (fast update path).
# On Colab (no venv), skip this version check (it needs $VENV_DIR/bin/python)
# but still run install_python_stack below (it uses sys.executable).
_SKIP_PYTHON_DEPS=false
_SKIP_VERSION_CHECK=false
if [ "$_COLAB_NO_VENV" = true ]; then
_SKIP_VERSION_CHECK=true
fi
_PKG_NAME="${STUDIO_PACKAGE_NAME:-unsloth}"
if [ "$_SKIP_VERSION_CHECK" != true ] && [ "${SKIP_STUDIO_BASE:-0}" != "1" ] && [ "${STUDIO_LOCAL_INSTALL:-0}" != "1" ]; then
# Only check when NOT called from install.sh (which just installed the package)
INSTALLED_VER=$("$VENV_DIR/bin/python" -c "
from importlib.metadata import version
print(version('$_PKG_NAME'))
" 2>/dev/null || echo "")
LATEST_VER=$(curl -fsSL --max-time 5 "https://pypi.org/pypi/$_PKG_NAME/json" 2>/dev/null \
| "$VENV_DIR/bin/python" -c "import sys,json; print(json.load(sys.stdin)['info']['version'])" 2>/dev/null \
|| echo "")
if [ -n "$INSTALLED_VER" ] && [ -n "$LATEST_VER" ] && [ "$INSTALLED_VER" = "$LATEST_VER" ]; then
step "python" "$_PKG_NAME $INSTALLED_VER is up to date"
_SKIP_PYTHON_DEPS=true
elif [ -n "$INSTALLED_VER" ] && [ -n "$LATEST_VER" ]; then
substep "$_PKG_NAME $INSTALLED_VER -> $LATEST_VER available, updating..."
elif [ -z "$LATEST_VER" ]; then
substep "could not reach PyPI, updating to be safe..."
fi
fi
if [ "$_SKIP_PYTHON_DEPS" = false ]; then
install_python_stack
else
# the "up to date" step line was already printed by the version check above
verbose_substep "python deps check: installed=$_PKG_NAME@${INSTALLED_VER:-unknown} latest=${LATEST_VER:-unknown}"
fi
# ── 6b. Pre-install transformers 5.x into .venv_t5_530/ and .venv_t5_550/ ──
# Models like GLM-4.7-Flash, Qwen3 MoE need transformers>=5.3.0.
# Gemma 4 models need transformers>=5.5.0.
# Pre-install into separate directories to avoid runtime pip overhead.
# The training subprocess prepends the appropriate dir to sys.path.
#
# Runs outside the _SKIP_PYTHON_DEPS gate so that upgrades from legacy
# single .venv_t5 are always migrated to the tiered layout.
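# Consumer-side sketch (illustrative -- the real hook lives in the training
# subprocess, not in this script):
#   import os, sys
#   sys.path.insert(0, os.path.expanduser("~/.unsloth/studio/.venv_t5_550"))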
_NEED_T5_INSTALL=false
if [ -d "$STUDIO_HOME/.venv_t5" ]; then
# Legacy layout — migrate
rm -rf "$STUDIO_HOME/.venv_t5"
_NEED_T5_INSTALL=true
fi
[ ! -d "$VENV_T5_530_DIR" ] && _NEED_T5_INSTALL=true
[ ! -d "$VENV_T5_550_DIR" ] && _NEED_T5_INSTALL=true
# Also reinstall when python deps were updated (packages may need rebuild)
[ "$_SKIP_PYTHON_DEPS" = false ] && _NEED_T5_INSTALL=true
if [ "$_NEED_T5_INSTALL" = true ]; then
[ -d "$VENV_T5_530_DIR" ] && rm -rf "$VENV_T5_530_DIR"
mkdir -p "$VENV_T5_530_DIR"
run_quiet "install transformers 5.3.0" fast_install --target "$VENV_T5_530_DIR" --no-deps "transformers==5.3.0"
run_quiet "install huggingface_hub for t5_530" fast_install --target "$VENV_T5_530_DIR" --no-deps "huggingface_hub==1.8.0"
run_quiet "install hf_xet for t5_530" fast_install --target "$VENV_T5_530_DIR" --no-deps "hf_xet==1.4.2"
run_quiet "install tiktoken for t5_530" fast_install --target "$VENV_T5_530_DIR" "tiktoken"
step "transformers" "5.3.0 pre-installed"
[ -d "$VENV_T5_550_DIR" ] && rm -rf "$VENV_T5_550_DIR"
mkdir -p "$VENV_T5_550_DIR"
run_quiet "install transformers 5.5.0" fast_install --target "$VENV_T5_550_DIR" --no-deps "transformers==5.5.0"
run_quiet "install huggingface_hub for t5_550" fast_install --target "$VENV_T5_550_DIR" --no-deps "huggingface_hub==1.8.0"
run_quiet "install hf_xet for t5_550" fast_install --target "$VENV_T5_550_DIR" --no-deps "hf_xet==1.4.2"
run_quiet "install tiktoken for t5_550" fast_install --target "$VENV_T5_550_DIR" "tiktoken"
step "transformers" "5.5.0 pre-installed"
fi
fi
# ── 7. Prefer prebuilt llama.cpp bundles before any source build path ──
UNSLOTH_HOME="$HOME/.unsloth"
mkdir -p "$UNSLOTH_HOME"
LLAMA_CPP_DIR="$UNSLOTH_HOME/llama.cpp"
LLAMA_SERVER_BIN="$LLAMA_CPP_DIR/build/bin/llama-server"
_NEED_LLAMA_SOURCE_BUILD=false
_LLAMA_CPP_DEGRADED=false
_LLAMA_FORCE_COMPILE="${UNSLOTH_LLAMA_FORCE_COMPILE:-0}"
_REQUESTED_LLAMA_TAG="${UNSLOTH_LLAMA_TAG:-${_DEFAULT_LLAMA_TAG}}"
_HOST_SYSTEM="$(uname -s 2>/dev/null || true)"
if [ "$_HOST_SYSTEM" = "Darwin" ]; then
_HELPER_RELEASE_REPO="ggml-org/llama.cpp"
else
_HELPER_RELEASE_REPO="unslothai/llama.cpp"
fi
_LLAMA_PR="${UNSLOTH_LLAMA_PR:-}"
_SKIP_PREBUILT_INSTALL=false
_LLAMA_PR_FORCE="${UNSLOTH_LLAMA_PR_FORCE:-${_DEFAULT_LLAMA_PR_FORCE}}"
_LLAMA_SOURCE="${_DEFAULT_LLAMA_SOURCE}"
_LLAMA_SOURCE="${_LLAMA_SOURCE%.git}" # normalize: strip trailing .git
_RESOLVED_SOURCE_URL="$_LLAMA_SOURCE"
_RESOLVED_SOURCE_REF="$_REQUESTED_LLAMA_TAG"
_RESOLVED_SOURCE_REF_KIND="tag"
_RESOLVED_LLAMA_TAG="$_REQUESTED_LLAMA_TAG"
if [ "$_LLAMA_FORCE_COMPILE" = "1" ]; then
_NEED_LLAMA_SOURCE_BUILD=true
_SKIP_PREBUILT_INSTALL=true
fi
# Baked-in PR_FORCE promotes to _LLAMA_PR when user hasn't set one.
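# e.g. a build baked with _DEFAULT_LLAMA_PR_FORCE=1234 behaves as if the
# user had exported UNSLOTH_LLAMA_PR=1234 (illustrative PR number).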
if [ -z "$_LLAMA_PR" ] && [ -n "$_LLAMA_PR_FORCE" ] && \
[[ "$_LLAMA_PR_FORCE" =~ ^[0-9]+$ ]] && [ "$_LLAMA_PR_FORCE" -gt 0 ]; then
_LLAMA_PR="$_LLAMA_PR_FORCE"
step "llama.cpp" "baked-in PR_FORCE=$_LLAMA_PR_FORCE" "$C_WARN"
fi
if [ -n "$_LLAMA_PR" ]; then
if ! [[ "$_LLAMA_PR" =~ ^[0-9]+$ ]] || [ "$_LLAMA_PR" -le 0 ]; then
step "llama.cpp" "UNSLOTH_LLAMA_PR=$_LLAMA_PR is not a valid PR number" "$C_ERR"
exit 1
fi
step "llama.cpp" "UNSLOTH_LLAMA_PR=$_LLAMA_PR -- will build from PR head" "$C_WARN"
_RESOLVED_LLAMA_TAG="pr-$_LLAMA_PR"
_RESOLVED_SOURCE_URL="$_LLAMA_SOURCE"
_RESOLVED_SOURCE_REF="pr-$_LLAMA_PR"
_RESOLVED_SOURCE_REF_KIND="pull"
_NEED_LLAMA_SOURCE_BUILD=true
_SKIP_PREBUILT_INSTALL=true
fi
verbose_substep "requested llama.cpp tag: $_REQUESTED_LLAMA_TAG (repo: $_HELPER_RELEASE_REPO)"
if [ "$_LLAMA_FORCE_COMPILE" = "1" ]; then
step "llama.cpp" "UNSLOTH_LLAMA_FORCE_COMPILE=1 -- skipping prebuilt" "$C_WARN"
_NEED_LLAMA_SOURCE_BUILD=true
elif [ "${_SKIP_PREBUILT_INSTALL:-false}" = true ]; then
substep "prebuilt install skipped -- falling back to source build"
else
substep "installing prebuilt llama.cpp..."
if [ -d "$LLAMA_CPP_DIR" ]; then
substep "existing install detected -- validating update"
fi
_PREBUILT_CMD=(
python "$SCRIPT_DIR/install_llama_prebuilt.py"
--install-dir "$LLAMA_CPP_DIR"
--llama-tag "$_REQUESTED_LLAMA_TAG"
--published-repo "$_HELPER_RELEASE_REPO"
--simple-policy
)
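# Illustrative expansion (values are examples, not the resolved defaults):
#   python install_llama_prebuilt.py --install-dir ~/.unsloth/llama.cpp \
#     --llama-tag latest --published-repo unslothai/llama.cpp --simple-policy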
if [ -n "${UNSLOTH_LLAMA_RELEASE_TAG:-}" ]; then
_PREBUILT_CMD+=(--published-release-tag "$UNSLOTH_LLAMA_RELEASE_TAG")
fi
_PREBUILT_LOG="$(mktemp)"
set +e
if _is_verbose; then
"${_PREBUILT_CMD[@]}" 2>&1 | tee "$_PREBUILT_LOG"
_PREBUILT_STATUS=${PIPESTATUS[0]}
else
"${_PREBUILT_CMD[@]}" >"$_PREBUILT_LOG" 2>&1
_PREBUILT_STATUS=$?
fi
set -e
if [ "$_PREBUILT_STATUS" -eq 0 ]; then
if grep -Fq "already matches" "$_PREBUILT_LOG"; then
step "llama.cpp" "prebuilt up to date and validated"
else
step "llama.cpp" "prebuilt installed and validated"
fi
print_installed_llama_prebuilt_release "$LLAMA_CPP_DIR"
verbose_substep "llama.cpp install dir: $LLAMA_CPP_DIR"
rm -f "$_PREBUILT_LOG"
elif [ "$_PREBUILT_STATUS" -eq 3 ]; then
step "llama.cpp" "install blocked by active llama.cpp process" "$C_WARN"
print_llama_error_log "$_PREBUILT_LOG"
rm -f "$_PREBUILT_LOG"
if [ -d "$LLAMA_CPP_DIR" ]; then
substep "existing install was restored"
fi
substep "close Studio or other llama.cpp users and retry"
exit 3
else
step "llama.cpp" "prebuilt install failed (continuing)" "$C_WARN"
print_llama_error_log "$_PREBUILT_LOG"
rm -f "$_PREBUILT_LOG"
if [ -d "$LLAMA_CPP_DIR" ]; then
substep "prebuilt update failed; existing install restored"
fi
substep "falling back to source build"
_NEED_LLAMA_SOURCE_BUILD=true
fi
fi
# ── 8. WSL: pre-install GGUF build dependencies for fallback source builds ──
# On WSL, sudo prompts for a password, which can't be supplied during GGUF
# export (it runs in a non-interactive subprocess). Install build deps here instead.
if [ "$_NEED_LLAMA_SOURCE_BUILD" = true ] && grep -qi microsoft /proc/version 2>/dev/null; then
_GGUF_DEPS="pciutils build-essential cmake curl git libcurl4-openssl-dev"
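# Try without sudo first -- this succeeds only when the shell is already
# root; failures are swallowed, and the per-package probe below decides
# whether escalation is needed.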
apt-get update -y >/dev/null 2>&1 || true
apt-get install -y $_GGUF_DEPS >/dev/null 2>&1 || true
_STILL_MISSING=""
for _pkg in $_GGUF_DEPS; do
case "$_pkg" in
build-essential) command -v gcc >/dev/null 2>&1 || _STILL_MISSING="$_STILL_MISSING $_pkg" ;;
pciutils) command -v lspci >/dev/null 2>&1 || _STILL_MISSING="$_STILL_MISSING $_pkg" ;;
libcurl4-openssl-dev) dpkg -s "$_pkg" >/dev/null 2>&1 || _STILL_MISSING="$_STILL_MISSING $_pkg" ;;
*) command -v "$_pkg" >/dev/null 2>&1 || _STILL_MISSING="$_STILL_MISSING $_pkg" ;;
esac
done
_STILL_MISSING=$(echo "$_STILL_MISSING" | sed 's/^ *//')
if [ -z "$_STILL_MISSING" ]; then
step "gguf deps" "installed"
elif command -v sudo >/dev/null 2>&1; then
step "gguf deps" "sudo required for: $_STILL_MISSING" "$C_WARN"
printf " %-15s" ""
printf "accept? [Y/n] "
if [ -r /dev/tty ]; then
read -r REPLY </dev/tty || REPLY="y"
else
REPLY="y"
fi
case "$REPLY" in
[nN]*)
substep "skipped -- run manually:"
substep "sudo apt-get install -y $_STILL_MISSING"
_SKIP_GGUF_BUILD=true
;;
*)
sudo apt-get update -y
sudo apt-get install -y $_STILL_MISSING
step "gguf deps" "installed"
;;
esac
else
step "gguf deps" "missing (no sudo) -- install manually:" "$C_WARN"
substep "apt-get install -y $_STILL_MISSING"
_SKIP_GGUF_BUILD=true
fi
fi
# ── 9. Build llama.cpp binaries for GGUF inference + export when prebuilt install fails ──
# Builds at ~/.unsloth/llama.cpp — a single shared location under the user's
# home directory. This is used by both the inference server and the GGUF
# export pipeline (unsloth-zoo).
# - llama-server: for GGUF model inference
# - llama-quantize: for GGUF export quantization (symlinked to root for check_llama_cpp())
if [ "$_NEED_LLAMA_SOURCE_BUILD" = false ]; then
:
elif [ "${_SKIP_GGUF_BUILD:-}" = true ]; then
step "llama.cpp" "skipped (missing build deps)" "$C_WARN"
[ -f "$LLAMA_SERVER_BIN" ] || _LLAMA_CPP_DEGRADED=true
else
{
if ! command -v cmake &>/dev/null; then
step "llama.cpp" "skipped (cmake not found)" "$C_WARN"
[ -f "$LLAMA_SERVER_BIN" ] || _LLAMA_CPP_DEGRADED=true
elif ! command -v git &>/dev/null; then
step "llama.cpp" "skipped (git not found)" "$C_WARN"
[ -f "$LLAMA_SERVER_BIN" ] || _LLAMA_CPP_DEGRADED=true
else
if [ -z "$_LLAMA_PR" ]; then
_RESOLVED_SOURCE_URL="$_LLAMA_SOURCE"
if [ "$_LLAMA_FORCE_COMPILE" = "1" ]; then
if [ "$_REQUESTED_LLAMA_TAG" = "latest" ]; then
_RESOLVED_SOURCE_REF="${UNSLOTH_LLAMA_FORCE_COMPILE_REF:-${_DEFAULT_LLAMA_FORCE_COMPILE_REF}}"
_RESOLVED_SOURCE_REF_KIND="branch"
else
_RESOLVED_SOURCE_REF="$_REQUESTED_LLAMA_TAG"
_RESOLVED_SOURCE_REF_KIND="tag"
fi
elif [ "$_REQUESTED_LLAMA_TAG" = "latest" ]; then
_RESOLVE_TAG_ARGS=(--resolve-llama-tag latest --published-repo "ggml-org/llama.cpp" --output-format json)
set +e
_RESOLVE_TAG_JSON="$(python "$SCRIPT_DIR/install_llama_prebuilt.py" "${_RESOLVE_TAG_ARGS[@]}" 2>/dev/null)"
_RESOLVE_TAG_STATUS=$?
set -e
if [ "$_RESOLVE_TAG_STATUS" -eq 0 ] && [ -n "${_RESOLVE_TAG_JSON:-}" ]; then
_RESOLVED_SOURCE_REF="$(
printf '%s' "$_RESOLVE_TAG_JSON" | python -c 'import json,sys; print(json.load(sys.stdin).get("llama_tag",""))' 2>/dev/null || true
)"
else
_RESOLVED_SOURCE_REF=""
fi
if [ -z "$_RESOLVED_SOURCE_REF" ]; then
_RESOLVED_SOURCE_REF="latest"
fi
_RESOLVED_SOURCE_REF_KIND="tag"
else
_RESOLVED_SOURCE_REF="$_REQUESTED_LLAMA_TAG"
_RESOLVED_SOURCE_REF_KIND="tag"
fi
if [ -z "$_RESOLVED_SOURCE_URL" ]; then
_RESOLVED_SOURCE_URL="$_LLAMA_SOURCE"
fi
if [ -z "$_RESOLVED_SOURCE_REF" ]; then
_RESOLVED_SOURCE_REF="$_REQUESTED_LLAMA_TAG"
fi
fi
verbose_substep "source build repo: $_RESOLVED_SOURCE_URL"
verbose_substep "source build ref: ${_RESOLVED_SOURCE_REF:-latest} (${_RESOLVED_SOURCE_REF_KIND})"
BUILD_OK=true
mkdir -p "$(dirname "$LLAMA_CPP_DIR")"
_BUILD_TMP="${LLAMA_CPP_DIR}.build.$$"
rm -rf "$_BUILD_TMP"
if [ -n "$_LLAMA_PR" ]; then
run_quiet_no_exit "clone llama.cpp" \
git clone --depth 1 "${_LLAMA_SOURCE}.git" "$_BUILD_TMP" || BUILD_OK=false
if [ "$BUILD_OK" = true ]; then
run_quiet_no_exit "fetch PR #$_LLAMA_PR" \
git -C "$_BUILD_TMP" fetch --depth 1 origin "pull/$_LLAMA_PR/head:pr-$_LLAMA_PR" || BUILD_OK=false
fi
if [ "$BUILD_OK" = true ]; then
run_quiet_no_exit "checkout PR #$_LLAMA_PR" \
git -C "$_BUILD_TMP" checkout "pr-$_LLAMA_PR" || BUILD_OK=false
fi
elif [ "$_RESOLVED_SOURCE_REF_KIND" = "pull" ] && [ -n "$_RESOLVED_SOURCE_REF" ]; then
run_quiet_no_exit "clone llama.cpp" \
git clone --depth 1 "${_RESOLVED_SOURCE_URL}.git" "$_BUILD_TMP" || BUILD_OK=false
if [ "$BUILD_OK" = true ]; then
run_quiet_no_exit "fetch source PR ref" \
git -C "$_BUILD_TMP" fetch --depth 1 origin "$_RESOLVED_SOURCE_REF" || BUILD_OK=false
fi
if [ "$BUILD_OK" = true ]; then
run_quiet_no_exit "checkout source PR ref" \
git -C "$_BUILD_TMP" checkout -B unsloth-llama-build FETCH_HEAD || BUILD_OK=false
fi
elif [ "$_RESOLVED_SOURCE_REF_KIND" = "commit" ] && [ -n "$_RESOLVED_SOURCE_REF" ]; then
run_quiet_no_exit "clone llama.cpp" \
git clone --depth 1 "${_RESOLVED_SOURCE_URL}.git" "$_BUILD_TMP" || BUILD_OK=false
if [ "$BUILD_OK" = true ]; then
run_quiet_no_exit "fetch source commit" \
git -C "$_BUILD_TMP" fetch --depth 1 origin "$_RESOLVED_SOURCE_REF" || BUILD_OK=false
fi
if [ "$BUILD_OK" = true ]; then
run_quiet_no_exit "checkout source commit" \
git -C "$_BUILD_TMP" checkout -B unsloth-llama-build FETCH_HEAD || BUILD_OK=false
fi
else
_CLONE_ARGS=(git clone --depth 1)
if [ "$_RESOLVED_SOURCE_REF" != "latest" ] && [ -n "$_RESOLVED_SOURCE_REF" ]; then
_CLONE_ARGS+=(--branch "$_RESOLVED_SOURCE_REF")
fi
_CLONE_ARGS+=("${_RESOLVED_SOURCE_URL}.git" "$_BUILD_TMP")
run_quiet_no_exit "clone llama.cpp" \
"${_CLONE_ARGS[@]}" || BUILD_OK=false
fi
if [ "$BUILD_OK" = true ]; then
CMAKE_ARGS="-DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_NATIVE=ON"
_TRY_METAL_CPU_FALLBACK=false
_HOST_SYSTEM="$(uname -s 2>/dev/null || true)"
_HOST_MACHINE="$(uname -m 2>/dev/null || true)"
_IS_MACOS_ARM64=false
if [ "$_HOST_SYSTEM" = "Darwin" ] && { [ "$_HOST_MACHINE" = "arm64" ] || [ "$_HOST_MACHINE" = "aarch64" ]; }; then
_IS_MACOS_ARM64=true
fi
if command -v ccache &>/dev/null; then
CMAKE_ARGS="$CMAKE_ARGS -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache"
fi
CPU_FALLBACK_CMAKE_ARGS="$CMAKE_ARGS"
GPU_BACKEND=""
NVCC_PATH=""
if command -v nvcc &>/dev/null; then
NVCC_PATH="$(command -v nvcc)"
GPU_BACKEND="cuda"
elif [ -x /usr/local/cuda/bin/nvcc ]; then
NVCC_PATH="/usr/local/cuda/bin/nvcc"
export PATH="/usr/local/cuda/bin:$PATH"
GPU_BACKEND="cuda"
elif ls /usr/local/cuda-*/bin/nvcc &>/dev/null; then
# Pick the newest cuda-XX.X directory
NVCC_PATH="$(ls -d /usr/local/cuda-*/bin/nvcc 2>/dev/null | sort -V | tail -1)"
export PATH="$(dirname "$NVCC_PATH"):$PATH"
GPU_BACKEND="cuda"
fi
# Check for ROCm (AMD) only if CUDA was not already selected
ROCM_HIPCC=""
if [ -z "$GPU_BACKEND" ]; then
if command -v hipcc &>/dev/null; then
ROCM_HIPCC="$(command -v hipcc)"
GPU_BACKEND="rocm"
elif [ -x /opt/rocm/bin/hipcc ]; then
ROCM_HIPCC="/opt/rocm/bin/hipcc"
export PATH="/opt/rocm/bin:$PATH"
GPU_BACKEND="rocm"
elif ls /opt/rocm-*/bin/hipcc &>/dev/null; then
ROCM_HIPCC="$(ls -d /opt/rocm-*/bin/hipcc 2>/dev/null | sort -V | tail -1)"
export PATH="$(dirname "$ROCM_HIPCC"):$PATH"
GPU_BACKEND="rocm"
fi
fi
_BUILD_DESC="building"
if [ "$_IS_MACOS_ARM64" = true ]; then
# Metal takes precedence on Apple Silicon (CUDA/ROCm not functional on macOS)
_BUILD_DESC="building (Metal)"
CMAKE_ARGS="$CMAKE_ARGS -DGGML_METAL=ON -DGGML_METAL_EMBED_LIBRARY=ON -DGGML_METAL_USE_BF16=ON -DCMAKE_INSTALL_RPATH=@loader_path -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON"
CPU_FALLBACK_CMAKE_ARGS="$CPU_FALLBACK_CMAKE_ARGS -DGGML_METAL=OFF"
_TRY_METAL_CPU_FALLBACK=true
elif [ -n "$NVCC_PATH" ]; then
CMAKE_ARGS="$CMAKE_ARGS -DGGML_CUDA=ON"
CUDA_ARCHS=""
if command -v nvidia-smi &>/dev/null; then
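# nvidia-smi prints one compute capability per GPU (e.g. "8.6" -> arch
# "86"); duplicates are collapsed so hosts with identical cards emit a
# single entry.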
_raw_caps=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null || true)
while IFS= read -r _cap; do
_cap=$(echo "$_cap" | tr -d '[:space:]')
if [[ "$_cap" =~ ^([0-9]+)\.([0-9]+)$ ]]; then
_arch="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
# Append if not already present
case ";$CUDA_ARCHS;" in
*";$_arch;"*) ;;
*) CUDA_ARCHS="${CUDA_ARCHS:+$CUDA_ARCHS;}$_arch" ;;
esac
fi
done <<< "$_raw_caps"
fi
if [ -n "$CUDA_ARCHS" ]; then
CMAKE_ARGS="$CMAKE_ARGS -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCHS}"
_BUILD_DESC="building (CUDA, sm_${CUDA_ARCHS//;/+sm_})"
else
_BUILD_DESC="building (CUDA)"
fi
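# nvcc --threads=0 parallelizes per-architecture device compilation across
# all available CPUs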
CMAKE_ARGS="$CMAKE_ARGS -DCMAKE_CUDA_FLAGS=--threads=0"
elif [ "$GPU_BACKEND" = "rocm" ]; then
# Resolve hipcc symlinks to find the real ROCm root
_HIPCC_REAL="$(readlink -f "$ROCM_HIPCC" 2>/dev/null || printf '%s' "$ROCM_HIPCC")"
ROCM_ROOT=""
if command -v hipconfig &>/dev/null; then
ROCM_ROOT="$(hipconfig -R 2>/dev/null || true)"
fi
if [ -z "$ROCM_ROOT" ]; then
ROCM_ROOT="$(cd "$(dirname "$_HIPCC_REAL")/.." 2>/dev/null && pwd)"
fi
_BUILD_DESC="building (ROCm)"
CMAKE_ARGS="$CMAKE_ARGS -DGGML_HIP=ON"
export ROCM_PATH="$ROCM_ROOT"
export HIP_PATH="$ROCM_ROOT"
# Use upstream-recommended HIP compiler (not legacy hipcc-as-CXX)
if command -v hipconfig &>/dev/null; then
_HIP_CLANG_DIR="$(hipconfig -l 2>/dev/null || true)"
[ -n "$_HIP_CLANG_DIR" ] && export HIPCXX="$_HIP_CLANG_DIR/clang"
fi
# Detect AMD GPU architecture (gfx target)
GPU_TARGETS=""
if command -v rocminfo &>/dev/null; then
_gfx_list=$(rocminfo 2>/dev/null | grep -oE 'gfx[0-9]{2,4}[a-z]?' | sort -u || true)
_valid_gfx=""
for _gfx in $_gfx_list; do
if [[ "$_gfx" =~ ^gfx[0-9]{2,4}[a-z]?$ ]]; then
# Drop bare family-level targets (gfx10, gfx11, gfx12, ...)
# when a specific sibling is present in the same list.
# rocminfo on ROCm 6.1+ emits both the specific GPU and
# the LLVM generic family line (e.g. gfx1100 alongside
# gfx11-generic), and the outer grep above captures the
# bare family prefix from the generic line. Passing that
# bare prefix to -DGPU_TARGETS breaks the HIP/llama.cpp
# build because clang only accepts specific gfxNNN ids.
# No real AMD GPU has a 2-digit gfx id, so this filter
# can only ever drop family prefixes, never real targets.
if [[ "$_gfx" =~ ^gfx[0-9]{2}$ ]] \
&& echo "$_gfx_list" | grep -qE "^${_gfx}[0-9][0-9a-z]?$"; then
continue
fi
_valid_gfx="${_valid_gfx}${_valid_gfx:+;}$_gfx"
fi
done
[ -n "$_valid_gfx" ] && GPU_TARGETS="$_valid_gfx"
fi
if [ -n "$GPU_TARGETS" ]; then
CMAKE_ARGS="$CMAKE_ARGS -DGPU_TARGETS=${GPU_TARGETS}"
_BUILD_DESC="building (ROCm, ${GPU_TARGETS//;/+})"
fi
elif [ -d /usr/local/cuda ] || nvidia-smi &>/dev/null; then
_BUILD_DESC="building (CPU, CUDA driver found but nvcc missing)"
elif [ -d /opt/rocm ] || command -v rocm-smi &>/dev/null; then
_BUILD_DESC="building (CPU, ROCm driver found but hipcc missing)"
else
_BUILD_DESC="building (CPU)"
fi
substep "$_BUILD_DESC..."
NCPU=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
CMAKE_GENERATOR_ARGS=""
if command -v ninja &>/dev/null; then
CMAKE_GENERATOR_ARGS="-G Ninja"
fi
if ! run_quiet_no_exit "cmake llama.cpp" cmake $CMAKE_GENERATOR_ARGS -S "$_BUILD_TMP" -B "$_BUILD_TMP/build" $CMAKE_ARGS; then
if [ "$_TRY_METAL_CPU_FALLBACK" = true ]; then
_TRY_METAL_CPU_FALLBACK=false
substep "Metal configure failed; retrying CPU build..." "$C_WARN"
rm -rf "$_BUILD_TMP/build"
run_quiet_no_exit "cmake llama.cpp (cpu fallback)" cmake $CMAKE_GENERATOR_ARGS -S "$_BUILD_TMP" -B "$_BUILD_TMP/build" $CPU_FALLBACK_CMAKE_ARGS || BUILD_OK=false
if [ "$BUILD_OK" = true ]; then
_BUILD_DESC="building (CPU fallback)"
fi
else
BUILD_OK=false
fi
fi
fi
if [ "$BUILD_OK" = true ]; then
if ! run_quiet_no_exit "build llama-server" cmake --build "$_BUILD_TMP/build" --config Release --target llama-server -j"$NCPU"; then
if [ "$_TRY_METAL_CPU_FALLBACK" = true ]; then
_TRY_METAL_CPU_FALLBACK=false
substep "Metal build failed; retrying CPU build..." "$C_WARN"
rm -rf "$_BUILD_TMP/build"
if run_quiet_no_exit "cmake llama.cpp (cpu fallback)" cmake $CMAKE_GENERATOR_ARGS -S "$_BUILD_TMP" -B "$_BUILD_TMP/build" $CPU_FALLBACK_CMAKE_ARGS; then
_BUILD_DESC="building (CPU fallback)"
run_quiet_no_exit "build llama-server (cpu fallback)" cmake --build "$_BUILD_TMP/build" --config Release --target llama-server -j"$NCPU" || BUILD_OK=false
else
BUILD_OK=false
fi
else
BUILD_OK=false
fi
fi
fi
if [ "$BUILD_OK" = true ]; then
run_quiet_no_exit "build llama-quantize" cmake --build "$_BUILD_TMP/build" --config Release --target llama-quantize -j"$NCPU" || true
fi
# Swap only after build succeeds -- preserves existing install on failure
if [ "$BUILD_OK" = true ]; then
rm -rf "$LLAMA_CPP_DIR"
mv "$_BUILD_TMP" "$LLAMA_CPP_DIR"
# Symlink to llama.cpp root -- check_llama_cpp() looks for the binary there
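# (relative link target, so it stays valid if ~/.unsloth is relocated)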
QUANTIZE_BIN="$LLAMA_CPP_DIR/build/bin/llama-quantize"
if [ -f "$QUANTIZE_BIN" ]; then
ln -sf build/bin/llama-quantize "$LLAMA_CPP_DIR/llama-quantize"
fi
else
rm -rf "$_BUILD_TMP"
fi
if [ "$BUILD_OK" = true ] && [ -f "$LLAMA_SERVER_BIN" ]; then
step "llama.cpp" "built"
[ -f "$LLAMA_CPP_DIR/llama-quantize" ] && step "llama-quantize" "built"
elif [ "$BUILD_OK" = true ]; then
step "llama.cpp" "binary not found after build" "$C_WARN"
_LLAMA_CPP_DEGRADED=true
else
step "llama.cpp" "build failed" "$C_ERR"
[ -f "$LLAMA_SERVER_BIN" ] || _LLAMA_CPP_DEGRADED=true
fi
fi
}
fi # end _SKIP_GGUF_BUILD check
# ── Footer ──
if [ "$_LLAMA_ONLY" = "1" ]; then
echo ""
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
if [ "$_LLAMA_CPP_DEGRADED" = true ]; then
printf " ${C_WARN}%s${C_RST}\n" "llama.cpp update finished (limited: llama.cpp unavailable)"
else
printf " ${C_TITLE}%s${C_RST}\n" "llama.cpp update finished"
fi
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
elif [ "$IS_COLAB" = true ]; then
echo ""
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
if [ "$_LLAMA_CPP_DEGRADED" = true ]; then
printf " ${C_WARN}%s${C_RST}\n" "Unsloth Studio Setup Complete (limited: llama.cpp unavailable)"
else
printf " ${C_TITLE}%s${C_RST}\n" "Unsloth Studio Setup Complete"
fi
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
substep "from colab import start"
substep "start()"
else
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
if [ "$_LLAMA_CPP_DEGRADED" = true ]; then
printf " ${C_WARN}%s${C_RST}\n" "Unsloth Studio Installed (limited: llama.cpp unavailable)"
else
printf " ${C_TITLE}%s${C_RST}\n" "Unsloth Studio Installed"
fi
printf " ${C_DIM}%s${C_RST}\n" "$RULE"
if [ "$_LLAMA_CPP_DEGRADED" = true ]; then
printf " ${C_DIM}%-15s${C_WARN}%s${C_RST}\n" "launch" "unsloth studio -H 0.0.0.0 -p 8888"
else
printf " ${C_DIM}%-15s${C_OK}%s${C_RST}\n" "launch" "unsloth studio -H 0.0.0.0 -p 8888"
fi
fi
echo ""
# When called from install.sh (SKIP_STUDIO_BASE=1), exit non-zero so the
# installer can report the GGUF failure after finishing PATH/shortcut setup.
# When called directly via 'unsloth studio update', keep the install
# successful -- the footer above already reports the limitation and Studio
# is still usable for non-GGUF workflows.
if [ "$_LLAMA_CPP_DEGRADED" = true ] && [ "${SKIP_STUDIO_BASE:-0}" = "1" ]; then
exit 1
fi