Use prebuilt llama.cpp for unsloth studio setup (#4562)

* Use prebuilt llama.cpp for unsloth studio setup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix 3 issues that cause unnecessary fallback to source build

1. Make filelock import optional -- environments without filelock
   (e.g. minimal installs) crashed at import time instead of
   gracefully skipping the lock.

2. Use already-verified converter script from the hydrated source
   tree instead of re-downloading from raw.githubusercontent.com
   with no checksum. Adds symlink with copy fallback for the
   legacy filename.

3. Initialize $SkipPrebuiltInstall in setup.ps1 before first use
   to prevent potential uninitialized variable errors.
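Fix 1 can be sketched as follows (a minimal illustration, not the actual install_llama_prebuilt.py code; the `install_lock` name and signature are assumptions):

```python
import contextlib

# Optional dependency: minimal installs may not ship filelock, so degrade to a
# no-op lock instead of crashing at import time. (Later commits in this PR
# replace the no-op with an O_CREAT|O_EXCL fallback.)
try:
    from filelock import FileLock
except ImportError:
    FileLock = None

@contextlib.contextmanager
def install_lock(lock_path):
    """Hold an exclusive install lock when filelock is available."""
    if FileLock is None:
        yield  # gracefully skip locking rather than failing at module load
        return
    with FileLock(str(lock_path)):
        yield
```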


* Keep network fallback in ensure_converter_scripts

Prefer the local verified copy from the hydrated source tree, but
retain the original network download as a fallback if the file is
missing. Create the legacy hyphenated filename as a symlink with a
copy fallback instead of writing a second full copy.
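The symlink-with-copy-fallback pattern can be sketched as below (a hedged illustration; the helper name and converter filenames are assumptions, not the PR's actual code):

```python
import shutil
from pathlib import Path

def link_legacy_name(src: Path, legacy: Path) -> None:
    # Prefer a symlink for the legacy hyphenated filename; copy the file where
    # symlinks are unavailable (e.g. Windows without developer mode). Either
    # way, avoid writing a second independently-maintained full copy.
    legacy.unlink(missing_ok=True)
    try:
        legacy.symlink_to(src)
    except OSError:
        shutil.copy2(src, legacy)
```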

* Fix 4 bugs in source-build fallback and binary_env paths

- setup.ps1: Replace git pull + checkout FETCH_HEAD with fetch + checkout -B
  to avoid detached HEAD state that breaks re-runs. Use pinned tag in both
  fetch and clone paths.
- setup.sh: Move rm -rf after cmake/git prerequisite checks so a missing
  tool no longer deletes the existing install. Add --branch tag to clone.
- install_llama_prebuilt.py: Add binary_path.parent to Linux LD_LIBRARY_PATH
  in binary_env() so bundled .so files in build/bin are found even without
  RPATH, matching the existing Windows PATH logic.
- Add test for binary_env LD_LIBRARY_PATH on Linux.
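The binary_env idea above can be sketched cross-platform (an assumed signature; the real function in install_llama_prebuilt.py may differ):

```python
import os
import sys
from pathlib import Path

def binary_env(binary_path: Path, install_dir: Path) -> dict:
    # Prepend both the binary's own directory (build/bin, where bundled shared
    # libraries live) and the install root to the loader search path, so .so
    # files are found even without RPATH -- mirroring the Windows PATH logic.
    env = dict(os.environ)
    if sys.platform.startswith("linux"):
        var = "LD_LIBRARY_PATH"
    elif sys.platform == "darwin":
        var = "DYLD_LIBRARY_PATH"
    else:
        var = "PATH"
    parts = [str(binary_path.parent), str(install_dir)]
    existing = env.get(var, "")
    if existing:
        parts.extend(existing.split(os.pathsep))
    # Deduplicate while preserving order; never emit a trailing separator
    # (a trailing colon would make dyld/ld search the CWD).
    seen, ordered = set(), []
    for p in parts:
        if p and p not in seen:
            seen.add(p)
            ordered.append(p)
    env[var] = os.pathsep.join(ordered)
    return env
```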

* Handle unresolved "latest" tag in source-build fallback clone

When tag resolution fails and the requested tag is "latest", both
setup scripts now omit --branch from git clone so the default branch
is cloned instead of failing on a nonexistent "latest" branch/tag.
Similarly, the PS1 fetch path fetches the default ref when the tag
is "latest".

* Resolve actual latest ggml-org tag instead of using literal "latest"

When both Python tag resolution attempts fail and the requested tag
is "latest", query the GitHub API for the actual latest release tag
from ggml-org/llama.cpp (e.g. b8508) instead of passing the literal
string "latest" to git clone --branch, which would fail since no
such branch/tag exists.

setup.sh uses curl + python json parsing; setup.ps1 uses
Invoke-RestMethod. Both fall back to the raw requested tag if the
API call also fails.
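The fallback chain reads naturally as a small resolver (a Python paraphrase of the shell logic; the function name and defaults are illustrative assumptions):

```python
import json
import urllib.request

def resolve_latest_tag(repos=("unslothai/llama.cpp", "ggml-org/llama.cpp"),
                       requested="latest"):
    # Query each release repo's /releases/latest endpoint in order (Unsloth
    # first, since its prebuilt binaries are pinned to tested tags) and return
    # the first non-empty tag_name. Fall back to the raw requested tag if
    # every API call fails, matching the setup scripts' behavior.
    for repo in repos:
        url = f"https://api.github.com/repos/{repo}/releases/latest"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                tag = json.load(resp).get("tag_name", "")
        except Exception:
            continue
        if tag:
            return tag
    return requested
```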

* Try Unsloth release repo before ggml-org when resolving latest tag

When falling back to the GitHub API to resolve "latest", query the
Unsloth release repo (unslothai/llama.cpp) first since it has the
prebuilt binaries pinned to tested tags. Only fall back to
ggml-org/llama.cpp if the Unsloth repo query fails.

* Add comprehensive sandbox tests for PR #4562 bug fixes

35 tests covering all fixes across platforms:
- binary_env cross-platform (Linux LD_LIBRARY_PATH, Windows PATH,
  macOS DYLD_LIBRARY_PATH) with edge cases (dedup, ordering, existing paths)
- resolve_requested_llama_tag (concrete, latest, None, empty)
- setup.sh logic via subprocess: prereq check ordering (cmake/git missing
  preserves install), pinned tag in clone, fetch+checkout -B pattern,
  fetch failure warns instead of aborting
- "latest" tag resolution fallback chain (Unsloth API -> ggml-org ->
  raw) with mock curl: success, failure, malformed JSON, empty body,
  empty tag_name, env overrides
- Source code pattern verification for both .sh and .ps1 files

All 138 tests pass in isolated uv venv.

* Add binary_path.parent to macOS DYLD_LIBRARY_PATH in binary_env

macOS prebuilt .dylib files are overlaid into build/bin (same as
Linux), but binary_env only added install_dir to DYLD_LIBRARY_PATH.
Add binary_path.parent so the loader can find sibling dylibs even
without embedded loader paths.

Mirrors the existing fix for Linux LD_LIBRARY_PATH and the Windows
PATH pattern.

* Guard --branch when resolved tag is "latest"; fix broken test assertion

When all API fallbacks fail and the tag stays as literal "latest",
omit --branch from git clone (clones default branch instead of
failing). Both setup.sh and setup.ps1 now check for "latest" before
passing --branch to git clone/fetch.

Also fix test_setup_ps1_clone_uses_branch_tag which used Python
tuple syntax (assert "x", "y" in z) that always passes. Changed to
assert "x" in z and "y" in z.
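The assertion pitfall fixed here is easy to demonstrate: a comma makes the second expression the assertion *message*, so the check can never fail.

```python
z = "no match here"

def broken_check():
    # Parsed as: assert "x", ("y" in z) -- "x" is truthy, so this always passes.
    assert "x", "y" in z

def fixed_check():
    # The intended check: both substrings must actually be present.
    assert "x" in z and "y" in z

broken_check()  # silently passes even though neither "x" nor "y" is in z

failed = False
try:
    fixed_check()
except AssertionError:
    failed = True  # the corrected assertion catches the mismatch
```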

* Fix macOS DYLD trailing colon, install_lock no-op, and debug log

- binary_env macOS: use dedupe_existing_dirs instead of raw string
  concatenation. Eliminates trailing colon in DYLD_LIBRARY_PATH
  (which causes dyld to search CWD for libraries) and deduplicates
  when binary_path.parent == install_dir. Now consistent with the
  Linux and Windows branches.
- install_lock: when filelock is not installed, use os.O_CREAT|O_EXCL
  as a fallback exclusive file lock with timeout, instead of yielding
  with no locking. Prevents concurrent installs from corrupting each
  other's staging directories.
- setup.ps1: remove [DEBUG] log line that printed to every user on
  every Windows setup run.
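The O_CREAT|O_EXCL fallback can be sketched like this (an illustration under assumed names; the PR's actual lock also records the holder PID, covered by the next commit):

```python
import contextlib
import os
import time

@contextlib.contextmanager
def fallback_install_lock(lock_path, timeout=60.0, poll=0.5):
    # os.open with O_CREAT|O_EXCL is atomic: exactly one process can create
    # the lock file, so concurrent installs cannot corrupt each other's
    # staging directories even without the filelock package.
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```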

* Add stale-lock detection and atomic clone-then-swap

install_lock fallback (no filelock): write PID to lock file and
check if the holder process is still alive on contention. Dead PIDs
(ProcessLookupError) and unreadable lock files trigger immediate
cleanup. Live processes owned by other users (PermissionError) are
correctly recognized as alive -- the lock is not removed.

setup.sh/setup.ps1 source-build: clone into a temporary directory
first, then swap into place only on success. If git clone fails,
the existing install is preserved instead of being deleted by the
premature rm -rf.
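The stale-lock liveness check described above boils down to probing the recorded PID with signal 0 (a sketch with an assumed helper name):

```python
import os

def lock_holder_alive(lock_path) -> bool:
    # Read the holder PID from the lock file and probe it with signal 0,
    # which checks existence without delivering anything.
    try:
        with open(lock_path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # unreadable/garbled lock file -> treat as stale
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # holder died -> stale, safe to clean up
    except PermissionError:
        return True   # alive but owned by another user -> respect the lock
    return True
```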

* Remove redundant upstream_tag != release_tag check

load_approved_release_checksums compared checksums.upstream_tag
against the Unsloth release_tag, which are different namespaces
(upstream ggml-org tag vs Unsloth published tag). This only worked
because both happened to be "b8508" by convention. Would break if
Unsloth ever uses a different release naming scheme.

The existing check at parse_approved_release_checksums (line 950)
already validates the release_tag field correctly.

* Fix lock TOCTOU race and build-in-temp-dir swap

install_lock fallback: add os.fsync(fd) after writing PID to ensure
the PID is visible to racing processes before they check. Treat
empty lock files (PID not yet written) as "wait and retry" instead
of stale, closing the window where two processes could both see an
empty file, both unlink it, and both acquire the lock.

setup.sh/setup.ps1 source-build: clone AND build in a temp directory
(LLAMA_CPP_DIR.build.$$). Only swap into the final LLAMA_CPP_DIR
after the build succeeds. If clone or cmake or build fails, the temp
dir is cleaned up and the existing working install is preserved.
Previously, rm -rf ran after clone but before build, destroying the
existing install even if the build later failed.
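The clone-and-build-in-temp pattern above can be paraphrased in Python (the function name and the injectable `run` parameter are illustrative assumptions, not the shell scripts' code):

```python
import os
import shutil
import subprocess

def build_in_temp_then_swap(final_dir, repo_url, tag, run=subprocess.run):
    # All fallible steps (clone, configure, build) target a temporary sibling
    # directory. The existing install at final_dir is removed only after the
    # new build has fully succeeded, then the temp dir is renamed into place.
    tmp = f"{final_dir}.build.{os.getpid()}"
    shutil.rmtree(tmp, ignore_errors=True)
    try:
        clone = ["git", "clone", "--depth", "1"]
        if tag and tag != "latest":
            clone += ["--branch", tag]  # omit --branch for unresolved "latest"
        run(clone + [repo_url, tmp], check=True)
        run(["cmake", "-S", tmp, "-B", os.path.join(tmp, "build")], check=True)
        run(["cmake", "--build", os.path.join(tmp, "build"),
             "--config", "Release"], check=True)
    except subprocess.CalledProcessError:
        shutil.rmtree(tmp, ignore_errors=True)  # failure: keep existing install
        return False
    shutil.rmtree(final_dir, ignore_errors=True)  # success: swap into place
    os.replace(tmp, final_dir)
    return True
```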

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
This commit is contained in:
commit f4d8a246bf (parent cc1be75621)
DoubleMathew 2026-03-25 07:42:43 -05:00, committed by GitHub
No known key found for this signature in database (GPG key ID: B5690EEEBB952194)
7 changed files with 6046 additions and 64 deletions

studio/install_llama_prebuilt.py (executable file, +3395)

(File diff suppressed because it is too large.)


@@ -503,7 +503,6 @@ if ($DriverMaxCuda) {
$isCompat = ($tkMaj -lt $drMajorCuda) -or ($tkMaj -eq $drMajorCuda -and $tkMin -le $drMinorCuda)
if ($isCompat) {
# Also verify the toolkit supports our GPU architecture
Write-Host " [DEBUG] Checking CUDA compatibility: toolkit=$tkMaj.$tkMin arch=sm_$CudaArch" -ForegroundColor Magenta
$archOk = $true
if ($CudaArch) {
$archOk = Test-NvccArchSupport -NvccExe $candidateNvcc -Arch $CudaArch
@@ -1296,6 +1295,93 @@ if ($LASTEXITCODE -ne 0) {
$ErrorActionPreference = $prevEAP_t5
Write-Host "[OK] Transformers 5.x pre-installed to .venv_t5/" -ForegroundColor Green
# ==========================================================================
# PHASE 3.4: Prefer prebuilt llama.cpp bundles before source build
# ==========================================================================
$UnslothHome = Join-Path $env:USERPROFILE ".unsloth"
if (-not (Test-Path $UnslothHome)) { New-Item -ItemType Directory -Force $UnslothHome | Out-Null }
$LlamaCppDir = Join-Path $UnslothHome "llama.cpp"
$NeedLlamaSourceBuild = $false
$SkipPrebuiltInstall = $false
$RequestedLlamaTag = if ($env:UNSLOTH_LLAMA_TAG) { $env:UNSLOTH_LLAMA_TAG } else { "latest" }
$HelperReleaseRepo = if ($env:UNSLOTH_LLAMA_RELEASE_REPO) { $env:UNSLOTH_LLAMA_RELEASE_REPO } else { "unslothai/llama.cpp" }
$resolveOutput = & python "$PSScriptRoot\install_llama_prebuilt.py" --resolve-install-tag $RequestedLlamaTag --published-repo $HelperReleaseRepo 2>&1
$resolveExit = $LASTEXITCODE
$ResolvedLlamaTag = if ($resolveOutput) { ($resolveOutput | Select-Object -Last 1).ToString().Trim() } else { "" }
if ($resolveExit -ne 0 -or [string]::IsNullOrWhiteSpace($ResolvedLlamaTag)) {
Write-Host ""
Write-Host "[WARN] Failed to resolve an installable prebuilt llama.cpp tag via $HelperReleaseRepo" -ForegroundColor Yellow
if ($resolveOutput) {
$resolveOutput | ForEach-Object { Write-Host $_ }
}
$fallbackOutput = & python "$PSScriptRoot\install_llama_prebuilt.py" --resolve-llama-tag $RequestedLlamaTag 2>$null
$fallbackExit = $LASTEXITCODE
$ResolvedLlamaTag = if ($fallbackExit -eq 0 -and $fallbackOutput) {
($fallbackOutput | Select-Object -Last 1).ToString().Trim()
} elseif ($RequestedLlamaTag -eq "latest") {
# Try Unsloth release repo first, then fall back to ggml-org upstream
$resolvedLatest = $null
try {
$latestRelease = Invoke-RestMethod -Uri "https://api.github.com/repos/$HelperReleaseRepo/releases/latest" -ErrorAction Stop
$resolvedLatest = $latestRelease.tag_name
} catch {}
if (-not $resolvedLatest) {
try {
$latestRelease = Invoke-RestMethod -Uri "https://api.github.com/repos/ggml-org/llama.cpp/releases/latest" -ErrorAction Stop
$resolvedLatest = $latestRelease.tag_name
} catch {}
}
if ($resolvedLatest) { $resolvedLatest } else { $RequestedLlamaTag }
} else {
$RequestedLlamaTag
}
$NeedLlamaSourceBuild = $true
$SkipPrebuiltInstall = $true
}
Write-Host ""
Write-Host "Resolved llama.cpp release tag: $ResolvedLlamaTag" -ForegroundColor Gray
if ($env:UNSLOTH_LLAMA_FORCE_COMPILE -eq "1") {
Write-Host ""
Write-Host "[WARN] UNSLOTH_LLAMA_FORCE_COMPILE=1 -- skipping prebuilt llama.cpp install" -ForegroundColor Yellow
$NeedLlamaSourceBuild = $true
} else {
Write-Host ""
Write-Host "Installing prebuilt llama.cpp bundle (preferred path)..." -ForegroundColor Cyan
if (Test-Path $LlamaCppDir) {
Write-Host "Existing llama.cpp install detected -- validating staged prebuilt update before replacement" -ForegroundColor Gray
}
if ($SkipPrebuiltInstall) {
Write-Host "[WARN] Skipping prebuilt install because prebuilt tag resolution failed -- falling back to source build" -ForegroundColor Yellow
} else {
$prebuiltArgs = @(
"$PSScriptRoot\install_llama_prebuilt.py",
"--install-dir", $LlamaCppDir,
"--llama-tag", $ResolvedLlamaTag,
"--published-repo", $HelperReleaseRepo
)
if ($env:UNSLOTH_LLAMA_RELEASE_TAG) {
$prebuiltArgs += @("--published-release-tag", $env:UNSLOTH_LLAMA_RELEASE_TAG)
}
$prevEAPPrebuilt = $ErrorActionPreference
$ErrorActionPreference = "Continue"
& python @prebuiltArgs
$prebuiltExit = $LASTEXITCODE
$ErrorActionPreference = $prevEAPPrebuilt
if ($prebuiltExit -eq 0) {
Write-Host "[OK] Prebuilt llama.cpp installed and validated" -ForegroundColor Green
} else {
if (Test-Path $LlamaCppDir) {
Write-Host "[WARN] Prebuilt update failed; existing install was restored or cleaned before source build fallback" -ForegroundColor Yellow
}
Write-Host "[WARN] Prebuilt llama.cpp path unavailable or failed validation -- falling back to source build" -ForegroundColor Yellow
$NeedLlamaSourceBuild = $true
}
}
}
# ==========================================================================
# PHASE 3.5: Install OpenSSL dev (for HTTPS support in llama-server)
# ==========================================================================
@@ -1303,42 +1389,46 @@ Write-Host "[OK] Transformers 5.x pre-installed to .venv_t5/" -ForegroundColor G
# ShiningLight.OpenSSL.Dev includes headers + libs that cmake can find.
$OpenSslAvailable = $false
# Check if OpenSSL dev is already installed (look for include dir)
$OpenSslRoots = @(
'C:\Program Files\OpenSSL-Win64',
'C:\Program Files\OpenSSL',
'C:\OpenSSL-Win64'
)
$OpenSslRoot = $null
foreach ($root in $OpenSslRoots) {
if (Test-Path (Join-Path $root 'include\openssl\ssl.h')) {
$OpenSslRoot = $root
break
}
}
if ($OpenSslRoot) {
$OpenSslAvailable = $true
Write-Host "[OK] OpenSSL dev found at $OpenSslRoot" -ForegroundColor Green
} else {
Write-Host ""
Write-Host "Installing OpenSSL dev (for HTTPS in llama-server)..." -ForegroundColor Cyan
$HasWinget = $null -ne (Get-Command winget -ErrorAction SilentlyContinue)
if ($HasWinget) {
winget install -e --id ShiningLight.OpenSSL.Dev --accept-package-agreements --accept-source-agreements
# Re-check after install
foreach ($root in $OpenSslRoots) {
if (Test-Path (Join-Path $root 'include\openssl\ssl.h')) {
$OpenSslRoot = $root
$OpenSslAvailable = $true
Write-Host "[OK] OpenSSL dev installed at $OpenSslRoot" -ForegroundColor Green
break
}
if ($NeedLlamaSourceBuild) {
# Check if OpenSSL dev is already installed (look for include dir)
$OpenSslRoots = @(
'C:\Program Files\OpenSSL-Win64',
'C:\Program Files\OpenSSL',
'C:\OpenSSL-Win64'
)
$OpenSslRoot = $null
foreach ($root in $OpenSslRoots) {
if (Test-Path (Join-Path $root 'include\openssl\ssl.h')) {
$OpenSslRoot = $root
break
}
}
if (-not $OpenSslAvailable) {
Write-Host "[WARN] OpenSSL dev not available -- llama-server will be built without HTTPS" -ForegroundColor Yellow
if ($OpenSslRoot) {
$OpenSslAvailable = $true
Write-Host "[OK] OpenSSL dev found at $OpenSslRoot" -ForegroundColor Green
} else {
Write-Host ""
Write-Host "Installing OpenSSL dev (for HTTPS in llama-server)..." -ForegroundColor Cyan
$HasWinget = $null -ne (Get-Command winget -ErrorAction SilentlyContinue)
if ($HasWinget) {
winget install -e --id ShiningLight.OpenSSL.Dev --accept-package-agreements --accept-source-agreements
# Re-check after install
foreach ($root in $OpenSslRoots) {
if (Test-Path (Join-Path $root 'include\openssl\ssl.h')) {
$OpenSslRoot = $root
$OpenSslAvailable = $true
Write-Host "[OK] OpenSSL dev installed at $OpenSslRoot" -ForegroundColor Green
break
}
}
}
if (-not $OpenSslAvailable) {
Write-Host "[WARN] OpenSSL dev not available -- llama-server will be built without HTTPS" -ForegroundColor Yellow
}
}
} else {
Write-Host "[SKIP] OpenSSL dev install -- prebuilt llama.cpp already validated" -ForegroundColor Yellow
}
# ==========================================================================
@@ -1351,9 +1441,7 @@ if ($OpenSslRoot) {
# - llama-server: for GGUF model inference (with HTTPS if OpenSSL available)
# - llama-quantize: for GGUF export quantization
# Prerequisites (git, cmake, VS Build Tools, CUDA Toolkit) already installed in Phase 1.
$UnslothHome = Join-Path $env:USERPROFILE ".unsloth"
if (-not (Test-Path $UnslothHome)) { New-Item -ItemType Directory -Force $UnslothHome | Out-Null }
$LlamaCppDir = Join-Path $UnslothHome "llama.cpp"
$OriginalLlamaCppDir = $LlamaCppDir
$BuildDir = Join-Path $LlamaCppDir "build"
$LlamaServerBin = Join-Path $BuildDir "bin\Release\llama-server.exe"
@@ -1376,7 +1464,10 @@ if (Test-Path $LlamaServerBin) {
}
}
if ((Test-Path $LlamaServerBin) -and -not $NeedRebuild) {
if (-not $NeedLlamaSourceBuild) {
Write-Host ""
Write-Host "[OK] Using validated prebuilt llama.cpp install at $LlamaCppDir" -ForegroundColor Green
} elseif ((Test-Path $LlamaServerBin) -and -not $NeedRebuild) {
Write-Host ""
Write-Host "[OK] llama-server already exists at $LlamaServerBin" -ForegroundColor Green
} elseif (-not $HasCmakeForBuild) {
@@ -1432,29 +1523,49 @@ if ((Test-Path $LlamaServerBin) -and -not $NeedRebuild) {
# -- Step A: Clone or pull llama.cpp --
$UseConcreteRef = ($ResolvedLlamaTag -ne "latest" -and -not [string]::IsNullOrWhiteSpace($ResolvedLlamaTag))
if (Test-Path (Join-Path $LlamaCppDir ".git")) {
Write-Host " llama.cpp repo already cloned, pulling latest..." -ForegroundColor Gray
git -C $LlamaCppDir pull 2>&1 | Out-Null
Write-Host " Syncing llama.cpp to $ResolvedLlamaTag..." -ForegroundColor Gray
if ($UseConcreteRef) {
git -C $LlamaCppDir fetch --depth 1 origin $ResolvedLlamaTag 2>&1 | Out-Null
} else {
git -C $LlamaCppDir fetch --depth 1 origin 2>&1 | Out-Null
}
if ($LASTEXITCODE -ne 0) {
Write-Host " [WARN] git pull failed -- using existing source" -ForegroundColor Yellow
Write-Host " [WARN] git fetch failed -- using existing source" -ForegroundColor Yellow
} else {
git -C $LlamaCppDir checkout -B unsloth-llama-build FETCH_HEAD 2>&1 | Out-Null
if ($LASTEXITCODE -ne 0) {
$BuildOk = $false
$FailedStep = "git checkout"
} else {
git -C $LlamaCppDir clean -fdx 2>&1 | Out-Null
}
}
} else {
Write-Host " Cloning llama.cpp..." -ForegroundColor Gray
if (Test-Path $LlamaCppDir) { Remove-Item -Recurse -Force $LlamaCppDir }
git clone --depth 1 https://github.com/ggml-org/llama.cpp.git $LlamaCppDir 2>&1 | Out-Null
Write-Host " Cloning llama.cpp @ $ResolvedLlamaTag..." -ForegroundColor Gray
$buildTmp = "$LlamaCppDir.build.$PID"
if (Test-Path $buildTmp) { Remove-Item -Recurse -Force $buildTmp }
$cloneArgs = @("clone", "--depth", "1")
if ($UseConcreteRef) {
$cloneArgs += @("--branch", $ResolvedLlamaTag)
}
$cloneArgs += @("https://github.com/ggml-org/llama.cpp.git", $buildTmp)
git @cloneArgs 2>&1 | Out-Null
if ($LASTEXITCODE -ne 0) {
$BuildOk = $false
$FailedStep = "git clone"
if (Test-Path $buildTmp) { Remove-Item -Recurse -Force $buildTmp }
}
# Use temp dir for build; swap into $LlamaCppDir only after build succeeds
if ($BuildOk) {
$LlamaCppDir = $buildTmp
$BuildDir = Join-Path $LlamaCppDir "build"
}
}
# -- Step B: cmake configure --
# Clean stale CMake cache to prevent previous CUDA settings from leaking
# into a CPU-only rebuild (or vice versa).
$CmakeCacheFile = Join-Path $BuildDir "CMakeCache.txt"
if (Test-Path $CmakeCacheFile) {
Remove-Item -Recurse -Force $BuildDir
}
if ($BuildOk) {
Write-Host ""
@@ -1555,6 +1666,21 @@ if ((Test-Path $LlamaServerBin) -and -not $NeedRebuild) {
}
}
# Swap temp build dir into final location (only if we built in a temp dir)
if ($BuildOk -and $LlamaCppDir -ne $OriginalLlamaCppDir) {
if (Test-Path $OriginalLlamaCppDir) { Remove-Item -Recurse -Force $OriginalLlamaCppDir }
Move-Item $LlamaCppDir $OriginalLlamaCppDir
$LlamaCppDir = $OriginalLlamaCppDir
$BuildDir = Join-Path $LlamaCppDir "build"
$LlamaServerBin = Join-Path $BuildDir "bin\Release\llama-server.exe"
} elseif (-not $BuildOk -and $LlamaCppDir -ne $OriginalLlamaCppDir) {
# Build failed -- clean up temp dir, preserve existing install
if (Test-Path $LlamaCppDir) { Remove-Item -Recurse -Force $LlamaCppDir }
$LlamaCppDir = $OriginalLlamaCppDir
$BuildDir = Join-Path $LlamaCppDir "build"
$LlamaServerBin = Join-Path $BuildDir "bin\Release\llama-server.exe"
}
# Restore ErrorActionPreference
$ErrorActionPreference = $prevEAP


@@ -341,10 +341,98 @@ else
echo "✅ Python dependencies up to date — skipping"
fi
# ── 7. WSL: pre-install GGUF build dependencies ──
# ── 7. Prefer prebuilt llama.cpp bundles before any source build path ──
UNSLOTH_HOME="$HOME/.unsloth"
mkdir -p "$UNSLOTH_HOME"
LLAMA_CPP_DIR="$UNSLOTH_HOME/llama.cpp"
LLAMA_SERVER_BIN="$LLAMA_CPP_DIR/build/bin/llama-server"
_NEED_LLAMA_SOURCE_BUILD=false
_LLAMA_FORCE_COMPILE="${UNSLOTH_LLAMA_FORCE_COMPILE:-0}"
_REQUESTED_LLAMA_TAG="${UNSLOTH_LLAMA_TAG:-latest}"
_HELPER_RELEASE_REPO="${UNSLOTH_LLAMA_RELEASE_REPO:-unslothai/llama.cpp}"
_RESOLVE_LLAMA_LOG="$(mktemp)"
set +e
python "$SCRIPT_DIR/install_llama_prebuilt.py" \
--resolve-install-tag "$_REQUESTED_LLAMA_TAG" \
--published-repo "$_HELPER_RELEASE_REPO" >"$_RESOLVE_LLAMA_LOG" 2>&1
_RESOLVE_LLAMA_STATUS=$?
set -e
if [ "$_RESOLVE_LLAMA_STATUS" -eq 0 ]; then
_RESOLVED_LLAMA_TAG="$(tail -n 1 "$_RESOLVE_LLAMA_LOG" | tr -d '\r')"
else
_RESOLVED_LLAMA_TAG=""
fi
if [ -z "$_RESOLVED_LLAMA_TAG" ]; then
echo ""
echo "⚠️ Failed to resolve an installable prebuilt llama.cpp tag via $_HELPER_RELEASE_REPO"
cat "$_RESOLVE_LLAMA_LOG" >&2 || true
set +e
_RESOLVED_LLAMA_TAG="$(python "$SCRIPT_DIR/install_llama_prebuilt.py" --resolve-llama-tag "$_REQUESTED_LLAMA_TAG" 2>/dev/null)"
_RESOLVE_UPSTREAM_STATUS=$?
set -e
if [ "$_RESOLVE_UPSTREAM_STATUS" -ne 0 ] || [ -z "$_RESOLVED_LLAMA_TAG" ]; then
if [ "$_REQUESTED_LLAMA_TAG" = "latest" ]; then
# Try Unsloth release repo first, then fall back to ggml-org upstream
_RESOLVED_LLAMA_TAG="$(curl -fsSL "https://api.github.com/repos/${_HELPER_RELEASE_REPO}/releases/latest" 2>/dev/null | python -c "import sys,json; print(json.load(sys.stdin)['tag_name'])" 2>/dev/null)" || _RESOLVED_LLAMA_TAG=""
if [ -z "$_RESOLVED_LLAMA_TAG" ]; then
_RESOLVED_LLAMA_TAG="$(curl -fsSL https://api.github.com/repos/ggml-org/llama.cpp/releases/latest 2>/dev/null | python -c "import sys,json; print(json.load(sys.stdin)['tag_name'])" 2>/dev/null)" || _RESOLVED_LLAMA_TAG=""
fi
fi
if [ -z "$_RESOLVED_LLAMA_TAG" ]; then
_RESOLVED_LLAMA_TAG="$_REQUESTED_LLAMA_TAG"
fi
fi
_NEED_LLAMA_SOURCE_BUILD=true
_SKIP_PREBUILT_INSTALL=true
fi
rm -f "$_RESOLVE_LLAMA_LOG"
echo ""
echo "Resolved llama.cpp release tag: $_RESOLVED_LLAMA_TAG"
if [ "$_LLAMA_FORCE_COMPILE" = "1" ]; then
echo ""
echo "⚠️ UNSLOTH_LLAMA_FORCE_COMPILE=1 -- skipping prebuilt llama.cpp install"
_NEED_LLAMA_SOURCE_BUILD=true
else
echo ""
echo "Installing prebuilt llama.cpp bundle (preferred path)..."
if [ -d "$LLAMA_CPP_DIR" ]; then
echo "Existing llama.cpp install detected -- validating staged prebuilt update before replacement"
fi
if [ "${_SKIP_PREBUILT_INSTALL:-false}" = true ]; then
echo "⚠️ Skipping prebuilt install because prebuilt tag resolution failed -- falling back to source build"
else
_PREBUILT_CMD=(
python "$SCRIPT_DIR/install_llama_prebuilt.py"
--install-dir "$LLAMA_CPP_DIR"
--llama-tag "$_RESOLVED_LLAMA_TAG"
--published-repo "$_HELPER_RELEASE_REPO"
)
if [ -n "${UNSLOTH_LLAMA_RELEASE_TAG:-}" ]; then
_PREBUILT_CMD+=(--published-release-tag "$UNSLOTH_LLAMA_RELEASE_TAG")
fi
set +e
"${_PREBUILT_CMD[@]}"
_PREBUILT_STATUS=$?
set -e
if [ "$_PREBUILT_STATUS" -eq 0 ]; then
echo "✅ Prebuilt llama.cpp installed and validated"
else
if [ -d "$LLAMA_CPP_DIR" ]; then
echo "⚠️ Prebuilt update failed; existing install was restored or cleaned before source build fallback"
fi
echo "⚠️ Prebuilt llama.cpp path unavailable or failed validation -- falling back to source build"
_NEED_LLAMA_SOURCE_BUILD=true
fi
fi
fi
# ── 8. WSL: pre-install GGUF build dependencies for fallback source builds ──
# On WSL, sudo requires a password and can't be entered during GGUF export
# (runs in a non-interactive subprocess). Install build deps here instead.
if grep -qi microsoft /proc/version 2>/dev/null; then
if [ "$_NEED_LLAMA_SOURCE_BUILD" = true ] && grep -qi microsoft /proc/version 2>/dev/null; then
echo ""
echo "⚠️ WSL detected -- installing build dependencies for GGUF export..."
_GGUF_DEPS="pciutils build-essential cmake curl git libcurl4-openssl-dev"
@@ -402,22 +490,19 @@ if grep -qi microsoft /proc/version 2>/dev/null; then
fi
fi
# ── 8. Build llama.cpp binaries for GGUF inference + export ──
# ── 9. Build llama.cpp binaries for GGUF inference + export when prebuilt install fails ──
# Builds at ~/.unsloth/llama.cpp — a single shared location under the user's
# home directory. This is used by both the inference server and the GGUF
# export pipeline (unsloth-zoo).
# - llama-server: for GGUF model inference
# - llama-quantize: for GGUF export quantization (symlinked to root for check_llama_cpp())
UNSLOTH_HOME="$HOME/.unsloth"
mkdir -p "$UNSLOTH_HOME"
LLAMA_CPP_DIR="$UNSLOTH_HOME/llama.cpp"
LLAMA_SERVER_BIN="$LLAMA_CPP_DIR/build/bin/llama-server"
if [ "${_SKIP_GGUF_BUILD:-}" = true ]; then
if [ "$_NEED_LLAMA_SOURCE_BUILD" = false ]; then
:
elif [ "${_SKIP_GGUF_BUILD:-}" = true ]; then
echo ""
echo "Skipping llama-server build (missing dependencies)"
echo " Install the missing packages and re-run setup to enable GGUF inference."
else
rm -rf "$LLAMA_CPP_DIR"
{
# Check prerequisites
if ! command -v cmake &>/dev/null; then
@@ -432,7 +517,13 @@ rm -rf "$LLAMA_CPP_DIR"
echo "Building llama-server for GGUF inference..."
BUILD_OK=true
run_quiet_no_exit "clone llama.cpp" git clone --depth 1 https://github.com/ggml-org/llama.cpp.git "$LLAMA_CPP_DIR" || BUILD_OK=false
_CLONE_BRANCH_ARGS=()
if [ "$_RESOLVED_LLAMA_TAG" != "latest" ] && [ -n "$_RESOLVED_LLAMA_TAG" ]; then
_CLONE_BRANCH_ARGS=(--branch "$_RESOLVED_LLAMA_TAG")
fi
_BUILD_TMP="${LLAMA_CPP_DIR}.build.$$"
rm -rf "$_BUILD_TMP"
run_quiet_no_exit "clone llama.cpp" git clone --depth 1 "${_CLONE_BRANCH_ARGS[@]}" https://github.com/ggml-org/llama.cpp.git "$_BUILD_TMP" || BUILD_OK=false
if [ "$BUILD_OK" = true ]; then
# Skip tests/examples we don't need (faster build)
@@ -571,21 +662,29 @@ rm -rf "$LLAMA_CPP_DIR"
CMAKE_GENERATOR_ARGS="-G Ninja"
fi
run_quiet_no_exit "cmake llama.cpp" cmake $CMAKE_GENERATOR_ARGS -S "$LLAMA_CPP_DIR" -B "$LLAMA_CPP_DIR/build" $CMAKE_ARGS || BUILD_OK=false
run_quiet_no_exit "cmake llama.cpp" cmake $CMAKE_GENERATOR_ARGS -S "$_BUILD_TMP" -B "$_BUILD_TMP/build" $CMAKE_ARGS || BUILD_OK=false
fi
if [ "$BUILD_OK" = true ]; then
run_quiet_no_exit "build llama-server" cmake --build "$LLAMA_CPP_DIR/build" --config Release --target llama-server -j"$NCPU" || BUILD_OK=false
run_quiet_no_exit "build llama-server" cmake --build "$_BUILD_TMP/build" --config Release --target llama-server -j"$NCPU" || BUILD_OK=false
fi
# Also build llama-quantize (needed by unsloth-zoo's GGUF export pipeline)
if [ "$BUILD_OK" = true ]; then
run_quiet_no_exit "build llama-quantize" cmake --build "$LLAMA_CPP_DIR/build" --config Release --target llama-quantize -j"$NCPU" || true
# Symlink to llama.cpp root — check_llama_cpp() looks for the binary there
run_quiet_no_exit "build llama-quantize" cmake --build "$_BUILD_TMP/build" --config Release --target llama-quantize -j"$NCPU" || true
fi
# Swap only after build succeeds -- preserves existing install on failure
if [ "$BUILD_OK" = true ]; then
rm -rf "$LLAMA_CPP_DIR"
mv "$_BUILD_TMP" "$LLAMA_CPP_DIR"
# Symlink to llama.cpp root -- check_llama_cpp() looks for the binary there
QUANTIZE_BIN="$LLAMA_CPP_DIR/build/bin/llama-quantize"
if [ -f "$QUANTIZE_BIN" ]; then
ln -sf build/bin/llama-quantize "$LLAMA_CPP_DIR/llama-quantize"
fi
else
rm -rf "$_BUILD_TMP"
fi
if [ "$BUILD_OK" = true ]; then


@@ -0,0 +1,142 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import importlib.util
import shutil
import sys
import tempfile
import time
from pathlib import Path
PACKAGE_ROOT = Path(__file__).resolve().parents[3]
INSTALLER_PATH = PACKAGE_ROOT / "studio" / "install_llama_prebuilt.py"
def load_installer_module():
spec = importlib.util.spec_from_file_location(
"studio_install_llama_prebuilt", INSTALLER_PATH
)
if spec is None or spec.loader is None:
raise RuntimeError(f"unable to load installer module from {INSTALLER_PATH}")
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)
return module
installer = load_installer_module()
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
description = (
"Run a real end-to-end prebuilt llama.cpp install into an isolated temporary "
"directory on the current machine."
)
)
parser.add_argument(
"--llama-tag",
default = "latest",
help = "llama.cpp tag to resolve. Defaults to the approved prebuilt tag for this host.",
)
parser.add_argument(
"--published-repo",
default = installer.DEFAULT_PUBLISHED_REPO,
help = "Published bundle repository used for Linux CUDA selection.",
)
parser.add_argument(
"--published-release-tag",
default = installer.DEFAULT_PUBLISHED_TAG or "",
help = "Optional published GitHub release tag to pin.",
)
parser.add_argument(
"--work-dir",
default = "",
help = (
"Optional directory under which the smoke install temp dir will be created. "
"If omitted, defaults to ./.tmp/llama-prebuilt-smoke under the current directory."
),
)
parser.add_argument(
"--keep-temp",
action = "store_true",
help = "Keep the temporary smoke install directory after success.",
)
return parser.parse_args()
def smoke_root_base(work_dir: str) -> Path:
if work_dir:
return Path(work_dir).expanduser().resolve()
return (Path.cwd() / ".tmp" / "llama-prebuilt-smoke").resolve()
def make_smoke_root(base_dir: Path) -> Path:
base_dir.mkdir(parents = True, exist_ok = True)
timestamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
return Path(tempfile.mkdtemp(prefix = f"run-{timestamp}-", dir = base_dir))
def main() -> int:
args = parse_args()
host = installer.detect_host()
smoke_base = smoke_root_base(args.work_dir)
smoke_root = make_smoke_root(smoke_base)
install_dir = smoke_root / "install" / "llama.cpp"
choice = None
print(f"[smoke] host={host.system} machine={host.machine}")
print(f"[smoke] temp_root={smoke_root}")
try:
requested_tag, resolved_tag, attempts, _approved_checksums = (
installer.resolve_install_attempts(
args.llama_tag,
host,
args.published_repo,
args.published_release_tag,
)
)
choice = attempts[0]
print(f"[smoke] requested_tag={requested_tag}")
print(f"[smoke] resolved_tag={resolved_tag}")
print(f"[smoke] selected_asset={choice.name}")
print(f"[smoke] selected_source={choice.source_label}")
print(f"[smoke] install_dir={install_dir}")
installer.install_prebuilt(
install_dir = install_dir,
llama_tag = args.llama_tag,
published_repo = args.published_repo,
published_release_tag = args.published_release_tag,
)
print(f"[smoke] PASS install_dir={install_dir}")
print(
"[smoke] note=This was a real prebuilt install into an isolated temp directory."
)
return installer.EXIT_SUCCESS
except SystemExit as exc:
code = int(exc.code) if isinstance(exc.code, int) else installer.EXIT_ERROR
if code == installer.EXIT_FALLBACK:
print(f"[smoke] FALLBACK install_dir={install_dir}")
print(
"[smoke] note=Prebuilt path failed and would fall back to source build in setup."
)
print(installer.collect_system_report(host, choice, install_dir))
else:
print(f"[smoke] ERROR exit_code={code} install_dir={install_dir}")
return code
except Exception as exc:
print(f"[smoke] ERROR {exc}")
print(installer.collect_system_report(host, choice, install_dir))
return installer.EXIT_ERROR
finally:
if args.keep_temp:
print(f"[smoke] keeping_temp_root={smoke_root}")
elif smoke_root.exists():
shutil.rmtree(smoke_root, ignore_errors = True)
if __name__ == "__main__":
raise SystemExit(main())


@@ -0,0 +1,630 @@
import importlib.util
import io
import json
import os
import sys
import tarfile
import zipfile
from pathlib import Path

import pytest

PACKAGE_ROOT = Path(__file__).resolve().parents[3]
MODULE_PATH = PACKAGE_ROOT / "studio" / "install_llama_prebuilt.py"
SPEC = importlib.util.spec_from_file_location(
    "studio_install_llama_prebuilt", MODULE_PATH
)
assert SPEC is not None and SPEC.loader is not None
INSTALL_LLAMA_PREBUILT = importlib.util.module_from_spec(SPEC)
sys.modules[SPEC.name] = INSTALL_LLAMA_PREBUILT
SPEC.loader.exec_module(INSTALL_LLAMA_PREBUILT)

PrebuiltFallback = INSTALL_LLAMA_PREBUILT.PrebuiltFallback
extract_archive = INSTALL_LLAMA_PREBUILT.extract_archive
binary_env = INSTALL_LLAMA_PREBUILT.binary_env
HostInfo = INSTALL_LLAMA_PREBUILT.HostInfo
AssetChoice = INSTALL_LLAMA_PREBUILT.AssetChoice
ApprovedArtifactHash = INSTALL_LLAMA_PREBUILT.ApprovedArtifactHash
ApprovedReleaseChecksums = INSTALL_LLAMA_PREBUILT.ApprovedReleaseChecksums
hydrate_source_tree = INSTALL_LLAMA_PREBUILT.hydrate_source_tree
validate_prebuilt_choice = INSTALL_LLAMA_PREBUILT.validate_prebuilt_choice
activate_install_tree = INSTALL_LLAMA_PREBUILT.activate_install_tree
create_install_staging_dir = INSTALL_LLAMA_PREBUILT.create_install_staging_dir
sha256_file = INSTALL_LLAMA_PREBUILT.sha256_file
source_archive_logical_name = INSTALL_LLAMA_PREBUILT.source_archive_logical_name


def approved_checksums_for(
    upstream_tag: str, *, source_archive: Path, bundle_archive: Path, bundle_name: str
) -> ApprovedReleaseChecksums:
    return ApprovedReleaseChecksums(
        repo = "local",
        release_tag = upstream_tag,
        upstream_tag = upstream_tag,
        source_commit = None,
        artifacts = {
            source_archive_logical_name(upstream_tag): ApprovedArtifactHash(
                asset_name = source_archive_logical_name(upstream_tag),
                sha256 = sha256_file(source_archive),
                repo = "ggml-org/llama.cpp",
                kind = "upstream-source",
            ),
            bundle_name: ApprovedArtifactHash(
                asset_name = bundle_name,
                sha256 = sha256_file(bundle_archive),
                repo = "local",
                kind = "local-test-bundle",
            ),
        },
    )
def test_extract_archive_allows_safe_tar_symlink_chain(tmp_path: Path):
    archive_path = tmp_path / "bundle.tar.gz"
    payload = b"shared-object"
    with tarfile.open(archive_path, "w:gz") as archive:
        versioned = tarfile.TarInfo("libllama.so.0.0.1")
        versioned.size = len(payload)
        archive.addfile(versioned, io_bytes(payload))
        soname = tarfile.TarInfo("libllama.so.0")
        soname.type = tarfile.SYMTYPE
        soname.linkname = "libllama.so.0.0.1"
        archive.addfile(soname)
        linker_name = tarfile.TarInfo("libllama.so")
        linker_name.type = tarfile.SYMTYPE
        linker_name.linkname = "libllama.so.0"
        archive.addfile(linker_name)
    destination = tmp_path / "extract"
    extract_archive(archive_path, destination)
    assert (destination / "libllama.so.0.0.1").read_bytes() == payload
    assert (destination / "libllama.so.0").is_symlink()
    assert (destination / "libllama.so").is_symlink()
    assert (destination / "libllama.so").resolve().read_bytes() == payload


def test_extract_archive_allows_safe_tar_hardlink(tmp_path: Path):
    archive_path = tmp_path / "bundle.tar.gz"
    payload = b"quantize"
    with tarfile.open(archive_path, "w:gz") as archive:
        target = tarfile.TarInfo("llama-quantize")
        target.size = len(payload)
        archive.addfile(target, io_bytes(payload))
        hardlink = tarfile.TarInfo("llama-quantize-copy")
        hardlink.type = tarfile.LNKTYPE
        hardlink.linkname = "llama-quantize"
        archive.addfile(hardlink)
    destination = tmp_path / "extract"
    extract_archive(archive_path, destination)
    assert (destination / "llama-quantize-copy").read_bytes() == payload
    assert not (destination / "llama-quantize-copy").is_symlink()


def test_extract_archive_rejects_absolute_tar_symlink_target(tmp_path: Path):
    archive_path = tmp_path / "bundle.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        entry = tarfile.TarInfo("libllama.so")
        entry.type = tarfile.SYMTYPE
        entry.linkname = "/tmp/libllama.so.0"
        archive.addfile(entry)
    with pytest.raises(PrebuiltFallback, match = "archive link used an absolute target"):
        extract_archive(archive_path, tmp_path / "extract")


def test_extract_archive_rejects_escaping_tar_symlink_target(tmp_path: Path):
    archive_path = tmp_path / "bundle.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        entry = tarfile.TarInfo("libllama.so")
        entry.type = tarfile.SYMTYPE
        entry.linkname = "../outside/libllama.so.0"
        archive.addfile(entry)
    with pytest.raises(PrebuiltFallback, match = "archive link escaped destination"):
        extract_archive(archive_path, tmp_path / "extract")


def test_extract_archive_rejects_unresolved_tar_symlink_target(tmp_path: Path):
    archive_path = tmp_path / "bundle.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        entry = tarfile.TarInfo("libllama.so")
        entry.type = tarfile.SYMTYPE
        entry.linkname = "libllama.so.0"
        archive.addfile(entry)
    with pytest.raises(PrebuiltFallback, match = "unresolved link entries"):
        extract_archive(archive_path, tmp_path / "extract")


def test_extract_archive_rejects_zip_symlink_entry(tmp_path: Path):
    archive_path = tmp_path / "bundle.zip"
    with zipfile.ZipFile(archive_path, "w") as archive:
        info = zipfile.ZipInfo("libllama.so")
        info.create_system = 3
        info.external_attr = 0o120777 << 16
        archive.writestr(info, "libllama.so.0")
    with pytest.raises(PrebuiltFallback, match = "zip archive contained a symlink entry"):
        extract_archive(archive_path, tmp_path / "extract")


def test_hydrate_source_tree_extracts_upstream_archive_contents(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
):
    upstream_tag = "b9999"
    archive_path = tmp_path / "llama.cpp-source.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/CMakeLists.txt",
            b"cmake_minimum_required(VERSION 3.14)\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/convert_hf_to_gguf.py",
            b"#!/usr/bin/env python3\nimport gguf\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/gguf-py/gguf/__init__.py",
            b"__all__ = []\n",
        )
    source_urls = set(INSTALL_LLAMA_PREBUILT.upstream_source_archive_urls(upstream_tag))

    def fake_download_file(url: str, destination: Path) -> None:
        assert url in source_urls
        destination.write_bytes(archive_path.read_bytes())

    monkeypatch.setattr(INSTALL_LLAMA_PREBUILT, "download_file", fake_download_file)
    install_dir = tmp_path / "install"
    work_dir = tmp_path / "work"
    work_dir.mkdir()
    hydrate_source_tree(
        upstream_tag, install_dir, work_dir, expected_sha256 = sha256_file(archive_path)
    )
    assert (install_dir / "CMakeLists.txt").exists()
    assert (install_dir / "convert_hf_to_gguf.py").exists()
    assert (install_dir / "gguf-py" / "gguf" / "__init__.py").exists()
    assert not (install_dir / f"llama.cpp-{upstream_tag}").exists()
def test_validate_prebuilt_choice_creates_repo_shaped_linux_install(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
):
    upstream_tag = "b9998"
    bundle_name = "app-b9998-linux-x64-cuda13-newer.tar.gz"
    source_archive = tmp_path / "source.tar.gz"
    bundle_archive = tmp_path / "bundle.tar.gz"
    with tarfile.open(source_archive, "w:gz") as archive:
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/CMakeLists.txt",
            b"cmake_minimum_required(VERSION 3.14)\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/convert_hf_to_gguf.py",
            b"#!/usr/bin/env python3\nimport gguf\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/gguf-py/gguf/__init__.py",
            b"__all__ = []\n",
        )
    with tarfile.open(bundle_archive, "w:gz") as archive:
        add_bytes_to_tar(archive, "llama-server", b"#!/bin/sh\nexit 0\n", mode = 0o755)
        add_bytes_to_tar(archive, "llama-quantize", b"#!/bin/sh\nexit 0\n", mode = 0o755)
        add_bytes_to_tar(archive, "libllama.so.0.0.1", b"libllama")
        add_symlink_to_tar(archive, "libllama.so.0", "libllama.so.0.0.1")
        add_symlink_to_tar(archive, "libllama.so", "libllama.so.0")
        add_bytes_to_tar(archive, "libggml.so.0.9.8", b"libggml")
        add_symlink_to_tar(archive, "libggml.so.0", "libggml.so.0.9.8")
        add_symlink_to_tar(archive, "libggml.so", "libggml.so.0")
        add_bytes_to_tar(archive, "libggml-base.so.0.9.8", b"libggml-base")
        add_symlink_to_tar(archive, "libggml-base.so.0", "libggml-base.so.0.9.8")
        add_symlink_to_tar(archive, "libggml-base.so", "libggml-base.so.0")
        add_bytes_to_tar(archive, "libggml-cpu-x64.so.0.9.8", b"libggml-cpu")
        add_symlink_to_tar(archive, "libggml-cpu-x64.so.0", "libggml-cpu-x64.so.0.9.8")
        add_symlink_to_tar(archive, "libggml-cpu-x64.so", "libggml-cpu-x64.so.0")
        add_bytes_to_tar(archive, "libmtmd.so.0.0.1", b"libmtmd")
        add_symlink_to_tar(archive, "libmtmd.so.0", "libmtmd.so.0.0.1")
        add_symlink_to_tar(archive, "libmtmd.so", "libmtmd.so.0")
        add_bytes_to_tar(archive, "BUILD_INFO.txt", b"bundle metadata\n")
        add_bytes_to_tar(archive, "THIRD_PARTY_LICENSES.txt", b"licenses\n")
    source_urls = set(INSTALL_LLAMA_PREBUILT.upstream_source_archive_urls(upstream_tag))

    def fake_download_file(url: str, destination: Path) -> None:
        if url in source_urls:
            destination.write_bytes(source_archive.read_bytes())
            return
        if url == "file://bundle":
            destination.write_bytes(bundle_archive.read_bytes())
            return
        raise AssertionError(f"unexpected download url: {url}")

    monkeypatch.setattr(INSTALL_LLAMA_PREBUILT, "download_file", fake_download_file)
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "download_bytes",
        lambda url, **_: b"#!/usr/bin/env python3\nimport gguf\n",
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "preflight_linux_installed_binaries",
        lambda *args, **kwargs: None,
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT, "validate_quantize", lambda *args, **kwargs: None
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT, "validate_server", lambda *args, **kwargs: None
    )
    host = HostInfo(
        system = "Linux",
        machine = "x86_64",
        is_windows = False,
        is_linux = True,
        is_macos = False,
        is_x86_64 = True,
        is_arm64 = False,
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )
    choice = AssetChoice(
        repo = "local",
        tag = upstream_tag,
        name = bundle_name,
        url = "file://bundle",
        source_label = "local",
        is_ready_bundle = True,
        install_kind = "linux-cuda",
        bundle_profile = "cuda13-newer",
        runtime_line = "cuda13",
        expected_sha256 = sha256_file(bundle_archive),
    )
    install_dir = tmp_path / "install"
    work_dir = tmp_path / "work"
    work_dir.mkdir()
    probe_path = tmp_path / "stories260K.gguf"
    quantized_path = tmp_path / "stories260K-q4.gguf"
    validate_prebuilt_choice(
        choice,
        host,
        install_dir,
        work_dir,
        probe_path,
        requested_tag = upstream_tag,
        llama_tag = upstream_tag,
        approved_checksums = approved_checksums_for(
            upstream_tag,
            source_archive = source_archive,
            bundle_archive = bundle_archive,
            bundle_name = bundle_name,
        ),
        prebuilt_fallback_used = False,
        quantized_path = quantized_path,
    )
    assert (install_dir / "gguf-py" / "gguf" / "__init__.py").exists()
    assert (install_dir / "convert_hf_to_gguf.py").exists()
    assert (install_dir / "build" / "bin" / "llama-server").exists()
    assert (install_dir / "build" / "bin" / "llama-quantize").exists()
    assert (install_dir / "build" / "bin" / "libllama.so").exists()
    assert (install_dir / "llama-server").exists()
    assert (install_dir / "llama-quantize").exists()
    assert (install_dir / "UNSLOTH_PREBUILT_INFO.json").exists()
    assert (install_dir / "BUILD_INFO.txt").exists()


def test_validate_prebuilt_choice_creates_repo_shaped_windows_install(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
):
    upstream_tag = "b9997"
    bundle_name = "app-b9997-windows-x64-cpu.zip"
    source_archive = tmp_path / "source.tar.gz"
    bundle_archive = tmp_path / "bundle.zip"
    with tarfile.open(source_archive, "w:gz") as archive:
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/CMakeLists.txt",
            b"cmake_minimum_required(VERSION 3.14)\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/convert_hf_to_gguf.py",
            b"#!/usr/bin/env python3\nimport gguf\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/gguf-py/gguf/__init__.py",
            b"__all__ = []\n",
        )
    with zipfile.ZipFile(bundle_archive, "w") as archive:
        archive.writestr("llama-server.exe", b"MZ")
        archive.writestr("llama-quantize.exe", b"MZ")
        archive.writestr("llama.dll", b"DLL")
        archive.writestr("BUILD_INFO.txt", b"bundle metadata\n")
    source_urls = set(INSTALL_LLAMA_PREBUILT.upstream_source_archive_urls(upstream_tag))

    def fake_download_file(url: str, destination: Path) -> None:
        if url in source_urls:
            destination.write_bytes(source_archive.read_bytes())
            return
        if url == "file://bundle.zip":
            destination.write_bytes(bundle_archive.read_bytes())
            return
        raise AssertionError(f"unexpected download url: {url}")

    monkeypatch.setattr(INSTALL_LLAMA_PREBUILT, "download_file", fake_download_file)
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "download_bytes",
        lambda url, **_: b"#!/usr/bin/env python3\nimport gguf\n",
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "preflight_linux_installed_binaries",
        lambda *args, **kwargs: None,
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT, "validate_quantize", lambda *args, **kwargs: None
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT, "validate_server", lambda *args, **kwargs: None
    )
    host = HostInfo(
        system = "Windows",
        machine = "AMD64",
        is_windows = True,
        is_linux = False,
        is_macos = False,
        is_x86_64 = True,
        is_arm64 = False,
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )
    choice = AssetChoice(
        repo = "local",
        tag = upstream_tag,
        name = bundle_name,
        url = "file://bundle.zip",
        source_label = "local",
        is_ready_bundle = True,
        install_kind = "windows-cpu",
        expected_sha256 = sha256_file(bundle_archive),
    )
    install_dir = tmp_path / "install"
    work_dir = tmp_path / "work"
    work_dir.mkdir()
    probe_path = tmp_path / "stories260K.gguf"
    quantized_path = tmp_path / "stories260K-q4.gguf"
    validate_prebuilt_choice(
        choice,
        host,
        install_dir,
        work_dir,
        probe_path,
        requested_tag = upstream_tag,
        llama_tag = upstream_tag,
        approved_checksums = approved_checksums_for(
            upstream_tag,
            source_archive = source_archive,
            bundle_archive = bundle_archive,
            bundle_name = bundle_name,
        ),
        prebuilt_fallback_used = False,
        quantized_path = quantized_path,
    )
    assert (install_dir / "gguf-py" / "gguf" / "__init__.py").exists()
    assert (install_dir / "convert_hf_to_gguf.py").exists()
    assert (install_dir / "build" / "bin" / "Release" / "llama-server.exe").exists()
    assert (install_dir / "build" / "bin" / "Release" / "llama-quantize.exe").exists()
    assert (install_dir / "build" / "bin" / "Release" / "llama.dll").exists()
    assert not (install_dir / "llama-server.exe").exists()
    assert (install_dir / "UNSLOTH_PREBUILT_INFO.json").exists()
    assert (install_dir / "BUILD_INFO.txt").exists()
def test_activate_install_tree_restores_existing_install_after_activation_failure(
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
    capsys: pytest.CaptureFixture[str],
):
    install_dir = tmp_path / "llama.cpp"
    install_dir.mkdir()
    (install_dir / "old.txt").write_text("old install\n")
    staging_dir = create_install_staging_dir(install_dir)
    (staging_dir / "new.txt").write_text("new install\n")
    host = HostInfo(
        system = "Linux",
        machine = "x86_64",
        is_windows = False,
        is_linux = True,
        is_macos = False,
        is_x86_64 = True,
        is_arm64 = False,
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "confirm_install_tree",
        lambda *_args, **_kwargs: (_ for _ in ()).throw(
            RuntimeError("activation confirm failed")
        ),
    )
    with pytest.raises(
        PrebuiltFallback,
        match = "activation failed; restored previous install",
    ):
        activate_install_tree(staging_dir, install_dir, host)
    assert (install_dir / "old.txt").read_text() == "old install\n"
    assert not (install_dir / "new.txt").exists()
    assert not staging_dir.exists()
    assert not (tmp_path / ".staging").exists()
    output = capsys.readouterr().out
    assert "moving existing install to rollback path" in output
    assert "restored previous install from rollback path" in output


def test_activate_install_tree_cleans_all_paths_when_rollback_restore_fails(
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
    capsys: pytest.CaptureFixture[str],
):
    install_dir = tmp_path / "llama.cpp"
    install_dir.mkdir()
    (install_dir / "old.txt").write_text("old install\n")
    staging_dir = create_install_staging_dir(install_dir)
    (staging_dir / "new.txt").write_text("new install\n")
    host = HostInfo(
        system = "Linux",
        machine = "x86_64",
        is_windows = False,
        is_linux = True,
        is_macos = False,
        is_x86_64 = True,
        is_arm64 = False,
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "confirm_install_tree",
        lambda *_args, **_kwargs: (_ for _ in ()).throw(
            RuntimeError("activation confirm failed")
        ),
    )
    original_replace = INSTALL_LLAMA_PREBUILT.os.replace

    def flaky_replace(src, dst):
        src_path = Path(src)
        dst_path = Path(dst)
        if "rollback-" in src_path.name and dst_path == install_dir:
            raise OSError("restore failed")
        return original_replace(src, dst)

    monkeypatch.setattr(INSTALL_LLAMA_PREBUILT.os, "replace", flaky_replace)
    with pytest.raises(
        PrebuiltFallback,
        match = "activation and rollback failed; cleaned install state for fresh source build",
    ):
        activate_install_tree(staging_dir, install_dir, host)
    assert not install_dir.exists()
    assert not staging_dir.exists()
    assert not (tmp_path / ".staging").exists()
    output = capsys.readouterr().out
    assert "rollback after failed activation also failed: restore failed" in output
    assert (
        "cleaning staging, install, and rollback paths before source build fallback"
        in output
    )
    assert "removing failed install path" in output
    assert "removing rollback path" in output
def test_binary_env_linux_includes_binary_parent_in_ld_library_path(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
):
    install_dir = tmp_path / "llama.cpp"
    bin_dir = install_dir / "build" / "bin"
    bin_dir.mkdir(parents = True)
    binary_path = bin_dir / "llama-server"
    binary_path.write_bytes(b"fake")
    host = HostInfo(
        system = "Linux",
        machine = "x86_64",
        is_windows = False,
        is_linux = True,
        is_macos = False,
        is_x86_64 = True,
        is_arm64 = False,
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )
    monkeypatch.setattr(INSTALL_LLAMA_PREBUILT, "linux_runtime_dirs", lambda _bp: [])
    env = binary_env(binary_path, install_dir, host)
    ld_dirs = env["LD_LIBRARY_PATH"].split(os.pathsep)
    assert (
        str(bin_dir) in ld_dirs
    ), f"binary_path.parent ({bin_dir}) must be in LD_LIBRARY_PATH, got: {ld_dirs}"
    assert str(install_dir) in ld_dirs


def io_bytes(data: bytes):
    return io.BytesIO(data)


def add_bytes_to_tar(
    archive: tarfile.TarFile, name: str, data: bytes, *, mode: int = 0o644
) -> None:
    info = tarfile.TarInfo(name)
    info.size = len(data)
    info.mode = mode
    archive.addfile(info, io_bytes(data))


def add_symlink_to_tar(archive: tarfile.TarFile, name: str, target: str) -> None:
    info = tarfile.TarInfo(name)
    info.type = tarfile.SYMTYPE
    info.linkname = target
    archive.addfile(info)
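The link-safety policy these tests exercise can be sketched as a standalone predicate. This is a simplified illustration under stated assumptions: `symlink_target_is_safe` is a hypothetical name, and the real `extract_archive` additionally tracks unresolved link targets and hardlinks:

```python
from pathlib import PurePosixPath


def symlink_target_is_safe(member_name: str, linkname: str) -> bool:
    """Reject absolute link targets and targets that escape the extraction root.

    Sketch of the policy tested above; not the module's actual implementation.
    """
    if PurePosixPath(linkname).is_absolute():
        return False  # e.g. "/tmp/libllama.so.0"
    # Resolve the target relative to the directory containing the link member.
    parts = list(PurePosixPath(member_name).parent.parts)
    for part in PurePosixPath(linkname).parts:
        if part == "..":
            if not parts:
                return False  # ".." walked above the destination root
            parts.pop()
        elif part != ".":
            parts.append(part)
    return True
```

A sibling-relative target like `libllama.so.0 -> libllama.so.0.0.1` passes, while `../outside/...` and absolute targets are rejected, matching the accept/reject pairs in the tests.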


@@ -0,0 +1,687 @@
"""
Comprehensive tests for PR #4562 bug fixes.
Tests cover:
- Bug 1: PS1 detached HEAD on re-run (fetch + checkout -B pattern)
- Bug 2: Source-build fallback ignores pinned tag (both .sh and .ps1)
- Bug 3: Unix fallback deletes install before checking prerequisites
- Bug 4: Linux LD_LIBRARY_PATH missing build/bin
- "latest" tag resolution fallback chain (Unsloth -> ggml-org -> raw)
- Cross-platform binary_env (Linux, macOS, Windows)
- Edge cases: malformed JSON, empty responses, env overrides
Run: pytest tests/studio/install/test_pr4562_bugfixes.py -v
"""
import importlib.util
import json
import os
import subprocess
import sys
import textwrap
from pathlib import Path
from unittest.mock import patch

import pytest

# ---------------------------------------------------------------------------
# Load the module under test (same pattern as existing test files)
# ---------------------------------------------------------------------------
PACKAGE_ROOT = Path(__file__).resolve().parents[3]
MODULE_PATH = PACKAGE_ROOT / "studio" / "install_llama_prebuilt.py"
SPEC = importlib.util.spec_from_file_location(
    "studio_install_llama_prebuilt", MODULE_PATH
)
assert SPEC is not None and SPEC.loader is not None
MOD = importlib.util.module_from_spec(SPEC)
sys.modules[SPEC.name] = MOD
SPEC.loader.exec_module(MOD)

binary_env = MOD.binary_env
HostInfo = MOD.HostInfo
resolve_requested_llama_tag = MOD.resolve_requested_llama_tag

SETUP_SH = PACKAGE_ROOT / "studio" / "setup.sh"
SETUP_PS1 = PACKAGE_ROOT / "studio" / "setup.ps1"


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def make_host(*, system: str) -> HostInfo:
    """Create a HostInfo for the given OS."""
    return HostInfo(
        system = system,
        machine = "x86_64" if system != "Darwin" else "arm64",
        is_windows = (system == "Windows"),
        is_linux = (system == "Linux"),
        is_macos = (system == "Darwin"),
        is_x86_64 = (system != "Darwin"),
        is_arm64 = (system == "Darwin"),
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )


BASH = "/bin/bash"


def run_bash(script: str, *, timeout: int = 10, env: dict | None = None) -> str:
    """Run a bash script fragment and return its stdout."""
    run_env = os.environ.copy()
    if env:
        run_env.update(env)
    result = subprocess.run(
        [BASH, "-c", script],
        capture_output = True,
        text = True,
        timeout = timeout,
        env = run_env,
    )
    return result.stdout.strip()
# =========================================================================
# TEST GROUP A: binary_env across all platforms (Bug 4 + cross-platform)
# =========================================================================
class TestBinaryEnvCrossPlatform:
    """Test that binary_env returns correct library paths for all OSes."""

    def test_linux_includes_binary_parent_in_ld_library_path(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        install_dir = tmp_path / "llama.cpp"
        bin_dir = install_dir / "build" / "bin"
        bin_dir.mkdir(parents = True)
        binary_path = bin_dir / "llama-server"
        binary_path.write_bytes(b"fake")
        host = make_host(system = "Linux")
        monkeypatch.setattr(MOD, "linux_runtime_dirs", lambda _bp: [])
        env = binary_env(binary_path, install_dir, host)
        ld_dirs = env["LD_LIBRARY_PATH"].split(os.pathsep)
        assert str(bin_dir) in ld_dirs, f"build/bin not in LD_LIBRARY_PATH: {ld_dirs}"
        assert (
            str(install_dir) in ld_dirs
        ), f"install_dir not in LD_LIBRARY_PATH: {ld_dirs}"

    def test_linux_binary_parent_comes_before_install_dir(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        """build/bin should be searched before install_dir for .so files."""
        install_dir = tmp_path / "llama.cpp"
        bin_dir = install_dir / "build" / "bin"
        bin_dir.mkdir(parents = True)
        binary_path = bin_dir / "llama-server"
        binary_path.write_bytes(b"fake")
        host = make_host(system = "Linux")
        monkeypatch.setattr(MOD, "linux_runtime_dirs", lambda _bp: [])
        env = binary_env(binary_path, install_dir, host)
        ld_dirs = env["LD_LIBRARY_PATH"].split(os.pathsep)
        bin_idx = ld_dirs.index(str(bin_dir))
        install_idx = ld_dirs.index(str(install_dir))
        assert (
            bin_idx < install_idx
        ), "binary_path.parent should come before install_dir"

    def test_linux_deduplicates_when_binary_parent_equals_install_dir(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        """When binary is directly in install_dir, no duplicate entries."""
        install_dir = tmp_path / "llama.cpp"
        install_dir.mkdir(parents = True)
        binary_path = install_dir / "llama-server"
        binary_path.write_bytes(b"fake")
        host = make_host(system = "Linux")
        monkeypatch.setattr(MOD, "linux_runtime_dirs", lambda _bp: [])
        env = binary_env(binary_path, install_dir, host)
        ld_dirs = [d for d in env["LD_LIBRARY_PATH"].split(os.pathsep) if d]
        count = ld_dirs.count(str(install_dir))
        assert count == 1, f"install_dir appears {count} times in LD_LIBRARY_PATH"

    def test_linux_preserves_existing_ld_library_path(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        install_dir = tmp_path / "llama.cpp"
        bin_dir = install_dir / "build" / "bin"
        bin_dir.mkdir(parents = True)
        binary_path = bin_dir / "llama-server"
        binary_path.write_bytes(b"fake")
        # Create real directories so dedupe_existing_dirs keeps them
        custom_lib = tmp_path / "custom_lib"
        other_lib = tmp_path / "other_lib"
        custom_lib.mkdir()
        other_lib.mkdir()
        host = make_host(system = "Linux")
        monkeypatch.setattr(MOD, "linux_runtime_dirs", lambda _bp: [])
        original = os.environ.get("LD_LIBRARY_PATH", "")
        os.environ["LD_LIBRARY_PATH"] = f"{custom_lib}:{other_lib}"
        try:
            env = binary_env(binary_path, install_dir, host)
        finally:
            if original:
                os.environ["LD_LIBRARY_PATH"] = original
            else:
                os.environ.pop("LD_LIBRARY_PATH", None)
        ld_dirs = env["LD_LIBRARY_PATH"].split(os.pathsep)
        assert str(custom_lib.resolve()) in ld_dirs
        assert str(other_lib.resolve()) in ld_dirs

    def test_windows_includes_binary_parent_in_path(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        install_dir = tmp_path / "llama.cpp"
        bin_dir = install_dir / "build" / "bin" / "Release"
        bin_dir.mkdir(parents = True)
        binary_path = bin_dir / "llama-server.exe"
        binary_path.write_bytes(b"MZ")
        host = make_host(system = "Windows")
        monkeypatch.setattr(
            MOD, "windows_runtime_dirs_for_runtime_line", lambda _rt: []
        )
        env = binary_env(binary_path, install_dir, host)
        path_dirs = env["PATH"].split(os.pathsep)
        assert str(bin_dir) in path_dirs, f"build/bin/Release not in PATH: {path_dirs}"

    def test_macos_sets_dyld_library_path(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        install_dir = tmp_path / "llama.cpp"
        install_dir.mkdir(parents = True)
        bin_dir = install_dir / "build" / "bin"
        binary_path = bin_dir / "llama-server"
        binary_path.parent.mkdir(parents = True)
        binary_path.write_bytes(b"fake")
        host = make_host(system = "Darwin")
        monkeypatch.delenv("DYLD_LIBRARY_PATH", raising = False)
        env = binary_env(binary_path, install_dir, host)
        dyld_parts = [p for p in env["DYLD_LIBRARY_PATH"].split(os.pathsep) if p]
        assert (
            str(bin_dir) in dyld_parts
        ), f"build/bin not in DYLD_LIBRARY_PATH: {dyld_parts}"
        assert (
            str(install_dir) in dyld_parts
        ), f"install_dir not in DYLD_LIBRARY_PATH: {dyld_parts}"
        # binary_path.parent (build/bin) should come before install_dir
        assert dyld_parts.index(str(bin_dir)) < dyld_parts.index(str(install_dir))
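The ordering and deduplication behavior exercised by the group above can be sketched as a small path-builder. `prepend_library_dirs` is an illustrative name, not the module's API; the real `binary_env` also merges platform runtime dirs and filters to existing directories:

```python
import os
from pathlib import Path


def prepend_library_dirs(binary_path: Path, install_dir: Path, existing: str = "") -> str:
    """Sketch of the Bug-4 fix: put binary_path.parent ahead of install_dir so
    bundled .so files in build/bin are found even without RPATH.

    Illustrative only; not the actual binary_env implementation.
    """
    candidates = [str(binary_path.parent), str(install_dir)]
    candidates += [d for d in existing.split(os.pathsep) if d]
    ordered: list[str] = []
    for d in candidates:
        if d not in ordered:  # dedupe while preserving priority order
            ordered.append(d)
    return os.pathsep.join(ordered)
```

The highest-priority entry is always the binary's own directory, and an entry appearing both in the computed dirs and the inherited variable is emitted once.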
# =========================================================================
# TEST GROUP B: resolve_requested_llama_tag (Python function)
# =========================================================================
class TestResolveRequestedLlamaTag:
    def test_concrete_tag_passes_through(self):
        assert resolve_requested_llama_tag("b8508") == "b8508"

    def test_none_resolves_to_latest(self, monkeypatch: pytest.MonkeyPatch):
        monkeypatch.setattr(MOD, "latest_upstream_release_tag", lambda: "b9999")
        assert resolve_requested_llama_tag(None) == "b9999"

    def test_latest_resolves_to_upstream(self, monkeypatch: pytest.MonkeyPatch):
        monkeypatch.setattr(MOD, "latest_upstream_release_tag", lambda: "b1234")
        assert resolve_requested_llama_tag("latest") == "b1234"

    def test_empty_string_resolves_to_latest(self, monkeypatch: pytest.MonkeyPatch):
        monkeypatch.setattr(MOD, "latest_upstream_release_tag", lambda: "b5555")
        assert resolve_requested_llama_tag("") == "b5555"
# =========================================================================
# TEST GROUP C: setup.sh logic (bash subprocess tests)
# =========================================================================
class TestSetupShLogic:
    """Test setup.sh fragments via bash subprocess with controlled PATH."""

    def test_cmake_missing_preserves_install(self, tmp_path: Path):
        """Bug 3: When cmake is missing, rm -rf should NOT run."""
        llama_dir = tmp_path / "llama.cpp"
        llama_dir.mkdir()
        marker = llama_dir / "marker.txt"
        marker.write_text("existing")
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        # Create mock git but NOT cmake
        (mock_bin / "git").write_text("#!/bin/bash\nexit 0\n")
        (mock_bin / "git").chmod(0o755)
        # Build PATH: mock_bin first, then system dirs WITHOUT cmake
        safe_dirs = [str(mock_bin)]
        for d in os.environ.get("PATH", "").split(":"):
            if d and not os.path.isfile(os.path.join(d, "cmake")):
                safe_dirs.append(d)
        script = textwrap.dedent(f"""\
            export LLAMA_CPP_DIR="{llama_dir}"
            if ! command -v cmake &>/dev/null; then
                echo "cmake_missing"
            elif ! command -v git &>/dev/null; then
                echo "git_missing"
            else
                rm -rf "$LLAMA_CPP_DIR"
                echo "would_clone"
            fi
        """)
        output = run_bash(script, env = {"PATH": ":".join(safe_dirs)})
        assert "cmake_missing" in output
        assert marker.exists(), "Install dir was deleted despite cmake missing!"

    def test_git_missing_preserves_install(self, tmp_path: Path):
        """Bug 3: When git is missing, rm -rf should NOT run."""
        llama_dir = tmp_path / "llama.cpp"
        llama_dir.mkdir()
        marker = llama_dir / "marker.txt"
        marker.write_text("existing")
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        # Create mock cmake but NOT git
        (mock_bin / "cmake").write_text("#!/bin/bash\nexit 0\n")
        (mock_bin / "cmake").chmod(0o755)
        # Build PATH: mock_bin first, then system dirs WITHOUT git
        safe_dirs = [str(mock_bin)]
        for d in os.environ.get("PATH", "").split(":"):
            if d and not os.path.isfile(os.path.join(d, "git")):
                safe_dirs.append(d)
        script = textwrap.dedent(f"""\
            export LLAMA_CPP_DIR="{llama_dir}"
            if ! command -v cmake &>/dev/null; then
                echo "cmake_missing"
            elif ! command -v git &>/dev/null; then
                echo "git_missing"
            else
                rm -rf "$LLAMA_CPP_DIR"
                echo "would_clone"
            fi
        """)
        output = run_bash(script, env = {"PATH": ":".join(safe_dirs)})
        assert "git_missing" in output
        assert marker.exists(), "Install dir was deleted despite git missing!"

    def test_both_present_runs_rm_and_clone(self, tmp_path: Path):
        """Bug 3: When both present, rm -rf runs before clone."""
        llama_dir = tmp_path / "llama.cpp"
        llama_dir.mkdir()
        marker = llama_dir / "marker.txt"
        marker.write_text("existing")
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        (mock_bin / "cmake").write_text("#!/bin/bash\nexit 0\n")
        (mock_bin / "cmake").chmod(0o755)
        (mock_bin / "git").write_text("#!/bin/bash\nexit 0\n")
        (mock_bin / "git").chmod(0o755)
        script = textwrap.dedent(f"""\
            export PATH="{mock_bin}:$PATH"
            export LLAMA_CPP_DIR="{llama_dir}"
            if ! command -v cmake &>/dev/null; then
                echo "cmake_missing"
            elif ! command -v git &>/dev/null; then
                echo "git_missing"
            else
                rm -rf "$LLAMA_CPP_DIR"
                echo "would_clone"
            fi
        """)
        output = run_bash(script)
        assert "would_clone" in output
        assert not marker.exists(), "Install dir should have been deleted"

    def test_clone_uses_pinned_tag(self, tmp_path: Path):
        """Bug 2: git clone should use --branch with the resolved tag."""
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        log_file = tmp_path / "git_calls.log"
        (mock_bin / "git").write_text(f'#!/bin/bash\necho "$*" >> {log_file}\nexit 0\n')
        (mock_bin / "git").chmod(0o755)
        script = textwrap.dedent(f"""\
            export PATH="{mock_bin}:$PATH"
            git clone --depth 1 --branch "b8508" https://github.com/ggml-org/llama.cpp.git /tmp/llama_test
        """)
        run_bash(script)
        log = log_file.read_text()
        assert "--branch b8508" in log, f"Expected --branch b8508 in: {log}"

    def test_fetch_checkout_b_pattern(self, tmp_path: Path):
        """Bug 1: Re-run should use fetch + checkout -B, not pull + checkout FETCH_HEAD."""
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        log_file = tmp_path / "git_calls.log"
        (mock_bin / "git").write_text(f'#!/bin/bash\necho "$*" >> {log_file}\nexit 0\n')
        (mock_bin / "git").chmod(0o755)
        llama_dir = tmp_path / "llama.cpp"
        llama_dir.mkdir()
        (llama_dir / ".git").mkdir()
        script = textwrap.dedent(f"""\
            export PATH="{mock_bin}:$PATH"
            LlamaCppDir="{llama_dir}"
            ResolvedLlamaTag="b8508"
            if [ -d "$LlamaCppDir/.git" ]; then
                git -C "$LlamaCppDir" fetch --depth 1 origin "$ResolvedLlamaTag"
                if [ $? -ne 0 ]; then
                    echo "WARN: fetch failed"
                else
                    git -C "$LlamaCppDir" checkout -B unsloth-llama-build FETCH_HEAD
                fi
            fi
        """)
        run_bash(script)
        log = log_file.read_text()
        assert "fetch --depth 1 origin b8508" in log
        assert "checkout -B unsloth-llama-build FETCH_HEAD" in log
        assert "pull" not in log, "Should use fetch, not pull"

    def test_fetch_failure_warns_not_aborts(self, tmp_path: Path):
        """Bug 1: fetch failure should warn and continue, not set BuildOk=false."""
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        (mock_bin / "git").write_text(
            '#!/bin/bash\nif echo "$*" | grep -q fetch; then exit 1; fi\nexit 0\n'
        )
        (mock_bin / "git").chmod(0o755)
        llama_dir = tmp_path / "llama.cpp"
        llama_dir.mkdir()
        (llama_dir / ".git").mkdir()
        script = textwrap.dedent(f"""\
            export PATH="{mock_bin}:$PATH"
            LlamaCppDir="{llama_dir}"
            ResolvedLlamaTag="b8508"
            BuildOk=true
            if [ -d "$LlamaCppDir/.git" ]; then
                git -C "$LlamaCppDir" fetch --depth 1 origin "$ResolvedLlamaTag"
                if [ $? -ne 0 ]; then
                    echo "WARN: fetch failed -- using existing source"
                else
                    git -C "$LlamaCppDir" checkout -B unsloth-llama-build FETCH_HEAD
                fi
            fi
            echo "BuildOk=$BuildOk"
        """)
        output = run_bash(script)
        assert "WARN: fetch failed" in output
        assert "BuildOk=true" in output
# =========================================================================
# TEST GROUP D: "latest" tag resolution (bash subprocess)
# =========================================================================
class TestLatestTagResolution:
"""Test the fallback chain: Unsloth API -> ggml-org API -> raw."""
RESOLVE_TEMPLATE = textwrap.dedent("""\
export PATH="{mock_bin}:$PATH"
_REQUESTED_LLAMA_TAG="{requested_tag}"
_RESOLVED_LLAMA_TAG=""
_RESOLVE_UPSTREAM_STATUS=1
_HELPER_RELEASE_REPO="unslothai/llama.cpp"
if [ "$_RESOLVE_UPSTREAM_STATUS" -ne 0 ] || [ -z "$_RESOLVED_LLAMA_TAG" ]; then
if [ "$_REQUESTED_LLAMA_TAG" = "latest" ]; then
_RESOLVED_LLAMA_TAG="$(curl -fsSL "https://api.github.com/repos/${{_HELPER_RELEASE_REPO}}/releases/latest" 2>/dev/null | python -c "import sys,json; print(json.load(sys.stdin)['tag_name'])" 2>/dev/null)" || _RESOLVED_LLAMA_TAG=""
if [ -z "$_RESOLVED_LLAMA_TAG" ]; then
_RESOLVED_LLAMA_TAG="$(curl -fsSL https://api.github.com/repos/ggml-org/llama.cpp/releases/latest 2>/dev/null | python -c "import sys,json; print(json.load(sys.stdin)['tag_name'])" 2>/dev/null)" || _RESOLVED_LLAMA_TAG=""
fi
fi
if [ -z "$_RESOLVED_LLAMA_TAG" ]; then
_RESOLVED_LLAMA_TAG="$_REQUESTED_LLAMA_TAG"
fi
fi
echo "$_RESOLVED_LLAMA_TAG"
""")
@staticmethod
def _make_curl_mock(
mock_bin: Path, unsloth_response: str | None, ggml_response: str | None
):
"""Create a curl mock that returns different responses per repo."""
lines = ["#!/bin/bash"]
if unsloth_response is not None:
lines.append(
f'if echo "$*" | grep -q "unslothai/llama.cpp"; then echo \'{unsloth_response}\'; exit 0; fi'
)
else:
lines.append(
'if echo "$*" | grep -q "unslothai/llama.cpp"; then exit 1; fi'
)
if ggml_response is not None:
lines.append(
f'if echo "$*" | grep -q "ggml-org/llama.cpp"; then echo \'{ggml_response}\'; exit 0; fi'
)
else:
lines.append('if echo "$*" | grep -q "ggml-org/llama.cpp"; then exit 1; fi')
lines.append("exit 1")
curl_path = mock_bin / "curl"
curl_path.write_text("\n".join(lines) + "\n")
curl_path.chmod(0o755)
def _run_resolve(
self,
tmp_path: Path,
requested_tag: str,
unsloth_resp: str | None,
ggml_resp: str | None,
) -> str:
mock_bin = tmp_path / "mock_bin"
mock_bin.mkdir(exist_ok = True)
self._make_curl_mock(mock_bin, unsloth_resp, ggml_resp)
script = self.RESOLVE_TEMPLATE.format(
mock_bin = mock_bin, requested_tag = requested_tag
)
return run_bash(script)
def test_unsloth_succeeds(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = '{"tag_name":"b8508"}',
ggml_resp = '{"tag_name":"b9000"}',
)
assert output == "b8508"
def test_unsloth_fails_ggml_succeeds(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = None,
ggml_resp = '{"tag_name":"b9000"}',
)
assert output == "b9000"
def test_both_fail_raw_fallback(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = None,
ggml_resp = None,
)
assert output == "latest"
def test_concrete_tag_passes_through(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"b7777",
unsloth_resp = '{"tag_name":"b8508"}',
ggml_resp = '{"tag_name":"b9000"}',
)
assert output == "b7777"
def test_unsloth_malformed_json_falls_through(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = '{"bad_key":"no_tag"}',
ggml_resp = '{"tag_name":"b9001"}',
)
assert output == "b9001"
def test_both_malformed_json_raw_fallback(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = '{"bad":"data"}',
ggml_resp = '{"also":"bad"}',
)
assert output == "latest"
def test_unsloth_empty_body_falls_through(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = "",
ggml_resp = '{"tag_name":"b7000"}',
)
assert output == "b7000"
def test_unsloth_empty_tag_name_falls_through(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = '{"tag_name":""}',
ggml_resp = '{"tag_name":"b6000"}',
)
assert output == "b6000"
def test_env_override_unsloth_llama_tag(self):
output = run_bash(
'echo "${UNSLOTH_LLAMA_TAG:-latest}"',
env = {"UNSLOTH_LLAMA_TAG": "b1234"},
)
assert output == "b1234"
def test_env_unset_defaults_to_latest(self):
env = os.environ.copy()
env.pop("UNSLOTH_LLAMA_TAG", None)
output = run_bash('echo "${UNSLOTH_LLAMA_TAG:-latest}"', env = env)
assert output == "latest"
def test_env_empty_defaults_to_latest(self):
output = run_bash(
'echo "${UNSLOTH_LLAMA_TAG:-latest}"',
env = {"UNSLOTH_LLAMA_TAG": ""},
)
assert output == "latest"
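The resolution chain exercised above can be summarized in plain Python. This is a hedged sketch, not the production code: `resolve_tag_sketch` is a hypothetical name, and each `fetcher` stands in for one curl-based API query that returns a tag string or an empty result on failure.

```python
def resolve_tag_sketch(requested, fetchers):
    """Try each tag source in order; fall back to the literal requested tag."""
    if requested != "latest":
        return requested  # concrete tags pass through untouched
    for fetch in fetchers:
        try:
            tag = fetch()
        except Exception:
            tag = None  # a failed or malformed response falls through
        if tag:
            return tag
    return requested
```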
# =========================================================================
# TEST GROUP E: Source file verification
# =========================================================================
class TestSourceCodePatterns:
"""Verify the actual source files contain the expected fix patterns."""
def test_setup_sh_no_rm_before_prereq_check(self):
"""rm -rf must appear AFTER cmake/git checks, not before."""
content = SETUP_SH.read_text()
# Find the source-build block
idx_else = content.find("# Check prerequisites")
assert idx_else != -1
block = content[idx_else:]
# rm -rf should appear after the cmake/git checks
idx_cmake = block.find("command -v cmake")
idx_git = block.find("command -v git")
idx_rm = block.find("rm -rf")
assert idx_rm > idx_cmake, "rm -rf should come after cmake check"
assert idx_rm > idx_git, "rm -rf should come after git check"
def test_setup_sh_clone_uses_branch_tag(self):
"""git clone in source-build should use --branch via _CLONE_BRANCH_ARGS."""
content = SETUP_SH.read_text()
# The clone line should use _CLONE_BRANCH_ARGS (which conditionally includes --branch)
assert (
"_CLONE_BRANCH_ARGS" in content
), "Clone should use _CLONE_BRANCH_ARGS array"
assert (
'--branch "$_RESOLVED_LLAMA_TAG"' in content
), "_CLONE_BRANCH_ARGS should be set to --branch $_RESOLVED_LLAMA_TAG"
# Verify the guard: --branch is only used when tag is not "latest"
assert (
'_RESOLVED_LLAMA_TAG" != "latest"' in content
), "Should guard against literal 'latest' tag"
def test_setup_sh_latest_resolution_queries_unsloth_first(self):
"""The Unsloth repo should be queried before ggml-org."""
content = SETUP_SH.read_text()
idx_unsloth = content.find("_HELPER_RELEASE_REPO}/releases/latest")
idx_ggml = content.find("ggml-org/llama.cpp/releases/latest")
assert idx_unsloth != -1, "Unsloth API query not found"
assert idx_ggml != -1, "ggml-org API query not found"
assert idx_unsloth < idx_ggml, "Unsloth should be queried before ggml-org"
def test_setup_ps1_uses_checkout_b(self):
"""PS1 should use checkout -B, not checkout --force FETCH_HEAD."""
content = SETUP_PS1.read_text()
assert "checkout -B unsloth-llama-build" in content
assert "checkout --force FETCH_HEAD" not in content
def test_setup_ps1_clone_uses_branch_tag(self):
"""PS1 clone should use --branch with the resolved tag."""
content = SETUP_PS1.read_text()
assert "--branch" in content and "$ResolvedLlamaTag" in content
# The old commented-out line should be gone
assert "# git clone --depth 1 --branch" not in content
def test_setup_ps1_no_git_pull(self):
"""PS1 should use fetch, not pull (which fails in detached HEAD)."""
content = SETUP_PS1.read_text()
# In the source-build section, there should be no "git pull"
# (git pull is only valid on a branch)
lines = content.splitlines()
for i, line in enumerate(lines):
stripped = line.strip()
if "git pull" in stripped and not stripped.startswith("#"):
# Check context -- should not be in the llama.cpp build section
# Allow git pull in other contexts
context = "\n".join(lines[max(0, i - 5) : i + 5])
if "LlamaCppDir" in context:
pytest.fail(
f"Found 'git pull' in llama.cpp build section at line {i+1}"
)
def test_setup_ps1_latest_resolution_queries_unsloth_first(self):
"""PS1 should query Unsloth repo before ggml-org."""
content = SETUP_PS1.read_text()
idx_unsloth = content.find("$HelperReleaseRepo/releases/latest")
idx_ggml = content.find("ggml-org/llama.cpp/releases/latest")
assert idx_unsloth != -1, "Unsloth API query not found in PS1"
assert idx_ggml != -1, "ggml-org API query not found in PS1"
assert idx_unsloth < idx_ggml, "Unsloth should be queried before ggml-org"
def test_binary_env_linux_has_binary_parent(self):
"""The Linux branch of binary_env should include binary_path.parent."""
content = MODULE_PATH.read_text()
# Find the binary_env function
in_func = False
in_linux = False
found = False
for line in content.splitlines():
if "def binary_env(" in line:
in_func = True
elif in_func and line and not line[0].isspace() and "def " in line:
break
if in_func and "host.is_linux" in line:
in_linux = True
if in_linux and "binary_path.parent" in line:
found = True
break
assert found, "binary_path.parent not found in Linux branch of binary_env"

"""Tests for binary selection logic in install_llama_prebuilt.py.
Covers: normalize_compute_cap, normalize_compute_caps, parse_cuda_visible_devices,
supports_explicit_visible_device_matching, select_visible_gpu_rows,
compatible_linux_runtime_lines, pick_windows_cuda_runtime,
compatible_windows_runtime_lines, runtime_line_from_cuda_version,
apply_approved_hashes, linux_cuda_choice_from_release, windows_cuda_attempts,
resolve_upstream_asset_choice.
No GPU, no network, no torch required -- all I/O is monkeypatched.
"""
import importlib.util
import sys
from pathlib import Path
import pytest
PACKAGE_ROOT = Path(__file__).resolve().parents[3]
MODULE_PATH = PACKAGE_ROOT / "studio" / "install_llama_prebuilt.py"
SPEC = importlib.util.spec_from_file_location(
"studio_install_llama_prebuilt", MODULE_PATH
)
assert SPEC is not None and SPEC.loader is not None
INSTALL_LLAMA_PREBUILT = importlib.util.module_from_spec(SPEC)
sys.modules[SPEC.name] = INSTALL_LLAMA_PREBUILT
SPEC.loader.exec_module(INSTALL_LLAMA_PREBUILT)
HostInfo = INSTALL_LLAMA_PREBUILT.HostInfo
AssetChoice = INSTALL_LLAMA_PREBUILT.AssetChoice
PublishedLlamaArtifact = INSTALL_LLAMA_PREBUILT.PublishedLlamaArtifact
PublishedReleaseBundle = INSTALL_LLAMA_PREBUILT.PublishedReleaseBundle
ApprovedArtifactHash = INSTALL_LLAMA_PREBUILT.ApprovedArtifactHash
ApprovedReleaseChecksums = INSTALL_LLAMA_PREBUILT.ApprovedReleaseChecksums
PrebuiltFallback = INSTALL_LLAMA_PREBUILT.PrebuiltFallback
LinuxCudaSelection = INSTALL_LLAMA_PREBUILT.LinuxCudaSelection
UPSTREAM_REPO = INSTALL_LLAMA_PREBUILT.UPSTREAM_REPO
normalize_compute_cap = INSTALL_LLAMA_PREBUILT.normalize_compute_cap
normalize_compute_caps = INSTALL_LLAMA_PREBUILT.normalize_compute_caps
parse_cuda_visible_devices = INSTALL_LLAMA_PREBUILT.parse_cuda_visible_devices
supports_explicit_visible_device_matching = (
INSTALL_LLAMA_PREBUILT.supports_explicit_visible_device_matching
)
select_visible_gpu_rows = INSTALL_LLAMA_PREBUILT.select_visible_gpu_rows
compatible_linux_runtime_lines = INSTALL_LLAMA_PREBUILT.compatible_linux_runtime_lines
pick_windows_cuda_runtime = INSTALL_LLAMA_PREBUILT.pick_windows_cuda_runtime
compatible_windows_runtime_lines = (
INSTALL_LLAMA_PREBUILT.compatible_windows_runtime_lines
)
runtime_line_from_cuda_version = INSTALL_LLAMA_PREBUILT.runtime_line_from_cuda_version
apply_approved_hashes = INSTALL_LLAMA_PREBUILT.apply_approved_hashes
linux_cuda_choice_from_release = INSTALL_LLAMA_PREBUILT.linux_cuda_choice_from_release
windows_cuda_attempts = INSTALL_LLAMA_PREBUILT.windows_cuda_attempts
resolve_upstream_asset_choice = INSTALL_LLAMA_PREBUILT.resolve_upstream_asset_choice
# ---------------------------------------------------------------------------
# Helper factories
# ---------------------------------------------------------------------------
def make_host(**overrides):
system = overrides.pop("system", "Linux")
machine = overrides.pop("machine", "x86_64")
defaults = dict(
system = system,
machine = machine,
is_linux = system == "Linux",
is_windows = system == "Windows",
is_macos = system == "Darwin",
is_x86_64 = machine.lower() in {"x86_64", "amd64"},
is_arm64 = machine.lower() in {"arm64", "aarch64"},
nvidia_smi = "/usr/bin/nvidia-smi",
driver_cuda_version = (12, 8),
compute_caps = ["86"],
visible_cuda_devices = None,
has_physical_nvidia = True,
has_usable_nvidia = True,
)
defaults.update(overrides)
return HostInfo(**defaults)
def make_artifact(asset_name, **overrides):
defaults = dict(
asset_name = asset_name,
install_kind = "linux-cuda",
runtime_line = "cuda12",
coverage_class = "targeted",
supported_sms = ["75", "80", "86", "89", "90"],
min_sm = 75,
max_sm = 90,
bundle_profile = "cuda12-newer",
rank = 100,
)
defaults.update(overrides)
return PublishedLlamaArtifact(**defaults)
def make_release(artifacts, **overrides):
defaults = dict(
repo = "unslothai/llama.cpp",
release_tag = "v1.0",
upstream_tag = "b8508",
assets = {a.asset_name: f"https://example.com/{a.asset_name}" for a in artifacts},
manifest_asset_name = "llama-prebuilt-manifest.json",
artifacts = artifacts,
selection_log = [],
)
defaults.update(overrides)
return PublishedReleaseBundle(**defaults)
def make_checksums(asset_names):
return ApprovedReleaseChecksums(
repo = "unslothai/llama.cpp",
release_tag = "v1.0",
upstream_tag = "b8508",
source_commit = None,
artifacts = {
name: ApprovedArtifactHash(
asset_name = name,
sha256 = "a" * 64,
repo = "unslothai/llama.cpp",
kind = "prebuilt",
)
for name in asset_names
},
)
def mock_linux_runtime(monkeypatch, lines):
dirs = {line: ["/usr/lib/stub"] for line in lines}
monkeypatch.setattr(
INSTALL_LLAMA_PREBUILT,
"detected_linux_runtime_lines",
lambda: (list(lines), dict(dirs)),
)
def mock_windows_runtime(monkeypatch, lines):
dirs = {line: ["C:\\Windows\\System32"] for line in lines}
monkeypatch.setattr(
INSTALL_LLAMA_PREBUILT,
"detected_windows_runtime_lines",
lambda: (list(lines), dict(dirs)),
)
# ===========================================================================
# A. normalize_compute_cap
# ===========================================================================
class TestNormalizeComputeCap:
def test_dotted_86(self):
assert normalize_compute_cap("8.6") == "86"
def test_dotted_leading_zero(self):
assert normalize_compute_cap("07.05") == "75"
def test_already_normalized(self):
assert normalize_compute_cap("75") == "75"
def test_int_input(self):
assert normalize_compute_cap(86) == "86"
def test_empty_string(self):
assert normalize_compute_cap("") is None
def test_whitespace(self):
assert normalize_compute_cap(" ") is None
def test_non_numeric(self):
assert normalize_compute_cap("x.y") is None
def test_triple_part(self):
assert normalize_compute_cap("8.6.0") is None
def test_zero_minor(self):
assert normalize_compute_cap("9.0") == "90"
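The cases above pin down the expected normalization. A minimal sketch consistent with them (hypothetical name, not the module's implementation) drops the dot and any leading zeros from each component:

```python
def normalize_compute_cap_sketch(value):
    # "8.6" -> "86", "07.05" -> "75", 86 -> "86"; anything non-numeric,
    # empty, or with more than two dotted parts yields None.
    text = str(value).strip()
    if not text:
        return None
    parts = text.split(".")
    if len(parts) > 2 or not all(p.isdigit() for p in parts):
        return None
    if len(parts) == 2:
        return f"{int(parts[0])}{int(parts[1])}"
    return str(int(text))
```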
# ===========================================================================
# B. normalize_compute_caps
# ===========================================================================
class TestNormalizeComputeCaps:
def test_deduplication(self):
assert normalize_compute_caps(["8.6", "86", "8.6"]) == ["86"]
def test_numeric_sort(self):
assert normalize_compute_caps(["9.0", "7.5", "8.6"]) == ["75", "86", "90"]
def test_drops_invalid(self):
assert normalize_compute_caps(["8.6", "bad", "", "7.5"]) == ["75", "86"]
def test_empty_input(self):
assert normalize_compute_caps([]) == []
# ===========================================================================
# C. parse_cuda_visible_devices
# ===========================================================================
class TestParseCudaVisibleDevices:
def test_none(self):
assert parse_cuda_visible_devices(None) is None
def test_empty(self):
assert parse_cuda_visible_devices("") == []
def test_minus_one(self):
assert parse_cuda_visible_devices("-1") == []
def test_single(self):
assert parse_cuda_visible_devices("0") == ["0"]
def test_multi(self):
assert parse_cuda_visible_devices("0,1,2") == ["0", "1", "2"]
def test_whitespace_stripped(self):
assert parse_cuda_visible_devices(" 0 , 1 ") == ["0", "1"]
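The semantics these tests encode distinguish "unset" (no restriction) from "empty or -1" (no devices). A hedged sketch of that contract, under an assumed simplification of CUDA's real parsing rules:

```python
def parse_cuda_visible_devices_sketch(value):
    if value is None:
        return None  # unset env var means "no restriction"
    tokens = [t.strip() for t in value.split(",") if t.strip()]
    if not tokens or tokens == ["-1"]:
        return []  # empty string or -1 hides every device
    return tokens
```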
# ===========================================================================
# D. supports_explicit_visible_device_matching
# ===========================================================================
class TestSupportsExplicitVisibleDeviceMatching:
def test_all_digits(self):
assert supports_explicit_visible_device_matching(["0", "1", "2"]) is True
def test_gpu_prefix(self):
assert supports_explicit_visible_device_matching(["GPU-abc123"]) is True
def test_none(self):
assert supports_explicit_visible_device_matching(None) is False
def test_empty(self):
assert supports_explicit_visible_device_matching([]) is False
def test_mixed_invalid(self):
assert supports_explicit_visible_device_matching(["0", "MIG-device"]) is False
# ===========================================================================
# E. select_visible_gpu_rows
# ===========================================================================
class TestSelectVisibleGpuRows:
ROWS = [
("0", "GPU-aaa", "8.6"),
("1", "GPU-bbb", "7.5"),
("2", "GPU-ccc", "8.9"),
]
def test_none_returns_all(self):
assert select_visible_gpu_rows(self.ROWS, None) == list(self.ROWS)
def test_empty_returns_empty(self):
assert select_visible_gpu_rows(self.ROWS, []) == []
def test_filter_by_index(self):
result = select_visible_gpu_rows(self.ROWS, ["0", "2"])
assert result == [("0", "GPU-aaa", "8.6"), ("2", "GPU-ccc", "8.9")]
def test_filter_by_uuid_case_insensitive(self):
result = select_visible_gpu_rows(self.ROWS, ["gpu-bbb"])
assert result == [("1", "GPU-bbb", "7.5")]
def test_dedup_same_device(self):
result = select_visible_gpu_rows(self.ROWS, ["0", "0"])
assert result == [("0", "GPU-aaa", "8.6")]
def test_missing_token(self):
result = select_visible_gpu_rows(self.ROWS, ["99"])
assert result == []
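The filtering behavior above (index or case-insensitive UUID match, token order preserved, duplicates collapsed) can be sketched as follows; the function name is hypothetical and rows are the same (index, uuid, cap) tuples the tests use:

```python
def select_visible_gpu_rows_sketch(rows, visible):
    if visible is None:
        return list(rows)  # no restriction: keep every row
    selected, seen = [], set()
    for token in visible:
        for row in rows:
            index, uuid, _cap = row
            if token == index or token.lower() == uuid.lower():
                if index not in seen:  # dedup repeated tokens
                    seen.add(index)
                    selected.append(row)
                break
    return selected
```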
# ===========================================================================
# F. compatible_linux_runtime_lines
# ===========================================================================
class TestCompatibleLinuxRuntimeLines:
def test_no_driver(self):
host = make_host(driver_cuda_version = None)
assert compatible_linux_runtime_lines(host) == []
def test_driver_11_8(self):
host = make_host(driver_cuda_version = (11, 8))
assert compatible_linux_runtime_lines(host) == []
def test_driver_12_4(self):
host = make_host(driver_cuda_version = (12, 4))
assert compatible_linux_runtime_lines(host) == ["cuda12"]
def test_driver_13_0(self):
host = make_host(driver_cuda_version = (13, 0))
assert compatible_linux_runtime_lines(host) == ["cuda13", "cuda12"]
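A sketch consistent with these four cases, assuming the 12.4 floor that the Windows-runtime tests below also suggest (thresholds are inferred, not taken from the module):

```python
def compatible_linux_runtime_lines_sketch(driver_cuda_version):
    # Inferred thresholds: drivers below 12.4 get nothing; 13.x drivers can
    # also run the cuda12 line, preferring cuda13 first.
    if driver_cuda_version is None or driver_cuda_version < (12, 4):
        return []
    if driver_cuda_version >= (13, 0):
        return ["cuda13", "cuda12"]
    return ["cuda12"]
```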
# ===========================================================================
# G. pick_windows_cuda_runtime + compatible_windows_runtime_lines
# ===========================================================================
class TestPickWindowsCudaRuntime:
def test_no_driver(self):
host = make_host(driver_cuda_version = None)
assert pick_windows_cuda_runtime(host) is None
def test_below_threshold(self):
host = make_host(driver_cuda_version = (12, 3))
assert pick_windows_cuda_runtime(host) is None
def test_driver_12_4(self):
host = make_host(driver_cuda_version = (12, 4))
assert pick_windows_cuda_runtime(host) == "12.4"
def test_driver_13_1(self):
host = make_host(driver_cuda_version = (13, 1))
assert pick_windows_cuda_runtime(host) == "13.1"
class TestCompatibleWindowsRuntimeLines:
def test_no_driver(self):
host = make_host(driver_cuda_version = None)
assert compatible_windows_runtime_lines(host) == []
def test_driver_12_4(self):
host = make_host(driver_cuda_version = (12, 4))
assert compatible_windows_runtime_lines(host) == ["cuda12"]
def test_driver_13_1(self):
host = make_host(driver_cuda_version = (13, 1))
assert compatible_windows_runtime_lines(host) == ["cuda13", "cuda12"]
# ===========================================================================
# H. runtime_line_from_cuda_version
# ===========================================================================
class TestRuntimeLineFromCudaVersion:
def test_cuda_12(self):
assert runtime_line_from_cuda_version("12.6") == "cuda12"
def test_cuda_13(self):
assert runtime_line_from_cuda_version("13.0") == "cuda13"
def test_cuda_11(self):
assert runtime_line_from_cuda_version("11.8") is None
def test_none(self):
assert runtime_line_from_cuda_version(None) is None
def test_empty(self):
assert runtime_line_from_cuda_version("") is None
# ===========================================================================
# I. apply_approved_hashes
# ===========================================================================
class TestApplyApprovedHashes:
def _choice(self, name):
return AssetChoice(
repo = "test",
tag = "v1",
name = name,
url = f"https://x/{name}",
source_label = "test",
)
def test_both_approved(self):
c1, c2 = self._choice("a.tar.gz"), self._choice("b.tar.gz")
checksums = make_checksums(["a.tar.gz", "b.tar.gz"])
result = apply_approved_hashes([c1, c2], checksums)
assert len(result) == 2
assert all(c.expected_sha256 == "a" * 64 for c in result)
def test_one_approved(self):
c1, c2 = self._choice("a.tar.gz"), self._choice("missing.tar.gz")
checksums = make_checksums(["a.tar.gz"])
result = apply_approved_hashes([c1, c2], checksums)
assert len(result) == 1
assert result[0].name == "a.tar.gz"
def test_none_approved(self):
c1 = self._choice("missing.tar.gz")
checksums = make_checksums(["other.tar.gz"])
with pytest.raises(PrebuiltFallback, match = "approved checksum"):
apply_approved_hashes([c1], checksums)
def test_empty_input(self):
checksums = make_checksums(["a.tar.gz"])
with pytest.raises(PrebuiltFallback, match = "approved checksum"):
apply_approved_hashes([], checksums)
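The contract these tests check -- keep only choices with an approved sha256, fail loudly when nothing survives -- can be sketched with plain dicts standing in for the AssetChoice and ApprovedReleaseChecksums dataclasses (sketch only, names hypothetical):

```python
def apply_approved_hashes_sketch(choices, approved_sha256_by_name):
    kept = []
    for choice in choices:
        sha = approved_sha256_by_name.get(choice["name"])
        if sha is not None:  # unapproved assets are silently dropped
            kept.append({**choice, "expected_sha256": sha})
    if not kept:
        raise ValueError("no candidate asset has an approved checksum")
    return kept
```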
# ===========================================================================
# J. linux_cuda_choice_from_release -- core selection
# ===========================================================================
class TestLinuxCudaChoiceFromRelease:
# --- Runtime line resolution ---
def test_no_runtime_lines_detected(self, monkeypatch):
mock_linux_runtime(monkeypatch, [])
host = make_host(driver_cuda_version = (12, 8))
art = make_artifact("bundle-cuda12.tar.gz")
release = make_release([art])
assert linux_cuda_choice_from_release(host, release) is None
def test_detected_lines_incompatible_with_driver(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda13"])
host = make_host(driver_cuda_version = (12, 4))
art = make_artifact("bundle-cuda13.tar.gz", runtime_line = "cuda13")
release = make_release([art])
assert linux_cuda_choice_from_release(host, release) is None
def test_driver_13_only_cuda12_detected(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(driver_cuda_version = (13, 0))
art = make_artifact("bundle-cuda12.tar.gz", runtime_line = "cuda12")
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.runtime_line == "cuda12"
def test_preferred_runtime_line_reorders(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda13", "cuda12"])
host = make_host(driver_cuda_version = (13, 0))
art12 = make_artifact("bundle-cuda12.tar.gz", runtime_line = "cuda12")
art13 = make_artifact("bundle-cuda13.tar.gz", runtime_line = "cuda13")
release = make_release([art12, art13])
result = linux_cuda_choice_from_release(
host, release, preferred_runtime_line = "cuda12"
)
assert result is not None
assert result.primary.runtime_line == "cuda12"
def test_preferred_runtime_line_unavailable(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(driver_cuda_version = (12, 8))
art = make_artifact("bundle-cuda12.tar.gz", runtime_line = "cuda12")
release = make_release([art])
result = linux_cuda_choice_from_release(
host, release, preferred_runtime_line = "cuda13"
)
assert result is not None
assert result.primary.runtime_line == "cuda12"
log_entries = result.selection_log
assert any("unavailable_on_host" in entry for entry in log_entries)
# --- SM matching ---
def test_exact_sm_match(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact(
"bundle.tar.gz", supported_sms = ["75", "86", "89"], min_sm = 75, max_sm = 89
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.name == "bundle.tar.gz"
def test_sm_not_in_supported_sms(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact(
"bundle.tar.gz", supported_sms = ["75", "80", "89"], min_sm = 75, max_sm = 89
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_sm_outside_min_range(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["50"])
art = make_artifact(
"bundle.tar.gz", supported_sms = ["50", "75", "86"], min_sm = 75, max_sm = 90
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_sm_outside_max_range(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["100"])
art = make_artifact(
"bundle.tar.gz", supported_sms = ["100", "75", "86"], min_sm = 75, max_sm = 90
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_very_old_sm(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["50"])
art = make_artifact("bundle.tar.gz", min_sm = 75, max_sm = 90)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_very_new_sm(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["100"])
art = make_artifact("bundle.tar.gz", min_sm = 75, max_sm = 90)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
# --- Unknown compute caps (empty list) ---
def test_unknown_caps_only_portable(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = [])
targeted = make_artifact("targeted.tar.gz", coverage_class = "targeted")
portable = make_artifact("portable.tar.gz", coverage_class = "portable")
release = make_release([targeted, portable])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.name == "portable.tar.gz"
def test_unknown_caps_no_portable(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = [])
targeted = make_artifact("targeted.tar.gz", coverage_class = "targeted")
release = make_release([targeted])
result = linux_cuda_choice_from_release(host, release)
assert result is None
# --- Multi-GPU ---
def test_multi_gpu_all_covered(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["75", "89"])
art = make_artifact(
"bundle.tar.gz",
supported_sms = ["75", "80", "86", "89", "90"],
min_sm = 75,
max_sm = 90,
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
def test_multi_gpu_not_all_covered(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["50", "89"])
art = make_artifact(
"bundle.tar.gz", supported_sms = ["75", "89"], min_sm = 75, max_sm = 89
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
# --- Artifact selection priority ---
def test_narrowest_sm_range_wins(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
wide = make_artifact(
"wide.tar.gz",
supported_sms = ["75", "86", "90"],
min_sm = 75,
max_sm = 90,
rank = 100,
)
narrow = make_artifact(
"narrow.tar.gz",
supported_sms = ["80", "86", "89"],
min_sm = 80,
max_sm = 89,
rank = 100,
)
release = make_release([wide, narrow])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.name == "narrow.tar.gz"
def test_range_tie_lower_rank_wins(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
high = make_artifact(
"high.tar.gz",
supported_sms = ["75", "86", "90"],
min_sm = 75,
max_sm = 90,
rank = 200,
)
low = make_artifact(
"low.tar.gz",
supported_sms = ["75", "86", "90"],
min_sm = 75,
max_sm = 90,
rank = 50,
)
release = make_release([high, low])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.name == "low.tar.gz"
def test_targeted_preferred_portable_fallback(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
targeted = make_artifact("targeted.tar.gz", coverage_class = "targeted", rank = 100)
portable = make_artifact("portable.tar.gz", coverage_class = "portable", rank = 100)
release = make_release([targeted, portable])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.name == "targeted.tar.gz"
assert len(result.attempts) == 2
assert result.attempts[1].name == "portable.tar.gz"
# --- Edge cases ---
def test_asset_missing_from_release_assets(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact("bundle.tar.gz")
release = make_release([art], assets = {})
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_artifact_empty_supported_sms(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact("bundle.tar.gz", supported_sms = [])
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_artifact_missing_min_sm(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact("bundle.tar.gz", min_sm = None, max_sm = 90)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_artifact_missing_max_sm(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact("bundle.tar.gz", min_sm = 75, max_sm = None)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_no_linux_cuda_artifacts(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact("bundle.tar.gz", install_kind = "windows-cuda")
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_empty_artifacts_list(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
release = make_release([])
result = linux_cuda_choice_from_release(host, release)
assert result is None


# ===========================================================================
# K. windows_cuda_attempts
# ===========================================================================
class TestWindowsCudaAttempts:
    TAG = "b8508"

    def _upstream(self, *runtime_versions):
        assets = {}
        for rv in runtime_versions:
            name = f"llama-{self.TAG}-bin-win-cuda-{rv}-x64.zip"
            assets[name] = f"https://example.com/{name}"
        return assets

    def test_driver_12_4_no_dlls_fallback(self, monkeypatch):
        mock_windows_runtime(monkeypatch, [])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (12, 4))
        assets = self._upstream("12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, None)
        assert len(result) == 1
        assert result[0].runtime_line == "cuda12"

    def test_driver_13_1_both_dlls(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda13", "cuda12"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (13, 1))
        assets = self._upstream("13.1", "12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, None)
        assert len(result) == 2
        assert result[0].runtime_line == "cuda13"
        assert result[1].runtime_line == "cuda12"

    def test_preferred_reorders(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda13", "cuda12"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (13, 1))
        assets = self._upstream("13.1", "12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, "cuda12")
        assert len(result) == 2
        assert result[0].runtime_line == "cuda12"

    def test_preferred_unavailable(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda12"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (12, 4))
        assets = self._upstream("12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, "cuda13")
        assert len(result) == 1
        assert result[0].runtime_line == "cuda12"

    def test_detected_incompatible_with_driver(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda13"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (12, 4))
        assets = self._upstream("12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, None)
        assert len(result) == 1
        assert result[0].runtime_line == "cuda12"

    def test_driver_too_old(self, monkeypatch):
        mock_windows_runtime(monkeypatch, [])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (11, 8))
        assets = self._upstream("12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, None)
        assert result == []

    def test_asset_missing_from_upstream(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda12"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (12, 4))
        result = windows_cuda_attempts(host, self.TAG, {}, None)
        assert result == []

    def test_both_assets_present(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda13", "cuda12"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (13, 1))
        assets = self._upstream("13.1", "12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, None)
        assert len(result) == 2


# ===========================================================================
# L. resolve_upstream_asset_choice -- platform routing
# ===========================================================================
class TestResolveUpstreamAssetChoice:
    TAG = "b8508"

    def _mock_github_assets(self, monkeypatch, assets):
        monkeypatch.setattr(
            INSTALL_LLAMA_PREBUILT,
            "github_release_assets",
            lambda repo, tag: assets,
        )

    def test_linux_x86_64_cpu(self, monkeypatch):
        name = f"llama-{self.TAG}-bin-ubuntu-x64.tar.gz"
        self._mock_github_assets(monkeypatch, {name: f"https://x/{name}"})
        host = make_host(
            has_usable_nvidia = False, nvidia_smi = None, has_physical_nvidia = False
        )
        result = resolve_upstream_asset_choice(host, self.TAG)
        assert result.install_kind == "linux-cpu"
        assert result.name == name

    def test_linux_cpu_missing(self, monkeypatch):
        self._mock_github_assets(monkeypatch, {})
        host = make_host(
            has_usable_nvidia = False, nvidia_smi = None, has_physical_nvidia = False
        )
        with pytest.raises(PrebuiltFallback, match = "Linux CPU"):
            resolve_upstream_asset_choice(host, self.TAG)

    def test_windows_x86_64_cpu(self, monkeypatch):
        name = f"llama-{self.TAG}-bin-win-cpu-x64.zip"
        self._mock_github_assets(monkeypatch, {name: f"https://x/{name}"})
        host = make_host(
            system = "Windows",
            machine = "AMD64",
            has_usable_nvidia = False,
            nvidia_smi = None,
            has_physical_nvidia = False,
        )
        result = resolve_upstream_asset_choice(host, self.TAG)
        assert result.install_kind == "windows-cpu"
        assert result.name == name

    def test_windows_cpu_missing(self, monkeypatch):
        self._mock_github_assets(monkeypatch, {})
        host = make_host(
            system = "Windows",
            machine = "AMD64",
            has_usable_nvidia = False,
            nvidia_smi = None,
            has_physical_nvidia = False,
        )
        with pytest.raises(PrebuiltFallback, match = "Windows CPU"):
            resolve_upstream_asset_choice(host, self.TAG)

    def test_macos_arm64(self, monkeypatch):
        name = f"llama-{self.TAG}-bin-macos-arm64.tar.gz"
        self._mock_github_assets(monkeypatch, {name: f"https://x/{name}"})
        host = make_host(
            system = "Darwin",
            machine = "arm64",
            nvidia_smi = None,
            driver_cuda_version = None,
            compute_caps = [],
            has_physical_nvidia = False,
            has_usable_nvidia = False,
        )
        result = resolve_upstream_asset_choice(host, self.TAG)
        assert result.install_kind == "macos-arm64"
        assert result.name == name

    def test_macos_arm64_missing(self, monkeypatch):
        self._mock_github_assets(monkeypatch, {})
        host = make_host(
            system = "Darwin",
            machine = "arm64",
            nvidia_smi = None,
            driver_cuda_version = None,
            compute_caps = [],
            has_physical_nvidia = False,
            has_usable_nvidia = False,
        )
        with pytest.raises(PrebuiltFallback, match = "macOS arm64"):
            resolve_upstream_asset_choice(host, self.TAG)

    def test_macos_x86_64(self, monkeypatch):
        name = f"llama-{self.TAG}-bin-macos-x64.tar.gz"
        self._mock_github_assets(monkeypatch, {name: f"https://x/{name}"})
        host = make_host(
            system = "Darwin",
            machine = "x86_64",
            nvidia_smi = None,
            driver_cuda_version = None,
            compute_caps = [],
            has_physical_nvidia = False,
            has_usable_nvidia = False,
        )
        result = resolve_upstream_asset_choice(host, self.TAG)
        assert result.install_kind == "macos-x64"
        assert result.name == name

    def test_linux_aarch64(self, monkeypatch):
        self._mock_github_assets(monkeypatch, {})
        host = make_host(
            system = "Linux",
            machine = "aarch64",
            nvidia_smi = None,
            driver_cuda_version = None,
            compute_caps = [],
            has_physical_nvidia = False,
            has_usable_nvidia = False,
        )
        with pytest.raises(
            PrebuiltFallback, match = "no prebuilt policy exists for Linux aarch64"
        ):
            resolve_upstream_asset_choice(host, self.TAG)

    def test_windows_usable_nvidia_delegates(self, monkeypatch):
        cuda_name = f"llama-{self.TAG}-bin-win-cuda-12.4-x64.zip"
        self._mock_github_assets(monkeypatch, {cuda_name: f"https://x/{cuda_name}"})
        mock_windows_runtime(monkeypatch, ["cuda12"])
        monkeypatch.setattr(
            INSTALL_LLAMA_PREBUILT,
            "resolve_windows_cuda_choices",
            lambda host, tag, assets: [
                AssetChoice(
                    repo = UPSTREAM_REPO,
                    tag = tag,
                    name = cuda_name,
                    url = f"https://x/{cuda_name}",
                    source_label = "upstream",
                    install_kind = "windows-cuda",
                    runtime_line = "cuda12",
                )
            ],
        )
        host = make_host(
            system = "Windows",
            machine = "AMD64",
            driver_cuda_version = (12, 4),
            has_usable_nvidia = True,
        )
        result = resolve_upstream_asset_choice(host, self.TAG)
        assert result.install_kind == "windows-cuda"
        assert result.name == cuda_name