Use prebuilt llama.cpp for unsloth studio setup (#4562)

* Use prebuilt llama.cpp for unsloth studio setup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix 3 issues that cause unnecessary fallback to source build

1. Make filelock import optional -- environments without filelock
   (e.g. minimal installs) crashed at import time instead of
   gracefully skipping the lock.

2. Use already-verified converter script from the hydrated source
   tree instead of re-downloading from raw.githubusercontent.com
   with no checksum. Adds symlink with copy fallback for the
   legacy filename.

3. Initialize $SkipPrebuiltInstall in setup.ps1 before first use
   to prevent potential uninitialized variable errors.
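Fix 1 can be sketched as follows (a minimal illustration, not the actual install_llama_prebuilt.py code; the `install_lock` name and signature are assumptions):

```python
import contextlib

# Optional dependency: minimal installs may not ship filelock, so degrade to a
# no-op lock instead of crashing at import time. (Later commits in this PR
# replace the no-op with an O_CREAT|O_EXCL fallback.)
try:
    from filelock import FileLock
except ImportError:
    FileLock = None

@contextlib.contextmanager
def install_lock(lock_path):
    """Hold an exclusive install lock when filelock is available."""
    if FileLock is None:
        yield  # gracefully skip locking rather than failing at module load
        return
    with FileLock(str(lock_path)):
        yield
```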


* Keep network fallback in ensure_converter_scripts

Prefer the local verified copy from the hydrated source tree, but
retain the original network download as a fallback if the file is
missing. Create the legacy hyphenated filename as a symlink with a
copy fallback instead of writing a second full copy.
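The symlink-with-copy-fallback pattern can be sketched as below (a hedged illustration; the helper name and converter filenames are assumptions, not the PR's actual code):

```python
import shutil
from pathlib import Path

def link_legacy_name(src: Path, legacy: Path) -> None:
    # Prefer a symlink for the legacy hyphenated filename; copy the file where
    # symlinks are unavailable (e.g. Windows without developer mode). Either
    # way, avoid writing a second independently-maintained full copy.
    legacy.unlink(missing_ok=True)
    try:
        legacy.symlink_to(src)
    except OSError:
        shutil.copy2(src, legacy)
```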

* Fix 4 bugs in source-build fallback and binary_env paths

- setup.ps1: Replace git pull + checkout FETCH_HEAD with fetch + checkout -B
  to avoid detached HEAD state that breaks re-runs. Use pinned tag in both
  fetch and clone paths.
- setup.sh: Move rm -rf after cmake/git prerequisite checks so a missing
  tool no longer deletes the existing install. Add --branch tag to clone.
- install_llama_prebuilt.py: Add binary_path.parent to Linux LD_LIBRARY_PATH
  in binary_env() so bundled .so files in build/bin are found even without
  RPATH, matching the existing Windows PATH logic.
- Add test for binary_env LD_LIBRARY_PATH on Linux.
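The binary_env idea above can be sketched cross-platform (an assumed signature; the real function in install_llama_prebuilt.py may differ):

```python
import os
import sys
from pathlib import Path

def binary_env(binary_path: Path, install_dir: Path) -> dict:
    # Prepend both the binary's own directory (build/bin, where bundled shared
    # libraries live) and the install root to the loader search path, so .so
    # files are found even without RPATH -- mirroring the Windows PATH logic.
    env = dict(os.environ)
    if sys.platform.startswith("linux"):
        var = "LD_LIBRARY_PATH"
    elif sys.platform == "darwin":
        var = "DYLD_LIBRARY_PATH"
    else:
        var = "PATH"
    parts = [str(binary_path.parent), str(install_dir)]
    existing = env.get(var, "")
    if existing:
        parts.extend(existing.split(os.pathsep))
    # Deduplicate while preserving order; never emit a trailing separator
    # (a trailing colon would make dyld/ld search the CWD).
    seen, ordered = set(), []
    for p in parts:
        if p and p not in seen:
            seen.add(p)
            ordered.append(p)
    env[var] = os.pathsep.join(ordered)
    return env
```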

* Handle unresolved "latest" tag in source-build fallback clone

When tag resolution fails and the requested tag is "latest", both
setup scripts now omit --branch from git clone so the default branch
is cloned instead of failing on a nonexistent "latest" branch/tag.
Similarly, the PS1 fetch path fetches the default ref when the tag
is "latest".

* Resolve actual latest ggml-org tag instead of using literal "latest"

When both Python tag resolution attempts fail and the requested tag
is "latest", query the GitHub API for the actual latest release tag
from ggml-org/llama.cpp (e.g. b8508) instead of passing the literal
string "latest" to git clone --branch, which would fail since no
such branch/tag exists.

setup.sh uses curl + python json parsing; setup.ps1 uses
Invoke-RestMethod. Both fall back to the raw requested tag if the
API call also fails.
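The fallback chain reads naturally as a small resolver (a Python paraphrase of the shell logic; the function name and defaults are illustrative assumptions):

```python
import json
import urllib.request

def resolve_latest_tag(repos=("unslothai/llama.cpp", "ggml-org/llama.cpp"),
                       requested="latest"):
    # Query each release repo's /releases/latest endpoint in order (Unsloth
    # first, since its prebuilt binaries are pinned to tested tags) and return
    # the first non-empty tag_name. Fall back to the raw requested tag if
    # every API call fails, matching the setup scripts' behavior.
    for repo in repos:
        url = f"https://api.github.com/repos/{repo}/releases/latest"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                tag = json.load(resp).get("tag_name", "")
        except Exception:
            continue
        if tag:
            return tag
    return requested
```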

* Try Unsloth release repo before ggml-org when resolving latest tag

When falling back to the GitHub API to resolve "latest", query the
Unsloth release repo (unslothai/llama.cpp) first since it has the
prebuilt binaries pinned to tested tags. Only fall back to
ggml-org/llama.cpp if the Unsloth repo query fails.

* Add comprehensive sandbox tests for PR #4562 bug fixes

35 tests covering all fixes across platforms:
- binary_env cross-platform (Linux LD_LIBRARY_PATH, Windows PATH,
  macOS DYLD_LIBRARY_PATH) with edge cases (dedup, ordering, existing paths)
- resolve_requested_llama_tag (concrete, latest, None, empty)
- setup.sh logic via subprocess: prereq check ordering (cmake/git missing
  preserves install), pinned tag in clone, fetch+checkout -B pattern,
  fetch failure warns instead of aborting
- "latest" tag resolution fallback chain (Unsloth API -> ggml-org ->
  raw) with mock curl: success, failure, malformed JSON, empty body,
  empty tag_name, env overrides
- Source code pattern verification for both .sh and .ps1 files

All 138 tests pass in isolated uv venv.

* Add binary_path.parent to macOS DYLD_LIBRARY_PATH in binary_env

macOS prebuilt .dylib files are overlaid into build/bin (same as
Linux), but binary_env only added install_dir to DYLD_LIBRARY_PATH.
Add binary_path.parent so the loader can find sibling dylibs even
without embedded loader paths.

Mirrors the existing fix for Linux LD_LIBRARY_PATH and the Windows
PATH pattern.

* Guard --branch when resolved tag is "latest"; fix broken test assertion

When all API fallbacks fail and the tag stays as literal "latest",
omit --branch from git clone (clones default branch instead of
failing). Both setup.sh and setup.ps1 now check for "latest" before
passing --branch to git clone/fetch.

Also fix test_setup_ps1_clone_uses_branch_tag which used Python
tuple syntax (assert "x", "y" in z) that always passes. Changed to
assert "x" in z and "y" in z.
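The assertion pitfall fixed here is easy to demonstrate: a comma makes the second expression the assertion *message*, so the check can never fail.

```python
z = "no match here"

def broken_check():
    # Parsed as: assert "x", ("y" in z) -- "x" is truthy, so this always passes.
    assert "x", "y" in z

def fixed_check():
    # The intended check: both substrings must actually be present.
    assert "x" in z and "y" in z

broken_check()  # silently passes even though neither "x" nor "y" is in z

failed = False
try:
    fixed_check()
except AssertionError:
    failed = True  # the corrected assertion catches the mismatch
```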

* Fix macOS DYLD trailing colon, install_lock no-op, and debug log

- binary_env macOS: use dedupe_existing_dirs instead of raw string
  concatenation. Eliminates trailing colon in DYLD_LIBRARY_PATH
  (which causes dyld to search CWD for libraries) and deduplicates
  when binary_path.parent == install_dir. Now consistent with the
  Linux and Windows branches.
- install_lock: when filelock is not installed, use os.O_CREAT|O_EXCL
  as a fallback exclusive file lock with timeout, instead of yielding
  with no locking. Prevents concurrent installs from corrupting each
  other's staging directories.
- setup.ps1: remove [DEBUG] log line that printed to every user on
  every Windows setup run.
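The O_CREAT|O_EXCL fallback can be sketched like this (an illustration under assumed names; the PR's actual lock also records the holder PID, covered by the next commit):

```python
import contextlib
import os
import time

@contextlib.contextmanager
def fallback_install_lock(lock_path, timeout=60.0, poll=0.5):
    # os.open with O_CREAT|O_EXCL is atomic: exactly one process can create
    # the lock file, so concurrent installs cannot corrupt each other's
    # staging directories even without the filelock package.
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```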

* Add stale-lock detection and atomic clone-then-swap

install_lock fallback (no filelock): write PID to lock file and
check if the holder process is still alive on contention. Dead PIDs
(ProcessLookupError) and unreadable lock files trigger immediate
cleanup. Live processes owned by other users (PermissionError) are
correctly recognized as alive -- the lock is not removed.

setup.sh/setup.ps1 source-build: clone into a temporary directory
first, then swap into place only on success. If git clone fails,
the existing install is preserved instead of being deleted by the
premature rm -rf.
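The stale-lock liveness check described above boils down to probing the recorded PID with signal 0 (a sketch with an assumed helper name):

```python
import os

def lock_holder_alive(lock_path) -> bool:
    # Read the holder PID from the lock file and probe it with signal 0,
    # which checks existence without delivering anything.
    try:
        with open(lock_path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # unreadable/garbled lock file -> treat as stale
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # holder died -> stale, safe to clean up
    except PermissionError:
        return True   # alive but owned by another user -> respect the lock
    return True
```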

* Remove redundant upstream_tag != release_tag check

load_approved_release_checksums compared checksums.upstream_tag
against the Unsloth release_tag, which are different namespaces
(upstream ggml-org tag vs Unsloth published tag). This only worked
because both happened to be "b8508" by convention. Would break if
Unsloth ever uses a different release naming scheme.

The existing check at parse_approved_release_checksums (line 950)
already validates the release_tag field correctly.

* Fix lock TOCTOU race and build-in-temp-dir swap

install_lock fallback: add os.fsync(fd) after writing PID to ensure
the PID is visible to racing processes before they check. Treat
empty lock files (PID not yet written) as "wait and retry" instead
of stale, closing the window where two processes could both see an
empty file, both unlink it, and both acquire the lock.

setup.sh/setup.ps1 source-build: clone AND build in a temp directory
(LLAMA_CPP_DIR.build.$$). Only swap into the final LLAMA_CPP_DIR
after the build succeeds. If clone or cmake or build fails, the temp
dir is cleaned up and the existing working install is preserved.
Previously, rm -rf ran after clone but before build, destroying the
existing install even if the build later failed.
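The clone-and-build-in-temp pattern above can be paraphrased in Python (the function name and the injectable `run` parameter are illustrative assumptions, not the shell scripts' code):

```python
import os
import shutil
import subprocess

def build_in_temp_then_swap(final_dir, repo_url, tag, run=subprocess.run):
    # All fallible steps (clone, configure, build) target a temporary sibling
    # directory. The existing install at final_dir is removed only after the
    # new build has fully succeeded, then the temp dir is renamed into place.
    tmp = f"{final_dir}.build.{os.getpid()}"
    shutil.rmtree(tmp, ignore_errors=True)
    try:
        clone = ["git", "clone", "--depth", "1"]
        if tag and tag != "latest":
            clone += ["--branch", tag]  # omit --branch for unresolved "latest"
        run(clone + [repo_url, tmp], check=True)
        run(["cmake", "-S", tmp, "-B", os.path.join(tmp, "build")], check=True)
        run(["cmake", "--build", os.path.join(tmp, "build"),
             "--config", "Release"], check=True)
    except subprocess.CalledProcessError:
        shutil.rmtree(tmp, ignore_errors=True)  # failure: keep existing install
        return False
    shutil.rmtree(final_dir, ignore_errors=True)  # success: swap into place
    os.replace(tmp, final_dir)
    return True
```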

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
This commit is contained in:
commit f4d8a246bf (parent cc1be75621)
DoubleMathew 2026-03-25 07:42:43 -05:00, committed by GitHub
No known key found for this signature in database (GPG key ID: B5690EEEBB952194)
7 changed files with 6046 additions and 64 deletions

studio/install_llama_prebuilt.py (executable file, +3395)

(File diff suppressed because it is too large.)


@@ -503,7 +503,6 @@ if ($DriverMaxCuda) {
$isCompat = ($tkMaj -lt $drMajorCuda) -or ($tkMaj -eq $drMajorCuda -and $tkMin -le $drMinorCuda)
if ($isCompat) {
# Also verify the toolkit supports our GPU architecture
Write-Host " [DEBUG] Checking CUDA compatibility: toolkit=$tkMaj.$tkMin arch=sm_$CudaArch" -ForegroundColor Magenta
$archOk = $true
if ($CudaArch) {
$archOk = Test-NvccArchSupport -NvccExe $candidateNvcc -Arch $CudaArch
@@ -1296,6 +1295,93 @@ if ($LASTEXITCODE -ne 0) {
$ErrorActionPreference = $prevEAP_t5
Write-Host "[OK] Transformers 5.x pre-installed to .venv_t5/" -ForegroundColor Green
# ==========================================================================
# PHASE 3.4: Prefer prebuilt llama.cpp bundles before source build
# ==========================================================================
$UnslothHome = Join-Path $env:USERPROFILE ".unsloth"
if (-not (Test-Path $UnslothHome)) { New-Item -ItemType Directory -Force $UnslothHome | Out-Null }
$LlamaCppDir = Join-Path $UnslothHome "llama.cpp"
$NeedLlamaSourceBuild = $false
$SkipPrebuiltInstall = $false
$RequestedLlamaTag = if ($env:UNSLOTH_LLAMA_TAG) { $env:UNSLOTH_LLAMA_TAG } else { "latest" }
$HelperReleaseRepo = if ($env:UNSLOTH_LLAMA_RELEASE_REPO) { $env:UNSLOTH_LLAMA_RELEASE_REPO } else { "unslothai/llama.cpp" }
$resolveOutput = & python "$PSScriptRoot\install_llama_prebuilt.py" --resolve-install-tag $RequestedLlamaTag --published-repo $HelperReleaseRepo 2>&1
$resolveExit = $LASTEXITCODE
$ResolvedLlamaTag = if ($resolveOutput) { ($resolveOutput | Select-Object -Last 1).ToString().Trim() } else { "" }
if ($resolveExit -ne 0 -or [string]::IsNullOrWhiteSpace($ResolvedLlamaTag)) {
Write-Host ""
Write-Host "[WARN] Failed to resolve an installable prebuilt llama.cpp tag via $HelperReleaseRepo" -ForegroundColor Yellow
if ($resolveOutput) {
$resolveOutput | ForEach-Object { Write-Host $_ }
}
$fallbackOutput = & python "$PSScriptRoot\install_llama_prebuilt.py" --resolve-llama-tag $RequestedLlamaTag 2>$null
$fallbackExit = $LASTEXITCODE
$ResolvedLlamaTag = if ($fallbackExit -eq 0 -and $fallbackOutput) {
($fallbackOutput | Select-Object -Last 1).ToString().Trim()
} elseif ($RequestedLlamaTag -eq "latest") {
# Try Unsloth release repo first, then fall back to ggml-org upstream
$resolvedLatest = $null
try {
$latestRelease = Invoke-RestMethod -Uri "https://api.github.com/repos/$HelperReleaseRepo/releases/latest" -ErrorAction Stop
$resolvedLatest = $latestRelease.tag_name
} catch {}
if (-not $resolvedLatest) {
try {
$latestRelease = Invoke-RestMethod -Uri "https://api.github.com/repos/ggml-org/llama.cpp/releases/latest" -ErrorAction Stop
$resolvedLatest = $latestRelease.tag_name
} catch {}
}
if ($resolvedLatest) { $resolvedLatest } else { $RequestedLlamaTag }
} else {
$RequestedLlamaTag
}
$NeedLlamaSourceBuild = $true
$SkipPrebuiltInstall = $true
}
Write-Host ""
Write-Host "Resolved llama.cpp release tag: $ResolvedLlamaTag" -ForegroundColor Gray
if ($env:UNSLOTH_LLAMA_FORCE_COMPILE -eq "1") {
Write-Host ""
Write-Host "[WARN] UNSLOTH_LLAMA_FORCE_COMPILE=1 -- skipping prebuilt llama.cpp install" -ForegroundColor Yellow
$NeedLlamaSourceBuild = $true
} else {
Write-Host ""
Write-Host "Installing prebuilt llama.cpp bundle (preferred path)..." -ForegroundColor Cyan
if (Test-Path $LlamaCppDir) {
Write-Host "Existing llama.cpp install detected -- validating staged prebuilt update before replacement" -ForegroundColor Gray
}
if ($SkipPrebuiltInstall) {
Write-Host "[WARN] Skipping prebuilt install because prebuilt tag resolution failed -- falling back to source build" -ForegroundColor Yellow
} else {
$prebuiltArgs = @(
"$PSScriptRoot\install_llama_prebuilt.py",
"--install-dir", $LlamaCppDir,
"--llama-tag", $ResolvedLlamaTag,
"--published-repo", $HelperReleaseRepo
)
if ($env:UNSLOTH_LLAMA_RELEASE_TAG) {
$prebuiltArgs += @("--published-release-tag", $env:UNSLOTH_LLAMA_RELEASE_TAG)
}
$prevEAPPrebuilt = $ErrorActionPreference
$ErrorActionPreference = "Continue"
& python @prebuiltArgs
$prebuiltExit = $LASTEXITCODE
$ErrorActionPreference = $prevEAPPrebuilt
if ($prebuiltExit -eq 0) {
Write-Host "[OK] Prebuilt llama.cpp installed and validated" -ForegroundColor Green
} else {
if (Test-Path $LlamaCppDir) {
Write-Host "[WARN] Prebuilt update failed; existing install was restored or cleaned before source build fallback" -ForegroundColor Yellow
}
Write-Host "[WARN] Prebuilt llama.cpp path unavailable or failed validation -- falling back to source build" -ForegroundColor Yellow
$NeedLlamaSourceBuild = $true
}
}
}
# ==========================================================================
# PHASE 3.5: Install OpenSSL dev (for HTTPS support in llama-server)
# ==========================================================================
@@ -1303,42 +1389,46 @@ Write-Host "[OK] Transformers 5.x pre-installed to .venv_t5/" -ForegroundColor G
# ShiningLight.OpenSSL.Dev includes headers + libs that cmake can find.
$OpenSslAvailable = $false
# Check if OpenSSL dev is already installed (look for include dir)
$OpenSslRoots = @(
'C:\Program Files\OpenSSL-Win64',
'C:\Program Files\OpenSSL',
'C:\OpenSSL-Win64'
)
$OpenSslRoot = $null
foreach ($root in $OpenSslRoots) {
if (Test-Path (Join-Path $root 'include\openssl\ssl.h')) {
$OpenSslRoot = $root
break
}
}
if ($OpenSslRoot) {
$OpenSslAvailable = $true
Write-Host "[OK] OpenSSL dev found at $OpenSslRoot" -ForegroundColor Green
} else {
Write-Host ""
Write-Host "Installing OpenSSL dev (for HTTPS in llama-server)..." -ForegroundColor Cyan
$HasWinget = $null -ne (Get-Command winget -ErrorAction SilentlyContinue)
if ($HasWinget) {
winget install -e --id ShiningLight.OpenSSL.Dev --accept-package-agreements --accept-source-agreements
# Re-check after install
foreach ($root in $OpenSslRoots) {
if (Test-Path (Join-Path $root 'include\openssl\ssl.h')) {
$OpenSslRoot = $root
$OpenSslAvailable = $true
Write-Host "[OK] OpenSSL dev installed at $OpenSslRoot" -ForegroundColor Green
break
}
if ($NeedLlamaSourceBuild) {
# Check if OpenSSL dev is already installed (look for include dir)
$OpenSslRoots = @(
'C:\Program Files\OpenSSL-Win64',
'C:\Program Files\OpenSSL',
'C:\OpenSSL-Win64'
)
$OpenSslRoot = $null
foreach ($root in $OpenSslRoots) {
if (Test-Path (Join-Path $root 'include\openssl\ssl.h')) {
$OpenSslRoot = $root
break
}
}
if (-not $OpenSslAvailable) {
Write-Host "[WARN] OpenSSL dev not available -- llama-server will be built without HTTPS" -ForegroundColor Yellow
if ($OpenSslRoot) {
$OpenSslAvailable = $true
Write-Host "[OK] OpenSSL dev found at $OpenSslRoot" -ForegroundColor Green
} else {
Write-Host ""
Write-Host "Installing OpenSSL dev (for HTTPS in llama-server)..." -ForegroundColor Cyan
$HasWinget = $null -ne (Get-Command winget -ErrorAction SilentlyContinue)
if ($HasWinget) {
winget install -e --id ShiningLight.OpenSSL.Dev --accept-package-agreements --accept-source-agreements
# Re-check after install
foreach ($root in $OpenSslRoots) {
if (Test-Path (Join-Path $root 'include\openssl\ssl.h')) {
$OpenSslRoot = $root
$OpenSslAvailable = $true
Write-Host "[OK] OpenSSL dev installed at $OpenSslRoot" -ForegroundColor Green
break
}
}
}
if (-not $OpenSslAvailable) {
Write-Host "[WARN] OpenSSL dev not available -- llama-server will be built without HTTPS" -ForegroundColor Yellow
}
}
} else {
Write-Host "[SKIP] OpenSSL dev install -- prebuilt llama.cpp already validated" -ForegroundColor Yellow
}
# ==========================================================================
@@ -1351,9 +1441,7 @@ if ($OpenSslRoot) {
# - llama-server: for GGUF model inference (with HTTPS if OpenSSL available)
# - llama-quantize: for GGUF export quantization
# Prerequisites (git, cmake, VS Build Tools, CUDA Toolkit) already installed in Phase 1.
$UnslothHome = Join-Path $env:USERPROFILE ".unsloth"
if (-not (Test-Path $UnslothHome)) { New-Item -ItemType Directory -Force $UnslothHome | Out-Null }
$LlamaCppDir = Join-Path $UnslothHome "llama.cpp"
$OriginalLlamaCppDir = $LlamaCppDir
$BuildDir = Join-Path $LlamaCppDir "build"
$LlamaServerBin = Join-Path $BuildDir "bin\Release\llama-server.exe"
@@ -1376,7 +1464,10 @@ if (Test-Path $LlamaServerBin) {
}
}
if ((Test-Path $LlamaServerBin) -and -not $NeedRebuild) {
if (-not $NeedLlamaSourceBuild) {
Write-Host ""
Write-Host "[OK] Using validated prebuilt llama.cpp install at $LlamaCppDir" -ForegroundColor Green
} elseif ((Test-Path $LlamaServerBin) -and -not $NeedRebuild) {
Write-Host ""
Write-Host "[OK] llama-server already exists at $LlamaServerBin" -ForegroundColor Green
} elseif (-not $HasCmakeForBuild) {
@@ -1432,29 +1523,49 @@ if ((Test-Path $LlamaServerBin) -and -not $NeedRebuild) {
# -- Step A: Clone or pull llama.cpp --
$UseConcreteRef = ($ResolvedLlamaTag -ne "latest" -and -not [string]::IsNullOrWhiteSpace($ResolvedLlamaTag))
if (Test-Path (Join-Path $LlamaCppDir ".git")) {
Write-Host " llama.cpp repo already cloned, pulling latest..." -ForegroundColor Gray
git -C $LlamaCppDir pull 2>&1 | Out-Null
Write-Host " Syncing llama.cpp to $ResolvedLlamaTag..." -ForegroundColor Gray
if ($UseConcreteRef) {
git -C $LlamaCppDir fetch --depth 1 origin $ResolvedLlamaTag 2>&1 | Out-Null
} else {
git -C $LlamaCppDir fetch --depth 1 origin 2>&1 | Out-Null
}
if ($LASTEXITCODE -ne 0) {
Write-Host " [WARN] git pull failed -- using existing source" -ForegroundColor Yellow
Write-Host " [WARN] git fetch failed -- using existing source" -ForegroundColor Yellow
} else {
git -C $LlamaCppDir checkout -B unsloth-llama-build FETCH_HEAD 2>&1 | Out-Null
if ($LASTEXITCODE -ne 0) {
$BuildOk = $false
$FailedStep = "git checkout"
} else {
git -C $LlamaCppDir clean -fdx 2>&1 | Out-Null
}
}
} else {
Write-Host " Cloning llama.cpp..." -ForegroundColor Gray
if (Test-Path $LlamaCppDir) { Remove-Item -Recurse -Force $LlamaCppDir }
git clone --depth 1 https://github.com/ggml-org/llama.cpp.git $LlamaCppDir 2>&1 | Out-Null
Write-Host " Cloning llama.cpp @ $ResolvedLlamaTag..." -ForegroundColor Gray
$buildTmp = "$LlamaCppDir.build.$PID"
if (Test-Path $buildTmp) { Remove-Item -Recurse -Force $buildTmp }
$cloneArgs = @("clone", "--depth", "1")
if ($UseConcreteRef) {
$cloneArgs += @("--branch", $ResolvedLlamaTag)
}
$cloneArgs += @("https://github.com/ggml-org/llama.cpp.git", $buildTmp)
git @cloneArgs 2>&1 | Out-Null
if ($LASTEXITCODE -ne 0) {
$BuildOk = $false
$FailedStep = "git clone"
if (Test-Path $buildTmp) { Remove-Item -Recurse -Force $buildTmp }
}
# Use temp dir for build; swap into $LlamaCppDir only after build succeeds
if ($BuildOk) {
$LlamaCppDir = $buildTmp
$BuildDir = Join-Path $LlamaCppDir "build"
}
}
# -- Step B: cmake configure --
# Clean stale CMake cache to prevent previous CUDA settings from leaking
# into a CPU-only rebuild (or vice versa).
$CmakeCacheFile = Join-Path $BuildDir "CMakeCache.txt"
if (Test-Path $CmakeCacheFile) {
Remove-Item -Recurse -Force $BuildDir
}
if ($BuildOk) {
Write-Host ""
@@ -1555,6 +1666,21 @@ if ((Test-Path $LlamaServerBin) -and -not $NeedRebuild) {
}
}
# Swap temp build dir into final location (only if we built in a temp dir)
if ($BuildOk -and $LlamaCppDir -ne $OriginalLlamaCppDir) {
if (Test-Path $OriginalLlamaCppDir) { Remove-Item -Recurse -Force $OriginalLlamaCppDir }
Move-Item $LlamaCppDir $OriginalLlamaCppDir
$LlamaCppDir = $OriginalLlamaCppDir
$BuildDir = Join-Path $LlamaCppDir "build"
$LlamaServerBin = Join-Path $BuildDir "bin\Release\llama-server.exe"
} elseif (-not $BuildOk -and $LlamaCppDir -ne $OriginalLlamaCppDir) {
# Build failed -- clean up temp dir, preserve existing install
if (Test-Path $LlamaCppDir) { Remove-Item -Recurse -Force $LlamaCppDir }
$LlamaCppDir = $OriginalLlamaCppDir
$BuildDir = Join-Path $LlamaCppDir "build"
$LlamaServerBin = Join-Path $BuildDir "bin\Release\llama-server.exe"
}
# Restore ErrorActionPreference
$ErrorActionPreference = $prevEAP


@@ -341,10 +341,98 @@ else
echo "✅ Python dependencies up to date — skipping"
fi
# ── 7. WSL: pre-install GGUF build dependencies ──
# ── 7. Prefer prebuilt llama.cpp bundles before any source build path ──
UNSLOTH_HOME="$HOME/.unsloth"
mkdir -p "$UNSLOTH_HOME"
LLAMA_CPP_DIR="$UNSLOTH_HOME/llama.cpp"
LLAMA_SERVER_BIN="$LLAMA_CPP_DIR/build/bin/llama-server"
_NEED_LLAMA_SOURCE_BUILD=false
_LLAMA_FORCE_COMPILE="${UNSLOTH_LLAMA_FORCE_COMPILE:-0}"
_REQUESTED_LLAMA_TAG="${UNSLOTH_LLAMA_TAG:-latest}"
_HELPER_RELEASE_REPO="${UNSLOTH_LLAMA_RELEASE_REPO:-unslothai/llama.cpp}"
_RESOLVE_LLAMA_LOG="$(mktemp)"
set +e
python "$SCRIPT_DIR/install_llama_prebuilt.py" \
--resolve-install-tag "$_REQUESTED_LLAMA_TAG" \
--published-repo "$_HELPER_RELEASE_REPO" >"$_RESOLVE_LLAMA_LOG" 2>&1
_RESOLVE_LLAMA_STATUS=$?
set -e
if [ "$_RESOLVE_LLAMA_STATUS" -eq 0 ]; then
_RESOLVED_LLAMA_TAG="$(tail -n 1 "$_RESOLVE_LLAMA_LOG" | tr -d '\r')"
else
_RESOLVED_LLAMA_TAG=""
fi
if [ -z "$_RESOLVED_LLAMA_TAG" ]; then
echo ""
echo "⚠️ Failed to resolve an installable prebuilt llama.cpp tag via $_HELPER_RELEASE_REPO"
cat "$_RESOLVE_LLAMA_LOG" >&2 || true
set +e
_RESOLVED_LLAMA_TAG="$(python "$SCRIPT_DIR/install_llama_prebuilt.py" --resolve-llama-tag "$_REQUESTED_LLAMA_TAG" 2>/dev/null)"
_RESOLVE_UPSTREAM_STATUS=$?
set -e
if [ "$_RESOLVE_UPSTREAM_STATUS" -ne 0 ] || [ -z "$_RESOLVED_LLAMA_TAG" ]; then
if [ "$_REQUESTED_LLAMA_TAG" = "latest" ]; then
# Try Unsloth release repo first, then fall back to ggml-org upstream
_RESOLVED_LLAMA_TAG="$(curl -fsSL "https://api.github.com/repos/${_HELPER_RELEASE_REPO}/releases/latest" 2>/dev/null | python -c "import sys,json; print(json.load(sys.stdin)['tag_name'])" 2>/dev/null)" || _RESOLVED_LLAMA_TAG=""
if [ -z "$_RESOLVED_LLAMA_TAG" ]; then
_RESOLVED_LLAMA_TAG="$(curl -fsSL https://api.github.com/repos/ggml-org/llama.cpp/releases/latest 2>/dev/null | python -c "import sys,json; print(json.load(sys.stdin)['tag_name'])" 2>/dev/null)" || _RESOLVED_LLAMA_TAG=""
fi
fi
if [ -z "$_RESOLVED_LLAMA_TAG" ]; then
_RESOLVED_LLAMA_TAG="$_REQUESTED_LLAMA_TAG"
fi
fi
_NEED_LLAMA_SOURCE_BUILD=true
_SKIP_PREBUILT_INSTALL=true
fi
rm -f "$_RESOLVE_LLAMA_LOG"
echo ""
echo "Resolved llama.cpp release tag: $_RESOLVED_LLAMA_TAG"
if [ "$_LLAMA_FORCE_COMPILE" = "1" ]; then
echo ""
echo "⚠️ UNSLOTH_LLAMA_FORCE_COMPILE=1 -- skipping prebuilt llama.cpp install"
_NEED_LLAMA_SOURCE_BUILD=true
else
echo ""
echo "Installing prebuilt llama.cpp bundle (preferred path)..."
if [ -d "$LLAMA_CPP_DIR" ]; then
echo "Existing llama.cpp install detected -- validating staged prebuilt update before replacement"
fi
if [ "${_SKIP_PREBUILT_INSTALL:-false}" = true ]; then
echo "⚠️ Skipping prebuilt install because prebuilt tag resolution failed -- falling back to source build"
else
_PREBUILT_CMD=(
python "$SCRIPT_DIR/install_llama_prebuilt.py"
--install-dir "$LLAMA_CPP_DIR"
--llama-tag "$_RESOLVED_LLAMA_TAG"
--published-repo "$_HELPER_RELEASE_REPO"
)
if [ -n "${UNSLOTH_LLAMA_RELEASE_TAG:-}" ]; then
_PREBUILT_CMD+=(--published-release-tag "$UNSLOTH_LLAMA_RELEASE_TAG")
fi
set +e
"${_PREBUILT_CMD[@]}"
_PREBUILT_STATUS=$?
set -e
if [ "$_PREBUILT_STATUS" -eq 0 ]; then
echo "✅ Prebuilt llama.cpp installed and validated"
else
if [ -d "$LLAMA_CPP_DIR" ]; then
echo "⚠️ Prebuilt update failed; existing install was restored or cleaned before source build fallback"
fi
echo "⚠️ Prebuilt llama.cpp path unavailable or failed validation -- falling back to source build"
_NEED_LLAMA_SOURCE_BUILD=true
fi
fi
fi
# ── 8. WSL: pre-install GGUF build dependencies for fallback source builds ──
# On WSL, sudo requires a password and can't be entered during GGUF export
# (runs in a non-interactive subprocess). Install build deps here instead.
if grep -qi microsoft /proc/version 2>/dev/null; then
if [ "$_NEED_LLAMA_SOURCE_BUILD" = true ] && grep -qi microsoft /proc/version 2>/dev/null; then
echo ""
echo "⚠️ WSL detected -- installing build dependencies for GGUF export..."
_GGUF_DEPS="pciutils build-essential cmake curl git libcurl4-openssl-dev"
@@ -402,22 +490,19 @@ if grep -qi microsoft /proc/version 2>/dev/null; then
fi
fi
# ── 8. Build llama.cpp binaries for GGUF inference + export ──
# ── 9. Build llama.cpp binaries for GGUF inference + export when prebuilt install fails ──
# Builds at ~/.unsloth/llama.cpp — a single shared location under the user's
# home directory. This is used by both the inference server and the GGUF
# export pipeline (unsloth-zoo).
# - llama-server: for GGUF model inference
# - llama-quantize: for GGUF export quantization (symlinked to root for check_llama_cpp())
UNSLOTH_HOME="$HOME/.unsloth"
mkdir -p "$UNSLOTH_HOME"
LLAMA_CPP_DIR="$UNSLOTH_HOME/llama.cpp"
LLAMA_SERVER_BIN="$LLAMA_CPP_DIR/build/bin/llama-server"
if [ "${_SKIP_GGUF_BUILD:-}" = true ]; then
if [ "$_NEED_LLAMA_SOURCE_BUILD" = false ]; then
:
elif [ "${_SKIP_GGUF_BUILD:-}" = true ]; then
echo ""
echo "Skipping llama-server build (missing dependencies)"
echo " Install the missing packages and re-run setup to enable GGUF inference."
else
rm -rf "$LLAMA_CPP_DIR"
{
# Check prerequisites
if ! command -v cmake &>/dev/null; then
@@ -432,7 +517,13 @@ rm -rf "$LLAMA_CPP_DIR"
echo "Building llama-server for GGUF inference..."
BUILD_OK=true
run_quiet_no_exit "clone llama.cpp" git clone --depth 1 https://github.com/ggml-org/llama.cpp.git "$LLAMA_CPP_DIR" || BUILD_OK=false
_CLONE_BRANCH_ARGS=()
if [ "$_RESOLVED_LLAMA_TAG" != "latest" ] && [ -n "$_RESOLVED_LLAMA_TAG" ]; then
_CLONE_BRANCH_ARGS=(--branch "$_RESOLVED_LLAMA_TAG")
fi
_BUILD_TMP="${LLAMA_CPP_DIR}.build.$$"
rm -rf "$_BUILD_TMP"
run_quiet_no_exit "clone llama.cpp" git clone --depth 1 "${_CLONE_BRANCH_ARGS[@]}" https://github.com/ggml-org/llama.cpp.git "$_BUILD_TMP" || BUILD_OK=false
if [ "$BUILD_OK" = true ]; then
# Skip tests/examples we don't need (faster build)
@@ -571,21 +662,29 @@ rm -rf "$LLAMA_CPP_DIR"
CMAKE_GENERATOR_ARGS="-G Ninja"
fi
run_quiet_no_exit "cmake llama.cpp" cmake $CMAKE_GENERATOR_ARGS -S "$LLAMA_CPP_DIR" -B "$LLAMA_CPP_DIR/build" $CMAKE_ARGS || BUILD_OK=false
run_quiet_no_exit "cmake llama.cpp" cmake $CMAKE_GENERATOR_ARGS -S "$_BUILD_TMP" -B "$_BUILD_TMP/build" $CMAKE_ARGS || BUILD_OK=false
fi
if [ "$BUILD_OK" = true ]; then
run_quiet_no_exit "build llama-server" cmake --build "$LLAMA_CPP_DIR/build" --config Release --target llama-server -j"$NCPU" || BUILD_OK=false
run_quiet_no_exit "build llama-server" cmake --build "$_BUILD_TMP/build" --config Release --target llama-server -j"$NCPU" || BUILD_OK=false
fi
# Also build llama-quantize (needed by unsloth-zoo's GGUF export pipeline)
if [ "$BUILD_OK" = true ]; then
run_quiet_no_exit "build llama-quantize" cmake --build "$LLAMA_CPP_DIR/build" --config Release --target llama-quantize -j"$NCPU" || true
# Symlink to llama.cpp root — check_llama_cpp() looks for the binary there
run_quiet_no_exit "build llama-quantize" cmake --build "$_BUILD_TMP/build" --config Release --target llama-quantize -j"$NCPU" || true
fi
# Swap only after build succeeds -- preserves existing install on failure
if [ "$BUILD_OK" = true ]; then
rm -rf "$LLAMA_CPP_DIR"
mv "$_BUILD_TMP" "$LLAMA_CPP_DIR"
# Symlink to llama.cpp root -- check_llama_cpp() looks for the binary there
QUANTIZE_BIN="$LLAMA_CPP_DIR/build/bin/llama-quantize"
if [ -f "$QUANTIZE_BIN" ]; then
ln -sf build/bin/llama-quantize "$LLAMA_CPP_DIR/llama-quantize"
fi
else
rm -rf "$_BUILD_TMP"
fi
if [ "$BUILD_OK" = true ]; then


@@ -0,0 +1,142 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import importlib.util
import shutil
import sys
import tempfile
import time
from pathlib import Path
PACKAGE_ROOT = Path(__file__).resolve().parents[3]
INSTALLER_PATH = PACKAGE_ROOT / "studio" / "install_llama_prebuilt.py"
def load_installer_module():
spec = importlib.util.spec_from_file_location(
"studio_install_llama_prebuilt", INSTALLER_PATH
)
if spec is None or spec.loader is None:
raise RuntimeError(f"unable to load installer module from {INSTALLER_PATH}")
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)
return module
installer = load_installer_module()
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(
description = (
"Run a real end-to-end prebuilt llama.cpp install into an isolated temporary "
"directory on the current machine."
)
)
parser.add_argument(
"--llama-tag",
default = "latest",
help = "llama.cpp tag to resolve. Defaults to the approved prebuilt tag for this host.",
)
parser.add_argument(
"--published-repo",
default = installer.DEFAULT_PUBLISHED_REPO,
help = "Published bundle repository used for Linux CUDA selection.",
)
parser.add_argument(
"--published-release-tag",
default = installer.DEFAULT_PUBLISHED_TAG or "",
help = "Optional published GitHub release tag to pin.",
)
parser.add_argument(
"--work-dir",
default = "",
help = (
"Optional directory under which the smoke install temp dir will be created. "
"If omitted, defaults to ./.tmp/llama-prebuilt-smoke under the current directory."
),
)
parser.add_argument(
"--keep-temp",
action = "store_true",
help = "Keep the temporary smoke install directory after success.",
)
return parser.parse_args()
def smoke_root_base(work_dir: str) -> Path:
if work_dir:
return Path(work_dir).expanduser().resolve()
return (Path.cwd() / ".tmp" / "llama-prebuilt-smoke").resolve()
def make_smoke_root(base_dir: Path) -> Path:
base_dir.mkdir(parents = True, exist_ok = True)
timestamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
return Path(tempfile.mkdtemp(prefix = f"run-{timestamp}-", dir = base_dir))
def main() -> int:
args = parse_args()
host = installer.detect_host()
smoke_base = smoke_root_base(args.work_dir)
smoke_root = make_smoke_root(smoke_base)
install_dir = smoke_root / "install" / "llama.cpp"
choice = None
print(f"[smoke] host={host.system} machine={host.machine}")
print(f"[smoke] temp_root={smoke_root}")
try:
requested_tag, resolved_tag, attempts, _approved_checksums = (
installer.resolve_install_attempts(
args.llama_tag,
host,
args.published_repo,
args.published_release_tag,
)
)
choice = attempts[0]
print(f"[smoke] requested_tag={requested_tag}")
print(f"[smoke] resolved_tag={resolved_tag}")
print(f"[smoke] selected_asset={choice.name}")
print(f"[smoke] selected_source={choice.source_label}")
print(f"[smoke] install_dir={install_dir}")
installer.install_prebuilt(
install_dir = install_dir,
llama_tag = args.llama_tag,
published_repo = args.published_repo,
published_release_tag = args.published_release_tag,
)
print(f"[smoke] PASS install_dir={install_dir}")
print(
"[smoke] note=This was a real prebuilt install into an isolated temp directory."
)
return installer.EXIT_SUCCESS
except SystemExit as exc:
code = int(exc.code) if isinstance(exc.code, int) else installer.EXIT_ERROR
if code == installer.EXIT_FALLBACK:
print(f"[smoke] FALLBACK install_dir={install_dir}")
print(
"[smoke] note=Prebuilt path failed and would fall back to source build in setup."
)
print(installer.collect_system_report(host, choice, install_dir))
else:
print(f"[smoke] ERROR exit_code={code} install_dir={install_dir}")
return code
except Exception as exc:
print(f"[smoke] ERROR {exc}")
print(installer.collect_system_report(host, choice, install_dir))
return installer.EXIT_ERROR
finally:
if args.keep_temp:
print(f"[smoke] keeping_temp_root={smoke_root}")
elif smoke_root.exists():
shutil.rmtree(smoke_root, ignore_errors = True)
if __name__ == "__main__":
raise SystemExit(main())


@@ -0,0 +1,630 @@
import importlib.util
import io
import json
import os
import sys
import tarfile
import zipfile
from pathlib import Path

import pytest

PACKAGE_ROOT = Path(__file__).resolve().parents[3]
MODULE_PATH = PACKAGE_ROOT / "studio" / "install_llama_prebuilt.py"
SPEC = importlib.util.spec_from_file_location(
    "studio_install_llama_prebuilt", MODULE_PATH
)
assert SPEC is not None and SPEC.loader is not None
INSTALL_LLAMA_PREBUILT = importlib.util.module_from_spec(SPEC)
sys.modules[SPEC.name] = INSTALL_LLAMA_PREBUILT
SPEC.loader.exec_module(INSTALL_LLAMA_PREBUILT)

PrebuiltFallback = INSTALL_LLAMA_PREBUILT.PrebuiltFallback
extract_archive = INSTALL_LLAMA_PREBUILT.extract_archive
binary_env = INSTALL_LLAMA_PREBUILT.binary_env
HostInfo = INSTALL_LLAMA_PREBUILT.HostInfo
AssetChoice = INSTALL_LLAMA_PREBUILT.AssetChoice
ApprovedArtifactHash = INSTALL_LLAMA_PREBUILT.ApprovedArtifactHash
ApprovedReleaseChecksums = INSTALL_LLAMA_PREBUILT.ApprovedReleaseChecksums
hydrate_source_tree = INSTALL_LLAMA_PREBUILT.hydrate_source_tree
validate_prebuilt_choice = INSTALL_LLAMA_PREBUILT.validate_prebuilt_choice
activate_install_tree = INSTALL_LLAMA_PREBUILT.activate_install_tree
create_install_staging_dir = INSTALL_LLAMA_PREBUILT.create_install_staging_dir
sha256_file = INSTALL_LLAMA_PREBUILT.sha256_file
source_archive_logical_name = INSTALL_LLAMA_PREBUILT.source_archive_logical_name


def approved_checksums_for(
    upstream_tag: str, *, source_archive: Path, bundle_archive: Path, bundle_name: str
) -> ApprovedReleaseChecksums:
    return ApprovedReleaseChecksums(
        repo = "local",
        release_tag = upstream_tag,
        upstream_tag = upstream_tag,
        source_commit = None,
        artifacts = {
            source_archive_logical_name(upstream_tag): ApprovedArtifactHash(
                asset_name = source_archive_logical_name(upstream_tag),
                sha256 = sha256_file(source_archive),
                repo = "ggml-org/llama.cpp",
                kind = "upstream-source",
            ),
            bundle_name: ApprovedArtifactHash(
                asset_name = bundle_name,
                sha256 = sha256_file(bundle_archive),
                repo = "local",
                kind = "local-test-bundle",
            ),
        },
    )
def test_extract_archive_allows_safe_tar_symlink_chain(tmp_path: Path):
    archive_path = tmp_path / "bundle.tar.gz"
    payload = b"shared-object"
    with tarfile.open(archive_path, "w:gz") as archive:
        versioned = tarfile.TarInfo("libllama.so.0.0.1")
        versioned.size = len(payload)
        archive.addfile(versioned, io_bytes(payload))
        soname = tarfile.TarInfo("libllama.so.0")
        soname.type = tarfile.SYMTYPE
        soname.linkname = "libllama.so.0.0.1"
        archive.addfile(soname)
        linker_name = tarfile.TarInfo("libllama.so")
        linker_name.type = tarfile.SYMTYPE
        linker_name.linkname = "libllama.so.0"
        archive.addfile(linker_name)
    destination = tmp_path / "extract"
    extract_archive(archive_path, destination)
    assert (destination / "libllama.so.0.0.1").read_bytes() == payload
    assert (destination / "libllama.so.0").is_symlink()
    assert (destination / "libllama.so").is_symlink()
    assert (destination / "libllama.so").resolve().read_bytes() == payload


def test_extract_archive_allows_safe_tar_hardlink(tmp_path: Path):
    archive_path = tmp_path / "bundle.tar.gz"
    payload = b"quantize"
    with tarfile.open(archive_path, "w:gz") as archive:
        target = tarfile.TarInfo("llama-quantize")
        target.size = len(payload)
        archive.addfile(target, io_bytes(payload))
        hardlink = tarfile.TarInfo("llama-quantize-copy")
        hardlink.type = tarfile.LNKTYPE
        hardlink.linkname = "llama-quantize"
        archive.addfile(hardlink)
    destination = tmp_path / "extract"
    extract_archive(archive_path, destination)
    assert (destination / "llama-quantize-copy").read_bytes() == payload
    assert not (destination / "llama-quantize-copy").is_symlink()


def test_extract_archive_rejects_absolute_tar_symlink_target(tmp_path: Path):
    archive_path = tmp_path / "bundle.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        entry = tarfile.TarInfo("libllama.so")
        entry.type = tarfile.SYMTYPE
        entry.linkname = "/tmp/libllama.so.0"
        archive.addfile(entry)
    with pytest.raises(PrebuiltFallback, match = "archive link used an absolute target"):
        extract_archive(archive_path, tmp_path / "extract")


def test_extract_archive_rejects_escaping_tar_symlink_target(tmp_path: Path):
    archive_path = tmp_path / "bundle.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        entry = tarfile.TarInfo("libllama.so")
        entry.type = tarfile.SYMTYPE
        entry.linkname = "../outside/libllama.so.0"
        archive.addfile(entry)
    with pytest.raises(PrebuiltFallback, match = "archive link escaped destination"):
        extract_archive(archive_path, tmp_path / "extract")


def test_extract_archive_rejects_unresolved_tar_symlink_target(tmp_path: Path):
    archive_path = tmp_path / "bundle.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        entry = tarfile.TarInfo("libllama.so")
        entry.type = tarfile.SYMTYPE
        entry.linkname = "libllama.so.0"
        archive.addfile(entry)
    with pytest.raises(PrebuiltFallback, match = "unresolved link entries"):
        extract_archive(archive_path, tmp_path / "extract")


def test_extract_archive_rejects_zip_symlink_entry(tmp_path: Path):
    archive_path = tmp_path / "bundle.zip"
    with zipfile.ZipFile(archive_path, "w") as archive:
        info = zipfile.ZipInfo("libllama.so")
        info.create_system = 3
        info.external_attr = 0o120777 << 16
        archive.writestr(info, "libllama.so.0")
    with pytest.raises(PrebuiltFallback, match = "zip archive contained a symlink entry"):
        extract_archive(archive_path, tmp_path / "extract")


def test_hydrate_source_tree_extracts_upstream_archive_contents(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
):
    upstream_tag = "b9999"
    archive_path = tmp_path / "llama.cpp-source.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/CMakeLists.txt",
            b"cmake_minimum_required(VERSION 3.14)\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/convert_hf_to_gguf.py",
            b"#!/usr/bin/env python3\nimport gguf\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/gguf-py/gguf/__init__.py",
            b"__all__ = []\n",
        )
    source_urls = set(INSTALL_LLAMA_PREBUILT.upstream_source_archive_urls(upstream_tag))

    def fake_download_file(url: str, destination: Path) -> None:
        assert url in source_urls
        destination.write_bytes(archive_path.read_bytes())

    monkeypatch.setattr(INSTALL_LLAMA_PREBUILT, "download_file", fake_download_file)
    install_dir = tmp_path / "install"
    work_dir = tmp_path / "work"
    work_dir.mkdir()
    hydrate_source_tree(
        upstream_tag, install_dir, work_dir, expected_sha256 = sha256_file(archive_path)
    )
    assert (install_dir / "CMakeLists.txt").exists()
    assert (install_dir / "convert_hf_to_gguf.py").exists()
    assert (install_dir / "gguf-py" / "gguf" / "__init__.py").exists()
    assert not (install_dir / f"llama.cpp-{upstream_tag}").exists()
def test_validate_prebuilt_choice_creates_repo_shaped_linux_install(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
):
    upstream_tag = "b9998"
    bundle_name = "app-b9998-linux-x64-cuda13-newer.tar.gz"
    source_archive = tmp_path / "source.tar.gz"
    bundle_archive = tmp_path / "bundle.tar.gz"
    with tarfile.open(source_archive, "w:gz") as archive:
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/CMakeLists.txt",
            b"cmake_minimum_required(VERSION 3.14)\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/convert_hf_to_gguf.py",
            b"#!/usr/bin/env python3\nimport gguf\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/gguf-py/gguf/__init__.py",
            b"__all__ = []\n",
        )
    with tarfile.open(bundle_archive, "w:gz") as archive:
        add_bytes_to_tar(archive, "llama-server", b"#!/bin/sh\nexit 0\n", mode = 0o755)
        add_bytes_to_tar(archive, "llama-quantize", b"#!/bin/sh\nexit 0\n", mode = 0o755)
        add_bytes_to_tar(archive, "libllama.so.0.0.1", b"libllama")
        add_symlink_to_tar(archive, "libllama.so.0", "libllama.so.0.0.1")
        add_symlink_to_tar(archive, "libllama.so", "libllama.so.0")
        add_bytes_to_tar(archive, "libggml.so.0.9.8", b"libggml")
        add_symlink_to_tar(archive, "libggml.so.0", "libggml.so.0.9.8")
        add_symlink_to_tar(archive, "libggml.so", "libggml.so.0")
        add_bytes_to_tar(archive, "libggml-base.so.0.9.8", b"libggml-base")
        add_symlink_to_tar(archive, "libggml-base.so.0", "libggml-base.so.0.9.8")
        add_symlink_to_tar(archive, "libggml-base.so", "libggml-base.so.0")
        add_bytes_to_tar(archive, "libggml-cpu-x64.so.0.9.8", b"libggml-cpu")
        add_symlink_to_tar(archive, "libggml-cpu-x64.so.0", "libggml-cpu-x64.so.0.9.8")
        add_symlink_to_tar(archive, "libggml-cpu-x64.so", "libggml-cpu-x64.so.0")
        add_bytes_to_tar(archive, "libmtmd.so.0.0.1", b"libmtmd")
        add_symlink_to_tar(archive, "libmtmd.so.0", "libmtmd.so.0.0.1")
        add_symlink_to_tar(archive, "libmtmd.so", "libmtmd.so.0")
        add_bytes_to_tar(archive, "BUILD_INFO.txt", b"bundle metadata\n")
        add_bytes_to_tar(archive, "THIRD_PARTY_LICENSES.txt", b"licenses\n")
    source_urls = set(INSTALL_LLAMA_PREBUILT.upstream_source_archive_urls(upstream_tag))

    def fake_download_file(url: str, destination: Path) -> None:
        if url in source_urls:
            destination.write_bytes(source_archive.read_bytes())
            return
        if url == "file://bundle":
            destination.write_bytes(bundle_archive.read_bytes())
            return
        raise AssertionError(f"unexpected download url: {url}")

    monkeypatch.setattr(INSTALL_LLAMA_PREBUILT, "download_file", fake_download_file)
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "download_bytes",
        lambda url, **_: b"#!/usr/bin/env python3\nimport gguf\n",
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "preflight_linux_installed_binaries",
        lambda *args, **kwargs: None,
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT, "validate_quantize", lambda *args, **kwargs: None
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT, "validate_server", lambda *args, **kwargs: None
    )
    host = HostInfo(
        system = "Linux",
        machine = "x86_64",
        is_windows = False,
        is_linux = True,
        is_macos = False,
        is_x86_64 = True,
        is_arm64 = False,
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )
    choice = AssetChoice(
        repo = "local",
        tag = upstream_tag,
        name = bundle_name,
        url = "file://bundle",
        source_label = "local",
        is_ready_bundle = True,
        install_kind = "linux-cuda",
        bundle_profile = "cuda13-newer",
        runtime_line = "cuda13",
        expected_sha256 = sha256_file(bundle_archive),
    )
    install_dir = tmp_path / "install"
    work_dir = tmp_path / "work"
    work_dir.mkdir()
    probe_path = tmp_path / "stories260K.gguf"
    quantized_path = tmp_path / "stories260K-q4.gguf"
    validate_prebuilt_choice(
        choice,
        host,
        install_dir,
        work_dir,
        probe_path,
        requested_tag = upstream_tag,
        llama_tag = upstream_tag,
        approved_checksums = approved_checksums_for(
            upstream_tag,
            source_archive = source_archive,
            bundle_archive = bundle_archive,
            bundle_name = bundle_name,
        ),
        prebuilt_fallback_used = False,
        quantized_path = quantized_path,
    )
    assert (install_dir / "gguf-py" / "gguf" / "__init__.py").exists()
    assert (install_dir / "convert_hf_to_gguf.py").exists()
    assert (install_dir / "build" / "bin" / "llama-server").exists()
    assert (install_dir / "build" / "bin" / "llama-quantize").exists()
    assert (install_dir / "build" / "bin" / "libllama.so").exists()
    assert (install_dir / "llama-server").exists()
    assert (install_dir / "llama-quantize").exists()
    assert (install_dir / "UNSLOTH_PREBUILT_INFO.json").exists()
    assert (install_dir / "BUILD_INFO.txt").exists()


def test_validate_prebuilt_choice_creates_repo_shaped_windows_install(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
):
    upstream_tag = "b9997"
    bundle_name = "app-b9997-windows-x64-cpu.zip"
    source_archive = tmp_path / "source.tar.gz"
    bundle_archive = tmp_path / "bundle.zip"
    with tarfile.open(source_archive, "w:gz") as archive:
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/CMakeLists.txt",
            b"cmake_minimum_required(VERSION 3.14)\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/convert_hf_to_gguf.py",
            b"#!/usr/bin/env python3\nimport gguf\n",
        )
        add_bytes_to_tar(
            archive,
            f"llama.cpp-{upstream_tag}/gguf-py/gguf/__init__.py",
            b"__all__ = []\n",
        )
    with zipfile.ZipFile(bundle_archive, "w") as archive:
        archive.writestr("llama-server.exe", b"MZ")
        archive.writestr("llama-quantize.exe", b"MZ")
        archive.writestr("llama.dll", b"DLL")
        archive.writestr("BUILD_INFO.txt", b"bundle metadata\n")
    source_urls = set(INSTALL_LLAMA_PREBUILT.upstream_source_archive_urls(upstream_tag))

    def fake_download_file(url: str, destination: Path) -> None:
        if url in source_urls:
            destination.write_bytes(source_archive.read_bytes())
            return
        if url == "file://bundle.zip":
            destination.write_bytes(bundle_archive.read_bytes())
            return
        raise AssertionError(f"unexpected download url: {url}")

    monkeypatch.setattr(INSTALL_LLAMA_PREBUILT, "download_file", fake_download_file)
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "download_bytes",
        lambda url, **_: b"#!/usr/bin/env python3\nimport gguf\n",
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "preflight_linux_installed_binaries",
        lambda *args, **kwargs: None,
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT, "validate_quantize", lambda *args, **kwargs: None
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT, "validate_server", lambda *args, **kwargs: None
    )
    host = HostInfo(
        system = "Windows",
        machine = "AMD64",
        is_windows = True,
        is_linux = False,
        is_macos = False,
        is_x86_64 = True,
        is_arm64 = False,
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )
    choice = AssetChoice(
        repo = "local",
        tag = upstream_tag,
        name = bundle_name,
        url = "file://bundle.zip",
        source_label = "local",
        is_ready_bundle = True,
        install_kind = "windows-cpu",
        expected_sha256 = sha256_file(bundle_archive),
    )
    install_dir = tmp_path / "install"
    work_dir = tmp_path / "work"
    work_dir.mkdir()
    probe_path = tmp_path / "stories260K.gguf"
    quantized_path = tmp_path / "stories260K-q4.gguf"
    validate_prebuilt_choice(
        choice,
        host,
        install_dir,
        work_dir,
        probe_path,
        requested_tag = upstream_tag,
        llama_tag = upstream_tag,
        approved_checksums = approved_checksums_for(
            upstream_tag,
            source_archive = source_archive,
            bundle_archive = bundle_archive,
            bundle_name = bundle_name,
        ),
        prebuilt_fallback_used = False,
        quantized_path = quantized_path,
    )
    assert (install_dir / "gguf-py" / "gguf" / "__init__.py").exists()
    assert (install_dir / "convert_hf_to_gguf.py").exists()
    assert (install_dir / "build" / "bin" / "Release" / "llama-server.exe").exists()
    assert (install_dir / "build" / "bin" / "Release" / "llama-quantize.exe").exists()
    assert (install_dir / "build" / "bin" / "Release" / "llama.dll").exists()
    assert not (install_dir / "llama-server.exe").exists()
    assert (install_dir / "UNSLOTH_PREBUILT_INFO.json").exists()
    assert (install_dir / "BUILD_INFO.txt").exists()
def test_activate_install_tree_restores_existing_install_after_activation_failure(
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
    capsys: pytest.CaptureFixture[str],
):
    install_dir = tmp_path / "llama.cpp"
    install_dir.mkdir()
    (install_dir / "old.txt").write_text("old install\n")
    staging_dir = create_install_staging_dir(install_dir)
    (staging_dir / "new.txt").write_text("new install\n")
    host = HostInfo(
        system = "Linux",
        machine = "x86_64",
        is_windows = False,
        is_linux = True,
        is_macos = False,
        is_x86_64 = True,
        is_arm64 = False,
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "confirm_install_tree",
        lambda *_args, **_kwargs: (_ for _ in ()).throw(
            RuntimeError("activation confirm failed")
        ),
    )
    with pytest.raises(
        PrebuiltFallback,
        match = "activation failed; restored previous install",
    ):
        activate_install_tree(staging_dir, install_dir, host)
    assert (install_dir / "old.txt").read_text() == "old install\n"
    assert not (install_dir / "new.txt").exists()
    assert not staging_dir.exists()
    assert not (tmp_path / ".staging").exists()
    output = capsys.readouterr().out
    assert "moving existing install to rollback path" in output
    assert "restored previous install from rollback path" in output


def test_activate_install_tree_cleans_all_paths_when_rollback_restore_fails(
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
    capsys: pytest.CaptureFixture[str],
):
    install_dir = tmp_path / "llama.cpp"
    install_dir.mkdir()
    (install_dir / "old.txt").write_text("old install\n")
    staging_dir = create_install_staging_dir(install_dir)
    (staging_dir / "new.txt").write_text("new install\n")
    host = HostInfo(
        system = "Linux",
        machine = "x86_64",
        is_windows = False,
        is_linux = True,
        is_macos = False,
        is_x86_64 = True,
        is_arm64 = False,
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )
    monkeypatch.setattr(
        INSTALL_LLAMA_PREBUILT,
        "confirm_install_tree",
        lambda *_args, **_kwargs: (_ for _ in ()).throw(
            RuntimeError("activation confirm failed")
        ),
    )
    original_replace = INSTALL_LLAMA_PREBUILT.os.replace

    def flaky_replace(src, dst):
        src_path = Path(src)
        dst_path = Path(dst)
        if "rollback-" in src_path.name and dst_path == install_dir:
            raise OSError("restore failed")
        return original_replace(src, dst)

    monkeypatch.setattr(INSTALL_LLAMA_PREBUILT.os, "replace", flaky_replace)
    with pytest.raises(
        PrebuiltFallback,
        match = "activation and rollback failed; cleaned install state for fresh source build",
    ):
        activate_install_tree(staging_dir, install_dir, host)
    assert not install_dir.exists()
    assert not staging_dir.exists()
    assert not (tmp_path / ".staging").exists()
    output = capsys.readouterr().out
    assert "rollback after failed activation also failed: restore failed" in output
    assert (
        "cleaning staging, install, and rollback paths before source build fallback"
        in output
    )
    assert "removing failed install path" in output
    assert "removing rollback path" in output
def test_binary_env_linux_includes_binary_parent_in_ld_library_path(
    tmp_path: Path, monkeypatch: pytest.MonkeyPatch
):
    install_dir = tmp_path / "llama.cpp"
    bin_dir = install_dir / "build" / "bin"
    bin_dir.mkdir(parents = True)
    binary_path = bin_dir / "llama-server"
    binary_path.write_bytes(b"fake")
    host = HostInfo(
        system = "Linux",
        machine = "x86_64",
        is_windows = False,
        is_linux = True,
        is_macos = False,
        is_x86_64 = True,
        is_arm64 = False,
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )
    monkeypatch.setattr(INSTALL_LLAMA_PREBUILT, "linux_runtime_dirs", lambda _bp: [])
    env = binary_env(binary_path, install_dir, host)
    ld_dirs = env["LD_LIBRARY_PATH"].split(os.pathsep)
    assert (
        str(bin_dir) in ld_dirs
    ), f"binary_path.parent ({bin_dir}) must be in LD_LIBRARY_PATH, got: {ld_dirs}"
    assert str(install_dir) in ld_dirs


def io_bytes(data: bytes):
    return io.BytesIO(data)


def add_bytes_to_tar(
    archive: tarfile.TarFile, name: str, data: bytes, *, mode: int = 0o644
) -> None:
    info = tarfile.TarInfo(name)
    info.size = len(data)
    info.mode = mode
    archive.addfile(info, io_bytes(data))


def add_symlink_to_tar(archive: tarfile.TarFile, name: str, target: str) -> None:
    info = tarfile.TarInfo(name)
    info.type = tarfile.SYMTYPE
    info.linkname = target
    archive.addfile(info)
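The link-safety policy these tests exercise can be sketched as a standalone predicate. This is a simplified illustration under stated assumptions: `symlink_target_is_safe` is a hypothetical name, and the real `extract_archive` additionally tracks unresolved link targets and hardlinks:

```python
from pathlib import PurePosixPath


def symlink_target_is_safe(member_name: str, linkname: str) -> bool:
    """Reject absolute link targets and targets that escape the extraction root.

    Sketch of the policy tested above; not the module's actual implementation.
    """
    if PurePosixPath(linkname).is_absolute():
        return False  # e.g. "/tmp/libllama.so.0"
    # Resolve the target relative to the directory containing the link member.
    parts = list(PurePosixPath(member_name).parent.parts)
    for part in PurePosixPath(linkname).parts:
        if part == "..":
            if not parts:
                return False  # ".." walked above the destination root
            parts.pop()
        elif part != ".":
            parts.append(part)
    return True
```

A sibling-relative target like `libllama.so.0 -> libllama.so.0.0.1` passes, while `../outside/...` and absolute targets are rejected, matching the accept/reject pairs in the tests.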


@@ -0,0 +1,687 @@
"""
Comprehensive tests for PR #4562 bug fixes.
Tests cover:
- Bug 1: PS1 detached HEAD on re-run (fetch + checkout -B pattern)
- Bug 2: Source-build fallback ignores pinned tag (both .sh and .ps1)
- Bug 3: Unix fallback deletes install before checking prerequisites
- Bug 4: Linux LD_LIBRARY_PATH missing build/bin
- "latest" tag resolution fallback chain (Unsloth -> ggml-org -> raw)
- Cross-platform binary_env (Linux, macOS, Windows)
- Edge cases: malformed JSON, empty responses, env overrides
Run: pytest tests/studio/install/test_pr4562_bugfixes.py -v
"""
import importlib.util
import json
import os
import subprocess
import sys
import textwrap
from pathlib import Path
from unittest.mock import patch

import pytest

# ---------------------------------------------------------------------------
# Load the module under test (same pattern as existing test files)
# ---------------------------------------------------------------------------
PACKAGE_ROOT = Path(__file__).resolve().parents[3]
MODULE_PATH = PACKAGE_ROOT / "studio" / "install_llama_prebuilt.py"
SPEC = importlib.util.spec_from_file_location(
    "studio_install_llama_prebuilt", MODULE_PATH
)
assert SPEC is not None and SPEC.loader is not None
MOD = importlib.util.module_from_spec(SPEC)
sys.modules[SPEC.name] = MOD
SPEC.loader.exec_module(MOD)

binary_env = MOD.binary_env
HostInfo = MOD.HostInfo
resolve_requested_llama_tag = MOD.resolve_requested_llama_tag

SETUP_SH = PACKAGE_ROOT / "studio" / "setup.sh"
SETUP_PS1 = PACKAGE_ROOT / "studio" / "setup.ps1"


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def make_host(*, system: str) -> HostInfo:
    """Create a HostInfo for the given OS."""
    return HostInfo(
        system = system,
        machine = "x86_64" if system != "Darwin" else "arm64",
        is_windows = (system == "Windows"),
        is_linux = (system == "Linux"),
        is_macos = (system == "Darwin"),
        is_x86_64 = (system != "Darwin"),
        is_arm64 = (system == "Darwin"),
        nvidia_smi = None,
        driver_cuda_version = None,
        compute_caps = [],
        visible_cuda_devices = None,
        has_physical_nvidia = False,
        has_usable_nvidia = False,
    )


BASH = "/bin/bash"


def run_bash(script: str, *, timeout: int = 10, env: dict | None = None) -> str:
    """Run a bash script fragment and return its stdout."""
    run_env = os.environ.copy()
    if env:
        run_env.update(env)
    result = subprocess.run(
        [BASH, "-c", script],
        capture_output = True,
        text = True,
        timeout = timeout,
        env = run_env,
    )
    return result.stdout.strip()
# =========================================================================
# TEST GROUP A: binary_env across all platforms (Bug 4 + cross-platform)
# =========================================================================
class TestBinaryEnvCrossPlatform:
    """Test that binary_env returns correct library paths for all OSes."""

    def test_linux_includes_binary_parent_in_ld_library_path(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        install_dir = tmp_path / "llama.cpp"
        bin_dir = install_dir / "build" / "bin"
        bin_dir.mkdir(parents = True)
        binary_path = bin_dir / "llama-server"
        binary_path.write_bytes(b"fake")
        host = make_host(system = "Linux")
        monkeypatch.setattr(MOD, "linux_runtime_dirs", lambda _bp: [])
        env = binary_env(binary_path, install_dir, host)
        ld_dirs = env["LD_LIBRARY_PATH"].split(os.pathsep)
        assert str(bin_dir) in ld_dirs, f"build/bin not in LD_LIBRARY_PATH: {ld_dirs}"
        assert (
            str(install_dir) in ld_dirs
        ), f"install_dir not in LD_LIBRARY_PATH: {ld_dirs}"

    def test_linux_binary_parent_comes_before_install_dir(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        """build/bin should be searched before install_dir for .so files."""
        install_dir = tmp_path / "llama.cpp"
        bin_dir = install_dir / "build" / "bin"
        bin_dir.mkdir(parents = True)
        binary_path = bin_dir / "llama-server"
        binary_path.write_bytes(b"fake")
        host = make_host(system = "Linux")
        monkeypatch.setattr(MOD, "linux_runtime_dirs", lambda _bp: [])
        env = binary_env(binary_path, install_dir, host)
        ld_dirs = env["LD_LIBRARY_PATH"].split(os.pathsep)
        bin_idx = ld_dirs.index(str(bin_dir))
        install_idx = ld_dirs.index(str(install_dir))
        assert (
            bin_idx < install_idx
        ), "binary_path.parent should come before install_dir"

    def test_linux_deduplicates_when_binary_parent_equals_install_dir(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        """When binary is directly in install_dir, no duplicate entries."""
        install_dir = tmp_path / "llama.cpp"
        install_dir.mkdir(parents = True)
        binary_path = install_dir / "llama-server"
        binary_path.write_bytes(b"fake")
        host = make_host(system = "Linux")
        monkeypatch.setattr(MOD, "linux_runtime_dirs", lambda _bp: [])
        env = binary_env(binary_path, install_dir, host)
        ld_dirs = [d for d in env["LD_LIBRARY_PATH"].split(os.pathsep) if d]
        count = ld_dirs.count(str(install_dir))
        assert count == 1, f"install_dir appears {count} times in LD_LIBRARY_PATH"

    def test_linux_preserves_existing_ld_library_path(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        install_dir = tmp_path / "llama.cpp"
        bin_dir = install_dir / "build" / "bin"
        bin_dir.mkdir(parents = True)
        binary_path = bin_dir / "llama-server"
        binary_path.write_bytes(b"fake")
        # Create real directories so dedupe_existing_dirs keeps them
        custom_lib = tmp_path / "custom_lib"
        other_lib = tmp_path / "other_lib"
        custom_lib.mkdir()
        other_lib.mkdir()
        host = make_host(system = "Linux")
        monkeypatch.setattr(MOD, "linux_runtime_dirs", lambda _bp: [])
        original = os.environ.get("LD_LIBRARY_PATH", "")
        os.environ["LD_LIBRARY_PATH"] = f"{custom_lib}:{other_lib}"
        try:
            env = binary_env(binary_path, install_dir, host)
        finally:
            if original:
                os.environ["LD_LIBRARY_PATH"] = original
            else:
                os.environ.pop("LD_LIBRARY_PATH", None)
        ld_dirs = env["LD_LIBRARY_PATH"].split(os.pathsep)
        assert str(custom_lib.resolve()) in ld_dirs
        assert str(other_lib.resolve()) in ld_dirs

    def test_windows_includes_binary_parent_in_path(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        install_dir = tmp_path / "llama.cpp"
        bin_dir = install_dir / "build" / "bin" / "Release"
        bin_dir.mkdir(parents = True)
        binary_path = bin_dir / "llama-server.exe"
        binary_path.write_bytes(b"MZ")
        host = make_host(system = "Windows")
        monkeypatch.setattr(
            MOD, "windows_runtime_dirs_for_runtime_line", lambda _rt: []
        )
        env = binary_env(binary_path, install_dir, host)
        path_dirs = env["PATH"].split(os.pathsep)
        assert str(bin_dir) in path_dirs, f"build/bin/Release not in PATH: {path_dirs}"

    def test_macos_sets_dyld_library_path(
        self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch
    ):
        install_dir = tmp_path / "llama.cpp"
        install_dir.mkdir(parents = True)
        bin_dir = install_dir / "build" / "bin"
        binary_path = bin_dir / "llama-server"
        binary_path.parent.mkdir(parents = True)
        binary_path.write_bytes(b"fake")
        host = make_host(system = "Darwin")
        monkeypatch.delenv("DYLD_LIBRARY_PATH", raising = False)
        env = binary_env(binary_path, install_dir, host)
        dyld_parts = [p for p in env["DYLD_LIBRARY_PATH"].split(os.pathsep) if p]
        assert (
            str(bin_dir) in dyld_parts
        ), f"build/bin not in DYLD_LIBRARY_PATH: {dyld_parts}"
        assert (
            str(install_dir) in dyld_parts
        ), f"install_dir not in DYLD_LIBRARY_PATH: {dyld_parts}"
        # binary_path.parent (build/bin) should come before install_dir
        assert dyld_parts.index(str(bin_dir)) < dyld_parts.index(str(install_dir))
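The ordering and deduplication behavior exercised by the group above can be sketched as a small path-builder. `prepend_library_dirs` is an illustrative name, not the module's API; the real `binary_env` also merges platform runtime dirs and filters to existing directories:

```python
import os
from pathlib import Path


def prepend_library_dirs(binary_path: Path, install_dir: Path, existing: str = "") -> str:
    """Sketch of the Bug-4 fix: put binary_path.parent ahead of install_dir so
    bundled .so files in build/bin are found even without RPATH.

    Illustrative only; not the actual binary_env implementation.
    """
    candidates = [str(binary_path.parent), str(install_dir)]
    candidates += [d for d in existing.split(os.pathsep) if d]
    ordered: list[str] = []
    for d in candidates:
        if d not in ordered:  # dedupe while preserving priority order
            ordered.append(d)
    return os.pathsep.join(ordered)
```

The highest-priority entry is always the binary's own directory, and an entry appearing both in the computed dirs and the inherited variable is emitted once.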
# =========================================================================
# TEST GROUP B: resolve_requested_llama_tag (Python function)
# =========================================================================
class TestResolveRequestedLlamaTag:
    def test_concrete_tag_passes_through(self):
        assert resolve_requested_llama_tag("b8508") == "b8508"

    def test_none_resolves_to_latest(self, monkeypatch: pytest.MonkeyPatch):
        monkeypatch.setattr(MOD, "latest_upstream_release_tag", lambda: "b9999")
        assert resolve_requested_llama_tag(None) == "b9999"

    def test_latest_resolves_to_upstream(self, monkeypatch: pytest.MonkeyPatch):
        monkeypatch.setattr(MOD, "latest_upstream_release_tag", lambda: "b1234")
        assert resolve_requested_llama_tag("latest") == "b1234"

    def test_empty_string_resolves_to_latest(self, monkeypatch: pytest.MonkeyPatch):
        monkeypatch.setattr(MOD, "latest_upstream_release_tag", lambda: "b5555")
        assert resolve_requested_llama_tag("") == "b5555"
# =========================================================================
# TEST GROUP C: setup.sh logic (bash subprocess tests)
# =========================================================================
class TestSetupShLogic:
    """Test setup.sh fragments via bash subprocess with controlled PATH."""

    def test_cmake_missing_preserves_install(self, tmp_path: Path):
        """Bug 3: When cmake is missing, rm -rf should NOT run."""
        llama_dir = tmp_path / "llama.cpp"
        llama_dir.mkdir()
        marker = llama_dir / "marker.txt"
        marker.write_text("existing")
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        # Create mock git but NOT cmake
        (mock_bin / "git").write_text("#!/bin/bash\nexit 0\n")
        (mock_bin / "git").chmod(0o755)
        # Build PATH: mock_bin first, then system dirs WITHOUT cmake
        safe_dirs = [str(mock_bin)]
        for d in os.environ.get("PATH", "").split(":"):
            if d and not os.path.isfile(os.path.join(d, "cmake")):
                safe_dirs.append(d)
        script = textwrap.dedent(f"""\
            export LLAMA_CPP_DIR="{llama_dir}"
            if ! command -v cmake &>/dev/null; then
                echo "cmake_missing"
            elif ! command -v git &>/dev/null; then
                echo "git_missing"
            else
                rm -rf "$LLAMA_CPP_DIR"
                echo "would_clone"
            fi
        """)
        output = run_bash(script, env = {"PATH": ":".join(safe_dirs)})
        assert "cmake_missing" in output
        assert marker.exists(), "Install dir was deleted despite cmake missing!"

    def test_git_missing_preserves_install(self, tmp_path: Path):
        """Bug 3: When git is missing, rm -rf should NOT run."""
        llama_dir = tmp_path / "llama.cpp"
        llama_dir.mkdir()
        marker = llama_dir / "marker.txt"
        marker.write_text("existing")
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        # Create mock cmake but NOT git
        (mock_bin / "cmake").write_text("#!/bin/bash\nexit 0\n")
        (mock_bin / "cmake").chmod(0o755)
        # Build PATH: mock_bin first, then system dirs WITHOUT git
        safe_dirs = [str(mock_bin)]
        for d in os.environ.get("PATH", "").split(":"):
            if d and not os.path.isfile(os.path.join(d, "git")):
                safe_dirs.append(d)
        script = textwrap.dedent(f"""\
            export LLAMA_CPP_DIR="{llama_dir}"
            if ! command -v cmake &>/dev/null; then
                echo "cmake_missing"
            elif ! command -v git &>/dev/null; then
                echo "git_missing"
            else
                rm -rf "$LLAMA_CPP_DIR"
                echo "would_clone"
            fi
        """)
        output = run_bash(script, env = {"PATH": ":".join(safe_dirs)})
        assert "git_missing" in output
        assert marker.exists(), "Install dir was deleted despite git missing!"

    def test_both_present_runs_rm_and_clone(self, tmp_path: Path):
        """Bug 3: When both present, rm -rf runs before clone."""
        llama_dir = tmp_path / "llama.cpp"
        llama_dir.mkdir()
        marker = llama_dir / "marker.txt"
        marker.write_text("existing")
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        (mock_bin / "cmake").write_text("#!/bin/bash\nexit 0\n")
        (mock_bin / "cmake").chmod(0o755)
        (mock_bin / "git").write_text("#!/bin/bash\nexit 0\n")
        (mock_bin / "git").chmod(0o755)
        script = textwrap.dedent(f"""\
            export PATH="{mock_bin}:$PATH"
            export LLAMA_CPP_DIR="{llama_dir}"
            if ! command -v cmake &>/dev/null; then
                echo "cmake_missing"
            elif ! command -v git &>/dev/null; then
                echo "git_missing"
            else
                rm -rf "$LLAMA_CPP_DIR"
                echo "would_clone"
            fi
        """)
        output = run_bash(script)
        assert "would_clone" in output
        assert not marker.exists(), "Install dir should have been deleted"

    def test_clone_uses_pinned_tag(self, tmp_path: Path):
        """Bug 2: git clone should use --branch with the resolved tag."""
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        log_file = tmp_path / "git_calls.log"
        (mock_bin / "git").write_text(f'#!/bin/bash\necho "$*" >> {log_file}\nexit 0\n')
        (mock_bin / "git").chmod(0o755)
        script = textwrap.dedent(f"""\
            export PATH="{mock_bin}:$PATH"
            git clone --depth 1 --branch "b8508" https://github.com/ggml-org/llama.cpp.git /tmp/llama_test
        """)
        run_bash(script)
        log = log_file.read_text()
        assert "--branch b8508" in log, f"Expected --branch b8508 in: {log}"

    def test_fetch_checkout_b_pattern(self, tmp_path: Path):
        """Bug 1: Re-run should use fetch + checkout -B, not pull + checkout FETCH_HEAD."""
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        log_file = tmp_path / "git_calls.log"
        (mock_bin / "git").write_text(f'#!/bin/bash\necho "$*" >> {log_file}\nexit 0\n')
        (mock_bin / "git").chmod(0o755)
        llama_dir = tmp_path / "llama.cpp"
        llama_dir.mkdir()
        (llama_dir / ".git").mkdir()
        script = textwrap.dedent(f"""\
            export PATH="{mock_bin}:$PATH"
            LlamaCppDir="{llama_dir}"
            ResolvedLlamaTag="b8508"
            if [ -d "$LlamaCppDir/.git" ]; then
                git -C "$LlamaCppDir" fetch --depth 1 origin "$ResolvedLlamaTag"
                if [ $? -ne 0 ]; then
                    echo "WARN: fetch failed"
                else
                    git -C "$LlamaCppDir" checkout -B unsloth-llama-build FETCH_HEAD
                fi
            fi
        """)
        run_bash(script)
        log = log_file.read_text()
        assert "fetch --depth 1 origin b8508" in log
        assert "checkout -B unsloth-llama-build FETCH_HEAD" in log
        assert "pull" not in log, "Should use fetch, not pull"

    def test_fetch_failure_warns_not_aborts(self, tmp_path: Path):
        """Bug 1: fetch failure should warn and continue, not set BuildOk=false."""
        mock_bin = tmp_path / "mock_bin"
        mock_bin.mkdir()
        (mock_bin / "git").write_text(
            '#!/bin/bash\nif echo "$*" | grep -q fetch; then exit 1; fi\nexit 0\n'
        )
        (mock_bin / "git").chmod(0o755)
        llama_dir = tmp_path / "llama.cpp"
        llama_dir.mkdir()
        (llama_dir / ".git").mkdir()
        script = textwrap.dedent(f"""\
            export PATH="{mock_bin}:$PATH"
            LlamaCppDir="{llama_dir}"
            ResolvedLlamaTag="b8508"
            BuildOk=true
            if [ -d "$LlamaCppDir/.git" ]; then
                git -C "$LlamaCppDir" fetch --depth 1 origin "$ResolvedLlamaTag"
                if [ $? -ne 0 ]; then
                    echo "WARN: fetch failed -- using existing source"
                else
                    git -C "$LlamaCppDir" checkout -B unsloth-llama-build FETCH_HEAD
                fi
            fi
            echo "BuildOk=$BuildOk"
        """)
        output = run_bash(script)
        assert "WARN: fetch failed" in output
        assert "BuildOk=true" in output
# =========================================================================
# TEST GROUP D: "latest" tag resolution (bash subprocess)
# =========================================================================
class TestLatestTagResolution:
"""Test the fallback chain: Unsloth API -> ggml-org API -> raw."""
RESOLVE_TEMPLATE = textwrap.dedent("""\
export PATH="{mock_bin}:$PATH"
_REQUESTED_LLAMA_TAG="{requested_tag}"
_RESOLVED_LLAMA_TAG=""
_RESOLVE_UPSTREAM_STATUS=1
_HELPER_RELEASE_REPO="unslothai/llama.cpp"
if [ "$_RESOLVE_UPSTREAM_STATUS" -ne 0 ] || [ -z "$_RESOLVED_LLAMA_TAG" ]; then
if [ "$_REQUESTED_LLAMA_TAG" = "latest" ]; then
_RESOLVED_LLAMA_TAG="$(curl -fsSL "https://api.github.com/repos/${{_HELPER_RELEASE_REPO}}/releases/latest" 2>/dev/null | python -c "import sys,json; print(json.load(sys.stdin)['tag_name'])" 2>/dev/null)" || _RESOLVED_LLAMA_TAG=""
if [ -z "$_RESOLVED_LLAMA_TAG" ]; then
_RESOLVED_LLAMA_TAG="$(curl -fsSL https://api.github.com/repos/ggml-org/llama.cpp/releases/latest 2>/dev/null | python -c "import sys,json; print(json.load(sys.stdin)['tag_name'])" 2>/dev/null)" || _RESOLVED_LLAMA_TAG=""
fi
fi
if [ -z "$_RESOLVED_LLAMA_TAG" ]; then
_RESOLVED_LLAMA_TAG="$_REQUESTED_LLAMA_TAG"
fi
fi
echo "$_RESOLVED_LLAMA_TAG"
""")
@staticmethod
def _make_curl_mock(
mock_bin: Path, unsloth_response: str | None, ggml_response: str | None
):
"""Create a curl mock that returns different responses per repo."""
lines = ["#!/bin/bash"]
if unsloth_response is not None:
lines.append(
f'if echo "$*" | grep -q "unslothai/llama.cpp"; then echo \'{unsloth_response}\'; exit 0; fi'
)
else:
lines.append(
'if echo "$*" | grep -q "unslothai/llama.cpp"; then exit 1; fi'
)
if ggml_response is not None:
lines.append(
f'if echo "$*" | grep -q "ggml-org/llama.cpp"; then echo \'{ggml_response}\'; exit 0; fi'
)
else:
lines.append('if echo "$*" | grep -q "ggml-org/llama.cpp"; then exit 1; fi')
lines.append("exit 1")
curl_path = mock_bin / "curl"
curl_path.write_text("\n".join(lines) + "\n")
curl_path.chmod(0o755)
def _run_resolve(
self,
tmp_path: Path,
requested_tag: str,
unsloth_resp: str | None,
ggml_resp: str | None,
) -> str:
mock_bin = tmp_path / "mock_bin"
mock_bin.mkdir(exist_ok = True)
self._make_curl_mock(mock_bin, unsloth_resp, ggml_resp)
script = self.RESOLVE_TEMPLATE.format(
mock_bin = mock_bin, requested_tag = requested_tag
)
return run_bash(script)
def test_unsloth_succeeds(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = '{"tag_name":"b8508"}',
ggml_resp = '{"tag_name":"b9000"}',
)
assert output == "b8508"
def test_unsloth_fails_ggml_succeeds(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = None,
ggml_resp = '{"tag_name":"b9000"}',
)
assert output == "b9000"
def test_both_fail_raw_fallback(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = None,
ggml_resp = None,
)
assert output == "latest"
def test_concrete_tag_passes_through(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"b7777",
unsloth_resp = '{"tag_name":"b8508"}',
ggml_resp = '{"tag_name":"b9000"}',
)
assert output == "b7777"
def test_unsloth_malformed_json_falls_through(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = '{"bad_key":"no_tag"}',
ggml_resp = '{"tag_name":"b9001"}',
)
assert output == "b9001"
def test_both_malformed_json_raw_fallback(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = '{"bad":"data"}',
ggml_resp = '{"also":"bad"}',
)
assert output == "latest"
def test_unsloth_empty_body_falls_through(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = "",
ggml_resp = '{"tag_name":"b7000"}',
)
assert output == "b7000"
def test_unsloth_empty_tag_name_falls_through(self, tmp_path: Path):
output = self._run_resolve(
tmp_path,
"latest",
unsloth_resp = '{"tag_name":""}',
ggml_resp = '{"tag_name":"b6000"}',
)
assert output == "b6000"
def test_env_override_unsloth_llama_tag(self):
output = run_bash(
'echo "${UNSLOTH_LLAMA_TAG:-latest}"',
env = {"UNSLOTH_LLAMA_TAG": "b1234"},
)
assert output == "b1234"
def test_env_unset_defaults_to_latest(self):
env = os.environ.copy()
env.pop("UNSLOTH_LLAMA_TAG", None)
output = run_bash('echo "${UNSLOTH_LLAMA_TAG:-latest}"', env = env)
assert output == "latest"
def test_env_empty_defaults_to_latest(self):
output = run_bash(
'echo "${UNSLOTH_LLAMA_TAG:-latest}"',
env = {"UNSLOTH_LLAMA_TAG": ""},
)
assert output == "latest"
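The resolution chain exercised above can be summarized in plain Python. This is a hedged sketch, not the production code: `resolve_tag_sketch` is a hypothetical name, and each `fetcher` stands in for one curl-based API query that returns a tag string or an empty result on failure.

```python
def resolve_tag_sketch(requested, fetchers):
    """Try each tag source in order; fall back to the literal requested tag."""
    if requested != "latest":
        return requested  # concrete tags pass through untouched
    for fetch in fetchers:
        try:
            tag = fetch()
        except Exception:
            tag = None  # a failed or malformed response falls through
        if tag:
            return tag
    return requested
```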
# =========================================================================
# TEST GROUP E: Source file verification
# =========================================================================
class TestSourceCodePatterns:
"""Verify the actual source files contain the expected fix patterns."""
def test_setup_sh_no_rm_before_prereq_check(self):
"""rm -rf must appear AFTER cmake/git checks, not before."""
content = SETUP_SH.read_text()
# Find the source-build block
idx_else = content.find("# Check prerequisites")
assert idx_else != -1
block = content[idx_else:]
# rm -rf should appear after the cmake/git checks
idx_cmake = block.find("command -v cmake")
idx_git = block.find("command -v git")
idx_rm = block.find("rm -rf")
assert idx_rm > idx_cmake, "rm -rf should come after cmake check"
assert idx_rm > idx_git, "rm -rf should come after git check"
def test_setup_sh_clone_uses_branch_tag(self):
"""git clone in source-build should use --branch via _CLONE_BRANCH_ARGS."""
content = SETUP_SH.read_text()
# The clone line should use _CLONE_BRANCH_ARGS (which conditionally includes --branch)
assert (
"_CLONE_BRANCH_ARGS" in content
), "Clone should use _CLONE_BRANCH_ARGS array"
assert (
'--branch "$_RESOLVED_LLAMA_TAG"' in content
), "_CLONE_BRANCH_ARGS should be set to --branch $_RESOLVED_LLAMA_TAG"
# Verify the guard: --branch is only used when tag is not "latest"
assert (
'_RESOLVED_LLAMA_TAG" != "latest"' in content
), "Should guard against literal 'latest' tag"
def test_setup_sh_latest_resolution_queries_unsloth_first(self):
"""The Unsloth repo should be queried before ggml-org."""
content = SETUP_SH.read_text()
idx_unsloth = content.find("_HELPER_RELEASE_REPO}/releases/latest")
idx_ggml = content.find("ggml-org/llama.cpp/releases/latest")
assert idx_unsloth != -1, "Unsloth API query not found"
assert idx_ggml != -1, "ggml-org API query not found"
assert idx_unsloth < idx_ggml, "Unsloth should be queried before ggml-org"
def test_setup_ps1_uses_checkout_b(self):
"""PS1 should use checkout -B, not checkout --force FETCH_HEAD."""
content = SETUP_PS1.read_text()
assert "checkout -B unsloth-llama-build" in content
assert "checkout --force FETCH_HEAD" not in content
def test_setup_ps1_clone_uses_branch_tag(self):
"""PS1 clone should use --branch with the resolved tag."""
content = SETUP_PS1.read_text()
assert "--branch" in content and "$ResolvedLlamaTag" in content
# The old commented-out line should be gone
assert "# git clone --depth 1 --branch" not in content
def test_setup_ps1_no_git_pull(self):
"""PS1 should use fetch, not pull (which fails in detached HEAD)."""
content = SETUP_PS1.read_text()
# In the source-build section, there should be no "git pull"
# (git pull is only valid on a branch)
lines = content.splitlines()
for i, line in enumerate(lines):
stripped = line.strip()
if "git pull" in stripped and not stripped.startswith("#"):
# Check context -- should not be in the llama.cpp build section
# Allow git pull in other contexts
context = "\n".join(lines[max(0, i - 5) : i + 5])
if "LlamaCppDir" in context:
pytest.fail(
f"Found 'git pull' in llama.cpp build section at line {i+1}"
)
def test_setup_ps1_latest_resolution_queries_unsloth_first(self):
"""PS1 should query Unsloth repo before ggml-org."""
content = SETUP_PS1.read_text()
idx_unsloth = content.find("$HelperReleaseRepo/releases/latest")
idx_ggml = content.find("ggml-org/llama.cpp/releases/latest")
assert idx_unsloth != -1, "Unsloth API query not found in PS1"
assert idx_ggml != -1, "ggml-org API query not found in PS1"
assert idx_unsloth < idx_ggml, "Unsloth should be queried before ggml-org"
def test_binary_env_linux_has_binary_parent(self):
"""The Linux branch of binary_env should include binary_path.parent."""
content = MODULE_PATH.read_text()
# Find the binary_env function
in_func = False
in_linux = False
found = False
for line in content.splitlines():
if "def binary_env(" in line:
in_func = True
elif in_func and line and not line[0].isspace() and "def " in line:
break
if in_func and "host.is_linux" in line:
in_linux = True
if in_linux and "binary_path.parent" in line:
found = True
break
assert found, "binary_path.parent not found in Linux branch of binary_env"

"""Tests for binary selection logic in install_llama_prebuilt.py.
Covers: normalize_compute_cap, normalize_compute_caps, parse_cuda_visible_devices,
supports_explicit_visible_device_matching, select_visible_gpu_rows,
compatible_linux_runtime_lines, pick_windows_cuda_runtime,
compatible_windows_runtime_lines, runtime_line_from_cuda_version,
apply_approved_hashes, linux_cuda_choice_from_release, windows_cuda_attempts,
resolve_upstream_asset_choice.
No GPU, no network, no torch required -- all I/O is monkeypatched.
"""
import importlib.util
import sys
from pathlib import Path
import pytest
PACKAGE_ROOT = Path(__file__).resolve().parents[3]
MODULE_PATH = PACKAGE_ROOT / "studio" / "install_llama_prebuilt.py"
SPEC = importlib.util.spec_from_file_location(
"studio_install_llama_prebuilt", MODULE_PATH
)
assert SPEC is not None and SPEC.loader is not None
INSTALL_LLAMA_PREBUILT = importlib.util.module_from_spec(SPEC)
sys.modules[SPEC.name] = INSTALL_LLAMA_PREBUILT
SPEC.loader.exec_module(INSTALL_LLAMA_PREBUILT)
HostInfo = INSTALL_LLAMA_PREBUILT.HostInfo
AssetChoice = INSTALL_LLAMA_PREBUILT.AssetChoice
PublishedLlamaArtifact = INSTALL_LLAMA_PREBUILT.PublishedLlamaArtifact
PublishedReleaseBundle = INSTALL_LLAMA_PREBUILT.PublishedReleaseBundle
ApprovedArtifactHash = INSTALL_LLAMA_PREBUILT.ApprovedArtifactHash
ApprovedReleaseChecksums = INSTALL_LLAMA_PREBUILT.ApprovedReleaseChecksums
PrebuiltFallback = INSTALL_LLAMA_PREBUILT.PrebuiltFallback
LinuxCudaSelection = INSTALL_LLAMA_PREBUILT.LinuxCudaSelection
UPSTREAM_REPO = INSTALL_LLAMA_PREBUILT.UPSTREAM_REPO
normalize_compute_cap = INSTALL_LLAMA_PREBUILT.normalize_compute_cap
normalize_compute_caps = INSTALL_LLAMA_PREBUILT.normalize_compute_caps
parse_cuda_visible_devices = INSTALL_LLAMA_PREBUILT.parse_cuda_visible_devices
supports_explicit_visible_device_matching = (
INSTALL_LLAMA_PREBUILT.supports_explicit_visible_device_matching
)
select_visible_gpu_rows = INSTALL_LLAMA_PREBUILT.select_visible_gpu_rows
compatible_linux_runtime_lines = INSTALL_LLAMA_PREBUILT.compatible_linux_runtime_lines
pick_windows_cuda_runtime = INSTALL_LLAMA_PREBUILT.pick_windows_cuda_runtime
compatible_windows_runtime_lines = (
INSTALL_LLAMA_PREBUILT.compatible_windows_runtime_lines
)
runtime_line_from_cuda_version = INSTALL_LLAMA_PREBUILT.runtime_line_from_cuda_version
apply_approved_hashes = INSTALL_LLAMA_PREBUILT.apply_approved_hashes
linux_cuda_choice_from_release = INSTALL_LLAMA_PREBUILT.linux_cuda_choice_from_release
windows_cuda_attempts = INSTALL_LLAMA_PREBUILT.windows_cuda_attempts
resolve_upstream_asset_choice = INSTALL_LLAMA_PREBUILT.resolve_upstream_asset_choice
# ---------------------------------------------------------------------------
# Helper factories
# ---------------------------------------------------------------------------
def make_host(**overrides):
system = overrides.pop("system", "Linux")
machine = overrides.pop("machine", "x86_64")
defaults = dict(
system = system,
machine = machine,
is_linux = system == "Linux",
is_windows = system == "Windows",
is_macos = system == "Darwin",
is_x86_64 = machine.lower() in {"x86_64", "amd64"},
is_arm64 = machine.lower() in {"arm64", "aarch64"},
nvidia_smi = "/usr/bin/nvidia-smi",
driver_cuda_version = (12, 8),
compute_caps = ["86"],
visible_cuda_devices = None,
has_physical_nvidia = True,
has_usable_nvidia = True,
)
defaults.update(overrides)
return HostInfo(**defaults)
def make_artifact(asset_name, **overrides):
defaults = dict(
asset_name = asset_name,
install_kind = "linux-cuda",
runtime_line = "cuda12",
coverage_class = "targeted",
supported_sms = ["75", "80", "86", "89", "90"],
min_sm = 75,
max_sm = 90,
bundle_profile = "cuda12-newer",
rank = 100,
)
defaults.update(overrides)
return PublishedLlamaArtifact(**defaults)
def make_release(artifacts, **overrides):
defaults = dict(
repo = "unslothai/llama.cpp",
release_tag = "v1.0",
upstream_tag = "b8508",
assets = {a.asset_name: f"https://example.com/{a.asset_name}" for a in artifacts},
manifest_asset_name = "llama-prebuilt-manifest.json",
artifacts = artifacts,
selection_log = [],
)
defaults.update(overrides)
return PublishedReleaseBundle(**defaults)
def make_checksums(asset_names):
return ApprovedReleaseChecksums(
repo = "unslothai/llama.cpp",
release_tag = "v1.0",
upstream_tag = "b8508",
source_commit = None,
artifacts = {
name: ApprovedArtifactHash(
asset_name = name,
sha256 = "a" * 64,
repo = "unslothai/llama.cpp",
kind = "prebuilt",
)
for name in asset_names
},
)
def mock_linux_runtime(monkeypatch, lines):
dirs = {line: ["/usr/lib/stub"] for line in lines}
monkeypatch.setattr(
INSTALL_LLAMA_PREBUILT,
"detected_linux_runtime_lines",
lambda: (list(lines), dict(dirs)),
)
def mock_windows_runtime(monkeypatch, lines):
dirs = {line: ["C:\\Windows\\System32"] for line in lines}
monkeypatch.setattr(
INSTALL_LLAMA_PREBUILT,
"detected_windows_runtime_lines",
lambda: (list(lines), dict(dirs)),
)
# ===========================================================================
# A. normalize_compute_cap
# ===========================================================================
class TestNormalizeComputeCap:
def test_dotted_86(self):
assert normalize_compute_cap("8.6") == "86"
def test_dotted_leading_zero(self):
assert normalize_compute_cap("07.05") == "75"
def test_already_normalized(self):
assert normalize_compute_cap("75") == "75"
def test_int_input(self):
assert normalize_compute_cap(86) == "86"
def test_empty_string(self):
assert normalize_compute_cap("") is None
def test_whitespace(self):
assert normalize_compute_cap(" ") is None
def test_non_numeric(self):
assert normalize_compute_cap("x.y") is None
def test_triple_part(self):
assert normalize_compute_cap("8.6.0") is None
def test_zero_minor(self):
assert normalize_compute_cap("9.0") == "90"
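The cases above pin down the expected normalization. A minimal sketch consistent with them (hypothetical name, not the module's implementation) drops the dot and any leading zeros from each component:

```python
def normalize_compute_cap_sketch(value):
    # "8.6" -> "86", "07.05" -> "75", 86 -> "86"; anything non-numeric,
    # empty, or with more than two dotted parts yields None.
    text = str(value).strip()
    if not text:
        return None
    parts = text.split(".")
    if len(parts) > 2 or not all(p.isdigit() for p in parts):
        return None
    if len(parts) == 2:
        return f"{int(parts[0])}{int(parts[1])}"
    return str(int(text))
```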
# ===========================================================================
# B. normalize_compute_caps
# ===========================================================================
class TestNormalizeComputeCaps:
def test_deduplication(self):
assert normalize_compute_caps(["8.6", "86", "8.6"]) == ["86"]
def test_numeric_sort(self):
assert normalize_compute_caps(["9.0", "7.5", "8.6"]) == ["75", "86", "90"]
def test_drops_invalid(self):
assert normalize_compute_caps(["8.6", "bad", "", "7.5"]) == ["75", "86"]
def test_empty_input(self):
assert normalize_compute_caps([]) == []
# ===========================================================================
# C. parse_cuda_visible_devices
# ===========================================================================
class TestParseCudaVisibleDevices:
def test_none(self):
assert parse_cuda_visible_devices(None) is None
def test_empty(self):
assert parse_cuda_visible_devices("") == []
def test_minus_one(self):
assert parse_cuda_visible_devices("-1") == []
def test_single(self):
assert parse_cuda_visible_devices("0") == ["0"]
def test_multi(self):
assert parse_cuda_visible_devices("0,1,2") == ["0", "1", "2"]
def test_whitespace_stripped(self):
assert parse_cuda_visible_devices(" 0 , 1 ") == ["0", "1"]
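The semantics these tests encode distinguish "unset" (no restriction) from "empty or -1" (no devices). A hedged sketch of that contract, under an assumed simplification of CUDA's real parsing rules:

```python
def parse_cuda_visible_devices_sketch(value):
    if value is None:
        return None  # unset env var means "no restriction"
    tokens = [t.strip() for t in value.split(",") if t.strip()]
    if not tokens or tokens == ["-1"]:
        return []  # empty string or -1 hides every device
    return tokens
```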
# ===========================================================================
# D. supports_explicit_visible_device_matching
# ===========================================================================
class TestSupportsExplicitVisibleDeviceMatching:
def test_all_digits(self):
assert supports_explicit_visible_device_matching(["0", "1", "2"]) is True
def test_gpu_prefix(self):
assert supports_explicit_visible_device_matching(["GPU-abc123"]) is True
def test_none(self):
assert supports_explicit_visible_device_matching(None) is False
def test_empty(self):
assert supports_explicit_visible_device_matching([]) is False
def test_mixed_invalid(self):
assert supports_explicit_visible_device_matching(["0", "MIG-device"]) is False
# ===========================================================================
# E. select_visible_gpu_rows
# ===========================================================================
class TestSelectVisibleGpuRows:
ROWS = [
("0", "GPU-aaa", "8.6"),
("1", "GPU-bbb", "7.5"),
("2", "GPU-ccc", "8.9"),
]
def test_none_returns_all(self):
assert select_visible_gpu_rows(self.ROWS, None) == list(self.ROWS)
def test_empty_returns_empty(self):
assert select_visible_gpu_rows(self.ROWS, []) == []
def test_filter_by_index(self):
result = select_visible_gpu_rows(self.ROWS, ["0", "2"])
assert result == [("0", "GPU-aaa", "8.6"), ("2", "GPU-ccc", "8.9")]
def test_filter_by_uuid_case_insensitive(self):
result = select_visible_gpu_rows(self.ROWS, ["gpu-bbb"])
assert result == [("1", "GPU-bbb", "7.5")]
def test_dedup_same_device(self):
result = select_visible_gpu_rows(self.ROWS, ["0", "0"])
assert result == [("0", "GPU-aaa", "8.6")]
def test_missing_token(self):
result = select_visible_gpu_rows(self.ROWS, ["99"])
assert result == []
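The filtering behavior above (index or case-insensitive UUID match, token order preserved, duplicates collapsed) can be sketched as follows; the function name is hypothetical and rows are the same (index, uuid, cap) tuples the tests use:

```python
def select_visible_gpu_rows_sketch(rows, visible):
    if visible is None:
        return list(rows)  # no restriction: keep every row
    selected, seen = [], set()
    for token in visible:
        for row in rows:
            index, uuid, _cap = row
            if token == index or token.lower() == uuid.lower():
                if index not in seen:  # dedup repeated tokens
                    seen.add(index)
                    selected.append(row)
                break
    return selected
```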
# ===========================================================================
# F. compatible_linux_runtime_lines
# ===========================================================================
class TestCompatibleLinuxRuntimeLines:
def test_no_driver(self):
host = make_host(driver_cuda_version = None)
assert compatible_linux_runtime_lines(host) == []
def test_driver_11_8(self):
host = make_host(driver_cuda_version = (11, 8))
assert compatible_linux_runtime_lines(host) == []
def test_driver_12_4(self):
host = make_host(driver_cuda_version = (12, 4))
assert compatible_linux_runtime_lines(host) == ["cuda12"]
def test_driver_13_0(self):
host = make_host(driver_cuda_version = (13, 0))
assert compatible_linux_runtime_lines(host) == ["cuda13", "cuda12"]
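A sketch consistent with these four cases, assuming the 12.4 floor that the Windows-runtime tests below also suggest (thresholds are inferred, not taken from the module):

```python
def compatible_linux_runtime_lines_sketch(driver_cuda_version):
    # Inferred thresholds: drivers below 12.4 get nothing; 13.x drivers can
    # also run the cuda12 line, preferring cuda13 first.
    if driver_cuda_version is None or driver_cuda_version < (12, 4):
        return []
    if driver_cuda_version >= (13, 0):
        return ["cuda13", "cuda12"]
    return ["cuda12"]
```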
# ===========================================================================
# G. pick_windows_cuda_runtime + compatible_windows_runtime_lines
# ===========================================================================
class TestPickWindowsCudaRuntime:
def test_no_driver(self):
host = make_host(driver_cuda_version = None)
assert pick_windows_cuda_runtime(host) is None
def test_below_threshold(self):
host = make_host(driver_cuda_version = (12, 3))
assert pick_windows_cuda_runtime(host) is None
def test_driver_12_4(self):
host = make_host(driver_cuda_version = (12, 4))
assert pick_windows_cuda_runtime(host) == "12.4"
def test_driver_13_1(self):
host = make_host(driver_cuda_version = (13, 1))
assert pick_windows_cuda_runtime(host) == "13.1"
class TestCompatibleWindowsRuntimeLines:
def test_no_driver(self):
host = make_host(driver_cuda_version = None)
assert compatible_windows_runtime_lines(host) == []
def test_driver_12_4(self):
host = make_host(driver_cuda_version = (12, 4))
assert compatible_windows_runtime_lines(host) == ["cuda12"]
def test_driver_13_1(self):
host = make_host(driver_cuda_version = (13, 1))
assert compatible_windows_runtime_lines(host) == ["cuda13", "cuda12"]
# ===========================================================================
# H. runtime_line_from_cuda_version
# ===========================================================================
class TestRuntimeLineFromCudaVersion:
def test_cuda_12(self):
assert runtime_line_from_cuda_version("12.6") == "cuda12"
def test_cuda_13(self):
assert runtime_line_from_cuda_version("13.0") == "cuda13"
def test_cuda_11(self):
assert runtime_line_from_cuda_version("11.8") is None
def test_none(self):
assert runtime_line_from_cuda_version(None) is None
def test_empty(self):
assert runtime_line_from_cuda_version("") is None
# ===========================================================================
# I. apply_approved_hashes
# ===========================================================================
class TestApplyApprovedHashes:
def _choice(self, name):
return AssetChoice(
repo = "test",
tag = "v1",
name = name,
url = f"https://x/{name}",
source_label = "test",
)
def test_both_approved(self):
c1, c2 = self._choice("a.tar.gz"), self._choice("b.tar.gz")
checksums = make_checksums(["a.tar.gz", "b.tar.gz"])
result = apply_approved_hashes([c1, c2], checksums)
assert len(result) == 2
assert all(c.expected_sha256 == "a" * 64 for c in result)
def test_one_approved(self):
c1, c2 = self._choice("a.tar.gz"), self._choice("missing.tar.gz")
checksums = make_checksums(["a.tar.gz"])
result = apply_approved_hashes([c1, c2], checksums)
assert len(result) == 1
assert result[0].name == "a.tar.gz"
def test_none_approved(self):
c1 = self._choice("missing.tar.gz")
checksums = make_checksums(["other.tar.gz"])
with pytest.raises(PrebuiltFallback, match = "approved checksum"):
apply_approved_hashes([c1], checksums)
def test_empty_input(self):
checksums = make_checksums(["a.tar.gz"])
with pytest.raises(PrebuiltFallback, match = "approved checksum"):
apply_approved_hashes([], checksums)
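The contract these tests check -- keep only choices with an approved sha256, fail loudly when nothing survives -- can be sketched with plain dicts standing in for the AssetChoice and ApprovedReleaseChecksums dataclasses (sketch only, names hypothetical):

```python
def apply_approved_hashes_sketch(choices, approved_sha256_by_name):
    kept = []
    for choice in choices:
        sha = approved_sha256_by_name.get(choice["name"])
        if sha is not None:  # unapproved assets are silently dropped
            kept.append({**choice, "expected_sha256": sha})
    if not kept:
        raise ValueError("no candidate asset has an approved checksum")
    return kept
```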
# ===========================================================================
# J. linux_cuda_choice_from_release -- core selection
# ===========================================================================
class TestLinuxCudaChoiceFromRelease:
# --- Runtime line resolution ---
def test_no_runtime_lines_detected(self, monkeypatch):
mock_linux_runtime(monkeypatch, [])
host = make_host(driver_cuda_version = (12, 8))
art = make_artifact("bundle-cuda12.tar.gz")
release = make_release([art])
assert linux_cuda_choice_from_release(host, release) is None
def test_detected_lines_incompatible_with_driver(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda13"])
host = make_host(driver_cuda_version = (12, 4))
art = make_artifact("bundle-cuda13.tar.gz", runtime_line = "cuda13")
release = make_release([art])
assert linux_cuda_choice_from_release(host, release) is None
def test_driver_13_only_cuda12_detected(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(driver_cuda_version = (13, 0))
art = make_artifact("bundle-cuda12.tar.gz", runtime_line = "cuda12")
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.runtime_line == "cuda12"
def test_preferred_runtime_line_reorders(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda13", "cuda12"])
host = make_host(driver_cuda_version = (13, 0))
art12 = make_artifact("bundle-cuda12.tar.gz", runtime_line = "cuda12")
art13 = make_artifact("bundle-cuda13.tar.gz", runtime_line = "cuda13")
release = make_release([art12, art13])
result = linux_cuda_choice_from_release(
host, release, preferred_runtime_line = "cuda12"
)
assert result is not None
assert result.primary.runtime_line == "cuda12"
def test_preferred_runtime_line_unavailable(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(driver_cuda_version = (12, 8))
art = make_artifact("bundle-cuda12.tar.gz", runtime_line = "cuda12")
release = make_release([art])
result = linux_cuda_choice_from_release(
host, release, preferred_runtime_line = "cuda13"
)
assert result is not None
assert result.primary.runtime_line == "cuda12"
log_entries = result.selection_log
assert any("unavailable_on_host" in entry for entry in log_entries)
# --- SM matching ---
def test_exact_sm_match(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact(
"bundle.tar.gz", supported_sms = ["75", "86", "89"], min_sm = 75, max_sm = 89
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.name == "bundle.tar.gz"
def test_sm_not_in_supported_sms(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact(
"bundle.tar.gz", supported_sms = ["75", "80", "89"], min_sm = 75, max_sm = 89
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_sm_outside_min_range(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["50"])
art = make_artifact(
"bundle.tar.gz", supported_sms = ["50", "75", "86"], min_sm = 75, max_sm = 90
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_sm_outside_max_range(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["100"])
art = make_artifact(
"bundle.tar.gz", supported_sms = ["100", "75", "86"], min_sm = 75, max_sm = 90
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_very_old_sm(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["50"])
art = make_artifact("bundle.tar.gz", min_sm = 75, max_sm = 90)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_very_new_sm(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["100"])
art = make_artifact("bundle.tar.gz", min_sm = 75, max_sm = 90)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
# --- Unknown compute caps (empty list) ---
def test_unknown_caps_only_portable(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = [])
targeted = make_artifact("targeted.tar.gz", coverage_class = "targeted")
portable = make_artifact("portable.tar.gz", coverage_class = "portable")
release = make_release([targeted, portable])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.name == "portable.tar.gz"
def test_unknown_caps_no_portable(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = [])
targeted = make_artifact("targeted.tar.gz", coverage_class = "targeted")
release = make_release([targeted])
result = linux_cuda_choice_from_release(host, release)
assert result is None
# --- Multi-GPU ---
def test_multi_gpu_all_covered(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["75", "89"])
art = make_artifact(
"bundle.tar.gz",
supported_sms = ["75", "80", "86", "89", "90"],
min_sm = 75,
max_sm = 90,
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
def test_multi_gpu_not_all_covered(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["50", "89"])
art = make_artifact(
"bundle.tar.gz", supported_sms = ["75", "89"], min_sm = 75, max_sm = 89
)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
# --- Artifact selection priority ---
def test_narrowest_sm_range_wins(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
wide = make_artifact(
"wide.tar.gz",
supported_sms = ["75", "86", "90"],
min_sm = 75,
max_sm = 90,
rank = 100,
)
narrow = make_artifact(
"narrow.tar.gz",
supported_sms = ["80", "86", "89"],
min_sm = 80,
max_sm = 89,
rank = 100,
)
release = make_release([wide, narrow])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.name == "narrow.tar.gz"
def test_range_tie_lower_rank_wins(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
high = make_artifact(
"high.tar.gz",
supported_sms = ["75", "86", "90"],
min_sm = 75,
max_sm = 90,
rank = 200,
)
low = make_artifact(
"low.tar.gz",
supported_sms = ["75", "86", "90"],
min_sm = 75,
max_sm = 90,
rank = 50,
)
release = make_release([high, low])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.name == "low.tar.gz"
def test_targeted_preferred_portable_fallback(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
targeted = make_artifact("targeted.tar.gz", coverage_class = "targeted", rank = 100)
portable = make_artifact("portable.tar.gz", coverage_class = "portable", rank = 100)
release = make_release([targeted, portable])
result = linux_cuda_choice_from_release(host, release)
assert result is not None
assert result.primary.name == "targeted.tar.gz"
assert len(result.attempts) == 2
assert result.attempts[1].name == "portable.tar.gz"
# --- Edge cases ---
def test_asset_missing_from_release_assets(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact("bundle.tar.gz")
release = make_release([art], assets = {})
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_artifact_empty_supported_sms(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact("bundle.tar.gz", supported_sms = [])
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_artifact_missing_min_sm(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact("bundle.tar.gz", min_sm = None, max_sm = 90)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_artifact_missing_max_sm(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact("bundle.tar.gz", min_sm = 75, max_sm = None)
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_no_linux_cuda_artifacts(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
art = make_artifact("bundle.tar.gz", install_kind = "windows-cuda")
release = make_release([art])
result = linux_cuda_choice_from_release(host, release)
assert result is None
def test_empty_artifacts_list(self, monkeypatch):
mock_linux_runtime(monkeypatch, ["cuda12"])
host = make_host(compute_caps = ["86"])
release = make_release([])
result = linux_cuda_choice_from_release(host, release)
assert result is None


# ===========================================================================
# K. windows_cuda_attempts
# ===========================================================================
class TestWindowsCudaAttempts:
    TAG = "b8508"

    def _upstream(self, *runtime_versions):
        assets = {}
        for rv in runtime_versions:
            name = f"llama-{self.TAG}-bin-win-cuda-{rv}-x64.zip"
            assets[name] = f"https://example.com/{name}"
        return assets

    def test_driver_12_4_no_dlls_fallback(self, monkeypatch):
        mock_windows_runtime(monkeypatch, [])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (12, 4))
        assets = self._upstream("12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, None)
        assert len(result) == 1
        assert result[0].runtime_line == "cuda12"

    def test_driver_13_1_both_dlls(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda13", "cuda12"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (13, 1))
        assets = self._upstream("13.1", "12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, None)
        assert len(result) == 2
        assert result[0].runtime_line == "cuda13"
        assert result[1].runtime_line == "cuda12"

    def test_preferred_reorders(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda13", "cuda12"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (13, 1))
        assets = self._upstream("13.1", "12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, "cuda12")
        assert len(result) == 2
        assert result[0].runtime_line == "cuda12"

    def test_preferred_unavailable(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda12"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (12, 4))
        assets = self._upstream("12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, "cuda13")
        assert len(result) == 1
        assert result[0].runtime_line == "cuda12"

    def test_detected_incompatible_with_driver(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda13"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (12, 4))
        assets = self._upstream("12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, None)
        assert len(result) == 1
        assert result[0].runtime_line == "cuda12"

    def test_driver_too_old(self, monkeypatch):
        mock_windows_runtime(monkeypatch, [])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (11, 8))
        assets = self._upstream("12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, None)
        assert result == []

    def test_asset_missing_from_upstream(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda12"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (12, 4))
        result = windows_cuda_attempts(host, self.TAG, {}, None)
        assert result == []

    def test_both_assets_present(self, monkeypatch):
        mock_windows_runtime(monkeypatch, ["cuda13", "cuda12"])
        host = make_host(system = "Windows", machine = "AMD64", driver_cuda_version = (13, 1))
        assets = self._upstream("13.1", "12.4")
        result = windows_cuda_attempts(host, self.TAG, assets, None)
        assert len(result) == 2


# ===========================================================================
# L. resolve_upstream_asset_choice -- platform routing
# ===========================================================================
class TestResolveUpstreamAssetChoice:
    TAG = "b8508"

    def _mock_github_assets(self, monkeypatch, assets):
        monkeypatch.setattr(
            INSTALL_LLAMA_PREBUILT,
            "github_release_assets",
            lambda repo, tag: assets,
        )

    def test_linux_x86_64_cpu(self, monkeypatch):
        name = f"llama-{self.TAG}-bin-ubuntu-x64.tar.gz"
        self._mock_github_assets(monkeypatch, {name: f"https://x/{name}"})
        host = make_host(
            has_usable_nvidia = False, nvidia_smi = None, has_physical_nvidia = False
        )
        result = resolve_upstream_asset_choice(host, self.TAG)
        assert result.install_kind == "linux-cpu"
        assert result.name == name

    def test_linux_cpu_missing(self, monkeypatch):
        self._mock_github_assets(monkeypatch, {})
        host = make_host(
            has_usable_nvidia = False, nvidia_smi = None, has_physical_nvidia = False
        )
        with pytest.raises(PrebuiltFallback, match = "Linux CPU"):
            resolve_upstream_asset_choice(host, self.TAG)

    def test_windows_x86_64_cpu(self, monkeypatch):
        name = f"llama-{self.TAG}-bin-win-cpu-x64.zip"
        self._mock_github_assets(monkeypatch, {name: f"https://x/{name}"})
        host = make_host(
            system = "Windows",
            machine = "AMD64",
            has_usable_nvidia = False,
            nvidia_smi = None,
            has_physical_nvidia = False,
        )
        result = resolve_upstream_asset_choice(host, self.TAG)
        assert result.install_kind == "windows-cpu"
        assert result.name == name

    def test_windows_cpu_missing(self, monkeypatch):
        self._mock_github_assets(monkeypatch, {})
        host = make_host(
            system = "Windows",
            machine = "AMD64",
            has_usable_nvidia = False,
            nvidia_smi = None,
            has_physical_nvidia = False,
        )
        with pytest.raises(PrebuiltFallback, match = "Windows CPU"):
            resolve_upstream_asset_choice(host, self.TAG)

    def test_macos_arm64(self, monkeypatch):
        name = f"llama-{self.TAG}-bin-macos-arm64.tar.gz"
        self._mock_github_assets(monkeypatch, {name: f"https://x/{name}"})
        host = make_host(
            system = "Darwin",
            machine = "arm64",
            nvidia_smi = None,
            driver_cuda_version = None,
            compute_caps = [],
            has_physical_nvidia = False,
            has_usable_nvidia = False,
        )
        result = resolve_upstream_asset_choice(host, self.TAG)
        assert result.install_kind == "macos-arm64"
        assert result.name == name

    def test_macos_arm64_missing(self, monkeypatch):
        self._mock_github_assets(monkeypatch, {})
        host = make_host(
            system = "Darwin",
            machine = "arm64",
            nvidia_smi = None,
            driver_cuda_version = None,
            compute_caps = [],
            has_physical_nvidia = False,
            has_usable_nvidia = False,
        )
        with pytest.raises(PrebuiltFallback, match = "macOS arm64"):
            resolve_upstream_asset_choice(host, self.TAG)

    def test_macos_x86_64(self, monkeypatch):
        name = f"llama-{self.TAG}-bin-macos-x64.tar.gz"
        self._mock_github_assets(monkeypatch, {name: f"https://x/{name}"})
        host = make_host(
            system = "Darwin",
            machine = "x86_64",
            nvidia_smi = None,
            driver_cuda_version = None,
            compute_caps = [],
            has_physical_nvidia = False,
            has_usable_nvidia = False,
        )
        result = resolve_upstream_asset_choice(host, self.TAG)
        assert result.install_kind == "macos-x64"
        assert result.name == name

    def test_linux_aarch64(self, monkeypatch):
        self._mock_github_assets(monkeypatch, {})
        host = make_host(
            system = "Linux",
            machine = "aarch64",
            nvidia_smi = None,
            driver_cuda_version = None,
            compute_caps = [],
            has_physical_nvidia = False,
            has_usable_nvidia = False,
        )
        with pytest.raises(
            PrebuiltFallback, match = "no prebuilt policy exists for Linux aarch64"
        ):
            resolve_upstream_asset_choice(host, self.TAG)

    def test_windows_usable_nvidia_delegates(self, monkeypatch):
        cuda_name = f"llama-{self.TAG}-bin-win-cuda-12.4-x64.zip"
        self._mock_github_assets(monkeypatch, {cuda_name: f"https://x/{cuda_name}"})
        mock_windows_runtime(monkeypatch, ["cuda12"])
        monkeypatch.setattr(
            INSTALL_LLAMA_PREBUILT,
            "resolve_windows_cuda_choices",
            lambda host, tag, assets: [
                AssetChoice(
                    repo = UPSTREAM_REPO,
                    tag = tag,
                    name = cuda_name,
                    url = f"https://x/{cuda_name}",
                    source_label = "upstream",
                    install_kind = "windows-cuda",
                    runtime_line = "cuda12",
                )
            ],
        )
        host = make_host(
            system = "Windows",
            machine = "AMD64",
            driver_cuda_version = (12, 4),
            has_usable_nvidia = True,
        )
        result = resolve_upstream_asset_choice(host, self.TAG)
        assert result.install_kind == "windows-cuda"
        assert result.name == cuda_name