Studio: Ollama support, recommended folders, Custom Folders UX polish (#5050)

* Studio: Ollama support, recommended folders, Custom Folders UX polish

Backend:
- Add _scan_ollama_dir that reads manifests/registry.ollama.ai/library/*
  and creates .gguf symlinks under <ollama_dir>/.studio_links/ pointing
  at the content-addressable blobs, so detect_gguf_model and llama-server
  -m work unchanged for Ollama models
- Filter entries under .studio_links from the generic models/hf/lmstudio
  scanners to avoid duplicate rows and leaked internal paths in the UI
- New GET /api/models/recommended-folders endpoint returning LM Studio
  and Ollama model directories that currently exist on the machine
  (OLLAMA_MODELS env var + standard paths, ~/.lmstudio/models, legacy
  LM Studio cache), used by the Custom Folders quick-add chips
- detect_gguf_model now uses os.path.abspath instead of Path.resolve so
  the readable symlink name is preserved as display_name (e.g.
  qwen2.5-0.5b-Q4_K_M.gguf instead of sha256-abc...)
- llama-server failure with a path under .studio_links or .cache/ollama
  surfaces a friendlier message ("Some Ollama models do not work with
  llama.cpp. Try a different model, or use this model directly through
  Ollama instead.") instead of the generic validation error
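
The manifest-to-symlink flow described above can be sketched roughly as follows. This is a minimal illustration, not the shipped scanner: the helper name and single-layer handling are hypothetical, and the real code also handles projector layers, name collisions, and read-only directories.

```python
import json
from pathlib import Path
from typing import Optional

def link_ollama_blob(ollama_dir: Path, manifest_rel: str) -> Optional[Path]:
    """Symlink one Ollama model blob under a readable .gguf name (sketch)."""
    manifest = json.loads((ollama_dir / "manifests" / manifest_rel).read_text())
    # The GGUF weights live in the layer with this mediaType.
    model_layer = next(
        (l for l in manifest.get("layers", [])
         if l.get("mediaType") == "application/vnd.ollama.image.model"),
        None,
    )
    if model_layer is None:
        return None
    # Manifests use "sha256:<hex>" digests; blobs are stored as "sha256-<hex>".
    blob = ollama_dir / "blobs" / model_layer["digest"].replace(":", "-")
    links = ollama_dir / ".studio_links"
    links.mkdir(exist_ok = True)
    # Name the link after the manifest path so it stays human-readable.
    link = links / (manifest_rel.replace("/", "-") + ".gguf")
    if not link.exists():
        link.symlink_to(blob)
    return link
```

Because the link carries a `.gguf` suffix and a readable name, both the extension check in detect_gguf_model and the display name fall out for free.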

Frontend:
- ListLabel supports an optional leading icon and collapse toggle; used
  for Downloaded (download icon), Custom Folders (folder icon), and
  Recommended (star icon)
- Custom Folders header gets folder icon on the left, and +, search,
  and chevron buttons on the right; chevron uses ml-auto so it aligns
  with the Downloaded and Recommended chevrons
- New recommended folder chips render below the registered scan folders
  when there are unregistered well-known paths; one click adds them as
  a scan folder
- Custom folder rows that are direct .gguf files (Ollama symlinks) load
  immediately via onSelect instead of opening the GGUF variant expander
  (which is for repos containing multiple quants, not single files)
- When loading a direct .gguf file path, send max_seq_length = 0 so the
  backend uses the model's native context instead of the 4096 chat
  default (qwen2.5:0.5b now loads at 32768 instead of 4096)
- New listRecommendedFolders() helper on the chat API

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: log silent exceptions and support read-only Ollama dirs

Replace silent except blocks in _scan_ollama_dir and the
recommended-folders endpoint with narrower exception types plus debug
or warning logs, so failures are diagnosable without hiding signal.

Add _ollama_links_dir helper that falls back to a per-ollama-dir hashed
namespace under Studio's own cache (~/.unsloth/studio/cache/ollama_links)
when the Ollama models directory is read-only. This is common for system
installs at /usr/share/ollama/.ollama/models and
/var/lib/ollama/.ollama/models, where the Studio process has read but
not write access. Previously the scanner returned an empty list in that
case, so Ollama models silently failed to appear.

The fallback preserves the .gguf suffix on symlink names so
detect_gguf_model keeps recognising them. The prior "raw sha256 blob
path" fallback would have failed the suffix check, so those models
could not load.
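
A rough sketch of the fallback logic, under the stated assumption of a generic `cache_root` argument (the real helper pulls it from Studio's path utilities) and an illustrative function name:

```python
import hashlib
from pathlib import Path
from typing import Optional

def links_dir_for(ollama_dir: Path, cache_root: Path) -> Optional[Path]:
    """Pick a writable home for the .gguf links (sketch)."""
    primary = ollama_dir / ".studio_links"
    try:
        primary.mkdir(exist_ok = True)
        return primary
    except OSError:
        pass  # read-only install, e.g. /usr/share/ollama/.ollama/models
    # Hash the ollama_dir so two different Ollama roots never share a
    # fallback namespace. A cache path, not a security boundary.
    digest = hashlib.sha256(str(ollama_dir.resolve()).encode()).hexdigest()[:12]
    fallback = cache_root / "ollama_links" / digest
    try:
        fallback.mkdir(parents = True, exist_ok = True)
        return fallback
    except OSError:
        return None
```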

* Address review: detect mmproj next to symlink target for vision GGUFs

Codex P1 on model_config.py:1012: when detect_gguf_model returns the
symlink path (to preserve readable display names), detect_mmproj_file
searched the symlink's parent directory instead of the target's. For
vision GGUFs surfaced via Ollama's .studio_links/ -- where the weight
file is symlinked but any mmproj sidecar lives next to the real blob
-- mmproj was no longer detected, so the model was misclassified as
text-only and llama-server would start without --mmproj.

detect_mmproj_file now adds the resolved target's parent to the scan
order when path is a symlink. Direct (non-symlink) .gguf paths are
unchanged, so LM Studio and HF cache layouts keep working exactly as
before. Verified with a fake layout reproducing the bug plus a
regression check on a non-symlink LM Studio model.
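
The scan-order tweak reduces to a small rule; a minimal sketch (function name hypothetical, and the real detect_mmproj_file builds a longer scan list):

```python
from pathlib import Path

def mmproj_scan_dirs(gguf_path: Path) -> list:
    """Directories to search for an mmproj sidecar (sketch).

    When the weights path is a symlink (e.g. Ollama's .studio_links),
    the projector lives next to the *target* blob, so the resolved
    target's parent is added after the symlink's own parent.
    """
    dirs = [gguf_path.parent]
    if gguf_path.is_symlink():
        target_parent = gguf_path.resolve().parent
        if target_parent not in dirs:
            dirs.append(target_parent)
    return dirs
```

Non-symlink paths return a single directory, which is why the LM Studio and HF cache layouts behave exactly as before.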

* Address review: support all Ollama namespaces and vision projector layers

- Iterate over all directories under registry.ollama.ai/ instead of
  hardcoding the "library" namespace. Custom namespaces like
  "mradermacher/llama3" now get scanned and include the namespace
  prefix in display names, model IDs, and symlink names to avoid
  collisions.

- Create companion -mmproj.gguf symlinks for Ollama vision models
  that have an "application/vnd.ollama.image.projector" layer, so
  detect_mmproj_file can find the projector alongside the model.

- Extract symlink creation into _make_symlink helper to reduce
  duplication between model and projector paths.

* Address review: move imports to top level and add scan limit

- Move hashlib and json imports to the top of the file (PEP 8).
- Remove inline `import json as _json` and `import hashlib` from
  function bodies, use the top-level imports directly.
- Add `limit` parameter to `_scan_ollama_dir()` with early exit
  when the threshold is reached.
- Pass `_MAX_MODELS_PER_FOLDER` into the scanner so it stops
  traversing once enough models are found.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: Windows fallback, all registry hosts, collision safety

_make_link (formerly _make_symlink):
- Falls back to os.link() hardlink when symlink_to() fails (Windows
  without Developer Mode), then to shutil.copy2 as last resort
- Uses atomic os.replace via tmp file to avoid race window where the
  .gguf path is missing during rescan
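
The atomic-replace pattern can be sketched as below (helper name hypothetical; the real _make_link also tries hardlinks and handles cleanup). The point is that a concurrent rescan always observes either the old or the new link, never a missing path:

```python
import os
import uuid
from pathlib import Path

def atomic_symlink(link: Path, target: Path) -> None:
    """Create or refresh a symlink without a window where `link` is absent.

    Build the link under a unique temp name first, then rename it into
    place; os.replace is atomic on the same filesystem.
    """
    tmp = link.parent / f".{link.name}.tmp-{uuid.uuid4().hex[:8]}"
    tmp.symlink_to(target)
    os.replace(tmp, link)
```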

Scanner now handles all Ollama registry layouts:
- Uses rglob over manifests/ instead of hardcoding registry.ollama.ai
- Discovers hf.co/org/repo:tag and any other host, not just library/
- Filenames include a stable sha1 hash of the manifest path to prevent
  collisions between models that normalize to the same stem

Per-model subdirectories under .studio_links/:
- Each model's links live in their own hash-keyed subdirectory
- detect_mmproj_file only sees the projector for that specific model,
  not siblings from other Ollama models

Friendly Ollama error detection:
- Now also matches ollama_links/ (the read-only fallback cache path)
  and model_identifier starting with "ollama/"

Recommended folders:
- Added os.access(R_OK | X_OK) check so unreadable system directories
  like /var/lib/ollama/.ollama/models are not advertised as chips
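
The readability check amounts to the following sketch, where `candidates` stands in for the assembled LM Studio and Ollama paths:

```python
import os
from pathlib import Path

def readable_dirs(candidates) -> list:
    """Keep only directories the process can actually enumerate (sketch).

    R_OK is needed to read directory entries and X_OK to traverse into
    the directory; system installs like /var/lib/ollama often grant
    neither to other users, so they are dropped here.
    """
    out = []
    for p in candidates:
        p = Path(p)
        if p.is_dir() and os.access(p, os.R_OK | os.X_OK):
            out.append(str(p))
    return out
```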

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: filter ollama_links from generic scanners

The generic scanners (models_dir, hf_cache, lmstudio) already filter
out .studio_links to avoid duplicate Ollama entries, but missed the
ollama_links fallback cache directory used for read-only Ollama
installs. Add it to the filter.

* Address review: idempotent link creation and path-component filter

_make_link:
- Skip recreation when a valid link/copy already exists (samefile or
  matching size check). Prevents blocking the model-list API with
  multi-GB copies on repeated scans.
- Use uuid4 instead of os.getpid() for tmp file names to avoid race
  conditions from concurrent scans.
- Log cleanup errors instead of silently swallowing them.

Path filter:
- Use os.sep-bounded checks instead of bare substring match to avoid
  false positives on paths like "my.studio_links.backup/model.gguf".

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: drop copy fallback, targeted glob, robust path filter

_make_link:
- Drop shutil.copy2 fallback -- copying multi-GB GGUFs inside a sync
  API request would block the backend. Log a warning and skip the
  model when both symlink and hardlink fail.

Scanner:
- Replace rglob("*") with targeted glob patterns (*/*/* and */*/*/*)
  to avoid traversing unrelated subdirectories in large custom folders.

Path filter:
- Use Path.parts membership check instead of os.sep substring matching
  for robustness across platforms.
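
The component check is easy to get subtly wrong with substrings; roughly (helper name illustrative):

```python
from pathlib import Path

def is_internal_link_path(path: str) -> bool:
    """True when a model path lives under one of the internal link dirs.

    Testing whole path components avoids the substring false positive on
    e.g. "my.studio_links.backup/model.gguf", and Path.parts splits on
    the platform separator, so the same check works on Windows.
    """
    parts = Path(path).parts
    return ".studio_links" in parts or "ollama_links" in parts
```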

Scan limit:
- Skip _scan_ollama_dir when _generic already fills the per-folder cap.

* Address review: sha256, top-level uuid import, Path.absolute()

- Switch hashlib.sha1 to hashlib.sha256 for path hashing consistency.
- Move uuid import to the top of the file instead of inside _make_link.
- Replace os.path.abspath with Path.absolute() in detect_gguf_model
  to match the pathlib style used throughout the codebase.

* Address review: fix stale comments (sha1, rglob, copy fallback)

Update three docstrings/comments that still referenced the old
implementation after recent changes:
- sha1 comment now says "not a security boundary" (no hash name)
- "rglob" -> "targeted glob patterns"
- "file copies as a last resort" -> removed (copy fallback was dropped)

* Address review: fix stale links, support all manifest depths, scope error

_make_link:
- Drop size-based idempotency shortcut that kept stale links after
  ollama pull updates a tag to a same-sized blob. Only samefile()
  is used now -- if the link doesn't point at the exact same inode,
  it gets replaced.
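
The samefile-only idempotency check looks roughly like this (helper name hypothetical):

```python
import os
from pathlib import Path

def link_is_current(link: Path, target: Path) -> bool:
    """Reuse an existing link only when it points at the exact same inode.

    A size comparison is not enough: `ollama pull` can move a tag to a
    different blob of identical size, which would keep a stale link alive.
    """
    try:
        return link.exists() and os.path.samefile(link, target)
    except OSError:
        return False
```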

Scanner:
- Revert targeted glob back to rglob so deeper OCI-style repo names
  (5+ path segments) are not silently skipped.

Ollama error:
- Only show "Some Ollama models do not work with llama.cpp" when the
  server output contains GGUF compatibility hints (key not found,
  unknown architecture, failed to load). Unrelated failures like
  OOM or missing binaries now show the generic error instead of
  being misdiagnosed.
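
The hint matching reduces to a simple substring scan over the server's recent output; a sketch of the final behaviour (hint strings taken from the diff below, function name illustrative):

```python
GGUF_COMPAT_HINTS = (
    "key not found",
    "unknown model architecture",
    "failed to load model",
)

def looks_like_gguf_incompat(server_output: str) -> bool:
    """Only blame GGUF compatibility when llama-server's own output
    says so; OOM or a missing binary still gets the generic error."""
    tail = server_output.lower()
    return any(h in tail for h in GGUF_COMPAT_HINTS)
```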

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <michaelhan2050@gmail.com>
Daniel Han, 2026-04-16 08:24:08 -07:00, committed by GitHub
parent ff23ce40b4
commit 05ec0f110b
GPG key ID: B5690EEEBB952194 (no known key found for this signature in database)
6 changed files with 500 additions and 35 deletions


@@ -1703,6 +1703,28 @@ class LlamaCppBackend:
        # Wait for llama-server to become healthy
        if not self._wait_for_health(timeout = 600.0):
            self._kill_process()
            _gguf = gguf_path or ""
            _is_ollama = (
                ".studio_links" in _gguf
                or os.sep + "ollama_links" + os.sep in _gguf
                or os.sep + ".cache" + os.sep + "ollama" + os.sep in _gguf
                or (self._model_identifier or "").startswith("ollama/")
            )
            # Only show the Ollama-specific message when the server
            # output indicates a GGUF compatibility issue, not for
            # unrelated failures like OOM or missing binaries.
            if _is_ollama:
                _output = "\n".join(self._stdout_lines[-50:]).lower()
                _gguf_compat_hints = (
                    "key not found",
                    "unknown model architecture",
                    "failed to load model",
                )
                if any(h in _output for h in _gguf_compat_hints):
                    raise RuntimeError(
                        "Some Ollama models do not work with llama.cpp. "
                        "Try a different model, or use this model directly through Ollama instead."
                    )
            raise RuntimeError(
                "llama-server failed to start. "
                "Check that the GGUF file is valid and you have enough memory."


@@ -5,8 +5,11 @@
Model Management API routes
"""
import hashlib
import json
import os
import sys
import uuid
from pathlib import Path

from fastapi import APIRouter, Body, Depends, HTTPException, Query
from typing import List, Optional
@@ -411,6 +414,267 @@ def _scan_lmstudio_dir(lm_dir: Path) -> List[LocalModelInfo]:
    return found


def _ollama_links_dir(ollama_dir: Path) -> Optional[Path]:
    """Return a writable directory for Ollama ``.gguf`` symlinks.

    Prefers ``<ollama_dir>/.studio_links/`` so the links sit next to the
    blobs they point at. Falls back to a per-ollama-dir namespace under
    Studio's own cache when the models directory is read-only (common
    for system installs under ``/usr/share/ollama`` or ``/var/lib/ollama``)
    so we still surface Ollama models in those environments.
    """
    from utils.paths.storage_roots import cache_root

    primary = ollama_dir / ".studio_links"
    try:
        primary.mkdir(exist_ok = True)
        return primary
    except OSError as e:
        logger.debug(
            "Ollama dir %s not writable for .studio_links (%s); "
            "falling back to Studio cache",
            ollama_dir,
            e,
        )
    # Fallback: namespace by a hash of the ollama_dir so two different
    # Ollama roots don't collide. This is a cache path, not a security
    # boundary.
    try:
        digest = hashlib.sha256(str(ollama_dir.resolve()).encode()).hexdigest()[:12]
    except OSError:
        digest = "default"
    fallback = cache_root() / "ollama_links" / digest
    try:
        fallback.mkdir(parents = True, exist_ok = True)
        return fallback
    except OSError as e:
        logger.warning(
            "Could not create Ollama symlink cache at %s: %s",
            fallback,
            e,
        )
        return None


def _scan_ollama_dir(
    ollama_dir: Path, limit: Optional[int] = None
) -> List[LocalModelInfo]:
    """Scan an Ollama models directory for downloaded models.

    Ollama stores models in a content-addressable layout::

        <ollama_dir>/manifests/<host>/<namespace>/<model>/<tag>
        <ollama_dir>/blobs/sha256-...

    The default host is ``registry.ollama.ai`` with namespace
    ``library`` (official models), but users can pull from custom
    namespaces (``mradermacher/llama3``) or entirely different hosts
    (``hf.co/org/repo:tag``). We iterate all manifest files via
    ``rglob`` so every layout depth is discovered.

    Each manifest is JSON with a ``layers`` array. The layer with
    ``mediaType == "application/vnd.ollama.image.model"`` contains the
    GGUF weights. Vision models also have a projector layer
    (``application/vnd.ollama.image.projector``). We read the config
    layer to extract family/size info.

    Since Ollama blobs lack a ``.gguf`` extension (which the GGUF
    loading pipeline requires), we create ``.gguf``-named links
    pointing at the blobs so the existing ``detect_gguf_model`` and
    ``llama-server -m`` paths work unchanged. Each model gets its
    own subdirectory under the links dir (keyed by a short hash of
    the manifest path) so that ``detect_mmproj_file`` only sees the
    projector for *that* model. Links are created as symlinks when
    possible, falling back to hardlinks (Windows without Developer
    Mode) as a last resort. The link dir lives under
    ``<ollama_dir>/.studio_links/`` when writable, otherwise under
    Studio's own cache directory.
    """
    manifests_root = ollama_dir / "manifests"
    if not manifests_root.is_dir():
        return []
    found: List[LocalModelInfo] = []
    blobs_dir = ollama_dir / "blobs"
    links_root = _ollama_links_dir(ollama_dir)
    if links_root is None:
        logger.warning(
            "Skipping Ollama scan for %s: no writable location for .gguf links",
            ollama_dir,
        )
        return []

    def _make_link(link_dir: Path, link_name: str, target: Path) -> Optional[str]:
        """Create a .gguf-named link to an Ollama blob.

        Tries symlink first, then hardlink (works on Windows without
        Developer Mode when target is on the same filesystem). Skips
        the model if neither works -- a full file copy of a multi-GB
        GGUF inside a synchronous API request would block the backend.
        Idempotent: skips recreation when a valid link already exists.
        """
        link_dir.mkdir(parents = True, exist_ok = True)
        link_path = link_dir / link_name
        resolved = target.resolve()
        # Skip if the link already points at the exact same blob.
        # Only use samefile -- size-based checks can reuse stale links
        # after `ollama pull` updates a tag to a same-sized blob.
        try:
            if link_path.exists() and os.path.samefile(str(link_path), str(resolved)):
                return str(link_path)
        except OSError as e:
            logger.debug("Error checking existing link %s: %s", link_path, e)
        tmp_path = link_dir / f".{link_name}.tmp-{uuid.uuid4().hex[:8]}"
        try:
            if tmp_path.is_symlink() or tmp_path.exists():
                tmp_path.unlink()
            try:
                tmp_path.symlink_to(resolved)
            except OSError:
                try:
                    os.link(str(resolved), str(tmp_path))
                except OSError:
                    logger.warning(
                        "Could not create link for Ollama blob %s "
                        "(symlinks and hardlinks both failed). "
                        "Skipping model to avoid blocking the API.",
                        target,
                    )
                    return None
            os.replace(str(tmp_path), str(link_path))
            return str(link_path)
        except OSError as e:
            logger.debug("Could not create Ollama link %s: %s", link_path, e)
            try:
                if tmp_path.is_symlink() or tmp_path.exists():
                    tmp_path.unlink()
            except OSError as cleanup_err:
                logger.debug(
                    "Could not clean up tmp path %s: %s", tmp_path, cleanup_err
                )
            return None

    try:
        for tag_file in manifests_root.rglob("*"):
            if not tag_file.is_file():
                continue
            rel = tag_file.relative_to(manifests_root)
            parts = rel.parts
            if len(parts) < 3:
                continue
            host = parts[0]
            repo_parts = list(parts[1:-1])
            tag = parts[-1]
            if (
                host == "registry.ollama.ai"
                and repo_parts
                and repo_parts[0] == "library"
            ):
                repo_name = "/".join(repo_parts[1:])
            elif host == "registry.ollama.ai":
                repo_name = "/".join(repo_parts)
            else:
                repo_name = "/".join([host] + repo_parts)
            if not repo_name:
                continue
            display = f"{repo_name}:{tag}"
            manifest_key = rel.as_posix()
            stem_hash = hashlib.sha256(manifest_key.encode()).hexdigest()[:10]
            try:
                manifest = json.loads(tag_file.read_text())
            except (json.JSONDecodeError, OSError) as e:
                logger.debug(
                    "Skipping unreadable/invalid Ollama manifest %s: %s",
                    tag_file,
                    e,
                )
                continue
            config_digest = manifest.get("config", {}).get("digest", "")
            model_type = ""
            file_type = ""
            if config_digest and blobs_dir.is_dir():
                config_blob = blobs_dir / config_digest.replace(":", "-")
                if config_blob.is_file():
                    try:
                        cfg = json.loads(config_blob.read_text())
                        model_type = cfg.get("model_type", "")
                        file_type = cfg.get("file_type", "")
                    except (json.JSONDecodeError, OSError) as e:
                        logger.debug(
                            "Could not parse Ollama config blob %s: %s",
                            config_blob,
                            e,
                        )
            model_link_dir = links_root / stem_hash
            gguf_link_path: Optional[str] = None
            quant = f"-{file_type}" if file_type else ""
            safe_name = repo_name.replace("/", "-")
            for layer in manifest.get("layers", []):
                media = layer.get("mediaType", "")
                digest = layer.get("digest", "")
                if not digest:
                    continue
                if media == "application/vnd.ollama.image.model":
                    candidate = blobs_dir / digest.replace(":", "-")
                    if candidate.is_file():
                        link_name = f"{safe_name}-{tag}{quant}.gguf"
                        gguf_link_path = _make_link(
                            model_link_dir, link_name, candidate
                        )
                elif media == "application/vnd.ollama.image.projector":
                    candidate = blobs_dir / digest.replace(":", "-")
                    if candidate.is_file():
                        mmproj_name = f"{safe_name}-{tag}-mmproj.gguf"
                        _make_link(model_link_dir, mmproj_name, candidate)
            if not gguf_link_path:
                continue
            suffix = ""
            if model_type:
                suffix += f" ({model_type}"
                if file_type:
                    suffix += f" {file_type}"
                suffix += ")"
            try:
                updated_at = tag_file.stat().st_mtime
            except OSError:
                updated_at = None
            found.append(
                LocalModelInfo(
                    id = gguf_link_path,
                    model_id = f"ollama/{repo_name}:{tag}",
                    display_name = display + suffix,
                    path = gguf_link_path,
                    source = "custom",
                    updated_at = updated_at,
                ),
            )
            if limit is not None and len(found) >= limit:
                return found
    except OSError as e:
        logger.warning("Error scanning Ollama directory %s: %s", ollama_dir, e)
    return found


@router.get("/local", response_model = LocalModelListResponse)
async def list_local_models(
    models_dir: str = Query(
@@ -493,11 +757,27 @@ async def list_local_models(
    for folder in custom_folders:
        folder_path = Path(folder["path"])
        try:
            custom_models = (
                _scan_models_dir(folder_path, limit = _MAX_MODELS_PER_FOLDER)
                + _scan_hf_cache(folder_path)
                + _scan_lmstudio_dir(folder_path)
            )[:_MAX_MODELS_PER_FOLDER]
            # Ollama scanner creates .studio_links/ with .gguf symlinks.
            # Filter those from the generic scanners to avoid duplicates
            # and leaking internal paths into the UI.
            _generic = [
                m
                for m in (
                    _scan_models_dir(folder_path, limit = _MAX_MODELS_PER_FOLDER)
                    + _scan_hf_cache(folder_path)
                    + _scan_lmstudio_dir(folder_path)
                )
                if not any(
                    p in (".studio_links", "ollama_links")
                    for p in Path(m.path).parts
                )
            ]
            custom_models = _generic
            if len(custom_models) < _MAX_MODELS_PER_FOLDER:
                custom_models += _scan_ollama_dir(
                    folder_path,
                    limit = _MAX_MODELS_PER_FOLDER - len(custom_models),
                )
        except OSError as e:
            logger.warning("Skipping unreadable scan folder %s: %s", folder_path, e)
            continue
@@ -575,6 +855,57 @@ async def remove_scan_folder_endpoint(
    return {"ok": True}


@router.get("/recommended-folders")
async def get_recommended_folders(
    current_subject: str = Depends(get_current_subject),
):
    """Return well-known model directories that exist on this machine.

    Lightweight alternative to ``browse-folders`` for showing quick-pick
    chips without the overhead of enumerating a directory tree. Returns
    paths that actually exist on disk (HF cache, LM Studio, Ollama,
    ``~/models``, etc.) so the frontend can offer them as one-click
    "Recommended" shortcuts in the Custom Folders section.
    """
    from utils.paths.storage_roots import lmstudio_model_dirs

    folders: list[str] = []
    seen: set[str] = set()

    def _add(p: Optional[Path]) -> None:
        if p is None:
            return
        try:
            resolved = str(p.resolve())
        except OSError:
            return
        if resolved in seen:
            return
        if Path(resolved).is_dir() and os.access(resolved, os.R_OK | os.X_OK):
            seen.add(resolved)
            folders.append(resolved)

    # LM Studio model directories
    try:
        for p in lmstudio_model_dirs():
            _add(p)
    except Exception as e:
        logger.warning("Failed to scan for LM Studio model directories: %s", e)

    # Ollama model directories
    ollama_env = os.environ.get("OLLAMA_MODELS")
    if ollama_env:
        _add(Path(ollama_env).expanduser())
    for candidate in (
        Path.home() / ".ollama" / "models",
        Path("/usr/share/ollama/.ollama/models"),
        Path("/var/lib/ollama/.ollama/models"),
    ):
        _add(candidate)

    return {"folders": folders}


# Heuristic ceiling on how many children to stat when checking whether a
# directory "looks like" it contains models. Keeps the browser snappy
# even when a directory has thousands of unrelated entries.


@@ -959,6 +959,20 @@ def detect_mmproj_file(path: str, search_root: Optional[str] = None) -> Optional
            scan_order.append(resolved)

    _add(start_dir)

    # When ``path`` is a symlink (e.g. Ollama's ``.studio_links/...gguf``
    # -> ``blobs/sha256-...``), the symlink's parent directory rarely
    # contains the mmproj sibling; the real mmproj file lives next to
    # the symlink target. Add the target's parent to the scan so vision
    # GGUFs that are surfaced via symlinks are still recognised as
    # vision models.
    try:
        if p.is_symlink() and p.is_file():
            target_parent = p.resolve().parent
            if target_parent.is_dir():
                _add(target_parent)
    except OSError:
        pass

    if search_root is not None:
        try:
            root_resolved = Path(search_root).resolve()
@@ -1006,7 +1020,10 @@ def detect_gguf_model(path: str) -> Optional[str]:
    if p.suffix.lower() == ".gguf" and p.is_file():
        if _is_mmproj(p.name):
            return None
        return str(p.resolve())
        # Use absolute (not resolve) to preserve symlink names -- e.g.
        # Ollama .studio_links/model.gguf -> blobs/sha256-... should
        # keep the readable symlink name, not the opaque blob hash.
        return str(p.absolute())

    # Case 2: directory containing .gguf files (skip mmproj)
    if p.is_dir():


@@ -27,6 +27,7 @@ import {
  listCachedModels,
  listGgufVariants,
  listLocalModels,
  listRecommendedFolders,
  listScanFolders,
  removeScanFolder,
} from "@/features/chat/api/chat-api";
@@ -49,7 +50,7 @@ import { checkVramFit, estimateLoadingVram } from "@/lib/vram";
import { Add01Icon, Cancel01Icon, Folder02Icon, Search01Icon } from "@hugeicons/core-free-icons";
import { HugeiconsIcon } from "@hugeicons/react";
import { FolderBrowser } from "./folder-browser";
import { Trash2Icon } from "lucide-react";
import { ChevronDownIcon, ChevronRightIcon, DownloadIcon, StarIcon, Trash2Icon } from "lucide-react";
import {
  type ReactNode,
  useCallback,
@@ -73,10 +74,35 @@ function normalizeForSearch(s: string): string {
  return s.toLowerCase().replace(/[\s\-_\.]/g, "");
}

function ListLabel({ children }: { children: ReactNode }) {
function ListLabel({
  children,
  icon,
  collapsed,
  onToggle,
}: {
  children: ReactNode;
  icon?: ReactNode;
  collapsed?: boolean;
  onToggle?: () => void;
}) {
  return (
    <div className="px-2.5 py-1.5 text-[10px] font-semibold uppercase tracking-wider text-muted-foreground">
      {children}
    <div className="flex items-center justify-between gap-1 px-2.5 py-1.5">
      <span className="flex items-center gap-1.5 text-[10px] font-semibold uppercase tracking-wider text-muted-foreground">
        {icon}
        {children}
      </span>
      {onToggle && (
        <button
          type="button"
          onClick={onToggle}
          aria-label={collapsed ? "Expand section" : "Collapse section"}
          className="shrink-0 rounded p-1 text-muted-foreground/60 transition-colors hover:text-foreground"
        >
          {collapsed
            ? <ChevronRightIcon className="size-3" />
            : <ChevronDownIcon className="size-3" />}
        </button>
      )}
    </div>
  );
}
@@ -489,6 +515,9 @@ export function HubModelPicker({
  // Delete confirmation dialog state
  const [deleteTarget, setDeleteTarget] = useState<string | null>(null);
  const [deleting, setDeleting] = useState(false);
  const [downloadedCollapsed, setDownloadedCollapsed] = useState(false);
  const [customFoldersCollapsed, setCustomFoldersCollapsed] = useState(false);
  const [recommendedCollapsed, setRecommendedCollapsed] = useState(false);

  // Cached (already downloaded) repos -- use module-level cache so
  // re-mounting the popover does not flash an empty "Downloaded" section.
@@ -514,6 +543,7 @@ export function HubModelPicker({
  const [showFolderInput, setShowFolderInput] = useState(false);
  const [folderLoading, setFolderLoading] = useState(false);
  const [showFolderBrowser, setShowFolderBrowser] = useState(false);
  const [recommendedFolders, setRecommendedFolders] = useState<string[]>([]);

  const refreshLocalModelsList = useCallback(() => {
    listLocalModels()
@@ -616,6 +646,9 @@ export function HubModelPicker({
    // Always refresh LM Studio + custom folder models (not gated by alreadyCached)
    refreshLocalModelsList();
    refreshScanFolders();
    listRecommendedFolders()
      .then(setRecommendedFolders)
      .catch(() => {});

    // Always refetch cached GGUF/model lists. The module-level caches give
    // an instant render with stale data (no spinner flash), but newly
@@ -893,8 +926,12 @@ export function HubModelPicker({
        (cachedGguf.length > 0 ||
          (!chatOnly && cachedModels.length > 0)) ? (
          <>
            <ListLabel>Downloaded</ListLabel>
            {cachedGguf.map((c) => (
            <ListLabel
              icon={<DownloadIcon className="size-3" />}
              collapsed={downloadedCollapsed}
              onToggle={() => setDownloadedCollapsed((v) => !v)}
            >Downloaded</ListLabel>
            {!downloadedCollapsed && cachedGguf.map((c) => (
              <div key={c.repo_id}>
                <ModelRow
                  label={c.repo_id}
@@ -922,7 +959,7 @@ export function HubModelPicker({
                )}
              </div>
            ))}
            {!chatOnly &&
            {!downloadedCollapsed && !chatOnly &&
              cachedModels.map((c) => (
                <div key={c.repo_id} className="flex items-center gap-0.5">
                  <div className="min-w-0 flex-1">
@@ -1001,20 +1038,12 @@ export function HubModelPicker({
          {!showHfSection ? (
            <>
              <div className="flex items-center justify-between gap-1 px-2.5 py-1.5">
                <span className="text-[10px] font-semibold uppercase tracking-wider text-muted-foreground">
              <div className="flex items-center gap-1 px-2.5 py-1.5">
                <span className="flex items-center gap-1.5 text-[10px] font-semibold uppercase tracking-wider text-muted-foreground">
                  <HugeiconsIcon icon={Folder02Icon} className="size-3" />
                  Custom Folders
                </span>
                <div className="flex items-center gap-0.5">
                  <button
                    type="button"
                    aria-label="Browse for a folder on the server"
                    title="Browse folders on the server"
                    onClick={() => setShowFolderBrowser(true)}
                    className="shrink-0 rounded p-1 text-muted-foreground/60 transition-colors hover:text-foreground"
                  >
                    <HugeiconsIcon icon={Search01Icon} className="size-3" />
                  </button>
                  <button
                    type="button"
                    aria-label={showFolderInput ? "Cancel adding folder" : "Add scan folder by path"}
@@ -1029,11 +1058,33 @@ export function HubModelPicker({
                  >
                    <HugeiconsIcon icon={showFolderInput ? Cancel01Icon : Add01Icon} className="size-3" />
                  </button>
                  <button
                    type="button"
                    aria-label="Browse for a folder on the server"
                    title="Browse folders on the server"
                    onClick={() => setShowFolderBrowser(true)}
                    className="shrink-0 rounded p-0.5 text-muted-foreground/60 transition-colors hover:text-foreground"
                  >
                    <HugeiconsIcon icon={Search01Icon} className="size-2.5" />
                  </button>
                </div>
                <div className="ml-auto">
                  <button
                    type="button"
                    aria-label={customFoldersCollapsed ? "Expand custom folders" : "Collapse custom folders"}
                    title={customFoldersCollapsed ? "Expand" : "Collapse"}
                    onClick={() => setCustomFoldersCollapsed((v) => !v)}
                    className="shrink-0 rounded p-1 text-muted-foreground/60 transition-colors hover:text-foreground"
                  >
                    {customFoldersCollapsed
                      ? <ChevronRightIcon className="size-3" />
                      : <ChevronDownIcon className="size-3" />}
                  </button>
                </div>
              </div>

              {/* Folder paths */}
              {scanFolders.map((f) => (
              {!customFoldersCollapsed && scanFolders.map((f) => (
                <div
                  key={f.id}
                  className="group flex items-center gap-1.5 px-2.5 py-0.5"
@@ -1056,8 +1107,31 @@ export function HubModelPicker({
                </div>
              ))}

              {/* Recommended folders */}
              {!customFoldersCollapsed && (() => {
                const registered = new Set(scanFolders.map((f) => f.path));
                const unregistered = recommendedFolders.filter((p) => !registered.has(p));
                if (unregistered.length === 0) return null;
                return (
                  <div className="flex flex-wrap gap-1 px-2.5 pb-0.5">
                    {unregistered.map((p) => (
                      <button
                        key={p}
                        type="button"
                        onClick={() => void handleAddFolder(p)}
                        disabled={folderLoading}
                        title={`Add ${p}`}
                        className="rounded-full border border-dashed border-border/50 px-2 py-0.5 font-mono text-[10px] text-muted-foreground/70 transition-colors hover:border-foreground/30 hover:bg-accent hover:text-foreground disabled:opacity-40"
                      >
                        <span className="text-[11px] font-semibold">+</span> {p.length > 30 ? `...${p.slice(-27)}` : p}
                      </button>
                    ))}
                  </div>
                );
              })()}

              {/* Add folder input */}
              {showFolderInput && (
              {!customFoldersCollapsed && showFolderInput && (
                <div className="px-2.5 pb-1 pt-0.5">
                  <div className="flex items-center gap-1">
                    <HugeiconsIcon icon={Folder02Icon} className="size-3 shrink-0 text-muted-foreground/40" />
@@ -1114,11 +1188,15 @@ export function HubModelPicker({
              {/* Models from custom folders */}
              {customFolderModels.map((m) => {
              {!customFoldersCollapsed && customFolderModels.map((m) => {
                const isGgufFile = m.path.toLowerCase().endsWith(".gguf");
                const isGguf =
                  isGgufFile ||
                  isGgufRepo(m.id) ||
                  isGgufRepo(m.display_name) ||
                  m.path.toLowerCase().endsWith(".gguf");
                  isGgufRepo(m.display_name);
                // Single .gguf files (e.g. Ollama blobs) load directly;
                // GGUF repos/directories expand to pick a variant.
                const isDirectGguf = isGgufFile;
                return (
                  <div key={m.id}>
                    <ModelRow
@@ -1126,7 +1204,13 @@ export function HubModelPicker({
                      meta={isGguf ? "GGUF" : "Local"}
                      selected={value === m.id}
                      onClick={() => {
                        if (isGguf) {
                        if (isDirectGguf) {
                          onSelect(m.id, {
                            source: "local",
                            isLora: false,
                            isDownloaded: true,
                          });
                        } else if (isGguf) {
                          setExpandedGguf((prev) =>
                            prev === m.id ? null : m.id,
                          );
@@ -1158,8 +1242,12 @@ export function HubModelPicker({
        {!showHfSection && cachedReady ? (
          <>
            <ListLabel>Recommended</ListLabel>
            {visibleRecommendedIds.length === 0 ? (
            <ListLabel
              icon={<StarIcon className="size-3" />}
              collapsed={recommendedCollapsed}
              onToggle={() => setRecommendedCollapsed((v) => !v)}
            >Recommended</ListLabel>
            {recommendedCollapsed ? null : visibleRecommendedIds.length === 0 ? (
              <div className="px-2.5 py-2 text-xs text-muted-foreground">
                No default models.
              </div>
@@ -1203,7 +1291,7 @@ export function HubModelPicker({
                );
              })
            )}
            {hasMoreRecommended && (
            {!recommendedCollapsed && hasMoreRecommended && (
              <>
                <div ref={recommendedSentinelRef} className="h-px" />
                <div className="flex items-center justify-center py-2">
@@ -1216,7 +1304,7 @@ export function HubModelPicker({
        {showHfSection && filteredRecommendedIds.length > 0 ? (
          <>
            <ListLabel>Recommended</ListLabel>
            <ListLabel icon={<StarIcon className="size-3" />}>Recommended</ListLabel>
            {filteredRecommendedIds.map((id) => {
              const vram = recommendedVramMap.get(id);
              return (


@@ -262,6 +262,12 @@ export interface BrowseFoldersResponse {
  model_files_here?: number;
}

export async function listRecommendedFolders(): Promise<string[]> {
  const response = await authFetch("/api/models/recommended-folders");
  const data = await parseJsonOrThrow<{ folders: string[] }>(response);
  return data.folders;
}

export async function browseFolders(
  path?: string,
  showHidden = false,


@@ -437,9 +437,10 @@ export function useChatModelRuntime() {
    const { chatTemplateOverride, kvCacheDtype, customContextLength, ggufContextLength, speculativeType } = useChatRuntimeStore.getState();
    // GGUF: use custom context length, or 0 = model's native context
    // Non-GGUF: use the Max Seq Length slider value
    const isDirectGgufFile = modelId.toLowerCase().endsWith(".gguf");
    const effectiveMaxSeqLength = customContextLength != null
      ? customContextLength
      : ggufVariant != null ? (ggufContextLength ?? 0) : maxSeqLength;
      : (ggufVariant != null || isDirectGgufFile) ? (ggufContextLength ?? 0) : maxSeqLength;

    const loadResponse = await loadModel({
      model_path: modelId,
      hf_token: hfToken,