mirror of https://github.com/mudler/LocalAI synced 2026-04-21 13:27:21 +00:00

Ettore Di Giacinto 285f7d4340 chore: add embeddingemma

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-04-08 17:40:55 +00:00


Adding GGUF Models from HuggingFace to the Gallery

When adding a GGUF model from HuggingFace to the LocalAI model gallery, follow this guide.

Gallery file

All models are defined in gallery/index.yaml. Find the appropriate section (embedding models near other embeddings, chat models near similar chat models) and add a new entry.

Getting the SHA256

GGUF files on HuggingFace expose their SHA256 via the x-linked-etag HTTP header. Fetch it with:

curl -sI "https://huggingface.co/<org>/<repo>/resolve/main/<filename>.gguf" | grep -i x-linked-etag

The value (without quotes) is the SHA256 hash. Example:

curl -sI "https://huggingface.co/ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/resolve/main/embeddinggemma-300m-qat-Q8_0.gguf" | grep -i x-linked-etag
# x-linked-etag: "6fa0c02a9c302be6f977521d399b4de3a46310a4f2621ee0063747881b673f67"

Important: Pay attention to exact filename casing — HuggingFace filenames are case-sensitive (e.g., Q8_0 vs q8_0). Check the repo's file listing to get the exact name.
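
To paste the hash straight into a gallery entry, the header line can be reduced to the bare value. The following is a minimal sketch, assuming bash and a POSIX awk; the function name extract_sha256 is hypothetical and not part of LocalAI. It reads response headers on stdin, matches the header name case-insensitively, and strips the quotes and the carriage return that HTTP header lines end with.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LocalAI): read HTTP response headers on
# stdin and print the value of the x-linked-etag header as a bare hash.
extract_sha256() {
  awk 'tolower($1) == "x-linked-etag:" {
         gsub(/["\r]/, "", $2)   # drop the surrounding quotes and the trailing CR
         print $2
         exit                    # only the first matching header is needed
       }'
}

# Usage (performs a network request):
#   curl -sI "https://huggingface.co/<org>/<repo>/resolve/main/<filename>.gguf" | extract_sha256
```

If the function prints nothing, the header was not present in the response; re-check the URL and the filename casing.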

Entry format — Embedding models

Embedding models use gallery/virtual.yaml as the base config and set embeddings: true:

- name: "model-name"
  url: github:mudler/LocalAI/gallery/virtual.yaml@master
  urls:
    - https://huggingface.co/<original-model-org>/<original-model-name>
    - https://huggingface.co/<gguf-org>/<gguf-repo-name>
  description: |
    Short description of the model, its size, and capabilities.
  tags:
    - embeddings
  overrides:
    backend: llama-cpp
    embeddings: true
    parameters:
      model: <filename>.gguf
  files:
    - filename: <filename>.gguf
      uri: huggingface://<gguf-org>/<gguf-repo-name>/<filename>.gguf
      sha256: <sha256-hash>
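
The uri field carries the same org, repo, and filename as the download URL used for the SHA256 lookup, so one can be derived from the other. The sketch below assumes bash and a URL of exactly the shape shown in this guide (file on the main branch); the function name hf_gallery_uri is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LocalAI): convert a HuggingFace "resolve"
# download URL into the huggingface:// form used in the gallery uri field.
# Assumes: https://huggingface.co/<org>/<repo>/resolve/main/<filename>.gguf
hf_gallery_uri() {
  local url="$1"
  local prefix="https://huggingface.co/"
  case "$url" in
    "$prefix"*/resolve/main/*) ;;   # expected shape, continue
    *) echo "unexpected URL shape: $url" >&2; return 1 ;;
  esac
  local rest="${url#"$prefix"}"          # <org>/<repo>/resolve/main/<filename>
  local repo="${rest%%/resolve/main/*}"  # <org>/<repo>
  local file="${rest#*/resolve/main/}"   # <filename>
  printf 'huggingface://%s/%s\n' "$repo" "$file"
}

# Usage:
#   hf_gallery_uri "https://huggingface.co/<org>/<repo>/resolve/main/<filename>.gguf"
```

Because the filename is copied verbatim from the URL, its casing stays identical between the SHA256 lookup and the gallery entry.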

Entry format — Chat/LLM models

Chat models typically reference a template config (e.g., gallery/gemma.yaml, gallery/chatml.yaml) that defines the prompt format. Use YAML anchors (&name / *name) if adding multiple quantization variants of the same model:

- &model-anchor
  url: "github:mudler/LocalAI/gallery/<template>.yaml@master"
  name: "model-name"
  icon: https://example.com/icon.png
  license: <license>
  urls:
    - https://huggingface.co/<org>/<model>
    - https://huggingface.co/<gguf-org>/<gguf-repo>
  description: |
    Model description.
  tags:
    - llm
    - gguf
    - gpu
    - cpu
  overrides:
    parameters:
      model: <filename>-Q4_K_M.gguf
  files:
    - filename: <filename>-Q4_K_M.gguf
      sha256: <sha256>
      uri: huggingface://<gguf-org>/<gguf-repo>/<filename>-Q4_K_M.gguf

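To add a variant (e.g., a different quantization), merge the anchor and override the fields that differ. The merge is shallow, so overrides and files replace the anchor's values entirely and must be written out in full: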

- !!merge <<: *model-anchor
  name: "model-name-q8"
  overrides:
    parameters:
      model: <filename>-Q8_0.gguf
  files:
    - filename: <filename>-Q8_0.gguf
      sha256: <sha256>
      uri: huggingface://<gguf-org>/<gguf-repo>/<filename>-Q8_0.gguf

Available template configs

Look at existing .yaml files in gallery/ to find the right prompt template for your model architecture:

gemma.yaml — Gemma-family models (gemma, embeddinggemma, etc.)
chatml.yaml — ChatML format (many Mistral/OpenHermes models)
deepseek.yaml — DeepSeek models
virtual.yaml — Minimal base (good for embedding models that don't need chat templates)

Checklist

Find the GGUF file on HuggingFace — note exact filename (case-sensitive)
Get the SHA256 using the curl -sI + x-linked-etag method above
Choose the right template config from gallery/ based on model architecture
Add the entry to gallery/index.yaml near similar models
Set embeddings: true under overrides if it's an embedding model
Include both URLs — the original model page and the GGUF repo
Write a description — mention model size, capabilities, and quantization type
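
As an optional extra check, a locally downloaded copy of the GGUF file can be compared against the hash recorded in the entry. This is a minimal sketch, assuming bash and GNU coreutils sha256sum (on macOS, shasum -a 256 is the usual substitute); the function name verify_sha256 is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LocalAI): succeed only if FILE hashes to EXPECTED.
# Returns 0 on match, 1 on mismatch, 2 if EXPECTED is not a valid SHA256 string.
verify_sha256() {
  local file="$1" expected="$2"
  # A SHA256 digest is exactly 64 hexadecimal characters.
  case "$expected" in
    ""|*[!0-9a-fA-F]*) echo "not a hex digest: $expected" >&2; return 2 ;;
  esac
  [ "${#expected}" -eq 64 ] || { echo "digest must be 64 characters" >&2; return 2; }
  local actual
  actual="$(sha256sum "$file" | awk '{print $1}')"
  # sha256sum prints lowercase, so normalize the expected value before comparing.
  [ "$actual" = "$(printf '%s' "$expected" | tr 'A-F' 'a-f')" ]
}

# Usage:
#   verify_sha256 <filename>.gguf <sha256-hash> && echo "hash matches"
```

A return code of 2 usually means the header value was pasted with its quotes still attached.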
