* More appropriate place for data storing
The /usr/share subtree in Linux is used for data that generally are not
supposed to change. Conventional places for changeable data are usually
located under /var, so /var/lib seems to be a reasonable default here.
* Data paths consistency fix
* Directory name consistency fix
* feat(ui): add watchdog settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Do not re-read env
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Some refactor, move other settings to runtime (p2p)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add API Keys handling
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Allow to disable runtime settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Documentation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* show MCP toggle in index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop context default
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama.cpp): expose env vars as options for consistency
This allows to configure everything in the YAML file of the model rather
than have global configurations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama.cpp): respect usetokenizertemplate and use llama.cpp templating system to process messages
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Detect template exists if use tokenizer template is enabled
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Better recognization of chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixes to support tool calls while using templates from tokenizer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop template guessing, fix passing tools to tokenizer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Extract grammar and other options from chat template, add schema struct
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Automatically set use_jinja
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Cleanups, identify by default gguf models for chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Added download instructions for macOS DMG file and updated command for Linux and macOS.
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* fix: properly terminate kv_overrides array with empty key
The llama model loading function expects KV overrides to be terminated
with an empty key (key[0] == 0). Previously, the kv_overrides vector was
not being properly terminated, causing an assertion failure.
This commit ensures that after parsing all KV override strings, we add a
final terminating entry with an empty key to satisfy the C-style array
termination requirement. This fixes the assertion error and allows the
model to load correctly with custom KV overrides.
Fixes#6643
- Also included a reference to the usage of the `overrides` option in
the advanced-usage section.
Signed-off-by: blob42 <contact@blob42.xyz>
* doc: document the `overrides` option
---------
Signed-off-by: blob42 <contact@blob42.xyz>
* feat(neutts): add backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): add images to CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(gallery): add Neutts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make it work with quantized versions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* WIP - add endpoint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Rename
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Wire the Completion API
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to make it functional
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Almost functional
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Bump golang versions used in tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add description of the tool
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make it working
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small optimizations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Cleanup/refactor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The binary is now named "llama-cpp-rpc-server" for p2p workers.
We also decrease the default token rotation interval, in this way
peer discovery is much more responsive.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: split remaining backends and drop embedded backends
- Drop silero-vad, huggingface, and stores backend from embedded
binaries
- Refactor Makefile and Dockerfile to avoid building grpc backends
- Drop golang code that was used to embed backends
- Simplify building by using goreleaser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(gallery): be specific with llama-cpp backend templates
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(docs): update
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): minor fixes
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: drop all ffmpeg references
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: run protogen-go
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Always enable p2p mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update gorelease file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(stores): do not always load
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix linting issues
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Mac OS fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
feat: reduce images size and stop bundling sources
Do not copy sources anymore, and reduce packages of the base images by
not using builder images.
If needed to rebuild, just build the container image from scratch by
following the docs. We will slowly try to migrate all backends to the
gallery to keep the core small.
This PR is a breaking change, it also sets the base folders to /models
and /backends instead of /build/models and /build/backends.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat: Add backend gallery
This PR add support to manage backends as similar to models. There is
now available a backend gallery which can be used to install and remove
extra backends.
The backend gallery can be configured similarly as a model gallery, and
API calls allows to install and remove new backends in runtime, and as
well during the startup phase of LocalAI.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add backends docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wip: Backend Dockerfile for python backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: drop extras images, build python backends separately
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixup on all backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tweaks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop old backends leftovers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move dockerfile upper
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix proto
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Feature dropped for consistency - we prefer model galleries
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add missing packages in the build image
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* exllama is ponly available on cublas
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* pin torch on chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups to index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Debug CI
* Install accellerators deps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add target arch
* Add cuda minor version
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use self-hosted runners
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: use quay for test images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups for vllm and chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups on CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chatterbox is only available for nvidia
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify CI builds
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt test, use qwen3
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(model gallery): add jina-reranker-v1-tiny-en-gguf
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(gguf-parser): recover from potential panics that can happen while reading ggufs with gguf-parser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use reranker from llama.cpp in AIO images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Limit concurrent jobs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* Add note to help run nvidia containers with SELinux
* Use correct CUDA container references as noted in the dockerhub overview
* Clean trailing whitespaces
The GGML format is now dead, since in the next version of LocalAI we
already bring many breaking compatibility changes, taking the occasion
also to drop ggml support (pre-gguf).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Since the remote gallery was introduced this is now completely
superseded by it. In order to keep the code clean and remove redudant
parts let's simplify the usage.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Rename LocalAI-Extra-Usage -> Extra-Usage, add MACHINE_TAG as cli flag option, add docs about extra-usage and machine-tag
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
* feat(backend): add stablediffusion-ggml
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): track stablediffusion-ggml
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use default scheduler and sampler if not specified
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move cfg scale out of diffusers block
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make it working
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: set free_params_immediately to false to call the model in sequence
https://github.com/leejet/stable-diffusion.cpp/issues/366
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backends): Drop bert.cpp
use llama.cpp 3.2 as a drop-in replacement for bert.cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): make test more robust
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
update link to examples which have moved to their own repository
Signed-off-by: Philipp Seelig <philipp@daxbau.net>
Co-authored-by: Philipp Seelig <philipp@daxbau.net>
Co-authored-by: Dave <dave@gray101.com>
* chore(cli): be consistent between workers and expose ExtraLLamaCPPArgs to both
Fixes: https://github.com/mudler/LocalAI/issues/3427
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* bump grpcio
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Refer to the OpenAI documentation to update the openai-functions documentation
I saw the openai official website, apIn the description: The parameters `function_call` and `functions` have been replaced by `tool_choice` and `tools`.So I submitted this update;But I haven't read the code of localai, so I'm not sure if it also applies to localai.
Signed-off-by: 四少爷 <sex@jermey.cn>
* Update Usage Example
The original usage example was too outdated, and calling with the new version of the openai python package would result in errors. Therefore, the curl example was rewritten (as curl examples are also used elsewhere).
Signed-off-by: 四少爷 <sex@jermey.cn>
* add python example
Signed-off-by: 四少爷 <sex@jermey.cn>
---------
Signed-off-by: 四少爷 <sex@jermey.cn>
* feat(functions): enhance parsing with broken JSON when we parse the raw results
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* breaking: make function name by default
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(grammar): dynamically generate grammars with mutating keys
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor: simplify condition
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(swagger): finish convering gallery section
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs: add section to explain how to install models with local-ai run
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Minor docs adjustments
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama.cpp): add embeddings
Also enable embeddings by default for llama.cpp models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(Makefile): prepare llama.cpp sources only once
Otherwise we keep cloning llama.cpp for each of the variants
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not set embeddings to false
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs: add embeddings to the YAML config reference
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(install.sh): support federated install
This allows to support federation by exposing:
- FEDERATED: true/false to share the instance
- FEDERATED_SERVER: true/false to start the federated load balancer (it
forwards requests to the federation)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs: update installer parameters
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Dave <dave@gray101.com>
This allows LocalAI to be less noisy avoiding to connect outside.
Needed if e.g. there is no plan into using p2p across separate networks.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Added some OpenVINO models
Added Phi-3 trust_remote_code: true
Added Hermes 2 Pro Llama3
Added Multilingual-E5-base embedding model with OpenVINO acceleration (CPU and XPU)
Added all-MiniLM-L6-v2 with OpenVINO acceleration (CPU and XPU)
* Added Remote Code for phi, fixed error on Yamllint
* update openvino.yaml
I need to go to rest: today is not my day...