Commit graph

183 commits

Author SHA1 Message Date
Ettore Di Giacinto
80f7f17843
chore(deps): bump llama.cpp to 'e562eece7cb476276bfc4cbb18deb7c0369b2233' (#5552)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-31 12:46:32 +02:00
Ettore Di Giacinto
dd7fa6b9f7
chore(deps): bump llama.cpp to 'e83ba3e460651b20a594e9f2f0f0bffb998d3ce1 (#5527)
chore(deps): bump llama.cpp to 'e83ba3e460651b20a594e9f2f0f0bffb998d3ce1'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-30 10:29:01 +02:00
Ettore Di Giacinto
88de2ea01a
feat(llama.cpp): add support for audio input (#5466)
* feat(llama.cpp): add support for audio input

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Adapt tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-26 16:06:03 +02:00
Ettore Di Giacinto
3b0cf52f6a
feat(llama.cpp): add reranking (#5396)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-22 21:49:30 +02:00
Ettore Di Giacinto
6d5bde860b
feat(llama.cpp): upgrade and use libmtmd (#5379)
* WIP

* wip

* wip

* Make it compile

* Update json.hpp

* this shouldn't be private for now

* Add logs

* Reset auto detected template

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Re-enable grammars

* This seems to be broken - 360a9c98e1 (diff-a18a8e64e12a01167d8e98fc)[…]cccf0d4eed09d76d879L2998-L3207

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Placeholder

* Simplify image loading

* use completion type

* disable streaming

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* correctly return timings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Remove some debug logging

* Adapt tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Keep header

* embedding: do not use oai type

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Sync from server.cpp

* Use utils and json directly from llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Sync with upstream

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: copy json.hpp from the correct location

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: add httplib

* sync llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Embeddiongs: set OAICOMPAT_TYPE_EMBEDDING

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: sync with server.cpp by including it

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make it darwin-compatible

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-17 16:02:53 +02:00
Ettore Di Giacinto
adb24214c6
chore(deps): bump llama.cpp to b34c859146630dff136943abc9852ca173a7c9d6 (#5323)
chore(deps): bump llama.cpp to 'b34c859146630dff136943abc9852ca173a7c9d6'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-06 11:21:25 +02:00
Ettore Di Giacinto
1fc6d469ac
chore(deps): bump llama.cpp to '1d36b3670b285e69e58b9d687c770a2a0a192194 (#5307)
chore(deps): bump llama.cpp to '1d36b3670b285e69e58b9d687c770a2a0a192194'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-03 18:44:40 +02:00
Ettore Di Giacinto
8abecb4a18
chore: bump grpc limits to 50MB (#5212)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-19 08:53:24 +02:00
Richard Palethorpe
1b899e1a68
feat(stablediffusion): Enable SYCL (#5144)
* feat(sycl): Enable SYCL for stable diffusion

This is a pain because we compile with CGO, but SD is compiled with
CMake. I don't think we can easily use CMake to set the linker flags
necessary. Also I could not find pkg-config calls that would fully set
the flags, so some of them are set manually.

See https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html
for reference. I also resorted to searching the shared object files in
MKLROOT/lib for the symbols.

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(ci): Don't set nproc on cmake

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-10 15:20:53 +02:00
Ettore Di Giacinto
25e6f21322
chore(deps): bump llama.cpp to 4ccea213bc629c4eef7b520f7f6c59ce9bbdaca0 (#5143)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 11:26:06 +02:00
Ettore Di Giacinto
ece239966f
chore: ⬆️ Update ggml-org/llama.cpp to 6bf28f0111ff9f21b3c1b1eace20c590281e7ba6 (#5127)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-06 14:01:51 +02:00
Richard Palethorpe
d2cf8ef070
fix(sycl): kernel not found error by forcing -fsycl (#5115)
* chore(sycl): Update oneapi to 2025:1

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(sycl): Pass -fsycl flag as workaround

-fsycl should be set by llama.cpp's cmake file, but something goes wrong
and it doesn't appear to get added

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(build): Speed up llama build by using all CPUs

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-03 16:22:59 +02:00
Ettore Di Giacinto
18b320d577
chore(deps): bump llama.cpp to 'f01bd02376f919b05ee635f438311be8dfc91d7c (#5110)
chore(deps): bump llama.cpp to 'f01bd02376f919b05ee635f438311be8dfc91d7c'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:23:14 +02:00
Ettore Di Giacinto
c2a39e3639
fix(llama.cpp): properly handle sigterm (#5099)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-30 18:08:29 +02:00
Ettore Di Giacinto
423514a5a5
fix(clip): do not imply GPU offload by default (#5010)
* fix(clip): do not imply GPUs by default

Until a better solution is found upstream, be conservative and default
to GPU.

https://github.com/ggml-org/llama.cpp/pull/12322
https://github.com/ggml-org/llama.cpp/pull/12322#issuecomment-2720970695

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* allow to override gpu via backend options

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-13 15:14:11 +01:00
Ettore Di Giacinto
e4fa894153
fix(llama.cpp): correctly handle embeddings in batches (#4957)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-07 19:29:52 +01:00
Ettore Di Giacinto
67f7bffd18
chore(deps): update llama.cpp and sync with upstream changes (#4950)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-06 00:40:58 +01:00
Ettore Di Giacinto
9e32fda304
fix(llama.cpp): improve context shift handling (#4820)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-14 14:55:03 +01:00
Shraddha
03974a4dd4
feat: tokenization with llama.cpp (#4724)
feat: tokenization

Signed-off-by: shraddhazpy <shraddha@shraddhafive.in>
2025-02-02 17:39:43 +00:00
Ettore Di Giacinto
1d6afbd65d
feat(llama.cpp): Add support to grammar triggers (#4733)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-02 13:25:03 +01:00
Ettore Di Giacinto
958f6eb722
chore(llama.cpp): update dependency (#4628)
Update to '3edfa7d3753c29e44b964c0ff424d2ea8d5fdee6' and adapt to upstream changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-18 11:55:13 +01:00
mintyleaf
96f8ec0402
feat: add machine tag and inference timings (#4577)
* Add machine tag option, add extraUsage option, grpc-server -> proto -> endpoint extraUsage data is broken for now

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>

* remove redurant timing fields, fix not working timings output

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>

* use middleware for Machine-Tag only if tag is specified

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>

---------

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
2025-01-17 17:05:58 +01:00
Ettore Di Giacinto
ab5adf40af
chore(deps): bump llama.cpp to '924518e2e5726e81f3aeb2518fb85963a500e… (#4592)
chore(deps): bump llama.cpp to '924518e2e5726e81f3aeb2518fb85963a500e93a'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-13 17:33:06 +01:00
Ettore Di Giacinto
c553d73748
chore(deps): bump llama.cpp to 4b0c638b9 (#4532)
deps(llama.cpp): bump to 4b0c638b9

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-04 09:40:08 +01:00
Ettore Di Giacinto
0eb2911aad
chore(llava): update clip.patch (#4453)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-23 19:11:31 +01:00
Ettore Di Giacinto
708cba0c1b
chore(llama.cpp): bump, drop penalize_nl (#4418)
deps(llama.cpp): bump, drop penalize_nl

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-17 00:47:52 +01:00
Ettore Di Giacinto
fc4a714992
feat(llama.cpp): bump and adapt to upstream changes (#4378)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-14 00:30:52 +01:00
Ettore Di Giacinto
d4c1746c7d
feat(llama.cpp): expose cache_type_k and cache_type_v for quant of kv cache (#4329)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-06 10:23:59 +01:00
Ettore Di Giacinto
cbedf2f428
fix(llama.cpp): embed metal file into result binary for darwin (#4279)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-11-28 04:17:00 +00:00
Ettore Di Giacinto
2b62260b6d
feat(models): use rwkv from llama.cpp (#4264)
feat(rwkv): use rwkv from llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-11-26 14:22:55 +01:00
Ettore Di Giacinto
404ca3cc23
chore(deps): bump llama.cpp to 47f931c8f9a26c072d71224bc8013cc66ea9e445 (#4263)
chore(deps): bump llama.cpp to '47f931c8f9a26c072d71224bc8013cc66ea9e445'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-11-26 11:12:57 +01:00
Ettore Di Giacinto
939fbe59cc
chore(deps): bump llama-cpp to ae8de6d50a09d49545e0afab2e50cc4acfb280e2 (#4157)
* chore(deps): bump llama-cpp to ae8de6d50a09d49545e0afab2e50cc4acfb280e2

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(metal): metal file has moved

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-11-15 12:51:43 +01:00
Ettore Di Giacinto
3d4bb757d2
chore(deps): bump llama-cpp to 8f275a7c4593aa34147595a90282cf950a853690 (#4016)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-10-30 08:31:13 +01:00
Ettore Di Giacinto
32db787991
chore(deps): bump llama-cpp to cda0e4b648dde8fac162b3430b14a99597d3d74f (#3884)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-10-20 00:26:49 +02:00
Ettore Di Giacinto
6257e2f510
chore(deps): bump llama-cpp to 96776405a17034dcfd53d3ddf5d142d34bdbb657 (#3793)
This adapts also to upstream changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-10-12 01:25:03 +02:00
siddimore
f84b55d1ef
feat: Add Get Token Metrics to GRPC server (#3687)
* Add Get Token Metrics to GRPC server

Signed-off-by: Siddharth More <siddimore@gmail.com>

* Expose LocalAI endpoint

Signed-off-by: Siddharth More <siddimore@gmail.com>

---------

Signed-off-by: Siddharth More <siddimore@gmail.com>
2024-10-01 14:41:20 +02:00
siddimore
50a3b54e34
feat(api): add correlationID to Track Chat requests (#3668)
* Add CorrelationID to chat request

Signed-off-by: Siddharth More <siddimore@gmail.com>

* remove get_token_metrics

Signed-off-by: Siddharth More <siddimore@gmail.com>

* Add CorrelationID to proto

Signed-off-by: Siddharth More <siddimore@gmail.com>

* fix correlation method name

Signed-off-by: Siddharth More <siddimore@gmail.com>

* Update core/http/endpoints/openai/chat.go

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Siddharth More <siddimore@gmail.com>

* Update core/http/endpoints/openai/chat.go

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Siddharth More <siddimore@gmail.com>

---------

Signed-off-by: Siddharth More <siddimore@gmail.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-09-28 17:23:56 +02:00
Ettore Di Giacinto
25deb4ba95
chore(deps): update llama.cpp to 6262d13e0b2da91f230129a93a996609a2fa2f2 (#3549)
chore(deps): update llama.cpp to 6262d13e0b2da91f230129a93a996609a2f5a2f2

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-09-16 10:29:20 +02:00
Ettore Di Giacinto
d51444d606
chore(deps): update llama.cpp (#3497)
* Apply llava patch

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-09-12 20:55:27 +02:00
Ettore Di Giacinto
b8e7a76524
chore(deps): update llama.cpp (#3438)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-08-31 01:21:45 +02:00
Ettore Di Giacinto
409e2d348e
chore(deps): bump llama.cpp, rename llama_add_bos_token (#3253)
deps(llama.cpp): bump, rename llama_add_bos_token

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-08-16 01:20:21 +02:00
Ettore Di Giacinto
abcf0ff000
chore: ⬆️ Update ggerganov/llama.cpp to 1e6f6554aa11fa10160a5fda689e736c3c34169f (#3189)
* arrow_up: Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama.cpp): adapt to upstream naming changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-08-07 01:10:21 +02:00
Ettore Di Giacinto
4e11ca55fd
chore: ⬆️ Update ggerganov/llama.cpp (#3166)
* arrow_up: Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama.cpp): adapt init function call

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-08-06 11:39:35 +02:00
Ettore Di Giacinto
bd900945f7
fix(llama.cpp): do not set anymore lora_base (#2999)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-24 12:35:52 +02:00
Ettore Di Giacinto
35561edb6e
feat(llama.cpp): support embeddings endpoints (#2871)
* feat(llama.cpp): add embeddings

Also enable embeddings by default for llama.cpp models

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(Makefile): prepare llama.cpp sources only once

Otherwise we keep cloning llama.cpp for each of the variants

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* do not set embeddings to false

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs: add embeddings to the YAML config reference

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-15 22:54:16 +02:00
Dave
405794d4ca
fix: speedup git submodule update with --single-branch (#2847)
add --single-branch to submodule update commands for speed

Signed-off-by: Dave Lee <dave@gray101.com>
2024-07-13 22:32:25 +02:00
Loric
a00e9a82ae
Update remaining git clones to git fetch (#2779)
Signed-off-by: Loric <117862619+LoricOSC@users.noreply.github.com>
2024-07-12 06:43:58 +00:00
cryptk
c047c19145
fix: make sure the GNUMake jobserver is passed to cmake for the llama.cpp build (#2697)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-07-02 08:46:59 +02:00
Ettore Di Giacinto
7b1e792732
deps(llama.cpp): bump to latest, update build variables (#2669)
* arrow_up: Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* deps(llama.cpp): update build variables to follow upstream

Update build recipes with https://github.com/ggerganov/llama.cpp/pull/8006

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable shared libs by default in llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable shared libs in llama.cpp Makefile

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable metal embedding for now, until it is tested

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(mac): explicitly enable metal

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* debug

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix typo

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-27 23:10:04 +02:00
Ettore Di Giacinto
a8bfb6f9c2
feat(options): add repeat_last_n (#2660)
feat(options): add repeat_last_n

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-26 14:58:50 +02:00
Ettore Di Giacinto
b783c811db
feat(build): only build llama.cpp relevant targets (#2659)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-26 14:58:38 +02:00
Ettore Di Giacinto
3a9408363b
deps(llama.cpp): update and adapt API changes (#2381)
deps(llama.cpp): update and rename function

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-23 01:02:11 +02:00
Ettore Di Giacinto
c89271b2e4
feat(llama.cpp): add distributed llama.cpp inferencing (#2324)
* feat(llama.cpp): support distributed llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: let tweak how chat messages are merged together

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Makefile: register to ALL_GRPC_BACKENDS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring, allow disable auto-detection of backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* minor fixups

Signed-off-by: mudler <mudler@localai.io>

* feat: add cmd to start rpc-server from llama.cpp

Signed-off-by: mudler <mudler@localai.io>

* ci: add ccache

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: mudler <mudler@localai.io>
2024-05-15 01:17:02 +02:00
Ettore Di Giacinto
e49ea0123b
feat(llama.cpp): add flash_attention and no_kv_offloading (#2310)
feat(llama.cpp): add flash_attn and no_kv_offload

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 19:07:51 +02:00
cryptk
28a421cb1d
feat: migrate python backends from conda to uv (#2215)
* feat: migrate diffusers backend from conda to uv

  - replace conda with UV for diffusers install (prototype for all
    extras backends)
  - add ability to build docker with one/some/all extras backends
    instead of all or nothing

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate autogtpq bark coqui from conda to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: convert exllama over to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate exllama2 to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate mamba to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate parler to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate petals to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: fix tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate rerankers to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate sentencetransformers to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: install uv for tests-linux

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: make sure file exists before installing on intel images

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate transformers backend to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate transformers-musicgen to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate vall-e-x to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate vllm to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add uv install to the rest of test-extra.yml

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: adjust file perms on all install/run/test scripts

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add missing acclerate dependencies

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add some more missing dependencies to python backends

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: parler tests venv py dir fix

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: correct filename for transformers-musicgen tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: adjust the pwd for valle tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: cleanup and optimization work for uv migration

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add setuptools to requirements-install for mamba

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: more size optimization work

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: make installs and tests more consistent, cleanup some deps

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: cleanup

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: mamba backend is cublas only

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: uncomment lines in makefile

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

---------

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-10 15:08:08 +02:00
Ettore Di Giacinto
530bec9c64
feat(llama.cpp): do not specify backends to autoload and add llama.cpp variants (#2232)
* feat(initializer): do not specify backends to autoload

We can simply try to autoload the backends extracted in the asset dir.
This will allow to build variants of the same backend (for e.g. with different instructions sets),
so to have a single binary for all the variants.

Signed-off-by: mudler <mudler@localai.io>

* refactor(prepare): refactor out llama.cpp prepare steps

Make it so are idempotent and that we can re-build

Signed-off-by: mudler <mudler@localai.io>

* [TEST] feat(build): build noavx version along

Signed-off-by: mudler <mudler@localai.io>

* build: make build parallel

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* build: do not override CMAKE_ARGS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* build: add fallback variant

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(huggingface-langchain): fail if no token is set

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(huggingface-langchain): rename

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: do not autoload local-store

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: give priority between the listed backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: mudler <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-04 17:56:12 +02:00
Ettore Di Giacinto
e843d7df0e
feat(grpc): return consumed token count and update response accordingly (#2035)
Fixes: #1920
2024-04-15 19:47:11 +02:00
cryptk
a8ebf6f575
fix: respect concurrency from parent build parameters when building GRPC (#2023)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-13 09:14:32 +02:00
Dave
ed5734ae25
test/fix: OSX Test Repair (#1843)
* test with gguf instead of ggml. Updates testPrompt to match? Adds debugging line to Dockerfile that I've found helpful recently.

* fix testPrompt slightly

* Sad Experiment: Test GH runner without metal?

* break apart CGO_LDFLAGS

* switch runner

* upstream llama.cpp disables Metal on Github CI!

* missed a dir from clean-tests

* CGO_LDFLAGS

* tmate failure + NO_ACCELERATE

* whisper.cpp has a metal fix

* do the exact opposite of the name of this branch, but keep it around for unrelated fixes?

* add back newlines

* add tmate to linux for testing

* update fixtures

* timeout for tmate
2024-03-18 19:19:43 +01:00
Ettore Di Giacinto
fa9e330fc6
fix(llama.cpp): fix eos without cache (#1852) 2024-03-18 18:59:24 +01:00
cryptk
020ce29cd8
fix(make): allow to parallelize jobs (#1845)
* fix: clean up Makefile dependencies to allow for parallel builds

* refactor: remove old unused backend from Makefile

* fix: finish removing legacy backend, update piper

* fix: I broke llama... I fixed llama

* feat: give the tests and builds a few threads

* fix: ensure libraries are replaced before build, add dropreplace target

* Fix image build workflows
2024-03-17 15:39:20 +01:00
Dave
db199f61da
fix: osx build default.metallib (#1837)
fix: osx build default.metallib (#1837)
* port osx fix from refactor pr to slim pr
* manually bump llama.cpp version to unstick CI?
2024-03-15 08:18:58 +00:00
Dave
45d520f913
fix: OSX Build Files for llama.cpp (#1836)
bot ate my changes, seperate branch
2024-03-14 23:07:47 +01:00
cryptk
b423af001d
fix: the correct BUILD_TYPE for OpenCL is clblas (with no t) (#1828) 2024-03-14 08:39:21 +01:00
Ettore Di Giacinto
bc5f5aa538
deps(llama.cpp): update (#1759)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-26 13:18:44 +01:00
Ettore Di Giacinto
8292781045
deps(llama.cpp): update, support Gemma models (#1734)
deps(llama.cpp): update

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-21 17:23:38 +01:00
Ettore Di Giacinto
54ec6348fa
deps(llama.cpp): update (#1714)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-21 11:35:44 +01:00
Ettore Di Giacinto
c56b6ddb1c
fix(llama.cpp): disable infinite context shifting (#1704)
Infinite context loop might as well trigger an infinite loop of context
shifting if the model hallucinates and does not stop answering.
This has the unpleasant effect that the predicion never terminates,
which is the case especially on small models which tends to hallucinate.

Workarounds https://github.com/mudler/LocalAI/issues/1333 by removing
context-shifting.

See also upstream issue: https://github.com/ggerganov/llama.cpp/issues/3969
2024-02-13 21:17:21 +01:00
Ettore Di Giacinto
1c57f8d077
feat(sycl): Add support for Intel GPUs with sycl (#1647) (#1660)
* feat(sycl): Add sycl support (#1647)

* onekit: install without prompts

* set cmake args only in grpc-server

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* cleanup

* fixup sycl source env

* Cleanup docs

* ci: runs on self-hosted

* fix typo

* bump llama.cpp

* llama.cpp: update server

* adapt to upstream changes

* adapt to upstream changes

* docs: add sycl

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-01 19:21:52 +01:00
Ettore Di Giacinto
697c769b64
fix(llama.cpp): enable cont batching when parallel is set (#1622)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-01-21 14:59:48 +01:00
Sebastian
eaf85a30f9
fix(llama.cpp): Enable parallel requests (#1616)
integrate changes from llama.cpp

Signed-off-by: Sebastian <tauven@gmail.com>
2024-01-21 09:56:14 +01:00
Dionysius
441e2965ff
move BUILD_GRPC_FOR_BACKEND_LLAMA logic to makefile: errors in this section now immediately fail the build (#1576)
* move BUILD_GRPC_FOR_BACKEND_LLAMA option to makefile

* review: oversight, fixup cmake_args

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Dionysius <1341084+dionysius@users.noreply.github.com>

---------

Signed-off-by: Dionysius <1341084+dionysius@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-01-13 10:08:26 +01:00
Ettore Di Giacinto
fd48cb6506
deps(llama.cpp): update and sync grpc server (#1527)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-01-01 14:39:31 +01:00
Chris Natale
e2311a145c
Fix: Set proper Homebrew install location for x86 Macs (#1510)
* set proper Homebrew install location for x86 Macs

* fix: remove prior conditional that my logic replaces
2023-12-30 12:37:26 +01:00
Ettore Di Giacinto
fb6a5bc620
update(llama.cpp): update server, correctly propagate LLAMA_VERSION (#1440)
* fix(Makefile): correctly propagate LLAMA_VERSION

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* update grpc-server.cpp

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2023-12-15 08:26:48 +01:00
Ettore Di Giacinto
ad0e30bca5
refactor: move backends into the backends directory (#1279)
* refactor: move backends into the backends directory

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor: move main close to implementation for every backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-13 22:40:16 +01:00
Ettore Di Giacinto
803a0ac02a
feat(llama.cpp): support lora with scale and yarn (#1277)
* feat(llama.cpp): support lora with scale

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(llama.cpp): support yarn

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-11 18:40:48 +01:00
Ettore Di Giacinto
0eae727366
🔥 add LaVA support and GPT vision API, Multiple requests for llama.cpp, return JSON types (#1254)
* wip

* wip

* Make it functional

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* wip

* Small fixups

* do not inject space on role encoding, encode img at beginning of messages

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add examples/config defaults

* Add include dir of current source dir

* cleanup

* fixes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

* Revert "fixups"

This reverts commit f1a4731cca.

* fixes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-11-11 13:14:59 +01:00
Diego
e7fa2e06f8
Fixes the bug 1196 (#1232)
* Current state of the branch.

* Now gRPC is build only when the BUILD_GRPC_FOR_BACKEND_LLAMA variable is defined.

* Now the local compilation of gRPC is executed on BUILD_GRPC_FOR_BACKEND_LLAMA.

* Revised the Makefile.

* Removed replace directives in go.mod.

---------

Signed-off-by: Diego <38375572+diego-minguzzi@users.noreply.github.com>
Co-authored-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2023-11-06 19:07:46 +01:00
Ettore Di Giacinto
f227e918f9
feat(llama.cpp): Bump llama.cpp, adapt grpc server (#1211)
* feat(llama.cpp): Bump llama.cpp, adapt grpc server

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-10-25 20:56:25 +02:00
Dave
b839eb80a1
Fix backend/cpp/llama CMakeList.txt on OSX (#1212)
* Fix backend/cpp/llama CMakeList.txt on OSX - detect OSX and use homebrew libraries

* sneak a logging fix in too for gallery debugging

* additional logging
2023-10-25 20:53:26 +02:00
Ettore Di Giacinto
004baaa30f feat(llama.cpp): update
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-10-21 11:04:03 +02:00
Ettore Di Giacinto
128694213f
feat: llama.cpp gRPC C++ backend (#1170)
* wip: llama.cpp c++ gRPC server

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make it work, attach it to the build process

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* update deps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: add protobuf dep

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* try fix protobuf on cmake

* cmake: workarounds

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add packages

* cmake: use fixed version of grpc

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* cmake(grpc): install locally

* install grpc

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* install required deps for grpc on debian bullseye

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* debug

* debug

* Fixups

* no need to install cmake manually

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: fixup macOS

* use brew whenever possible

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* macOS fixups

* debug

* fix container build

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* workaround

* try mac

https://stackoverflow.com/questions/23905661/on-mac-g-clang-fails-to-search-usr-local-include-and-usr-local-lib-by-def

* Disable temp. arm64 docker image builds

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-10-16 21:46:29 +02:00