Commit graph

478 commits

Author SHA1 Message Date
LocalAI [bot]
a7e155240b
chore: ⬆️ Update ggml-org/llama.cpp to e57f52334b2e8436a94f7e332462dfc63a08f995 (#7848)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-04 10:27:45 +01:00
coffeerunhobby
666d110714
fix: Prevent BMI2 instruction crash on AVX-only CPUs (#7817)
* Fix: Prevent BMI2 instruction crash on AVX-only CPUs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: apply no-bmi flags on non-darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-03 08:36:55 +01:00
LocalAI [bot]
641606ae93
chore: ⬆️ Update ggml-org/llama.cpp to 706e3f93a60109a40f1224eaf4af0d59caa7c3ae (#7836)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-02 21:26:37 +00:00
Ettore Di Giacinto
5f6c941399
fix(llama.cpp/mmproj): fix loading mmproj in nested sub-dirs different from model path (#7832)
fix(mmproj): fix loading mmproj in nested sub-dirs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-02 20:17:30 +01:00
LocalAI [bot]
949de04052
chore: ⬆️ Update ggml-org/llama.cpp to ced765be44ce173c374f295b3c6f4175f8fd109b (#7822)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-02 08:44:49 +01:00
LocalAI [bot]
bc3e8793ed
chore: ⬆️ Update ggml-org/llama.cpp to 13814eb370d2f0b70e1830cc577b6155b17aee47 (#7809)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 23:04:01 +01:00
LocalAI [bot]
218f3a126a
chore: ⬆️ Update ggml-org/llama.cpp to 0f89d2ecf14270f45f43c442e90ae433fd82dab1 (#7795)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 08:53:41 +01:00
LocalAI [bot]
bc8ec5cb39
chore: ⬆️ Update ggml-org/llama.cpp to c9a3b40d6578f2381a1373d10249403d58c3c5bd (#7778)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-30 08:27:16 +01:00
LocalAI [bot]
1a6fd0f7fc
chore: ⬆️ Update ggml-org/llama.cpp to 4ffc47cb2001e7d523f9ff525335bbe34b1a2858 (#7760)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-28 21:10:39 +00:00
LocalAI [bot]
c95c482f36
chore: ⬆️ Update ggml-org/llama.cpp to a4bf35889eda36d3597cd0f8f333f5b8a2fcaefc (#7751)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-27 21:09:12 +00:00
LocalAI [bot]
ddf0281785
chore: ⬆️ Update ggml-org/llama.cpp to 7ac8902133da6eb390c4d8368a7d252279123942 (#7740)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-26 21:44:34 +00:00
LocalAI [bot]
86c68c9623
chore: ⬆️ Update ggml-org/llama.cpp to 85c40c9b02941ebf1add1469af75f1796d513ef4 (#7731)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-25 21:10:28 +00:00
LocalAI [bot]
2fe6e278c8
chore: ⬆️ Update ggml-org/llama.cpp to c18428423018ed214c004e6ecaedb0cbdda06805 (#7718)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-25 10:00:40 +01:00
Ettore Di Giacinto
0a168830ea
chore(deps): Bump llama.cpp to '5b6c9bc0f3c8f55598b9999b65aff7ce4119bc15' and refactor usage of base params (#7706)
* chore(deps): Bump llama.cpp to '5b6c9bc0f3c8f55598b9999b65aff7ce4119bc15' and refactor usage of base params

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: update AGENTS.md

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-24 00:28:27 +01:00
Ettore Di Giacinto
fc6057a952
chore(deps): bump llama.cpp to '0e1ccf15c7b6d05c720551b537857ecf6194d420' (#7684)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-22 09:50:42 +01:00
LocalAI [bot]
38cde81ff4
chore: ⬆️ Update ggml-org/llama.cpp to 52ab19df633f3de5d4db171a16f2d9edd2342fec (#7665)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-20 21:09:15 +00:00
LocalAI [bot]
626057bcca
chore: ⬆️ Update ggml-org/llama.cpp to ce734a8a2f9fb6eb4f0383ab1370a1b0014ab787 (#7654)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-19 21:15:39 +00:00
LocalAI [bot]
f25ac00bca
chore: ⬆️ Update ggml-org/llama.cpp to f9ec8858edea4a0ecfea149d6815ebfb5ecc3bcd (#7642)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-18 21:17:14 +00:00
LocalAI [bot]
5515119a7e
chore: ⬆️ Update ggml-org/llama.cpp to d37fc935059211454e9ad2e2a44e8ed78fd6d1ce (#7629)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-18 09:07:09 +01:00
LocalAI [bot]
14bb65b57b
chore: ⬆️ Update ggml-org/llama.cpp to ef83fb8601229ff650d952985be47e82d644bfaa (#7611)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-17 08:32:42 +01:00
Ettore Di Giacinto
2387b266d8
chore(llama.cpp): Add Missing llama.cpp Options to gRPC Server (#7584)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-15 21:55:20 +01:00
LocalAI [bot]
0f5cc4c07b
chore: ⬆️ Update ggml-org/llama.cpp to 5c8a717128cc98aa9e5b1c44652f5cf458fd426e (#7573)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-14 22:21:54 +01:00
LocalAI [bot]
3e4e6777d8
chore: ⬆️ Update ggml-org/llama.cpp to 5266379bcae74214af397f36aa81b2a08b15d545 (#7563)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-14 11:41:10 +01:00
Simon Redman
5de539ab07
fix(7355): Update llama-cpp grpc for v3 interface (#7566)
* fix(7355): Update llama-cpp grpc for v3 interface

Signed-off-by: Simon Redman <simon@ergotech.com>

* feat(llama-gprc): Trim whitespace from servers list

Signed-off-by: Simon Redman <simon@ergotech.com>

* Trim trailing spaces in grpc-server.cpp

Signed-off-by: Simon Redman <simon@ergotech.com>

---------

Signed-off-by: Simon Redman <simon@ergotech.com>
2025-12-14 11:40:33 +01:00
Ettore Di Giacinto
0b130fb811
fix(llama.cpp): handle corner cases with tool array content (#7528)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-12 08:15:45 +01:00
LocalAI [bot]
0771a2d3ec
chore: ⬆️ Update ggml-org/llama.cpp to a81a569577cc38b32558958b048228150be63eae (#7529)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-11 21:55:44 +00:00
LocalAI [bot]
72621a1d1c
chore: ⬆️ Update ggml-org/llama.cpp to 4dff236a522bd0ed949331d6cb1ee2a1b3615c35 (#7508)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-11 08:15:38 +01:00
LocalAI [bot]
ef44ace73f
chore: ⬆️ Update ggml-org/llama.cpp to 086a63e3a5d2dbbb7183a74db453459e544eb55a (#7496)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-10 12:05:13 +01:00
Ettore Di Giacinto
74ee1463fe
chore(deps/llama-cpp): bump to '2fa51c19b028180b35d316e9ed06f5f0f7ada2c1' (#7484)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-09 15:41:37 +01:00
LocalAI [bot]
5610384d8a
chore: ⬆️ Update ggml-org/llama.cpp to db97837385edfbc772230debbd49e5efae843a71 (#7447)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-07 08:32:35 +01:00
LocalAI [bot]
edf7141b9b
chore: ⬆️ Update ggml-org/llama.cpp to 8160b38a5fa8a25490ca33ffdd200cda51405688 (#7438)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-06 13:35:24 +01:00
Ettore Di Giacinto
024aa6a55b
chore(deps): bump llama.cpp to 'bde188d60f58012ada0725c6dd5ba7c69fe4dd87' (#7434)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-05 00:17:35 +01:00
LocalAI [bot]
ca2e878aaf
chore: ⬆️ Update ggml-org/llama.cpp to e9f9483464e6f01d843d7f0293bd9c7bc6b2221c (#7421)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-04 11:54:01 +01:00
LocalAI [bot]
957eea3da3
chore: ⬆️ Update ggml-org/llama.cpp to 61bde8e21f4a1f9a98c9205831ca3e55457b4c78 (#7415)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-03 16:27:12 +01:00
LocalAI [bot]
665441ca94
chore: ⬆️ Update ggml-org/llama.cpp to ec18edfcba94dacb166e6523612fc0129cead67a (#7406)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-02 07:59:52 +01:00
Ettore Di Giacinto
e3bcba5c45
chore: ⬆️ Update ggml-org/llama.cpp to 7f8ef50cce40e3e7e4526a3696cb45658190e69a (#7402)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-01 07:50:40 +01:00
LocalAI [bot]
0824fd8efd
chore: ⬆️ Update ggml-org/llama.cpp to 8c32d9d96d9ae345a0150cae8572859e9aafea0b (#7395)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-30 09:06:18 +01:00
Ettore Di Giacinto
468ac608f3
chore(deps): bump llama.cpp to 'd82b7a7c1d73c0674698d9601b1bbb0200933f29' (#7392)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-29 08:58:07 +01:00
LocalAI [bot]
1a53fd2b9b
chore: ⬆️ Update ggml-org/llama.cpp to 4abef75f2cf2eee75eb5083b30a94cf981587394 (#7382)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-28 00:08:27 +01:00
LocalAI [bot]
b5f4f4ac6d
chore: ⬆️ Update ggml-org/llama.cpp to eec1e33a9ed71b79422e39cc489719cf4f8e0777 (#7363)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-27 09:17:25 +01:00
Ettore Di Giacinto
7a94d237c4
chore(deps): bump llama.cpp to '583cb83416467e8abf9b37349dcf1f6a0083745a (#7358)
chore(deps): bump llama.cpp to '583cb83416467e8abf9b37349dcf1f6a0083745a'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-26 08:23:21 +01:00
LocalAI [bot]
f6d2a52cd5
chore: ⬆️ Update ggml-org/llama.cpp to 0c7220db56525d40177fcce3baa0d083448ec813 (#7337)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-24 09:11:38 +01:00
LocalAI [bot]
05a00b2399
chore: ⬆️ Update ggml-org/llama.cpp to 3f3a4fb9c3b907c68598363b204e6f58f4757c8c (#7336)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-22 21:53:40 +00:00
LocalAI [bot]
bdfe8431fa
chore: ⬆️ Update ggml-org/llama.cpp to 23bc779a6e58762ea892eca1801b2ea1b9050c00 (#7331)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-22 08:44:01 +01:00
Ettore Di Giacinto
e88db7d142
fix(llama.cpp): handle corner cases with tool content (#7324)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-21 09:21:49 +01:00
LocalAI [bot]
b7b8a0a748
chore: ⬆️ Update ggml-org/llama.cpp to dd0f3219419b24740864b5343958a97e1b3e4b26 (#7322)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-21 08:11:47 +01:00
LocalAI [bot]
bfa07df7cd
chore: ⬆️ Update ggml-org/llama.cpp to 7d77f07325985c03a91fa371d0a68ef88a91ec7f (#7314)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-20 07:58:42 +01:00
Ettore Di Giacinto
3152611184
chore(deps): bump llama.cpp to '10e9780154365b191fb43ca4830659ef12def80f (#7311)
chore(deps): bump llama.cpp to '10e9780154365b191fb43ca4830659ef12def80f'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-19 14:42:11 +01:00
LocalAI [bot]
4278506876
chore: ⬆️ Update ggml-org/llama.cpp to cb623de3fc61011e5062522b4d05721a22f2e916 (#7301)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-18 07:43:57 +01:00
LocalAI [bot]
fb834805db
chore: ⬆️ Update ggml-org/llama.cpp to 80deff3648b93727422461c41c7279ef1dac7452 (#7287)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-17 07:51:08 +01:00
Ettore Di Giacinto
d7f9f3ac93
feat: add support to logitbias and logprobs (#7283)
* feat: add support to logprobs in results

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: add support to logitbias

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-16 13:27:36 +01:00
LocalAI [bot]
d1a0dd10e6
chore: ⬆️ Update ggml-org/llama.cpp to 662192e1dcd224bc25759aadd0190577524c6a66 (#7277)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-16 08:41:12 +01:00
LocalAI [bot]
a09d49da43
chore: ⬆️ Update ggml-org/llama.cpp to 9b17d74ab7d31cb7d15ee7eec1616c3d825a84c0 (#7273)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-15 00:05:39 +01:00
Ettore Di Giacinto
03e9f4b140
fix: handle tool errors (#7271)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-14 17:23:56 +01:00
Ettore Di Giacinto
7129409bf6
chore(deps): bump llama.cpp to c4abcb2457217198efdd67d02675f5fddb7071c2 (#7266)
* chore(deps): bump llama.cpp to '92bb442ad999a0d52df0af2730cd861012e8ac5c'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* DEBUG

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Bump

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test/debug

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Revert "DEBUG"

This reverts commit 2501ca3ff242076d623c13c86b3d6afcec426281.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-14 12:16:52 +01:00
Ettore Di Giacinto
3728552e94
feat: import models via URI (#7245)
* feat: initial hook to install elements directly

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP: ui changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Move HF api client to pkg

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add simple importer for gguf files

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add opcache

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* wire importers to CLI

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add omitempty to config fields

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add MLX importer

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small refactors to star to use HF for discovery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Common preferences

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add support to bare HF repos

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(importer/llama.cpp): add support for mmproj files

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add mmproj quants to common preferences

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix vlm usage in tokenizer mode with llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-12 20:48:56 +01:00
Mikhail Khludnev
04fe0b0da8
fix(reranker): llama-cpp sort score desc, crop top_n (#7211)
Signed-off-by: Mikhail Khludnev <mkhl@apache.org>
2025-11-12 09:13:01 +01:00
LocalAI [bot]
fae93e5ba2
chore: ⬆️ Update ggml-org/llama.cpp to 7d019cff744b73084b15ca81ba9916f3efab1223 (#7247)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-11 21:31:01 +00:00
LocalAI [bot]
5f4663252d
chore: ⬆️ Update ggml-org/llama.cpp to 13730c183b9e1a32c09bf132b5367697d6c55048 (#7232)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-11 00:03:01 +01:00
LocalAI [bot]
e42f0f7e79
chore: ⬆️ Update ggml-org/llama.cpp to b8595b16e69e3029e06be3b8f6635f9812b2bc3f (#7210)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-09 23:56:27 +01:00
Ettore Di Giacinto
679d43c2f5
feat: respect context and add request cancellation (#7187)
* feat: respect context

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* workaround fasthttp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): allow to abort call

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Refactor

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: improving error

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Respect context also with MCP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Tie to both contexts

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Make detection more robust

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-09 18:19:19 +01:00
LocalAI [bot]
f678c6b0a9
chore: ⬆️ Update ggml-org/llama.cpp to 333f2595a3e0e4c0abf233f2f29ef1710acd134d (#7201)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-08 21:06:17 +00:00
LocalAI [bot]
8ac7e28c12
chore: ⬆️ Update ggml-org/llama.cpp to 65156105069fa86a4a81b6cb0e8cb583f6420677 (#7184)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-08 09:07:44 +01:00
Ettore Di Giacinto
02cc8cbcaa
feat(llama.cpp): consolidate options and respect tokenizer template when enabled (#7120)
* feat(llama.cpp): expose env vars as options for consistency

This allows to configure everything in the YAML file of the model rather
than have global configurations

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(llama.cpp): respect usetokenizertemplate and use llama.cpp templating system to process messages

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Detect template exists if use tokenizer template is enabled

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Better recognization of chat

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixes to support tool calls while using templates from tokenizer

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop template guessing, fix passing tools to tokenizer

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Extract grammar and other options from chat template, add schema struct

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Automatically set use_jinja

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Cleanups, identify by default gguf models for chat

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-07 21:23:50 +01:00
LocalAI [bot]
8f7c499f17
chore: ⬆️ Update ggml-org/llama.cpp to 7f09a680af6e0ef612de81018e1d19c19b8651e8 (#7156)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-07 08:38:56 +01:00
LocalAI [bot]
db9957b94e
chore: ⬆️ Update ggml-org/llama.cpp to a44d77126c911d105f7f800c17da21b2a5b112d1 (#7125)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-05 21:22:04 +00:00
LocalAI [bot]
98158881c2
chore: ⬆️ Update ggml-org/llama.cpp to ad51c0a720062a04349c779aae301ad65ca4c856 (#7098)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-04 21:19:58 +00:00
LocalAI [bot]
e2cb44ef37
chore: ⬆️ Update ggml-org/llama.cpp to c5023daf607c578d6344c628eb7da18ac3d92d32 (#7069)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-04 09:26:10 +01:00
LocalAI [bot]
2cad2c8591
chore: ⬆️ Update ggml-org/llama.cpp to cd5e3b57541ecc52421130742f4d89acbcf77cd4 (#7023)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-02 21:24:19 +00:00
Ettore Di Giacinto
424acd66ad
feat(llama.cpp): allow to set cache-ram and ctx_shift (#7009)
* feat(llama.cpp): allow to set cache-ram and ctx_shift

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-02 17:33:29 +01:00
LocalAI [bot]
f85e2dd1b8
chore: ⬆️ Update ggml-org/llama.cpp to 2f68ce7cfd20e9e7098514bf730e5389b7bba908 (#6998)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-02 09:44:37 +01:00
LocalAI [bot]
9ecfdc5938
chore: ⬆️ Update ggml-org/llama.cpp to 31c511a968348281e11d590446bb815048a1e912 (#6970)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-31 21:04:53 +00:00
LocalAI [bot]
0ddb2e8dcf
chore: ⬆️ Update ggml-org/llama.cpp to 4146d6a1a6228711a487a1e3e9ddd120f8d027d7 (#6945)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-31 14:51:03 +00:00
LocalAI [bot]
1e5b9135df
chore: ⬆️ Update ggml-org/llama.cpp to 16724b5b6836a2d4b8936a5824d2ff27c52b4517 (#6925)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-30 21:07:33 +00:00
LocalAI [bot]
dd21a0d2f9
chore: ⬆️ Update ggml-org/llama.cpp to 3464bdac37027c5e9661621fc75ffcef3c19c6ef (#6896)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-30 14:17:58 +01:00
LocalAI [bot]
fb825a2708
chore: ⬆️ Update ggml-org/llama.cpp to 851553ea6b24cb39fd5fd188b437d777cb411de8 (#6869)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-29 08:16:55 +01:00
LocalAI [bot]
e13cb8346d
chore: ⬆️ Update ggml-org/llama.cpp to 5a4ff43e7dd049e35942bc3d12361dab2f155544 (#6841)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-28 08:48:21 +01:00
LocalAI [bot]
8225697139
chore: ⬆️ Update ggml-org/llama.cpp to bbac6a26b2bd7f7c1f0831cb1e7b52734c66673b (#6783)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-27 08:45:14 +01:00
LocalAI [bot]
192589a17f
chore: ⬆️ Update ggml-org/llama.cpp to 5d195f17bc60eacc15cfb929f9403cf29ccdf419 (#6757)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-25 21:14:43 +00:00
LocalAI [bot]
ed4ac0b61e
chore: ⬆️ Update ggml-org/llama.cpp to 55945d2ef51b93821d4b6f4a9b994393344a90db (#6729)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-24 21:11:56 +00:00
LocalAI [bot]
b66bd2706f
chore: ⬆️ Update ggml-org/llama.cpp to 0bf47a1dbba4d36f2aff4e8c34b06210ba34e688 (#6703)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-23 21:10:51 +00:00
Chakib Benziane
32c0ab3a7f
fix: properly terminate llama.cpp kv_overrides array with empty key + updated doc (#6672)
* fix: properly terminate kv_overrides array with empty key

The llama model loading function expects KV overrides to be terminated
with an empty key (key[0] == 0). Previously, the kv_overrides vector was
not being properly terminated, causing an assertion failure.

This commit ensures that after parsing all KV override strings, we add a
final terminating entry with an empty key to satisfy the C-style array
termination requirement. This fixes the assertion error and allows the
model to load correctly with custom KV overrides.

Fixes #6643

- Also included a reference to the usage of the `overrides` option in
  the advanced-usage section.
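
A minimal sketch of the termination requirement described above, under stated assumptions: `KvOverride`, `build_overrides`, and `count_overrides` are hypothetical stand-ins written for illustration, not the actual llama.cpp types or the code changed in this commit. It shows a C-style consumer that walks the array until it finds an entry with an empty key, and a builder that appends that terminating entry after all overrides are parsed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a C-style KV override entry (not the real llama.cpp struct).
struct KvOverride {
    char key[128];
    long long value;
};

// Copy each (key, value) pair into a fixed-size entry, then append a final
// entry whose key is empty so a C-style reader knows where the array ends.
// Assumption: this sketch always appends the terminator, even for empty input.
std::vector<KvOverride> build_overrides(
        const std::vector<std::pair<std::string, long long>>& items) {
    std::vector<KvOverride> out;
    for (const auto& item : items) {
        KvOverride kv{};  // zero-initialized, so the key is always NUL-terminated
        std::strncpy(kv.key, item.first.c_str(), sizeof(kv.key) - 1);
        kv.value = item.second;
        out.push_back(kv);
    }
    out.push_back(KvOverride{});  // terminating entry: key[0] == 0
    return out;
}

// C-style consumer: counts entries until it reaches the empty key.
// Without the terminator this loop would read past the end of the array.
std::size_t count_overrides(const KvOverride* entries) {
    assert(entries != nullptr);
    std::size_t n = 0;
    while (entries[n].key[0] != 0) {
        ++n;
    }
    return n;
}
```

Passing `build_overrides(...).data()` to `count_overrides` returns the number of real overrides, because the walk stops at the empty-key entry rather than relying on a separate length argument.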

Signed-off-by: blob42 <contact@blob42.xyz>

* doc: document the `overrides` option

---------

Signed-off-by: blob42 <contact@blob42.xyz>
2025-10-23 09:31:55 +02:00
LocalAI [bot]
24ce79a67c
chore: ⬆️ Update ggml-org/llama.cpp to a2e0088d9242bd9e57f8b852b05a6e47843b5a45 (#6676)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-22 21:05:27 +00:00
LocalAI [bot]
7a3d9ee5c1
chore: ⬆️ Update ggml-org/llama.cpp to 03792ad93609fc67e41041c6347d9aa14e5e0d74 (#6651)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-21 21:15:27 +00:00
LocalAI [bot]
4b30846d57
chore: ⬆️ Update ggml-org/llama.cpp to 84bf3c677857279037adf67cdcfd89eaa4ca9281 (#6621)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-21 09:22:03 +02:00
LocalAI [bot]
69adc46936
chore: ⬆️ Update ggml-org/llama.cpp to cec5edbcaec69bbf6d5851cabce4ac148be41701 (#6576)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-19 21:31:47 +00:00
LocalAI [bot]
f94b89c1b5
chore: ⬆️ Update ggml-org/llama.cpp to ee09828cb057460b369576410601a3a09279e23c (#6550)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-18 21:09:46 +00:00
LocalAI [bot]
cce185b345
chore: ⬆️ Update ggml-org/llama.cpp to 66b0dbcb2d462e7b70ba5a69ee8c3899ac2efb1c (#6520)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-17 21:14:57 +00:00
LocalAI [bot]
7bac49fb87
chore: ⬆️ Update ggml-org/llama.cpp to 1bb4f43380944e94c9a86e305789ba103f5e62bd (#6488)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-17 09:01:11 +02:00
LocalAI [bot]
9680a0b0fe
chore: ⬆️ Update ggml-org/llama.cpp to 466c1911ab736f0b7366127edee99f8ee5687417 (#6463)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-15 23:21:35 +02:00
LocalAI [bot]
7ed3666d2e
chore: ⬆️ Update ggml-org/llama.cpp to fa882fd2b1bcb663de23af06fdc391489d05b007 (#6454)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-14 21:08:17 +00:00
LocalAI [bot]
2e2e89e499
chore: ⬆️ Update ggml-org/llama.cpp to e60f241eacec42d3bd7c9edd37d236ebf35132a8 (#6452)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-14 09:06:39 +02:00
LocalAI [bot]
3a8fbb698e
chore: ⬆️ Update ggml-org/llama.cpp to a31cf36ad946a13b3a646bf0dadf2a481e89f944 (#6440)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-13 07:54:03 +02:00
LocalAI [bot]
c856d7dc73
chore: ⬆️ Update ggml-org/llama.cpp to 11f0af5504252e453d57406a935480c909e3ff37 (#6437)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-12 09:02:31 +02:00
LocalAI [bot]
fa6bbd9fa2
chore: ⬆️ Update ggml-org/llama.cpp to e60f01d941bc5b7fae62dd57fee4cec76ec0ea6e (#6434)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-11 09:30:48 +02:00
Ettore Di Giacinto
cd1e1124ea
fix(llama.cpp): correctly set grammar triggers (#6432)
* fix(llama.cpp): correctly set grammar triggers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Do not enable lazy by default

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-10 19:50:17 +02:00
Ettore Di Giacinto
791bc769c1
chore(deps): bump llama.cpp to '1deee0f8d494981c32597dca8b5f8696d399b0f2' (#6421)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-10-10 09:51:22 +02:00
LocalAI [bot]
336257cc3c
chore: ⬆️ Update ggml-org/llama.cpp to 9d0882840e6c3fb62965d03af0e22880ea90e012 (#6410)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-09 08:17:10 +02:00
LocalAI [bot]
5e1d809904
chore: ⬆️ Update ggml-org/llama.cpp to aeaf8a36f06b5810f5ae4bbefe26edb33925cf5e (#6408)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-08 08:01:08 +02:00
LocalAI [bot]
6f17c260a7
chore: ⬆️ Update ggml-org/llama.cpp to 3df2244df40c67dfd6ad548b40ccc507a066af2b (#6401)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-07 08:44:02 +02:00
LocalAI [bot]
d4d42740c8
chore: ⬆️ Update ggml-org/llama.cpp to ca71fb9b368e3db96e028f80c4c9df6b6b370edd (#6385)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-06 08:24:38 +02:00
LocalAI [bot]
6b2c8277c2
chore: ⬆️ Update ggml-org/llama.cpp to 86df2c9ae4f2f1ee63d2558a9dc797b98524639b (#6382)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-05 08:52:24 +02:00
LocalAI [bot]
6d5d3ebcf6
chore: ⬆️ Update ggml-org/llama.cpp to 128d522c04286e019666bd6ee4d18e3fbf8772e2 (#6379)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-04 19:00:50 +02:00
LocalAI [bot]
dd927c36f6
chore: ⬆️ Update ggml-org/llama.cpp to d64c8104f090b27b1f99e8da5995ffcfa6b726e2 (#6371)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-10-02 21:09:00 +00:00
LocalAI [bot]
052f42e926
chore: ⬆️ Update ggml-org/llama.cpp to 1fe4e38cc20af058ed320bd46cac934991190056 (#6368)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-10-02 16:29:57 +02:00
LocalAI [bot]
04fecd634a
chore: ⬆️ Update ggml-org/llama.cpp to b2ba81dbe07b6dbea9c96b13346c66973dede32c (#6366)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-30 21:13:23 +00:00
LocalAI [bot]
33c14198db
chore: ⬆️ Update ggml-org/llama.cpp to 5f7e166cbf7b9ca928c7fad990098ef32358ac75 (#6355)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-30 14:41:16 +02:00
LocalAI [bot]
dca685f784
chore: ⬆️ Update ggml-org/llama.cpp to bd0af02fc96c2057726f33c0f0daf7bb8f3e462a (#6352)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-28 21:08:50 +00:00
LocalAI [bot]
84ebf2a2c9
chore: ⬆️ Update ggml-org/llama.cpp to 4807e8f96a61b2adccebd5e57444c94d18de7264 (#6350)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-28 00:33:46 +02:00
Ettore Di Giacinto
ce5662ba90
chore(deps): bump llama.cpp to '72b24d96c6888c609d562779a23787304ae4609c' (#6349)
* chore(deps): bump llama.cpp to '72b24d96c6888c609d562779a23787304ae4609c'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable OPENSSL (just introduced upstream)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-27 13:55:51 +02:00
Ettore Di Giacinto
9878f27813
chore(deps): bump llama.cpp to '835b2b915c52bcabcd688d025eacff9a07b65f52' (#6347)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-26 23:26:14 +02:00
jongames
f2b9452ec4
fix: reranking models limited to 512 tokens in llama.cpp backend (#6344)
Fix reranking models being limited to 512 tokens input in llama.cpp backend

Signed-off-by: JonGames <18472148+jongames@users.noreply.github.com>
2025-09-25 23:32:07 +00:00
LocalAI [bot]
238c68c57b
chore: ⬆️ Update ggml-org/llama.cpp to 4ae88d07d026e66b41e85afece74e88af54f4e66 (#6339)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-25 08:47:02 +02:00
LocalAI [bot]
737248256e
chore: ⬆️ Update ggml-org/llama.cpp to 1d0125bcf1cbd7195ad0faf826a20bc7cec7d3f4 (#6335)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-22 21:13:34 +00:00
LocalAI [bot]
6afcb932b7
chore: ⬆️ Update ggml-org/llama.cpp to da30ab5f8696cabb2d4620cdc0aa41a298c54fd6 (#6321)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-21 21:28:27 +00:00
LocalAI [bot]
e74ade9ebb
chore: ⬆️ Update ggml-org/llama.cpp to 7f766929ca8e8e01dcceb1c526ee584f7e5e1408 (#6319)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-20 21:05:25 +00:00
LocalAI [bot]
75eb98f8bd
chore: ⬆️ Update ggml-org/llama.cpp to f432d8d83e7407073634c5e4fd81a3d23a10827f (#6316)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-20 09:41:45 +02:00
LocalAI [bot]
ae3d8fb0c4
chore: ⬆️ Update ggml-org/llama.cpp to 3edd87cd055a45d885fa914d879d36d33ecfc3e1 (#6308)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-18 21:09:14 +00:00
LocalAI [bot]
902e47f0b0
chore: ⬆️ Update ggml-org/llama.cpp to 0320ac5264279d74f8ee91bafa6c90e9ab9bbb91 (#6306)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-18 09:27:18 +02:00
LocalAI [bot]
e4ac7b14a3
chore: ⬆️ Update ggml-org/llama.cpp to 8ff206097c2bf3ca1c7aa95f9d6db779fc7bdd68 (#6292)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-16 21:09:47 +00:00
LocalAI [bot]
e89b5cc0e3
chore: ⬆️ Update ggml-org/llama.cpp to b907255f4bd169b0dc7dca9553b4c54af5170865 (#6287)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-16 08:10:37 +02:00
LocalAI [bot]
2a18206033
chore: ⬆️ Update ggml-org/llama.cpp to 6c019cb04e86e2dacfe62ce7666c64e9717dde1f (#6265)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-14 21:19:41 +00:00
LocalAI [bot]
39798d734e
chore: ⬆️ Update ggml-org/llama.cpp to 0fa154e3502e940df914f03b41475a2b80b985b0 (#6263)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-14 19:59:58 +00:00
Ettore Di Giacinto
6410c99bf2
fix(llama-cpp): correctly calculate embeddings (#6259)
* chore(tests): check embeddings differs in llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(llama.cpp): use the correct field for embedding

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(llama.cpp): use embedding type none

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): add test-cases in aio-e2e suite

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-09-13 23:11:54 +02:00
LocalAI [bot]
55766d269b
chore: ⬆️ Update ggml-org/llama.cpp to aa0c461efe3603639af1a1defed2438d9c16ca0f (#6261)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-13 21:11:18 +00:00
LocalAI [bot]
623789a29e
chore: ⬆️ Update ggml-org/llama.cpp to 40be51152d4dc2d47444a4ed378285139859895b (#6260)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-12 21:10:39 +00:00
LocalAI [bot]
f8b71dc5d0
chore: ⬆️ Update ggml-org/llama.cpp to 0e6ff0046f4a2983b2c77950aa75960fe4b4f0e2 (#6235)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-11 21:21:49 +00:00
LocalAI [bot]
08432d49e5
chore: ⬆️ Update ggml-org/llama.cpp to 3976dfbe00f02a62c0deca32c46138e4f0ca81d8 (#6214)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-08 08:33:33 +02:00
LocalAI [bot]
59af928379
chore: ⬆️ Update ggml-org/llama.cpp to c4df49a42d396bdf7344501813e7de53bc9e7bb3 (#6209)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-06 21:05:07 +00:00
LocalAI [bot]
dbc2bb561b
chore: ⬆️ Update ggml-org/llama.cpp to 408ff524b40baf4f51a81d42a9828200dd4fcb6b (#6207)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-06 09:09:57 +02:00
LocalAI [bot]
1956681d4c
chore: ⬆️ Update ggml-org/llama.cpp to fb15d649ed14ab447eeab911e0c9d21e35fb243e (#6202)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-05 08:42:50 +02:00
LocalAI [bot]
9e6685ac9c
chore: ⬆️ Update ggml-org/llama.cpp to 0fce7a1248b74148c1eb0d368b7e18e8bcb96809 (#6193)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-04 07:35:28 +02:00
LocalAI [bot]
d82922786a
chore: ⬆️ Update ggml-org/llama.cpp to 3de008208b9b8a33f49f979097a99b4d59e6e521 (#6185)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-02 21:07:53 +00:00
LocalAI [bot]
4330fdce33
chore: ⬆️ Update ggml-org/llama.cpp to d4d8dbe383e8b9600cbe8b42016e3a4529b51219 (#6172)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-02 09:12:03 +02:00
LocalAI [bot]
969922ffec
chore: ⬆️ Update ggml-org/llama.cpp to e92d53b29e393fc4c0f9f1f7c3fe651be8d36faa (#6169)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-09-01 08:06:54 +00:00
Ettore Di Giacinto
739573e41b
feat(flash_attention): set auto for flash_attention in llama.cpp (#6168)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-08-31 17:59:09 +02:00
LocalAI [bot]
dbdf2908ad
chore: ⬆️ Update ggml-org/llama.cpp to 3d16b29c3bb1ec816ac0e782f20d169097063919 (#6165)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-29 21:14:03 +00:00
LocalAI [bot]
723f01c87e
chore: ⬆️ Update ggml-org/llama.cpp to c97dc093912ad014f6d22743ede0d4d7fd82365a (#6163)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-28 21:16:18 +00:00
LocalAI [bot]
6a4ab3c1e0
chore: ⬆️ Update ggml-org/llama.cpp to fbef0fad7a7c765939f6c9e322fa05cd52cf0c15 (#6155)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-27 21:09:34 +00:00
LocalAI [bot]
21faa4114b
chore: ⬆️ Update ggml-org/llama.cpp to 8b696861364360770e9f61a3422d32941a477824 (#6151)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-26 22:07:38 +00:00
LocalAI [bot]
0fc88b3cdf
chore: ⬆️ Update ggml-org/llama.cpp to c4e9239064a564de7b94ee2b401ae907235a8fca (#6139)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-26 12:18:58 +02:00
LocalAI [bot]
1a0d06f3db
chore: ⬆️ Update ggml-org/llama.cpp to 043fb27d3808766d8ea8195bbd12359727264402 (#6137)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-25 08:57:09 +02:00
LocalAI [bot]
057248008f
chore: ⬆️ Update ggml-org/llama.cpp to 710dfc465a68f7443b87d9f792cffba00ed739fe (#6126)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-24 08:41:39 +02:00
Ettore Di Giacinto
9f2c9cd691
feat(llama.cpp): Add gfx1201 support (#6125)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-08-23 23:06:01 +02:00
Ettore Di Giacinto
259383cf5e
chore(deps): bump llama.cpp to '45363632cbd593537d541e81b600242e0b3d47fc' (#6122)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-08-23 08:39:10 +02:00
LocalAI [bot]
6dccfb09f8
chore: ⬆️ Update ggml-org/llama.cpp to cd36b5e5c7fed2a3ac671dd542d579ca40b48b54 (#6118)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-22 07:57:27 +02:00
LocalAI [bot]
e4d9cf8349
chore: ⬆️ Update ggml-org/llama.cpp to 7a6e91ad26160dd6dfb33d29ac441617422f28e7 (#6116)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-20 21:05:39 +00:00
LocalAI [bot]
2e4dc6456f
chore: ⬆️ Update ggml-org/llama.cpp to fb22dd07a639e81c7415e30b146f545f1a2f2caf (#6112)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-20 09:01:36 +02:00
LocalAI [bot]
e44ff8514b
chore: ⬆️ Update ggml-org/llama.cpp to 6d7f1117e3e3285d0c5c11b5ebb0439e27920082 (#6088)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-19 08:09:49 +02:00
LocalAI [bot]
7920d75805
chore: ⬆️ Update ggml-org/llama.cpp to 21c17b5befc5f6be5992bc87fc1ba99d388561df (#6084)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-18 08:26:58 +00:00
LocalAI [bot]
9eed5ef872
chore: ⬆️ Update ggml-org/llama.cpp to 1fe00296f587dfca0957e006d146f5875b61e43d (#6079)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-16 21:10:03 +00:00
LocalAI [bot]
243e86176e
chore: ⬆️ Update ggml-org/llama.cpp to 5e6229a8409ac786e62cb133d09f1679a9aec13e (#6070)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-16 08:38:57 +02:00
Ettore Di Giacinto
22067e3384
chore(rocm): bump rocm image, add gfx1200 support (#6065)
Fixes: https://github.com/mudler/LocalAI/issues/6044

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-08-15 16:36:54 +02:00
Ettore Di Giacinto
4fbd639463
chore(ci): fixup builds for darwin and hipblas
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-08-15 15:58:02 +02:00
Ettore Di Giacinto
576e821298
chore(deps): bump llama.cpp to 'df36bce667bf14f8e538645547754386f9516326' (#6062)
chore(deps): bump llama.cpp to 'df36bce667bf14f8e538645547754386f9516326'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-08-15 13:28:15 +02:00
Ettore Di Giacinto
8ab51509cc
Update Makefile
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-08-15 08:33:25 +02:00
Ettore Di Giacinto
b3384e5428
Update Makefile
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-08-15 08:08:24 +02:00
Ettore Di Giacinto
253b7537dc
fix(llama-cpp/darwin): make sure to bundle libutf8 libs (#6060)
fix(darwin): make sure to bundle libutf8_validity

Plus some refactoring, use makefile

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-08-14 17:56:35 +02:00
Ettore Di Giacinto
bf60ca5bf0
Update Makefile
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-08-14 11:53:43 +02:00
LocalAI [bot]
2b44467bd1
chore: ⬆️ Update ggml-org/llama.cpp to 29c8fbe4e05fd23c44950d0958299e25fbeabc5c (#6054)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-14 09:19:15 +02:00
LocalAI [bot]
72f4d541d0
chore: ⬆️ Update ggml-org/llama.cpp to f4586ee5986d6f965becb37876d6f3666478a961 (#6048)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-08-13 08:33:48 +02:00
Ettore Di Giacinto
18fcd8557c
fix(llama.cpp): support gfx1200 (#6045)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-08-12 22:04:30 +02:00
LocalAI [bot]
b2e8b6d1aa
chore: ⬆️ Update ggml-org/llama.cpp to be48528b068111304e4a0bb82c028558b5705f05 (#6012)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-11 21:06:10 +00:00
LocalAI [bot]
6db19c5cb9
chore: ⬆️ Update ggml-org/llama.cpp to 79c1160b073b8148a404f3dd2584be1606dccc66 (#6006)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-11 12:54:21 +02:00
LocalAI [bot]
def7cdc0bf
chore: ⬆️ Update ggml-org/llama.cpp to cd6983d56d2cce94ecb86bb114ae8379a609073c (#6003)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-09 08:41:58 +02:00
LocalAI [bot]
4e40a8d1ed
chore: ⬆️ Update ggml-org/llama.cpp to a0552c8beef74e843bb085c8ef0c63f9ed7a2b27 (#5992)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-07 21:13:14 +00:00
Ettore Di Giacinto
ec1276e5a9
fix(llama.cpp): do not default to linear rope (#5982)
This seems to have somehow sneaked in during the initial pass to the gRPC
server: instead of setting linear rope only when required, we defaulted to
it when not specified.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-08-06 23:20:28 +02:00
LocalAI [bot]
61ba98d43d
chore: ⬆️ Update ggml-org/llama.cpp to e725a1a982ca870404a9c4935df52466327bbd02 (#5984)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-06 21:17:20 +00:00
LocalAI [bot]
03e8592450
chore: ⬆️ Update ggml-org/llama.cpp to fd1234cb468935ea087d6929b2487926c3afff4b (#5972)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-05 23:14:43 +02:00
LocalAI [bot]
2913676157
chore: ⬆️ Update ggml-org/llama.cpp to 41613437ffee0dbccad684fc744788bc504ec213 (#5968)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-04 23:16:30 +02:00
LocalAI [bot]
4d90971424
chore: ⬆️ Update ggml-org/llama.cpp to d31192b4ee1441bbbecd3cbf9e02633368bdc4f5 (#5965)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-03 21:03:20 +00:00
LocalAI [bot]
2a9d675d62
chore: ⬆️ Update ggml-org/llama.cpp to 5c0eb5ef544aeefd81c303e03208f768e158d93c (#5959)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-08-02 23:35:24 +02:00
LocalAI [bot]
0b085089b9
chore: ⬆️ Update ggml-org/llama.cpp to daf2dd788066b8b239cb7f68210e090c2124c199 (#5951)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-08-01 08:25:36 +02:00
Richard Palethorpe
c07bc55fee
fix(intel): Set GPU vendor on Intel images and cleanup (#5945)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-07-31 19:44:46 +02:00
LocalAI [bot]
8b1e8b4cda
chore: ⬆️ Update ggml-org/llama.cpp to e9192bec564780bd4313ad6524d20a0ab92797db (#5940)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-31 09:26:02 +02:00
LocalAI [bot]
eb5c3670f1
chore: ⬆️ Update ggml-org/llama.cpp to aa79524c51fb014f8df17069d31d7c44b9ea6cb8 (#5934)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-29 21:05:00 +00:00
LocalAI [bot]
60726d16f2
chore: ⬆️ Update ggml-org/llama.cpp to 8ad7b3e65b5834e5574c2f5640056c9047b5d93b (#5931)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-29 08:01:03 +02:00
LocalAI [bot]
d25145e641
chore: ⬆️ Update ggml-org/llama.cpp to bf78f5439ee8e82e367674043303ebf8e92b4805 (#5927)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-27 21:08:32 +00:00
LocalAI [bot]
932360bf7e
chore: ⬆️ Update ggml-org/llama.cpp to 11dd5a44eb180e1d69fac24d3852b5222d66fb7f (#5921)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-27 09:50:56 +02:00
LocalAI [bot]
5ce982b9c9
chore: ⬆️ Update ggml-org/llama.cpp to c7f3169cd523140a288095f2d79befb20a0b73f4 (#5913)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-25 23:08:20 +02:00
LocalAI [bot]
813cb4296d
chore: ⬆️ Update ggml-org/llama.cpp to 3f4fc97f1d745f1d5d3c853949503136d419e6de (#5900)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-25 08:39:44 +02:00
LocalAI [bot]
61c2304638
chore: ⬆️ Update ggml-org/llama.cpp to a86f52b2859dae4db5a7a0bbc0f1ad9de6b43ec6 (#5894)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-24 15:02:37 +02:00
Ettore Di Giacinto
b7b3164736
chore: try to speed up build
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-23 21:21:23 +02:00
LocalAI [bot]
b5be867e28
chore: ⬆️ Update ggml-org/llama.cpp to acd6cb1c41676f6bbb25c2a76fa5abeb1719301e (#5882)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-22 21:12:06 +00:00
Ettore Di Giacinto
98e5291afc
feat: refactor build process, drop embedded backends (#5875)
* feat: split remaining backends and drop embedded backends

- Drop silero-vad, huggingface, and stores backend from embedded
  binaries
- Refactor Makefile and Dockerfile to avoid building grpc backends
- Drop golang code that was used to embed backends
- Simplify building by using goreleaser

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(gallery): be specific with llama-cpp backend templates

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(docs): update

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ci): minor fixes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: drop all ffmpeg references

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: run protogen-go

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Always enable p2p mode

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update goreleaser file

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(stores): do not always load

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix linting issues

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Simplify

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Mac OS fixup

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-22 16:31:04 +02:00
LocalAI [bot]
e29b2c3aff
chore: ⬆️ Update ggml-org/llama.cpp to 6c9ee3b17e19dcc82ab93d52ae46fdd0226d4777 (#5877)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-22 08:25:43 +02:00
LocalAI [bot]
fa284f7445
chore: ⬆️ Update ggml-org/llama.cpp to 2be60cbc2707359241c2784f9d2e30d8fc7cdabb (#5867)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-21 09:14:09 +02:00
LocalAI [bot]
7659461036
chore: ⬆️ Update ggml-org/llama.cpp to a979ca22db0d737af1e548a73291193655c6be99 (#5862)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-20 08:43:36 +02:00
Ettore Di Giacinto
580687da46
feat: remove stablediffusion-ggml from main binary (#5861)
* feat: split stablediffusion-ggml from main binary

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Test CI

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Adapt ci tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to support nvidia-l4t

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Latest fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-19 21:58:53 +02:00
LocalAI [bot]
1929eb2894
chore: ⬆️ Update ggml-org/llama.cpp to bf9087f59aab940cf312b85a67067ce33d9e365a (#5860)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-07-19 08:52:07 +02:00
Ettore Di Giacinto
b29544d747
feat: split piper from main binary (#5858)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-19 08:31:33 +02:00
Ettore Di Giacinto
294f7022f3
feat: do not bundle llama-cpp anymore (#5790)
* Build llama.cpp separately

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Start to try to attach some tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add git and small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: correctly autoload external backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run AIO tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Slightly update the Makefile help

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Adapt auto-bumper

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run linux test

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add llama-cpp into build pipelines

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add default capability (for cpu)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop llama-cpp specific logic from the backend loader

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* drop grpc install in ci for tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Pass by backends path for tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Build protogen at start

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(tests): set backends path consistently

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Correctly configure the backends path

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to build for darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Compile for metal on arm64/darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run build off from cross-arch

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add nvidia-l4t and CPU llama-cpp backends to the backend index

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Also build darwin-x86 for llama-cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable arm64 builds temporarily

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Test backend build on PR

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixup build backend reusable workflow

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* pass by skip drivers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Use crane

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Skip drivers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* x86 darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add packaging step for llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix leftover from bark-cpp extraction

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to fix hipblas build

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-18 13:24:12 +02:00
Ettore Di Giacinto
dfadc3696e
feat(llama.cpp): allow setting kv-overrides (#5745)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-06-28 21:26:07 +02:00
Ettore Di Giacinto
d68660bd5a
chore(deps): bump llama.cpp to 'e434e69183fd9e1031f4445002083178c331a28b' (#5665)
chore(deps): bump llama.cpp to 'e434e69183fd9e1031f4445002083178c331a28b'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-06-17 17:00:10 +02:00
Ettore Di Giacinto
cd3cd899ad
chore(deps): bump llama.cpp to '363757628848a27a435bbf22ff9476e9aeda5f40' (#5571)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-06-03 12:19:16 +02:00
Ettore Di Giacinto
80f7f17843
chore(deps): bump llama.cpp to 'e562eece7cb476276bfc4cbb18deb7c0369b2233' (#5552)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-31 12:46:32 +02:00
Ettore Di Giacinto
dd7fa6b9f7
chore(deps): bump llama.cpp to 'e83ba3e460651b20a594e9f2f0f0bffb998d3ce1' (#5527)
chore(deps): bump llama.cpp to 'e83ba3e460651b20a594e9f2f0f0bffb998d3ce1'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-30 10:29:01 +02:00
Ettore Di Giacinto
88de2ea01a
feat(llama.cpp): add support for audio input (#5466)
* feat(llama.cpp): add support for audio input

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Adapt tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-26 16:06:03 +02:00
Ettore Di Giacinto
3b0cf52f6a
feat(llama.cpp): add reranking (#5396)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-22 21:49:30 +02:00
Ettore Di Giacinto
6d5bde860b
feat(llama.cpp): upgrade and use libmtmd (#5379)
* WIP

* wip

* wip

* Make it compile

* Update json.hpp

* this shouldn't be private for now

* Add logs

* Reset auto detected template

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Re-enable grammars

* This seems to be broken - 360a9c98e1 (diff-a18a8e64e12a01167d8e98fc)[…]cccf0d4eed09d76d879L2998-L3207

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Placeholder

* Simplify image loading

* use completion type

* disable streaming

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* correctly return timings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Remove some debug logging

* Adapt tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Keep header

* embedding: do not use oai type

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Sync from server.cpp

* Use utils and json directly from llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Sync with upstream

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: copy json.hpp from the correct location

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: add httplib

* sync llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Embeddings: set OAICOMPAT_TYPE_EMBEDDING

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: sync with server.cpp by including it

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make it darwin-compatible

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-17 16:02:53 +02:00
Ettore Di Giacinto
adb24214c6
chore(deps): bump llama.cpp to b34c859146630dff136943abc9852ca173a7c9d6 (#5323)
chore(deps): bump llama.cpp to 'b34c859146630dff136943abc9852ca173a7c9d6'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-06 11:21:25 +02:00
Ettore Di Giacinto
1fc6d469ac
chore(deps): bump llama.cpp to '1d36b3670b285e69e58b9d687c770a2a0a192194' (#5307)
chore(deps): bump llama.cpp to '1d36b3670b285e69e58b9d687c770a2a0a192194'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-05-03 18:44:40 +02:00
Ettore Di Giacinto
8abecb4a18
chore: bump grpc limits to 50MB (#5212)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-19 08:53:24 +02:00
Richard Palethorpe
1b899e1a68
feat(stablediffusion): Enable SYCL (#5144)
* feat(sycl): Enable SYCL for stable diffusion

This is a pain because we compile with CGO, but SD is compiled with
CMake. I don't think we can easily use CMake to set the linker flags
necessary. Also I could not find pkg-config calls that would fully set
the flags, so some of them are set manually.

See https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html
for reference. I also resorted to searching the shared object files in
MKLROOT/lib for the symbols.

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(ci): Don't set nproc on cmake

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-10 15:20:53 +02:00
Ettore Di Giacinto
25e6f21322
chore(deps): bump llama.cpp to 4ccea213bc629c4eef7b520f7f6c59ce9bbdaca0 (#5143)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-08 11:26:06 +02:00
Ettore Di Giacinto
ece239966f
chore: ⬆️ Update ggml-org/llama.cpp to 6bf28f0111ff9f21b3c1b1eace20c590281e7ba6 (#5127)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-06 14:01:51 +02:00
Richard Palethorpe
d2cf8ef070
fix(sycl): kernel not found error by forcing -fsycl (#5115)
* chore(sycl): Update oneapi to 2025:1

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(sycl): Pass -fsycl flag as workaround

-fsycl should be set by llama.cpp's cmake file, but something goes wrong
and it doesn't appear to get added

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(build): Speed up llama build by using all CPUs

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-04-03 16:22:59 +02:00
Ettore Di Giacinto
18b320d577
chore(deps): bump llama.cpp to 'f01bd02376f919b05ee635f438311be8dfc91d7c' (#5110)
chore(deps): bump llama.cpp to 'f01bd02376f919b05ee635f438311be8dfc91d7c'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-04-03 10:23:14 +02:00
Ettore Di Giacinto
c2a39e3639
fix(llama.cpp): properly handle sigterm (#5099)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-30 18:08:29 +02:00
Ettore Di Giacinto
423514a5a5
fix(clip): do not imply GPU offload by default (#5010)
* fix(clip): do not imply GPUs by default

Until a better solution is found upstream, be conservative and do not
default to GPU offload.

https://github.com/ggml-org/llama.cpp/pull/12322
https://github.com/ggml-org/llama.cpp/pull/12322#issuecomment-2720970695

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* allow overriding GPU via backend options

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-13 15:14:11 +01:00
Ettore Di Giacinto
e4fa894153
fix(llama.cpp): correctly handle embeddings in batches (#4957)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-07 19:29:52 +01:00
Ettore Di Giacinto
67f7bffd18
chore(deps): update llama.cpp and sync with upstream changes (#4950)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-03-06 00:40:58 +01:00
Ettore Di Giacinto
9e32fda304
fix(llama.cpp): improve context shift handling (#4820)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-14 14:55:03 +01:00
Shraddha
03974a4dd4
feat: tokenization with llama.cpp (#4724)
feat: tokenization

Signed-off-by: shraddhazpy <shraddha@shraddhafive.in>
2025-02-02 17:39:43 +00:00
Ettore Di Giacinto
1d6afbd65d
feat(llama.cpp): Add support for grammar triggers (#4733)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-02-02 13:25:03 +01:00
Ettore Di Giacinto
958f6eb722
chore(llama.cpp): update dependency (#4628)
Update to '3edfa7d3753c29e44b964c0ff424d2ea8d5fdee6' and adapt to upstream changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-18 11:55:13 +01:00
mintyleaf
96f8ec0402
feat: add machine tag and inference timings (#4577)
* Add machine tag option, add extraUsage option, grpc-server -> proto -> endpoint extraUsage data is broken for now

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>

* remove redundant timing fields, fix non-working timings output

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>

* use middleware for Machine-Tag only if tag is specified

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>

---------

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
2025-01-17 17:05:58 +01:00
Ettore Di Giacinto
ab5adf40af
chore(deps): bump llama.cpp to '924518e2e5726e81f3aeb2518fb85963a500e… (#4592)
chore(deps): bump llama.cpp to '924518e2e5726e81f3aeb2518fb85963a500e93a'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-13 17:33:06 +01:00
Ettore Di Giacinto
c553d73748
chore(deps): bump llama.cpp to 4b0c638b9 (#4532)
deps(llama.cpp): bump to 4b0c638b9

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-01-04 09:40:08 +01:00
Ettore Di Giacinto
0eb2911aad
chore(llava): update clip.patch (#4453)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-23 19:11:31 +01:00
Ettore Di Giacinto
708cba0c1b
chore(llama.cpp): bump, drop penalize_nl (#4418)
deps(llama.cpp): bump, drop penalize_nl

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-17 00:47:52 +01:00
Ettore Di Giacinto
fc4a714992
feat(llama.cpp): bump and adapt to upstream changes (#4378)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-14 00:30:52 +01:00
Ettore Di Giacinto
d4c1746c7d
feat(llama.cpp): expose cache_type_k and cache_type_v for quant of kv cache (#4329)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-12-06 10:23:59 +01:00
Ettore Di Giacinto
cbedf2f428
fix(llama.cpp): embed metal file into result binary for darwin (#4279)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-11-28 04:17:00 +00:00
Ettore Di Giacinto
2b62260b6d
feat(models): use rwkv from llama.cpp (#4264)
feat(rwkv): use rwkv from llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-11-26 14:22:55 +01:00
Ettore Di Giacinto
404ca3cc23
chore(deps): bump llama.cpp to 47f931c8f9a26c072d71224bc8013cc66ea9e445 (#4263)
chore(deps): bump llama.cpp to '47f931c8f9a26c072d71224bc8013cc66ea9e445'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-11-26 11:12:57 +01:00
Ettore Di Giacinto
939fbe59cc
chore(deps): bump llama-cpp to ae8de6d50a09d49545e0afab2e50cc4acfb280e2 (#4157)
* chore(deps): bump llama-cpp to ae8de6d50a09d49545e0afab2e50cc4acfb280e2

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(metal): metal file has moved

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-11-15 12:51:43 +01:00
Ettore Di Giacinto
3d4bb757d2
chore(deps): bump llama-cpp to 8f275a7c4593aa34147595a90282cf950a853690 (#4016)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-10-30 08:31:13 +01:00
Ettore Di Giacinto
32db787991
chore(deps): bump llama-cpp to cda0e4b648dde8fac162b3430b14a99597d3d74f (#3884)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-10-20 00:26:49 +02:00
Ettore Di Giacinto
6257e2f510
chore(deps): bump llama-cpp to 96776405a17034dcfd53d3ddf5d142d34bdbb657 (#3793)
This also adapts to upstream changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-10-12 01:25:03 +02:00
siddimore
f84b55d1ef
feat: Add Get Token Metrics to GRPC server (#3687)
* Add Get Token Metrics to GRPC server

Signed-off-by: Siddharth More <siddimore@gmail.com>

* Expose LocalAI endpoint

Signed-off-by: Siddharth More <siddimore@gmail.com>

---------

Signed-off-by: Siddharth More <siddimore@gmail.com>
2024-10-01 14:41:20 +02:00
siddimore
50a3b54e34
feat(api): add correlationID to Track Chat requests (#3668)
* Add CorrelationID to chat request

Signed-off-by: Siddharth More <siddimore@gmail.com>

* remove get_token_metrics

Signed-off-by: Siddharth More <siddimore@gmail.com>

* Add CorrelationID to proto

Signed-off-by: Siddharth More <siddimore@gmail.com>

* fix correlation method name

Signed-off-by: Siddharth More <siddimore@gmail.com>

* Update core/http/endpoints/openai/chat.go

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Siddharth More <siddimore@gmail.com>

* Update core/http/endpoints/openai/chat.go

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Siddharth More <siddimore@gmail.com>

---------

Signed-off-by: Siddharth More <siddimore@gmail.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-09-28 17:23:56 +02:00
Ettore Di Giacinto
25deb4ba95
chore(deps): update llama.cpp to 6262d13e0b2da91f230129a93a996609a2f5a2f2 (#3549)
chore(deps): update llama.cpp to 6262d13e0b2da91f230129a93a996609a2f5a2f2

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-09-16 10:29:20 +02:00
Ettore Di Giacinto
d51444d606
chore(deps): update llama.cpp (#3497)
* Apply llava patch

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-09-12 20:55:27 +02:00
Ettore Di Giacinto
b8e7a76524
chore(deps): update llama.cpp (#3438)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-08-31 01:21:45 +02:00
Ettore Di Giacinto
409e2d348e
chore(deps): bump llama.cpp, rename llama_add_bos_token (#3253)
deps(llama.cpp): bump, rename llama_add_bos_token

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-08-16 01:20:21 +02:00
Ettore Di Giacinto
abcf0ff000
chore: ⬆️ Update ggerganov/llama.cpp to 1e6f6554aa11fa10160a5fda689e736c3c34169f (#3189)
* ⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama.cpp): adapt to upstream naming changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-08-07 01:10:21 +02:00
Ettore Di Giacinto
4e11ca55fd
chore: ⬆️ Update ggerganov/llama.cpp (#3166)
* ⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama.cpp): adapt init function call

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-08-06 11:39:35 +02:00
Ettore Di Giacinto
bd900945f7
fix(llama.cpp): do not set lora_base anymore (#2999)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-24 12:35:52 +02:00
Ettore Di Giacinto
35561edb6e
feat(llama.cpp): support embeddings endpoints (#2871)
* feat(llama.cpp): add embeddings

Also enable embeddings by default for llama.cpp models

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(Makefile): prepare llama.cpp sources only once

Otherwise we keep cloning llama.cpp for each of the variants

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* do not set embeddings to false

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs: add embeddings to the YAML config reference

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-07-15 22:54:16 +02:00
Dave
405794d4ca
fix: speedup git submodule update with --single-branch (#2847)
add --single-branch to submodule update commands for speed

Signed-off-by: Dave Lee <dave@gray101.com>
2024-07-13 22:32:25 +02:00
Loric
a00e9a82ae
Update remaining git clones to git fetch (#2779)
Signed-off-by: Loric <117862619+LoricOSC@users.noreply.github.com>
2024-07-12 06:43:58 +00:00
cryptk
c047c19145
fix: make sure the GNU Make jobserver is passed to cmake for the llama.cpp build (#2697)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-07-02 08:46:59 +02:00
Ettore Di Giacinto
7b1e792732
deps(llama.cpp): bump to latest, update build variables (#2669)
* ⬆️ Update ggerganov/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* deps(llama.cpp): update build variables to follow upstream

Update build recipes with https://github.com/ggerganov/llama.cpp/pull/8006

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable shared libs by default in llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable shared libs in llama.cpp Makefile

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable metal embedding for now, until it is tested

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(mac): explicitly enable metal

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* debug

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix typo

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-06-27 23:10:04 +02:00
Ettore Di Giacinto
a8bfb6f9c2
feat(options): add repeat_last_n (#2660)
feat(options): add repeat_last_n

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-26 14:58:50 +02:00
Ettore Di Giacinto
b783c811db
feat(build): only build llama.cpp relevant targets (#2659)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-06-26 14:58:38 +02:00
Ettore Di Giacinto
3a9408363b
deps(llama.cpp): update and adapt API changes (#2381)
deps(llama.cpp): update and rename function

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-23 01:02:11 +02:00
Ettore Di Giacinto
c89271b2e4
feat(llama.cpp): add distributed llama.cpp inferencing (#2324)
* feat(llama.cpp): support distributed llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: allow tweaking how chat messages are merged together

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Makefile: register to ALL_GRPC_BACKENDS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring, allow disabling auto-detection of backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* minor fixups

Signed-off-by: mudler <mudler@localai.io>

* feat: add cmd to start rpc-server from llama.cpp

Signed-off-by: mudler <mudler@localai.io>

* ci: add ccache

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: mudler <mudler@localai.io>
2024-05-15 01:17:02 +02:00
Ettore Di Giacinto
e49ea0123b
feat(llama.cpp): add flash_attention and no_kv_offloading (#2310)
feat(llama.cpp): add flash_attn and no_kv_offload

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 19:07:51 +02:00
cryptk
28a421cb1d
feat: migrate python backends from conda to uv (#2215)
* feat: migrate diffusers backend from conda to uv

  - replace conda with UV for diffusers install (prototype for all
    extras backends)
  - add ability to build docker with one/some/all extras backends
    instead of all or nothing

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate autogptq, bark, coqui from conda to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: convert exllama over to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate exllama2 to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate mamba to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate parler to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate petals to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: fix tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate rerankers to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate sentencetransformers to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: install uv for tests-linux

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: make sure file exists before installing on intel images

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate transformers backend to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate transformers-musicgen to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate vall-e-x to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate vllm to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add uv install to the rest of test-extra.yml

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: adjust file perms on all install/run/test scripts

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add missing accelerate dependencies

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add some more missing dependencies to python backends

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: parler tests venv py dir fix

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: correct filename for transformers-musicgen tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: adjust the pwd for valle tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: cleanup and optimization work for uv migration

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add setuptools to requirements-install for mamba

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: more size optimization work

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: make installs and tests more consistent, cleanup some deps

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: cleanup

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: mamba backend is cublas only

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: uncomment lines in makefile

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

---------

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-10 15:08:08 +02:00