LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-22 16:39:52 +00:00

Author	SHA1	Message	Date
Ettore Di Giacinto	80f7f17843	chore(deps): bump llama.cpp to 'e562eece7cb476276bfc4cbb18deb7c0369b2233' (#5552 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-31 12:46:32 +02:00
Ettore Di Giacinto	dd7fa6b9f7	chore(deps): bump llama.cpp to 'e83ba3e460651b20a594e9f2f0f0bffb998d3ce1 (#5527 ) chore(deps): bump llama.cpp to 'e83ba3e460651b20a594e9f2f0f0bffb998d3ce1' Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-30 10:29:01 +02:00
Ettore Di Giacinto	88de2ea01a	feat(llama.cpp): add support for audio input (#5466 ) * feat(llama.cpp): add support for audio input Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Adapt tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-26 16:06:03 +02:00
Ettore Di Giacinto	3b0cf52f6a	feat(llama.cpp): add reranking (#5396 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-22 21:49:30 +02:00
Ettore Di Giacinto	6d5bde860b	feat(llama.cpp): upgrade and use libmtmd (#5379 ) * WIP * wip * wip * Make it compile * Update json.hpp * this shouldn't be private for now * Add logs * Reset auto detected template Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Re-enable grammars * This seems to be broken - `360a9c98e1 (diff-a18a8e64e12a01167d8e98fc)`[…]cccf0d4eed09d76d879L2998-L3207 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Placeholder * Simplify image loading * use completion type * disable streaming Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * correctly return timings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Remove some debug logging * Adapt tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Keep header * embedding: do not use oai type Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Sync from server.cpp * Use utils and json directly from llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Sync with upstream Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: copy json.hpp from the correct location Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: add httplib * sync llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Embeddiongs: set OAICOMPAT_TYPE_EMBEDDING Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: sync with server.cpp by including it Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * make it darwin-compatible Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-17 16:02:53 +02:00
Ettore Di Giacinto	adb24214c6	chore(deps): bump llama.cpp to `b34c859146630dff136943abc9852ca173a7c9d6` (#5323 ) chore(deps): bump llama.cpp to 'b34c859146630dff136943abc9852ca173a7c9d6' Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-06 11:21:25 +02:00
Ettore Di Giacinto	1fc6d469ac	chore(deps): bump llama.cpp to '1d36b3670b285e69e58b9d687c770a2a0a192194 (#5307 ) chore(deps): bump llama.cpp to '1d36b3670b285e69e58b9d687c770a2a0a192194' Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-03 18:44:40 +02:00
Ettore Di Giacinto	8abecb4a18	chore: bump grpc limits to 50MB (#5212 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-04-19 08:53:24 +02:00
Richard Palethorpe	1b899e1a68	feat(stablediffusion): Enable SYCL (#5144 ) * feat(sycl): Enable SYCL for stable diffusion This is a pain because we compile with CGO, but SD is compiled with CMake. I don't think we can easily use CMake to set the linker flags necessary. Also I could not find pkg-config calls that would fully set the flags, so some of them are set manually. See https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html for reference. I also resorted to searching the shared object files in MKLROOT/lib for the symbols. Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(ci): Don't set nproc on cmake Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2025-04-10 15:20:53 +02:00
Ettore Di Giacinto	25e6f21322	chore(deps): bump llama.cpp to `4ccea213bc629c4eef7b520f7f6c59ce9bbdaca0` (#5143 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-04-08 11:26:06 +02:00
Ettore Di Giacinto	ece239966f	chore: ⬆️ Update ggml-org/llama.cpp to `6bf28f0111ff9f21b3c1b1eace20c590281e7ba6` (#5127 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-04-06 14:01:51 +02:00
Richard Palethorpe	d2cf8ef070	fix(sycl): kernel not found error by forcing -fsycl (#5115 ) * chore(sycl): Update oneapi to 2025:1 Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(sycl): Pass -fsycl flag as workaround -fsycl should be set by llama.cpp's cmake file, but something goes wrong and it doesn't appear to get added Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(build): Speed up llama build by using all CPUs Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2025-04-03 16:22:59 +02:00
Ettore Di Giacinto	18b320d577	chore(deps): bump llama.cpp to 'f01bd02376f919b05ee635f438311be8dfc91d7c (#5110 ) chore(deps): bump llama.cpp to 'f01bd02376f919b05ee635f438311be8dfc91d7c' Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-04-03 10:23:14 +02:00
Ettore Di Giacinto	c2a39e3639	fix(llama.cpp): properly handle sigterm (#5099 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-03-30 18:08:29 +02:00
Ettore Di Giacinto	423514a5a5	fix(clip): do not imply GPU offload by default (#5010 ) * fix(clip): do not imply GPUs by default Until a better solution is found upstream, be conservative and default to GPU. https://github.com/ggml-org/llama.cpp/pull/12322 https://github.com/ggml-org/llama.cpp/pull/12322#issuecomment-2720970695 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * allow to override gpu via backend options Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-03-13 15:14:11 +01:00
Ettore Di Giacinto	e4fa894153	fix(llama.cpp): correctly handle embeddings in batches (#4957 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-03-07 19:29:52 +01:00
Ettore Di Giacinto	67f7bffd18	chore(deps): update llama.cpp and sync with upstream changes (#4950 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-03-06 00:40:58 +01:00
Ettore Di Giacinto	9e32fda304	fix(llama.cpp): improve context shift handling (#4820 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-02-14 14:55:03 +01:00
Shraddha	03974a4dd4	feat: tokenization with llama.cpp (#4724 ) feat: tokenization Signed-off-by: shraddhazpy <shraddha@shraddhafive.in>	2025-02-02 17:39:43 +00:00
Ettore Di Giacinto	1d6afbd65d	feat(llama.cpp): Add support to grammar triggers (#4733 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-02-02 13:25:03 +01:00
Ettore Di Giacinto	958f6eb722	chore(llama.cpp): update dependency (#4628 ) Update to '3edfa7d3753c29e44b964c0ff424d2ea8d5fdee6' and adapt to upstream changes Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-01-18 11:55:13 +01:00
mintyleaf	96f8ec0402	feat: add machine tag and inference timings (#4577 ) * Add machine tag option, add extraUsage option, grpc-server -> proto -> endpoint extraUsage data is broken for now Signed-off-by: mintyleaf <mintyleafdev@gmail.com> * remove redurant timing fields, fix not working timings output Signed-off-by: mintyleaf <mintyleafdev@gmail.com> * use middleware for Machine-Tag only if tag is specified Signed-off-by: mintyleaf <mintyleafdev@gmail.com> --------- Signed-off-by: mintyleaf <mintyleafdev@gmail.com>	2025-01-17 17:05:58 +01:00
Ettore Di Giacinto	ab5adf40af	chore(deps): bump llama.cpp to '924518e2e5726e81f3aeb2518fb85963a500e… (#4592 ) chore(deps): bump llama.cpp to '924518e2e5726e81f3aeb2518fb85963a500e93a' Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-01-13 17:33:06 +01:00
Ettore Di Giacinto	c553d73748	chore(deps): bump llama.cpp to 4b0c638b9 (#4532 ) deps(llama.cpp): bump to 4b0c638b9 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-01-04 09:40:08 +01:00
Ettore Di Giacinto	0eb2911aad	chore(llava): update clip.patch (#4453 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-12-23 19:11:31 +01:00
Ettore Di Giacinto	708cba0c1b	chore(llama.cpp): bump, drop penalize_nl (#4418 ) deps(llama.cpp): bump, drop penalize_nl Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-12-17 00:47:52 +01:00
Ettore Di Giacinto	fc4a714992	feat(llama.cpp): bump and adapt to upstream changes (#4378 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-12-14 00:30:52 +01:00
Ettore Di Giacinto	d4c1746c7d	feat(llama.cpp): expose cache_type_k and cache_type_v for quant of kv cache (#4329 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-12-06 10:23:59 +01:00
Ettore Di Giacinto	cbedf2f428	fix(llama.cpp): embed metal file into result binary for darwin (#4279 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-11-28 04:17:00 +00:00
Ettore Di Giacinto	2b62260b6d	feat(models): use rwkv from llama.cpp (#4264 ) feat(rwkv): use rwkv from llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-11-26 14:22:55 +01:00
Ettore Di Giacinto	404ca3cc23	chore(deps): bump llama.cpp to `47f931c8f9a26c072d71224bc8013cc66ea9e445` (#4263 ) chore(deps): bump llama.cpp to '47f931c8f9a26c072d71224bc8013cc66ea9e445' Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-11-26 11:12:57 +01:00
Ettore Di Giacinto	939fbe59cc	chore(deps): bump llama-cpp to ae8de6d50a09d49545e0afab2e50cc4acfb280e2 (#4157 ) * chore(deps): bump llama-cpp to ae8de6d50a09d49545e0afab2e50cc4acfb280e2 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(metal): metal file has moved Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-11-15 12:51:43 +01:00
Ettore Di Giacinto	3d4bb757d2	chore(deps): bump llama-cpp to 8f275a7c4593aa34147595a90282cf950a853690 (#4016 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-10-30 08:31:13 +01:00
Ettore Di Giacinto	32db787991	chore(deps): bump llama-cpp to cda0e4b648dde8fac162b3430b14a99597d3d74f (#3884 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-10-20 00:26:49 +02:00
Ettore Di Giacinto	6257e2f510	chore(deps): bump llama-cpp to 96776405a17034dcfd53d3ddf5d142d34bdbb657 (#3793 ) This adapts also to upstream changes Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-10-12 01:25:03 +02:00
siddimore	f84b55d1ef	feat: Add Get Token Metrics to GRPC server (#3687 ) * Add Get Token Metrics to GRPC server Signed-off-by: Siddharth More <siddimore@gmail.com> * Expose LocalAI endpoint Signed-off-by: Siddharth More <siddimore@gmail.com> --------- Signed-off-by: Siddharth More <siddimore@gmail.com>	2024-10-01 14:41:20 +02:00
siddimore	50a3b54e34	feat(api): add correlationID to Track Chat requests (#3668 ) * Add CorrelationID to chat request Signed-off-by: Siddharth More <siddimore@gmail.com> * remove get_token_metrics Signed-off-by: Siddharth More <siddimore@gmail.com> * Add CorrelationID to proto Signed-off-by: Siddharth More <siddimore@gmail.com> * fix correlation method name Signed-off-by: Siddharth More <siddimore@gmail.com> * Update core/http/endpoints/openai/chat.go Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Signed-off-by: Siddharth More <siddimore@gmail.com> * Update core/http/endpoints/openai/chat.go Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Signed-off-by: Siddharth More <siddimore@gmail.com> --------- Signed-off-by: Siddharth More <siddimore@gmail.com> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-09-28 17:23:56 +02:00
Ettore Di Giacinto	25deb4ba95	chore(deps): update llama.cpp to 6262d13e0b2da91f230129a93a996609a2fa2f2 (#3549 ) chore(deps): update llama.cpp to 6262d13e0b2da91f230129a93a996609a2f5a2f2 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-09-16 10:29:20 +02:00
Ettore Di Giacinto	d51444d606	chore(deps): update llama.cpp (#3497 ) * Apply llava patch Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-09-12 20:55:27 +02:00
Ettore Di Giacinto	b8e7a76524	chore(deps): update llama.cpp (#3438 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-08-31 01:21:45 +02:00
Ettore Di Giacinto	409e2d348e	chore(deps): bump llama.cpp, rename `llama_add_bos_token` (#3253 ) deps(llama.cpp): bump, rename llama_add_bos_token Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-08-16 01:20:21 +02:00
Ettore Di Giacinto	abcf0ff000	chore: ⬆️ Update ggerganov/llama.cpp to `1e6f6554aa11fa10160a5fda689e736c3c34169f` (#3189 ) * arrow_up: Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(llama.cpp): adapt to upstream naming changes Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-08-07 01:10:21 +02:00
Ettore Di Giacinto	4e11ca55fd	chore: ⬆️ Update ggerganov/llama.cpp (#3166 ) * arrow_up: Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(llama.cpp): adapt init function call Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-08-06 11:39:35 +02:00
Ettore Di Giacinto	bd900945f7	fix(llama.cpp): do not set anymore lora_base (#2999 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-07-24 12:35:52 +02:00
Ettore Di Giacinto	35561edb6e	feat(llama.cpp): support embeddings endpoints (#2871 ) * feat(llama.cpp): add embeddings Also enable embeddings by default for llama.cpp models Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(Makefile): prepare llama.cpp sources only once Otherwise we keep cloning llama.cpp for each of the variants Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not set embeddings to false Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs: add embeddings to the YAML config reference Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-07-15 22:54:16 +02:00
Dave	405794d4ca	fix: speedup `git submodule update` with `--single-branch` (#2847 ) add --single-branch to submodule update commands for speed Signed-off-by: Dave Lee <dave@gray101.com>	2024-07-13 22:32:25 +02:00
Loric	a00e9a82ae	Update remaining git clones to git fetch (#2779 ) Signed-off-by: Loric <117862619+LoricOSC@users.noreply.github.com>	2024-07-12 06:43:58 +00:00
cryptk	c047c19145	fix: make sure the GNUMake jobserver is passed to cmake for the llama.cpp build (#2697 ) Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>	2024-07-02 08:46:59 +02:00
Ettore Di Giacinto	7b1e792732	deps(llama.cpp): bump to latest, update build variables (#2669 ) * arrow_up: Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * deps(llama.cpp): update build variables to follow upstream Update build recipes with https://github.com/ggerganov/llama.cpp/pull/8006 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Disable shared libs by default in llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Disable shared libs in llama.cpp Makefile Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Disable metal embedding for now, until it is tested Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(mac): explicitly enable metal Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * debug Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix typo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-06-27 23:10:04 +02:00
Ettore Di Giacinto	a8bfb6f9c2	feat(options): add `repeat_last_n` (#2660 ) feat(options): add repeat_last_n Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-06-26 14:58:50 +02:00
Ettore Di Giacinto	b783c811db	feat(build): only build llama.cpp relevant targets (#2659 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-06-26 14:58:38 +02:00
Ettore Di Giacinto	3a9408363b	deps(llama.cpp): update and adapt API changes (#2381 ) deps(llama.cpp): update and rename function Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-23 01:02:11 +02:00
Ettore Di Giacinto	c89271b2e4	feat(llama.cpp): add distributed llama.cpp inferencing (#2324 ) * feat(llama.cpp): support distributed llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: let tweak how chat messages are merged together Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Makefile: register to ALL_GRPC_BACKENDS Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring, allow disable auto-detection of backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * minor fixups Signed-off-by: mudler <mudler@localai.io> * feat: add cmd to start rpc-server from llama.cpp Signed-off-by: mudler <mudler@localai.io> * ci: add ccache Signed-off-by: mudler <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: mudler <mudler@localai.io>	2024-05-15 01:17:02 +02:00
Ettore Di Giacinto	e49ea0123b	feat(llama.cpp): add `flash_attention` and `no_kv_offloading` (#2310 ) feat(llama.cpp): add flash_attn and no_kv_offload Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-13 19:07:51 +02:00
cryptk	28a421cb1d	feat: migrate python backends from conda to uv (#2215 ) * feat: migrate diffusers backend from conda to uv - replace conda with UV for diffusers install (prototype for all extras backends) - add ability to build docker with one/some/all extras backends instead of all or nothing Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: migrate autogtpq bark coqui from conda to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: convert exllama over to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: migrate exllama2 to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: migrate mamba to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: migrate parler to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: migrate petals to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: fix tests Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: migrate rerankers to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: migrate sentencetransformers to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: install uv for tests-linux Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: make sure file exists before installing on intel images Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: migrate transformers backend to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: migrate transformers-musicgen to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: migrate vall-e-x to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: migrate vllm to uv Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: add uv install to the rest of test-extra.yml Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: adjust file perms on all install/run/test scripts Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: add missing acclerate dependencies Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: add some more missing dependencies to python backends Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: parler tests venv py dir fix Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: correct filename for transformers-musicgen tests Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: adjust the pwd for valle tests Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: cleanup and optimization work for uv migration Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: add setuptools to requirements-install for mamba Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: more size optimization work Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: make installs and tests more consistent, cleanup some deps Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: cleanup Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: mamba backend is cublas only Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: uncomment lines in makefile Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> --------- Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>	2024-05-10 15:08:08 +02:00
Ettore Di Giacinto	530bec9c64	feat(llama.cpp): do not specify backends to autoload and add llama.cpp variants (#2232 ) * feat(initializer): do not specify backends to autoload We can simply try to autoload the backends extracted in the asset dir. This will allow to build variants of the same backend (for e.g. with different instructions sets), so to have a single binary for all the variants. Signed-off-by: mudler <mudler@localai.io> * refactor(prepare): refactor out llama.cpp prepare steps Make it so are idempotent and that we can re-build Signed-off-by: mudler <mudler@localai.io> * [TEST] feat(build): build noavx version along Signed-off-by: mudler <mudler@localai.io> * build: make build parallel Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build: do not override CMAKE_ARGS Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build: add fallback variant Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(huggingface-langchain): fail if no token is set Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(huggingface-langchain): rename Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: do not autoload local-store Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: give priority between the listed backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: mudler <mudler@localai.io> Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-04 17:56:12 +02:00
Ettore Di Giacinto	e843d7df0e	feat(grpc): return consumed token count and update response accordingly (#2035 ) Fixes: #1920	2024-04-15 19:47:11 +02:00
cryptk	a8ebf6f575	fix: respect concurrency from parent build parameters when building GRPC (#2023 ) Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>	2024-04-13 09:14:32 +02:00
Dave	ed5734ae25	test/fix: OSX Test Repair (#1843 ) * test with gguf instead of ggml. Updates testPrompt to match? Adds debugging line to Dockerfile that I've found helpful recently. * fix testPrompt slightly * Sad Experiment: Test GH runner without metal? * break apart CGO_LDFLAGS * switch runner * upstream llama.cpp disables Metal on Github CI! * missed a dir from clean-tests * CGO_LDFLAGS * tmate failure + NO_ACCELERATE * whisper.cpp has a metal fix * do the exact opposite of the name of this branch, but keep it around for unrelated fixes? * add back newlines * add tmate to linux for testing * update fixtures * timeout for tmate	2024-03-18 19:19:43 +01:00
Ettore Di Giacinto	fa9e330fc6	fix(llama.cpp): fix eos without cache (#1852 )	2024-03-18 18:59:24 +01:00
cryptk	020ce29cd8	fix(make): allow to parallelize jobs (#1845 ) * fix: clean up Makefile dependencies to allow for parallel builds * refactor: remove old unused backend from Makefile * fix: finish removing legacy backend, update piper * fix: I broke llama... I fixed llama * feat: give the tests and builds a few threads * fix: ensure libraries are replaced before build, add dropreplace target * Fix image build workflows	2024-03-17 15:39:20 +01:00
Dave	db199f61da	fix: osx build default.metallib (#1837 ) fix: osx build default.metallib (#1837) * port osx fix from refactor pr to slim pr * manually bump llama.cpp version to unstick CI?	2024-03-15 08:18:58 +00:00
Dave	45d520f913	fix: OSX Build Files for llama.cpp (#1836 ) bot ate my changes, seperate branch	2024-03-14 23:07:47 +01:00
cryptk	b423af001d	fix: the correct BUILD_TYPE for OpenCL is clblas (with no t) (#1828 )	2024-03-14 08:39:21 +01:00
Ettore Di Giacinto	bc5f5aa538	deps(llama.cpp): update (#1759 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-02-26 13:18:44 +01:00
Ettore Di Giacinto	8292781045	deps(llama.cpp): update, support Gemma models (#1734 ) deps(llama.cpp): update Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-02-21 17:23:38 +01:00
Ettore Di Giacinto	54ec6348fa	deps(llama.cpp): update (#1714 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-02-21 11:35:44 +01:00
Ettore Di Giacinto	c56b6ddb1c	fix(llama.cpp): disable infinite context shifting (#1704 ) Infinite context loop might as well trigger an infinite loop of context shifting if the model hallucinates and does not stop answering. This has the unpleasant effect that the predicion never terminates, which is the case especially on small models which tends to hallucinate. Workarounds https://github.com/mudler/LocalAI/issues/1333 by removing context-shifting. See also upstream issue: https://github.com/ggerganov/llama.cpp/issues/3969	2024-02-13 21:17:21 +01:00
Ettore Di Giacinto	1c57f8d077	feat(sycl): Add support for Intel GPUs with sycl (#1647 ) (#1660 ) * feat(sycl): Add sycl support (#1647) * onekit: install without prompts * set cmake args only in grpc-server Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * cleanup * fixup sycl source env * Cleanup docs * ci: runs on self-hosted * fix typo * bump llama.cpp * llama.cpp: update server * adapt to upstream changes * adapt to upstream changes * docs: add sycl --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-02-01 19:21:52 +01:00
Ettore Di Giacinto	697c769b64	fix(llama.cpp): enable cont batching when parallel is set (#1622 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-01-21 14:59:48 +01:00
Sebastian	eaf85a30f9	fix(llama.cpp): Enable parallel requests (#1616 ) integrate changes from llama.cpp Signed-off-by: Sebastian <tauven@gmail.com>	2024-01-21 09:56:14 +01:00
Dionysius	441e2965ff	move BUILD_GRPC_FOR_BACKEND_LLAMA logic to makefile: errors in this section now immediately fail the build (#1576 ) * move BUILD_GRPC_FOR_BACKEND_LLAMA option to makefile * review: oversight, fixup cmake_args Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Signed-off-by: Dionysius <1341084+dionysius@users.noreply.github.com> --------- Signed-off-by: Dionysius <1341084+dionysius@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-01-13 10:08:26 +01:00
Ettore Di Giacinto	fd48cb6506	deps(llama.cpp): update and sync grpc server (#1527 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-01-01 14:39:31 +01:00
Chris Natale	e2311a145c	Fix: Set proper Homebrew install location for x86 Macs (#1510 ) * set proper Homebrew install location for x86 Macs * fix: remove prior conditional that my logic replaces	2023-12-30 12:37:26 +01:00
Ettore Di Giacinto	fb6a5bc620	update(llama.cpp): update server, correctly propagate LLAMA_VERSION (#1440 ) * fix(Makefile): correctly propagate LLAMA_VERSION Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> * update grpc-server.cpp --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2023-12-15 08:26:48 +01:00
Ettore Di Giacinto	ad0e30bca5	refactor: move backends into the backends directory (#1279 ) * refactor: move backends into the backends directory Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor: move main close to implementation for every backend Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-11-13 22:40:16 +01:00
Ettore Di Giacinto	803a0ac02a	feat(llama.cpp): support lora with scale and yarn (#1277 ) * feat(llama.cpp): support lora with scale Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(llama.cpp): support yarn Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-11-11 18:40:48 +01:00
Ettore Di Giacinto	0eae727366	🔥 add LaVA support and GPT vision API, Multiple requests for llama.cpp, return JSON types (#1254 ) * wip * wip * Make it functional Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * wip * Small fixups * do not inject space on role encoding, encode img at beginning of messages Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add examples/config defaults * Add include dir of current source dir * cleanup * fixes Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups * Revert "fixups" This reverts commit `f1a4731cca`. * fixes Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-11-11 13:14:59 +01:00
Diego	e7fa2e06f8	Fixes the bug 1196 (#1232 ) * Current state of the branch. * Now gRPC is build only when the BUILD_GRPC_FOR_BACKEND_LLAMA variable is defined. * Now the local compilation of gRPC is executed on BUILD_GRPC_FOR_BACKEND_LLAMA. * Revised the Makefile. * Removed replace directives in go.mod. --------- Signed-off-by: Diego <38375572+diego-minguzzi@users.noreply.github.com> Co-authored-by: lunamidori5 <118759930+lunamidori5@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2023-11-06 19:07:46 +01:00
Ettore Di Giacinto	f227e918f9	feat(llama.cpp): Bump llama.cpp, adapt grpc server (#1211 ) * feat(llama.cpp): Bump llama.cpp, adapt grpc server Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-10-25 20:56:25 +02:00
Dave	b839eb80a1	Fix backend/cpp/llama CMakeList.txt on OSX (#1212 ) * Fix backend/cpp/llama CMakeList.txt on OSX - detect OSX and use homebrew libraries * sneak a logging fix in too for gallery debugging * additional logging	2023-10-25 20:53:26 +02:00
Ettore Di Giacinto	004baaa30f	feat(llama.cpp): update Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-10-21 11:04:03 +02:00
Ettore Di Giacinto	128694213f	feat: llama.cpp gRPC C++ backend (#1170 ) * wip: llama.cpp c++ gRPC server Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * make it work, attach it to the build process Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * update deps Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: add protobuf dep Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * try fix protobuf on cmake * cmake: workarounds Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * add packages * cmake: use fixed version of grpc Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * cmake(grpc): install locally * install grpc Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * install required deps for grpc on debian bullseye Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * debug * debug * Fixups * no need to install cmake manually Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: fixup macOS * use brew whenever possible Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * macOS fixups * debug * fix container build Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * workaround * try mac https://stackoverflow.com/questions/23905661/on-mac-g-clang-fails-to-search-usr-local-include-and-usr-local-lib-by-def * Disable temp. arm64 docker image builds --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-10-16 21:46:29 +02:00

1 2 3 4

183 commits