LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Ettore Di Giacinto 42754d33b9	2026-04-24 16:29:29 +00:00

fix(buun-llama-cpp): pass WARP_SIZE to argmax __shfl_xor_sync calls

Two call sites in ggml/src/ggml-cuda/argmax.cu (the top-K intra-warp merge
added by buun) use the 3-arg CUDA form __shfl_xor_sync(mask, var, laneMask),
omitting the optional width parameter. The hipification shim at
ggml/src/ggml-cuda/vendors/hip.h:33 is a function-like macro that requires
all four arguments, so hipcc fails with:

    argmax.cu:265: too few arguments provided to function-like macro invocation
    note: macro '__shfl_xor_sync' defined here:
    #define __shfl_xor_sync(mask, var, laneMask, width) \
        __shfl_xor(var, laneMask, width)

Every other call in the same file already passes WARP_SIZE explicitly;
aligning these two with that convention fixes the hipblas build without
changing CUDA codegen (warpSize is the CUDA default).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
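
The failure described in the commit message comes down to macro arity: a function-like macro with four parameters cannot be invoked with three. The following is a minimal host-side C++ sketch of that situation, not the code from argmax.cu or hip.h. It emulates a 32-lane XOR butterfly argmax on the CPU. All names (EMU_SHFL_XOR_SYNC, emu_shfl_xor, LaneRef, Candidate, warp_argmax) are hypothetical, and the tie-breaking rule (lowest index wins) is an assumption made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr int WARP_SIZE = 32;  // assumption: 32 lanes per warp

// One (value, index) pair per lane, as an argmax reduction would carry.
struct Candidate {
    float value;
    int index;
};

using Warp = std::array<Candidate, WARP_SIZE>;

// Hypothetical stand-in for a per-lane register: the lane id plus a
// snapshot of what every lane held before the exchange.
struct LaneRef {
    const Warp* warp;
    int lane;
};

// Emulated shuffle: the calling lane reads the value held by lane ^ lane_mask.
// This sketch only models the full-width case (width == WARP_SIZE).
inline Candidate emu_shfl_xor(LaneRef var, int lane_mask, int width) {
    assert(width == WARP_SIZE);
    return (*var.warp)[static_cast<std::size_t>(var.lane ^ lane_mask)];
}

// Four-parameter function-like macro, shaped like the shim quoted in the
// commit message: `mask` is accepted and dropped, `width` is mandatory.
#define EMU_SHFL_XOR_SYNC(mask, var, lane_mask, width) \
    emu_shfl_xor((var), (lane_mask), (width))

// Butterfly reduction: after log2(WARP_SIZE) exchange rounds every lane
// holds the maximum; ties go to the lower index (assumed rule).
inline Candidate warp_argmax(Warp warp) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset /= 2) {
        const Warp snapshot = warp;  // all lanes read pre-exchange values
        for (int lane = 0; lane < WARP_SIZE; ++lane) {
            const LaneRef self{&snapshot, lane};
            // Passing WARP_SIZE explicitly satisfies the four-parameter macro.
            const Candidate other =
                EMU_SHFL_XOR_SYNC(0xffffffffu, self, offset, WARP_SIZE);
            Candidate& mine = warp[static_cast<std::size_t>(lane)];
            if (other.value > mine.value ||
                (other.value == mine.value && other.index < mine.index)) {
                mine = other;
            }
        }
    }
    return warp[0];
}
```

Writing the call as EMU_SHFL_XOR_SYNC(0xffffffffu, self, offset) would stop at the preprocessor with a "too few arguments" diagnostic, which is the same class of error hipcc reported. Supplying the width explicitly keeps one spelling that works whether the name resolves to a function with a defaulted parameter or to a fixed-arity macro.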
patches	fix(buun-llama-cpp): pass WARP_SIZE to argmax __shfl_xor_sync calls	2026-04-24 16:29:29 +00:00
apply-patches.sh	feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)	2026-04-24 12:52:53 +00:00
Makefile	feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)	2026-04-24 12:52:53 +00:00
package.sh	feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)	2026-04-24 12:52:53 +00:00
patch-grpc-server.sh	fix(buun-llama-cpp): drop logit_bias_eog arg from params_from_json_cmpl	2026-04-24 12:52:53 +00:00
run.sh	feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)	2026-04-24 12:52:53 +00:00