When you install a model from the gallery (or a YAML file), LocalAI intelligently detects the required backend and your system's capabilities, then downloads the correct version for you. Whether you're running on a standard CPU, an NVIDIA GPU, an AMD GPU, or an Intel GPU, LocalAI handles it automatically.
For advanced use cases or to override auto-detection, you can use the `LOCALAI_FORCE_META_BACKEND_CAPABILITY` environment variable. Here are the available options:
-`default`: Forces CPU-only backend. This is the fallback if no specific hardware is detected.
-`nvidia`: Forces backends compiled with CUDA support for NVIDIA GPUs.
-`amd`: Forces backends compiled with ROCm support for AMD GPUs.
-`intel`: Forces backends compiled with SYCL/oneAPI support for Intel GPUs.
Depending on the model architecture and backend used, there might be different ways to enable GPU acceleration. It is required to configure the model you intend to use with a YAML config file. For example, for `llama.cpp` workloads a configuration file might look like this (where `gpu_layers` is the number of layers to offload to the GPU):
```yaml
name: my-model-name
parameters:
# Relative to the models path
model: llama.cpp-model.ggmlv3.q5_K_M.bin
context_size: 1024
threads: 1
f16: true # enable with GPU acceleration
gpu_layers: 22 # GPU Layers (only used when built with cublas)
```
For diffusers instead, it might look like this instead:
For multi-GPU support with diffusers, you need to configure the model with `tensor_parallel_size` set to the number of GPUs you want to use.
```yaml
name: stable-diffusion-multigpu
model: stabilityai/stable-diffusion-xl-base-1.0
backend: diffusers
parameters:
tensor_parallel_size: 2 # Number of GPUs to use
```
The `tensor_parallel_size` parameter is set in the gRPC proto configuration (in `ModelOptions` message, field 55). When this is set to a value greater than 1, the diffusers backend automatically enables `device_map="auto"` to distribute the model across multiple GPUs.
When using diffusers with multiple GPUs, ensure you have sufficient GPU memory across all devices. The model will be automatically distributed across available GPUs. For optimal performance, use GPUs of the same type and memory capacity.
There are a limited number of tested configurations for ROCm systems however most newer deditated GPU consumer grade devices seem to be supported under the current ROCm6 implementation.
Due to the nature of ROCm it is best to run all implementations in containers as this limits the number of packages required for installation on host system, compatibility and package versions for dependencies across all variations of OS must be tested independently if desired, please refer to the [build]({{%relref "installation/build#Acceleration" %}}) documentation.
LocalAI hipblas images are built against the following targets: gfx900,gfx906,gfx908,gfx940,gfx941,gfx942,gfx90a,gfx1030,gfx1031,gfx1100,gfx1101
If your device is not one of these you must specify the corresponding `GPU_TARGETS` and specify `REBUILD=true`. Otherwise you don't need to specify these in the commands below.
2. Check which ROCm version is compatible with your LLVM target and your chosen OS (pay special attention to supported kernel versions). See the following for compatibility for ([ROCm 6.0.0](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.0.0/reference/system-requirements.html)) or ([ROCm 6.0.2](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html))
3. Install you chosen version of the `dkms` and `rocm` (it is recommended that the native package manager be used for this process for any OS as version changes are executed more easily via this method if updates are required). Take care to restart after installing `amdgpu-dkms` and before installing `rocm`, for details regarding this see the installation documentation for your chosen OS ([6.0.2](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/native-install/index.html) or [6.0.0](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.0.0/how-to/native-install/index.html))
# If your gpu is not already included in the current list of default targets the following build details are required.
- REBUILD=true
- BUILD_TYPE=hipblas
- GPU_TARGETS=gfx906 # Example for Radeon VII
devices:
# AMD GPU only require the following devices be passed through to the container for offloading to occur.
- /dev/dri
- /dev/kfd
```
The same can also be executed as a `run` for your container runtime
```
docker run \
-e DEBUG=true \
-e REBUILD=true \
-e BUILD_TYPE=hipblas \
-e GPU_TARGETS=gfx906 \
--device /dev/dri \
--device /dev/kfd \
quay.io/go-skynet/local-ai:master-aio-gpu-hipblas
```
Please ensure to add all other required environment variables, port forwardings, etc to your `compose` file or `run` command.
The rebuild process will take some time to complete when deploying these containers and it is recommended that you `pull` the image prior to deployment as depending on the version these images may be ~20GB in size.
#### Example (k8s) (Advanced Deployment/WIP)
For k8s deployments there is an additional step required before deployment, this is the deployment of the [ROCm/k8s-device-plugin](https://artifacthub.io/packages/helm/amd-gpu-helm/amd-gpu).
For any k8s environment the documentation provided by AMD from the ROCm project should be successful. It is recommended that if you use rke2 or OpenShift that you deploy the SUSE or RedHat provided version of this resource to ensure compatibility.
After this has been completed the [helm chart from go-skynet](https://github.com/go-skynet/helm-charts) can be configured and deployed mostly un-edited.
The following are details of the changes that should be made to ensure proper function.
While these details may be configurable in the `values.yaml` development of this Helm chart is ongoing and is subject to change.
The following details indicate the final state of the localai deployment relevant to GPU function.
This configuration has been tested on a 'custom' cluster managed by SUSE Rancher that was deployed on top of Ubuntu 22.04.4, certification of other configuration is ongoing and compatibility is not guaranteed.
- When installing the ROCM kernel driver on your system ensure that you are installing an equal or newer version that that which is currently implemented in LocalAI (6.0.0 at time of writing).
- If you encounter an `Error 413` on attempting to upload an audio file or image for whisper or llava/bakllava on a k8s deployment, note that the ingress for your deployment may require the annotation `nginx.ingress.kubernetes.io/proxy-body-size: "25m"` to allow larger uploads. This may be included in future versions of the helm chart.
If building from source, you need to install [Intel oneAPI Base Toolkit](https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html) and have the Intel drivers available in the system.
These flags are the same as the sections above, depending on the hardware, for [nvidia](#cudanvidia-acceleration), [AMD](#rocmamd-acceleration) or [Intel](#intel-acceleration-sycl).
If you have mixed hardware, you can pass flags for multiple GPUs, for example: