This guide covers common issues you may encounter when using LocalAI, organized by category. For each issue, diagnostic steps and solutions are provided.
## Quick Diagnostics
Before diving into specific issues, run these commands to gather diagnostic information:
```bash
# Check LocalAI is running and responsive
curl http://localhost:8080/readyz
# List loaded models
curl http://localhost:8080/v1/models
# Check LocalAI version
local-ai --version
# Enable debug logging for detailed output
DEBUG=true local-ai run
# or
local-ai run --log-level=debug
```
For Docker deployments:
```bash
# View container logs
docker logs local-ai
# Check container status
docker ps -a | grep local-ai
# Test GPU access (NVIDIA)
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
```
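In scripts that start LocalAI and then immediately send requests, the readiness check above is more useful as a wait loop. A minimal sketch, assuming the default address `localhost:8080`:

```bash
# Poll /readyz until LocalAI responds, or give up after a timeout.
wait_for_localai() {
  attempts="${1:-30}"   # seconds to wait (one probe per second)
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -sf http://localhost:8080/readyz > /dev/null; then
      echo "LocalAI is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "LocalAI did not become ready within ${attempts}s" >&2
  return 1
}
```

For example, `wait_for_localai 60` waits up to a minute before giving up.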
## Installation Issues
### Binary Won't Execute on Linux
**Symptoms:** Permission denied or "cannot execute binary file" errors.
**Solution:**
```bash
chmod +x local-ai-*
./local-ai-Linux-x86_64 run
```
If you see "cannot execute binary file: Exec format error", you downloaded the wrong architecture. Verify with:
```bash
uname -m
# x86_64 → download the x86_64 binary
# aarch64 → download the arm64 binary
```
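To avoid fetching the wrong build in the first place, the `uname -m` mapping can be scripted. The binary names below follow the `local-ai-Linux-<arch>` pattern shown above and are illustrative:

```bash
# Map the machine architecture to the matching release binary name.
arch="$(uname -m)"
case "$arch" in
  x86_64)  echo "download local-ai-Linux-x86_64" ;;
  aarch64) echo "download local-ai-Linux-arm64" ;;
  *)       echo "no prebuilt binary for architecture: $arch" >&2 ;;
esac
```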
### macOS: Application Is Quarantined
**Symptoms:** macOS blocks LocalAI from running because the DMG is not signed by Apple.
**Solution:** See [GitHub issue #6268](https://github.com/mudler/LocalAI/issues/6268) for quarantine bypass instructions. This is tracked for resolution in [issue #6244](https://github.com/mudler/LocalAI/issues/6244).
## Model Loading Problems
### Model Not Found
**Symptoms:** API returns `404` or `"model not found"` error.
**Diagnostic steps:**
1. Check the model exists in your models directory:
```bash
ls -la /path/to/models/
```
2. Verify your models path is correct:
```bash
# Check what path LocalAI is using
local-ai run --models-path /path/to/models --log-level=debug
```
### Model Fails to Load
**Symptoms:** Model is found but fails to load, with backend errors in the logs.
**Common causes and fixes:**
- **Wrong backend:** Ensure the backend in your model YAML matches the model format. GGUF models use `llama-cpp`, diffusion models use `diffusers`, etc. See the [compatibility table](/docs/reference/compatibility-table/) for details.
- **Backend not installed:** Check installed backends:
```bash
local-ai backends list
# Install a missing backend:
local-ai backends install llama-cpp
```
- **Corrupt model file:** Re-download the model. Partial downloads or disk errors can corrupt files.
- **Wrong model format:** LocalAI uses GGUF format for llama.cpp models. Older GGML format is deprecated.
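A quick way to spot a corrupt or mis-formatted download: valid GGUF files begin with the ASCII magic bytes `GGUF`. A small sketch (the helper name is hypothetical):

```bash
# check_gguf: report whether a file starts with the GGUF magic bytes.
check_gguf() {
  if [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]; then
    echo "$1: looks like a GGUF file"
  else
    echo "$1: missing GGUF magic bytes (corrupt, truncated, or wrong format)" >&2
    return 1
  fi
}
```

Run it against a file in your models directory, e.g. `check_gguf /path/to/models/my-model.gguf`. This catches truncated downloads and old GGML files, but not subtler corruption; when in doubt, re-download and compare checksums.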
### Model Configuration Issues
**Symptoms:** Model loads but produces unexpected results or errors during inference.
Check your model YAML configuration:
```yaml
# Example model config
name: my-model
backend: llama-cpp
parameters:
  model: my-model.gguf # Relative to models directory
context_size: 2048
threads: 4 # Should match physical CPU cores
```
Common mistakes:
- `model` path must be relative to the models directory, not an absolute path
- `threads` set higher than physical CPU cores causes contention
- `context_size` too large for available RAM causes OOM errors
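To pick a sensible `threads` value, count physical cores rather than hardware threads; `nproc` reports threads, which double-counts on hyperthreaded CPUs. One way on Linux:

```bash
# Physical cores: unique (core, socket) pairs reported by lscpu.
lscpu -p=CORE,SOCKET | grep -v '^#' | sort -u | wc -l
# For comparison, hardware threads:
nproc
```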
## GPU and Memory Issues
### GPU Not Detected
**NVIDIA (CUDA):**
```bash
# Verify CUDA is available
nvidia-smi
# For Docker, verify GPU passthrough
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
```
When working correctly, LocalAI logs should show: `ggml_init_cublas: found X CUDA devices`.
Ensure you are using a CUDA-enabled container image (tags containing `cuda11`, `cuda12`, or `cuda13`). CPU-only images cannot use NVIDIA GPUs.
**AMD (ROCm):**
```bash
# Verify ROCm installation
rocminfo
# Docker requires device passthrough
docker run --device=/dev/kfd --device=/dev/dri --group-add=video ...
```
If your GPU is not in the default target list, open an issue on GitHub. Supported targets include: gfx908, gfx90a, gfx942, gfx950, gfx1030, gfx1100, gfx1101, gfx1102, gfx1200, gfx1201.
### Out of Memory
If models fail to load or inference crashes because memory runs out, LocalAI's watchdog can stop idle backends to free it. Enable it via environment variables (`LOCALAI_WATCHDOG_IDLE=true`, `LOCALAI_WATCHDOG_IDLE_TIMEOUT=15m`) or in the Web UI under Settings → Watchdog Settings.
See the [VRAM Management guide](/advanced/vram-management/) for more details.
## API Connection Problems
### Connection Refused
**Symptoms:** `curl: (7) Failed to connect to localhost port 8080: Connection refused`
**Diagnostic steps:**
1. Verify LocalAI is running:
```bash
# Direct install
ps aux | grep local-ai
# Docker
docker ps | grep local-ai
```
2. Check the bind address and port:
```bash
# Default is :8080. Override with:
local-ai run --address=0.0.0.0:8080
# or
LOCALAI_ADDRESS=":8080" local-ai run
```
3. Check for port conflicts:
```bash
ss -tlnp | grep 8080
```
### Authentication Errors (401)
**Symptoms:** `401 Unauthorized` response.
If API key authentication is enabled (`LOCALAI_API_KEY` or `--api-keys`), include the key in your requests:
```bash
curl http://localhost:8080/v1/models \
-H "Authorization: Bearer YOUR_API_KEY"
```
Keys can also be passed via `x-api-key` or `xi-api-key` headers.
### Request Errors (400/422)
**Symptoms:** `400 Bad Request` or `422 Unprocessable Entity`.
Common causes:
- Malformed JSON in request body
- Missing required fields (e.g., `model` or `messages`)
- Invalid parameter values (e.g., negative `top_n` for reranking)
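Since malformed JSON is the most common culprit, it can help to build the request body in a variable and validate it locally before sending it to the API. A sketch using `jq` (the model name is a placeholder):

```bash
# Validate a chat-completions request body before it reaches the API:
# jq -e exits non-zero on malformed JSON or missing required fields.
body='{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'
if echo "$body" | jq -e 'has("model") and has("messages")' > /dev/null; then
  echo "request body OK"
else
  echo "request body is malformed or missing required fields" >&2
fi
```

Once the body validates, pass it to curl with `-H "Content-Type: application/json" -d "$body"`.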
Enable debug logging to see the full request/response:
```bash
DEBUG=true local-ai run
```
See the [API Errors reference](/reference/api-errors/) for a complete list of error codes and their meanings.