+++
disableToc = false
title = "GPT Vision"
weight = 14
url = "/features/gpt-vision/"
+++
LocalAI supports understanding images by using [LLaVA](https://llava.hliu.cc/), and implements the [GPT Vision API](https://platform.openai.com/docs/guides/vision) from OpenAI.

## Usage
OpenAI docs: https://platform.openai.com/docs/guides/vision
To let LocalAI understand and reply with what it sees in the image, use the `/v1/chat/completions` endpoint, for example with curl:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llava",
"messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}]}],
"temperature": 0.9}'
```
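The same request can be built and sent from Python using only the standard library. This is a minimal sketch: the `vision_payload` helper is illustrative (not part of LocalAI or OpenAI), and it assumes LocalAI is listening on `localhost:8080` with a model named `llava` configured:

```python
import json
import urllib.request

def vision_payload(prompt: str, image_url: str,
                   model: str = "llava", temperature: float = 0.9) -> dict:
    """Build a chat completion body in the GPT Vision shape used above:
    a user message whose content mixes a text part and an image_url part."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

def ask(prompt: str, image_url: str,
        endpoint: str = "http://localhost:8080/v1/chat/completions") -> str:
    """POST the payload and return the model's reply text."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(vision_payload(prompt, image_url)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client works the same way, since only the base URL differs from a call to OpenAI itself.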
Grammars and function tools can also be used in conjunction with the vision API:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llava", "grammar": "root ::= (\"yes\" | \"no\")",
"messages": [{"role": "user", "content": [{"type":"text", "text": "Is there some grass in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}]}],
"temperature": 0.9}'
```
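Images do not have to live at a remote URL: the GPT Vision API also accepts the image inlined in the `image_url` field as a base64 data URL, which is convenient for local files. A short sketch of the encoding step (the `data_url` helper is illustrative, not part of any API):

```python
import base64

def data_url(path: str, mime: str = "image/jpeg") -> str:
    """Read a local image and return it as a data URL suitable for
    the "url" key of an image_url content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return f"data:{mime};base64,{b64}"
```

The resulting string, e.g. `data:image/jpeg;base64,/9j/4AA...`, drops straight into the curl examples above in place of the Wikimedia URL.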
### Setup
To set up the LLaVA models, follow the full example in the [configuration examples](https://github.com/mudler/LocalAI-examples/blob/main/configurations/llava/llava.yaml).