+++
disableToc = false
title = "GPT Vision"
weight = 14
url = "/features/gpt-vision/"
+++
LocalAI supports understanding images by using [LLaVA](https://llava.hliu.cc/), and implements the [GPT Vision API](https://platform.openai.com/docs/guides/vision) from OpenAI.

## Usage
OpenAI docs: https://platform.openai.com/docs/guides/vision
To let LocalAI understand and reply with what it sees in the image, use the `/v1/chat/completions` endpoint, for example with curl:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llava",
"messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}]}],
"temperature": 0.9}'
```
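The same request can be built and sent from Python using only the standard library. This is a minimal sketch: the `vision_payload` helper is illustrative (not part of LocalAI or OpenAI), and it assumes LocalAI is listening on `localhost:8080` with a model named `llava` configured:

```python
import json
import urllib.request

def vision_payload(prompt: str, image_url: str,
                   model: str = "llava", temperature: float = 0.9) -> dict:
    """Build a chat completion body in the GPT Vision shape used above:
    a user message whose content mixes a text part and an image_url part."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

def ask(prompt: str, image_url: str,
        endpoint: str = "http://localhost:8080/v1/chat/completions") -> str:
    """POST the payload and return the model's reply text."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(vision_payload(prompt, image_url)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client works the same way, since only the base URL differs from a call to OpenAI itself.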
Grammars and function tools can also be used in conjunction with the vision API:
```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llava", "grammar": "root ::= (\"yes\" | \"no\")",
"messages": [{"role": "user", "content": [{"type":"text", "text": "Is there some grass in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}]}],
"temperature": 0.9}'
```
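Images do not have to live at a remote URL: the GPT Vision API also accepts the image inlined in the `image_url` field as a base64 data URL, which is convenient for local files. A short sketch of the encoding step (the `data_url` helper is illustrative, not part of any API):

```python
import base64

def data_url(path: str, mime: str = "image/jpeg") -> str:
    """Read a local image and return it as a data URL suitable for
    the "url" key of an image_url content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return f"data:{mime};base64,{b64}"
```

The resulting string, e.g. `data:image/jpeg;base64,/9j/4AA...`, drops straight into the curl examples above in place of the Wikimedia URL.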
### Setup
To set up the LLaVA models, follow the full example in the [configuration examples](https://github.com/mudler/LocalAI-examples/blob/main/configurations/llava/llava.yaml).