mirror of
https://github.com/mudler/LocalAI
synced 2026-05-06 06:48:23 +00:00
105 lines
4.1 KiB
Markdown
105 lines
4.1 KiB
Markdown
|
|
+++
|
||
|
|
disableToc = false
|
||
|
|
title = "Sound Generation"
|
||
|
|
weight = 19
|
||
|
|
url = "/features/sound-generation/"
|
||
|
|
+++
|
||
|
|
|
||
|
|
LocalAI supports generating audio from text descriptions via the `/v1/sound-generation` endpoint. This endpoint is compatible with the [ElevenLabs sound generation API](https://elevenlabs.io/docs/api-reference/sound-generation) and can produce music, sound effects, and other audio content.
|
||
|
|
|
||
|
|
## API
|
||
|
|
|
||
|
|
- **Method:** `POST`
|
||
|
|
- **Endpoint:** `/v1/sound-generation`
|
||
|
|
|
||
|
|
### Request
|
||
|
|
|
||
|
|
The request body is JSON. There are two usage modes: simple and advanced.
|
||
|
|
|
||
|
|
#### Simple mode
|
||
|
|
|
||
|
|
| Parameter | Type | Required | Description |
|
||
|
|
|------------------|----------|----------|----------------------------------------------|
|
||
|
|
| `model_id` | `string` | Yes | Model identifier |
|
||
|
|
| `text` | `string` | Yes | Audio description or prompt |
|
||
|
|
| `instrumental` | `bool` | No | Generate instrumental audio (no vocals) |
|
||
|
|
| `vocal_language` | `string` | No | Language code for vocals (e.g. `bn`, `ja`) |
|
||
|
|
|
||
|
|
#### Advanced mode
|
||
|
|
|
||
|
|
| Parameter | Type | Required | Description |
|
||
|
|
|---------------------|----------|----------|-------------------------------------------------|
|
||
|
|
| `model_id` | `string` | Yes | Model identifier |
|
||
|
|
| `text` | `string` | Yes | Text prompt or description |
|
||
|
|
| `duration_seconds` | `float` | No | Target duration in seconds |
|
||
|
|
| `prompt_influence` | `float` | No | Temperature / prompt influence parameter |
|
||
|
|
| `do_sample` | `bool` | No | Enable sampling |
|
||
|
|
| `think` | `bool` | No | Enable extended thinking for generation |
|
||
|
|
| `caption` | `string` | No | Caption describing the audio |
|
||
|
|
| `lyrics` | `string` | No | Lyrics for the generated audio |
|
||
|
|
| `bpm` | `int` | No | Beats per minute |
|
||
|
|
| `keyscale` | `string` | No | Musical key/scale (e.g. `Ab major`) |
|
||
|
|
| `language` | `string` | No | Language code |
|
||
|
|
| `vocal_language` | `string` | No | Vocal language (fallback if `language` is empty) |
|
||
|
|
| `timesignature` | `string` | No | Time signature (e.g. `4`) |
|
||
|
|
| `instrumental` | `bool` | No | Generate instrumental audio (no vocals) |
|
||
|
|
|
||
|
|
### Response
|
||
|
|
|
||
|
|
Returns a binary audio file with the appropriate `Content-Type` header (e.g. `audio/wav`, `audio/mpeg`, `audio/flac`, `audio/ogg`).
|
||
|
|
|
||
|
|
## Usage
|
||
|
|
|
||
|
|
### Generate a sound effect
|
||
|
|
|
||
|
|
```bash
|
||
|
|
curl http://localhost:8080/v1/sound-generation \
|
||
|
|
-H "Content-Type: application/json" \
|
||
|
|
-d '{
|
||
|
|
"model_id": "sound-model",
|
||
|
|
"text": "rain falling on a tin roof"
|
||
|
|
}' \
|
||
|
|
--output rain.wav
|
||
|
|
```
|
||
|
|
|
||
|
|
### Generate a song with vocals
|
||
|
|
|
||
|
|
```bash
|
||
|
|
curl http://localhost:8080/v1/sound-generation \
|
||
|
|
-H "Content-Type: application/json" \
|
||
|
|
-d '{
|
||
|
|
"model_id": "sound-model",
|
||
|
|
"text": "a soft Bengali love song for a quiet evening",
|
||
|
|
"instrumental": false,
|
||
|
|
"vocal_language": "bn"
|
||
|
|
}' \
|
||
|
|
--output song.wav
|
||
|
|
```
|
||
|
|
|
||
|
|
### Generate music with advanced parameters
|
||
|
|
|
||
|
|
```bash
|
||
|
|
curl http://localhost:8080/v1/sound-generation \
|
||
|
|
-H "Content-Type: application/json" \
|
||
|
|
-d '{
|
||
|
|
"model_id": "sound-model",
|
||
|
|
"text": "upbeat pop",
|
||
|
|
"caption": "A funky Japanese disco track",
|
||
|
|
"lyrics": "[Verse 1]\nDancing in the neon lights",
|
||
|
|
"think": true,
|
||
|
|
"bpm": 120,
|
||
|
|
"duration_seconds": 225,
|
||
|
|
"keyscale": "Ab major",
|
||
|
|
"language": "ja",
|
||
|
|
"timesignature": "4"
|
||
|
|
}' \
|
||
|
|
--output disco.wav
|
||
|
|
```
|
||
|
|
|
||
|
|
## Error Responses
|
||
|
|
|
||
|
|
| Status Code | Description |
|
||
|
|
|-------------|--------------------------------------------------|
|
||
|
|
| 400 | Missing or invalid model or request parameters |
|
||
|
|
| 500 | Backend error during sound generation |
|