Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, and gpt-oss locally.
Roland Tannous 21e9a91a57
Studio: forward standard OpenAI tools / tool_choice on /v1/responses (Codex compat) (#5122)
* Studio: forward standard OpenAI tools / tool_choice on /v1/responses

Mirrors the /v1/chat/completions client-side tool pass-through from #5099
so clients (OpenAI Codex CLI, OpenAI Python SDK, ...) that target the
Responses API receive structured function_call output items instead of
plain text with tool-call tokens leaking into content.

- ResponsesRequest: type tools/tool_choice properly, add parallel_tool_calls;
  accept function_call and function_call_output input items for multi-turn
- Translate flat Responses tool / tool_choice shape to the nested Chat
  Completions shape before forwarding to llama-server
- _normalise_responses_input: map function_call_output -> role="tool",
  function_call -> assistant tool_calls (preserving call_id)
- Non-streaming: map returned tool_calls -> top-level function_call
  output items keyed by call_id
- Streaming: emit response.output_item.added (function_call),
  response.function_call_arguments.delta/.done, and response.output_item.done
  per tool call while keeping the text message at output_index 0
- Pytest coverage: tools/tool_choice translation, multi-turn input mapping,
  non-streaming tool_calls mapping, response round-trip
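
A minimal sketch of the flat-to-nested tool translation described above,
assuming Responses-style dicts as input (helper name and shapes are
illustrative, not the actual Studio code):

    from typing import Any

    def responses_tools_to_chat(tools: list[dict[str, Any]] | None,
                                tool_choice: Any = None):
        """Translate flat Responses tool definitions / tool_choice into the
        nested Chat Completions shape before forwarding to llama-server."""
        chat_tools = None
        if tools:
            chat_tools = [
                {
                    "type": "function",
                    "function": {
                        "name": t["name"],
                        "description": t.get("description") or "",
                        "parameters": t.get("parameters") or {},
                    },
                }
                for t in tools
                if t.get("type") == "function"
            ]
        chat_choice = tool_choice
        if isinstance(tool_choice, dict) and tool_choice.get("type") == "function":
            # Responses: {"type": "function", "name": ...}
            # Chat Completions: {"type": "function", "function": {"name": ...}}
            chat_choice = {"type": "function", "function": {"name": tool_choice["name"]}}
        return chat_tools, chat_choice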

* Studio: merge system messages and close inner stream on /v1/responses

Fixes two issues surfacing when OpenAI Codex CLI drives /v1/responses
against a GGUF with a strict chat template (gpt-oss harmony, Qwen3, ...).

1. "System message must be at the beginning" upstream errors
   Codex sends `instructions` AND a `role:"developer"` message in `input`,
   producing two separate system-role messages. Strict templates raise
   when a second system message exists or when one appears after a user
   turn. _normalise_responses_input now hoists all instructions / system /
   developer content into a single merged system message at the top of
   the Chat Completions message list.
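
   An illustrative sketch of the hoisting (simplified relative to the real
   _normalise_responses_input; function name and message shapes are assumptions):

    def merge_system_messages(instructions: str | None,
                              messages: list[dict]) -> list[dict]:
        """Collapse `instructions` plus any system/developer turns into one
        system message at the top of the Chat Completions message list."""
        system_parts = [instructions] if instructions else []
        rest = []
        for msg in messages:
            if msg.get("role") in ("system", "developer"):
                content = msg.get("content")
                if isinstance(content, str) and content:
                    system_parts.append(content)
            else:
                rest.append(msg)
        merged = ([{"role": "system", "content": "\n\n".join(system_parts)}]
                  if system_parts else [])
        return merged + rest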

2. "async generator ignored GeneratorExit" / "Attempted to exit cancel
   scope in a different task"
   _responses_stream consumed the inner chat-completions body_iterator
   without an explicit aclose() in a finally block. On client disconnect
   (Codex frequently cancels mid-stream), Python 3.13 finalized the inner
   async generator on a different task, tripping anyio's cancel-scope
   check. Mirrored the same try/finally + aclose pattern used by the
   /v1/messages, /v1/chat/completions, and /v1/completions passthroughs.
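
   The pattern itself, reduced to a sketch (variable names are illustrative,
   not the actual Studio code):

    async def _responses_stream(inner_body_iterator):
        try:
            async for chunk in inner_body_iterator:
                yield chunk  # translate to Responses SSE frames here
        finally:
            # Close the inner async iterator in the same task that consumed
            # it, so Python 3.13's asyncgen GC never finalises it elsewhere.
            aclose = getattr(inner_body_iterator, "aclose", None)
            if aclose is not None:
                await aclose()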

Tests: hoisting of instructions + developer, developer mid-conversation,
multiple system messages in input, no-system passthrough.

* Studio: accept Codex multi-turn shapes and fix cross-task stream close on /v1/responses

Two issues observed driving /v1/responses from OpenAI Codex CLI against a
GGUF backend.

1. 422 on every turn after the first
   Codex replays prior assistant turns with
   `content:[{"type":"output_text","text":...,"annotations":[],"logprobs":[]}]`
   and carries forward `reasoning` items (o-series / gpt-5) between turns.
   Our `ResponsesContentPart` union only accepted input_text / input_image,
   and `ResponsesInputItem` only message / function_call / function_call_output,
   so Pydantic failed the whole list and FastAPI returned
   `"Input should be a valid string"` against the `str` branch of the
   outer union.

   - Add `ResponsesOutputTextPart` for assistant-replay content.
   - Add `ResponsesUnknownContentPart` and `ResponsesUnknownInputItem`
     as permissive catch-alls (drop during normalisation).
   - Wire an explicit `Discriminator` so dispatch is deterministic and
     the fallthrough reaches the catch-all instead of misreporting via
     the outer `Union[str, list[...]]`.
   - `_normalise_responses_input` now accepts output_text parts, flattens
     single-part assistant text to a plain string (keeps legacy chat
     templates happy), and silently drops reasoning / unknown items.
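
   A hedged sketch of the discriminator wiring (simplified; only a subset of
   part types is shown and the exact field shapes are assumptions):

    from typing import Annotated, Any, Literal, Union
    from pydantic import BaseModel, ConfigDict, Discriminator, Tag

    class ResponsesInputTextPart(BaseModel):
        type: Literal["input_text"]
        text: str

    class ResponsesOutputTextPart(BaseModel):
        type: Literal["output_text"]
        text: str

    class ResponsesUnknownContentPart(BaseModel):
        model_config = ConfigDict(extra="allow")  # keep whatever fields arrive
        type: str

    def _part_tag(value: Any) -> str:
        # Route known part types to their model, everything else to the catch-all,
        # so an unexpected content part never fails the whole request.
        t = value.get("type") if isinstance(value, dict) else getattr(value, "type", None)
        return t if t in ("input_text", "output_text") else "unknown"

    ResponsesContentPart = Annotated[
        Union[
            Annotated[ResponsesInputTextPart, Tag("input_text")],
            Annotated[ResponsesOutputTextPart, Tag("output_text")],
            Annotated[ResponsesUnknownContentPart, Tag("unknown")],
        ],
        Discriminator(_part_tag),
    ]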

2. "async generator ignored GeneratorExit" / cross-task cancel scope
   `_responses_stream` awaited `openai_chat_completions` in the parent
   route-handler task, which opens the httpx client for the inner
   passthrough on *that* task. The outer `StreamingResponse` then iterates
   in a child task, so the asyncgen GC finalises the inner httpcore byte
   stream on the child task, tripping anyio's "Attempted to exit cancel
   scope in a different task". Move the `await` inside `event_generator`
   so the httpx lifecycle stays within the single streaming child task,
   and surface any HTTPException as a `response.failed` SSE frame.
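
   Schematically (hypothetical helper names, not the actual Studio code), the
   change moves the await into the generator that StreamingResponse iterates:

    import json
    from fastapi import HTTPException
    from fastapi.responses import StreamingResponse

    def _sse(event: str, data: dict) -> str:
        return f"event: {event}\ndata: {json.dumps(data)}\n\n"

    def responses_streaming(open_upstream_stream):
        async def event_generator():
            try:
                # Awaited here, on the StreamingResponse child task, so the
                # httpx lifecycle never crosses into the route-handler task.
                upstream = await open_upstream_stream()
                async for event in upstream:
                    yield event
            except HTTPException as exc:
                yield _sse("response.failed",
                           {"error": {"message": str(exc.detail)}})
        return StreamingResponse(event_generator(),
                                 media_type="text/event-stream")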

Tests: assistant output_text replay, reasoning-item tolerance, unknown
content-part tolerance, end-to-end Codex-shape payload (developer + user +
reasoning + function_call + function_call_output + assistant output_text +
user), and single-part assistant flattening to plain string.

* Studio: call llama-server directly from streaming /v1/responses

The previous fix (running the inner await inside event_generator) was not
enough. Wrapping the existing `openai_chat_completions` pass-through still
stacks two async generators: when the outer generator is closed, the
innermost `HTTP11ConnectionByteStream.__aiter__` in httpcore doesn't
receive GeneratorExit before Python's asyncgen GC finalises it in a
sibling task, tripping "Attempted to exit cancel scope in a different
task" and "async generator ignored GeneratorExit" — the same Python 3.13
+ httpcore 1.0.x interaction already seen in PRs #4956, #4981, #5099.

The fix is the one the other pass-throughs already use: a single same-task
httpx lifecycle with an explicit `aiter_lines().aclose()` BEFORE
`resp.aclose()` / `client.aclose()` in the generator's finally block.

Apply it at the Responses layer by dropping the wrapper entirely for GGUF:
open httpx, consume `resp.aiter_lines()`, parse `chat.completion.chunk`,
emit Responses SSE events, close everything in finally — all in the
single StreamingResponse child task. Non-GGUF streaming is rejected with
a 400 (wrapping the transformers backend would re-introduce the
double-layer pattern and isn't a Codex-compatible path today anyway).

Also surfaces upstream httpx.RequestError / non-200 as a
`response.failed` SSE frame rather than a dropped stream now that the
request is dispatched after SSE headers have gone out.
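
Condensed sketch of the single-task lifecycle (URL, payload handling and the
event translation are placeholders, not the actual Studio code):

    import json
    import httpx

    def _sse(event: str, data: dict) -> str:
        return f"event: {event}\ndata: {json.dumps(data)}\n\n"

    async def stream_responses_from_llama_server(payload: dict, base_url: str):
        client = httpx.AsyncClient(timeout=None)
        try:
            async with client.stream("POST", f"{base_url}/v1/chat/completions",
                                     json=payload) as resp:
                if resp.status_code != 200:
                    yield _sse("response.failed",
                               {"error": {"message": f"upstream returned {resp.status_code}"}})
                    return
                lines = resp.aiter_lines()
                try:
                    async for line in lines:
                        if not line.startswith("data: ") or line == "data: [DONE]":
                            continue
                        chunk = json.loads(line[len("data: "):])
                        # translate each chat.completion.chunk delta into the
                        # corresponding Responses SSE events here
                        yield _sse("response.output_text.delta", chunk)
                finally:
                    await lines.aclose()  # close the line iterator before the response
        except httpx.RequestError as exc:
            yield _sse("response.failed", {"error": {"message": str(exc)}})
        finally:
            await client.aclose()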

* Studio: silence benign httpcore asyncgen GC warnings on Python 3.13

The streaming pass-throughs (/v1/chat/completions, /v1/messages,
/v1/responses, /v1/completions) all use the proven #4981 / #5099 pattern
— single-task httpx lifecycle with explicit aiter_lines().aclose() ahead
of resp.aclose() / client.aclose() in the generator's finally block.
That handles our own iterators correctly.

The residual noise ("async generator ignored GeneratorExit" /
"Attempted to exit cancel scope in a different task") comes from an
innermost HTTP11ConnectionByteStream.__aiter__ that httpcore creates
internally inside its pool. We hold no reference to it, so we cannot
aclose it ourselves. Python 3.13's asyncgen GC hook finalises it on the
finaliser task, its aclose path enters an anyio CancelScope shield, and
Python flags the cross-task exit. The response has already been
delivered with a 200 by then — it is purely log noise, not a functional
failure. Same interaction seen in modelcontextprotocol/python-sdk #831,
agno #3556, chainlit #2361, langchain-mcp-adapters #254.

Install a targeted sys.unraisablehook that swallows this specific tuple
— RuntimeError mentioning "cancel scope" or "GeneratorExit" plus an
object repr referencing HTTP11ConnectionByteStream — and defers to the
default hook for every other unraisable. Idempotent; guarded by a
sentinel attribute so repeated imports don't stack filters.
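
Sketch of the hook (the sentinel name is an assumption; the matching logic
mirrors the description above):

    import sys

    _SENTINEL = "_unsloth_httpcore_unraisable_filter_installed"  # hypothetical name

    def install_httpcore_unraisable_filter() -> None:
        if getattr(sys, _SENTINEL, False):
            return  # idempotent: never stack filters on repeated imports
        default_hook = sys.unraisablehook

        def _filtered_hook(unraisable):
            message = str(unraisable.exc_value or "")
            benign = (
                isinstance(unraisable.exc_value, RuntimeError)
                and ("cancel scope" in message or "GeneratorExit" in message)
                and "HTTP11ConnectionByteStream" in repr(unraisable.object)
            )
            if not benign:
                default_hook(unraisable)  # everything else reaches the default hook

        sys.unraisablehook = _filtered_hook
        setattr(sys, _SENTINEL, True)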

Unsloth logo

Unsloth Studio lets you run and train models locally.

Features · Quickstart · Notebooks · Documentation


unsloth studio ui homepage

Get started

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Community:

Features

Unsloth Studio (Beta) lets you run and train text, audio, embedding, and vision models on Windows, Linux and macOS.

Inference

Training

  • Train and run RL on 500+ models up to 2x faster, with up to 70% less VRAM and no accuracy loss.
  • Custom Triton and mathematical kernels. See some collabs we did with PyTorch and Hugging Face.
  • Data Recipes: Auto-create datasets from PDF, CSV, DOCX etc. Edit data in a visual-node workflow.
  • Reinforcement Learning (RL): The most efficient RL library, using 80% less VRAM for GRPO, FP8 etc.
  • Supports full fine-tuning, RL, pretraining, and 4-bit, 16-bit and FP8 training.
  • Observability: Monitor training live, track loss and GPU usage, and customize graphs.
  • Multi-GPU training is supported, with major improvements coming soon.

📥 Install

Unsloth can be used in two ways: through Unsloth Studio, the web UI, or through Unsloth Core, the code-based version. Each has different requirements.

Unsloth Studio (web UI)

Unsloth Studio (Beta) works on Windows, Linux, WSL and macOS.

  • CPU: Currently supported for Chat and Data Recipes
  • NVIDIA: Training works on RTX 30/40/50, Blackwell, DGX Spark, Station and more
  • macOS: Currently supports Chat and Data Recipes; MLX training is coming very soon
  • AMD: Chat + Data Recipes work. Train with Unsloth Core for now; Studio support is coming soon.
  • Coming soon: Training support for Apple MLX, AMD, and Intel
  • Multi-GPU: Available now, with a major upgrade on the way

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Launch

unsloth studio -H 0.0.0.0 -p 8888

Update

To update, re-run the same install commands as above, or run (not supported on Windows):

unsloth studio update

Docker

Use our unsloth/unsloth Docker image. Run:

docker run -d -e JUPYTER_PASSWORD="mypassword" \
  -p 8888:8888 -p 8000:8000 -p 2222:22 \
  -v $(pwd)/work:/workspace/work \
  --gpus all \
  unsloth/unsloth

Developer, Nightly, Uninstall

For developer, nightly, and uninstallation instructions, see advanced installation.

Unsloth Core (code-based)

Linux, WSL:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv unsloth_env --python 3.13
source unsloth_env/bin/activate
uv pip install unsloth --torch-backend=auto

Windows:

winget install -e --id Python.Python.3.13
winget install --id=astral-sh.uv  -e
uv venv unsloth_env --python 3.13
.\unsloth_env\Scripts\activate
uv pip install unsloth --torch-backend=auto

For Windows, pip install unsloth works only if you have PyTorch installed. Read our Windows Guide. You can use the same Docker image as Unsloth Studio.

Blackwell, AMD, Intel:

For RTX 50x, B200, 6000 GPUs: uv pip install unsloth --torch-backend=auto. Read our guides for Blackwell and DGX Spark.
To install Unsloth on AMD and Intel GPUs, follow our AMD Guide and Intel Guide.

📒 Free Notebooks

Train for free with our notebooks. You can use our new Unsloth Studio notebook to run and train models for free in a web UI. Read our guide. Add a dataset, run, then deploy your trained model.

Model Free Notebooks Performance Memory use
Gemma 4 (E2B) ▶️ Start for free 1.5x faster 50% less
Qwen3.5 (4B) ▶️ Start for free 1.5x faster 60% less
gpt-oss (20B) ▶️ Start for free 2x faster 70% less
Qwen3.5 GSPO ▶️ Start for free 2x faster 70% less
gpt-oss (20B): GRPO ▶️ Start for free 2x faster 80% less
Qwen3: Advanced GRPO ▶️ Start for free 2x faster 70% less
embeddinggemma (300M) ▶️ Start for free 2x faster 20% less
Mistral Ministral 3 (3B) ▶️ Start for free 1.5x faster 60% less
Llama 3.1 (8B) Alpaca ▶️ Start for free 2x faster 70% less
Llama 3.2 Conversational ▶️ Start for free 2x faster 70% less
Orpheus-TTS (3B) ▶️ Start for free 1.5x faster 50% less

🦥 Unsloth News

  • Qwen3.6: Qwen3.6-35B-A3B can now be trained and run in Unsloth Studio. Blog
  • Gemma 4: Run and train Google's new models directly in Unsloth. Blog
  • Introducing Unsloth Studio: our new web UI for running and training LLMs. Blog
  • Qwen3.5 - 0.8B, 2B, 4B, 9B, 27B, 35B-A3B and 112B-A10B are now supported. Guide + notebooks
  • Train MoE LLMs 12x faster with 35% less VRAM - DeepSeek, GLM, Qwen and gpt-oss. Blog
  • Embedding models: Unsloth now supports ~1.8-3.3x faster embedding fine-tuning. Blog · Notebooks
  • New: 7x longer-context RL vs. all other setups, via our new batching algorithms. Blog
  • New RoPE & MLP Triton Kernels & Padding Free + Packing: 3x faster training & 30% less VRAM. Blog
  • 500K Context: Training a 20B model with >500K context is now possible on an 80GB GPU. Blog
  • FP8 & Vision RL: You can now do FP8 & VLM GRPO on consumer GPUs. FP8 Blog · Vision RL
  • gpt-oss by OpenAI: Read our RL blog, Flex Attention blog and Guide.

📥 Advanced Installation

The advanced instructions below are for Unsloth Studio. For Unsloth Core advanced installation, view our docs.

Developer installs: macOS, Linux, WSL:

git clone https://github.com/unslothai/unsloth
cd unsloth
./install.sh --local
unsloth studio -H 0.0.0.0 -p 8888

Then to update:

unsloth studio update

Developer installs: Windows PowerShell:

git clone https://github.com/unslothai/unsloth.git
cd unsloth
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\install.ps1 --local
unsloth studio -H 0.0.0.0 -p 8888

Then to update:

unsloth studio update

Nightly: macOS, Linux, WSL:

git clone https://github.com/unslothai/unsloth
cd unsloth
git checkout nightly
./install.sh --local
unsloth studio -H 0.0.0.0 -p 8888

Then to launch every time:

unsloth studio -H 0.0.0.0 -p 8888

Nightly: Windows:

Run in Windows PowerShell:

git clone https://github.com/unslothai/unsloth.git
cd unsloth
git checkout nightly
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\install.ps1 --local
unsloth studio -H 0.0.0.0 -p 8888

Then to launch every time:

unsloth studio -H 0.0.0.0 -p 8888

Uninstall

You can uninstall Unsloth Studio by deleting its install folder, usually located at $HOME/.unsloth/studio on macOS/Linux/WSL and %USERPROFILE%\.unsloth\studio on Windows. Running the commands below deletes everything, including your history and cache:

  • macOS, WSL, Linux: rm -rf ~/.unsloth/studio
  • Windows (PowerShell): Remove-Item -Recurse -Force "$HOME\.unsloth\studio"

For more info, see our docs.

Deleting model files

You can delete old model files either from the bin icon in the model search, or by removing the relevant cached model folder from the default Hugging Face cache directory. By default, Hugging Face uses:

  • macOS, Linux, WSL: ~/.cache/huggingface/hub/
  • Windows: %USERPROFILE%\.cache\huggingface\hub\
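
If you prefer to script the cleanup, huggingface_hub's cache utilities can remove a cached model; the repo id below is just a placeholder:

from huggingface_hub import scan_cache_dir

cache = scan_cache_dir()
for repo in cache.repos:
    if repo.repo_id == "unsloth/some-model-GGUF":  # placeholder repo id
        strategy = cache.delete_revisions(
            *[rev.commit_hash for rev in repo.revisions]
        )
        print(f"Freeing {strategy.expected_freed_size_str}")
        strategy.execute()
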
Type Links
Discord: Join Discord server
r/unsloth Reddit: Join Reddit community
📚 Documentation & Wiki: Read Our Docs
Twitter (aka X): Follow us on X
🔮 Our Models: Unsloth Catalog
✍️ Blog: Read our Blogs

Citation

You can cite the Unsloth repo as follows:

@software{unsloth,
  author = {Daniel Han and Michael Han and Unsloth team},
  title = {Unsloth},
  url = {https://github.com/unslothai/unsloth},
  year = {2023}
}

If you trained a model with 🦥Unsloth, you can use this cool sticker!  

License

Unsloth uses a dual-licensing model of Apache 2.0 and AGPL-3.0. The core Unsloth package remains licensed under Apache 2.0, while certain optional components, such as the Unsloth Studio UI, are licensed under the open-source AGPL-3.0 license.

This structure helps support ongoing Unsloth development while keeping the project open source and enabling the broader ecosystem to continue growing.

Thank You to

  • The llama.cpp library that lets users run and save models with Unsloth
  • The Hugging Face team and their libraries: transformers and TRL
  • The PyTorch and TorchAO teams for their contributions
  • NVIDIA for their NeMo DataDesigner library and their contributions
  • And of course for every single person who has contributed or has used Unsloth!