Commit graph

5103 commits

Author SHA1 Message Date
Roland Tannous
daaea21af1
Merge branch 'main' into fix/studio-stop-button 2026-04-20 20:54:53 +04:00
Michael Han
b24f3f61b8
Update README.md 2026-04-20 00:37:40 -07:00
Michael Han
f5eec8a6f2
Qwen3.6 and ReadMe revamp.md 2026-04-19 23:16:36 -07:00
pre-commit-ci[bot]
8037c9f928 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-04-19 11:56:36 +00:00
Daniel Han
02562e7449 Merge remote-tracking branch 'staging/pr-5069-tests' into pr-5069-head 2026-04-19 11:56:05 +00:00
Daniel Han
6ee93148eb Consolidate review tests for Studio stop-button cancel flow
Move review-added tests out of test_cancel_dispatch_edges.py into the
existing PR test files that already cover the same areas:
- backend registry fan-out / exclusivity / idempotency / falsy-keys
  edge cases moved into tests/studio/test_cancel_atomicity.py
- frontend plain-fetch (not authFetch) + manual Authorization header
  moved into tests/studio/test_cancel_id_wiring.py
Delete the now-empty test_cancel_dispatch_edges.py.
2026-04-19 11:54:31 +00:00
Daniel Han
e770e76e9f studio: trim comments on stop-button review changes
Collapse multi-paragraph rationale blocks on the cancel registry,
_openai_passthrough_stream, and the frontend onAbortCancel handler
into one-line explanations of why the non-obvious behaviour exists.
Drop authFetch import that became unused when the cancel POST
switched to plain fetch.
2026-04-19 11:51:36 +00:00
Daniel Han
348065814e Add review tests for Studio stop-button cancel flow 2026-04-19 11:48:54 +00:00
Daniel Han
9f60dfedd9 studio: harden cancel registry against ghost-cancel and leak paths
- Revert the session_id/completion_id stash in the fallback cancel
  helper. session_id is thread-scoped and reused across runs, so
  stashing it on an unmatched POST would fire cancel_event for the
  user's next unrelated request via _TrackedCancel.__enter__.
  cancel_id remains the only per-run unique key that gets stashed.
- Default max_tokens to _DEFAULT_MAX_TOKENS in the tool-passthrough
  body. Mirror the direct GGUF path so OpenAI/Anthropic passthrough
  callers who omit max_tokens get the same zombie-decode cap instead
  of relying on the wall-clock backstop alone.
- Wrap _openai_passthrough_stream setup with an outer try/except
  BaseException. The inner except httpx.RequestError does not catch
  asyncio.CancelledError at await client.send, which would otherwise
  leave _tracker registered in _CANCEL_REGISTRY indefinitely.
- Frontend stop POST uses plain fetch + manual Authorization header
  instead of authFetch. A 401 on the cancel POST no longer refreshes
  tokens or redirects the user to the login page mid-stop.
2026-04-19 11:43:39 +00:00
Daniel Han
35f6af4ad0 studio: extend stop-path to passthrough streams; tighten wall-clock cap
- Lower _DEFAULT_T_MAX_PREDICT_MS from 1 hour to 10 minutes so the
  wall-clock backstop actually bounds runaway decodes when cancel
  signaling fails.
- Wire _TrackedCancel and cancel_event.is_set() into
  _openai_passthrough_stream and _anthropic_passthrough_stream and
  disable httpx keepalive so stop requests from /v1 and /v1/messages
  tool-calling clients reach llama-server.
- Apply t_max_predict_ms to the tool-passthrough request body so the
  backstop covers passthrough paths as well.
- Symmetric pre-registration stash for session_id/completion_id
  cancels (_cancel_by_keys_or_stash) so early cancels by those keys
  replay on later registration like cancel_id.
- Drop dead except BaseException guards around StreamingResponse()
  at four streaming sites; cleanup lives in the generator's finally.
2026-04-19 11:19:00 +00:00
pre-commit-ci[bot]
34a4825311 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-04-19 04:40:06 +00:00
Daniel Han
d12c448a31 Consolidate review tests for Studio stop-button cancel flow
- Delete standalone test_cancel_registry.py at repo root: tests duplicated
  test_cancel_atomicity.py / test_cancel_id_wiring.py and re-implemented
  registry primitives inline (scaffolding).
- Extend tests/studio/test_stream_cancel_registration_timing.py with
  regression guards for the iter-1 cancel-loop fixes:
    structural: each streaming generator checks cancel_event in its loop;
                audio_input_stream offloads next() via asyncio.to_thread;
                stream_chunks cancel branch calls reset_generation_state().
    runtime:    Unsloth loop breaks on external cancel and resets state;
                audio loop stays responsive under blocking next();
                both loops emit zero tokens on pre-set cancel (replay path).
2026-04-19 04:38:15 +00:00
Daniel Han
e9f9dcfebb Add review tests for Studio stop-button cancel flow 2026-04-19 04:33:38 +00:00
Daniel Han
f22ed4acc0 studio: make cancel-via-POST interrupt Unsloth and audio-input streams
Close two remaining gaps in the stop-button cancellation wiring:

- stream_chunks (Unsloth path): add a top-of-loop cancel_event check and
  call backend.reset_generation_state() so cancel POSTs flush GPU state
  and close the SSE cleanly instead of relying on request.is_disconnected
  (which does not fire through proxies like Colab's).
- audio_input_stream: run the synchronous audio_input_generate() via
  asyncio.to_thread so blocking whisper chunks do not freeze the event
  loop, matching the pattern already used by the GGUF streaming paths.
2026-04-19 04:12:24 +00:00
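The `asyncio.to_thread` pattern from the audio_input_stream fix can be shown with a stdlib sketch; `blocking_chunks` stands in for the synchronous whisper generator, and the heartbeat task demonstrates that the event loop stays responsive while the blocking `next()` runs off-loop.

```python
import asyncio
import time

def blocking_chunks():
    # Stand-in for a synchronous generator whose next() blocks (e.g. a
    # whisper decode step); names here are illustrative only.
    for i in range(3):
        time.sleep(0.05)
        yield f"chunk-{i}"

async def stream_chunks():
    it = blocking_chunks()
    while True:
        # next(it, None) runs in a worker thread, so the event loop is
        # free to service other tasks while the chunk is produced.
        chunk = await asyncio.to_thread(next, it, None)
        if chunk is None:
            break
        yield chunk

async def main():
    heartbeat = 0

    async def ticker():
        nonlocal heartbeat
        while True:
            await asyncio.sleep(0.01)
            heartbeat += 1

    t = asyncio.create_task(ticker())
    out = [c async for c in stream_chunks()]
    t.cancel()
    return out, heartbeat
```

Calling the blocking `next()` directly inside the async generator would freeze the loop for the full decode; the heartbeat counter staying non-zero is the observable difference.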
pre-commit-ci[bot]
d3b8afdaa9 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-04-19 01:19:10 +00:00
Daniel Han
420c1a9fbd Merge remote-tracking branch 'staging/pr-5069-tests' into pr-5069-head 2026-04-19 01:18:57 +00:00
Daniel Han
f12e07e1bd Consolidate review tests for Studio stop-button cancel flow
- Merge the 6 behavioral tests from test_stream_cleanup_on_disconnect.py
  (finally cleanup on normal/exception/aclose, pre-set cancel_event
  pattern, and its regressions) into test_stream_cancel_registration_timing.py,
  which is the PR's existing file covering the same area.
- Extend structural invariants to include audio_input_stream alongside the
  three GGUF / Unsloth streaming generators: no _tracker.__enter__ inside
  the async gen body, cleanup via try/finally, no background= on
  StreamingResponse.
- Delete test_stream_cleanup_on_disconnect.py (now empty).
2026-04-19 01:17:08 +00:00
Daniel Han
a174b871d8 studio: wire audio-input stream into cancel registry
- Register cancel_event with _TrackedCancel on the audio-input streaming
  path so POST /api/inference/cancel can stop whisper / audio-input GGUF
  runs. Previously the registry stayed empty on this branch, so the stop
  button returned {"cancelled":0} and the decode ran to completion.
- Apply the same finally-based cleanup and pre-iteration cancel-event
  check used on the other three streaming paths.
- Update the _CANCEL_REGISTRY block comment to list cancel_id as the
  primary key (was stale "session_id preferred").
2026-04-19 01:10:56 +00:00
Daniel Han
4400c90181 Add review tests for Studio stop-button 2026-04-19 00:51:41 +00:00
Daniel Han
caa56091fa studio: move cancel cleanup to generator finally; drop dead helper
- Move _tracker.__exit__ from Starlette BackgroundTask into each
  streaming generator's finally block. Starlette skips the background
  callback when stream_response raises (OSError / ClientDisconnect),
  which leaked _CANCEL_REGISTRY entries on abrupt disconnect.
- Check cancel_event.is_set() at the top of each GGUF while loop so a
  pending-replay cancel falls through to final_chunk + [DONE] instead
  of propagating GeneratorExit out of _stream_with_retry.
- Remove unused _remember_pending_cancel; _cancel_by_cancel_id_or_stash
  superseded it.
2026-04-19 00:47:42 +00:00
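The finally-based cleanup the commit above moves to can be sketched in a few lines: a `finally` inside an async generator runs on normal completion, on an exception, and when the consumer calls `aclose()` (as Starlette does on disconnect), whereas a separate background callback can be skipped. Names are hypothetical, not the Studio code.

```python
import asyncio

cleaned: list[str] = []

async def stream(tracker_key: str):
    try:
        for i in range(100):
            yield i
            await asyncio.sleep(0)
    finally:
        # Stand-in for _tracker.__exit__: runs whether the stream ends
        # normally, raises, or is aclosed() after an abrupt disconnect.
        cleaned.append(tracker_key)

async def main():
    agen = stream("run-1")
    await agen.__anext__()   # consume one chunk
    await agen.aclose()      # simulate the client dropping mid-stream
    return cleaned
```

`aclose()` throws `GeneratorExit` into the suspended generator, which is exactly what makes `finally` a reliable place for registry cleanup.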
pre-commit-ci[bot]
510e2115da [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-04-18 12:04:27 +00:00
Daniel Han
6e0a3eb517 Align cancel-route test with exclusive cancel_id semantics 2026-04-18 12:03:09 +00:00
Daniel Han
acdea3f2d5 Consolidate review tests for Studio stop button 2026-04-18 12:03:09 +00:00
Daniel Han
0667520771 Add review tests for Studio stop button 2026-04-18 12:03:09 +00:00
Daniel Han
78573842e0 studio/llama_cpp: drop upstream PR hashes from benchmark comment 2026-04-18 11:56:57 +00:00
Daniel Han
7fbf4061f1 studio: trim verbose comments and docstrings in cancel path 2026-04-18 11:52:40 +00:00
Daniel Han
023bc0cd14 studio: close TOCTOU race and restore wall-clock backstop on UI path
- Close TOCTOU race in the pending-cancel mechanism. The previous fix
  split cancel_inference's (cancel_by_keys + remember_pending_cancel)
  and _TrackedCancel.__enter__'s (register + consume_pending) into
  four separate lock acquisitions. Under contention a cancel POST
  could acquire-then-release the lock, find the registry empty, and
  stash ONLY AFTER __enter__ had already registered and consumed an
  empty pending map -- silently dropping the cancel. Both call sites
  now do their work inside a single _CANCEL_LOCK critical section, via
  the new atomic helper _cancel_by_cancel_id_or_stash() and an
  inlined consume-pending step in __enter__. Reproduced the race under
  forced interleaving pre-fix; 0/2000 drops post-fix under parallel
  stress.

- Apply t_max_predict_ms UNCONDITIONALLY at all three llama-server
  payload sites. The previous iteration gated the cap on
  `max_tokens is None`, which turned out to be dead code on the
  primary Studio UI path: chat-adapter.ts sets
  maxTokens=loadResp.context_length after every model load, so every
  chat request carries an explicit max_tokens and the wall-clock
  safety net never fired. The cap's original purpose is to bound
  stuck decodes regardless of the token budget; it must always apply.

- Raise _DEFAULT_T_MAX_PREDICT_MS from 10 minutes to 1 hour. 10
  minutes was too aggressive for legitimate slow-CPU chat responses
  (a 4096-token reply at 2 tok/s takes ~34 min); 1 hour accommodates
  that and still catches genuine zombie decodes.

- Prune _PENDING_CANCELS inside _cancel_by_keys as well, so stashed
  entries expire proportionally to overall cancel traffic rather than
  only to cancel_id-specific POSTs.
2026-04-18 11:48:23 +00:00
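The single-critical-section fix described above can be illustrated with a stdlib sketch: the cancel side looks up the registry and stashes in one lock acquisition, and the register side registers and consumes any pending cancel in one lock acquisition, so no interleaving can drop a cancel between the two. All names (`cancel_or_stash`, `register`, `_PENDING`) are illustrative.

```python
import threading
import time

_LOCK = threading.Lock()
_REGISTRY: dict[str, set[threading.Event]] = {}
_PENDING: dict[str, float] = {}

def cancel_or_stash(cancel_id: str) -> int:
    # One critical section: look up AND stash. Splitting these into two
    # lock acquisitions reopens the TOCTOU window the commit describes.
    with _LOCK:
        events = _REGISTRY.get(cancel_id)
        if events:
            for ev in events:
                ev.set()
            return len(events)
        _PENDING[cancel_id] = time.monotonic()  # a TTL prune would expire this
        return 0

def register(cancel_id: str, ev: threading.Event) -> None:
    # One critical section: register AND consume any pending cancel.
    with _LOCK:
        _REGISTRY.setdefault(cancel_id, set()).add(ev)
        if _PENDING.pop(cancel_id, None) is not None:
            ev.set()  # replay the early cancel immediately
```

Under this shape, an early cancel that arrives before registration is always either stashed-then-replayed or matched directly; there is no ordering in which it vanishes.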
Daniel Han
132d4202c0 studio: harden stop-button cancel semantics and wall-clock cap
- Make /inference/cancel match cancel_id EXCLUSIVELY when supplied.
  Previously the handler iterated ('cancel_id','session_id','completion_id')
  and unioned matches, so a stale cancel POST carrying {cancel_id:old,
  session_id:thr} would still cancel a later run on the same thread via
  the shared session_id. cancel_id is now a per-run exclusive key;
  session_id / completion_id are only used as fallbacks when cancel_id
  is absent.

- Close the early-cancel race. If /inference/cancel lands before the
  streaming handler reaches _TrackedCancel.__enter__() (stop clicked
  during prefill / warmup / proxy buffering), the cancel was silently
  dropped. Stash unmatched cancel_ids in _PENDING_CANCELS with a 30 s
  TTL; _TrackedCancel.__enter__() now replays any matching pending
  cancel by set()-ing the event immediately after registration.

- Make t_max_predict_ms = _DEFAULT_T_MAX_PREDICT_MS conditional on
  max_tokens is None at all three llama-server payload sites. The cap
  is a safety net for callers who leave max_tokens unset (otherwise
  llama-server defaults n_predict to n_ctx, up to 262144). Callers who
  set an explicit max_tokens are already self-limiting and must not be
  silently truncated at 10 minutes on slow CPU / macOS / Windows
  legitimate long generations.

- Guard each StreamingResponse return with try/except BaseException so
  _tracker.__exit__ runs even if StreamingResponse construction or any
  preceding statement raises between _tracker.__enter__() and the
  BackgroundTask attachment. Prevents a registry leak on that narrow
  window.
2026-04-18 11:21:10 +00:00
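The exclusive-key matching rule from the first bullet can be sketched as a small key-resolution helper; the function name and dict shape are hypothetical, but the semantics follow the commit: when `cancel_id` is present it is the only key consulted, otherwise `session_id` / `completion_id` act as fallbacks.

```python
def resolve_cancel_keys(body: dict) -> list[tuple[str, str]]:
    # cancel_id is a per-run exclusive key: when present, session_id and
    # completion_id are ignored, so a stale POST carrying an old cancel_id
    # plus a still-live session_id cannot cancel a later run on that thread.
    if body.get("cancel_id"):
        return [("cancel_id", body["cancel_id"])]
    return [
        (k, body[k])
        for k in ("session_id", "completion_id")
        if body.get(k)
    ]
```

The earlier behaviour (unioning matches across all three keys) is exactly what this shape forbids.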
Daniel Han
2aee7a6c3d Merge remote-tracking branch 'origin/main' 2026-04-18 10:57:37 +00:00
Roland Tannous
ac2daf8b7a
Studio: forward standard OpenAI tools / tool_choice to llama-server (#5099)
* fix(studio): forward OpenAI tools/tool_choice to llama-server (#4999)

Studio's /v1/chat/completions silently stripped standard OpenAI `tools`
and `tool_choice` fields, so clients using standard function calling
(opencode, Claude Code, Cursor, Continue, ...) never got structured
tool_calls back. Adds a client-side pass-through path mirroring the
existing Anthropic /v1/messages flow: when `tools` is present without
Studio's `enable_tools` shorthand, the request is forwarded to
llama-server verbatim so the client sees native id, finish_reason
("tool_calls"), delta.tool_calls, and accurate usage tokens.

Also wires Anthropic tool_choice forwarding: /v1/messages previously
accepted tool_choice on the request model but silently dropped it with
a warning. Translate the four Anthropic shapes to OpenAI format and
forward them so agentic clients can actually enforce tool use.

- ChatCompletionRequest: add tools, tool_choice, stop; extra="allow"
- ChatMessage: accept role="tool", optional tool_call_id / tool_calls /
  name; content is now optional (assistant with only tool_calls)
- routes/inference.py: _openai_passthrough_stream /
  _openai_passthrough_non_streaming helpers, routing branch in
  openai_chat_completions, vision+tools via content-parts injection
- _build_passthrough_payload: tool_choice parameter (default "auto")
- anthropic_compat: anthropic_tool_choice_to_openai() translator
- tests/test_openai_tool_passthrough.py: Pydantic + translator unit tests
- tests/test_studio_api.py: 5 new E2E tests (non-stream, stream,
  multi-turn, OpenAI SDK, Anthropic tool_choice=any regression)

* fix(studio): surface httpx transport errors from OpenAI passthrough

When the managed llama-server subprocess crashes mid-request, the
async pass-through helpers in routes/inference.py used to return a
bare 500 (non-streaming) or an "An internal error occurred" SSE chunk
(streaming) because _friendly_error only recognized the sync path's
"Lost connection to llama-server" substring -- httpx transport
failures (ConnectError / ReadError / RemoteProtocolError /
ReadTimeout) stringify differently and fell through to the generic
case.

- _friendly_error: map any httpx.RequestError subclass to the same
  "Lost connection to the model server" message the sync chat path
  emits. Placed before the substring heuristics so the streaming path
  automatically picks it up via its existing except Exception catch.
- _openai_passthrough_non_streaming: wrap the httpx.AsyncClient.post
  in a try/except httpx.RequestError and re-raise as HTTPException
  502 with the friendly detail.
- tests/test_openai_tool_passthrough.py: new TestFriendlyErrorHttpx
  class pinning the mapping for ConnectError, ReadError,
  RemoteProtocolError, ReadTimeout, and confirming non-httpx paths
  (context-size heuristic, generic fallback) are unchanged.

* fix(studio): close aiter_bytes/aiter_lines explicitly in passthroughs

The httpcore asyncgen cleanup fix in 5cedd9a5 is incomplete on Python
3.13 + httpcore 1.0.x: it switched to manual client/response lifecycle
but still used anonymous `async for raw_line in resp.aiter_lines():`
patterns in all three streaming paths. Python's async for does NOT
auto-close the iterator on break/return, so the aiter_lines /
aiter_bytes async generator remains alive, reachable only from the
surrounding coroutine frame. Once `_stream()` returns the frame is
GC'd and the orphaned asyncgen is finalized on a LATER GC pass in a
DIFFERENT asyncio task, where httpcore's
HTTP11ConnectionByteStream.aclose() enters anyio.CancelScope.__exit__
with a mismatched task and prints "Exception ignored in: <async
generator>" / "async generator ignored GeneratorExit" / "Attempted
to exit cancel scope in a different task" to the server log.

User observed this on /v1/messages after successful (status 200)
requests, with the traceback pointing at HTTP11ConnectionByteStream
.__aiter__ / .aclose inside httpcore.

Fix: save resp.aiter_lines() / resp.aiter_bytes() as a variable and
explicitly `await iter.aclose()` in the finally block BEFORE
resp.aclose() / client.aclose(). This closes the asyncgen inside the
current task's event loop, so the internal httpcore byte stream is
cleaned up before Python's asyncgen GC hook has anything orphaned to
finalize. Each aclose is wrapped in try/except Exception so nested
anyio cleanup noise can't bubble out.

Applied to all three streaming passthrough paths:
- _anthropic_passthrough_stream (/v1/messages client-side tool path)
- _openai_passthrough_stream (/v1/chat/completions client-side tool
  path, new in this PR)
- openai_completions (/v1/completions bytes proxy from PR #4956)

* fix(studio): default ChatCompletionRequest.stream to false per OpenAI spec

OpenAI's /v1/chat/completions spec defaults `stream` to false, so
clients that omit the field (naive curl, minimal integrations) expect
a single JSON response back. Studio was defaulting to true, silently
switching those clients into SSE and breaking any parser that didn't
also handle streaming. ResponsesRequest and AnthropicMessagesRequest
already default to false correctly; only ChatCompletionRequest was
wrong.

Studio's own frontend always sets `stream` explicitly on every
chat-adapter / chat-api / runtime-provider call site, so the flip has
no UI impact. SDK users (OpenAI Python/JS SDK, opencode, Claude Code,
Cursor, Continue) also always pass `stream` explicitly, so they're
unaffected. The only clients feeling the change are raw-curl users
who were relying on the wrong default -- those get the correct OpenAI
behavior now.

Added a regression test pinning the default so it can't silently
flip back.

* fix(studio): reject images in OpenAI tool passthrough for text-only GGUFs

The new tool passthrough branch runs before _extract_content_parts,
skipping the existing not is_vision guard. Requests combining tools
with an image on a text-only tool-capable GGUF were forwarded to
llama-server, producing opaque upstream errors instead of the
pre-existing clear 400. Restore the guard inline at the dispatch
point, checking both legacy image_base64 and inline image_url parts.

* fix(studio): require tool_call_id on role=tool chat messages

Enforce the OpenAI spec rule that role="tool" messages must carry a
tool_call_id. Without it, upstream backends cannot associate a tool
result with the assistant's prior tool_calls entry and the request
fails in non-obvious ways through the passthrough path. Reject at the
request boundary with a 422 instead.

* fix(studio): harden OpenAI tool passthrough validation and error surfacing

Three related fixes called out by the PR review:

1. Preserve upstream status codes in the streaming passthrough. The
   httpx request is now dispatched before the StreamingResponse is
   constructed. Non-200 upstream responses and httpx RequestError
   transport failures raise HTTPException with the real status
   instead of being buried inside a 200 SSE error frame, so OpenAI
   SDK clients see APIError/BadRequestError/... as expected.

2. Require non-empty content on user/system/tool messages. Per the
   OpenAI spec, content may only be omitted on assistant messages
   that carry tool_calls; enforce that at the request boundary so
   malformed messages never reach the passthrough path.

3. Role-constrain tool-call metadata. tool_calls is only valid on
   role=assistant, tool_call_id and name only on role=tool. Without
   this, a user/system message with tool_calls would flip the
   passthrough branch on and be forwarded to llama-server, surfacing
   as an opaque upstream error.

* fix(studio): normalize image mode and passthrough JSON verbatim

Two Gemini-code-assist review findings on PR #5099:

1. Unconditionally convert decoded images to RGB before PNG encoding.
   The prior code only handled RGBA, letting CMYK/I/F images crash
   at img.save(format="PNG") and surface as opaque 400s. Applied to
   both the passthrough helper and the non-passthrough GGUF path
   that originally carried this pattern, keeping the two sites in
   sync.

2. Return the upstream JSON body as raw bytes via Response rather
   than parse-then-re-serialize with JSONResponse. Matches the
   passthrough helper's "verbatim" contract and drops a redundant
   round-trip.

---------

Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-18 12:53:23 +04:00
pre-commit-ci[bot]
163052a734 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-04-18 01:02:00 +00:00
Daniel Han
46ca892958 Add review tests for PR #5069 2026-04-18 00:59:48 +00:00
Daniel Han
f259215286 studio: close cancel-race and stale-cancel gaps in stop path
- Register the cancel tracker before returning StreamingResponse so a
  stop POST that arrives during prefill / warmup / proxy buffering
  finds an entry in _CANCEL_REGISTRY. Cleanup now runs via a Starlette
  BackgroundTask instead of a finally inside the async generator body.
- Add a per-run cancel_id on the frontend (crypto.randomUUID) and in
  ChatCompletionRequest so /api/inference/cancel matches one specific
  generation. Removes the stale-cancel bug where pressing stop then
  starting a new run in the same thread would cancel the retry.
- Apply t_max_predict_ms unconditionally in all three llama-server
  payload builders (previously gated on max_tokens=None, which made it
  dead code for UI callers that always send params.maxTokens). Raise
  the default to 10 minutes so slow CPU / macOS / Windows installs are
  not cut off mid-generation.
- Make _cancel_by_keys refuse empty input (return 0) so a future
  internal caller can not accidentally mass-cancel every in-flight
  request.
- Accept cancel_id (primary), session_id, and completion_id on the
  /api/inference/cancel route. Unify the three streaming sites on the
  same _cancel_keys / _tracker variable names.
- Annotate _CANCEL_REGISTRY as dict[str, set[threading.Event]].
2026-04-18 00:56:39 +00:00
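Two details of the registry described above (the `dict[str, set[threading.Event]]` shape and the refuse-empty / dedup semantics of `_cancel_by_keys`) can be sketched with stdlib primitives; the function bodies are illustrative, not the actual implementation.

```python
import threading

_REGISTRY: dict[str, set[threading.Event]] = {}

def register(keys: list[str], ev: threading.Event) -> None:
    # One run may be reachable under several keys (cancel_id, session_id,
    # completion_id); all of them map to the same Event.
    for k in keys:
        _REGISTRY.setdefault(k, set()).add(ev)

def cancel_by_keys(keys: list[str]) -> int:
    if not keys:
        return 0  # refuse empty input: never mass-cancel every run
    hit: set[threading.Event] = set()
    for k in keys:
        hit |= _REGISTRY.get(k, set())
    for ev in hit:
        ev.set()
    return len(hit)  # unique cancelled requests, not raw key matches
```

Deduplicating through a set is what keeps the reported `cancelled` count meaningful when a single run was registered under multiple keys.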
Daniel Han
b0735f71db studio: harden stop-button cancel path and scope cancel route
- Require at least one identifier for /api/inference/cancel so a missing
  thread id cannot silently cancel every in-flight generation.
- Scope /cancel to a dedicated studio_router so it is not exposed under
  the /v1 OpenAI-compat prefix as a surprise endpoint.
- Store a set of cancel events per key in _CANCEL_REGISTRY so concurrent
  requests on the same session_id do not overwrite each other, and
  deduplicate in _cancel_by_keys so the cancelled count reflects unique
  requests.
- Always send session_id with chat completions (not only when tools are
  enabled) so non-tool GGUF streams register under it and are reachable
  from /cancel.
- Register the non-GGUF stream_chunks path in the cancel registry too,
  so transformers-based stop-button works behind proxies that swallow
  fetch aborts.
- Only apply the 2-minute t_max_predict_ms wall-clock cap when the
  caller did not pass max_tokens, so legitimate long generations on
  slow CPU / macOS / Windows installs are not silently truncated.
- Remove the abort listener on normal stream completion so reused
  AbortSignals cannot fire a spurious cancel POST after the fact.
2026-04-18 00:31:16 +00:00
Daniel Han
667dfd66f8 Merge remote-tracking branch 'origin/main' into pr-5069-head 2026-04-18 00:13:50 +00:00
Manan Shah
7d0d2f256c
Add qwen3.6 script (#5084)
* unsloth gemma4 support files

* some fixes

* Fixing cache.empty() calls (#4813)

* Fixing cache.empty() calls

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix/gemma4 mlx (#4816)

* Fixing cache.empty() calls

* fixing for mlx versions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* removed bidirectional check for 31b (#4839)

Co-authored-by: Manan17 <shahmanan170602@gmail.coml>

* Add Gemma 4 26B MoE support (MLX) (#4844)

* removed bidirectional check for 31b

* Change gemma4_text for moe

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(gemma4): cast RoPE offset to int before mx.arange() (#4901)

* fix(gemma4): cast RoPE offset to int before mx.arange()

* fix(gemma4): use zero-based arange + offset to avoid CPU-GPU sync

* qwen3.6 patches for multi-turn chat

* qwen3.6 script

* removing unnecessary scripts

* displaying errors for not installed packages

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Manan17 <shahmanan170602@gmail.coml>
Co-authored-by: Théophile Lafargue <138336683+eauchs@users.noreply.github.com>
2026-04-17 01:21:30 -07:00
Daniel Han
d20b306755 Versioning 2026-04-16 12:06:10 -07:00
pre-commit-ci[bot]
5dfbf37aa1 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-04-16 18:58:46 +00:00
danielhanchen
9d26096fe3 Studio: make stop button actually stop generation
The UI stop button routes through assistant-ui's cancelRun, which aborts
the frontend fetch. Four issues combined to let llama-server keep decoding
long after the user clicked stop:

1. request.is_disconnected() does not fire reliably behind proxies
   (e.g. Colab) that don't propagate fetch aborts.
2. llama-server defaults n_predict to n_ctx when max_tokens is not sent,
   so a cancelled request keeps producing tokens up to 262144.
3. The httpx.Client pool keeps TCP keep-alive, so even a cleanly closed
   stream reuses the same connection and llama-server's liveness poll
   never sees a disconnect.
4. No explicit backend route to cancel - every cancel path relied on
   is_disconnected.

Changes:
- Add POST /api/inference/cancel keyed by session_id/completion_id, with
  a registry populated for the lifetime of each streaming response.
- Have the frontend (chat-adapter.ts) POST /inference/cancel on
  AbortController abort, alongside the existing fetch teardown.
- Send max_tokens=4096 + t_max_predict_ms=120000 as defaults on every
  outbound chat completion to llama-server; explicit user-supplied
  values still take precedence.
- Disable httpx keep-alive on the streaming client so connection close
  reaches llama-server and its 1s liveness check fires.

No behaviour changes for non-streaming paths or for existing callers
that already pass max_tokens/session_id.
2026-04-16 18:57:59 +00:00
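The defaulting behaviour from the max_tokens / t_max_predict_ms change above can be sketched as a payload builder; the constants mirror the values named in the commit, while the function itself is a hypothetical stand-in for the real payload-construction sites.

```python
_DEFAULT_MAX_TOKENS = 4096
_DEFAULT_T_MAX_PREDICT_MS = 120_000  # 2-minute wall-clock backstop

def build_payload(body: dict) -> dict:
    payload = dict(body)
    # Without an explicit max_tokens, llama-server defaults n_predict to
    # n_ctx (up to 262144), so a cancelled request can keep decoding.
    payload.setdefault("max_tokens", _DEFAULT_MAX_TOKENS)
    # setdefault keeps the commit's contract: defaults apply only when
    # the caller did not supply a value of their own.
    payload.setdefault("t_max_predict_ms", _DEFAULT_T_MAX_PREDICT_MS)
    return payload
```

Callers that already pass `max_tokens` or the wall-clock cap see no change, which matches the "no behaviour changes for existing callers" note.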
Daniel Han
0b57884120
Add Qwen3.6 inference defaults for Studio (#5065)
* Add Qwen3.6 inference defaults for Studio

Add qwen3.6 family entry to inference_defaults.json with the
recommended sampling parameters from Qwen's documentation:
temperature=0.7, top_p=0.8, top_k=20, min_p=0.0,
presence_penalty=1.5, repetition_penalty=1.0.

Without this, Qwen3.6 models fall through to the generic qwen3
pattern which uses different defaults (temperature=0.6,
top_p=0.95, no presence_penalty).

* Add Qwen3.6-35B-A3B-GGUF to default model lists

* Add Qwen3.5/3.6 presence_penalty to thinking toggle and small-model disable logic

- Thinking toggle (on-load + button click) now sets presencePenalty: 1.5 for
  Qwen3.5 and Qwen3.6 models (both thinking-ON and thinking-OFF states)
- Small-model thinking-disable check (<9B defaults to no-thinking) extended
  from Qwen3.5-only to also cover Qwen3.6, in all 3 locations:
  frontend on-load, frontend refresh, backend llama_cpp.py
2026-04-16 11:42:42 -07:00
Daniel Han
d56f980452
fix: multi-GPU inference crash for bnb 4-bit/8-bit models (#5068)
* fix: multi-GPU inference crash for bnb 4-bit/8-bit models

When load_in_4bit or load_in_8bit is used with device_map="sequential"
and max_memory constraints that place weights across multiple GPUs (or
entirely on a non-default GPU like cuda:1), the bitsandbytes loading
path in transformers never calls dispatch_model. No AlignDevicesHook is
installed, and the first forward/generate call crashes with:

  RuntimeError: Expected all tensors to be on the same device

This adds _attach_bnb_multidevice_hooks() which is called after
from_pretrained returns. It infers a device map from actual parameter
placements and calls dispatch_model(force_hooks=True) to install the
missing hooks. The function is a complete no-op for the common
single-GPU cuda:0 case.

Call sites: FastBaseModel.from_pretrained (vision.py) and
FastLlamaModel.from_pretrained (llama.py).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: align with PR #5053 final review improvements

- Add hook call to the bnb quantized loading branch in llama.py (the
  primary load_in_4bit path), not just the non-fast-inference fallback
- Expand bnb detection: also check model.is_loaded_in_4bit,
  model.is_loaded_in_8bit, model.quantization_method
- Pass explicit main_device and skip_keys to dispatch_model
- Use logger.info instead of print for the success message
- Use kwargs.get("load_in_8bit", False) at llama.py call sites

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 11:35:02 -07:00
Lee Jackson
ee86530e55
chore: switch helper and no-cache fallback to Gemma (#5066) 2026-04-16 22:27:30 +04:00
Wasim Yousef Said
bc9ddb3af6
Fix onboarding followups (#5064)
* Fix onboarding followups

* Rename sidebar studio to train
2026-04-16 10:11:35 -07:00
Wasim Yousef Said
7ef65bd2e5
Chat first onboarding (#5063)
* auth: default to chat

* settings: relaunch onboarding

* onboarding: return to launch page

* studio: stop auto guided tour

* ui: soften global radius

* cleanup: rename onboarding exit prop

* fix onboarding redirect safety

* Show real Unsloth version in settings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 09:58:10 -07:00
हिमांशु
f4422b0a62
change torchcodec version to 0.10.0 in extra-no-deps (#5043)
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-16 19:50:57 +04:00
Wasim Yousef Said
b01e9af124
feat(studio): replace navbar with collapsible sidebar (#4936)
* feat(studio): replace navbar navigation with collapsible sidebar

Add an app-wide sidebar with hover-expand and pin-to-dock behavior.
Navigation items (Studio, Recipes, Export, Chat) move from the center
pill navbar to the sidebar. Chat threads and recipes render as
collapsible sub-lists. Navbar simplified to logo + update + close.

- Extend SidebarProvider with pinned/hovered state model
- New AppSidebar with animated active indicator, sloth profile menu,
  theme toggle, guided tour, back/forward navigation
- Chat page refactored to URL-driven view state via search params
- Extract reusable hooks for chat thread and recipe sidebar data
- Guard startViewTransition for browser compatibility
- Wrap chat deletions in Dexie transaction for data integrity

* feat(studio): move logo to sidebar and make navbar overlay

- Sidebar is now full-height with logo in SidebarHeader
- Collapsed sidebar shows sticker.png, expanded shows full logo
- Navbar is absolute-positioned overlay (no layout space)
- Main content extends to top, aligning with navbar controls

* feat(studio): full-height sidebar with recents, edge-to-edge nav buttons

- Sidebar outside max-w-7xl, pinned to left edge
- Remove sidebar rounding, menu buttons rounded-md
- Nav buttons flush to sidebar edges with no left rounding
- Replace collapsible recipes/chat with flat nav items
- Add Recents section with chat history (1 item when not on chat, full on chat)
- New Chat as first nav item with PencilEdit02Icon
- Cursor pointer on all sidebar buttons
- Navbar temporarily hidden for screenshots

* fix(studio): fix chat scroll, action bar hover, collapsible recents

- Fix sticky composer by removing `relative` override on viewport footer
- Action bar buttons only show on hover (autohide=always)
- Remove floating border/shadow from action bar
- Add scroll space above composer for last message actions
- Back/forward buttons use router history (stay in-app)
- Recents section collapsible with chevron on chat route
- Set html/body/#root height for proper h-full chain

* fix(studio): address review feedback, clean up unused code

- Unhide navbar (was left hidden from screenshot)
- Remove unused imports: SidebarMenuSub*, BubbleChatIcon, ColumnInsertIcon
- Remove unused vars: recipeItems, activeRecipeId, canCompare, recipesOpen
- Include compare query id in active sidebar selection
- Use store type for contextUsage instead of inline type
- Simplify noop in sidebar.tsx
- Remove empty className prop

* feat(studio): add mobile sidebar, recent runs section, and misc UX fixes

* feat(studio): scaffold settings feature module with dialog store

* feat(studio): add tri-state theme store for settings

* feat(chat): add clear-all-chats and export-chat-history utils

* feat(studio): add settings dialog shell with tab rail

* feat(studio): add appearance tab with theme and sidebar pin

* feat(studio): add settings general tab with hf token, auto-title, reset prefs

* feat(studio): add settings chat tab with export and clear

* feat(studio): add api keys tab with list and revoke flow

* feat(studio): add create-key form and reveal dialog

* feat(studio): add usage examples panel to api keys tab

* feat(studio): add settings about tab with update and shutdown

* feat(studio): add settings dropdown item and cmd-comma shortcut

* feat(studio): remove legacy api-keys route and chat-sheet preference rows

* fix(studio): settings dialog a11y + polish pass

* feat(studio): inline api key reveal card replacing nested dialog

* fix(studio): hide revoked keys from settings list

* refactor(studio): strip navbar and hoist training unload guard

* feat(studio): explicit sidebar toggle, remove hover-open and pin icons

* fix(studio): use SidebarRight01Icon for collapsed sidebar open toggle

* fix(studio): address code review findings for settings dialog

* feat(studio): collapsible navigate group with standalone new-chat and compare

* fix(studio): chat-only standalone actions, use ColumnInsertIcon for compare

* fix(studio): sidebar new-chat/compare state reset and icon-mode collapsible

* feat(studio): add compact logo assets for sidebar header

* Fixed sidebar design

* fix(studio): sidebar delete icon hover contrast and sizing

* feat(studio): route-gate sidebar recents (chats off /studio, runs on /studio)

* feat(studio): add chat search store

* feat(studio): add chat search index hook with snapshot-on-open

* feat(studio): add chat search command dialog with global shortcut

* feat(studio): wire chat search into sidebar

* fix(studio): trim hf token on save, add show/hide toggle, commit on close

* revert(studio): restore original sidebar/border colors, brighten sidebar

* feat(studio): forward overlayClassName through CommandDialog

* fix(studio): wrap search dialog in Command context, redesign as flat 635px card

* fix(studio): reserve right padding on recent items so delete icon stops overlapping title

* fix(studio): skip hf token unmount-commit during reset-prefs reload

* chore(studio): drop unused icon import and unreachable runs navigate branch

* fix(studio): chat search index filters archived before limit, batches message query, picks up reasoning text

* fix(studio): keep CommandEmpty in tree so empty state renders correctly

* fix(studio): cap system prompt and chat template textareas so they scroll instead of growing

* fix(studio): attach chat-compare tour anchor to sidebar compare button

* fix(studio): persist system theme explicitly so next-themes does not clobber on reload

* fix(studio): auto-switch to history tab when selecting a recent run from sidebar

* UI overhaul: chatbox, scrollbar, sidebar, and compare view

UI Changes:
- Redesigned the Compare UI with general cleanup
- Redesigned the Chatbox UI
- Reduced the width of the user chat bubble for improved readability
- Narrowed the user chat box across the content page
- Adjusted thinking-box text color to be slightly darker
- Removed faded text effect from chat messages
- Removed faded text effect from the thinking box
- Added a small LLM chat safety note at the bottom of the chatbox
- Restyled the scrollbar

Layout & Behavior:
- Reworked the scrollbar to span the full height of the page (no top/bottom padding) and remain persistently visible when content is scrollable, rather than only on hover
- Reworked the Configuration sidebar to span full height — removed rounded corners and borders, with the scrollbar adjusted to match the full top-to-bottom layout
- Adjusted the top menu and bottom chatbox content areas to work correctly with the new full-page scroll behavior
- Made chat content match the chatbox width, with content sliding slightly behind the chatbox when scrolling
- Aligned chat text width with the chatbox for visual consistency, including how far the text extends behind the chatbox

Fixes:
- Fixed the chatbox not auto-expanding when typing multi-line input while bottom-positioned during an active chat (previously only worked before a chat had started)
- Fixed positioning and design of the user chat hover menu buttons to match the assistant chat box — now displayed below the chat bubble instead of on the left side

* Fix user message layout in thread component

* swap code icon

* fix compare layout

* fix compare pane flex

* Sidebar improvements and fixes

- Added scrolling support to the sidebar so menus and recent chats no longer get hidden
- Recent chats are now always visible in the sidebar, not hidden when in Studio, Recipes, or Export
- Recent chat is now deselected when navigating to other sections
- Fixed sidebar glitch where browser resize could make the sidebar and expand button disappear completely
- Fixed glitch where the open-sidebar hover tooltip appeared above the logo when clicking expand sidebar
- Reduced sidebar width on mobile to around 2/3 of the screen (was too wide)
- Made the close-sidebar hover tooltip consistent with the rest of the design
- Removed sidebar collapse/expand animation
- Small adjustment to chat width

* Fix route scrolling, polling, and theme sync issues

* Fix Studio page scrolling

---------

Co-authored-by: sneakr <hauzin@hotmail.com>
2026-04-16 08:46:16 -07:00
Daniel Han
05ec0f110b
Studio: Ollama support, recommended folders, Custom Folders UX polish (#5050)
* Studio: Ollama support, recommended folders, Custom Folders UX polish

Backend:
- Add _scan_ollama_dir that reads manifests/registry.ollama.ai/library/*
  and creates .gguf symlinks under <ollama_dir>/.studio_links/ pointing
  at the content-addressable blobs, so detect_gguf_model and llama-server
  -m work unchanged for Ollama models
- Filter entries under .studio_links from the generic models/hf/lmstudio
  scanners to avoid duplicate rows and leaked internal paths in the UI
- New GET /api/models/recommended-folders endpoint returning LM Studio
  and Ollama model directories that currently exist on the machine
  (OLLAMA_MODELS env var + standard paths, ~/.lmstudio/models, legacy
  LM Studio cache), used by the Custom Folders quick-add chips
- detect_gguf_model now uses os.path.abspath instead of Path.resolve so
  the readable symlink name is preserved as display_name (e.g.
  qwen2.5-0.5b-Q4_K_M.gguf instead of sha256-abc...)
- llama-server failure with a path under .studio_links or .cache/ollama
  surfaces a friendlier message ("Some Ollama models do not work with
  llama.cpp. Try a different model, or use this model directly through
  Ollama instead.") instead of the generic validation error

Frontend:
- ListLabel supports an optional leading icon and collapse toggle; used
  for Downloaded (download icon), Custom Folders (folder icon), and
  Recommended (star icon)
- Custom Folders header gets folder icon on the left, and +, search,
  and chevron buttons on the right; chevron uses ml-auto so it aligns
  with the Downloaded and Recommended chevrons
- New recommended folder chips render below the registered scan folders
  when there are unregistered well-known paths; one click adds them as
  a scan folder
- Custom folder rows that are direct .gguf files (Ollama symlinks) load
  immediately via onSelect instead of opening the GGUF variant expander
  (which is for repos containing multiple quants, not single files)
- When loading a direct .gguf file path, send max_seq_length = 0 so the
  backend uses the model's native context instead of the 4096 chat
  default (qwen2.5:0.5b now loads at 32768 instead of 4096)
- New listRecommendedFolders() helper on the chat API
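
The scanner behaviour described above can be sketched roughly as follows. This is an illustrative Python model, not the actual Studio code: the function name, link-naming scheme, and error handling are assumptions; only the overall shape (read `manifests/registry.ollama.ai/library/*`, resolve each model layer's digest to a blob, expose it as a readable `.gguf` symlink under `.studio_links/`) comes from the commit message.

```python
import json
from pathlib import Path


def scan_ollama_dir(ollama_dir: str) -> list[Path]:
    """Illustrative sketch: mirror Ollama models as .gguf symlinks.

    Assumes the standard Ollama layout: manifests/<host>/<ns>/<name>/<tag>
    JSON files whose "application/vnd.ollama.image.model" layer digest
    names a content-addressable file under blobs/. Names are hypothetical.
    """
    root = Path(ollama_dir)
    links_dir = root / ".studio_links"
    links_dir.mkdir(exist_ok=True)
    found: list[Path] = []
    manifests = root / "manifests" / "registry.ollama.ai" / "library"
    if not manifests.is_dir():
        return found
    for manifest in manifests.rglob("*"):
        if not manifest.is_file():
            continue
        try:
            layers = json.loads(manifest.read_text()).get("layers", [])
        except (OSError, json.JSONDecodeError):
            continue  # unreadable/corrupt manifest: skip it, keep scanning
        for layer in layers:
            if layer.get("mediaType") != "application/vnd.ollama.image.model":
                continue
            # Digest "sha256:abc..." maps to the blob file "sha256-abc..."
            blob = root / "blobs" / layer["digest"].replace(":", "-")
            if not blob.is_file():
                continue
            # Readable link name so detect_gguf_model sees a .gguf suffix
            # instead of a raw sha256 blob path.
            link = links_dir / f"{manifest.parent.name}-{manifest.name}.gguf"
            if not link.exists():
                link.symlink_to(blob)
            found.append(link)
    return found
```

Because `llama-server -m` only follows the symlink, nothing downstream needs to know about Ollama's content-addressable storage.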

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: log silent exceptions and support read-only Ollama dirs

Replace silent except blocks in _scan_ollama_dir and the
recommended-folders endpoint with narrower exception types plus debug
or warning logs, so failures are diagnosable without hiding signal.

Add _ollama_links_dir helper that falls back to a per-ollama-dir hashed
namespace under Studio's own cache (~/.unsloth/studio/cache/ollama_links)
when the Ollama models directory is read-only. Common for system installs
at /usr/share/ollama/.ollama/models and /var/lib/ollama/.ollama/models
where the Studio process has read but not write access. Previously the
scanner returned an empty list in that case and Ollama models would
silently not appear.

The fallback preserves the .gguf suffix on symlink names so
detect_gguf_model keeps recognising them. The prior "raw sha256 blob
path" fallback would have missed the suffix check and failed to load.

* Address review: detect mmproj next to symlink target for vision GGUFs

Codex P1 on model_config.py:1012: when detect_gguf_model returns the
symlink path (to preserve readable display names), detect_mmproj_file
searched the symlink's parent directory instead of the target's. For
vision GGUFs surfaced via Ollama's .studio_links/ -- where the weight
file is symlinked but any mmproj sidecar lives next to the real blob
-- mmproj was no longer detected, so the model was misclassified as
text-only and llama-server would start without --mmproj.

detect_mmproj_file now adds the resolved target's parent to the scan
order when path is a symlink. Direct (non-symlink) .gguf paths are
unchanged, so LM Studio and HF cache layouts keep working exactly as
before. Verified with a fake layout reproducing the bug plus a
regression check on a non-symlink LM Studio model.

* Address review: support all Ollama namespaces and vision projector layers

- Iterate over all directories under registry.ollama.ai/ instead of
  hardcoding the "library" namespace. Custom namespaces like
  "mradermacher/llama3" now get scanned and include the namespace
  prefix in display names, model IDs, and symlink names to avoid
  collisions.

- Create companion -mmproj.gguf symlinks for Ollama vision models
  that have an "application/vnd.ollama.image.projector" layer, so
  detect_mmproj_file can find the projector alongside the model.

- Extract symlink creation into _make_symlink helper to reduce
  duplication between model and projector paths.

* Address review: move imports to top level and add scan limit

- Move hashlib and json imports to the top of the file (PEP 8).
- Remove inline `import json as _json` and `import hashlib` from
  function bodies, use the top-level imports directly.
- Add `limit` parameter to `_scan_ollama_dir()` with early exit
  when the threshold is reached.
- Pass `_MAX_MODELS_PER_FOLDER` into the scanner so it stops
  traversing once enough models are found.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: Windows fallback, all registry hosts, collision safety

_make_link (formerly _make_symlink):
- Falls back to os.link() hardlink when symlink_to() fails (Windows
  without Developer Mode), then to shutil.copy2 as last resort
- Uses atomic os.replace via tmp file to avoid race window where the
  .gguf path is missing during rescan

Scanner now handles all Ollama registry layouts:
- Uses rglob over manifests/ instead of hardcoding registry.ollama.ai
- Discovers hf.co/org/repo:tag and any other host, not just library/
- Filenames include a stable sha1 hash of the manifest path to prevent
  collisions between models that normalize to the same stem

Per-model subdirectories under .studio_links/:
- Each model's links live in their own hash-keyed subdirectory
- detect_mmproj_file only sees the projector for that specific model,
  not siblings from other Ollama models

Friendly Ollama error detection:
- Now also matches ollama_links/ (the read-only fallback cache path)
  and model_identifier starting with "ollama/"

Recommended folders:
- Added os.access(R_OK | X_OK) check so unreadable system directories
  like /var/lib/ollama/.ollama/models are not advertised as chips

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: filter ollama_links from generic scanners

The generic scanners (models_dir, hf_cache, lmstudio) already filter
out .studio_links to avoid duplicate Ollama entries, but missed the
ollama_links fallback cache directory used for read-only Ollama
installs. Add it to the filter.

* Address review: idempotent link creation and path-component filter

_make_link:
- Skip recreation when a valid link/copy already exists (samefile or
  matching size check). Prevents blocking the model-list API with
  multi-GB copies on repeated scans.
- Use uuid4 instead of os.getpid() for tmp file names to avoid race
  conditions from concurrent scans.
- Log cleanup errors instead of silently swallowing them.

Path filter:
- Use os.sep-bounded checks instead of bare substring match to avoid
  false positives on paths like "my.studio_links.backup/model.gguf".

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: drop copy fallback, targeted glob, robust path filter

_make_link:
- Drop shutil.copy2 fallback -- copying multi-GB GGUFs inside a sync
  API request would block the backend. Log a warning and skip the
  model when both symlink and hardlink fail.

Scanner:
- Replace rglob("*") with targeted glob patterns (*/*/* and */*/*/*)
  to avoid traversing unrelated subdirectories in large custom folders.

Path filter:
- Use Path.parts membership check instead of os.sep substring matching
  for robustness across platforms.

Scan limit:
- Skip _scan_ollama_dir when _generic already fills the per-folder cap.

* Address review: sha256, top-level uuid import, Path.absolute()

- Switch hashlib.sha1 to hashlib.sha256 for path hashing consistency.
- Move uuid import to the top of the file instead of inside _make_link.
- Replace os.path.abspath with Path.absolute() in detect_gguf_model
  to match the pathlib style used throughout the codebase.

* Address review: fix stale comments (sha1, rglob, copy fallback)

Update three docstrings/comments that still referenced the old
implementation after recent changes:
- sha1 comment now says "not a security boundary" (no hash name)
- "rglob" -> "targeted glob patterns"
- "file copies as a last resort" -> removed (copy fallback was dropped)

* Address review: fix stale links, support all manifest depths, scope error

_make_link:
- Drop size-based idempotency shortcut that kept stale links after
  ollama pull updates a tag to a same-sized blob. Only samefile()
  is used now -- if the link doesn't point at the exact same inode,
  it gets replaced.

Scanner:
- Revert targeted glob back to rglob so deeper OCI-style repo names
  (5+ path segments) are not silently skipped.

Ollama error:
- Only show "Some Ollama models do not work with llama.cpp" when the
  server output contains GGUF compatibility hints (key not found,
  unknown architecture, failed to load). Unrelated failures like
  OOM or missing binaries now show the generic error instead of
  being misdiagnosed.

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <michaelhan2050@gmail.com>
2026-04-16 08:24:08 -07:00
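
The `_make_link` fallback chain this PR converged on (symlink, then hardlink for Windows without Developer Mode, no copy fallback; atomic publish via a uuid-named tmp file and `os.replace`; samefile-only idempotency) can be sketched like this. A hedged illustration, not the shipped helper: the name, signature, and logging are assumptions.

```python
import logging
import os
import uuid
from pathlib import Path

log = logging.getLogger(__name__)


def make_link(target: Path, link: Path) -> bool:
    """Illustrative sketch of the link-creation fallback described above."""
    # samefile-only idempotency: a size check would keep stale links after
    # `ollama pull` swaps a tag to a same-sized blob.
    if link.exists() and os.path.samefile(link, target):
        return True
    # uuid4 (not pid) so concurrent scans never collide on the tmp name.
    tmp = link.with_name(f".{uuid.uuid4().hex}.tmp")
    try:
        try:
            tmp.symlink_to(target)
        except OSError:
            # Hardlink fallback (e.g. Windows without Developer Mode).
            # No shutil.copy2 last resort: copying multi-GB GGUFs inside a
            # sync API request would block the backend.
            os.link(target, tmp)
        # Atomic replace: rescans never observe a missing .gguf path.
        os.replace(tmp, link)
        return True
    except OSError as exc:
        log.warning("could not link %s -> %s: %s", link, target, exc)
        try:
            tmp.unlink(missing_ok=True)
        except OSError as cleanup_exc:
            log.warning("tmp cleanup failed for %s: %s", tmp, cleanup_exc)
        return False
```

Note that `os.replace` over an existing link also covers the "points at a different inode" case: the old link is replaced rather than reused.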
Daniel Han
ff23ce40b4
Fix review findings for chat-template repair (#5049) (#5056)
* Fix review findings for PR #5049

1. Sandbox fallback Jinja env in _VariantTokenizerProxy.apply_chat_template
   (use SandboxedEnvironment, matching _derive_assistant_prefix_by_render)
2. Unwrap benign outer-If guards in _template_ends_with_toplevel_for so
   templates like {% if messages %}{% for ... %}{% endfor %}{% endif %}
   are still repairable (preserves Qwen3-Guard rejection via else-branch
   and add_generation_prompt-name checks)
3. Preserve raw name_or_path in _VariantTokenizerProxy._source_path so
   local-path detection works for dict/list variant tokenizers
4. Context-aware strict-mode messages: omit "will still load" and
   "Set UNSLOTH_STRICT_CHAT_TEMPLATE=1" when already raising

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 08:02:05 -07:00
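
For context on fix (1) above: rendering a tokenizer-supplied chat template through Jinja's `SandboxedEnvironment` looks roughly like the sketch below. The template string and message shape here are hypothetical stand-ins; real templates ship with the tokenizer. The point is that the sandbox renders ordinary templates identically to a plain `Environment` while raising `jinja2.sandbox.SecurityError` on unsafe attribute access, so untrusted template strings cannot escape into Python internals.

```python
from jinja2.sandbox import SandboxedEnvironment

# Hypothetical minimal chat template, standing in for a real tokenizer's
# chat_template string.
template_src = (
    "{% for m in messages %}<|{{ m['role'] }}|>{{ m['content'] }}\n{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>{% endif %}"
)

env = SandboxedEnvironment()
rendered = env.from_string(template_src).render(
    messages=[{"role": "user", "content": "hi"}],
    add_generation_prompt=True,
)
print(rendered)
# Unsafe operations fail with SecurityError instead of executing, e.g.:
# env.from_string("{{ ''.__class__() }}").render()
```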
Daniel Han
b42e3a120d
Remove legacy venv Scripts entry from User PATH on upgrade (#5060)
Older installers persisted the venv Scripts directory directly in the
User PATH registry. The shim approach from #4961 no longer writes that
entry, but on upgrade the old one survived and python.exe / pip.exe
from the unsloth venv continued winning resolution in every new shell.

Before creating the shim, read the current User PATH, filter out any
entry matching $VenvDir\Scripts (using the same symmetric raw+expanded
comparison as Add-ToUserPath), and write back if changed. No-op on
fresh installs where the legacy entry was never written.

Confirmed on a real Windows machine: `where.exe python` was returning
the venv interpreter first even after the shim PR merged.
2026-04-16 07:36:59 -07:00
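
The actual fix is PowerShell, but the entry-matching logic it describes (drop any User PATH entry equal to `$VenvDir\Scripts`, comparing each entry both raw and after environment-variable expansion, case-insensitively) can be modelled in Python. Everything below is illustrative: the function name is made up, and only the comparison idea comes from the commit message.

```python
import ntpath


def strip_venv_scripts(user_path: str, venv_dir: str) -> str:
    """Illustrative model of the User PATH cleanup described above.

    Removes entries equal to <venv_dir>\\Scripts. Windows PATH entries may
    be stored unexpanded (e.g. "%UNSLOTH_VENV%\\Scripts"), so each entry is
    compared both raw and with %VAR% expansion, case-insensitively.
    """
    scripts = ntpath.normcase(ntpath.normpath(ntpath.join(venv_dir, "Scripts")))

    def matches(entry: str) -> bool:
        for form in (entry, ntpath.expandvars(entry)):
            if ntpath.normcase(ntpath.normpath(form)) == scripts:
                return True
        return False

    kept = [e for e in user_path.split(";") if e and not matches(e)]
    return ";".join(kept)
```

Writing the filtered value back only when it changed keeps the operation a no-op on fresh installs, matching the behaviour described above.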
Daniel Han
5b8643969e Revert "Remove legacy venv Scripts entry from User PATH on upgrade"
This reverts commit cae4a74297.
2026-04-16 14:20:43 +00:00