Commit graph

  • 6d816b3cb6
    Merge e348be8ce0 into 21e9a91a57 Daniel Han 2026-04-21 13:19:11 +0000
  • e348be8ce0 inference: add AGPLv3 license headers (branch: flex-fast-inference-gate) Daniel Han 2026-04-21 13:19:01 +0000
  • 5e1ec3395a [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2026-04-21 12:00:15 +0000
  • 35231d4ff4 inference: expose flex backend via UNSLOTH_FAST_INFERENCE=1 Daniel Han 2026-04-21 11:59:47 +0000
  • e72a3802ed
    Merge 1847125b7a into 21e9a91a57 Daniel Han 2026-04-21 11:50:37 +0000
  • 1847125b7a [pre-commit.ci] auto fixes from pre-commit.com hooks (branch: transformers-continuous-batching-qwen3) pre-commit-ci[bot] 2026-04-21 11:50:30 +0000
  • b294fbd3dc benchmarks: verify_gemma4_numerics -- compare against raw HF, not just shell Daniel Han 2026-04-21 10:00:48 +0000
  • ca9dbce98d benchmarks: add verify_*_numerics scripts to cross-check flex vs vanilla HF Daniel Han 2026-04-21 09:54:13 +0000
  • dee9371769 benchmarks: gemma4_flex_inference -- drop sidecar, link shared layers to store cache Daniel Han 2026-04-21 09:40:12 +0000
  • 37a7405fc7
    Merge daaea21af1 into 21e9a91a57 Daniel Han 2026-04-21 09:39:55 +0000
  • 37c2feb5a0
    Merge a73223d5c7 into 21e9a91a57 Wojciech 2026-04-21 19:21:10 +1000
  • 21e9a91a57
    Studio: forward standard OpenAI tools / tool_choice on /v1/responses (Codex compat) (#5122) (branch: main) Roland Tannous 2026-04-21 13:17:20 +0400
  • 11f5a3a08b
    Merge branch 'main' into feature/codex-compatibility-responses-api-tools Roland Tannous 2026-04-21 13:15:24 +0400
  • db68b28a89
    Merge 713135690b into c20959dbf4 cheehook 2026-04-21 17:13:24 +0800
  • d5a4ee22ad benchmarks: gemma4_flex_inference -- per-layer-type sliding window mask Daniel Han 2026-04-21 08:59:54 +0000
  • 8fb0c2e2a7 benchmarks: add gemma4_flex_inference for unsloth/gemma-4-E2B-it Daniel Han 2026-04-21 08:45:05 +0000
  • 50870a8e87
    Merge 4b23854854 into c20959dbf4 shivam johri 2026-04-21 14:11:21 +0530
  • 4b23854854
    Merge branch 'main' into fix/private-dataset-splits-metadata shivam johri 2026-04-21 14:11:18 +0530
  • e854c02f48 Studio: silence benign httpcore asyncgen GC warnings on Python 3.13 Roland Tannous 2026-04-21 12:29:12 +0400
  • 96b1ffd376 benchmarks: add --chat_template and --enforce_eager to cb_vs_vllm_generation Daniel Han 2026-04-21 07:52:47 +0000
  • 108c4f1b05
    Merge dd6219dab7 into c20959dbf4 Konstantin Azizov 2026-04-21 07:36:13 +0000
  • 97874d8be1
    Merge f32a5465e2 into c20959dbf4 Lei Zhenyuan 2026-04-21 09:34:15 +0200
  • ff75e5c96e [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2026-04-21 06:52:20 +0000
  • 5bfefc2377 flex: generalize qwen3_flex_inference.py to Llama-3.2 Daniel Han 2026-04-21 06:51:51 +0000
  • 04443f318c
    Merge 3edc9cf178 into c20959dbf4 Daniel Han 2026-04-20 23:33:23 -0700
  • dfbe0d0795
    Merge 05354af3a6 into c20959dbf4 Daniel Han 2026-04-21 15:33:23 +0900
  • 4466a57ae7
    Merge b408ddf7e5 into c20959dbf4 Datta Nimmaturi 2026-04-21 06:29:00 +0000
  • cab06d8d73
    Merge 831aea71cf into c20959dbf4 Daniel Han 2026-04-21 15:27:23 +0900
  • b408ddf7e5 gate tokenizer.model saving Datta Nimmaturi 2026-04-21 06:25:29 +0000
  • ce2b3ec475
    Merge cee070f4ea into c20959dbf4 Datta Nimmaturi 2026-04-21 06:24:26 +0000
  • cee070f4ea [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2026-04-21 06:24:20 +0000
  • bc73ee7beb Add gemma4 chat template tests Daniel Han 2026-04-21 06:22:51 +0000
  • 238989b492
    Merge 3d64e11cc7 into c20959dbf4 Daniel Han 2026-04-21 15:21:23 +0900
  • feffd60a4b
    Merge f003507878 into c20959dbf4 Daniel Han 2026-04-21 15:21:23 +0900
  • a94cece8f6 [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2026-04-21 06:19:54 +0000
  • a68d346e77 benchmarks: consolidate GRPO entrypoints and extract shared helpers Daniel Han 2026-04-21 06:18:23 +0000
  • f22f8d8b6f
    Merge 08abf2f8a8 into c20959dbf4 Daniel Han 2026-04-21 15:09:23 +0900
  • 2ad97aa75f update template for gemma4 Datta Nimmaturi 2026-04-21 06:03:54 +0000
  • 8792e5da7b flex: drop scripts/benchmarks/results/stats JSONs Daniel Han 2026-04-21 05:52:11 +0000
  • 77db85f15c
    Merge 76054a562a into c20959dbf4 Daniel Han 2026-04-21 15:48:16 +1000
  • dc76ef6cbb flex: drop 13 unreferenced stats JSONs from results/stats Daniel Han 2026-04-21 05:47:10 +0000
  • 677d3d5f2b
    Merge 280f69a6bb into c20959dbf4 Leo Borcherding 2026-04-21 00:45:20 -0500
  • 280f69a6bb [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2026-04-21 05:45:15 +0000
  • 736ba25b6f [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2026-04-21 05:41:08 +0000
  • 82e14e7ec8 flex: auto-detect FA4 prefill on Hopper / Blackwell Daniel Han 2026-04-21 05:39:49 +0000
  • 669ec080ee
    Merge e32db1e1a1 into c20959dbf4 Daniel Han 2026-04-21 11:00:52 +0530
  • a2abdc9034
    Merge eb93dda4fc into c20959dbf4 Daniel Han 2026-04-21 10:58:57 +0530
  • 4d9fc836a4
    Merge ae38c3639d into c20959dbf4 Daniel Han 2026-04-21 10:50:10 +0530
  • 0c1145e612 fix: distinct warning when UNSLOTH_NO_TORCH set with torch installed refs #5008 LeoBorcherding 2026-04-21 00:19:52 -0500
  • 71f87e44f2
    Merge f35e027ece into c20959dbf4 Daniel Han 2026-04-21 10:31:25 +0530
  • c52a21457f fix: only warn 'no GPU' for CPU torch path, not no-torch path refs #5008 LeoBorcherding 2026-04-21 00:00:55 -0500
  • 1b2bd65e38 [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2026-04-21 04:58:24 +0000
  • 61a0058157
    Merge 345fe0373a into c20959dbf4 Daniel Han 2026-04-21 10:23:45 +0530
  • 4c47207497 flex: mark create_block_mask compile dynamic so bs=64 prefill works Daniel Han 2026-04-21 04:53:20 +0000
  • 314ab6ae86 flex: fuse LoRA refresh into a single torch.addmm per layer Daniel Han 2026-04-21 04:53:06 +0000
  • 06a1007c6c flex: double-copy LoRA rollout to avoid bf16 merge/unmerge drift Daniel Han 2026-04-21 04:52:49 +0000
  • 69cc8008e9
    Merge 9b6bc70fe3 into c20959dbf4 Lennie Budgell 2026-04-20 22:38:33 -0600
  • dbccc44ba6 fix: respect UNSLOTH_NO_TORCH flag even when torch is installed refs #5008 LeoBorcherding 2026-04-20 23:35:48 -0500
  • 33625ed0c0 fix: clean up _HAS_TORCH and _NO_TORCH_MODE module-level variables refs #5008 LeoBorcherding 2026-04-20 23:28:09 -0500
  • 08f34a5d10
    Merge e558b71df9 into c20959dbf4 Daniel Han 2026-04-21 09:53:12 +0530
  • 1e41b7fee4 fix: remove env var mutation for no-torch mode detection refs #5008 LeoBorcherding 2026-04-20 23:19:03 -0500
  • 7d9d895d2f
    Merge feabba825a into c20959dbf4 Daniel Han 2026-04-21 13:13:27 +0900
  • 20bc1a73a6
    Merge e82835f17c into c20959dbf4 Daniel Han 2026-04-21 13:12:27 +0900
  • eddd4c2ce0
    Merge ef98930175 into c20959dbf4 Daniel Han 2026-04-21 00:11:07 -0400
  • 9d6c024a7f fix(cpu): use warnings.warn for CPU mode message for consistency refs #5008 LeoBorcherding 2026-04-20 23:08:48 -0500
  • a6fff19c73 fix(cpu): remove redundant early return that bypassed torch.accelerator check refs #5008 LeoBorcherding 2026-04-20 23:07:09 -0500
  • 859a70bba1
    Merge branch 'main' into gemma4_template Datta Nimmaturi 2026-04-21 09:27:45 +0530
  • 1af3b9c7fb
    Merge branch 'main' into fix_tokenizer_save_gemma Datta Nimmaturi 2026-04-21 09:27:17 +0530
  • 8502cc5417 fix: allow CPU and no-torch import initialization refs #5008 LeoBorcherding 2026-04-20 19:19:17 -0700
  • 2118e201f5
    Merge 5b4f360ff0 into c20959dbf4 Zohair Shafi 2026-04-21 03:09:28 +0000
  • 5b4f360ff0 [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2026-04-21 03:09:22 +0000
  • 2476a313ed wiki: harden merge maintenance and ingest stream handling zohairshafi 2026-04-20 23:06:00 -0400
  • 61c2e5c105 flex: switch to merge_adapter (reversible) + reframe writeup Daniel Han 2026-04-21 03:05:28 +0000
  • 676fe095a9
    Merge 97b9fede9d into c20959dbf4 Daniel Han 2026-04-21 02:34:05 +0000
  • ab37acd5e0 flex: fair comparison -- benchmark LoRA active, not merged Daniel Han 2026-04-21 02:22:48 +0000
  • 4717bce97e flex: support --load_in_4bit with PEFT adapter (bnb-4bit shard) Daniel Han 2026-04-21 02:13:06 +0000
  • 7f73110005
    Merge cc33d27e5e into c20959dbf4 Samit Shah 2026-04-20 19:08:39 -0700
  • 7b729c0a68
    Merge 01cdf0f974 into c20959dbf4 Daniel Han 2026-04-21 02:02:30 +0000
  • f401341d54 fix: allow no-torch import path during setup refs #5008 LeoBorcherding 2026-04-20 20:43:31 -0500
  • 7ea97fbaca
    Merge c9cb7d0e20 into c20959dbf4 Daniel Han 2026-04-21 01:42:20 +0000
  • a6ad4436f8
    Merge 0400928135 into c20959dbf4 Daniel Han 2026-04-20 17:42:51 -0700
  • 3c16803519
    Merge 5564fe1750 into c20959dbf4 Daniel Han 2026-04-21 00:36:57 +0000
  • 5564fe1750 [pre-commit.ci] auto fixes from pre-commit.com hooks (branch: fix/bnb-multidevice-inference-hooks) pre-commit-ci[bot] 2026-04-21 00:36:51 +0000
  • 78e2ece54c Merge remote-tracking branch 'staging/pr-5053-tests' into pr-5053-head Daniel Han 2026-04-21 00:36:39 +0000
  • e169ff2cd7 Consolidate review tests Daniel Han 2026-04-21 00:34:27 +0000
  • 0c00397304 Add review tests Daniel Han 2026-04-21 00:34:27 +0000
  • a38ceafc88 Add review tests Daniel Han 2026-04-21 00:34:27 +0000
  • f11e520d06 Merge remote-tracking branch 'origin/main' Daniel Han 2026-04-21 00:34:27 +0000
  • 2b9aee82db Merge remote-tracking branch 'origin/main' Daniel Han 2026-04-20 23:24:32 +0000
  • 7441e6d72a [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2026-04-21 00:25:25 +0000
  • cc033fee19 flex: test FA4 prefill + Inductor autotune replay (both regress) Daniel Han 2026-04-21 00:24:45 +0000
  • 019107ac9f [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2026-04-20 23:30:16 +0000
  • 69723ee31c FlexKernelOptions sweep: flex reaches 72% of vLLM at batch 64 + LoRA Daniel Han 2026-04-20 23:30:05 +0000
  • 9741e2ff0b Split: keep only 10 file(s) Daniel Han 2026-04-20 23:19:40 +0000
  • 7bf1119493 Studio: call llama-server directly from streaming /v1/responses Roland Tannous 2026-04-21 03:16:28 +0400
  • 74ce9594c6 Studio: accept Codex multi-turn shapes and fix cross-task stream close on /v1/responses Roland Tannous 2026-04-21 02:59:01 +0400
  • c20959dbf4
    Studio: Improve chat composition, fix scroll behaviour, and refine sidebar UX (#5089) Lee Jackson 2026-04-20 23:20:45 +0100
  • 537eb3ad99 Studio: merge system messages and close inner stream on /v1/responses Roland Tannous 2026-04-21 02:14:39 +0400
  • f9edad6022 Revert "Refactor compare page dual chat scrolling behavior" sneakr 2026-04-21 00:06:24 +0200
  • 70dff56b12 fix: print CPU mode message when no GPU detected refs #5008 LeoBorcherding 2026-04-20 17:03:44 -0500