mirror of
https://github.com/unslothai/unsloth
synced 2026-04-21 13:37:39 +00:00
inference: add AGPLv3 license headers
All eight files introduced on this branch now carry the SPDX AGPLv3 header used by the MoE kernels. flex_paged_attention.py keeps its BSD 3-Clause attribution to attention-gym alongside the new header.
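The header added by this commit is two comment lines plus a blank separator, prepended to each file. As a hedged illustration only (this helper is not part of the commit; the function name and idempotence check are assumptions), a script applying the same header could look like:

```python
from pathlib import Path

# Header text mirrors the lines added in this commit's diffs.
SPDX_HEADER = (
    "# SPDX-License-Identifier: GNU Affero General Public License v3.0\n"
    "# Copyright 2023-present the Unsloth team. All rights reserved.\n"
    "\n"
)

def ensure_header(path: Path) -> bool:
    """Prepend SPDX_HEADER unless the file already starts with an SPDX line.

    Hypothetical helper; returns True if the file was modified."""
    text = path.read_text()
    if text.startswith("# SPDX-License-Identifier:"):
        return False  # idempotent: header already present
    path.write_text(SPDX_HEADER + text)
    return True
```

Running it twice over the same tree would change nothing on the second pass, which is the property you want from a header-stamping script.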
This commit is contained in:
parent 5e1ec3395a
commit e348be8ce0

8 changed files with 27 additions and 4 deletions
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """Batched steady-state throughput bench through ``FastLanguageModel`` +
 ``UNSLOTH_FAST_INFERENCE=1``. First ``generate`` call primes CUDA graphs;
 subsequent calls report steady state. Compare against April CLI-only
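The bench docstring above notes that the first ``generate`` call primes CUDA graphs and only later calls reflect steady state. A generic sketch of that warmup-then-measure pattern (no unsloth or CUDA here; ``fn`` stands in for the generate call, and the helper name is an assumption, not this repo's API):

```python
import time

def steady_state_time(fn, warmup: int = 1, iters: int = 5) -> float:
    """Call fn a few times, discarding warmup calls (e.g. CUDA-graph
    capture or JIT compilation), and return the mean wall-clock seconds
    of the remaining calls."""
    for _ in range(warmup):
        fn()  # warmup call: not representative of steady state
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return sum(times) / len(times)
```

Reporting only the post-warmup mean is what makes the benchmark comparable across runs, since the one-time graph-capture cost is excluded.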
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """Smoke-test the ``UNSLOTH_FAST_INFERENCE=1`` path through
 ``FastLanguageModel.from_pretrained``.
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """Flex-attention inference engines.

 ``UNSLOTH_FAST_INFERENCE=1`` routes ``FastLanguageModel.from_pretrained``
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """FlexEngine: vLLM-compatible LLM surface for the flex inference backends.

 When ``UNSLOTH_FAST_INFERENCE=1`` is set, :func:`load_flex` wraps the HF model
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """Gemma-4-E2B-it inference with flex_attention + paged KV cache + CUDA graphs.

 Extends the Qwen3/Llama-3.2 engine in `qwen3_flex_inference.py` to a third
@@ -1,7 +1,9 @@
-# Adapted from attention-gym
-# Original source: https://github.com/pytorch-labs/attention-gym
-# License: BSD 3-Clause (see THIRD_PARTY_LICENSES.md)
-# Copyright (c) 2023, Driss Guessous
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+#
+# Adapted from attention-gym (https://github.com/pytorch-labs/attention-gym)
+# Copyright (c) 2023, Driss Guessous, licensed under BSD 3-Clause
+# (see THIRD_PARTY_LICENSES.md).

 # the original implementation has some bugs and has some feature that lives outside of the PageTable class
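The context line above refers to a ``PageTable`` class managing the paged KV cache. As a toy sketch of the core idea only (mapping each sequence's logical token positions onto fixed-size physical pages drawn from a shared free list; the class shape, method names, and sizes here are illustrative assumptions, not this repo's implementation):

```python
class PageTable:
    """Toy paged-KV bookkeeping: each sequence maps its logical token
    positions onto fixed-size physical pages from a shared free list."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        self.free = list(range(num_pages))  # available physical page ids
        self.pages = {}    # seq_id -> list of physical page ids, in order
        self.length = {}   # seq_id -> number of tokens stored

    def append_token(self, seq_id: int):
        """Reserve a slot for one more token of seq_id.

        Returns (physical_page, slot_within_page)."""
        n = self.length.get(seq_id, 0)
        pages = self.pages.setdefault(seq_id, [])
        if n % self.page_size == 0:  # current page full, or first token
            if not self.free:
                raise MemoryError("KV cache out of pages")
            pages.append(self.free.pop())
        self.length[seq_id] = n + 1
        return pages[-1], n % self.page_size

    def release(self, seq_id: int):
        """Return all of seq_id's pages to the free list."""
        self.free.extend(self.pages.pop(seq_id, []))
        self.length.pop(seq_id, None)
```

Because pages are fixed-size and recycled through the free list, sequences of different lengths can share one physical KV buffer without fragmentation, which is the property paged attention relies on.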
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """Llama / Qwen3 inference with flex_attention + paged KV cache + CUDA graphs.

 The transformers continuous-batching path tops out at ~10% of vLLM on this
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """vLLM-API surface for the flex inference backend.

 `FlexEngine.generate` / `.chat` return :class:`RequestOutput` objects with the