inference: add AGPLv3 license headers

All eight files introduced on this branch now carry the SPDX AGPLv3
header used by the MoE kernels. flex_paged_attention.py keeps its
BSD 3-Clause attribution to attention-gym alongside the new header.
This commit is contained in:
Daniel Han 2026-04-21 13:19:01 +00:00
parent 5e1ec3395a
commit e348be8ce0
8 changed files with 27 additions and 4 deletions
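
For changes like this one, a small checker can verify that every Python file leads with the SPDX pair. This is a hypothetical helper, not part of the commit; the two header strings are copied from the diffs below, and the prefix match on the SPDX line avoids hard-coding one license identifier.

```python
#!/usr/bin/env python3
"""Report .py files missing the SPDX/copyright header (illustrative sketch)."""
from pathlib import Path

# Expected first two lines, per the headers added in this commit.
HEADER_LINES = (
    "# SPDX-License-Identifier:",
    "# Copyright 2023-present the Unsloth team. All rights reserved.",
)

def has_header(text: str) -> bool:
    """True if the file's first two lines match the expected header shape."""
    lines = text.splitlines()
    return (
        len(lines) >= 2
        and lines[0].startswith(HEADER_LINES[0])
        and lines[1] == HEADER_LINES[1]
    )

def missing_headers(root: Path) -> list[Path]:
    """Return the .py files under ``root`` that lack the header."""
    return [
        p for p in sorted(root.rglob("*.py"))
        if not has_header(p.read_text(encoding="utf-8"))
    ]
```

Running `missing_headers` over the repo root before committing catches files that slipped through, including ones where only the copyright line was added.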

@@ -1,3 +1,6 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
# Copyright 2023-present the Unsloth team. All rights reserved.
"""Batched steady-state throughput bench through ``FastLanguageModel`` +
``UNSLOTH_FAST_INFERENCE=1``. First ``generate`` call primes CUDA graphs;
subsequent calls report steady state. Compare against April CLI-only

@@ -1,3 +1,6 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
# Copyright 2023-present the Unsloth team. All rights reserved.
"""Smoke-test the ``UNSLOTH_FAST_INFERENCE=1`` path through
``FastLanguageModel.from_pretrained``.

@@ -1,3 +1,6 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
# Copyright 2023-present the Unsloth team. All rights reserved.
"""Flex-attention inference engines.
``UNSLOTH_FAST_INFERENCE=1`` routes ``FastLanguageModel.from_pretrained``

@@ -1,3 +1,6 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
# Copyright 2023-present the Unsloth team. All rights reserved.
"""FlexEngine: vLLM-compatible LLM surface for the flex inference backends.
When ``UNSLOTH_FAST_INFERENCE=1`` is set, :func:`load_flex` wraps the HF model

@@ -1,3 +1,6 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
# Copyright 2023-present the Unsloth team. All rights reserved.
"""Gemma-4-E2B-it inference with flex_attention + paged KV cache + CUDA graphs.
Extends the Qwen3/Llama-3.2 engine in `qwen3_flex_inference.py` to a third

@@ -1,7 +1,9 @@
# Adapted from attention-gym
# Original source: https://github.com/pytorch-labs/attention-gym
# License: BSD 3-Clause (see THIRD_PARTY_LICENSES.md)
# Copyright (c) 2023, Driss Guessous
# SPDX-License-Identifier: AGPL-3.0-or-later
# Copyright 2023-present the Unsloth team. All rights reserved.
#
# Adapted from attention-gym (https://github.com/pytorch-labs/attention-gym)
# Copyright (c) 2023, Driss Guessous, licensed under BSD 3-Clause
# (see THIRD_PARTY_LICENSES.md).
# The original implementation has some bugs, and some features live outside of the PageTable class.
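
For context on what the PageTable mentioned in that comment does: a page table for a paged KV cache maps each sequence's logical KV blocks to physical pages in a shared pool, so sequences of different lengths can share one preallocated cache. The sketch below is illustrative only, with hypothetical names; the real class in flex_paged_attention.py is the attention-gym adaptation and differs in detail.

```python
class PageTable:
    """Maps (sequence id, logical token index) -> slot in a shared KV pool."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        # Stack of free physical page ids; pop() hands out the lowest id first.
        self.free_pages = list(range(num_pages - 1, -1, -1))
        # Per-sequence list of physical pages, in logical-block order.
        self.block_table: dict[int, list[int]] = {}

    def reserve(self, seq_id: int, num_tokens: int) -> None:
        """Ensure ``seq_id`` owns enough pages to hold ``num_tokens`` KV entries."""
        pages = self.block_table.setdefault(seq_id, [])
        needed = -(-num_tokens // self.page_size)  # ceiling division
        while len(pages) < needed:
            if not self.free_pages:
                raise RuntimeError("KV cache out of pages")
            pages.append(self.free_pages.pop())

    def physical_slot(self, seq_id: int, token_idx: int) -> int:
        """Flat index into the physical KV tensor for a logical token."""
        page = self.block_table[seq_id][token_idx // self.page_size]
        return page * self.page_size + token_idx % self.page_size

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's pages to the free pool."""
        self.free_pages.extend(self.block_table.pop(seq_id, []))
```

Keeping allocation, lookup, and release inside one class is the point of the fix the comment alludes to: once every consumer goes through `physical_slot`, the logical-to-physical mapping can change (e.g. page reuse) without touching attention code.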

@@ -1,3 +1,6 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
# Copyright 2023-present the Unsloth team. All rights reserved.
"""Llama / Qwen3 inference with flex_attention + paged KV cache + CUDA graphs.
The transformers continuous-batching path tops out at ~10% of vLLM on this

@@ -1,3 +1,6 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
# Copyright 2023-present the Unsloth team. All rights reserved.
"""vLLM-API surface for the flex inference backend.
`FlexEngine.generate` / `.chat` return :class:`RequestOutput` objects with the
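
For readers unfamiliar with the vLLM result shape that docstring names: a `RequestOutput` carries the request id and prompt plus a list of per-candidate `CompletionOutput` entries, each with the decoded text and generated token ids. The dataclasses below are a simplified stand-in using a subset of vLLM's documented field names; the real engine returns vLLM's own classes, which carry additional fields (logprobs, finish reasons, etc.).

```python
from dataclasses import dataclass, field

@dataclass
class CompletionOutput:
    """One generated candidate for a request."""
    index: int            # candidate index (0 for greedy / n=1 sampling)
    text: str             # decoded completion text
    token_ids: list[int]  # generated token ids

@dataclass
class RequestOutput:
    """Result object for one prompt, mirroring the vLLM surface shape."""
    request_id: str
    prompt: str
    outputs: list[CompletionOutput] = field(default_factory=list)
    finished: bool = True

def first_text(result: RequestOutput) -> str:
    """Typical consumer pattern: read the first candidate's text."""
    return result.outputs[0].text if result.outputs else ""
```

Code written against `result.outputs[0].text` this way works unchanged whether the backend is vLLM proper or the flex engine described above.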