mirror of
https://github.com/unslothai/unsloth
synced 2026-04-21 13:37:39 +00:00
inference: add AGPLv3 license headers
All eight files introduced on this branch now carry the SPDX AGPLv3 header used by the MoE kernels. flex_paged_attention.py keeps its BSD 3-Clause attribution to attention-gym alongside the new header.
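The header added by this commit is two comment lines plus a blank separator, prepended to each file. As a hedged illustration only (this helper is not part of the commit; the function name and idempotence check are assumptions), a script applying the same header could look like:

```python
from pathlib import Path

# Header text mirrors the lines added in this commit's diffs.
SPDX_HEADER = (
    "# SPDX-License-Identifier: GNU Affero General Public License v3.0\n"
    "# Copyright 2023-present the Unsloth team. All rights reserved.\n"
    "\n"
)

def ensure_header(path: Path) -> bool:
    """Prepend SPDX_HEADER unless the file already starts with an SPDX line.

    Hypothetical helper; returns True if the file was modified."""
    text = path.read_text()
    if text.startswith("# SPDX-License-Identifier:"):
        return False  # idempotent: header already present
    path.write_text(SPDX_HEADER + text)
    return True
```

Running it twice over the same tree would change nothing on the second pass, which is the property you want from a header-stamping script.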
This commit is contained in:
parent 5e1ec3395a
commit e348be8ce0

8 changed files with 27 additions and 4 deletions
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """Batched steady-state throughput bench through ``FastLanguageModel`` +
 ``UNSLOTH_FAST_INFERENCE=1``. First ``generate`` call primes CUDA graphs;
 subsequent calls report steady state. Compare against April CLI-only
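The bench docstring above notes that the first ``generate`` call primes CUDA graphs and only later calls reflect steady state. A generic sketch of that warmup-then-measure pattern (no unsloth or CUDA here; ``fn`` stands in for the generate call, and the helper name is an assumption, not this repo's API):

```python
import time

def steady_state_time(fn, warmup: int = 1, iters: int = 5) -> float:
    """Call fn a few times, discarding warmup calls (e.g. CUDA-graph
    capture or JIT compilation), and return the mean wall-clock seconds
    of the remaining calls."""
    for _ in range(warmup):
        fn()  # warmup call: not representative of steady state
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return sum(times) / len(times)
```

Reporting only the post-warmup mean is what makes the benchmark comparable across runs, since the one-time graph-capture cost is excluded.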
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """Smoke-test the ``UNSLOTH_FAST_INFERENCE=1`` path through
 ``FastLanguageModel.from_pretrained``.
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """Flex-attention inference engines.

 ``UNSLOTH_FAST_INFERENCE=1`` routes ``FastLanguageModel.from_pretrained``
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """FlexEngine: vLLM-compatible LLM surface for the flex inference backends.

 When ``UNSLOTH_FAST_INFERENCE=1`` is set, :func:`load_flex` wraps the HF model
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """Gemma-4-E2B-it inference with flex_attention + paged KV cache + CUDA graphs.

 Extends the Qwen3/Llama-3.2 engine in `qwen3_flex_inference.py` to a third
@@ -1,7 +1,9 @@
-# Adapted from attention-gym
-# Original source: https://github.com/pytorch-labs/attention-gym
-# License: BSD 3-Clause (see THIRD_PARTY_LICENSES.md)
-# Copyright (c) 2023, Driss Guessous
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+#
+# Adapted from attention-gym (https://github.com/pytorch-labs/attention-gym)
+# Copyright (c) 2023, Driss Guessous, licensed under BSD 3-Clause
+# (see THIRD_PARTY_LICENSES.md).

 # the original implementation has some bugs and has some feature that lives outside of the PageTable class
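The context line above refers to a ``PageTable`` class managing the paged KV cache. As a toy sketch of the core idea only (mapping each sequence's logical token positions onto fixed-size physical pages drawn from a shared free list; the class shape, method names, and sizes here are illustrative assumptions, not this repo's implementation):

```python
class PageTable:
    """Toy paged-KV bookkeeping: each sequence maps its logical token
    positions onto fixed-size physical pages from a shared free list."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        self.free = list(range(num_pages))  # available physical page ids
        self.pages = {}    # seq_id -> list of physical page ids, in order
        self.length = {}   # seq_id -> number of tokens stored

    def append_token(self, seq_id: int):
        """Reserve a slot for one more token of seq_id.

        Returns (physical_page, slot_within_page)."""
        n = self.length.get(seq_id, 0)
        pages = self.pages.setdefault(seq_id, [])
        if n % self.page_size == 0:  # current page full, or first token
            if not self.free:
                raise MemoryError("KV cache out of pages")
            pages.append(self.free.pop())
        self.length[seq_id] = n + 1
        return pages[-1], n % self.page_size

    def release(self, seq_id: int):
        """Return all of seq_id's pages to the free list."""
        self.free.extend(self.pages.pop(seq_id, []))
        self.length.pop(seq_id, None)
```

Because pages are fixed-size and recycled through the free list, sequences of different lengths can share one physical KV buffer without fragmentation, which is the property paged attention relies on.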
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """Llama / Qwen3 inference with flex_attention + paged KV cache + CUDA graphs.

 The transformers continuous-batching path tops out at ~10% of vLLM on this
@@ -1,3 +1,6 @@
+# SPDX-License-Identifier: GNU Affero General Public License v3.0
+# Copyright 2023-present the Unsloth team. All rights reserved.
+
 """vLLM-API surface for the flex inference backend.

 `FlexEngine.generate` / `.chat` return :class:`RequestOutput` objects with the