Mirror of https://github.com/unslothai/unsloth (synced 2026-04-21 13:37:39 +00:00)
Torch version, docs, readme, general loader
This commit is contained in: parent cf5204a52d, commit feec338c08
6 changed files with 83 additions and 19 deletions

README.md (31 changes)
@@ -5,23 +5,20 @@
</div>

## 2-5x faster 60% less memory local QLoRA finetuning

-* Supports Llama 7b, 13b, 70b, CodeLlama 34b, Mistral 7b, TinyLlama and all Llama archs!
-* Llama 7b [Colab T4 example](https://colab.research.google.com/drive/1n-fgduZhRUsSjgpqNtVkXA3rSfE7iBdg?usp=sharing) on 1 T4: 2x faster, uses 43% less VRAM (8.4GB), LAION dataset. [Alpaca T4 example](https://colab.research.google.com/drive/1oW55fBmwzCOrBVX66RcpptL3a99qWBxb?usp=sharing): 2x faster on 1 T4, using 6.4GB VRAM.
-* Mistral 7b [Colab A100 example](https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing) on 1 A100: 2.2x faster, uses 62% less VRAM (12.4GB). [Colab T4 example](https://colab.research.google.com/drive/15pyLgRN97B_jA56HS0esx56knA9I5tuv?usp=sharing)
-* CodeLlama 34b [Colab example](https://colab.research.google.com/drive/1gdHyAx8XJsz2yNV-DHvbHjR1iCef5Qmh?usp=sharing): does not OOM, is 1.9x faster, uses 32% less VRAM (27GB).
-* Kaggle 2 Tesla T4s: 5.28x faster on Alpaca. [Kaggle example](https://www.kaggle.com/danielhanchen/unsloth-laion-t4-ddp)
+| Llama 7b | Mistral 7b | CodeLlama 34b | Llama 7b Kaggle 2x T4 |
+|-----------------------------|-----------------------------|----------------------------|----------------------------|
+| **2.2x faster, -43% VRAM** | **2.2x faster, -62% VRAM** | **1.9x faster, -27% VRAM** | **5.5x faster, -44% VRAM** |
+| [Colab Alpaca example + inference](https://colab.research.google.com/drive/1oW55fBmwzCOrBVX66RcpptL3a99qWBxb?usp=sharing) | [Colab T4 example](https://colab.research.google.com/drive/15pyLgRN97B_jA56HS0esx56knA9I5tuv?usp=sharing) | [A100 example](https://colab.research.google.com/drive/1gdHyAx8XJsz2yNV-DHvbHjR1iCef5Qmh?usp=sharing) | [Kaggle Alpaca example](https://www.kaggle.com/danielhanchen/unsloth-alpaca-t4-ddp) |
+| [Colab A100 example](https://colab.research.google.com/drive/1YIPY_18xm-K0iJDgvNkRoJsgkPMPAO3G?usp=sharing) | [Colab A100 example](https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing) | (59 more examples if you scroll down) | [Kaggle Slim Orca](https://www.kaggle.com/danielhanchen/unsloth-slimorca-t4-ddp) |

+* Supports Llama (7, 13, 70b), Yi (6, 34b), Mistral (7b), TinyLlama, CodeLlama (7, 13, 34b), and all Llama / Mistral derived architectures!
* All kernels written in [OpenAI's Triton](https://openai.com/research/triton) language.
-* 0% loss in accuracy - no approximation methods - all exact.
+* **0% loss in accuracy** - no approximation methods - all exact.
* No change of hardware necessary. Supports NVIDIA GPUs from 2018 onwards. Minimum CUDA Compute Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc). [Check your GPU](https://developer.nvidia.com/cuda-gpus)
* **NEW!** Works on **Linux** and **Windows** via WSL.
* **NEW!** Experimental support for [DPO (Direct Preference Optimization)](https://arxiv.org/abs/2305.18290)!
* Supports 4bit and 16bit QLoRA / LoRA finetuning via [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
* The open source version trains 5x faster, or check out the [Unsloth Pro and Max](https://unsloth.ai/) codepaths for **30x faster training**!

<div class="align-center">
<img src="./images/Slim Orca 2GPUs.png" width="400" />
<img src="./images/LAION 2GPU.png" width="400" />
</div>

| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-------------------|--------------|---------------|-------------|-------------|
@@ -35,7 +32,7 @@ If you trained a model with Unsloth, we made a cool sticker!!

<img src="./images/unsloth made with love.png" width="200" />

# Installation Instructions - Conda

-Unsloth currently only supports Linux distros and Pytorch >= 2.1.
+Unsloth currently only supports Linux distros and Pytorch == 2.1.
```
conda install cudatoolkit xformers bitsandbytes pytorch pytorch-cuda=12.1 \
  -c pytorch -c nvidia -c xformers -c conda-forge -y
```
@@ -47,6 +44,11 @@ pip install "unsloth[kaggle] @ git+https://github.com/unslothai/unsloth.git"
```
import torch; torch.version.cuda
```
+2. We only support Pytorch 2.1 (2.1.1 bugs out for now). You can update Pytorch via pip (interchange cu121 / cu118):
+```
+pip install --upgrade --force-reinstall --no-cache-dir torch==2.1.0 triton \
+  --index-url https://download.pytorch.org/whl/cu121
+```
2. Select either cu118 for CUDA 11.8 or cu121 for CUDA 12.1. If you have an RTX 3060 or higher (A100, H100 etc), use the "ampere" path.
```
pip install "unsloth[cu118] @ git+https://github.com/unslothai/unsloth.git"

@@ -54,11 +56,6 @@ pip install "unsloth[cu121] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118_ampere] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121_ampere] @ git+https://github.com/unslothai/unsloth.git"
```
-3. We only support Pytorch 2.1: You can update Pytorch via pip:
-```
-pip install --upgrade --force-reinstall --no-cache-dir torch triton \
-  --index-url https://download.pytorch.org/whl/cu121
-```
Change `cu121` to `cu118` for CUDA version 11.8 or 12.1. Go to https://pytorch.org/ to learn more.

4. If you get errors, try the below first, then go back to step 1:
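The cu118 / cu121 choice in the commands above can be derived from the CUDA version that `torch.version.cuda` reports. The helper below is a hypothetical illustration, not part of unsloth, of mapping that version string to the pip extras tag:

```python
# Hypothetical helper: map a CUDA version string (as reported by
# torch.version.cuda, e.g. "11.8" or "12.1") to the unsloth pip extras
# tag used in the install commands above. ampere_or_newer selects the
# "_ampere" path for RTX 30xx/40xx, A100, H100 class GPUs.
def unsloth_extras(cuda_version: str, ampere_or_newer: bool = False) -> str:
    base = "cu118" if cuda_version.startswith("11.") else "cu121"
    return base + ("_ampere" if ampere_or_newer else "")
```

For example, `unsloth_extras("12.1", ampere_or_newer = True)` yields `"cu121_ampere"`, matching the last install line above.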

Binary file not shown. (Before: 62 KiB, After: 59 KiB)

@@ -47,7 +47,9 @@ except:
    "We have some installation instructions on our Github page.")

# We only support torch 2.1
-major_torch, minor_torch, _ = torch.__version__.split(".")
+# Fixes https://github.com/unslothai/unsloth/issues/38
+torch_version = torch.__version__.split(".")
+major_torch, minor_torch = torch_version[0], torch_version[1]
major_torch, minor_torch = int(major_torch), int(minor_torch)
if (major_torch != 2) or (major_torch == 2 and minor_torch < 1):
    raise ImportError("Unsloth only supports Pytorch 2.1 for now. Please update your Pytorch to 2.1.\n"\
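The version guard in this hunk can be sketched as a standalone function. `check_torch_version` is a hypothetical name (the real check runs inline at import time); indexing the first two split components, rather than unpacking exactly three, is what lets suffixed version strings such as `2.1.0+cu121` still parse:

```python
# Standalone sketch of the committed torch-version guard. The function
# name check_torch_version is hypothetical; unsloth performs this check
# inline at import time.
def check_torch_version(version: str) -> None:
    # Take only the first two components: versions may have extra parts
    # (e.g. "2.1.0+cu121"), so unpacking exactly three can fail.
    torch_version = version.split(".")
    major_torch, minor_torch = int(torch_version[0]), int(torch_version[1])
    if (major_torch != 2) or (major_torch == 2 and minor_torch < 1):
        raise ImportError(
            "Unsloth only supports Pytorch 2.1 for now. "
            "Please update your Pytorch to 2.1."
        )
```

With this parsing, `"2.1.0+cu121"` and `"2.1.1"` are accepted, while `"2.0.1"` raises the `ImportError`.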

@@ -12,5 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.

+from .loader import FastLanguageModel
from .llama import FastLlamaModel
from .mistral import FastMistralModel

@@ -20,6 +20,7 @@ import gc
warnings.filterwarnings(action = "ignore", category = UserWarning, module = "torch")
import bitsandbytes as bnb
from transformers.models.llama.modeling_llama import logger
+import platform

__version__ = "2023.12"
__all__ = [
@@ -99,6 +100,6 @@ def print_unsloth_message(name):
f" \\\ /| GPU: {gpu_stats.name}. Max memory: {max_memory} GB\n"\
f"O^O/ \_/ \\ CUDA compute capability = {gpu_stats.major}.{gpu_stats.minor}\n"\
f"\ / Pytorch version: {torch.__version__}. CUDA Toolkit = {torch.version.cuda}\n"\
-f' "-____-" bfloat16 support = {str(SUPPORTS_BFLOAT16).upper()}\n'
+f' "-____-" bfloat16 = {str(SUPPORTS_BFLOAT16).upper()}. Platform = {platform.system()}\n'
print(statistics)
pass

unsloth/models/loader.py (new file, 63 lines)

@@ -0,0 +1,63 @@
# Copyright 2023-present Daniel Han-Chen & the Unsloth team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .llama import FastLlamaModel, logger
from .mistral import FastMistralModel
from transformers import AutoConfig


class FastLanguageModel:
    @staticmethod
    def from_pretrained(
        model_name = "mistralai/Mistral-7B-v0.1",
        max_seq_length = 4096,
        dtype = None,
        load_in_4bit = True,
        token = None,
        device_map = "sequential",
        rope_scaling = None,
        *args, **kwargs,
    ):
        model_config = AutoConfig.from_pretrained(model_name)
        model_type = model_config.model_type

        if model_type == "llama":
            return FastLlamaModel.from_pretrained(
                model_name = model_name,
                max_seq_length = max_seq_length,
                dtype = dtype,
                load_in_4bit = load_in_4bit,
                token = token,
                device_map = device_map,
                rope_scaling = rope_scaling,
                *args, **kwargs,
            )
        elif model_type == "mistral":
            if rope_scaling is not None:
                logger.warning_once("Mistral models do not support RoPE scaling.")
            return FastMistralModel.from_pretrained(
                model_name = model_name,
                max_seq_length = max_seq_length,
                dtype = dtype,
                load_in_4bit = load_in_4bit,
                token = token,
                device_map = device_map,
                *args, **kwargs,
            )
        else:
            raise NotImplementedError(
                f"{model_name} not supported yet! Make an issue to https://github.com/unslothai/unsloth!",
            )
        pass
    pass
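The new loader reads `model_type` from the model's config and forwards to the matching backend, dropping `rope_scaling` on the Mistral path. A standalone sketch of that dispatch pattern, using hypothetical `Stub*` classes in place of `FastLlamaModel` / `FastMistralModel` so it runs without unsloth or transformers:

```python
# Standalone sketch of the FastLanguageModel dispatch pattern. StubLlama
# and StubMistral are hypothetical stand-ins for the real backend
# classes; each just echoes back which branch was taken and its kwargs.
class StubLlama:
    @staticmethod
    def from_pretrained(**kwargs):
        return ("llama", kwargs)

class StubMistral:
    @staticmethod
    def from_pretrained(**kwargs):
        return ("mistral", kwargs)

_DISPATCH = {"llama": StubLlama, "mistral": StubMistral}

def from_pretrained(model_type, **kwargs):
    try:
        backend = _DISPATCH[model_type]
    except KeyError:
        # Mirrors loader.py's NotImplementedError for unknown model types.
        raise NotImplementedError(f"{model_type} not supported yet!")
    if model_type == "mistral":
        # The Mistral path does not forward rope_scaling (loader.py only
        # warns when it is set), so drop it here.
        kwargs.pop("rope_scaling", None)
    return backend.from_pretrained(**kwargs)
```

A table keyed on `model_type` keeps each backend's argument handling local to its branch, which is why adding a new architecture only touches the dispatch table and one backend class.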