Mirror of https://github.com/unslothai/unsloth (synced 2026-04-21 13:37:39 +00:00)
Torch version, docs, readme, general loader
This commit is contained in: parent cf5204a52d, commit feec338c08
6 changed files with 83 additions and 19 deletions

README.md (31 changes)
@@ -5,23 +5,20 @@
</div>

## 2-5x faster 60% less memory local QLoRA finetuning

-* Supports Llama 7b, 13b, 70b, CodeLlama 34b, Mistral 7b, TinyLlama and all Llama archs!
-* Llama 7b [Colab T4 example](https://colab.research.google.com/drive/1n-fgduZhRUsSjgpqNtVkXA3rSfE7iBdg?usp=sharing) on 1 T4: 2x faster, uses 43% less VRAM (8.4GB), LAION dataset. [Alpaca T4 example](https://colab.research.google.com/drive/1oW55fBmwzCOrBVX66RcpptL3a99qWBxb?usp=sharing): 2x faster on 1 T4, using 6.4GB VRAM.
-* Mistral 7b [Colab A100 example](https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing) on 1 A100: 2.2x faster, uses 62% less VRAM (12.4GB). [Colab T4 example](https://colab.research.google.com/drive/15pyLgRN97B_jA56HS0esx56knA9I5tuv?usp=sharing)
-* CodeLlama 34b [Colab example](https://colab.research.google.com/drive/1gdHyAx8XJsz2yNV-DHvbHjR1iCef5Qmh?usp=sharing): does not OOM, is 1.9x faster, uses 32% less VRAM (27GB).
-* Kaggle 2 Tesla T4s: 5.28x faster on Alpaca. [Kaggle example](https://www.kaggle.com/danielhanchen/unsloth-laion-t4-ddp)
+| Llama 7b | Mistral 7b | CodeLlama 34b | Llama 7b Kaggle 2x T4 |
+|-----------------------------|-----------------------------|----------------------------|----------------------------|
+| **2.2x faster, -43% VRAM** | **2.2x faster, -62% VRAM** | **1.9x faster, -27% VRAM** | **5.5x faster, -44% VRAM** |
+| [Colab Alpaca example + inference](https://colab.research.google.com/drive/1oW55fBmwzCOrBVX66RcpptL3a99qWBxb?usp=sharing) | [Colab T4 example](https://colab.research.google.com/drive/15pyLgRN97B_jA56HS0esx56knA9I5tuv?usp=sharing) | [A100 example](https://colab.research.google.com/drive/1gdHyAx8XJsz2yNV-DHvbHjR1iCef5Qmh?usp=sharing) | [Kaggle Alpaca example](https://www.kaggle.com/danielhanchen/unsloth-alpaca-t4-ddp) |
+| [Colab A100 example](https://colab.research.google.com/drive/1YIPY_18xm-K0iJDgvNkRoJsgkPMPAO3G?usp=sharing) | [Colab A100 example](https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing) | (59 more examples if you scroll down) | [Kaggle Slim Orca](https://www.kaggle.com/danielhanchen/unsloth-slimorca-t4-ddp) |

+* Supports Llama (7, 13, 70b), Yi (6, 34b), Mistral (7b), TinyLlama, CodeLlama (7, 13, 34b), and all Llama / Mistral derived architectures!
* All kernels written in [OpenAI's Triton](https://openai.com/research/triton) language.
-* 0% loss in accuracy - no approximation methods - all exact.
+* **0% loss in accuracy** - no approximation methods - all exact.
* No change of hardware necessary. Supports NVIDIA GPUs from 2018 onwards. Minimum CUDA Compute Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc). [Check your GPU](https://developer.nvidia.com/cuda-gpus)
* **NEW!** Works on **Linux** and **Windows** via WSL.
* **NEW!** Experimental support for [DPO (Direct Preference Optimization)](https://arxiv.org/abs/2305.18290)!
* Supports 4bit and 16bit QLoRA / LoRA finetuning via [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
* The open source version trains 5x faster, or check out the [Unsloth Pro and Max](https://unsloth.ai/) codepaths for **30x faster training**!

<div class="align-center">
<img src="./images/Slim Orca 2GPUs.png" width="400" />
<img src="./images/LAION 2GPU.png" width="400" />
</div>

| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-------------------|--------------|---------------|-------------|-------------|
@@ -35,7 +32,7 @@ If you trained a model with Unsloth, we made a cool sticker!!

<img src="./images/unsloth made with love.png" width="200" />

# Installation Instructions - Conda

-Unsloth currently only supports Linux distros and Pytorch >= 2.1.
+Unsloth currently only supports Linux distros and Pytorch == 2.1.
```
conda install cudatoolkit xformers bitsandbytes pytorch pytorch-cuda=12.1 \
  -c pytorch -c nvidia -c xformers -c conda-forge -y
```
@@ -47,6 +44,11 @@ pip install "unsloth[kaggle] @ git+https://github.com/unslothai/unsloth.git"
```
import torch; torch.version.cuda
```
+2. We only support Pytorch 2.1 (2.1.1 bugs out for now). You can update Pytorch via pip (interchange cu121 / cu118):
+```
+pip install --upgrade --force-reinstall --no-cache-dir torch==2.1.0 triton \
+  --index-url https://download.pytorch.org/whl/cu121
+```
2. Select either cu118 for CUDA 11.8 or cu121 for CUDA 12.1. If you have an RTX 3060 or higher (A100, H100 etc), use the "ampere" path.
```
pip install "unsloth[cu118] @ git+https://github.com/unslothai/unsloth.git"

@@ -54,11 +56,6 @@ pip install "unsloth[cu121] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118_ampere] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121_ampere] @ git+https://github.com/unslothai/unsloth.git"
```
-3. We only support Pytorch 2.1: You can update Pytorch via pip:
-```
-pip install --upgrade --force-reinstall --no-cache-dir torch triton \
-  --index-url https://download.pytorch.org/whl/cu121
-```
Change `cu121` to `cu118` for CUDA version 11.8 or 12.1. Go to https://pytorch.org/ to learn more.

4. If you get errors, try the below first, then go back to step 1:
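The cu118 / cu121 choice in the commands above can be derived from the CUDA version that `torch.version.cuda` reports. The helper below is a hypothetical illustration, not part of unsloth, of mapping that version string to the pip extras tag:

```python
# Hypothetical helper: map a CUDA version string (as reported by
# torch.version.cuda, e.g. "11.8" or "12.1") to the unsloth pip extras
# tag used in the install commands above. ampere_or_newer selects the
# "_ampere" path for RTX 30xx/40xx, A100, H100 class GPUs.
def unsloth_extras(cuda_version: str, ampere_or_newer: bool = False) -> str:
    base = "cu118" if cuda_version.startswith("11.") else "cu121"
    return base + ("_ampere" if ampere_or_newer else "")
```

For example, `unsloth_extras("12.1", ampere_or_newer = True)` yields `"cu121_ampere"`, matching the last install line above.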

Binary file not shown. (Before: 62 KiB, After: 59 KiB)

@@ -47,7 +47,9 @@ except:
    "We have some installation instructions on our Github page.")

# We only support torch 2.1
-major_torch, minor_torch, _ = torch.__version__.split(".")
+# Fixes https://github.com/unslothai/unsloth/issues/38
+torch_version = torch.__version__.split(".")
+major_torch, minor_torch = torch_version[0], torch_version[1]
major_torch, minor_torch = int(major_torch), int(minor_torch)
if (major_torch != 2) or (major_torch == 2 and minor_torch < 1):
    raise ImportError("Unsloth only supports Pytorch 2.1 for now. Please update your Pytorch to 2.1.\n"\
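The version guard in this hunk can be sketched as a standalone function. `check_torch_version` is a hypothetical name (the real check runs inline at import time); indexing the first two split components, rather than unpacking exactly three, is what lets suffixed version strings such as `2.1.0+cu121` still parse:

```python
# Standalone sketch of the committed torch-version guard. The function
# name check_torch_version is hypothetical; unsloth performs this check
# inline at import time.
def check_torch_version(version: str) -> None:
    # Take only the first two components: versions may have extra parts
    # (e.g. "2.1.0+cu121"), so unpacking exactly three can fail.
    torch_version = version.split(".")
    major_torch, minor_torch = int(torch_version[0]), int(torch_version[1])
    if (major_torch != 2) or (major_torch == 2 and minor_torch < 1):
        raise ImportError(
            "Unsloth only supports Pytorch 2.1 for now. "
            "Please update your Pytorch to 2.1."
        )
```

With this parsing, `"2.1.0+cu121"` and `"2.1.1"` are accepted, while `"2.0.1"` raises the `ImportError`.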

@@ -12,5 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.

+from .loader import FastLanguageModel
from .llama import FastLlamaModel
from .mistral import FastMistralModel

@@ -20,6 +20,7 @@ import gc
warnings.filterwarnings(action = "ignore", category = UserWarning, module = "torch")
import bitsandbytes as bnb
from transformers.models.llama.modeling_llama import logger
+import platform

__version__ = "2023.12"
__all__ = [
@@ -99,6 +100,6 @@ def print_unsloth_message(name):
f" \\\ /| GPU: {gpu_stats.name}. Max memory: {max_memory} GB\n"\
f"O^O/ \_/ \\ CUDA compute capability = {gpu_stats.major}.{gpu_stats.minor}\n"\
f"\ / Pytorch version: {torch.__version__}. CUDA Toolkit = {torch.version.cuda}\n"\
-f' "-____-" bfloat16 support = {str(SUPPORTS_BFLOAT16).upper()}\n'
+f' "-____-" bfloat16 = {str(SUPPORTS_BFLOAT16).upper()}. Platform = {platform.system()}\n'
print(statistics)
pass

unsloth/models/loader.py (new file, 63 lines)

@@ -0,0 +1,63 @@
# Copyright 2023-present Daniel Han-Chen & the Unsloth team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .llama import FastLlamaModel, logger
from .mistral import FastMistralModel
from transformers import AutoConfig


class FastLanguageModel:
    @staticmethod
    def from_pretrained(
        model_name = "mistralai/Mistral-7B-v0.1",
        max_seq_length = 4096,
        dtype = None,
        load_in_4bit = True,
        token = None,
        device_map = "sequential",
        rope_scaling = None,
        *args, **kwargs,
    ):
        model_config = AutoConfig.from_pretrained(model_name)
        model_type = model_config.model_type

        if model_type == "llama":
            return FastLlamaModel.from_pretrained(
                model_name = model_name,
                max_seq_length = max_seq_length,
                dtype = dtype,
                load_in_4bit = load_in_4bit,
                token = token,
                device_map = device_map,
                rope_scaling = rope_scaling,
                *args, **kwargs,
            )
        elif model_type == "mistral":
            if rope_scaling is not None:
                logger.warning_once("Mistral models do not support RoPE scaling.")
            return FastMistralModel.from_pretrained(
                model_name = model_name,
                max_seq_length = max_seq_length,
                dtype = dtype,
                load_in_4bit = load_in_4bit,
                token = token,
                device_map = device_map,
                *args, **kwargs,
            )
        else:
            raise NotImplementedError(
                f"{model_name} not supported yet! Make an issue to https://github.com/unslothai/unsloth!",
            )
        pass
    pass
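The new loader reads `model_type` from the model's config and forwards to the matching backend, dropping `rope_scaling` on the Mistral path. A standalone sketch of that dispatch pattern, using hypothetical `Stub*` classes in place of `FastLlamaModel` / `FastMistralModel` so it runs without unsloth or transformers:

```python
# Standalone sketch of the FastLanguageModel dispatch pattern. StubLlama
# and StubMistral are hypothetical stand-ins for the real backend
# classes; each just echoes back which branch was taken and its kwargs.
class StubLlama:
    @staticmethod
    def from_pretrained(**kwargs):
        return ("llama", kwargs)

class StubMistral:
    @staticmethod
    def from_pretrained(**kwargs):
        return ("mistral", kwargs)

_DISPATCH = {"llama": StubLlama, "mistral": StubMistral}

def from_pretrained(model_type, **kwargs):
    try:
        backend = _DISPATCH[model_type]
    except KeyError:
        # Mirrors loader.py's NotImplementedError for unknown model types.
        raise NotImplementedError(f"{model_type} not supported yet!")
    if model_type == "mistral":
        # The Mistral path does not forward rope_scaling (loader.py only
        # warns when it is set), so drop it here.
        kwargs.pop("rope_scaling", None)
    return backend.from_pretrained(**kwargs)
```

A table keyed on `model_type` keeps each backend's argument handling local to its branch, which is why adding a new architecture only touches the dispatch table and one backend class.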