Torch version, docs, readme, general loader

This commit is contained in:
Daniel Han-Chen 2023-12-18 04:23:16 +11:00
parent cf5204a52d
commit feec338c08
6 changed files with 83 additions and 19 deletions

@@ -5,23 +5,20 @@
</div>
## 2-5x faster, 60% less memory, local QLoRA finetuning
* Supports Llama 7b, 13b, 70b, CodeLlama 34b, Mistral 7b, TinyLlama and all Llama archs!
* Llama 7b [Colab T4 example](https://colab.research.google.com/drive/1n-fgduZhRUsSjgpqNtVkXA3rSfE7iBdg?usp=sharing) on 1 T4 2x faster, uses 43% less VRAM (8.4GB) LAION dataset. [Alpaca T4 example](https://colab.research.google.com/drive/1oW55fBmwzCOrBVX66RcpptL3a99qWBxb?usp=sharing) 2x faster on 1 T4, using 6.4GB VRAM.
* Mistral 7b [Colab A100 example](https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing) on 1 A100 2.2x faster, uses 62% less VRAM (12.4GB). [Colab T4 example](https://colab.research.google.com/drive/15pyLgRN97B_jA56HS0esx56knA9I5tuv?usp=sharing)
* CodeLlama 34b [Colab example](https://colab.research.google.com/drive/1gdHyAx8XJsz2yNV-DHvbHjR1iCef5Qmh?usp=sharing) does not OOM, is 1.9x faster, and uses 32% less VRAM (27GB).
* Kaggle 2 Tesla T4s 5.28x faster on Alpaca. [Kaggle example](https://www.kaggle.com/danielhanchen/unsloth-laion-t4-ddp)
| Llama 7b | Mistral 7b | CodeLlama 34b | Llama 7b Kaggle 2x T4 |
|-----------------------------|-----------------------------|-------------------------|------------------------|
| **2.2x faster, -43% VRAM** | **2.2x faster, -62% VRAM** | **1.9x faster, -27% VRAM** | **5.5x faster, -44% VRAM** |
| [Colab Alpaca example + inference](https://colab.research.google.com/drive/1oW55fBmwzCOrBVX66RcpptL3a99qWBxb?usp=sharing) | [Colab T4 example](https://colab.research.google.com/drive/15pyLgRN97B_jA56HS0esx56knA9I5tuv?usp=sharing) | [A100 example](https://colab.research.google.com/drive/1gdHyAx8XJsz2yNV-DHvbHjR1iCef5Qmh?usp=sharing) | [Kaggle Alpaca example](https://www.kaggle.com/danielhanchen/unsloth-alpaca-t4-ddp) |
| [Colab A100 example](https://colab.research.google.com/drive/1YIPY_18xm-K0iJDgvNkRoJsgkPMPAO3G?usp=sharing) | [Colab A100 example](https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing) | (59 more examples if you scroll down) | [Kaggle Slim Orca](https://www.kaggle.com/danielhanchen/unsloth-slimorca-t4-ddp) |
* Supports Llama (7, 13, 70b), Yi (6, 34b), Mistral (7b), TinyLlama, CodeLlama (7, 13, 34b), and all Llama / Mistral derived architectures!
* All kernels written in [OpenAI's Triton](https://openai.com/research/triton) language.
* **0% loss in accuracy** - no approximation methods - all exact.
* No change of hardware necessary. Supports NVIDIA GPUs since 2018+. Minimum CUDA Compute Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc) [Check your GPU](https://developer.nvidia.com/cuda-gpus)
* **NEW!** Works on **Linux** and **Windows** via WSL.
* **NEW!** Experimental support for [DPO (Direct Preference Optimization)](https://arxiv.org/abs/2305.18290)!
* Supports 4bit and 16bit QLoRA / LoRA finetuning via [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
* Open source version trains 5x faster or you can check out [Unsloth Pro and Max](https://unsloth.ai/) codepaths for **30x faster training**!
<div class="align-center">
<img src="./images/Slim Orca 2GPUs.png" width="400" />
<img src="./images/LAION 2GPU.png" width="400" />
</div>
| 1 A100 40GB | Hugging Face | Flash Attention 2 | Unsloth Open | Unsloth Equal | Unsloth Pro | Unsloth Max |
|--------------|-------------|-------------|-----------------|--------------|---------------|-------------|
@@ -35,7 +32,7 @@ If you trained a model with Unsloth, we made a cool sticker!!
<img src="./images/unsloth made with love.png" width="200" />
# Installation Instructions - Conda
Unsloth currently only supports Linux distros and Pytorch == 2.1.
```
conda install cudatoolkit xformers bitsandbytes pytorch pytorch-cuda=12.1 \
-c pytorch -c nvidia -c xformers -c conda-forge -y
@@ -47,6 +44,11 @@ pip install "unsloth[kaggle] @ git+https://github.com/unslothai/unsloth.git"
```
import torch; torch.version.cuda
```
2. We only support Pytorch 2.1 (2.1.1 currently bugs out): you can update Pytorch via pip (swap cu121 for cu118 to match your CUDA version)
```
pip install --upgrade --force-reinstall --no-cache-dir torch==2.1.0 triton \
--index-url https://download.pytorch.org/whl/cu121
```
3. Select either cu118 for CUDA 11.8 or cu121 for CUDA 12.1. If you have an RTX 3060 or newer (A100, H100 etc), use the "ampere" path.
```
pip install "unsloth[cu118] @ git+https://github.com/unslothai/unsloth.git"
@@ -54,11 +56,6 @@ pip install "unsloth[cu121] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118_ampere] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121_ampere] @ git+https://github.com/unslothai/unsloth.git"
```
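If you are unsure which path applies, the choice can be sketched as below; the helper name is illustrative, not an Unsloth API, and at runtime the capability tuple would come from `torch.cuda.get_device_capability()`:

```python
# Hypothetical helper (not part of Unsloth): pick the pip extras tag from the
# GPU's CUDA compute capability. Ampere and newer cards (RTX 30xx/40xx, A100,
# H100) report capability major >= 8, so they can use the "_ampere" paths.
def pick_extras_tag(capability, cuda_version = "cu121"):
    major, _minor = capability
    return f"{cuda_version}_ampere" if major >= 8 else cuda_version
```

For example, an A100 (capability 8.0) with CUDA 12.1 resolves to `cu121_ampere`, while a T4 (capability 7.5) stays on the plain `cu121` / `cu118` path.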
Change `cu121` to `cu118` for CUDA 11.8. Go to https://pytorch.org/ to learn more.
4. If you get errors, try the below first, then go back to step 1:

Binary file not shown. (Before: 62 KiB → After: 59 KiB)

@@ -47,7 +47,9 @@ except:
"We have some installation instructions on our Github page.")
# We only support torch 2.1
# Fixes https://github.com/unslothai/unsloth/issues/38
torch_version = torch.__version__.split(".")
major_torch, minor_torch = torch_version[0], torch_version[1]
major_torch, minor_torch = int(major_torch), int(minor_torch)
if (major_torch != 2) or (major_torch == 2 and minor_torch < 1):
raise ImportError("Unsloth only supports Pytorch 2.1 for now. Please update your Pytorch to 2.1.\n"\
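The version gate above can be sketched stand-alone; `supports_unsloth` is an illustrative name, and tolerating a `+cu121`-style local-version suffix is an assumption about pip wheel version strings:

```python
# Sketch of the torch version check: accept Pytorch 2.1+, and tolerate version
# strings with or without a patch or local suffix (e.g. "2.1", "2.1.0+cu121").
def supports_unsloth(torch_version):
    parts = torch_version.split(".")
    major = int(parts[0])
    minor = int(parts[1].split("+")[0])
    return major == 2 and minor >= 1
```

Indexing the split result instead of unpacking it into exactly three names is what avoids the crash from issue #38 on two-component version strings.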

@@ -12,5 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .loader import FastLanguageModel
from .llama import FastLlamaModel
from .mistral import FastMistralModel

@@ -20,6 +20,7 @@ import gc
warnings.filterwarnings(action = "ignore", category = UserWarning, module = "torch")
import bitsandbytes as bnb
from transformers.models.llama.modeling_llama import logger
import platform
__version__ = "2023.12"
__all__ = [
@@ -99,6 +100,6 @@ def print_unsloth_message(name):
f" \\\ /| GPU: {gpu_stats.name}. Max memory: {max_memory} GB\n"\
f"O^O/ \_/ \\ CUDA compute capability = {gpu_stats.major}.{gpu_stats.minor}\n"\
f"\ / Pytorch version: {torch.__version__}. CUDA Toolkit = {torch.version.cuda}\n"\
f' "-____-" bfloat16 support = {str(SUPPORTS_BFLOAT16).upper()}\n'
f' "-____-" bfloat16 = {str(SUPPORTS_BFLOAT16).upper()}. Platform = {platform.system()}\n'
print(statistics)
pass
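For illustration, the stats line can be assembled as below; this standalone helper is hypothetical, and in Unsloth the values come from `torch.cuda.get_device_properties(0)`, `torch.__version__` and `torch.version.cuda`:

```python
import platform

# Hypothetical helper mirroring the banner above; the GPU values are passed in
# as plain arguments so the sketch runs without a CUDA device present.
def format_stats(gpu_name, max_memory, cc_major, cc_minor,
                 torch_version, cuda_toolkit, supports_bfloat16):
    return (
        f"GPU: {gpu_name}. Max memory: {max_memory} GB\n"
        f"CUDA compute capability = {cc_major}.{cc_minor}\n"
        f"Pytorch version: {torch_version}. CUDA Toolkit = {cuda_toolkit}\n"
        f"bfloat16 = {str(supports_bfloat16).upper()}. Platform = {platform.system()}"
    )
```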

unsloth/models/loader.py Normal file

@@ -0,0 +1,63 @@
# Copyright 2023-present Daniel Han-Chen & the Unsloth team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .llama import FastLlamaModel, logger
from .mistral import FastMistralModel
from transformers import AutoConfig
class FastLanguageModel:
@staticmethod
def from_pretrained(
model_name = "mistralai/Mistral-7B-v0.1",
max_seq_length = 4096,
dtype = None,
load_in_4bit = True,
token = None,
device_map = "sequential",
rope_scaling = None,
*args, **kwargs,
):
model_config = AutoConfig.from_pretrained(model_name)
model_type = model_config.model_type
if model_type == "llama":
return FastLlamaModel.from_pretrained(
model_name = model_name,
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
token = token,
device_map = device_map,
rope_scaling = rope_scaling,
*args, **kwargs,
)
elif model_type == "mistral":
if rope_scaling is not None:
logger.warning_once("Mistral models do not support RoPE scaling.")
return FastMistralModel.from_pretrained(
model_name = model_name,
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
token = token,
device_map = device_map,
*args, **kwargs,
)
else:
raise NotImplementedError(
f"{model_name} not supported yet! Make an issue to https://github.com/unslothai/unsloth!",
)
pass
pass
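The dispatch in loader.py can be reduced to a registry keyed by the config's `model_type`; the stub loaders below are stand-ins for `FastLlamaModel` / `FastMistralModel` so the sketch runs without `transformers` installed:

```python
# Minimal sketch of FastLanguageModel's dispatch pattern: look the model_type
# up in a registry and forward all keyword arguments to that loader.
class _StubLlama:
    @staticmethod
    def from_pretrained(**kwargs):
        return ("llama", kwargs)

class _StubMistral:
    @staticmethod
    def from_pretrained(**kwargs):
        return ("mistral", kwargs)

_DISPATCH = {"llama": _StubLlama, "mistral": _StubMistral}

def from_pretrained(model_type, **kwargs):
    # Unknown architectures fail loudly, as in the real loader.
    if model_type not in _DISPATCH:
        raise NotImplementedError(f"{model_type} not supported yet!")
    return _DISPATCH[model_type].from_pretrained(**kwargs)
```

A registry dict keeps the branching in one place, so supporting a new architecture only means adding one entry rather than another `elif` chain.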