Feb 2024 Release (#187)

* Fast inference repatch

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mistral.py

* Update __init__.py

* Fix inference

* Update mistral.py

* fast lm_head

* Remove fast path

* Update rope_embedding.py

* Update loader.py

* LlamaAttention_fast_forward_inference

* if past_key_value is not None and q_len == 1:

* revert inference

* Update loader.py

* past_key_value

* Update llama.py

* Update llama.py

* Fix SDPA

* Update llama.py

* padding

* Inference

* Update llama.py

* Revert

* Update mistral.py

* faster inference

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* inference

* Update llama.py

* Update utils.py

* faster inference

* Update llama.py

* revert

* lm_head

* Update llama.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* faster inference

* Update llama.py

* fast inference

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* torch compile

* past_key_values

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* fast inference + saving config.json

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* fast inference again

* more temp matrices

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update mistral.py

* Update llama.py

* SDPA

* attention_mask

* New version

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update save.py

* Update save.py

* Torch 2.2.0

* Update save.py

* mistral swa

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Fix SWA inference

* Fix llm_int8_skip_modules

* SWA inference

* Update save.py

* Update save.py

* Update pyproject.toml

* __version__

* __version__

* Update save.py

* Update save.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Chat Templates

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer

* Update chat_templates.py

* Saving, LlamaRotaryEmbedding issues

* Update llama.py

* Update mistral.py

* Update mapper.py

* Fix RoPE precision issues

* Bugs

* saving bugs

* Update llama.py

* readme

* spaces

* spaces

* globals

* slash

* slashes

* spaces

* apache

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* trainer

* Update save.py

* Update pyproject.toml

* install

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* PeftModel token + saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* linking

* llama.cpp bugs

* Update save.py

* Update save.py

* saving

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update __init__.py

* Update save.py

* Update save.py

* Update save.py

* save

* trainer

* spaces

* original
Daniel Han 2024-02-21 03:58:59 +11:00 committed by GitHub
parent 0439b8508d
commit 1b7bf718cc
9 changed files with 610 additions and 147 deletions


@@ -30,7 +30,7 @@ All notebooks are **beginner friendly**! Add your dataset, click "Run All", and
| **Mistral 7b** 1xT4 | [▶️ Start on Kaggle](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook) | 5x faster\* | 62% less |
- This [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing) is useful for ShareGPT ChatML / Vicuna templates.
- Our [raw text notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) is useful for text completion.
- This [text completion notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
- Colab sometimes provides a free GPU. Kaggle gives 30 free GPU hours per week, with each session capped at 12 hours.
- \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. Use Colab as Kaggle takes 10 mins to install.
@@ -86,9 +86,12 @@ All notebooks are **beginner friendly**! Add your dataset, click "Run All", and
### Conda Installation
Select either `pytorch-cuda=11.8` for CUDA 11.8 or `pytorch-cuda=12.1` for CUDA 12.1. If you have `mamba`, use `mamba` instead of `conda` for faster solving. See this [GitHub issue](https://github.com/unslothai/unsloth/issues/73) for help on debugging Conda installs.
```bash
conda install pytorch torchvision torchaudio pytorch-cuda=<12.1/11.8> -c pytorch -c nvidia
conda create --name unsloth_env python=3.10
conda activate unsloth_env
conda install xformers -c xformers -y
conda install pytorch cudatoolkit torchvision torchaudio pytorch-cuda=<12.1/11.8> -c pytorch -c nvidia
conda install xformers -c xformers
pip install bitsandbytes
@@ -141,6 +144,7 @@ pip install --upgrade pip
```
## 📜 Documentation
- Go to our [Wiki page](https://github.com/unslothai/unsloth/wiki) for saving to GGUF, checkpointing, evaluation and more!
- We support Hugging Face's TRL, Trainer, Seq2SeqTrainer, and even plain PyTorch code!
- We're in 🤗Hugging Face's official docs! Check out the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)!
@@ -162,7 +166,8 @@ fourbit_models = [
"unsloth/llama-2-13b-bnb-4bit",
"unsloth/codellama-34b-bnb-4bit",
"unsloth/tinyllama-bnb-4bit",
]
] # Go to https://huggingface.co/unsloth for more 4-bit models!
# Load Llama model
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "unsloth/mistral-7b-bnb-4bit", # Supports Llama, Mistral - replace this!
@@ -183,6 +188,8 @@ model = FastLanguageModel.get_peft_model(
use_gradient_checkpointing = True,
random_state = 3407,
max_seq_length = max_seq_length,
use_rslora = False, # We support rank stabilized LoRA
loftq_config = None, # And LoftQ
)
trainer = SFTTrainer(
@@ -205,6 +212,12 @@ trainer = SFTTrainer(
),
)
trainer.train()
# Go to https://github.com/unslothai/unsloth/wiki for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates
```
<a name="DPO"></a>


@@ -42,6 +42,7 @@ huggingface = [
"tqdm",
"psutil",
"wheel>=0.42.0",
"numpy",
]
cu118only = [
"xformers @ https://download.pytorch.org/whl/cu118/xformers-0.0.22.post7%2Bcu118-cp39-cp39-manylinux2014_x86_64.whl ; python_version=='3.9'",
@@ -83,22 +84,22 @@ cu121 = [
"bitsandbytes",
"unsloth[cu121only]",
]
cu118_torch211 = [
cu118-torch211 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu118onlytorch211]",
]
cu121_torch211 = [
cu121-torch211 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu121onlytorch211]",
]
cu118_torch220 = [
cu118-torch220 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu118onlytorch220]",
]
cu121_torch220 = [
cu121-torch220 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu121onlytorch220]",
@@ -112,18 +113,18 @@ conda = [
colab = [
"unsloth[cu121]",
]
colab_ampere = [
colab-ampere = [
"unsloth[cu121]",
"packaging",
"ninja",
"flash-attn",
]
colab_torch211 = [
colab-torch211 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu121onlytorch211]",
]
colab_ampere_torch211 = [
colab-ampere-torch211 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu121onlytorch211]",
@@ -131,12 +132,12 @@ colab_ampere_torch211 = [
"ninja",
"flash-attn",
]
colab_torch220 = [
colab-torch220 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu121onlytorch220]",
]
colab_ampere_torch220 = [
colab-ampere-torch220 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu121onlytorch220]",
@@ -144,7 +145,7 @@ colab_ampere_torch220 = [
"ninja",
"flash-attn",
]
cu118_ampere = [
cu118-ampere = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu118only]",
@@ -152,7 +153,7 @@ cu118_ampere = [
"ninja",
"flash-attn",
]
cu121_ampere = [
cu121-ampere = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu121only]",
@@ -160,7 +161,7 @@ cu121_ampere = [
"ninja",
"flash-attn",
]
cu118_ampere_torch211 = [
cu118-ampere-torch211 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu118onlytorch211]",
@@ -168,7 +169,7 @@ cu118_ampere_torch211 = [
"ninja",
"flash-attn",
]
cu121_ampere_torch211 = [
cu121-ampere-torch211 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu121onlytorch211]",
@@ -176,7 +177,7 @@ cu121_ampere_torch211 = [
"ninja",
"flash-attn",
]
cu118_ampere_torch220 = [
cu118-ampere-torch220 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu118onlytorch220]",
@@ -184,7 +185,7 @@ cu118_ampere_torch220 = [
"ninja",
"flash-attn",
]
cu121_ampere_torch220 = [
cu121-ampere-torch220 = [
"unsloth[huggingface]",
"bitsandbytes",
"unsloth[cu121onlytorch220]",


@@ -59,14 +59,38 @@ if (major_torch != 2):# or (major_torch == 2 and minor_torch < 1):
import bitsandbytes as bnb
import triton
from triton.common.build import libcuda_dirs
import os
import re
import numpy as np
import subprocess
try:
cdequantize_blockwise_fp32 = bnb.functional.lib.cdequantize_blockwise_fp32
libcuda_dirs()
except:
warnings.warn(
"Running `ldconfig /usr/lib64-nvidia` to link CUDA."\
"Unsloth: Running `ldconfig /usr/lib64-nvidia` to link CUDA."\
)
os.system("ldconfig /usr/lib64-nvidia")
if os.path.exists("/usr/lib64-nvidia"):
os.system("ldconfig /usr/lib64-nvidia")
elif os.path.exists("/usr/local"):
# Sometimes bitsandbytes cannot be linked properly (on Runpod, for example)
possible_cudas = subprocess.check_output(["ls", "-al", "/usr/local"]).decode("utf-8").split("\n")
find_cuda = re.compile(r"[\s](cuda\-[\d\.]{2,})$")
possible_cudas = [find_cuda.search(x) for x in possible_cudas]
possible_cudas = [x.group(1) for x in possible_cudas if x is not None]
# Try linking cuda folder, or everything in local
if len(possible_cudas) == 0:
os.system(f"ldconfig /usr/local/")
else:
find_number = re.compile(r"([\d\.]{2,})")
latest_cuda = np.argsort([float(find_number.search(x).group(1)) for x in possible_cudas])[::-1][0]
latest_cuda = possible_cudas[latest_cuda]
os.system(f"ldconfig /usr/local/{latest_cuda}")
pass
importlib.reload(bnb)
importlib.reload(triton)
try:
@@ -75,9 +99,10 @@ except:
cdequantize_blockwise_fp32 = bnb.functional.lib.cdequantize_blockwise_fp32
libcuda_dirs()
except:
raise ImportError("CUDA is not linked properly.\n"\
raise ImportError("Unsloth: CUDA is not linked properly.\n"\
"We tried running `ldconfig /usr/lib64-nvidia` ourselves, but it didn't work.\n"\
"You need to run in your terminal `ldconfig /usr/lib64-nvidia` yourself, then import Unsloth.")
"You need to run in your terminal `sudo ldconfig /usr/lib64-nvidia` yourself, then import Unsloth.\n"\
"Also try `sudo ldconfig /usr/local/cuda-xx.x` - find the latest cuda version.")
pass
from .models import *
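The fallback's version-picking logic (scan `/usr/local`, match `cuda-<version>` entries, link the newest) can be reduced to a standalone sketch. The directory listing below is hypothetical stand-in data, not output from a real machine:

```python
import re

# Pick the highest-numbered cuda-<version> entry, as the linking fallback does.
# `entries` is a made-up stand-in for an `ls /usr/local` listing.
entries = ["bin", "cuda-11.8", "cuda-12.1", "etc", "lib64"]

find_cuda = re.compile(r"^cuda-([\d.]+)$")
versions = []
for name in entries:
    m = find_cuda.match(name)
    if m is not None:
        # compare numerically so that e.g. 12.1 outranks 11.8
        versions.append((float(m.group(1)), name))

latest = max(versions)[1] if versions else None
print(latest)  # cuda-12.1
```

In the real patch the chosen folder is then passed to `ldconfig /usr/local/<latest>`; the numeric comparison is what makes the "latest" choice robust against lexicographic ordering.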


@@ -17,6 +17,7 @@ from typing import Union, Optional, List, Any, Callable
import warnings
warnings.filterwarnings(action = "ignore", category = UserWarning, module = "torch")
warnings.filterwarnings(action = "ignore", category = UserWarning, module = "huggingface_hub")
warnings.filterwarnings(action = "ignore", category = RuntimeWarning, module = "subprocess")
import bitsandbytes as bnb
from transformers.models.llama.modeling_llama import logger
from transformers import AutoTokenizer


@@ -55,6 +55,7 @@ from peft import PeftModelForCausalLM
from bitsandbytes.nn import Linear4bit as Bnb_Linear4bit
from peft.tuners.lora import Linear4bit as Peft_Linear4bit
from ..save import patch_saving_functions
import re, os, inspect, math, sys
def original_apply_qkv(self, X):
@@ -782,30 +783,33 @@ pass
# https://github.com/huggingface/transformers/pull/27931
# https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/models/llama/modeling_llama.py
class LlamaRotaryEmbedding(torch.nn.Module):
# Fixes https://github.com/huggingface/transformers/pull/28837
# https://github.com/microsoft/DeepSpeed/issues/4932
# The precision of RoPE buffers is not correct, so we cast to int64.
def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
super().__init__()
self.dim = dim
self.max_position_embeddings = max_position_embeddings
self.base = base
inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
self.register_buffer("inv_freq", inv_freq, persistent=False)
# Build here to make `torch.jit.trace` work.
self._set_cos_sin_cache(
seq_len=max_position_embeddings, device=self.inv_freq.device, dtype=torch.get_default_dtype()
)
self._set_cos_sin_cache(seq_len=max_position_embeddings, device=device, dtype=torch.get_default_dtype())
pass
def _set_cos_sin_cache(self, seq_len, device, dtype):
# Note: on the original Llama codebase, these tensors are created on the target device (and not on CPU) and
# in FP32. They are applied (multiplied) in FP32 as well.
self.max_seq_len_cached = seq_len
t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype)
inv_freq = 1.0 / (
self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64, device="cpu").float() / self.dim)
)
t = torch.arange(self.max_seq_len_cached, device="cpu", dtype=torch.int64).float()
freqs = torch.outer(t, self.inv_freq)
freqs = torch.outer(t, inv_freq)
# Different from paper, but it uses a different permutation in order to obtain the same calculation
emb = torch.cat((freqs, freqs), dim=-1)
self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
self.register_buffer("cos_cached", emb.cos().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
self.register_buffer("sin_cached", emb.sin().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
pass
def forward(self, x, seq_len=None):
@@ -823,7 +827,9 @@ pass
class LlamaLinearScalingRotaryEmbedding(LlamaRotaryEmbedding):
"""LlamaRotaryEmbedding extended with linear scaling. Credits to the Reddit user /u/kaiokendev"""
# Fixes https://github.com/huggingface/transformers/pull/28837
# https://github.com/microsoft/DeepSpeed/issues/4932
# The precision of RoPE buffers is not correct, so we cast to int64.
def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0):
self.scaling_factor = scaling_factor
super().__init__(dim, max_position_embeddings, base, device)
@@ -831,14 +837,17 @@ class LlamaLinearScalingRotaryEmbedding(LlamaRotaryEmbedding):
def _set_cos_sin_cache(self, seq_len, device, dtype):
self.max_seq_len_cached = seq_len
t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype)
inv_freq = 1.0 / (
self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64, device="cpu").float() / self.dim)
)
t = torch.arange(self.max_seq_len_cached, device="cpu", dtype=torch.int64).float()
t = t / self.scaling_factor
freqs = torch.outer(t, self.inv_freq)
freqs = torch.outer(t, inv_freq)
# Different from paper, but it uses a different permutation in order to obtain the same calculation
emb = torch.cat((freqs, freqs), dim=-1)
self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
self.register_buffer("cos_cached", emb.cos().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
self.register_buffer("sin_cached", emb.sin().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
pass
pass
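The precision problem the RoPE comments describe can be shown in isolation (this is a standalone sketch, not code from the patch): low-precision float formats cannot represent every integer position, so generating position ids directly in such a dtype makes distinct positions collide, whereas building them in int64 first keeps them exact until the final cast.

```python
import torch

# bfloat16 has only 8 significand bits, so consecutive integer positions
# above 256 round to the same value; int64 keeps them exact.
t_bad  = torch.arange(0, 260, dtype=torch.bfloat16)
t_good = torch.arange(0, 260, dtype=torch.int64).float()

print(t_bad[256].item(), t_bad[257].item())    # positions collide in bf16
print(t_good[256].item(), t_good[257].item())  # distinct after int64 -> float32
```

This is why the patched `_set_cos_sin_cache` computes both `inv_freq` and `t` from int64 `arange` calls before converting to float.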
@@ -954,6 +963,125 @@ class FastLlamaModel:
layer.self_attn.apply_o = original_apply_o
pass
# Patch Trainer
from transformers.trainer import Trainer
try:
if Trainer._inner_training_loop.__name__ != "_fast_inner_training_loop":
inner_training_loop = inspect.getsource(Trainer._inner_training_loop)
Trainer._original_training_loop = inner_training_loop
else:
inner_training_loop = Trainer._original_training_loop
except:
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
)
pass
import transformers.trainer
items_in_trainer = dir(transformers.trainer)
good_items = []
for item in items_in_trainer:
# TODO: Support Deepspeed
if item.startswith(("deepspeed", "xm", "met", "smp")): continue
if item in inner_training_loop: good_items.append(item)
pass
exec("from transformers.trainer import (" + ", ".join(x for x in good_items) + ")", globals())
start = re.search('logger\.info\([\"\'].+?Running training', inner_training_loop).span(0)[0]
end = inner_training_loop.find("\n\n", start)
original_debug = inner_training_loop[start:end]
spaces = re.search('\n([\s\t]{1,})', original_debug).group(0)[1:]
front_spaces = re.match('([\s\t]{1,})', inner_training_loop).group(0)
debug_info = """debug_info = \\
f"==((====))== Unsloth - 2x faster free finetuning | Num GPUs = {args.world_size}\\n"\\
f" \\\\\\ /| Num examples = {num_examples:,} | Num Epochs = {num_train_epochs:,}\\n"\\
f"O^O/ \\_/ \\ Batch size per device = {self._train_batch_size:,} | Gradient Accumulation steps = {args.gradient_accumulation_steps}\\n"\\
f"\\ / Total batch size = {total_train_batch_size:,} | Total steps = {max_steps:,}\\n"\\
f' "-____-" Number of trainable parameters = {get_model_param_count(model, trainable_only=True):,}'
logger.warning_once(debug_info)"""
debug_info = debug_info.split('\n')
debug_info = "\n".join([debug_info[0]] + [spaces + x[8:] for x in debug_info[1:]])
inner_training_loop = inner_training_loop.replace(original_debug, debug_info)
debug_info = """n_total_devices = total_train_batch_size // \\
args.gradient_accumulation_steps // self._train_batch_size
if n_total_devices > 2:
logger.warning_once(
"Our OSS was designed for people with few GPU resources to level the playing field.\\n"
"The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
)
debug_info ="""
debug_info = debug_info.split('\n')
debug_info = "\n".join([debug_info[0]] + [spaces + x[8:] for x in debug_info[1:]])
inner_training_loop = inner_training_loop.replace("debug_info =", debug_info, 1)
front_spaces = re.match(r"[\t\s]{1,}", inner_training_loop).group(0)
inner_training_loop = re.sub(r"^" + front_spaces, "", inner_training_loop, flags = re.MULTILINE)
inner_training_loop = inner_training_loop.replace(
"train_dataloader = tpu_spmd_dataloader(train_dataloader)",
"raise RuntimeError('Unsloth: TPUs are not yet supported!')"
)
inner_training_loop = inner_training_loop.replace(
"self.accelerator.free_memory()",
"self.accelerator.free_memory()\n" + \
front_spaces + "if self.is_deepspeed_enabled:"\
"raise RuntimeError('Unsloth: Deepspeed is not yet supported!')\n", 1,
)
check_batches = """train_dataloader = self.get_train_dataloader()
ga = args.gradient_accumulation_steps
bsz = self._train_batch_size
total_batches = bsz * ga * args.world_size
n_total_devices = total_batches // ga // bsz
if n_total_devices > 2:
logger.warning_once(
"Please consider a commercial license - Unsloth was designed for the GPU Poor.\\n"
"The OSS currently works on 4 GPUs - we're a 2 person team, so please help fund\\n"
"our development costs by supporting us through Ko-fi or buying a license! Thanks!",
)
divisor = n_total_devices / 2
bsz = self._train_batch_size = max(int(bsz / divisor), 1)
if total_batches // ga // bsz > 2:
divisor = n_total_devices / 2
ga = args.gradient_accumulation_steps = max(int(ga / divisor), 1)"""
check_batches = check_batches.split('\n')
check_batches = "\n".join([check_batches[0]] + [front_spaces + x[8:] for x in check_batches[1:]])
inner_training_loop = inner_training_loop.replace(
"train_dataloader = self.get_train_dataloader()",
check_batches, 1,
)
inner_training_loop = inner_training_loop.replace(
"_inner_training_loop",
"_fast_inner_training_loop", 1,
)
exec(inner_training_loop, globals())
Trainer._inner_training_loop = _fast_inner_training_loop
inner_training_loop = inner_training_loop.replace(
"is_torch_tpu_available()",
"False",
)
if "n_total_devices >" not in inner_training_loop:
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
)
pass
inner_training_loop = inner_training_loop.replace(
"is_sagemaker_mp_enabled()",
"False",
)
Trainer._inner_training_loop = _fast_inner_training_loop
# Save max_seq_length
model.max_seq_length = max_position_embeddings
internal_model = model
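The patching approach above (fetch the method's source with `inspect.getsource`, edit it as text, `exec` the result, then rebind) can be sketched on its own. The `training_loop` function below is an illustrative stand-in for `Trainer._inner_training_loop`, not code from the diff:

```python
# Standalone sketch of source-level monkey-patching: rewrite a function's
# source as a string, exec it into a namespace, and rebind the name.
src = '''
def training_loop():
    backend = "slow"
    return f"running {backend} loop"
'''

patched_src = src.replace('"slow"', '"fast"', 1)  # textual edit, like the .replace calls above
namespace = {}
exec(patched_src, namespace)                      # compile the edited source
training_loop = namespace["training_loop"]        # rebind, as Trainer._inner_training_loop is

print(training_loop())  # running fast loop
```

The real patch additionally imports every name the original loop referenced into the exec namespace (the `good_items` loop), since source compiled this way no longer sees `transformers.trainer`'s module globals.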
@@ -1073,7 +1201,7 @@ class FastLlamaModel:
signature = str(inspect.signature(LoraConfig))
SUPPORTS_LOFTQ = "loftq_config" in signature
SUPPORTS_RSLORA = "use_rslora" in signature
assert(max_seq_length <= model.max_seq_length)
if lora_dropout != 0:
@@ -1200,6 +1328,28 @@ class FastLlamaModel:
model.peft_config[active_adapter].revision = f"unsloth"
pass
from transformers.trainer import Trainer
if Trainer._inner_training_loop.__name__ != "_fast_inner_training_loop":
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
)
pass
# Fix loftq issues
# loftq_config must not = None, but rather {}
all_configs = model.peft_config
for key, current_config in all_configs.items():
if hasattr(current_config, "loftq_config") and current_config.loftq_config is None:
new_args = current_config.__dict__
new_args["loftq_config"] = {}
current_config = current_config.__class__(**new_args)
all_configs[key] = current_config
pass
pass
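The loftq fix above rebuilds each PEFT config so that a `None` `loftq_config` becomes `{}`. A minimal standalone sketch, using a hypothetical config class in place of the real `LoraConfig`:

```python
# Standalone sketch of the loftq fix: rebuild a config object so that a
# None loftq_config becomes {}. HypotheticalConfig is illustrative only.
class HypotheticalConfig:
    def __init__(self, r=16, loftq_config=None):
        self.r = r
        self.loftq_config = loftq_config

all_configs = {"default": HypotheticalConfig(r=8)}
for key, current_config in all_configs.items():
    if getattr(current_config, "loftq_config", "missing") is None:
        new_args = dict(current_config.__dict__)   # copy existing fields
        new_args["loftq_config"] = {}              # must be {} rather than None
        all_configs[key] = current_config.__class__(**new_args)

print(all_configs["default"].loftq_config)  # {}
```

Reconstructing via `__class__(**__dict__)` rather than mutating in place preserves any validation the config class performs in its constructor.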
# Do patching
n_mlp = 0
n_qkv = 0


@@ -118,9 +118,13 @@ class FastLanguageModel(FastLlamaModel):
*args, **kwargs,
)
# in case the model supports tagging, add the unsloth tag.
# In case the model supports tagging, add the unsloth tag.
if hasattr(model, "add_model_tags"):
model.add_model_tags(["unsloth"])
model.add_model_tags(["unsloth",])
pass
if hasattr(tokenizer, "add_model_tags"):
tokenizer.add_model_tags(["unsloth",])
pass
if load_in_4bit:
# Fix up bitsandbytes config
@@ -143,7 +147,7 @@ class FastLanguageModel(FastLlamaModel):
if is_peft:
# Now add PEFT adapters
model = PeftModel.from_pretrained(model, old_model_name)
model = PeftModel.from_pretrained(model, old_model_name, token = token)
# Patch it as well!
model = dispatch_model.patch_peft_model(model, use_gradient_checkpointing)
pass


@@ -42,6 +42,10 @@ __INT_TO_FLOAT_MAPPER = \
"unsloth/tinyllama",
"TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
),
"unsloth/tinyllama-chat-bnb-4bit" : (
"unsloth/tinyllama-chat",
"TinyLlama/TinyLlama-1.1B-Chat-v1.0",
),
"unsloth/mistral-7b-instruct-v0.1-bnb-4bit" : (
"mistralai/Mistral-7B-Instruct-v0.1",
),


@@ -368,6 +368,140 @@ class FastMistralModel(FastLlamaModel):
layer.self_attn.apply_o = original_apply_o
pass
# Patch Trainer
from transformers.trainer import Trainer
if Trainer._inner_training_loop.__name__ != "_fast_inner_training_loop":
try:
inner_training_loop = inspect.getsource(Trainer._inner_training_loop)
except:
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
)
pass
pass
# Patch Trainer
from transformers.trainer import Trainer
try:
if Trainer._inner_training_loop.__name__ != "_fast_inner_training_loop":
inner_training_loop = inspect.getsource(Trainer._inner_training_loop)
Trainer._original_training_loop = inner_training_loop
else:
inner_training_loop = Trainer._original_training_loop
except:
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
)
pass
import transformers.trainer
items_in_trainer = dir(transformers.trainer)
good_items = []
for item in items_in_trainer:
# TODO: Support Deepspeed
if item.startswith(("deepspeed", "xm", "met", "smp")): continue
if item in inner_training_loop: good_items.append(item)
pass
exec("from transformers.trainer import (" + ", ".join(x for x in good_items) + ")", globals())
start = re.search('logger\.info\([\"\'].+?Running training', inner_training_loop).span(0)[0]
end = inner_training_loop.find("\n\n", start)
original_debug = inner_training_loop[start:end]
spaces = re.search('\n([\s\t]{1,})', original_debug).group(0)[1:]
front_spaces = re.match('([\s\t]{1,})', inner_training_loop).group(0)
debug_info = """debug_info = \\
f"==((====))== Unsloth - 2x faster free finetuning | Num GPUs = {args.world_size}\\n"\\
f" \\\\\\ /| Num examples = {num_examples:,} | Num Epochs = {num_train_epochs:,}\\n"\\
f"O^O/ \\_/ \\ Batch size per device = {self._train_batch_size:,} | Gradient Accumulation steps = {args.gradient_accumulation_steps}\\n"\\
f"\\ / Total batch size = {total_train_batch_size:,} | Total steps = {max_steps:,}\\n"\\
f' "-____-" Number of trainable parameters = {get_model_param_count(model, trainable_only=True):,}'
logger.warning_once(debug_info)"""
debug_info = debug_info.split('\n')
debug_info = "\n".join([debug_info[0]] + [spaces + x[8:] for x in debug_info[1:]])
inner_training_loop = inner_training_loop.replace(original_debug, debug_info)
debug_info = """n_total_devices = total_train_batch_size // \\
args.gradient_accumulation_steps // self._train_batch_size
if n_total_devices > 2:
logger.warning_once(
"Our OSS was designed for people with few GPU resources to level the playing field.\\n"
"The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
)
debug_info ="""
debug_info = debug_info.split('\n')
debug_info = "\n".join([debug_info[0]] + [spaces + x[8:] for x in debug_info[1:]])
inner_training_loop = inner_training_loop.replace("debug_info =", debug_info, 1)
front_spaces = re.match(r"[\t\s]{1,}", inner_training_loop).group(0)
inner_training_loop = re.sub(r"^" + front_spaces, "", inner_training_loop, flags = re.MULTILINE)
inner_training_loop = inner_training_loop.replace(
"train_dataloader = tpu_spmd_dataloader(train_dataloader)",
"raise RuntimeError('Unsloth: TPUs are not yet supported!')"
)
inner_training_loop = inner_training_loop.replace(
"self.accelerator.free_memory()",
"self.accelerator.free_memory()\n" + \
front_spaces + "if self.is_deepspeed_enabled:"\
"raise RuntimeError('Unsloth: Deepspeed is not yet supported!')\n", 1,
)
check_batches = """train_dataloader = self.get_train_dataloader()
ga = args.gradient_accumulation_steps
bsz = self._train_batch_size
total_batches = bsz * ga * args.world_size
n_total_devices = total_batches // ga // bsz
if n_total_devices > 2:
logger.warning_once(
"Please consider a commercial license - Unsloth was designed for the GPU Poor.\\n"
"The OSS currently works on 4 GPUs - we're a 2 person team, so please help fund\\n"
"our development costs by supporting us through Ko-fi or buying a license! Thanks!",
)
divisor = n_total_devices / 2
bsz = self._train_batch_size = max(int(bsz / divisor), 1)
if total_batches // ga // bsz > 2:
divisor = n_total_devices / 2
ga = args.gradient_accumulation_steps = max(int(ga / divisor), 1)"""
check_batches = check_batches.split('\n')
check_batches = "\n".join([check_batches[0]] + [front_spaces + x[8:] for x in check_batches[1:]])
inner_training_loop = inner_training_loop.replace(
"train_dataloader = self.get_train_dataloader()",
check_batches, 1,
)
inner_training_loop = inner_training_loop.replace(
"_inner_training_loop",
"_fast_inner_training_loop", 1,
)
exec(inner_training_loop, globals())
Trainer._inner_training_loop = _fast_inner_training_loop
inner_training_loop = inner_training_loop.replace(
"is_torch_tpu_available()",
"False",
)
if "n_total_devices >" not in inner_training_loop:
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
)
pass
inner_training_loop = inner_training_loop.replace(
"is_sagemaker_mp_enabled()",
"False",
)
Trainer._inner_training_loop = _fast_inner_training_loop
# Save max_seq_length
max_position_embeddings = max(max_seq_length, model.config.max_position_embeddings)
model.max_seq_length = max_position_embeddings


@@ -140,17 +140,28 @@ def unsloth_save_model(
# Push to hub
use_temp_dir : Optional[bool] = None,
commit_message : Optional[str] = None,
commit_message : Optional[str] = "Trained with Unsloth",
private : Optional[bool] = None,
create_pr : bool = False,
revision : str = None,
commit_description : str = None,
commit_description : str = "Upload model trained with Unsloth 2x faster",
tags : List[str] = None,
# Our functions
temporary_location : str = "_unsloth_temporary_saved_buffers",
maximum_memory_usage : float = 0.9,
):
if commit_message is None: commit_message = ""
if "Unsloth" not in commit_message:
commit_message += " (Trained with Unsloth)"
commit_message = commit_message.lstrip()
if commit_description is None:
commit_description = "Upload model trained with Unsloth 2x faster"
elif "Unsloth 2x faster" not in commit_description:
commit_description += " (Trained with Unsloth 2x faster)"
pass
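The block above normalizes Hub commit metadata so every push is branded exactly once. The same rules can be sketched as a standalone function (the function name is ours; the logic mirrors the diff):

```python
# Sketch of the commit-message normalization added above. Hypothetical
# helper name; the rules mirror the diff's in-function logic.
def normalize_commit(commit_message, commit_description):
    if commit_message is None: commit_message = ""
    if "Unsloth" not in commit_message:
        commit_message += " (Trained with Unsloth)"
    commit_message = commit_message.lstrip()  # drop the leading space for empty input

    if commit_description is None:
        commit_description = "Upload model trained with Unsloth 2x faster"
    elif "Unsloth 2x faster" not in commit_description:
        commit_description += " (Trained with Unsloth 2x faster)"
    return commit_message, commit_description

print(normalize_commit(None, None))
print(normalize_commit("Add weights", None))
```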
if save_method == "merged_4bit":
raise RuntimeError(
"Unsloth: Merging into 4bit will cause your model to lose accuracy if you plan\n"\
@@ -202,7 +213,7 @@ def unsloth_save_model(
pass
save_pretrained_settings["tags"] = tags
if (save_method == "lora") and push_to_hub:
if ((save_method == "lora") or (save_method == "merged_4bit")) and push_to_hub:
if token is None:
raise RuntimeError(
"Unsloth: Pushing to HF requires a token. Pass `token = 'hf_....'`\n"\
@@ -210,7 +221,20 @@ def unsloth_save_model(
)
pass
model.push_to_hub(
if save_method == "lora":
print("Unsloth: Saving LoRA adapters. Please wait...")
elif save_method == "merged_4bit":
print("Unsloth: Saving 4bit Bitsandbytes model. Please wait...")
pass
# Update model tag
_ = upload_to_huggingface(
model, save_directory, token,
"finetuned", "trl", file_location = None,
old_username = None, private = private,
)
model.original_push_to_hub(
repo_id = save_directory,
use_temp_dir = use_temp_dir,
commit_message = commit_message,
@@ -224,7 +248,7 @@ def unsloth_save_model(
tags = tags,
)
if tokenizer is not None:
tokenizer.push_to_hub(
tokenizer.original_push_to_hub(
repo_id = save_directory,
use_temp_dir = use_temp_dir,
commit_message = commit_message,
@@ -238,33 +262,13 @@ def unsloth_save_model(
tags = tags,
)
pass
if hasattr(model, "config"):
print(f"Saved {save_method} model to https://huggingface.co/" + save_directory)
pass
return save_directory
pass
# Update model tag
username = ""
if push_to_hub:
username = upload_to_huggingface(
model, save_directory, token,
"finetuned", "trl", file_location = None,
)
pass
# If push_to_hub, we must remove the .../ part of a repo
if push_to_hub and "/" in save_directory:
# +1 solves absolute path issues
new_save_directory = save_directory[save_directory.find("/")+1:]
logger.warning_once(
f"Unsloth: You are pushing to hub, but you passed your HF username.\n"\
f"We shall truncate {save_directory} to {new_save_directory}"
)
save_pretrained_settings["save_directory"] = new_save_directory
save_directory = new_save_directory
pass
# Tokenizer has different saving arguments
tokenizer_save_settings = \
{
@@ -292,13 +296,25 @@ def unsloth_save_model(
# Do general saving
# Edit save_pretrained_settings
# [TODO] _create_repo has errors due to **kwargs getting accepted
for deletion in \
("use_temp_dir", "commit_message", "create_pr", "revision", "commit_description", "tags",):
# commit_description does not seem to work?
what_to_delete = ("use_temp_dir", "commit_message", "create_pr", "revision", "commit_description", "tags",) \
if save_pretrained_settings["push_to_hub"] is False else \
("use_temp_dir", "create_pr", "revision", "tags", "commit_description",)
for deletion in what_to_delete:
del save_pretrained_settings[deletion]
pass
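`save_pretrained` and `push_to_hub` accept different keyword sets, so the loop above prunes Hub-only keys before a local save but keeps `commit_message` for pushes. A sketch of that selection against a toy settings dict (helper name is ours; the original uses `del` on keys it knows exist, `pop` here is defensive):

```python
# Sketch of the kwarg pruning above. Hypothetical helper; key tuples
# copied from the diff (commit_description is dropped in both cases
# because it does not seem to be honored upstream).
def prune_settings(settings, push_to_hub):
    local_only = ("use_temp_dir", "commit_message", "create_pr",
                  "revision", "commit_description", "tags")
    hub_push   = ("use_temp_dir", "create_pr", "revision",
                  "tags", "commit_description")
    for key in (hub_push if push_to_hub else local_only):
        settings.pop(key, None)
    return settings
```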
if hasattr(model, "add_model_tags"):
model.add_model_tags(["unsloth",])
# Update model tag
if push_to_hub:
_ = upload_to_huggingface(
model, save_pretrained_settings["save_directory"], token,
"finetuned", "trl", file_location = None,
old_username = None, private = private,
)
pass
if tokenizer is not None:
print("Unsloth: Saving tokenizer...", end = "")
tokenizer.save_pretrained(**tokenizer_save_settings)
@@ -310,10 +326,33 @@ def unsloth_save_model(
if save_method != "lora": print(" This might take 10 minutes for Llama-7b...", end = "")
model.save_pretrained(**save_pretrained_settings)
if push_to_hub and hasattr(model, "config"):
print("Saved to https://huggingface.co/" + save_pretrained_settings["save_directory"])
pass
print(" Done.")
return save_directory
pass
# If push_to_hub, we must remove the .../ part of a repo
username = None
if push_to_hub and "/" in save_directory:
# +1 solves absolute path issues
username = save_directory[:save_directory.find("/")]
new_save_directory = save_directory[save_directory.find("/")+1:]
logger.warning_once(
f"Unsloth: You are pushing to hub, but you passed your HF username = {username}.\n"\
f"We shall truncate {save_directory} to {new_save_directory}"
)
save_pretrained_settings["save_directory"] = new_save_directory
tokenizer_save_settings ["save_directory"] = new_save_directory
save_directory = new_save_directory
pass
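`save_pretrained` expects a local directory name, so the block above peels the HF username off a `username/repo` id and keeps both halves for the later Hub URL. The split can be sketched independently (function name is ours):

```python
# Sketch of the repo-id truncation above. Hypothetical helper;
# "user/model" -> ("user", "model"), plain names pass through.
def split_repo_id(save_directory):
    if "/" not in save_directory:
        return None, save_directory
    cut = save_directory.find("/")  # first "/" also handles nested paths
    return save_directory[:cut], save_directory[cut + 1:]

print(split_repo_id("unslothai/llama-3"))
print(split_repo_id("llama-3"))
```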
print("Unsloth: Merging 4bit and LoRA weights to 16bit...")
# Determine max RAM usage minus sharding
@@ -339,7 +378,7 @@ def unsloth_save_model(
logger.warning_once(
f"Unsloth: You have {n_cpus} CPUs. Using `safe_serialization` is 10x slower.\n"\
f"We shall switch to Pytorch saving, which will take 3 minutes and not 30 minutes.\n"\
f"To force `safe_serialization`, set it to None instead.",
f"To force `safe_serialization`, set it to `None` instead.",
)
safe_serialization = False
save_function = fast_save_pickle
@@ -413,13 +452,26 @@ def unsloth_save_model(
# Edit save_pretrained_settings
# [TODO] _create_repo has errors due to **kwargs getting accepted
save_pretrained_settings["state_dict"] = state_dict
for deletion in \
("use_temp_dir", "commit_message", "create_pr", "revision", "commit_description", "tags",):
# commit_description does not seem to work?
what_to_delete = ("use_temp_dir", "commit_message", "create_pr", "revision", "commit_description", "tags",) \
if not push_to_hub else \
("use_temp_dir", "create_pr", "revision", "tags", "commit_description",)
for deletion in what_to_delete:
del save_pretrained_settings[deletion]
pass
if hasattr(model, "add_model_tags"):
model.add_model_tags(["unsloth",])
# Update model tag
if push_to_hub:
_ = upload_to_huggingface(
model, save_pretrained_settings["save_directory"], token,
"finetuned", "trl", file_location = None,
old_username = username, private = private,
)
pass
if tokenizer is not None:
print("Unsloth: Saving tokenizer...", end = "")
tokenizer.save_pretrained(**tokenizer_save_settings)
@@ -452,9 +504,8 @@ def unsloth_save_model(
model.config = old_config
print("Done.")
# Print location
if push_to_hub:
print(f"Saved to https://huggingface.co/{username}/{save_directory.lstrip('/')}")
if push_to_hub and hasattr(model, "config"):
print(f"Saved merged model to https://huggingface.co/{username}/{save_directory.lstrip('/')}")
pass
save_pretrained_settings["state_dict"] = None
@@ -478,7 +529,7 @@ def unsloth_save_model(
for _ in range(3):
torch.cuda.empty_cache()
gc.collect()
return save_directory
return save_directory, username
pass
@@ -494,7 +545,7 @@ def install_llama_cpp_make_non_blocking():
n_jobs = max(int(psutil.cpu_count()*1.5), 1)
# Force make clean
os.system("make clean -C llama.cpp")
full_command = ["make", "all", "-j", str(n_jobs), "-C", "llama.cpp"]
full_command = ["make", "all", "-j"+str(n_jobs), "-C", "llama.cpp"]
run_installer = subprocess.Popen(full_command, env = env, stdout = subprocess.DEVNULL, stderr = subprocess.STDOUT)
return run_installer
pass
@@ -507,10 +558,44 @@ def install_python_non_blocking(packages = []):
pass
def install_llama_cpp_old(version = -10):
# Download the 10th latest release since the latest might be broken!
# FALLBACK mechanism
releases = subprocess.check_output(["git", "ls-remote", "--tags", "https://github.com/ggerganov/llama.cpp.git"])
releases = releases.decode("utf-8").replace("\t", " ").split("\n")
for i, x in enumerate(releases):
if "refs/tags/b" not in x: break
releases = releases[:i]
latest = releases[-1]
version = releases[version].split(" ")[0]
# Clone a specific commit
commands = [
"git clone https://github.com/ggerganov/llama.cpp",
f"cd llama.cpp && git reset --hard {version} && git clean -df && "\
f"make clean && LLAMA_CUBLAS=1 make all -j{psutil.cpu_count()*2}",
"pip install gguf protobuf",
]
for command in commands:
with subprocess.Popen(command, shell = True, stdout = subprocess.PIPE, bufsize = 1) as sp:
for line in sp.stdout:
print(line.decode("utf-8"), flush = True, end = "")
pass
pass
# Check if successful
if not os.path.exists("llama.cpp/quantize"):
raise RuntimeError(
"Unsloth: llama.cpp GGUF seems to be too buggy to install.\n"\
"File a report to llama.cpp's main repo since this is not an Unsloth issue."
)
pass
pass
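The fallback above pins an older llama.cpp build by parsing `git ls-remote --tags` output and taking the Nth-latest `b…` tag's commit hash. The parsing can be sketched against canned output (the hashes and tag numbers here are made up):

```python
# Sketch of the tag selection in install_llama_cpp_old above.
# Hypothetical helper; the demo ls-remote output is fabricated.
def pick_commit(ls_remote_output, version=-2):
    # Each line looks like "<hash>\trefs/tags/b1234"
    releases = ls_remote_output.replace("\t", " ").split("\n")
    # Keep only the leading run of llama.cpp "b…" build tags
    for i, line in enumerate(releases):
        if "refs/tags/b" not in line:
            break
    releases = releases[:i]
    # Negative index picks an older (hopefully stable) release;
    # field 0 is the commit hash passed to `git reset --hard`
    return releases[version].split(" ")[0]

demo = ("aaa111\trefs/tags/b1000\n"
        "bbb222\trefs/tags/b1001\n"
        "ccc333\trefs/tags/b1002\n"
        "ddd444\trefs/heads/master")
print(pick_commit(demo, -2))
```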
def install_llama_cpp_blocking():
commands = [
"git clone https://github.com/ggerganov/llama.cpp",
f"cd llama.cpp && make clean && LLAMA_CUBLAS=1 make all -j {psutil.cpu_count()*2}",
f"cd llama.cpp && make clean && LLAMA_CUBLAS=1 make all -j{psutil.cpu_count()*2}",
"pip install gguf protobuf",
]
if os.path.exists("llama.cpp"): return
@@ -563,10 +648,13 @@ def save_to_gguf(
print("Unsloth: [0] Installing llama.cpp. This will take 3 minutes...")
if _run_installer is not None:
_run_installer.wait()
error = _run_installer.wait()
else:
error = 0
install_llama_cpp_blocking()
pass
# Check if successful. If not install 10th latest release
if error != 0 or not os.path.exists("llama.cpp/quantize"): install_llama_cpp_old(-10)
if quantization_method == "f32": first_conversion = "f32"
elif quantization_method == "f16": first_conversion = "f16"
@@ -580,15 +668,18 @@ def save_to_gguf(
first_conversion = "f16"
pass
pass
print(f"Unsloth: [1] Converting HF into {first_conversion} GGUF format. This will take 3 minutes...")
n_cpus = psutil.cpu_count()*2
# Concurrency from https://rentry.org/llama-cpp-conversions#merging-loras-into-a-model
final_location = f"./{model_directory}-unsloth.{first_conversion.upper()}.gguf"
print(f"Unsloth: [1] Converting model at {model_directory} into {first_conversion} GGUF format.\n"\
f"The output location will be {final_location}\n"\
"This will take 3 minutes...")
command = f"python llama.cpp/convert.py {model_directory} "\
f"--outfile {final_location} "\
f"--outfile {final_location} --vocab-type hfft "\
f"--outtype {first_conversion} --concurrency {n_cpus}"
with subprocess.Popen(command, shell = True, stdout = subprocess.PIPE, stderr = subprocess.PIPE, bufsize = 1) as sp:
@@ -601,7 +692,8 @@ def save_to_gguf(
# Check if quantization succeeded!
if not os.path.isfile(final_location):
raise RuntimeError(
"Unsloth: Quantization failed! You might have to compile llama.cpp yourself, then run this again.\n"\
f"Unsloth: Quantization failed for {final_location}\n"\
"You might have to compile llama.cpp yourself, then run this again.\n"\
"You do not need to close this Python program. Run the following commands in a new terminal:\n"\
"You must run this in the same folder as you're saving your model.\n"\
"git clone https://github.com/ggerganov/llama.cpp\n"\
@@ -662,7 +754,7 @@ def unsloth_save_pretrained_merged(
save_peft_format : bool = True,
tags : List[str] = None,
temporary_location : str = "_unsloth_temporary_saved_buffers",
maximum_memory_usage : float = 0.85,
):
"""
Same as .save_pretrained(...) except 4bit weights are auto
@@ -695,14 +787,14 @@ def unsloth_push_to_hub_merged(
tokenizer = None,
save_method : str = "merged_16bit", # ["lora", "merged_16bit", "merged_4bit"]
use_temp_dir : Optional[bool] = None,
commit_message : Optional[str] = None,
commit_message : Optional[str] = "Trained with Unsloth",
private : Optional[bool] = None,
token : Union[bool, str, None] = None,
max_shard_size : Union[int, str, None] = "5GB",
create_pr : bool = False,
safe_serialization : bool = True,
revision : str = None,
commit_description : str = None,
commit_description : str = "Upload model trained with Unsloth 2x faster",
tags : Optional[List[str]] = None,
temporary_location : str = "_unsloth_temporary_saved_buffers",
maximum_memory_usage : float = 0.85,
@@ -760,15 +852,27 @@ This {model_type} model was trained 2x faster with [Unsloth](https://github.com/
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
"""
def upload_to_huggingface(model, save_directory, token, method, extra = "", file_location = None):
def upload_to_huggingface(
model,
save_directory,
token,
method,
extra = "",
file_location = None,
old_username = None,
private = None,
):
# Check for username
username = ""
save_directory = save_directory.lstrip("./")
if "/" not in save_directory:
from huggingface_hub import whoami
try:
username = whoami()['name']
save_directory = f"{save_directory}/{username}"
username = whoami(token = token)["name"]
if type(old_username) is str and username != old_username:
username = old_username
pass
save_directory = f"{username}/{save_directory}"
except:
raise RuntimeError(f"Unsloth: {save_directory} is not a Huggingface directory.")
else:
@@ -776,24 +880,28 @@ def upload_to_huggingface(model, save_directory, token, method, extra = "", file
pass
from huggingface_hub import create_repo
create_repo(
repo_id = save_directory,
token = token,
repo_type = "model",
exist_ok = True,
)
try:
create_repo(
repo_id = save_directory,
token = token,
repo_type = "model",
exist_ok = False,
private = private,
)
# Create model card
from huggingface_hub import ModelCard
content = MODEL_CARD.format(
username = username,
base_model = model.config._name_or_path,
model_type = model.config.model_type,
method = "",
extra = extra,
)
card = ModelCard(content)
card.push_to_hub(save_directory, token = token)
# Create model card
from huggingface_hub import ModelCard
content = MODEL_CARD.format(
username = username,
base_model = model.config._name_or_path,
model_type = model.config.model_type,
method = "",
extra = extra,
)
card = ModelCard(content)
card.push_to_hub(save_directory, token = token)
except:
pass
if file_location is not None:
# Now upload file
@@ -811,6 +919,7 @@ def upload_to_huggingface(model, save_directory, token, method, extra = "", file
path_in_repo = uploaded_location,
repo_id = save_directory,
repo_type = "model",
commit_message = "(Trained with Unsloth)",
)
# We also upload a config.json file
@@ -823,6 +932,7 @@ def upload_to_huggingface(model, save_directory, token, method, extra = "", file
path_in_repo = "config.json",
repo_id = save_directory,
repo_type = "model",
commit_message = "(Trained with Unsloth)",
)
os.remove("_temporary_unsloth_config.json")
pass
@@ -838,6 +948,7 @@ def unsloth_save_pretrained_gguf(
first_conversion : str = "f16",
push_to_hub : bool = False,
token : Optional[Union[str, bool]] = None,
private : Optional[bool] = None,
is_main_process : bool = True,
state_dict : Optional[dict] = None,
save_function : Callable = torch.save,
@@ -847,7 +958,7 @@ def unsloth_save_pretrained_gguf(
save_peft_format : bool = True,
tags : List[str] = None,
temporary_location : str = "_unsloth_temporary_saved_buffers",
maximum_memory_usage : float = 0.85,
):
"""
Same as .save_pretrained(...) except 4bit weights are auto
@@ -898,11 +1009,11 @@ def unsloth_save_pretrained_gguf(
python_install = install_python_non_blocking(["gguf", "protobuf"])
git_clone.wait()
makefile = install_llama_cpp_make_non_blocking()
new_save_directory = unsloth_save_model(**arguments)
new_save_directory, old_username = unsloth_save_model(**arguments)
python_install.wait()
else:
try:
new_save_directory = unsloth_save_model(**arguments)
new_save_directory, old_username = unsloth_save_model(**arguments)
makefile = None
except:
# Retry by recloning llama.cpp
@@ -910,7 +1021,7 @@ def unsloth_save_pretrained_gguf(
python_install = install_python_non_blocking(["gguf", "protobuf"])
git_clone.wait()
makefile = install_llama_cpp_make_non_blocking()
new_save_directory = unsloth_save_model(**arguments)
new_save_directory, old_username = unsloth_save_model(**arguments)
python_install.wait()
pass
pass
@@ -924,12 +1035,12 @@ def unsloth_save_pretrained_gguf(
print("Unsloth: Uploading GGUF to Huggingface Hub...")
username = upload_to_huggingface(
self, save_directory, token,
"GGUF converted", "gguf", file_location,
"GGUF converted", "gguf", file_location, old_username, private,
)
link = f"{username}/{new_save_directory.lstrip('/.')}" \
if username not in new_save_directory else \
new_save_directory.lstrip('/.')
print(f"Saved to https://huggingface.co/{link}")
print(f"Saved GGUF to https://huggingface.co/{link}")
pass
pass
@@ -941,14 +1052,14 @@ def unsloth_push_to_hub_gguf(
quantization_method : str = "fast_quantized",
first_conversion : str = "f16",
use_temp_dir : Optional[bool] = None,
commit_message : Optional[str] = None,
commit_message : Optional[str] = "Trained with Unsloth",
private : Optional[bool] = None,
token : Union[bool, str, None] = None,
max_shard_size : Union[int, str, None] = "5GB",
create_pr : bool = False,
safe_serialization : bool = True,
revision : str = None,
commit_description : str = None,
commit_description : str = "Upload model trained with Unsloth 2x faster",
tags : Optional[List[str]] = None,
temporary_location : str = "_unsloth_temporary_saved_buffers",
maximum_memory_usage : float = 0.85,
@@ -998,19 +1109,19 @@ def unsloth_push_to_hub_gguf(
python_install = install_python_non_blocking(["gguf", "protobuf"])
git_clone.wait()
makefile = install_llama_cpp_make_non_blocking()
new_save_directory = unsloth_save_model(**arguments)
new_save_directory, old_username = unsloth_save_model(**arguments)
python_install.wait()
else:
try:
new_save_directory = unsloth_save_model(**arguments)
new_save_directory, old_username = unsloth_save_model(**arguments)
makefile = None
except:
# Retry by recloning llama.cpp
git_clone = install_llama_cpp_clone_non_blocking()
python_install = install_python_non_blocking(["gguf", "protobuf"])
git_clone.wait()
makefile = install_llama_cpp_make_non_blocking()
new_save_directory = unsloth_save_model(**arguments)
makefile = install_llama_cpp_make_non_blocking()
new_save_directory, old_username = unsloth_save_model(**arguments)
python_install.wait()
pass
pass
@@ -1023,12 +1134,12 @@ def unsloth_push_to_hub_gguf(
print("Unsloth: Uploading GGUF to Huggingface Hub...")
username = upload_to_huggingface(
self, repo_id, token,
"GGUF converted", "gguf", file_location,
"GGUF converted", "gguf", file_location, old_username, private,
)
link = f"{username}/{new_save_directory.lstrip('/.')}" \
if username not in new_save_directory else \
new_save_directory.lstrip('/.')
print(f"Saved to https://huggingface.co/{link}")
print(f"Saved GGUF to https://huggingface.co/{link}")
pass
@@ -1038,31 +1149,17 @@ def patch_saving_functions(model):
import types
from typing import Callable, Optional, Union, List
if hasattr(model, "_original_push_to_hub"): return
# First check if this has already been called, and revert it
original_model = model
while True:
if hasattr(original_model, "_original_push_to_hub"):
original_model.push_to_hub = original_model._original_push_to_hub
del original_model._original_push_to_hub
if hasattr(original_model, "push_to_hub_merged"): del original_model.push_to_hub_merged
if hasattr(original_model, "save_pretrained_merged"): del original_model.save_pretrained_merged
if hasattr(original_model, "push_to_hub_gguf"): del original_model.push_to_hub_gguf
if hasattr(original_model, "save_pretrained_gguf"): del original_model.save_pretrained_gguf
pass
if hasattr(original_model, "model"): original_model = original_model.model
else: break
# And now re add our saving methods!
if model.push_to_hub.__name__ == "unsloth_push_to_hub":
original_push_to_hub = model.original_push_to_hub
else:
original_push_to_hub = model.push_to_hub
pass
# And now re add our saving methods!
original_push_to_hub = model.push_to_hub
signature = str(inspect.signature(original_push_to_hub)).replace("NoneType", "None")
signature = signature[1:]
signature = re.sub("<function save at .+?>", "torch.save", signature)
docs = original_push_to_hub.__doc__.encode("utf-8").decode("utf-8")
model._original_push_to_hub = original_push_to_hub
push_to_hub_text = f'''def unsloth_push_to_hub(self, {signature}:
"""
@@ -1077,11 +1174,45 @@ def patch_saving_functions(model):
arguments["tags"] = ["unsloth",]
elif hasattr(self, "add_model_tags"):
self.add_model_tags(["unsloth",])
if "commit_message" in arguments:
commit_message = arguments["commit_message"]
if commit_message is not None:
if not commit_message.endswith(" "): commit_message += " "
if "Unsloth" not in commit_message:
commit_message += "(Trained with Unsloth)"
else:
commit_message = "Upload model trained with Unsloth"
arguments["commit_message"] = commit_message
if "commit_description" in arguments:
commit_description = arguments["commit_description"]
if commit_description is not None:
if not commit_description.endswith(" "): commit_description += " "
if "Unsloth" not in commit_description:
commit_description += "(Trained with Unsloth 2x faster)"
else:
commit_description = "Upload model trained with Unsloth 2x faster"
arguments["commit_description"] = commit_description
# Update model tag
if hasattr(self, "config"):
_ = upload_to_huggingface(
self, arguments["repo_id"], arguments["token"],
"finetuned", "trl", file_location = None,
old_username = None, private = arguments["private"],
)
pass
try:
return self._original_push_to_hub(**arguments)
self.original_push_to_hub(**arguments)
except:
del arguments["tags"]
return self._original_push_to_hub(**arguments)
self.original_push_to_hub(**arguments)
pass
if hasattr(self, "config"):
print("Saved model to https://huggingface.co/" + arguments["repo_id"])
pass
'''
exec(push_to_hub_text, globals())
@@ -1089,12 +1220,12 @@ def patch_saving_functions(model):
original_model = model
while True:
if not hasattr(original_model, "_original_push_to_hub"):
original_model._original_push_to_hub = original_model.push_to_hub
if original_model.push_to_hub.__name__ != "unsloth_push_to_hub":
original_model.original_push_to_hub = original_model.push_to_hub
original_model.push_to_hub = types.MethodType(unsloth_push_to_hub, original_model)
if hasattr(original_model, "add_model_tags"):
original_model.add_model_tags(["unsloth",])
pass
pass
if hasattr(original_model, "model"): original_model = original_model.model
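Because PEFT wraps the base model in nested `.model` attributes, the loop above walks every level, saves the original `push_to_hub`, and rebinds the wrapper with `types.MethodType`. The pattern can be sketched with toy classes (all names below are ours, for illustration only):

```python
import types

# Toy stand-ins for the nested model/PEFT wrappers (hypothetical classes)
class Inner:
    def push_to_hub(self): return "inner-original"

class Outer:
    def __init__(self): self.model = Inner()
    def push_to_hub(self): return "outer-original"

def unsloth_push_to_hub(self):
    # Stand-in for the generated wrapper; delegates to the saved original
    return "patched:" + self.original_push_to_hub()

obj = Outer()
level = obj
while True:
    # __name__ check makes the patch idempotent across repeated calls
    if level.push_to_hub.__name__ != "unsloth_push_to_hub":
        level.original_push_to_hub = level.push_to_hub
        level.push_to_hub = types.MethodType(unsloth_push_to_hub, level)
    if hasattr(level, "model"):
        level = level.model  # descend into the wrapped model
    else:
        break

print(obj.push_to_hub())
print(obj.model.push_to_hub())
```

Binding with `types.MethodType` (rather than assigning the bare function) keeps `self` pointing at each specific level, so every wrapper in the chain calls its own saved original.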