mirror of
https://github.com/unslothai/unsloth
synced 2026-04-21 13:37:39 +00:00
Feb 2024 Release (#187)
* Fast inference repatch: fast lm_head, LlamaAttention_fast_forward_inference, SDPA and attention_mask fixes, padding, past_key_values, torch compile, more temp matrices, fast inference + saving config.json
* Torch 2.2.0 support; Mistral SWA (sliding window attention) inference fixes; fix llm_int8_skip_modules
* Chat Templates; patch tokenizer
* Saving: PeftModel token + saving, llama.cpp bugs, LlamaRotaryEmbedding issues, fix RoPE precision issues, trainer patching
* Incremental updates to llama.py, mistral.py, utils.py, save.py, loader.py, mapper.py, rope_embedding.py, chat_templates.py, __init__.py, pyproject.toml and the README
This commit is contained in:
parent 0439b8508d
commit 1b7bf718cc

9 changed files with 610 additions and 147 deletions

README.md (21 changes)
@@ -30,7 +30,7 @@ All notebooks are **beginner friendly**! Add your dataset, click "Run All", and
 | **Mistral 7b** 1xT4 | [▶️ Start on Kaggle](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook) | 5x faster\* | 62% less |
 - This [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing) is useful for ShareGPT ChatML / Vicuna templates.
-- Our [raw text notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) is useful for text completion.
+- This [text completion notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
 - Colab provides a free GPU sometimes. Kaggle has 30 hrs free per week on a 12 hr running cap.
 - \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. Use Colab as Kaggle takes 10 mins to install.
@@ -86,9 +86,12 @@ All notebooks are **beginner friendly**! Add your dataset, click "Run All", and
 ### Conda Installation
 Select either `pytorch-cuda=11.8` for CUDA 11.8 or `pytorch-cuda=12.1` for CUDA 12.1. If you have `mamba`, use `mamba` instead of `conda` for faster solving. See this [Github issue](https://github.com/unslothai/unsloth/issues/73) for help on debugging Conda installs.
 ```bash
-conda install pytorch torchvision torchaudio pytorch-cuda=<12.1/11.8> -c pytorch -c nvidia
+conda create --name unsloth_env python=3.10
+conda activate unsloth_env
+
-conda install xformers -c xformers -y
+conda install pytorch cudatoolkit torchvision torchaudio pytorch-cuda=<12.1/11.8> -c pytorch -c nvidia
+
+conda install xformers -c xformers
+
 pip install bitsandbytes
@@ -141,6 +144,7 @@ pip install --upgrade pip
 ```
 
 ## 📜 Documentation
 - Go to our [Wiki page](https://github.com/unslothai/unsloth/wiki) for saving to GGUF, checkpointing, evaluation and more!
 - We support Huggingface's TRL, Trainer, Seq2SeqTrainer or even Pytorch code!
+- We're in 🤗Hugging Face's official docs! Check out the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)!
@@ -162,7 +166,8 @@ fourbit_models = [
     "unsloth/llama-2-13b-bnb-4bit",
     "unsloth/codellama-34b-bnb-4bit",
+    "unsloth/tinyllama-bnb-4bit",
-]
+] # Go to https://huggingface.co/unsloth for more 4-bit models!
 
 # Load Llama model
 model, tokenizer = FastLanguageModel.from_pretrained(
     model_name = "unsloth/mistral-7b-bnb-4bit", # Supports Llama, Mistral - replace this!
@@ -183,6 +188,8 @@ model = FastLanguageModel.get_peft_model(
     use_gradient_checkpointing = True,
     random_state = 3407,
     max_seq_length = max_seq_length,
+    use_rslora = False,  # We support rank stabilized LoRA
+    loftq_config = None, # And LoftQ
 )
 
 trainer = SFTTrainer(
@@ -205,6 +212,12 @@ trainer = SFTTrainer(
     ),
 )
 trainer.train()
+
+# Go to https://github.com/unslothai/unsloth/wiki for advanced tips like
+# (1) Saving to GGUF / merging to 16bit for vLLM
+# (2) Continued training from a saved LoRA adapter
+# (3) Adding an evaluation loop / OOMs
+# (4) Customized chat templates
 ```
 
 <a name="DPO"></a>
pyproject.toml
@@ -42,6 +42,7 @@ huggingface = [
     "tqdm",
     "psutil",
+    "wheel>=0.42.0",
     "numpy",
 ]
 cu118only = [
     "xformers @ https://download.pytorch.org/whl/cu118/xformers-0.0.22.post7%2Bcu118-cp39-cp39-manylinux2014_x86_64.whl ; python_version=='3.9'",
@@ -83,22 +84,22 @@ cu121 = [
     "bitsandbytes",
     "unsloth[cu121only]",
 ]
-cu118_torch211 = [
+cu118-torch211 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu118onlytorch211]",
 ]
-cu121_torch211 = [
+cu121-torch211 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu121onlytorch211]",
 ]
-cu118_torch220 = [
+cu118-torch220 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu118onlytorch220]",
 ]
-cu121_torch220 = [
+cu121-torch220 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu121onlytorch220]",
@@ -112,18 +113,18 @@ conda = [
 colab = [
     "unsloth[cu121]",
 ]
-colab_ampere = [
+colab-ampere = [
     "unsloth[cu121]",
     "packaging",
     "ninja",
     "flash-attn",
 ]
-colab_torch211 = [
+colab-torch211 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu121onlytorch211]",
 ]
-colab_ampere_torch211 = [
+colab-ampere-torch211 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu121onlytorch211]",
@@ -131,12 +132,12 @@ colab_ampere_torch211 = [
     "ninja",
     "flash-attn",
 ]
-colab_torch220 = [
+colab-torch220 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu121onlytorch220]",
 ]
-colab_ampere_torch220 = [
+colab-ampere-torch220 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu121onlytorch220]",
@@ -144,7 +145,7 @@ colab_ampere_torch220 = [
     "ninja",
     "flash-attn",
 ]
-cu118_ampere = [
+cu118-ampere = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu118only]",
@@ -152,7 +153,7 @@ cu118_ampere = [
     "ninja",
     "flash-attn",
 ]
-cu121_ampere = [
+cu121-ampere = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu121only]",
@@ -160,7 +161,7 @@ cu121_ampere = [
     "ninja",
     "flash-attn",
 ]
-cu118_ampere_torch211 = [
+cu118-ampere-torch211 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu118onlytorch211]",
@@ -168,7 +169,7 @@ cu118_ampere_torch211 = [
     "ninja",
     "flash-attn",
 ]
-cu121_ampere_torch211 = [
+cu121-ampere-torch211 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu121onlytorch211]",
@@ -176,7 +177,7 @@ cu121_ampere_torch211 = [
     "ninja",
     "flash-attn",
 ]
-cu118_ampere_torch220 = [
+cu118-ampere-torch220 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu118onlytorch220]",
@@ -184,7 +185,7 @@ cu118_ampere_torch220 = [
     "ninja",
     "flash-attn",
 ]
-cu121_ampere_torch220 = [
+cu121-ampere-torch220 = [
     "unsloth[huggingface]",
     "bitsandbytes",
     "unsloth[cu121onlytorch220]",
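The extras renamed in this file swap underscores for hyphens, which matches PEP 685's normalized form for extra names. A quick sketch of that normalization rule (the `normalize_extra` helper is illustrative, not part of the project):

```python
import re

def normalize_extra(name: str) -> str:
    # PEP 685: runs of "-", "_" and "." collapse to a single "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()

print(normalize_extra("cu118_ampere_torch220"))  # cu118-ampere-torch220
```

Because tools normalize both spellings to the same name, renaming the extras in pyproject.toml keeps the declared names identical to what pip resolves.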
unsloth/__init__.py
@@ -59,14 +59,38 @@ if (major_torch != 2):# or (major_torch == 2 and minor_torch < 1):
 import bitsandbytes as bnb
 import triton
 from triton.common.build import libcuda_dirs
 import os
+import re
+import numpy as np
+import subprocess
 
 try:
     cdequantize_blockwise_fp32 = bnb.functional.lib.cdequantize_blockwise_fp32
     libcuda_dirs()
 except:
     warnings.warn(
-        "Running `ldconfig /usr/lib64-nvidia` to link CUDA."\
+        "Unsloth: Running `ldconfig /usr/lib64-nvidia` to link CUDA."\
     )
-    os.system("ldconfig /usr/lib64-nvidia")
+
+    if os.path.exists("/usr/lib64-nvidia"):
+        os.system("ldconfig /usr/lib64-nvidia")
+    elif os.path.exists("/usr/local"):
+        # Sometimes bitsandbytes cannot be linked properly in Runpod for example
+        possible_cudas = subprocess.check_output(["ls", "-al", "/usr/local"]).decode("utf-8").split("\n")
+        find_cuda = re.compile(r"[\s](cuda\-[\d\.]{2,})$")
+        possible_cudas = [find_cuda.search(x) for x in possible_cudas]
+        possible_cudas = [x.group(1) for x in possible_cudas if x is not None]
+
+        # Try linking cuda folder, or everything in local
+        if len(possible_cudas) == 0:
+            os.system(f"ldconfig /usr/local/")
+        else:
+            find_number = re.compile(r"([\d\.]{2,})")
+            latest_cuda = np.argsort([float(find_number.search(x).group(1)) for x in possible_cudas])[::-1][0]
+            latest_cuda = possible_cudas[latest_cuda]
+            os.system(f"ldconfig /usr/local/{latest_cuda}")
+    pass
 
     importlib.reload(bnb)
     importlib.reload(triton)
     try:
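The fallback above scans `/usr/local` for `cuda-XX.X` directories and runs `ldconfig` against the newest one. A self-contained sketch of the version-picking step, using a hypothetical directory listing in place of the real `ls -al /usr/local` output:

```python
import re

# Hypothetical `ls -al /usr/local` output lines (assumption for illustration).
listing = [
    "drwxr-xr-x  1 root root 4096 Jan  1 00:00 cuda",
    "drwxr-xr-x  1 root root 4096 Jan  1 00:00 cuda-11.8",
    "drwxr-xr-x  1 root root 4096 Jan  1 00:00 cuda-12.1",
]

# Same pattern as the diff: a versioned cuda-* entry at the end of the line.
find_cuda = re.compile(r"[\s](cuda\-[\d\.]{2,})$")
matches = [m.group(1) for m in map(find_cuda.search, listing) if m is not None]

# Pick the directory with the highest numeric version.
find_number = re.compile(r"([\d\.]{2,})")
latest = max(matches, key = lambda x: float(find_number.search(x).group(1)))
print(latest)  # cuda-12.1
```

The unversioned `cuda` symlink is deliberately excluded by the regex, so only explicit `cuda-XX.X` folders are candidates.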
@@ -75,9 +99,10 @@ except:
         cdequantize_blockwise_fp32 = bnb.functional.lib.cdequantize_blockwise_fp32
         libcuda_dirs()
     except:
-        raise ImportError("CUDA is not linked properly.\n"\
+        raise ImportError("Unsloth: CUDA is not linked properly.\n"\
             "We tried running `ldconfig /usr/lib64-nvidia` ourselves, but it didn't work.\n"\
-            "You need to run in your terminal `ldconfig /usr/lib64-nvidia` yourself, then import Unsloth.")
+            "You need to run in your terminal `sudo ldconfig /usr/lib64-nvidia` yourself, then import Unsloth.\n"\
+            "Also try `sudo ldconfig /usr/local/cuda-xx.x` - find the latest cuda version.")
 pass
 
 from .models import *
@@ -17,6 +17,7 @@ from typing import Union, Optional, List, Any, Callable
 import warnings
 warnings.filterwarnings(action = "ignore", category = UserWarning, module = "torch")
 warnings.filterwarnings(action = "ignore", category = UserWarning, module = "huggingface_hub")
+warnings.filterwarnings(action = "ignore", category = RuntimeWarning, module = "subprocess")
 import bitsandbytes as bnb
 from transformers.models.llama.modeling_llama import logger
 from transformers import AutoTokenizer
unsloth/models/llama.py
@@ -55,6 +55,7 @@ from peft import PeftModelForCausalLM
 from bitsandbytes.nn import Linear4bit as Bnb_Linear4bit
 from peft.tuners.lora import Linear4bit as Peft_Linear4bit
 from ..save import patch_saving_functions
+import re, os, inspect, math, sys
 
 
 def original_apply_qkv(self, X):
@@ -782,30 +783,33 @@ pass
 # https://github.com/huggingface/transformers/pull/27931
 # https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/models/llama/modeling_llama.py
 class LlamaRotaryEmbedding(torch.nn.Module):
+    # Fixes https://github.com/huggingface/transformers/pull/28837
+    # https://github.com/microsoft/DeepSpeed/issues/4932
+    # The precision of RoPE buffers is not correct, so we cast to int64.
     def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
         super().__init__()
 
         self.dim = dim
         self.max_position_embeddings = max_position_embeddings
         self.base = base
-        inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
-        self.register_buffer("inv_freq", inv_freq, persistent=False)
-
         # Build here to make `torch.jit.trace` work.
-        self._set_cos_sin_cache(
-            seq_len=max_position_embeddings, device=self.inv_freq.device, dtype=torch.get_default_dtype()
-        )
+        self._set_cos_sin_cache(seq_len=max_position_embeddings, device=device, dtype=torch.get_default_dtype())
     pass
 
     def _set_cos_sin_cache(self, seq_len, device, dtype):
         # Note: on the original Llama codebase, these tensors are created on the target device (and not on CPU) and
         # in FP32. They are applied (multiplied) in FP32 as well.
         self.max_seq_len_cached = seq_len
-        t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype)
+        inv_freq = 1.0 / (
+            self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64, device="cpu").float() / self.dim)
+        )
+        t = torch.arange(self.max_seq_len_cached, device="cpu", dtype=torch.int64).float()
 
-        freqs = torch.outer(t, self.inv_freq)
+        freqs = torch.outer(t, inv_freq)
         # Different from paper, but it uses a different permutation in order to obtain the same calculation
         emb = torch.cat((freqs, freqs), dim=-1)
-        self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
-        self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
+        self.register_buffer("cos_cached", emb.cos().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
+        self.register_buffer("sin_cached", emb.sin().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
     pass
 
     def forward(self, x, seq_len=None):
@@ -823,7 +827,9 @@ pass
 
 class LlamaLinearScalingRotaryEmbedding(LlamaRotaryEmbedding):
     """LlamaRotaryEmbedding extended with linear scaling. Credits to the Reddit user /u/kaiokendev"""
 
+    # Fixes https://github.com/huggingface/transformers/pull/28837
+    # https://github.com/microsoft/DeepSpeed/issues/4932
+    # The precision of RoPE buffers is not correct, so we cast to int64.
     def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0):
         self.scaling_factor = scaling_factor
         super().__init__(dim, max_position_embeddings, base, device)
@@ -831,14 +837,17 @@ class LlamaLinearScalingRotaryEmbedding(LlamaRotaryEmbedding):
 
     def _set_cos_sin_cache(self, seq_len, device, dtype):
         self.max_seq_len_cached = seq_len
-        t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype)
+        inv_freq = 1.0 / (
+            self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64, device="cpu").float() / self.dim)
+        )
+        t = torch.arange(self.max_seq_len_cached, device="cpu", dtype=torch.int64).float()
         t = t / self.scaling_factor
 
-        freqs = torch.outer(t, self.inv_freq)
+        freqs = torch.outer(t, inv_freq)
         # Different from paper, but it uses a different permutation in order to obtain the same calculation
         emb = torch.cat((freqs, freqs), dim=-1)
-        self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
-        self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
+        self.register_buffer("cos_cached", emb.cos().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
+        self.register_buffer("sin_cached", emb.sin().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
     pass
 pass
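The int64 change in the RoPE code above exists because position indices built directly in a low-precision float dtype stop being exact integers, which corrupts the cos/sin caches at long positions. A small numpy illustration of the failure mode (numpy's float16 stands in for the half-precision buffers; the real code uses torch tensors):

```python
import numpy as np

# float16 has an 11-bit significand, so integers above 2048 stop being exact.
bad_t  = np.arange(2048, 2052).astype(np.float16)                  # low-precision positions
good_t = np.arange(2048, 2052, dtype=np.int64).astype(np.float32)  # the diff's approach

print(bad_t.tolist())   # [2048.0, 2048.0, 2050.0, 2052.0] -- position 2049 is unrepresentable
print(good_t.tolist())  # [2048.0, 2049.0, 2050.0, 2051.0]
```

Building `t` and `inv_freq` from int64 on the CPU, then converting to float32 before the outer product, keeps every position index exact regardless of the model's compute dtype.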
@@ -954,6 +963,125 @@ class FastLlamaModel:
             layer.self_attn.apply_o = original_apply_o
         pass
 
+        # Patch Trainer
+        from transformers.trainer import Trainer
+        try:
+            if Trainer._inner_training_loop.__name__ != "_fast_inner_training_loop":
+                inner_training_loop = inspect.getsource(Trainer._inner_training_loop)
+                Trainer._original_training_loop = inner_training_loop
+            else:
+                inner_training_loop = Trainer._original_training_loop
+        except:
+            raise RuntimeError(
+                "Our OSS was designed for people with few GPU resources to level the playing field.\n"
+                "The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
+                "We're a 2 person team, so we still have to fund our development costs - thanks!\n"
+                "If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
+            )
+        pass
+
+        import transformers.trainer
+        items_in_trainer = dir(transformers.trainer)
+        good_items = []
+        for item in items_in_trainer:
+            # TODO: Support Deepspeed
+            if item.startswith(("deepspeed", "xm", "met", "smp")): continue
+            if item in inner_training_loop: good_items.append(item)
+        pass
+        exec("from transformers.trainer import (" + ", ".join(x for x in good_items) + ")", globals())
+
+        start = re.search('logger\.info\([\"\'].+?Running training', inner_training_loop).span(0)[0]
+        end = inner_training_loop.find("\n\n", start)
+        original_debug = inner_training_loop[start:end]
+        spaces = re.search('\n([\s\t]{1,})', original_debug).group(0)[1:]
+        front_spaces = re.match('([\s\t]{1,})', inner_training_loop).group(0)
+
+        debug_info = """debug_info = \\
+        f"==((====))== Unsloth - 2x faster free finetuning | Num GPUs = {args.world_size}\\n"\\
+        f" \\\\\\ /| Num examples = {num_examples:,} | Num Epochs = {num_train_epochs:,}\\n"\\
+        f"O^O/ \\_/ \\ Batch size per device = {self._train_batch_size:,} | Gradient Accumulation steps = {args.gradient_accumulation_steps}\\n"\\
+        f"\\ / Total batch size = {total_train_batch_size:,} | Total steps = {max_steps:,}\\n"\\
+        f' "-____-" Number of trainable parameters = {get_model_param_count(model, trainable_only=True):,}'
+        logger.warning_once(debug_info)"""
+
+        debug_info = debug_info.split('\n')
+        debug_info = "\n".join([debug_info[0]] + [spaces + x[8:] for x in debug_info[1:]])
+        inner_training_loop = inner_training_loop.replace(original_debug, debug_info)
+
+        debug_info = """n_total_devices = total_train_batch_size // \\
+        args.gradient_accumulation_steps // self._train_batch_size
+        if n_total_devices > 2:
+            logger.warning_once(
+                "Our OSS was designed for people with few GPU resources to level the playing field.\\n"
+                "The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\\n"
+                "We're a 2 person team, so we still have to fund our development costs - thanks!\\n"
+                "If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
+            )
+        debug_info ="""
+        debug_info = debug_info.split('\n')
+        debug_info = "\n".join([debug_info[0]] + [spaces + x[8:] for x in debug_info[1:]])
+        inner_training_loop = inner_training_loop.replace("debug_info =", debug_info, 1)
+
+        front_spaces = re.match(r"[\t\s]{1,}", inner_training_loop).group(0)
+        inner_training_loop = re.sub(r"^" + front_spaces, "", inner_training_loop, flags = re.MULTILINE)
+        inner_training_loop = inner_training_loop.replace(
+            "train_dataloader = tpu_spmd_dataloader(train_dataloader)",
+            "raise RuntimeError('Unsloth: TPUs are not yet supported!')"
+        )
+        inner_training_loop = inner_training_loop.replace(
+            "self.accelerator.free_memory()",
+            "self.accelerator.free_memory()\n" + \
+            front_spaces + "if self.is_deepspeed_enabled:"\
+            "raise RuntimeError('Unsloth: Deepspeed is not yet supported!')\n", 1,
+        )
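The patch above pulls the source of `Trainer._inner_training_loop` with `inspect.getsource`, rewrites it as a string, and `exec`s the result back into place. A minimal, self-contained sketch of that trick on a toy function (the toy `Trainer` class and the string literal are inventions for illustration; the real code starts from `inspect.getsource` on the live method):

```python
# The real code obtains the source via inspect.getsource(Trainer._inner_training_loop);
# a literal string is used here so the sketch runs anywhere.
src = '''
def _inner_training_loop(self):
    return "slow loop"
'''

# Rewrite the source as plain text, exactly like the replace() calls above.
src = src.replace("_inner_training_loop", "_fast_inner_training_loop", 1)
src = src.replace('"slow loop"', '"fast loop"')

namespace = {}
exec(src, namespace)                      # compile the rewritten function

class Trainer:                            # toy stand-in for transformers.Trainer
    _inner_training_loop = None

# Swap the rewritten function in, mirroring Trainer._inner_training_loop = _fast_inner_training_loop.
Trainer._inner_training_loop = namespace["_fast_inner_training_loop"]
print(Trainer()._inner_training_loop())   # fast loop
```

Because the rewrite happens on text rather than bytecode, the patch survives across transformers versions as long as the anchor strings (like `self.accelerator.free_memory()`) still appear in the upstream source.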
+        check_batches = """train_dataloader = self.get_train_dataloader()
+        ga = args.gradient_accumulation_steps
+        bsz = self._train_batch_size
+        total_batches = bsz * ga * args.world_size
+        n_total_devices = total_batches // ga // bsz
+        if n_total_devices > 2:
+            logger.warning_once(
+                "Please consider a commercial license - Unsloth was designed for the GPU Poor.\\n"
+                "The OSS currently works on 4 GPUs - we're a 2 person team, so please help fund\\n"
+                "our development costs by supporting us through Ko-fi or buying a license! Thanks!",
+            )
+            divisor = n_total_devices / 2
+            bsz = self._train_batch_size = max(int(bsz / divisor), 1)
+            if total_batches // ga // bsz > 2:
+                divisor = n_total_devices / 2
+                ga = args.gradient_accumulation_steps = max(int(ga / divisor), 1)"""
+        check_batches = check_batches.split('\n')
+        check_batches = "\n".join([check_batches[0]] + [front_spaces + x[8:] for x in check_batches[1:]])
+        inner_training_loop = inner_training_loop.replace(
+            "train_dataloader = self.get_train_dataloader()",
+            check_batches, 1,
+        )
+        inner_training_loop = inner_training_loop.replace(
+            "_inner_training_loop",
+            "_fast_inner_training_loop", 1,
+        )
+        exec(inner_training_loop, globals())
+
+        Trainer._inner_training_loop = _fast_inner_training_loop
+        inner_training_loop = inner_training_loop.replace(
+            "is_torch_tpu_available()",
+            "False",
+        )
+        if "n_total_devices >" not in inner_training_loop:
+            raise RuntimeError(
+                "Our OSS was designed for people with few GPU resources to level the playing field.\n"
+                "The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
+                "We're a 2 person team, so we still have to fund our development costs - thanks!\n"
+                "If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
+            )
+        pass
+        inner_training_loop = inner_training_loop.replace(
+            "is_sagemaker_mp_enabled()",
+            "False",
+        )
+        Trainer._inner_training_loop = _fast_inner_training_loop
+
+        # Save max_seq_length
         model.max_seq_length = max_position_embeddings
 
         internal_model = model
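The injected `check_batches` string above caps the effective device count at 2 by shrinking the per-device batch size and then, if needed, the gradient-accumulation steps. The arithmetic can be traced with a small standalone function (`cap_batches` is an illustrative name, not from the source):

```python
def cap_batches(bsz, ga, world_size):
    # Mirrors the injected check_batches logic; n_total_devices equals world_size.
    total_batches = bsz * ga * world_size
    n_total_devices = total_batches // ga // bsz
    if n_total_devices > 2:
        divisor = n_total_devices / 2
        bsz = max(int(bsz / divisor), 1)          # first shrink the per-device batch
        if total_batches // ga // bsz > 2:
            ga = max(int(ga / divisor), 1)        # then shrink accumulation if still over
    return bsz, ga

# 8 GPUs, batch 4, accumulation 2: both knobs collapse to 1.
print(cap_batches(4, 2, 8))  # (1, 1)
```

With 2 or fewer devices the inputs pass through unchanged, so single-GPU and dual-GPU users never hit the cap.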
@@ -1073,7 +1201,7 @@ class FastLlamaModel:
         signature = str(inspect.signature(LoraConfig))
         SUPPORTS_LOFTQ = "loftq_config" in signature
         SUPPORTS_RSLORA = "use_rslora" in signature
 
 
         assert(max_seq_length <= model.max_seq_length)
 
         if lora_dropout != 0:
@@ -1200,6 +1328,28 @@ class FastLlamaModel:
         model.peft_config[active_adapter].revision = f"unsloth"
     pass
 
+        from transformers.trainer import Trainer
+        if Trainer._inner_training_loop.__name__ != "_fast_inner_training_loop":
+            raise RuntimeError(
+                "Our OSS was designed for people with few GPU resources to level the playing field.\n"
+                "The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
+                "We're a 2 person team, so we still have to fund our development costs - thanks!\n"
+                "If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
+            )
+        pass
+
+        # Fix loftq issues
+        # loftq_config must not = None, but rather {}
+        all_configs = model.peft_config
+        for key, current_config in all_configs.items():
+            if hasattr(current_config, "loftq_config") and current_config.loftq_config is None:
+                new_args = current_config.__dict__
+                new_args["loftq_config"] = {}
+                current_config = current_config.__class__(**new_args)
+                all_configs[key] = current_config
+            pass
+        pass
+
         # Do patching
         n_mlp = 0
         n_qkv = 0
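The loftq fix above rebuilds each PEFT config from its own `__dict__` so that `loftq_config` ends up as `{}` instead of `None`. A sketch with a stand-in dataclass (`ToyLoraConfig` is hypothetical; the real objects are `peft.LoraConfig` instances):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToyLoraConfig:                      # stand-in for peft.LoraConfig (assumption)
    r: int = 16
    loftq_config: Optional[dict] = None   # must become {}, not None

config = ToyLoraConfig()
if hasattr(config, "loftq_config") and config.loftq_config is None:
    new_args = dict(config.__dict__)      # copy the current field values
    new_args["loftq_config"] = {}
    config = config.__class__(**new_args) # rebuild the config, as the diff does

print(config.loftq_config)  # {}
```

Rebuilding through `__class__(**new_args)` rather than mutating the attribute directly keeps any validation in the config's `__init__` in effect.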
unsloth/models/loader.py
@@ -118,9 +118,13 @@ class FastLanguageModel(FastLlamaModel):
             *args, **kwargs,
         )
 
-        # in case the model supports tagging, add the unsloth tag.
+        # In case the model supports tagging, add the unsloth tag.
         if hasattr(model, "add_model_tags"):
-            model.add_model_tags(["unsloth"])
+            model.add_model_tags(["unsloth",])
         pass
+        if hasattr(tokenizer, "add_model_tags"):
+            tokenizer.add_model_tags(["unsloth",])
+        pass
 
         if load_in_4bit:
             # Fix up bitsandbytes config
@@ -143,7 +147,7 @@ class FastLanguageModel(FastLlamaModel):
 
         if is_peft:
             # Now add PEFT adapters
-            model = PeftModel.from_pretrained(model, old_model_name)
+            model = PeftModel.from_pretrained(model, old_model_name, token = token)
             # Patch it as well!
             model = dispatch_model.patch_peft_model(model, use_gradient_checkpointing)
         pass
unsloth/models/mapper.py
@@ -42,6 +42,10 @@ __INT_TO_FLOAT_MAPPER = \
         "unsloth/tinyllama",
         "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
     ),
+    "unsloth/tinyllama-chat-bnb-4bit" : (
+        "unsloth/tinyllama-chat",
+        "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+    ),
     "unsloth/mistral-7b-instruct-v0.1-bnb-4bit" : (
         "mistralai/Mistral-7B-Instruct-v0.1",
     ),
unsloth/models/mistral.py
@@ -368,6 +368,140 @@ class FastMistralModel:
             layer.self_attn.apply_o = original_apply_o
         pass
 
-        # Patch Trainer
-        from transformers.trainer import Trainer
-        if Trainer._inner_training_loop.__name__ != "_fast_inner_training_loop":
-            try:
-                inner_training_loop = inspect.getsource(Trainer._inner_training_loop)
-            except:
-                raise RuntimeError(
-                    "Our OSS was designed for people with few GPU resources to level the playing field.\n"
-                    "The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
-                    "We're a 2 person team, so we still have to fund our development costs - thanks!\n"
-                    "If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
-                )
-            pass
-        pass
 
+        # Patch Trainer
+        from transformers.trainer import Trainer
+        try:
+            if Trainer._inner_training_loop.__name__ != "_fast_inner_training_loop":
+                inner_training_loop = inspect.getsource(Trainer._inner_training_loop)
+                Trainer._original_training_loop = inner_training_loop
+            else:
+                inner_training_loop = Trainer._original_training_loop
+        except:
+            raise RuntimeError(
+                "Our OSS was designed for people with few GPU resources to level the playing field.\n"
+                "The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
+                "We're a 2 person team, so we still have to fund our development costs - thanks!\n"
+                "If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
+            )
+        pass
+
+        import transformers.trainer
+        items_in_trainer = dir(transformers.trainer)
+        good_items = []
+        for item in items_in_trainer:
+            # TODO: Support Deepspeed
+            if item.startswith(("deepspeed", "xm", "met", "smp")): continue
+            if item in inner_training_loop: good_items.append(item)
+        pass
+        exec("from transformers.trainer import (" + ", ".join(x for x in good_items) + ")", globals())
+
+        start = re.search('logger\.info\([\"\'].+?Running training', inner_training_loop).span(0)[0]
+        end = inner_training_loop.find("\n\n", start)
+        original_debug = inner_training_loop[start:end]
+        spaces = re.search('\n([\s\t]{1,})', original_debug).group(0)[1:]
+        front_spaces = re.match('([\s\t]{1,})', inner_training_loop).group(0)
+
+        debug_info = """debug_info = \\
+        f"==((====))== Unsloth - 2x faster free finetuning | Num GPUs = {args.world_size}\\n"\\
+        f" \\\\\\ /| Num examples = {num_examples:,} | Num Epochs = {num_train_epochs:,}\\n"\\
+        f"O^O/ \\_/ \\ Batch size per device = {self._train_batch_size:,} | Gradient Accumulation steps = {args.gradient_accumulation_steps}\\n"\\
+        f"\\ / Total batch size = {total_train_batch_size:,} | Total steps = {max_steps:,}\\n"\\
+        f' "-____-" Number of trainable parameters = {get_model_param_count(model, trainable_only=True):,}'
+        logger.warning_once(debug_info)"""
+
+        debug_info = debug_info.split('\n')
+        debug_info = "\n".join([debug_info[0]] + [spaces + x[8:] for x in debug_info[1:]])
+        inner_training_loop = inner_training_loop.replace(original_debug, debug_info)
+
+        debug_info = """n_total_devices = total_train_batch_size // \\
+        args.gradient_accumulation_steps // self._train_batch_size
+        if n_total_devices > 2:
+            logger.warning_once(
+                "Our OSS was designed for people with few GPU resources to level the playing field.\\n"
+                "The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\\n"
+                "We're a 2 person team, so we still have to fund our development costs - thanks!\\n"
+                "If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
+            )
+        debug_info ="""
+        debug_info = debug_info.split('\n')
+        debug_info = "\n".join([debug_info[0]] + [spaces + x[8:] for x in debug_info[1:]])
+        inner_training_loop = inner_training_loop.replace("debug_info =", debug_info, 1)
+
+        front_spaces = re.match(r"[\t\s]{1,}", inner_training_loop).group(0)
+        inner_training_loop = re.sub(r"^" + front_spaces, "", inner_training_loop, flags = re.MULTILINE)
+        inner_training_loop = inner_training_loop.replace(
+            "train_dataloader = tpu_spmd_dataloader(train_dataloader)",
+            "raise RuntimeError('Unsloth: TPUs are not yet supported!')"
+        )
+        inner_training_loop = inner_training_loop.replace(
+            "self.accelerator.free_memory()",
+            "self.accelerator.free_memory()\n" + \
+            front_spaces + "if self.is_deepspeed_enabled:"\
+            "raise RuntimeError('Unsloth: Deepspeed is not yet supported!')\n", 1,
+        )
+
+        check_batches = """train_dataloader = self.get_train_dataloader()
+        ga = args.gradient_accumulation_steps
+        bsz = self._train_batch_size
+        total_batches = bsz * ga * args.world_size
+        n_total_devices = total_batches // ga // bsz
+        if n_total_devices > 2:
+            logger.warning_once(
+                "Please consider a commercial license - Unsloth was designed for the GPU Poor.\\n"
|
||||
"The OSS currently works on 4 GPUs - we're a 2 person team, so please help fund\\n"
|
||||
"our development costs by supporting us through Ko-fi or buying a license! Thanks!",
|
||||
)
|
||||
divisor = n_total_devices / 2
|
||||
bsz = self._train_batch_size = max(int(bsz / divisor), 1)
|
||||
if total_batches // ga // bsz > 2:
|
||||
divisor = n_total_devices / 2
|
||||
ga = args.gradient_accumulation_steps = max(int(ga / divisor), 1)"""
|
||||
check_batches = check_batches.split('\n')
|
||||
check_batches = "\n".join([check_batches[0]] + [front_spaces + x[8:] for x in check_batches[1:]])
|
||||
inner_training_loop = inner_training_loop.replace(
|
||||
"train_dataloader = self.get_train_dataloader()",
|
||||
check_batches, 1,
|
||||
)
|
||||
inner_training_loop = inner_training_loop.replace(
|
||||
"_inner_training_loop",
|
||||
"_fast_inner_training_loop", 1,
|
||||
)
|
||||
exec(inner_training_loop, globals())
|
||||
|
||||
Trainer._inner_training_loop = _fast_inner_training_loop
|
||||
inner_training_loop = inner_training_loop.replace(
|
||||
"is_torch_tpu_available()",
|
||||
"False",
|
||||
)
|
||||
if "n_total_devices >" not in inner_training_loop:
|
||||
raise RuntimeError(
|
||||
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
|
||||
"The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
|
||||
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
|
||||
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
|
||||
)
|
||||
pass
|
||||
inner_training_loop = inner_training_loop.replace(
|
||||
"is_sagemaker_mp_enabled()",
|
||||
"False",
|
||||
)
|
||||
Trainer._inner_training_loop = _fast_inner_training_loop
|
||||
|
||||
# Save max_seq_length
|
||||
max_position_embeddings = max(max_seq_length, model.config.max_position_embeddings)
|
||||
model.max_seq_length = max_position_embeddings
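The patch above works at the source level: it injects snippets that are written with a fixed 8-space template indent, strips that indent with `x[8:]`, re-prefixes the target's own indentation, string-replaces into the function source, and `exec`s the result. A minimal self-contained sketch of the same technique (the `greet` function and injected snippet are hypothetical, for illustration only):

```python
import re

# Source of a toy function we want to patch (hypothetical example)
source = '''def greet(name):
    message = "Hello, " + name
    return message
'''

# Measure the function body's indentation, as the patch above does with re
front_spaces = re.match(r"[\s\t]{1,}", source.split("\n")[1]).group(0)

# Template snippet written with an 8-space indent; x[8:] strips the template
# indent and front_spaces re-applies the target's own indentation
snippet = """message = "Hello, " + name
        message = message.upper()"""
lines = snippet.split("\n")
snippet = "\n".join([lines[0]] + [front_spaces + x[8:] for x in lines[1:]])

# Splice, rename, exec - the same replace/exec pipeline used above
source = source.replace('message = "Hello, " + name', snippet, 1)
source = source.replace("def greet", "def fast_greet", 1)
exec(source, globals())

print(fast_greet("world"))
```

Renaming before `exec` keeps the original function reachable while the patched copy is installed under the new name, which is how `_fast_inner_training_loop` ends up alongside the stock trainer loop.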
341 unsloth/save.py
@@ -140,17 +140,28 @@ def unsloth_save_model(
     # Push to hub
     use_temp_dir : Optional[bool] = None,
-    commit_message : Optional[str] = None,
+    commit_message : Optional[str] = "Trained with Unsloth",
     private : Optional[bool] = None,
     create_pr : bool = False,
     revision : str = None,
-    commit_description : str = None,
+    commit_description : str = "Upload model trained with Unsloth 2x faster",
     tags : List[str] = None,

     # Our functions
     temporary_location : str = "_unsloth_temporary_saved_buffers",
     maximum_memory_usage : float = 0.9,
 ):
+    if commit_message is None: commit_message = ""
+    if "Unsloth" not in commit_message:
+        commit_message += " (Trained with Unsloth)"
+    commit_message = commit_message.lstrip()
+
+    if commit_description is None:
+        commit_description = "Upload model trained with Unsloth 2x faster"
+    elif "Unsloth 2x faster" not in commit_description:
+        commit_description += " (Trained with Unsloth 2x faster)"
+    pass
+
     if save_method == "merged_4bit":
         raise RuntimeError(
             "Unsloth: Merging into 4bit will cause your model to lose accuracy if you plan\n"\
@@ -202,7 +213,7 @@ def unsloth_save_model(
     pass
     save_pretrained_settings["tags"] = tags

-    if (save_method == "lora") and push_to_hub:
+    if ((save_method == "lora") or (save_method == "merged_4bit")) and push_to_hub:
         if token is None:
             raise RuntimeError(
                 "Unsloth: Pushing to HF requires a token. Pass `token = 'hf_....'`\n"\
@@ -210,7 +221,20 @@ def unsloth_save_model(
             )
         pass

-        model.push_to_hub(
+        if save_method == "lora":
+            print("Unsloth: Saving LoRA adapters. Please wait...")
+        elif save_method == "merged_4bit":
+            print("Unsloth: Saving 4bit Bitsandbytes model. Please wait...")
+        pass
+
+        # Update model tag
+        _ = upload_to_huggingface(
+            model, save_directory, token,
+            "finetuned", "trl", file_location = None,
+            old_username = None, private = private,
+        )
+
+        model.original_push_to_hub(
             repo_id = save_directory,
             use_temp_dir = use_temp_dir,
             commit_message = commit_message,
@@ -224,7 +248,7 @@ def unsloth_save_model(
             tags = tags,
         )
         if tokenizer is not None:
-            tokenizer.push_to_hub(
+            tokenizer.original_push_to_hub(
                 repo_id = save_directory,
                 use_temp_dir = use_temp_dir,
                 commit_message = commit_message,
@@ -238,33 +262,13 @@ def unsloth_save_model(
                 tags = tags,
             )
         pass

+        if hasattr(model, "config"):
+            print(f"Saved {save_method} model to https://huggingface.co/" + save_directory)
+        pass
+        return save_directory
+    pass
-
-    # Update model tag
-    username = ""
-    if push_to_hub:
-        username = upload_to_huggingface(
-            model, save_directory, token,
-            "finetuned", "trl", file_location = None,
-        )
-    pass
-
-    # If push_to_hub, we must remove the .../ part of a repo
-    if push_to_hub and "/" in save_directory:
-
-        # +1 solves absolute path issues
-        new_save_directory = save_directory[save_directory.find("/")+1:]
-
-        logger.warning_once(
-            f"Unsloth: You are pushing to hub, but you passed your HF username.\n"\
-            f"We shall truncate {save_directory} to {new_save_directory}"
-        )
-
-        save_pretrained_settings["save_directory"] = new_save_directory
-        save_directory = new_save_directory
-    pass

     # Tokenizer has different saving arguments
     tokenizer_save_settings = \
     {
@@ -292,13 +296,25 @@ def unsloth_save_model(
     # Do general saving
     # Edit save_pretrained_settings
     # [TODO] _create_repo has errors due to **kwargs getting accepted
-    for deletion in \
-        ("use_temp_dir", "commit_message", "create_pr", "revision", "commit_description", "tags",):
+    # commit_description does not seem to work?
+    what_to_delete = ("use_temp_dir", "commit_message", "create_pr", "revision", "commit_description", "tags",) \
+        if save_pretrained_settings["push_to_hub"] is False else \
+        ("use_temp_dir", "create_pr", "revision", "tags", "commit_description",)
+    for deletion in what_to_delete:
         del save_pretrained_settings[deletion]
     pass
     if hasattr(model, "add_model_tags"):
         model.add_model_tags(["unsloth",])

+    # Update model tag
+    if push_to_hub:
+        _ = upload_to_huggingface(
+            model, save_pretrained_settings["save_directory"], token,
+            "finetuned", "trl", file_location = None,
+            old_username = None, private = private,
+        )
+    pass
+
     if tokenizer is not None:
         print("Unsloth: Saving tokenizer...", end = "")
         tokenizer.save_pretrained(**tokenizer_save_settings)
@@ -310,10 +326,33 @@ def unsloth_save_model(
         if save_method != "lora": print(" This might take 10 minutes for Llama-7b...", end = "")

         model.save_pretrained(**save_pretrained_settings)
+
+        if push_to_hub and hasattr(model, "config"):
+            print("Saved to https://huggingface.co/" + save_pretrained_settings["save_directory"])
+        pass

         print(" Done.")
         return save_directory
     pass

+    # If push_to_hub, we must remove the .../ part of a repo
+    username = None
+    if push_to_hub and "/" in save_directory:
+
+        # +1 solves absolute path issues
+        username = save_directory[:save_directory.find("/")]
+        new_save_directory = save_directory[save_directory.find("/")+1:]
+
+        logger.warning_once(
+            f"Unsloth: You are pushing to hub, but you passed your HF username = {username}.\n"\
+            f"We shall truncate {save_directory} to {new_save_directory}"
+        )
+
+        save_pretrained_settings["save_directory"] = new_save_directory
+        tokenizer_save_settings ["save_directory"] = new_save_directory
+        save_directory = new_save_directory
+    pass
+
     print("Unsloth: Merging 4bit and LoRA weights to 16bit...")

     # Determine max RAM usage minus sharding
@@ -339,7 +378,7 @@ def unsloth_save_model(
         logger.warning_once(
             f"Unsloth: You have {n_cpus} CPUs. Using `safe_serialization` is 10x slower.\n"\
             f"We shall switch to Pytorch saving, which will take 3 minutes and not 30 minutes.\n"\
-            f"To force `safe_serialization`, set it to None instead.",
+            f"To force `safe_serialization`, set it to `None` instead.",
         )
         safe_serialization = False
         save_function = fast_save_pickle
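The hunk above adds the repo-id truncation: when a local `save_directory` looks like `username/model`, only `model` is kept for the on-disk path, while the username prefix is remembered for the later Hub upload. A standalone sketch of that slicing (the values are hypothetical):

```python
# Hypothetical repo id passed as a save path
save_directory = "my-hf-user/llama-7b-lora"

username = None
if "/" in save_directory:
    # Same slicing as above: text before the first "/" is the username;
    # find("/")+1 skips the "/" itself for the local directory name
    username = save_directory[:save_directory.find("/")]
    new_save_directory = save_directory[save_directory.find("/")+1:]
    save_directory = new_save_directory

print(username, save_directory)
```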
@@ -413,13 +452,26 @@ def unsloth_save_model(
     # Edit save_pretrained_settings
     # [TODO] _create_repo has errors due to **kwargs getting accepted
     save_pretrained_settings["state_dict"] = state_dict
-    for deletion in \
-        ("use_temp_dir", "commit_message", "create_pr", "revision", "commit_description", "tags",):
+
+    # commit_description does not seem to work?
+    what_to_delete = ("use_temp_dir", "commit_message", "create_pr", "revision", "commit_description", "tags",) \
+        if not push_to_hub else \
+        ("use_temp_dir", "create_pr", "revision", "tags", "commit_description",)
+    for deletion in what_to_delete:
         del save_pretrained_settings[deletion]
     pass
     if hasattr(model, "add_model_tags"):
         model.add_model_tags(["unsloth",])

+    # Update model tag
+    if push_to_hub:
+        _ = upload_to_huggingface(
+            model, save_pretrained_settings["save_directory"], token,
+            "finetuned", "trl", file_location = None,
+            old_username = username, private = private,
+        )
+    pass
+
     if tokenizer is not None:
         print("Unsloth: Saving tokenizer...", end = "")
         tokenizer.save_pretrained(**tokenizer_save_settings)
@@ -452,9 +504,8 @@ def unsloth_save_model(
     model.config = old_config
     print("Done.")

     # Print location
-    if push_to_hub:
-        print(f"Saved to https://huggingface.co/{username}/{save_directory.lstrip('/')}")
+    if push_to_hub and hasattr(model, "config"):
+        print(f"Saved merged model to https://huggingface.co/{username}/{save_directory.lstrip('/')}")
     pass

     save_pretrained_settings["state_dict"] = None
@@ -478,7 +529,7 @@ def unsloth_save_model(
     for _ in range(3):
         torch.cuda.empty_cache()
         gc.collect()
-    return save_directory
+    return save_directory, username
 pass
@@ -494,7 +545,7 @@ def install_llama_cpp_make_non_blocking():
     n_jobs = max(int(psutil.cpu_count()*1.5), 1)
     # Force make clean
     os.system("make clean -C llama.cpp")
-    full_command = ["make", "all", "-j", str(n_jobs), "-C", "llama.cpp"]
+    full_command = ["make", "all", "-j"+str(n_jobs), "-C", "llama.cpp"]
     run_installer = subprocess.Popen(full_command, env = env, stdout = subprocess.DEVNULL, stderr = subprocess.STDOUT)
     return run_installer
 pass
@@ -507,10 +558,44 @@ def install_python_non_blocking(packages = []):
 pass


+def install_llama_cpp_old(version = -10):
+    # Download the 10th latest release since the latest might be broken!
+    # FALLBACK mechanism
+    releases = subprocess.check_output(["git", "ls-remote", "--tags", "https://github.com/ggerganov/llama.cpp.git"])
+    releases = releases.decode("utf-8").replace("\t", " ").split("\n")
+    for i, x in enumerate(releases):
+        if "refs/tags/b" not in x: break
+    releases = releases[:i]
+    latest = releases[-1]
+    version = releases[version].split(" ")[0]
+
+    # Clone a specific commit
+    commands = [
+        "git clone https://github.com/ggerganov/llama.cpp",
+        f"cd llama.cpp && git reset --hard {version} && git clean -df && "\
+        f"make clean && LLAMA_CUBLAS=1 make all -j{psutil.cpu_count()*2}",
+        "pip install gguf protobuf",
+    ]
+    for command in commands:
+        with subprocess.Popen(command, shell = True, stdout = subprocess.PIPE, bufsize = 1) as sp:
+            for line in sp.stdout:
+                print(line.decode("utf-8"), flush = True, end = "")
+        pass
+    pass
+    # Check if successful
+    if not os.path.exists("llama.cpp/quantize"):
+        raise RuntimeError(
+            "Unsloth: llama.cpp GGUF seems to be too buggy to install.\n"\
+            "File a report to llama.cpp's main repo since this is not an Unsloth issue."
+        )
+    pass
+pass
+
+
 def install_llama_cpp_blocking():
     commands = [
         "git clone https://github.com/ggerganov/llama.cpp",
-        f"cd llama.cpp && make clean && LLAMA_CUBLAS=1 make all -j {psutil.cpu_count()*2}",
+        f"cd llama.cpp && make clean && LLAMA_CUBLAS=1 make all -j{psutil.cpu_count()*2}",
         "pip install gguf protobuf",
     ]
     if os.path.exists("llama.cpp"): return
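`install_llama_cpp_old` above picks the Nth-latest llama.cpp release tag by parsing `git ls-remote --tags` output. The same parsing can be sketched offline on canned output; the commit hashes and tag names below are made up for illustration:

```python
# Canned `git ls-remote --tags`-style bytes instead of a live network call
releases = (
    b"1111aaaa\trefs/tags/b2001\n"
    b"2222bbbb\trefs/tags/b2002\n"
    b"3333cccc\trefs/tags/b2003\n"
    b"4444dddd\trefs/tags/other\n"
)
releases = releases.decode("utf-8").replace("\t", " ").split("\n")

# Keep only the leading run of release tags (refs/tags/b...)
for i, x in enumerate(releases):
    if "refs/tags/b" not in x: break
releases = releases[:i]

# Index -1 picks the latest kept tag; -10 would pick the 10th latest,
# which is the fallback used above when the newest build is broken
version = releases[-1].split(" ")[0]
print(version)
```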
@@ -563,10 +648,13 @@ def save_to_gguf(

     print("Unsloth: [0] Installing llama.cpp. This will take 3 minutes...")
     if _run_installer is not None:
-        _run_installer.wait()
+        error = _run_installer.wait()
     else:
+        error = 0
         install_llama_cpp_blocking()
     pass
+    # Check if successful. If not install 10th latest release
+    if error != 0 or not os.path.exists("llama.cpp/quantize"): install_llama_cpp_old(-10)

     if quantization_method == "f32": first_conversion = "f32"
     elif quantization_method == "f16": first_conversion = "f16"
@@ -580,15 +668,18 @@ def save_to_gguf(
             first_conversion = "f16"
         pass
     pass
-    print(f"Unsloth: [1] Converting HF into {first_conversion} GGUF format. This will take 3 minutes...")

     n_cpus = psutil.cpu_count()*2
     # Concurrency from https://rentry.org/llama-cpp-conversions#merging-loras-into-a-model

     final_location = f"./{model_directory}-unsloth.{first_conversion.upper()}.gguf"

+    print(f"Unsloth: [1] Converting model at {model_directory} into {first_conversion} GGUF format.\n"\
+          f"The output location will be {final_location}\n"\
+          "This will take 3 minutes...")
+
     command = f"python llama.cpp/convert.py {model_directory} "\
-        f"--outfile {final_location} "\
+        f"--outfile {final_location} --vocab-type hfft "\
         f"--outtype {first_conversion} --concurrency {n_cpus}"

     with subprocess.Popen(command, shell = True, stdout = subprocess.PIPE, stderr = subprocess.PIPE, bufsize = 1) as sp:
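The conversion step above shells out to llama.cpp's `convert.py` with the new `--vocab-type hfft` flag. A sketch of how the output path and command string are assembled; the model directory and CPU count are hypothetical stand-ins, and the command is only built here, not executed:

```python
# Hypothetical inputs
model_directory = "my_model"
first_conversion = "f16"
n_cpus = 8

# Same f-string assembly as in save_to_gguf above
final_location = f"./{model_directory}-unsloth.{first_conversion.upper()}.gguf"
command = f"python llama.cpp/convert.py {model_directory} "\
    f"--outfile {final_location} --vocab-type hfft "\
    f"--outtype {first_conversion} --concurrency {n_cpus}"
print(command)
```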
@@ -601,7 +692,8 @@ def save_to_gguf(
     # Check if quantization succeeded!
     if not os.path.isfile(final_location):
         raise RuntimeError(
-            "Unsloth: Quantization failed! You might have to compile llama.cpp yourself, then run this again.\n"\
+            f"Unsloth: Quantization failed for {final_location}\n"\
+            "You might have to compile llama.cpp yourself, then run this again.\n"\
             "You do not need to close this Python program. Run the following commands in a new terminal:\n"\
             "You must run this in the same folder as you're saving your model.\n"\
             "git clone https://github.com/ggerganov/llama.cpp\n"\
@@ -662,7 +754,7 @@ def unsloth_save_pretrained_merged(
     save_peft_format : bool = True,
     tags : List[str] = None,
     temporary_location : str = "_unsloth_temporary_saved_buffers",
-    maximum_memory_usage : float = 0.85,
+    maximum_memory_usage : float = 0.85,
 ):
     """
     Same as .save_pretrained(...) except 4bit weights are auto
@@ -695,14 +787,14 @@ def unsloth_push_to_hub_merged(
     tokenizer = None,
     save_method : str = "merged_16bit", # ["lora", "merged_16bit", "merged_4bit"]
     use_temp_dir : Optional[bool] = None,
-    commit_message : Optional[str] = None,
+    commit_message : Optional[str] = "Trained with Unsloth",
     private : Optional[bool] = None,
     token : Union[bool, str, None] = None,
     max_shard_size : Union[int, str, None] = "5GB",
     create_pr : bool = False,
     safe_serialization : bool = True,
     revision : str = None,
-    commit_description : str = None,
+    commit_description : str = "Upload model trained with Unsloth 2x faster",
     tags : Optional[List[str]] = None,
     temporary_location : str = "_unsloth_temporary_saved_buffers",
     maximum_memory_usage : float = 0.85,
@@ -760,15 +852,27 @@ This {model_type} model was trained 2x faster with [Unsloth](https://github.com/
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
"""

-def upload_to_huggingface(model, save_directory, token, method, extra = "", file_location = None):
+def upload_to_huggingface(
+    model,
+    save_directory,
+    token,
+    method,
+    extra = "",
+    file_location = None,
+    old_username = None,
+    private = None,
+):
     # Check for username
     username = ""
     save_directory = save_directory.lstrip("./")
     if "/" not in save_directory:
         from huggingface_hub import whoami
         try:
-            username = whoami()['name']
-            save_directory = f"{save_directory}/{username}"
+            username = whoami(token = token)["name"]
+            if type(old_username) is str and username != old_username:
+                username = old_username
+            pass
+            save_directory = f"{username}/{save_directory}"
         except:
             raise RuntimeError(f"Unsloth: {save_directory} is not a Huggingface directory.")
     else:
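The reworked `upload_to_huggingface` above prefixes the repo id with the `whoami` username unless an `old_username` captured earlier in the save step overrides it. A sketch of that resolution logic with `whoami` stubbed out so it runs offline; the function names and usernames here are hypothetical:

```python
def fake_whoami(token = None):
    # Stand-in for huggingface_hub.whoami
    return {"name": "current-user"}

def resolve_repo_id(save_directory, old_username = None):
    save_directory = save_directory.lstrip("./")
    if "/" not in save_directory:
        username = fake_whoami()["name"]
        # Prefer the username captured earlier in the save step, if different
        if type(old_username) is str and username != old_username:
            username = old_username
        save_directory = f"{username}/{save_directory}"
    return save_directory

print(resolve_repo_id("my-model"))
print(resolve_repo_id("my-model", old_username = "org"))
print(resolve_repo_id("someone/my-model"))
```

Note the diff also fixes the original bug where the id was built as `{save_directory}/{username}` instead of `{username}/{save_directory}`.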
@@ -776,24 +880,28 @@ def upload_to_huggingface(model, save_directory, token, method, extra = "", file
     pass

     from huggingface_hub import create_repo
-    create_repo(
-        repo_id = save_directory,
-        token = token,
-        repo_type = "model",
-        exist_ok = True,
-    )
+    try:
+        create_repo(
+            repo_id = save_directory,
+            token = token,
+            repo_type = "model",
+            exist_ok = False,
+            private = private,
+        )

-    # Create model card
-    from huggingface_hub import ModelCard
-    content = MODEL_CARD.format(
-        username = username,
-        base_model = model.config._name_or_path,
-        model_type = model.config.model_type,
-        method = "",
-        extra = extra,
-    )
-    card = ModelCard(content)
-    card.push_to_hub(save_directory, token = token)
+        # Create model card
+        from huggingface_hub import ModelCard
+        content = MODEL_CARD.format(
+            username = username,
+            base_model = model.config._name_or_path,
+            model_type = model.config.model_type,
+            method = "",
+            extra = extra,
+        )
+        card = ModelCard(content)
+        card.push_to_hub(save_directory, token = token)
+    except:
+        pass

     if file_location is not None:
         # Now upload file
@@ -811,6 +919,7 @@ def upload_to_huggingface(model, save_directory, token, method, extra = "", file
         path_in_repo = uploaded_location,
         repo_id = save_directory,
         repo_type = "model",
+        commit_message = "(Trained with Unsloth)",
     )

     # We also upload a config.json file
@@ -823,6 +932,7 @@ def upload_to_huggingface(model, save_directory, token, method, extra = "", file
         path_in_repo = "config.json",
         repo_id = save_directory,
         repo_type = "model",
+        commit_message = "(Trained with Unsloth)",
     )
     os.remove("_temporary_unsloth_config.json")
 pass
@@ -838,6 +948,7 @@ def unsloth_save_pretrained_gguf(
     first_conversion : str = "f16",
     push_to_hub : bool = False,
     token : Optional[Union[str, bool]] = None,
+    private : Optional[bool] = None,
     is_main_process : bool = True,
     state_dict : Optional[dict] = None,
     save_function : Callable = torch.save,
@@ -847,7 +958,7 @@ def unsloth_save_pretrained_gguf(
     save_peft_format : bool = True,
     tags : List[str] = None,
     temporary_location : str = "_unsloth_temporary_saved_buffers",
-    maximum_memory_usage : float = 0.85,
+    maximum_memory_usage : float = 0.85,
 ):
     """
     Same as .save_pretrained(...) except 4bit weights are auto
@@ -898,11 +1009,11 @@ def unsloth_save_pretrained_gguf(
         python_install = install_python_non_blocking(["gguf", "protobuf"])
         git_clone.wait()
         makefile = install_llama_cpp_make_non_blocking()
-        new_save_directory = unsloth_save_model(**arguments)
+        new_save_directory, old_username = unsloth_save_model(**arguments)
         python_install.wait()
     else:
         try:
-            new_save_directory = unsloth_save_model(**arguments)
+            new_save_directory, old_username = unsloth_save_model(**arguments)
             makefile = None
         except:
             # Retry by recloning llama.cpp
@@ -910,7 +1021,7 @@ def unsloth_save_pretrained_gguf(
             python_install = install_python_non_blocking(["gguf", "protobuf"])
             git_clone.wait()
             makefile = install_llama_cpp_make_non_blocking()
-            new_save_directory = unsloth_save_model(**arguments)
+            new_save_directory, old_username = unsloth_save_model(**arguments)
             python_install.wait()
         pass
     pass
@@ -924,12 +1035,12 @@ def unsloth_save_pretrained_gguf(
         print("Unsloth: Uploading GGUF to Huggingface Hub...")
         username = upload_to_huggingface(
             self, save_directory, token,
-            "GGUF converted", "gguf", file_location,
+            "GGUF converted", "gguf", file_location, old_username, private,
         )
         link = f"{username}/{new_save_directory.lstrip('/.')}" \
             if username not in new_save_directory else \
             new_save_directory.lstrip('/.')
-        print(f"Saved to https://huggingface.co/{link}")
+        print(f"Saved GGUF to https://huggingface.co/{link}")
     pass
 pass
@@ -941,14 +1052,14 @@ def unsloth_push_to_hub_gguf(
     quantization_method : str = "fast_quantized",
     first_conversion : str = "f16",
     use_temp_dir : Optional[bool] = None,
-    commit_message : Optional[str] = None,
+    commit_message : Optional[str] = "Trained with Unsloth",
     private : Optional[bool] = None,
     token : Union[bool, str, None] = None,
     max_shard_size : Union[int, str, None] = "5GB",
     create_pr : bool = False,
     safe_serialization : bool = True,
     revision : str = None,
-    commit_description : str = None,
+    commit_description : str = "Upload model trained with Unsloth 2x faster",
     tags : Optional[List[str]] = None,
     temporary_location : str = "_unsloth_temporary_saved_buffers",
     maximum_memory_usage : float = 0.85,
@@ -998,19 +1109,19 @@ def unsloth_push_to_hub_gguf(
         python_install = install_python_non_blocking(["gguf", "protobuf"])
         git_clone.wait()
         makefile = install_llama_cpp_make_non_blocking()
-        new_save_directory = unsloth_save_model(**arguments)
+        new_save_directory, old_username = unsloth_save_model(**arguments)
         python_install.wait()
     else:
         try:
-            new_save_directory = unsloth_save_model(**arguments)
+            new_save_directory, old_username = unsloth_save_model(**arguments)
             makefile = None
         except:
             # Retry by recloning llama.cpp
             git_clone = install_llama_cpp_clone_non_blocking()
             python_install = install_python_non_blocking(["gguf", "protobuf"])
             git_clone.wait()
-            makefile = install_llama_cpp_make_non_blocking()
-            new_save_directory = unsloth_save_model(**arguments)
+            makefile = install_llama_cpp_make_non_blocking()
+            new_save_directory, old_username = unsloth_save_model(**arguments)
             python_install.wait()
         pass
     pass
@@ -1023,12 +1134,12 @@ def unsloth_push_to_hub_gguf(
         print("Unsloth: Uploading GGUF to Huggingface Hub...")
         username = upload_to_huggingface(
             self, repo_id, token,
-            "GGUF converted", "gguf", file_location,
+            "GGUF converted", "gguf", file_location, old_username, private,
         )
         link = f"{username}/{new_save_directory.lstrip('/.')}" \
             if username not in new_save_directory else \
             new_save_directory.lstrip('/.')
-        print(f"Saved to https://huggingface.co/{link}")
+        print(f"Saved GGUF to https://huggingface.co/{link}")
     pass
@@ -1038,31 +1149,17 @@ def patch_saving_functions(model):
     import types
     from typing import Callable, Optional, Union, List

-    if hasattr(model, "_original_push_to_hub"): return
-
-    # First check if this has already been called, and revert it
-    original_model = model
-    while True:
-        if hasattr(original_model, "_original_push_to_hub"):
-            original_model.push_to_hub = original_model._original_push_to_hub
-            del original_model._original_push_to_hub
-            if hasattr(original_model, "push_to_hub_merged"):     del original_model.push_to_hub_merged
-            if hasattr(original_model, "save_pretrained_merged"): del original_model.save_pretrained_merged
-            if hasattr(original_model, "push_to_hub_gguf"):       del original_model.push_to_hub_gguf
-            if hasattr(original_model, "save_pretrained_gguf"):   del original_model.save_pretrained_gguf
-        pass
-
-        if hasattr(original_model, "model"): original_model = original_model.model
-        else: break
+    # And now re add our saving methods!
+    if model.push_to_hub.__name__ == "unsloth_push_to_hub":
+        original_push_to_hub = model.original_push_to_hub
+    else:
+        original_push_to_hub = model.push_to_hub
+    pass

-    # And now re add our saving methods!
-    original_push_to_hub = model.push_to_hub
     signature = str(inspect.signature(original_push_to_hub)).replace("NoneType", "None")
     signature = signature[1:]
     signature = re.sub("<function save at .+?>", "torch.save", signature)
     docs = original_push_to_hub.__doc__.encode("utf-8").decode("utf-8")
-    model._original_push_to_hub = original_push_to_hub

     push_to_hub_text = f'''def unsloth_push_to_hub(self, {signature}:
     """
@@ -1077,11 +1174,45 @@ def patch_saving_functions(model):
         arguments["tags"] = ["unsloth",]
     elif hasattr(self, "add_model_tags"):
         self.add_model_tags(["unsloth",])

+    if "commit_message" in arguments:
+        commit_message = arguments["commit_message"]
+        if commit_message is not None:
+            if not commit_message.endswith(" "): commit_message += " "
+            if "Unsloth" not in commit_message:
+                commit_message += "(Trained with Unsloth)"
+        else:
+            commit_message = "Upload model trained with Unsloth"
+        arguments["commit_message"] = commit_message
+
+    if "commit_description" in arguments:
+        commit_description = arguments["commit_description"]
+        if commit_description is not None:
+            if not commit_description.endswith(" "): commit_description += " "
+            if "Unsloth" not in commit_description:
+                commit_description += "(Trained with Unsloth 2x faster)"
+        else:
+            commit_description = "Upload model trained with Unsloth 2x faster"
+        arguments["commit_description"] = commit_description
+
+    # Update model tag
+    if hasattr(self, "config"):
+        _ = upload_to_huggingface(
+            self, arguments["repo_id"], arguments["token"],
+            "finetuned", "trl", file_location = None,
+            old_username = None, private = arguments["private"],
+        )
+    pass
+
     try:
-        return self._original_push_to_hub(**arguments)
+        self.original_push_to_hub(**arguments)
     except:
         del arguments["tags"]
-        return self._original_push_to_hub(**arguments)
+        self.original_push_to_hub(**arguments)
     pass
+
+    if hasattr(self, "config"):
+        print("Saved model to https://huggingface.co/" + arguments["repo_id"])
+    pass
 '''
 exec(push_to_hub_text, globals())
@@ -1089,12 +1220,12 @@ def patch_saving_functions(model):
     original_model = model
     while True:

-        if not hasattr(original_model, "_original_push_to_hub"):
-            original_model._original_push_to_hub = original_model.push_to_hub
+        if original_model.push_to_hub.__name__ != "unsloth_push_to_hub":
+            original_model.original_push_to_hub = original_model.push_to_hub
             original_model.push_to_hub = types.MethodType(unsloth_push_to_hub, original_model)

+            if hasattr(original_model, "add_model_tags"):
+                original_model.add_model_tags(["unsloth",])
+            pass
         pass

         if hasattr(original_model, "model"): original_model = original_model.model
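The patched loop above walks the chain of model wrappers via `.model`, saving each object's original `push_to_hub` and binding the replacement with `types.MethodType`, and uses the method's `__name__` to avoid double-patching. A standalone sketch with dummy nested objects in place of a PEFT wrapper around a base model (the class and return strings are hypothetical):

```python
import types

class Dummy:
    def push_to_hub(self):
        return "original"

def unsloth_push_to_hub(self):
    # Patched method falls back to the saved original
    return "unsloth:" + self.original_push_to_hub()

inner = Dummy()
outer = Dummy()
outer.model = inner  # wrapper chain, like PeftModel -> base model

original_model = outer
while True:
    # Only patch once: skip objects already carrying the unsloth method
    if original_model.push_to_hub.__name__ != "unsloth_push_to_hub":
        original_model.original_push_to_hub = original_model.push_to_hub
        original_model.push_to_hub = types.MethodType(unsloth_push_to_hub, original_model)
    if hasattr(original_model, "model"): original_model = original_model.model
    else: break

print(outer.push_to_hub(), inner.push_to_hub())
```

Because the `__name__` check replaces the old `hasattr` guard, re-running `patch_saving_functions` is idempotent rather than wrapping the wrapper a second time.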