Commit graph

122 commits

Author SHA1 Message Date
Michael Han
9807456b29 Update README.md 2025-02-09 19:57:15 -08:00
Diogo Neves
36c3d36e74 Fixed Triton url (#1607)
Triton's link was pointing to the old research URL
2025-02-08 19:41:39 -08:00
Michael Han
74fce13683 Update README.md 2025-02-06 17:20:19 -08:00
Michael Han
789af5b7f9 Update README.md 2025-01-30 21:05:45 -08:00
Michael Han
748d1f1fd0 Update README.md
Updating super old benchmarks
2025-01-26 14:11:58 -08:00
Michael Han
b4c3b5eea9 Update README.md 2025-01-20 22:13:07 -08:00
Michael Han
e3162dc5bf Update README.md
Update to benchmark tables
2025-01-14 23:20:07 -08:00
Michael Han
08c330b7cc Update README.md 2025-01-11 17:34:51 -08:00
Michael Han
9569392187 Merge pull request #1515 from unslothai/shimmyshimmer-patch-1
Update README.md for Notebooks
2025-01-10 10:13:04 -08:00
Michael Han
db14c7f182 Update README.md 2025-01-09 16:59:43 -08:00
Michael Han
59d7cd9888 Update README.md 2025-01-08 23:02:27 -08:00
Daniel Han
63782ea3af Bug fixes (#1516)
* use exact model name

* Update save.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* print

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update vision.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update loader.py

* accurate_accumulation

* Update loader.py

* Update loader.py

* Update _utils.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update loader.py

* Update pyproject.toml

* Update __init__.py

* Update pyproject.toml

* Update __init__.py

* Update __init__.py

* Fix Triton heuristics

https://github.com/triton-lang/triton/issues/5224

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Xformers

* Update loader.py

* Update loader.py

* Rewind

* Update _utils.py

* Update _utils.py

* requires grad

* Update loader.py

* Update _utils.py

* Update loader.py

* changing model to base_model if peft model is already used

* Improve debugging experience (#1512)

* Create CONTRIBUTING.md (#1472)

Creating contributing guidelines

* Update CONTRIBUTING.md

improved sentence

* Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
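
The bullet above describes conditionally redirecting stdout during `unsloth_compile_transformers` based on the UNSLOTH_DISABLE_LOGGER environment variable. A minimal sketch of that pattern follows; the helper name `compile_log_context` and the polarity of the flag are assumptions, not the repository's actual implementation:

```python
import contextlib
import io
import os

def compile_log_context():
    # Hypothetical helper: assumed behavior is that setting
    # UNSLOTH_DISABLE_LOGGER=1 swallows stdout produced while
    # compiling; otherwise output passes through unchanged.
    if os.environ.get("UNSLOTH_DISABLE_LOGGER", "0") == "1":
        return contextlib.redirect_stdout(io.StringIO())
    return contextlib.nullcontext()

with compile_log_context():
    print("compiling model...")  # suppressed only when the env var is "1"
```

Checking the variable once per call (rather than at import time) lets users toggle logging between runs in the same process.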

* Update loader.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit b7ddf962d2.

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Auto change is_bfloat16_supported

* Update llama.py

* Force data-type

* Update llama.py

* All attention refactor fix (#1491)

* change initialization of n_heads, n_kv_heads, hidden_size in llama.py

* do the same for cohere, mistral, gemma2, granite

* do the same for flexattention,cohere, mistral, granite

* Update llama.py

* Update llama.py

* Update granite to work with latest post_patch methods (#1502)

* Update granite to work with latest post_patch methods

* Pass position_embeddings for granite even if transformers<4.47

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Minor fixes for granite models (#1503)

* Update granite.py

Grab residual multiplier directly from layer

* Update llama.py

Version should read >= 4.47.1 as that is the version requiring the changes

* Update granite.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* support modelscope models and datasets (#1481)

* support modelscope

* change modelscope args

* remove useless import

* remove useless import

* fix

* wip

* fix

* remove useless code

* add readme

* add some comments

* change print to raise error

* update comment

* Update loader.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

---------

Co-authored-by: Itsuro Tajima <tajima@georepublic.de>
Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Z <coffeevampirebusiness@gmail.com>
Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com>
2025-01-07 04:23:14 -08:00
Michael Han
4ce92cfe2c Update README.md
Notebook links
2025-01-07 02:02:59 -08:00
Scott Phillips
104eeac1db Fix loader.py to work on Windows (#1453)
* Update README.md

Llama 3.3 + Reddit

* Update README.md

Apple ML Cross Entropy

* Update README.md

Removing double citation

* Fix loader.py to work on Windows

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-12-20 02:20:15 -08:00
Edd
eaee5ddfa9 Add citation section to README.md (#1377)
* Add citation section to README.md

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-12-04 23:59:13 -08:00
Michael Han
da7cdb2c8c Update README.md
Unsloth Dynamic 4-bit Quantization Update
2024-12-04 21:32:23 -08:00
Michael Han
16cf998173 Update README.md
Fixing Qwen links
2024-12-03 16:50:52 -08:00
Daniel Han
6d34ab821b Vision (#1318)
* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Update README.md

---------

Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>
2024-11-21 11:24:12 -08:00
Daniel Han
2dca0cb94b Bug fixes (#1288)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

* Update _utils.py

* fix/transformers-unpack (#1180)

* Fix DPO, ORPO (#1177)

* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

* n_items

* Update cross_entropy_loss.py

* Fix DPO, ORPO

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* donot upcast lm_head and embeddings to float32 (#1186)

* Cleanup upcast logs (#1188)

* Fix/phi-longrope (#1193)

* Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding

* Typo

* Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache

* Update llama.py

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update transformers

* Unk token issues

* Update _utils.py

* Fix pad token

* Update llama.py

* Typo

* ignored labels

* Revert "ignored labels"

This reverts commit 4b25138ac7.

* More patching

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Feat/all tmp (#1219)

* Update save.py

Check whether path is in /tmp dir for Kaggle environment

* Update save.py

Move temporary_location to /tmp in Kaggle

* Enhance Kaggle environment support in save and tokenizer utilities

---------

Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>

* Bug fixes

* Update pyproject.toml

* Update _utils.py

* Update __init__.py

* Update __init__.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Tied weights

* Revert "Tied weights"

This reverts commit 820cd4efef.

* Tied weights

* Utils

* CE Loss patching

* Update __init__.py

* Update __init__.py

* Patching

* Update cross_entropy_loss.py

* CE Loss

* Update _utils.py

* Update _utils.py

* CE Loss

* Update _utils.py

* Update _utils.py

* Layernorm

* Update _utils.py

* Update _utils.py

* Post patch

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* typing

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* int64

* Update _utils.py

* Update cross_entropy_loss.py

* constexpr

* constexpr

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* CE

* Update cross_entropy_loss.py

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update utils.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* Update rms_layernorm.py

* typing

* Update rope_embedding.py

* types

* Disable compiling

* Update _utils.py

* Update _utils.py

* Forward hook

* Update _utils.py

* Update llama.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update pyproject.toml

* Update _utils.py

* Update llama.py

* CE Loss

* Update cross_entropy_loss.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update cross_entropy_loss.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)

* Fix: cast logits to float32 in cross_entropy_forward to prevent errors

* Update cross_entropy_loss.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Throw error when inferencing longer than max_position_embeddings (#1236)

* Throw error when inferencing longer than max_position_embeddings without rope scaling

* Update llama.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
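
The PR above adds an error when inference runs past the model's trained context window without RoPE scaling. A sketch of that guard, under assumptions; `check_generation_length` is a hypothetical name, not the function the PR actually modifies:

```python
def check_generation_length(input_length, max_position_embeddings,
                            rope_scaling=None):
    """Raise if the prompt is longer than the model's trained context
    window and no RoPE scaling is configured (hypothetical helper
    mirroring the check the commit describes)."""
    if rope_scaling is None and input_length > max_position_embeddings:
        raise ValueError(
            f"Input length {input_length} exceeds max_position_embeddings "
            f"{max_position_embeddings}; enable RoPE scaling for longer "
            f"sequences."
        )

check_generation_length(2048, 4096)  # within the window: no error
```

Failing loudly here is preferable to silently producing garbage once positions exceed what the rotary embeddings were trained on.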

* CLI now handles user input strings for dtype correctly (#1235)

Co-authored-by: root <root@ieeres.chu.cam.ac.uk>

* Update flex_attention.py

* Update _utils.py

* Update _utils.py

* Update flex_attention.py

* Update flex_attention.py

* Update loader.py

* Update loader.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* Update _utils.py

* Update cross_entropy_loss.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* triton_cast

* Update utils.py

* Qwen 2.5 Coder

* Fix/export mistral (#1281)

* Enhance install_python_non_blocking to handle protobuf installation and process management

* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"

This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"

This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.

* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* DOC Update - Update README.md with os.environ in example (#1269)

* Update README.md with os.environ in example

Added os.environ to the example to avoid device conflicts; at least in a Jupyter notebook, this lets the user select a GPU in a multi-GPU setup.
Currently the Unsloth init checks all GPUs and takes the first in the order, which can be an issue when some GPUs are in use yet still appear in the list, so this os config is needed to avoid that manually.
A small change, but a time saver for anyone who copies the tutorials verbatim.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
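
The os.environ change described above amounts to pinning the process to one GPU before any CUDA-aware import runs, since device enumeration happens at initialization time. A sketch of the pattern; assuming the standard CUDA_VISIBLE_DEVICES variable is what the README example sets:

```python
import os

# Must be set *before* importing torch/unsloth: CUDA device
# enumeration is fixed at library initialization time.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only the second GPU

# from unsloth import FastLanguageModel  # import after setting the variable
```

Setting the variable after the import has no effect, which is why the README places it at the top of the example.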

* fix/get_chat_template (#1246)

* Refactor `get_chat_template` to now support system messages. It is supposed to fix the Ollama tokenizer chat template

* Remove type hinting

* Update chat_templates.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix/sft-trainer (#1276)

* Add patch for SFTTrainer to maintain backward compatibility with TRL changes

* Update trainer.py

* Update trainer.py

* Refactor trainer patch to maintain backward compatibility with TRL changes

* Update trainer.py

* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update __init__.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update tokenizer_utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
2024-11-13 19:05:40 -08:00
Daniel Han
e7ede2f7db Torch 2.5 2024-10-26 18:03:15 -07:00
Daniel Han
4c85177719 Many bug fixes (#1162)
* Fix TRL

* Update mistral.py

* Patch processing_class

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Installation guide (#1165)

* chore: update chat_templates.py (#1166)

orginal -> original

* Disable Flex Attention

* Update tokenizer_utils.py

* Update _utils.py

---------

Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
2024-10-23 03:14:57 -07:00
Daniel Han
139c3b29b3 Update README.md 2024-10-17 20:46:11 -07:00
Daniel Han
3a33dad3c9 Update README.md 2024-10-17 20:45:40 -07:00
Daniel Han
3c47723bb2 Update README.md 2024-10-01 00:40:17 -07:00
Daniel Han
88a542a129 Update README.md 2024-09-26 00:12:42 -07:00
Daniel Han
6bbca3aaa8 Update README.md 2024-09-26 00:05:38 -07:00
Daniel Han
4f4ef22035 Update README.md 2024-09-26 00:02:15 -07:00
Daniel Han
9c26f9d3bb Update README.md 2024-09-23 01:36:50 -07:00
Daniel Han
45ca9501a4 Qwen 2.5 2024-09-23 01:27:12 -07:00
Daniel Han
1d4ae059c5 Update README.md (#1036) 2024-09-18 13:23:45 -07:00
Daniel Han
c5d7bb591d Update README.md (#1033) 2024-09-15 17:42:09 -07:00
Daniel Han
ffb6aa905f Update README.md 2024-09-08 14:30:54 -07:00
Daniel Han
1bba6954f1 Update README.md 2024-09-08 12:29:31 -07:00
Daniel Han
353991f14a Phi 3.5 bug fix (#955)
* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* update token retrieval logic (#952)

* Fix DPO (#947)

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* update hf token retrieval logic

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update llama.py

* get_token

* Update README.md

---------

Co-authored-by: Hafedh <70411813+not-lain@users.noreply.github.com>
2024-08-23 17:38:24 -07:00
Daniel Han
cadff4f883 Update README.md (#941)
Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>
2024-08-20 17:59:50 -07:00
Daniel Han
fb60340a90 Phi 3.5 (#940)
* LongRoPE

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mapper.py

* Phi 3.5
2024-08-20 16:51:39 -07:00
Daniel Han
0927c34392 Update README.md (#938) 2024-08-19 17:18:30 -07:00
Daniel Han
9be6480ec5 Update README.md 2024-08-05 00:00:53 -07:00
Daniel Han
ba87b3dd31 Update README.md 2024-08-04 23:59:57 -07:00
Daniel Han
d9e330ded7 Update README.md 2024-08-04 23:50:40 -07:00
emuchogu
fe4b9da764 pascal support (#870)
Co-authored-by: Edward Muchogu <muchogu@gmail.com>
2024-08-04 23:45:51 -07:00
Daniel Han
2521a8b39f Update README.md 2024-07-31 09:50:11 -07:00
Daniel Han
4e03b77673 Gemma (#843)
* bugs

* Update _utils.py

* flash-attn softcapping

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update gemma2.py

* Update mapper.py

* Update README.md

* Update _utils.py
2024-07-31 08:54:58 -07:00
Daniel Han
27b23a9bd4 Update README.md 2024-07-23 15:08:09 -07:00
Daniel Han
dd781e0c60 Update README.md 2024-07-23 12:07:27 -07:00
Daniel Han
faa36e853a Update README.md 2024-07-23 11:51:08 -07:00
Daniel Han
56cbd06f1f Llama 3.1 (#797)
* Llama 3.1

* Update _utils.py

* Llama 3.1

* Update _utils.py

* Update llama.py

* Update llama.py

* hack for rotary

* patch RoPE

* refix rope

* Update _utils.py

* Update llama.py

* Llama 3.1 check

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py
2024-07-23 11:40:49 -07:00
Daniel Han
256b55fcdd Update README.md 2024-07-19 03:05:15 -07:00
Daniel Han
b8e6560b8d Update README.md 2024-07-19 03:03:50 -07:00
Daniel Han
2510a4abc4 Gemma2 (#723)
* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

---------

Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>
2024-07-03 12:12:21 -07:00