Michael Han
9807456b29
Update README.md
2025-02-09 19:57:15 -08:00
Diogo Neves
36c3d36e74
Fixed Triton URL (#1607)
Triton's link was pointing to the old research URL
2025-02-08 19:41:39 -08:00
Michael Han
74fce13683
Update README.md
2025-02-06 17:20:19 -08:00
Michael Han
789af5b7f9
Update README.md
2025-01-30 21:05:45 -08:00
Michael Han
748d1f1fd0
Update README.md
Updating super old benchmarks
2025-01-26 14:11:58 -08:00
Michael Han
b4c3b5eea9
Update README.md
2025-01-20 22:13:07 -08:00
Michael Han
e3162dc5bf
Update README.md
Update to benchmark tables
2025-01-14 23:20:07 -08:00
Michael Han
08c330b7cc
Update README.md
2025-01-11 17:34:51 -08:00
Michael Han
9569392187
Merge pull request #1515 from unslothai/shimmyshimmer-patch-1
Update README.md for Notebooks
2025-01-10 10:13:04 -08:00
Michael Han
db14c7f182
Update README.md
2025-01-09 16:59:43 -08:00
Michael Han
59d7cd9888
Update README.md
2025-01-08 23:02:27 -08:00
Daniel Han
63782ea3af
Bug fixes (#1516)
* use exact model name
* Update save.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* print
* Update _utils.py
* Update _utils.py
* Update llama.py
* Update _utils.py
* Update vision.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update loader.py
* accurate_accumulation
* Update loader.py
* Update loader.py
* Update _utils.py
* Update loader.py
* Update loader.py
* Update loader.py
* Update loader.py
* Update pyproject.toml
* Update __init__.py
* Update pyproject.toml
* Update __init__.py
* Update __init__.py
* Fix Triton heuristics
https://github.com/triton-lang/triton/issues/5224
* Update __init__.py
* Update __init__.py
* Update __init__.py
* Update __init__.py
* Xformers
* Update loader.py
* Update loader.py
* Rewind
* Update _utils.py
* Update _utils.py
* requires grad
* Update loader.py
* Update _utils.py
* Update loader.py
* changing model to base_model if peft model is already used
* Improve debugging experience (#1512)
* Create CONTRIBUTING.md (#1472)
Creating contributing guidelines
* Update CONTRIBUTING.md
improved sentence
* Improve logging control in `unsloth_compile_transformers` by conditionally redirecting stdout based on UNSLOTH_DISABLE_LOGGER environment variable
---------
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
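The logging-control bullet above describes conditionally redirecting stdout based on the UNSLOTH_DISABLE_LOGGER environment variable. A minimal sketch of that pattern follows; it is an illustration of the idea, not the library's actual code, and the helper name `maybe_quiet` is hypothetical (only the `UNSLOTH_DISABLE_LOGGER` variable name comes from the commit message):

```python
import contextlib
import io
import os

def maybe_quiet(fn, *args, **kwargs):
    """Run fn, sending its stdout to a throwaway buffer only when the
    UNSLOTH_DISABLE_LOGGER environment variable is set to "1"."""
    if os.environ.get("UNSLOTH_DISABLE_LOGGER", "0") == "1":
        with contextlib.redirect_stdout(io.StringIO()):
            return fn(*args, **kwargs)
    return fn(*args, **kwargs)
```

The redirect is scoped to the call, so normal output resumes as soon as the function returns.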
* Update loader.py
* Update llama.py
* Update llama.py
* Revert "Update llama.py"
This reverts commit b7ddf962d2.
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Auto change is_bfloat16_supported
* Update llama.py
* Force data-type
* Update llama.py
* All attention refactor fix (#1491)
* change initialization of n_heads, n_kv_heads, hidden_size in llama.py
* do the same for cohere, mistral, gemma2, granite
* do the same for flexattention,cohere, mistral, granite
* Update llama.py
* Update llama.py
* Update granite to work with latest post_patch methods (#1502)
* Update granite to work with latest post_patch methods
* Pass position_embeddings for granite even if transformers<4.47
* Update llama.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Minor fixes for granite models (#1503)
* Update granite.py
Grab residual multiplier directly from layer
* Update llama.py
Version should read >= 4.47.1 as that is the version requiring the changes
* Update granite.py
* Update llama.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* support modelscope models and datasets (#1481)
* support modelscope
* change modelscope args
* remove useless import
* remove useless import
* fix
* wip
* fix
* remove useless code
* add readme
* add some comments
* change print to raise error
* update comment
* Update loader.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
---------
Co-authored-by: Itsuro Tajima <tajima@georepublic.de>
Co-authored-by: Muhammad Osama <muhammadosama1994@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nino Risteski <95188570+NinoRisteski@users.noreply.github.com>
Co-authored-by: Kareem <81531392+KareemMusleh@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: Z <coffeevampirebusiness@gmail.com>
Co-authored-by: tastelikefeet <58414341+tastelikefeet@users.noreply.github.com>
2025-01-07 04:23:14 -08:00
Michael Han
4ce92cfe2c
Update README.md
Notebook links
2025-01-07 02:02:59 -08:00
Scott Phillips
104eeac1db
Fix loader.py to work on Windows (#1453)
* Update README.md
Llama 3.3 + Reddit
* Update README.md
Apple ML Cross Entropy
* Update README.md
Removing double citation
* Fix loader.py to work on Windows
---------
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
2024-12-20 02:20:15 -08:00
Edd
eaee5ddfa9
Add citation section to README.md (#1377)
* Add citation section to README.md
* Update README.md
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-12-04 23:59:13 -08:00
Michael Han
da7cdb2c8c
Update README.md
Unsloth Dynamic 4-bit Quantization Update
2024-12-04 21:32:23 -08:00
Michael Han
16cf998173
Update README.md
Fixing Qwen links
2024-12-03 16:50:52 -08:00
Daniel Han
6d34ab821b
Vision (#1318)
* Add files via upload
* Add files via upload
* Add files via upload
* Add files via upload
* Update README.md
* Update README.md
* Update README.md
* Update README.md
---------
Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>
2024-11-21 11:24:12 -08:00
Daniel Han
2dca0cb94b
Bug fixes (#1288)
* Fix TRL
* Update mistral.py
* Patch processing_class
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Installation guide (#1165)
* chore: update chat_templates.py (#1166)
orginal -> original
* Disable Flex Attention
* Update tokenizer_utils.py
* Update _utils.py
* n_items
* Update cross_entropy_loss.py
* Fix DPO, ORPO
* Update _utils.py
* Update _utils.py
* fix/transformers-unpack (#1180)
* Fix DPO, ORPO (#1177)
* Fix TRL
* Update mistral.py
* Patch processing_class
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Installation guide (#1165)
* chore: update chat_templates.py (#1166)
orginal -> original
* Disable Flex Attention
* Update tokenizer_utils.py
* Update _utils.py
* n_items
* Update cross_entropy_loss.py
* Fix DPO, ORPO
* Update _utils.py
---------
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
* Add warning for missing Unpack and KwargsForCausalLM in older Transformers versions
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
* Update cross_entropy_loss.py
* Update _utils.py
* Update _utils.py
* do not upcast lm_head and embeddings to float32 (#1186)
* Cleanup upcast logs (#1188)
* Fix/phi-longrope (#1193)
* Enhance rotary embedding handling in LlamaAttention and LongRopeRotaryEmbedding
* Typo
* Improve rotary embedding handling in LlamaAttention to prevent errors with short KV cache
* Update llama.py
* Update llama.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Update transformers
* Unk token issues
* Update _utils.py
* Fix pad token
* Update llama.py
* Typo
* ignored labels
* Revert "ignored labels"
This reverts commit 4b25138ac7.
* More patching
* Update _utils.py
* Update _utils.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Feat/all tmp (#1219)
* Update save.py
Check whether path is in /tmp dir for Kaggle environment
* Update save.py
Move temporary_location to /tmp in Kaggle
* Enhance Kaggle environment support in save and tokenizer utilities
---------
Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
* Bug fixes
* Update pyproject.toml
* Update _utils.py
* Update __init__.py
* Update __init__.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Tied weights
* Revert "Tied weights"
This reverts commit 820cd4efef.
* Tied weights
* Utils
* CE Loss patching
* Update __init__.py
* Update __init__.py
* Patching
* Update cross_entropy_loss.py
* CE Loss
* Update _utils.py
* Update _utils.py
* CE Loss
* Update _utils.py
* Update _utils.py
* Layernorm
* Update _utils.py
* Update _utils.py
* Post patch
* Update _utils.py
* Update llama.py
* Update _utils.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* typing
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* int64
* Update _utils.py
* Update cross_entropy_loss.py
* constexpr
* constexpr
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* CE
* Update cross_entropy_loss.py
* Update _utils.py
* Update llama.py
* Update _utils.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update utils.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* Update rms_layernorm.py
* typing
* Update rope_embedding.py
* types
* Disable compiling
* Update _utils.py
* Update _utils.py
* Forward hook
* Update _utils.py
* Update llama.py
* Update _utils.py
* Update llama.py
* Update llama.py
* Update _utils.py
* Update pyproject.toml
* Update _utils.py
* Update llama.py
* CE Loss
* Update cross_entropy_loss.py
* Update _utils.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update cross_entropy_loss.py
* Update llama.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Fix: cast logits to float32 in cross_entropy_forward to prevent errors (#1254)
* Fix: cast logits to float32 in cross_entropy_forward to prevent errors
* Update cross_entropy_loss.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Throw error when inferencing longer than max_position_embeddings (#1236)
* Throw error when inferencing longer than max_position_embeddings without rope scaling
* Update llama.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* CLI now handles user input strings for dtype correctly (#1235)
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
* Update flex_attention.py
* Update _utils.py
* Update _utils.py
* Update flex_attention.py
* Update flex_attention.py
* Update loader.py
* Update loader.py
* Update flex_attention.py
* Update flex_attention.py
* Update flex_attention.py
* Update flex_attention.py
* Update _utils.py
* Update cross_entropy_loss.py
* Update _utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* triton_cast
* Update utils.py
* Qwen 2.5 Coder
* Fix/export mistral (#1281)
* Enhance install_python_non_blocking to handle protobuf installation and process management
* Revert "Enhance install_python_non_blocking to handle protobuf installation and process management"
This reverts commit a3b796a05841fb8d93c652c845591e12cf81ea93.
* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266
* Revert "Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266"
This reverts commit f00fbf5eac7ad4f5d48c70b98d770255d1a9ef58.
* Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION to 'python' to address issue #1266
* Update __init__.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* DOC Update - Update README.md with os.environ in example (#1269)
* Update README.md with os.environ in example
Added os.environ to the example to avoid device conflicts; for a user in a Jupyter notebook, at least, this allows selecting a GPU in a multi-GPU setup.
Currently the unsloth init checks all GPUs and takes the first in order, which can be an issue when some GPUs are in use but the list still shows them. To avoid this manually, this os config is required.
A small change, but a time saver for those who copy the tutorials verbatim.
* Update README.md
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
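The device-selection trick the README commit above documents can be sketched as follows. The GPU index "0" is an arbitrary example; the key point, per the commit description, is setting the variable before unsloth (or torch) initializes and enumerates devices:

```python
import os

# Hide all but one GPU from CUDA *before* importing unsloth/torch,
# so device discovery only sees the card you intend to use.
# "0" is an example index; substitute the GPU you actually want.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import unsloth  # must come after the environment variable is set
```

Setting the variable after import has no effect, since device enumeration happens once at startup.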
* fix/get_chat_template (#1246)
* Refactor `get_chat_template` to support a system message instead. It is supposed to fix the Ollama tokenizer chat template.
* Remove type hinting
* Update chat_templates.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* fix/sft-trainer (#1276)
* Add patch for SFTTrainer to maintain backward compatibility with TRL changes
* Update trainer.py
* Update trainer.py
* Refactor trainer patch to maintain backward compatibility with TRL changes
* Update trainer.py
* Refactor trainer.py to exclude non-convertible trainers from backward compatibility patch
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Update __init__.py
* Update trainer.py
* Update trainer.py
* Update trainer.py
* Update tokenizer_utils.py
---------
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Edd <68678137+Erland366@users.noreply.github.com>
Co-authored-by: Datta Nimmaturi <datta.nimmaturi@nutanix.com>
Co-authored-by: dendarrion <37800703+dendarrion@users.noreply.github.com>
Co-authored-by: Erland366 <erland.pg366@gmail.com>
Co-authored-by: Edwin Fennell <edwinfennell1@gmail.com>
Co-authored-by: root <root@ieeres.chu.cam.ac.uk>
Co-authored-by: Uday Girish Maradana <einsteingirish@gmail.com>
2024-11-13 19:05:40 -08:00
Daniel Han
e7ede2f7db
Torch 2.5
2024-10-26 18:03:15 -07:00
Daniel Han
4c85177719
Many bug fixes (#1162)
* Fix TRL
* Update mistral.py
* Patch processing_class
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Installation guide (#1165)
* chore: update chat_templates.py (#1166)
orginal -> original
* Disable Flex Attention
* Update tokenizer_utils.py
* Update _utils.py
---------
Co-authored-by: timothelaborie <97834767+timothelaborie@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
2024-10-23 03:14:57 -07:00
Daniel Han
139c3b29b3
Update README.md
2024-10-17 20:46:11 -07:00
Daniel Han
3a33dad3c9
Update README.md
2024-10-17 20:45:40 -07:00
Daniel Han
3c47723bb2
Update README.md
2024-10-01 00:40:17 -07:00
Daniel Han
88a542a129
Update README.md
2024-09-26 00:12:42 -07:00
Daniel Han
6bbca3aaa8
Update README.md
2024-09-26 00:05:38 -07:00
Daniel Han
4f4ef22035
Update README.md
2024-09-26 00:02:15 -07:00
Daniel Han
9c26f9d3bb
Update README.md
2024-09-23 01:36:50 -07:00
Daniel Han
45ca9501a4
Qwen 2.5
2024-09-23 01:27:12 -07:00
Daniel Han
1d4ae059c5
Update README.md (#1036)
2024-09-18 13:23:45 -07:00
Daniel Han
c5d7bb591d
Update README.md (#1033)
2024-09-15 17:42:09 -07:00
Daniel Han
ffb6aa905f
Update README.md
2024-09-08 14:30:54 -07:00
Daniel Han
1bba6954f1
Update README.md
2024-09-08 12:29:31 -07:00
Daniel Han
353991f14a
Phi 3.5 bug fix (#955)
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* update token retrieval logic (#952)
* Fix DPO (#947)
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* Update tokenizer_utils.py
* update hf token retrieval logic
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Update llama.py
* get_token
* Update README.md
---------
Co-authored-by: Hafedh <70411813+not-lain@users.noreply.github.com>
2024-08-23 17:38:24 -07:00
Daniel Han
cadff4f883
Update README.md (#941)
Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>
2024-08-20 17:59:50 -07:00
Daniel Han
fb60340a90
Phi 3.5 (#940)
* LongRoPE
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update _utils.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update mapper.py
* Phi 3.5
2024-08-20 16:51:39 -07:00
Daniel Han
0927c34392
Update README.md (#938)
2024-08-19 17:18:30 -07:00
Daniel Han
9be6480ec5
Update README.md
2024-08-05 00:00:53 -07:00
Daniel Han
ba87b3dd31
Update README.md
2024-08-04 23:59:57 -07:00
Daniel Han
d9e330ded7
Update README.md
2024-08-04 23:50:40 -07:00
emuchogu
fe4b9da764
Pascal support (#870)
Co-authored-by: Edward Muchogu <muchogu@gmail.com>
2024-08-04 23:45:51 -07:00
Daniel Han
2521a8b39f
Update README.md
2024-07-31 09:50:11 -07:00
Daniel Han
4e03b77673
Gemma (#843)
* bugs
* Update _utils.py
* flash-attn softcapping
* Update gemma2.py
* Update gemma2.py
* Update gemma2.py
* Update gemma2.py
* Update mapper.py
* Update README.md
* Update _utils.py
2024-07-31 08:54:58 -07:00
Daniel Han
27b23a9bd4
Update README.md
2024-07-23 15:08:09 -07:00
Daniel Han
dd781e0c60
Update README.md
2024-07-23 12:07:27 -07:00
Daniel Han
faa36e853a
Update README.md
2024-07-23 11:51:08 -07:00
Daniel Han
56cbd06f1f
Llama 3.1 (#797)
* Llama 3.1
* Update _utils.py
* Llama 3.1
* Update _utils.py
* Update llama.py
* Update llama.py
* hack for rotary
* patch RoPE
* refix rope
* Update _utils.py
* Update llama.py
* Llama 3.1 check
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
* Update llama.py
2024-07-23 11:40:49 -07:00
Daniel Han
256b55fcdd
Update README.md
2024-07-19 03:05:15 -07:00
Daniel Han
b8e6560b8d
Update README.md
2024-07-19 03:03:50 -07:00
Daniel Han
2510a4abc4
Gemma2 (#723)
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
---------
Co-authored-by: Michael <107991372+shimmyshimmer@users.noreply.github.com>
2024-07-03 12:12:21 -07:00