unsloth/unsloth-cli.py
Daniel Han d6bb89ad44 Formatting & bug fixes (#3563)
Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: jeromeku <jerome.ku@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com>
Co-authored-by: parth2510 <parthguptapg7326@gmail.com>
Co-authored-by: Dan Saunders <danjsaund@gmail.com>
2025-11-07 06:00:22 -08:00

393 lines
13 KiB
Python
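
The script below formats each dataset row into an Alpaca-style prompt inside `formatting_prompts_func`, which `dataset.map(..., batched = True)` calls with a column-oriented batch (a dict of lists). As a minimal, dependency-free sketch of that step: the names `ALPACA_PROMPT`, `format_batch`, and the sample rows here are illustrative, and the `"</s>"` EOS string is an assumed placeholder (the script itself reads `tokenizer.eos_token`).

```python
# Illustrative sketch only: mirrors the batched formatting step of the script
# without needing a tokenizer or the `datasets` library.
ALPACA_PROMPT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n"
    "### Instruction:\n{}\n"
    "### Input:\n{}\n"
    "### Response:\n{}"
)

# Assumption: placeholder end-of-sequence marker; the script uses tokenizer.eos_token.
EOS_TOKEN = "</s>"


def format_batch(examples, eos_token=EOS_TOKEN):
    """Turn a column-oriented batch into a dict with a single 'text' column."""
    texts = []
    for instruction, input_text, output in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        # The EOS token is appended so the model learns where a response ends.
        texts.append(ALPACA_PROMPT.format(instruction, input_text, output) + eos_token)
    return {"text": texts}


# A batch as `datasets` would pass it with batched=True: one list per column.
batch = {
    "instruction": ["Add the numbers.", "Translate to French."],
    "input": ["2 and 3", "Good morning"],
    "output": ["5", "Bonjour"],
}
formatted = format_batch(batch)
print(formatted["text"][0])
```

Returning a dict with a new `"text"` key means the mapped dataset gains a `text` column with one formatted string per row.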

#!/usr/bin/env python3
"""
🦥 Starter Script for Fine-Tuning FastLanguageModel with Unsloth

This script is designed as a starting point for fine-tuning your models using Unsloth.
It includes configurable options for model loading, PEFT parameters, training arguments,
and model saving/pushing functionality.

You will likely want to customize this script to suit your specific use case
and requirements.

Here are a few suggestions for customization:
    - Modify the dataset loading and preprocessing steps to match your data.
    - Customize the model saving and pushing configurations.

Usage (most options have valid default values; this is an extended example for demonstration purposes):
    python unsloth-cli.py --model_name "unsloth/llama-3-8b" --max_seq_length 8192 --dtype None --load_in_4bit \
    --r 64 --lora_alpha 32 --lora_dropout 0.1 --bias "none" --use_gradient_checkpointing "unsloth" \
    --random_state 3407 --use_rslora --per_device_train_batch_size 4 --gradient_accumulation_steps 8 \
    --warmup_steps 5 --max_steps 400 --learning_rate 2e-6 --logging_steps 1 --optim "adamw_8bit" \
    --weight_decay 0.005 --lr_scheduler_type "linear" --seed 3407 --output_dir "outputs" \
    --report_to "tensorboard" --save_model --save_path "model" --quantization_method "f16" \
    --push_model --hub_path "hf/model" --hub_token "your_hf_token"

To see a full list of configurable options, use:
    python unsloth-cli.py --help

Happy fine-tuning!
"""
import argparse
import os
def run(args):
import torch
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers.utils import strtobool
from trl import SFTTrainer, SFTConfig
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
import logging
logging.getLogger("hf-to-gguf").setLevel(logging.WARNING)
# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = args.model_name,
max_seq_length = args.max_seq_length,
dtype = args.dtype,
load_in_4bit = args.load_in_4bit,
)
# Configure PEFT model
model = FastLanguageModel.get_peft_model(
model,
r = args.r,
target_modules = [
"q_proj",
"k_proj",
"v_proj",
"o_proj",
"gate_proj",
"up_proj",
"down_proj",
],
lora_alpha = args.lora_alpha,
lora_dropout = args.lora_dropout,
bias = args.bias,
use_gradient_checkpointing = args.use_gradient_checkpointing,
random_state = args.random_state,
use_rslora = args.use_rslora,
loftq_config = args.loftq_config,
)

    alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

    # The EOS token must be appended, otherwise generation may never stop
    EOS_TOKEN = tokenizer.eos_token

    def formatting_prompts_func(examples):
        # Build one Alpaca-style training string per (instruction, input, output) row
        instructions = examples["instruction"]
        inputs = examples["input"]
        outputs = examples["output"]
        texts = []
        # `input_text` avoids shadowing the built-in `input`
        for instruction, input_text, output in zip(instructions, inputs, outputs):
            text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
            texts.append(text)
        return {"text": texts}

    use_modelscope = strtobool(os.environ.get("UNSLOTH_USE_MODELSCOPE", "False"))
    if use_modelscope:
        from modelscope import MsDataset

        dataset = MsDataset.load(args.dataset, split = "train")
    else:
        # Load and format dataset
        dataset = load_dataset(args.dataset, split = "train")
    dataset = dataset.map(formatting_prompts_func, batched = True)
    print("Data is formatted and ready!")

    # Configure training arguments
    training_args = SFTConfig(
        per_device_train_batch_size = args.per_device_train_batch_size,
        gradient_accumulation_steps = args.gradient_accumulation_steps,
        warmup_steps = args.warmup_steps,
        max_steps = args.max_steps,
        learning_rate = args.learning_rate,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = args.logging_steps,
        optim = args.optim,
        weight_decay = args.weight_decay,
        lr_scheduler_type = args.lr_scheduler_type,
        seed = args.seed,
        output_dir = args.output_dir,
        report_to = args.report_to,
        max_length = args.max_seq_length,
        dataset_num_proc = 2,
        packing = False,
    )

    # Initialize trainer
    trainer = SFTTrainer(
        model = model,
        processing_class = tokenizer,
        train_dataset = dataset,
        args = training_args,
    )

    # Train model
    trainer_stats = trainer.train()

    # Save model
    if args.save_model:
        if args.save_gguf:
            # If args.quantization is a list, save the model once per quantization method
            if isinstance(args.quantization, list):
                for quantization_method in args.quantization:
                    print(
                        f"Saving model with quantization method: {quantization_method}"
                    )
                    model.save_pretrained_gguf(
                        args.save_path,
                        tokenizer,
                        quantization_method = quantization_method,
                    )
                    if args.push_model:
                        model.push_to_hub_gguf(
                            args.hub_path,
                            tokenizer,
                            quantization_method = quantization_method,
                            token = args.hub_token,
                        )
            else:
                print(f"Saving model with quantization method: {args.quantization}")
                model.save_pretrained_gguf(
                    args.save_path, tokenizer, quantization_method = args.quantization
                )
                if args.push_model:
                    model.push_to_hub_gguf(
                        args.hub_path,
                        tokenizer,
                        quantization_method = args.quantization,
                        token = args.hub_token,
                    )
        else:
            model.save_pretrained_merged(args.save_path, tokenizer, args.save_method)
            if args.push_model:
                # Push to the Hub repository, not the local save path
                model.push_to_hub_merged(
                    args.hub_path,
                    tokenizer,
                    save_method = args.save_method,
                    token = args.hub_token,
                )
    else:
        print("Warning: The model is not saved!")


if __name__ == "__main__":
    # Define argument parser
    parser = argparse.ArgumentParser(
        description = "🦥 Fine-tune your LLM faster using Unsloth!"
    )

    model_group = parser.add_argument_group("🤖 Model Options")
    model_group.add_argument(
        "--model_name",
        type = str,
        default = "unsloth/llama-3-8b",
        help = "Model name to load",
    )
    model_group.add_argument(
        "--max_seq_length",
        type = int,
        default = 2048,
        help = "Maximum sequence length, default is 2048. We automatically support RoPE scaling internally!",
    )
    model_group.add_argument(
        "--dtype",
        type = str,
        default = None,
        help = "Data type for model (None for auto detection)",
    )
    model_group.add_argument(
        "--load_in_4bit",
        action = "store_true",
        help = "Use 4-bit quantization to reduce memory usage",
    )
    model_group.add_argument(
        "--dataset",
        type = str,
        default = "yahma/alpaca-cleaned",
        help = "Hugging Face dataset to use for training",
    )

    lora_group = parser.add_argument_group(
        "🧠 LoRA Options", "These options are used to configure the LoRA model."
    )
    lora_group.add_argument(
        "--r",
        type = int,
        default = 16,
        help = "Rank for the LoRA model, default is 16. (common values: 8, 16, 32, 64, 128)",
    )
    lora_group.add_argument(
        "--lora_alpha",
        type = int,
        default = 16,
        help = "LoRA alpha parameter, default is 16. (common values: 8, 16, 32, 64, 128)",
    )
    lora_group.add_argument(
        "--lora_dropout",
        type = float,
        default = 0.0,
        help = "LoRA dropout rate, default is 0.0 which is optimized.",
    )
    lora_group.add_argument(
        "--bias", type = str, default = "none", help = "Bias setting for LoRA"
    )
    lora_group.add_argument(
        "--use_gradient_checkpointing",
        type = str,
        default = "unsloth",
        help = "Use gradient checkpointing",
    )
    lora_group.add_argument(
        "--random_state",
        type = int,
        default = 3407,
        help = "Random state for reproducibility, default is 3407.",
    )
    lora_group.add_argument(
        "--use_rslora", action = "store_true", help = "Use rank-stabilized LoRA"
    )
    lora_group.add_argument(
        "--loftq_config", type = str, default = None, help = "Configuration for LoftQ"
    )

    training_group = parser.add_argument_group("🎓 Training Options")
    training_group.add_argument(
        "--per_device_train_batch_size",
        type = int,
        default = 2,
        help = "Batch size per device during training, default is 2.",
    )
    training_group.add_argument(
        "--gradient_accumulation_steps",
        type = int,
        default = 4,
        help = "Number of gradient accumulation steps, default is 4.",
    )
    training_group.add_argument(
        "--warmup_steps",
        type = int,
        default = 5,
        help = "Number of warmup steps, default is 5.",
    )
    training_group.add_argument(
        "--max_steps", type = int, default = 400, help = "Maximum number of training steps."
    )
    training_group.add_argument(
        "--learning_rate",
        type = float,
        default = 2e-4,
        help = "Learning rate, default is 2e-4.",
    )
    training_group.add_argument(
        "--optim", type = str, default = "adamw_8bit", help = "Optimizer type."
    )
    training_group.add_argument(
        "--weight_decay",
        type = float,
        default = 0.01,
        help = "Weight decay, default is 0.01.",
    )
    training_group.add_argument(
        "--lr_scheduler_type",
        type = str,
        default = "linear",
        help = "Learning rate scheduler type, default is 'linear'.",
    )
    training_group.add_argument(
        "--seed",
        type = int,
        default = 3407,
        help = "Seed for reproducibility, default is 3407.",
    )

    # Report/Logging arguments
    report_group = parser.add_argument_group("📊 Report Options")
    report_group.add_argument(
        "--report_to",
        type = str,
        default = "tensorboard",
        choices = [
            "azure_ml",
            "clearml",
            "codecarbon",
            "comet_ml",
            "dagshub",
            "dvclive",
            "flyte",
            "mlflow",
            "neptune",
            "tensorboard",
            "wandb",
            "all",
            "none",
        ],
        help = "The list of integrations to report the results and logs to. Supported platforms are: \n\t\t 'azure_ml', 'clearml', 'codecarbon', 'comet_ml', 'dagshub', 'dvclive', 'flyte', 'mlflow', 'neptune', 'tensorboard', and 'wandb'. Use 'all' to report to all integrations installed, 'none' for no integrations.",
    )
    report_group.add_argument(
        "--logging_steps", type = int, default = 1, help = "Logging steps, default is 1"
    )

    # Saving and pushing arguments
    save_group = parser.add_argument_group("💾 Save Model Options")
    save_group.add_argument(
        "--output_dir", type = str, default = "outputs", help = "Output directory"
    )
    save_group.add_argument(
        "--save_model", action = "store_true", help = "Save the model after training"
    )
    save_group.add_argument(
        "--save_method",
        type = str,
        default = "merged_16bit",
        choices = ["merged_16bit", "merged_4bit", "lora"],
        help = "Save method for the model, default is 'merged_16bit'",
    )
    save_group.add_argument(
        "--save_gguf",
        action = "store_true",
        help = "Convert the model to GGUF after training",
    )
    save_group.add_argument(
        "--save_path", type = str, default = "model", help = "Path to save the model"
    )
    save_group.add_argument(
        "--quantization",
        type = str,
        default = "q8_0",
        nargs = "+",
        help = "Quantization method(s) for saving the model. Common values: 'f16', 'q4_k_m', 'q8_0'. See our wiki for all quantization methods: https://github.com/unslothai/unsloth/wiki#saving-to-gguf",
    )

    push_group = parser.add_argument_group("🚀 Push Model Options")
    push_group.add_argument(
        "--push_model",
        action = "store_true",
        help = "Push the model to Hugging Face hub after training",
    )
    push_group.add_argument(
        "--push_gguf",
        action = "store_true",
        help = "Push the model as GGUF to Hugging Face hub after training",
    )
    push_group.add_argument(
        "--hub_path",
        type = str,
        default = "hf/model",
        help = "Path on Hugging Face hub to push the model",
    )
    push_group.add_argument(
        "--hub_token", type = str, help = "Token for pushing the model to Hugging Face hub"
    )

    args = parser.parse_args()
    run(args)