v1.2.1 - free lunch edition
Features
This release speeds up all validations without any config changes.
- SageAttention (NVIDIA-only; must be installed manually for now)
  - By default, this only speeds up inference. SDXL benefits more than Flux due to differences in their respective bottlenecks.
  - Use `--attention_mechanism=sageattention` to enable it, and `--sageattention_usage=training+inference` to enable it for training as well as validations, though the latter will probably degrade your model or cause it to collapse. (See the example commands after this list.)
- Optimised `--gradient_checkpointing` implementation
  - It no longer applies during validations, so even without SageAttention, validation time (on a 4090 + 5800X3D) drops from 29 seconds to 15 seconds per Flux image (and from 15 seconds to 6 seconds for SDXL).
- Added `--gradient_checkpointing_interval`, which you can use to speed up Flux training at the cost of some additional VRAM. (See the sketch after this list.)
  - This makes NF4 even more attractive for a 4090, where you can then use the SOAP optimiser in a meaningful way.
  - See the options guide for more information.
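As a rough illustration of the SageAttention flags, here is a minimal sketch of a launch command. The `train.py` entrypoint and the omission of all other required arguments are assumptions for illustration; only the two flags themselves come from this release:

```bash
# Hypothetical entrypoint; substitute your actual launch script and
# the rest of your usual arguments.
# Enable SageAttention for inference/validations only (the default usage
# once the mechanism is selected). NVIDIA-only; install it manually for now.
python train.py --attention_mechanism=sageattention

# Optionally extend it to training as well. As noted above, this will
# probably degrade the model or cause it to collapse.
python train.py --attention_mechanism=sageattention \
    --sageattention_usage=training+inference
```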
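Likewise, a hedged sketch of combining the checkpointing flags; the interval value of 4 is an arbitrary example, not a recommendation from this release:

```bash
# Hypothetical entrypoint again. Checkpointing only every few blocks
# trades a little extra VRAM for faster Flux training; tune the interval
# to your GPU's memory headroom.
python train.py --gradient_checkpointing \
    --gradient_checkpointing_interval=4
```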
What's Changed
- Add SageAttention for substantial training speed-up by @bghira in #1182
- SageAttention: make it inference-only by default by @bghira in #1183
- gradient checkpointing speed-up by @bghira in #1184
- add gradient checkpointing option to docs by @bghira in #1185
- merge by @bghira in #1186
Full Changelog: v1.2...v1.2.1