v1.2.1 - free lunch edition
Features
This release speeds up all validations without any config changes.
- SageAttention (NVIDIA-only; must be installed manually for now)
  - By default, this only speeds up inference. SDXL benefits more than Flux due to differences in their respective bottlenecks.
  - Use `--attention_mechanism=sageattention` to enable it, and `--sageattention_usage=training+inference` to enable it for training as well as validations, though the latter will probably degrade your model or cause it to collapse. (See the example commands after this list.)
- Optimised `--gradient_checkpointing` implementation
  - It no longer applies during validations, so even without SageAttention, validation time (on a 4090 + 5800X3D) drops from 29 seconds to 15 seconds per Flux image (and from 15 seconds to 6 seconds for SDXL).
- Added `--gradient_checkpointing_interval`, which you can use to speed up Flux training at the cost of some additional VRAM. (See the sketch after this list.)
  - This makes NF4 even more attractive for a 4090, where you can then use the SOAP optimiser in a meaningful way.
  - See the options guide for more information.
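As a rough illustration of the SageAttention flags, here is a minimal sketch of a launch command. The `train.py` entrypoint and the omission of all other required arguments are assumptions for illustration; only the two flags themselves come from this release:

```bash
# Hypothetical entrypoint; substitute your actual launch script and
# the rest of your usual arguments.
# Enable SageAttention for inference/validations only (the default usage
# once the mechanism is selected). NVIDIA-only; install it manually for now.
python train.py --attention_mechanism=sageattention

# Optionally extend it to training as well. As noted above, this will
# probably degrade the model or cause it to collapse.
python train.py --attention_mechanism=sageattention \
    --sageattention_usage=training+inference
```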
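Likewise, a hedged sketch of combining the checkpointing flags; the interval value of 4 is an arbitrary example, not a recommendation from this release:

```bash
# Hypothetical entrypoint again. Checkpointing only every few blocks
# trades a little extra VRAM for faster Flux training; tune the interval
# to your GPU's memory headroom.
python train.py --gradient_checkpointing \
    --gradient_checkpointing_interval=4
```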
What's Changed
- Add SageAttention for substantial training speed-up by @bghira in #1182
- SageAttention: make it inference-only by default by @bghira in #1183
- gradient checkpointing speed-up by @bghira in #1184
- add gradient checkpointing option to docs by @bghira in #1185
- merge by @bghira in #1186
Full Changelog: v1.2...v1.2.1