simpletuner v1.1 - SOAP optim, faster training with torch compile, FP8 for ROCm users and more
This is an automated archive made by the Lemmit Bot.
The original was posted on /r/stablediffusion by /u/terminusresearchorg on 2024-10-01 21:02:31+00:00.
Performance
- Improved launch speed for large datasets (>1M samples)
- Improved speed for quantising on CPU
- Optional support for quantising directly on the GPU, near-instantly (`--quantize_via`)
Compatibility
- SDXL, SD1.5 and SD2.x compatibility with LyCORIS training
- Updated documentation to make multi-GPU configuration a bit more obvious.
- Improved support for `torch.compile()`, including automatically disabling it when e.g. `fp8-quanto` is enabled
  - Enable via `accelerate config` or `config/config.env` via `TRAINER_DYNAMO_BACKEND=inductor` (see the one-line snippet after this list)
- TorchAO for quantisation as an alternative to Optimum Quanto for int8 weight-only quantisation (`int8-torchao`)
- `fp8uz-quanto`, a compatibility level for AMD ROCm users to experiment with FP8 training dynamics (a selection sketch follows this list)
- Support for multi-GPU PEFT LoRA training with Quanto enabled (not `fp8-quanto`)
  - Previously, only LyCORIS would reliably work with quantised multi-GPU training sessions.
- Ability to quantise models during full fine-tuning, without warning or error. Previously, this configuration was blocked. Your mileage may vary; it's an experimental configuration.
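For the torch.compile item above, here is a one-line sketch of what goes into `config/config.env`; the same value can be given when `accelerate config` asks about the dynamo backend:

```bash
# config/config.env -- route the trainer through torch.compile's inductor backend
TRAINER_DYNAMO_BACKEND=inductor
```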
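And for picking one of the quantisation backends, a hedged sketch: `--base_model_precision` is my assumption about the flag name (check OPTIONS.md for the real one), while the `int8-torchao` and `fp8uz-quanto` values come from the list above:

```bash
# Sketch only: --base_model_precision is an assumed flag name; see OPTIONS.md for the real option.
python train.py --base_model_precision=int8-torchao   # TorchAO int8 weight-only quantisation
python train.py --base_model_precision=fp8uz-quanto   # FP8 compatibility level for AMD ROCm users
```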
Integrations
- Images now get logged to TensorBoard (thanks u/anhi)
- FastAPI endpoints for integrations (undocumented)
- "raw" webhook type that sends a large number of HTTP requests containing events, useful for push notification type service
Optims
- SOAP optimiser support
  - uses fp32 gradients; accurate, but uses more memory than other optimisers, and by default it slows down every 10 steps while it runs its preconditioning step
- New 8-bit and 4-bit optimiser options from TorchAO (`ao-adamw8bit`, `ao-adamw4bit`, etc.)
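A quick sketch of selecting these optimisers; `--optimizer` is my assumption about the flag name (see OPTIONS.md for the exact spelling), while the values are the ones named above:

```bash
# Sketch only: --optimizer is an assumed flag name; check OPTIONS.md for the exact spelling.
python train.py --optimizer=soap           # fp32 gradients; preconditions every 10 steps by default
python train.py --optimizer=ao-adamw8bit   # TorchAO 8-bit AdamW
python train.py --optimizer=ao-adamw4bit   # TorchAO 4-bit AdamW
```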
Schnell
Recently we discovered that a LyCORIS LoKr trained on Flux.1 Dev works perfectly fine on Flux.1 Schnell at just 4 steps, and that the problems with transferring adapters over are specific to LoRA.
No special training is needed, other than to just train on Dev instead of Schnell.
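In practice that means: train the LoKr against Dev as usual, then load it onto Schnell and sample at 4 steps. A rough inference sketch, assuming diffusers' `FluxPipeline`, the `lycoris` package's `create_lycoris_from_weights` helper, and a hypothetical adapter path:

```python
import torch
from diffusers import FluxPipeline
from lycoris import create_lycoris_from_weights

# Hypothetical path to a LyCORIS LoKr that was trained against Flux.1 Dev.
adapter_path = "output/lokr/pytorch_lora_weights.safetensors"

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

# Wrap the Schnell transformer with the Dev-trained LoKr and merge its weights in.
wrapper, _ = create_lycoris_from_weights(1.0, adapter_path, pipe.transformer)
wrapper.merge_to()

# Schnell is step-distilled: 4 steps, no CFG.
image = pipe(
    "a photo of a corgi wearing a tiny wizard hat",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("schnell_lokr_sample.png")
```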
The release:
The quickstart:
Some docs have been updated for v1.1, mostly `OPTIONS.md` and the FLUX quickstart.