Nunchaku SANA Models

To launch the Gradio demo, run:

python run_gradio.py

By default, the Gemma-2B model is loaded as a safety checker. To disable this feature and save GPU memory, use --no-safety-checker.
By default, only the INT4 DiT is loaded. Use -p int4 bf16 to add a BF16 DiT for side-by-side comparison, or -p bf16 to load only the BF16 model.
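To make the flag semantics above concrete, here is a minimal sketch of how options like these could be parsed with argparse. It is not the actual parser in run_gradio.py: the long option name `--precisions`, the `choices` list, and the help strings are assumptions made for illustration. Only `-p`, its `int4`/`bf16` values, the INT4 default, and `--no-safety-checker` come from the notes above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented demo flags."""
    parser = argparse.ArgumentParser(description="Sketch of the demo options")
    # "-p" accepts one or more precisions; "--precisions" is an assumed long name.
    parser.add_argument(
        "-p",
        "--precisions",
        nargs="+",
        choices=["int4", "bf16"],
        default=["int4"],
        help="Which DiT precision(s) to load",
    )
    # The safety checker is on unless this flag is passed.
    parser.add_argument(
        "--no-safety-checker",
        action="store_true",
        help="Skip loading the safety checker to save GPU memory",
    )
    return parser


# Parsing "-p int4 bf16" yields both precisions for a side-by-side comparison.
args = build_parser().parse_args(["-p", "int4", "bf16"])
print(args.precisions, args.no_safety_checker)
```

With `nargs="+"`, a single flag can carry one or several values, which matches the way `-p int4 bf16` and `-p bf16` are both described as valid.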

We provide a script, generate.py, that generates an image from a text prompt directly from the command line, similar to the demo. Simply run:

python generate.py --prompt "Your Text Prompt"

The generated image is saved as output.png by default. Use -o or --output-path to specify a different path.
By default, the script uses our INT4 model. To use the BF16 model instead, specify -p bf16.
You can adjust the number of inference steps and the classifier-free guidance scale with -t and -g, respectively. The defaults are 20 steps and a guidance scale of 5.
You can also adjust the perturbed-attention guidance (PAG) scale with --pag-scale. The default is 2.
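If you want to drive generate.py from another Python program, the options above can be assembled into a command line. The sketch below is one possible wrapper, not part of this repository: the function name `build_generate_command` and its parameter names are hypothetical, and it uses only the flags documented above (`--prompt`, `-o`, `-p`, `-t`, `-g`, `--pag-scale`). Any option left as `None` is omitted so the script's own default applies.

```python
import shlex
import sys
from typing import List, Optional


def build_generate_command(
    prompt: str,
    output_path: Optional[str] = None,
    precision: Optional[str] = None,
    steps: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    pag_scale: Optional[float] = None,
) -> List[str]:
    """Hypothetical helper: build an argument list for generate.py."""
    cmd = [sys.executable, "generate.py", "--prompt", prompt]
    # Each option is appended only when set, so generate.py keeps its defaults.
    if output_path is not None:
        cmd += ["-o", output_path]
    if precision is not None:
        cmd += ["-p", precision]
    if steps is not None:
        cmd += ["-t", str(steps)]
    if guidance_scale is not None:
        cmd += ["-g", str(guidance_scale)]
    if pag_scale is not None:
        cmd += ["--pag-scale", str(pag_scale)]
    return cmd


# Print the shell-quoted command; pass the list to subprocess.run(cmd, check=True)
# from the directory containing generate.py to actually run it.
cmd = build_generate_command("a cat on a sofa", output_path="cat.png", precision="bf16", steps=30)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when the prompt contains spaces or shell metacharacters.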

To measure the latency of our INT4 models, use the following command:

python latency.py

Adjust the number of inference steps and the guidance scale using -t and -g, respectively. The defaults are 20 steps and a guidance scale of 5.
You can also adjust the PAG scale with --pag-scale. The default is 2.
By default, the script measures the end-to-end latency for generating a single image. To measure the latency of a single DiT forward step instead, use the --mode step flag.
Specify the number of warmup and test runs using --warmup-times and --test-times. The defaults are 2 warmup runs and 10 test runs.
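The warmup/test split described above follows a common benchmarking pattern: the first few runs are executed but not timed, and only the later runs are measured. The sketch below illustrates that pattern in plain Python. It is not the code in latency.py; the function name `measure_latency` and its signature are assumptions, and the defaults simply mirror the documented 2 warmup and 10 test runs.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(
    fn: Callable[[], object],
    warmup_times: int = 2,
    test_times: int = 10,
) -> List[float]:
    """Hypothetical helper: time fn() after discarding warmup runs."""
    # Warmup runs absorb one-time costs (caching, lazy initialization).
    for _ in range(warmup_times):
        fn()
    timings = []
    for _ in range(test_times):
        start = time.perf_counter()
        fn()
        # Note: GPU work is asynchronous, so a real GPU benchmark would need
        # to synchronize the device (e.g. torch.cuda.synchronize()) before
        # reading the timer. This sketch times CPU-side calls only.
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload for illustration; replace with the call you want to time.
timings = measure_latency(lambda: sum(range(100_000)))
print(f"mean latency: {statistics.mean(timings) * 1000:.3f} ms over {len(timings)} runs")
```

Returning the raw per-run timings rather than a single average leaves the caller free to report the mean, median, or spread.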