
Commit 8170dc3

sayakpaul, a-r-r-o-w, and yiyixuxu authored
[WIP][Training] Flux Control LoRA training script (huggingface#10130)
* update
* add
* update
* add control-lora conversion script; make flux loader handle norms; fix rank calculation assumption
* control lora updates
* remove copied-from
* create separate pipelines for flux control
* make fix-copies
* update docs
* add tests
* fix
* Apply suggestions from code review
  Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* remove control lora changes
* apply suggestions from review
* Revert "remove control lora changes"
  This reverts commit 73cfc51.
* update
* update
* improve log messages
* updates.
* updates
* support register_config.
* fix
* fix
* fix
* updates
* updates
* updates
* fix-copies
* fix
* apply suggestions from review
* add tests
* remove conversion script; enable on-the-fly conversion
* bias -> lora_bias.
* fix-copies
* peft.py
* fix lora conversion
* changes
  Co-authored-by: a-r-r-o-w <contact.aryanvs@gmail.com>
* fix-copies
* updates for tests
* fix
* alpha_pattern.
* add a test for varied lora ranks and alphas.
* revert changes in num_channels_latents = self.transformer.config.in_channels // 8
* revert moe
* add a sanity check on unexpected keys when loading norm layers.
* contro lora.
* fixes
* fixes
* fixes
* tests
* reviewer feedback
* fix
* proper peft version for lora_bias
* fix-copies
* updates
* updates
* updates
* remove debug code
* update docs
* integration tests
* nis
* fuse and unload.
* fix
* add slices.
* more updates.
* button up readme
* train()
* add full fine-tuning version.
* fixes
* Apply suggestions from code review
  Co-authored-by: Aryan <aryan@huggingface.co>
* set_grads_to_none remove.
* readme

---------

Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: a-r-r-o-w <contact.aryanvs@gmail.com>
1 parent 25f3e91 commit 8170dc3


4 files changed: +2746 -0 lines changed


examples/flux-control/README.md

+202
@@ -0,0 +1,202 @@
# Training Flux Control

This (experimental) example shows how to train Control LoRAs with [Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev) by conditioning it on additional structural controls (such as depth maps and poses). We also provide a script for full fine-tuning; refer to [this section](#full-fine-tuning). To learn more about the Flux Control family, refer to the following resources:

* [Docs](https://github.com/black-forest-labs/flux/blob/main/docs/structural-conditioning.md) by Black Forest Labs
* Diffusers docs ([1](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#canny-control), [2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#depth-control))

To incorporate additional condition latents, we expand the input features of Flux.1-Dev from 64 to 128. The first 64 channels correspond to the original input latents to be denoised, while the latter 64 channels correspond to the control latents. This expansion happens in the `x_embedder` layer, where the combined latents are projected to the feature dimension expected by the rest of the network. Inference is performed using the `FluxControlPipeline`.
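
To make the input expansion concrete, here is a minimal, dependency-free sketch of widening a linear layer's input. It uses toy sizes (3 to 6 inputs instead of 64 to 128) and plain Python lists in place of tensors. Zero-initializing the new columns is an assumption made for illustration, not something this README states; the helper names `linear` and `expand_in_features` are hypothetical.

```python
def linear(weight, bias, x):
    """Compute y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def expand_in_features(weight, extra_in):
    """Append `extra_in` zero columns to every row of W (assumed zero init)."""
    return [row + [0.0] * extra_in for row in weight]


# Toy layer: 3 input features, 2 output features.
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]

latents = [1.0, 2.0, 3.0]   # stands in for the original 64 latent channels
control = [9.0, -4.0, 7.0]  # stands in for the 64 control-latent channels

expanded = expand_in_features(weight, extra_in=len(control))

before = linear(weight, bias, latents)
after = linear(expanded, bias, latents + control)

# With zero-initialized control columns, the widened layer initially
# reproduces the original projection regardless of the control input.
print(before == after)
```

Under that assumption, training starts from the behavior of the original layer, and the control columns only begin to contribute as they are updated.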

> [!NOTE]
> **Gated model**
>
> As the model is gated, before using it with diffusers you first need to go to the [FLUX.1 [dev] Hugging Face page](https://huggingface.co/black-forest-labs/FLUX.1-dev), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate. Use the command below to log in:

```bash
huggingface-cli login
```

The example command below shows how to launch fine-tuning for pose conditions. The dataset ([`raulc0399/open_pose_controlnet`](https://huggingface.co/datasets/raulc0399/open_pose_controlnet)) being used here already has the pose conditions of the original images, so we don't have to compute them.

```bash
accelerate launch train_control_lora_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --dataset_name="raulc0399/open_pose_controlnet" \
  --output_dir="pose-control-lora" \
  --mixed_precision="bf16" \
  --train_batch_size=1 \
  --rank=64 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=1e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=5000 \
  --validation_image="openpose.png" \
  --validation_prompt="A couple, 4k photo, highly detailed" \
  --seed="0" \
  --push_to_hub
```
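
As a rough sanity check on the command above, the sketch below works out the effective batch size and the number of samples seen. It assumes a single process and that `--max_train_steps` counts optimizer updates rather than individual forward passes, which is common in Accelerate-based loops but is not confirmed here. The function and the `num_processes` parameter are hypothetical names.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Samples contributing to one optimizer update (assumed definition)."""
    return train_batch_size * gradient_accumulation_steps * num_processes


# Values taken from the example command.
batch = effective_batch_size(train_batch_size=1, gradient_accumulation_steps=4)
samples_seen = batch * 5000  # --max_train_steps=5000

print(batch, samples_seen)
```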

`openpose.png` comes from [here](https://huggingface.co/Adapter/t2iadapter/resolve/main/openpose.png).

You need to install `diffusers` from the branch of [this PR](https://github.com/huggingface/diffusers/pull/9999). Once it is merged, install `diffusers` from `main` instead.

The training script exposes additional CLI args that might be useful to experiment with:

* `use_lora_bias`: When set, additionally trains the biases of the `lora_B` layer.
* `train_norm_layers`: When set, additionally trains the normalization scales. The script takes care of saving and loading them.
* `lora_layers`: Specifies the layers to apply LoRA to. If you specify "all-linear", LoRA is attached to all the linear layers.
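
The options above could be wired up roughly as follows. This is a hypothetical sketch, not the training script's actual parser: the comma-separated format for `--lora_layers`, the default module list, and the module names are all assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical subset of the training CLI.
    parser = argparse.ArgumentParser(description="Sketch of the extra LoRA options")
    parser.add_argument("--use_lora_bias", action="store_true")
    parser.add_argument("--train_norm_layers", action="store_true")
    parser.add_argument("--lora_layers", type=str, default=None)
    return parser


def resolve_target_modules(lora_layers, default_modules, all_linear_names):
    """Turn the --lora_layers value into a list of module names."""
    if lora_layers is None:
        return list(default_modules)
    if lora_layers == "all-linear":
        return list(all_linear_names)
    # Assumed format: a comma-separated list of module names.
    return [name.strip() for name in lora_layers.split(",") if name.strip()]


args = build_parser().parse_args(["--use_lora_bias", "--lora_layers", "attn.to_q, attn.to_k"])
targets = resolve_target_modules(
    args.lora_layers,
    default_modules=["attn.to_q"],                            # made-up default
    all_linear_names=["attn.to_q", "attn.to_k", "ff.net.2"],  # made-up names
)
print(args.use_lora_bias, args.train_norm_layers, targets)
```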

### Training with DeepSpeed

It's possible to train with [DeepSpeed](https://github.com/microsoft/DeepSpeed), specifically leveraging its ZeRO Stage 2 optimization. To use it, save the following config to a YAML file (feel free to modify it as needed):

```yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

Then pass the config file when launching training:

```bash
accelerate launch --config_file=CONFIG_FILE.yaml ...
```

### Inference

The pose images in our dataset were computed using the [`controlnet_aux`](https://github.com/huggingface/controlnet_aux) library. Let's install it first:

```bash
pip install controlnet_aux
```

Then we can run inference:

```py
from controlnet_aux import OpenposeDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from PIL import Image
import numpy as np
import torch

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("...") # change this.

open_pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

# prepare pose condition.
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/people.jpg"
image = load_image(url)
image = open_pose(image, detect_resolution=512, image_resolution=1024)
image = np.array(image)[:, :, ::-1]  # reverse the channel order
image = Image.fromarray(np.uint8(image))

prompt = "A couple, 4k photo, highly detailed"

gen_images = pipe(
    prompt=prompt,
    control_image=image,  # `FluxControlPipeline` takes the condition as `control_image`
    num_inference_steps=50,
    joint_attention_kwargs={"scale": 0.9},
    guidance_scale=25.,
).images[0]
gen_images.save("output.png")
```
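
The `joint_attention_kwargs={"scale": 0.9}` argument in the snippet above can be read as a runtime multiplier on the LoRA contribution. The sketch below illustrates the usual LoRA update, `W + scale * (alpha / rank) * (B @ A)`, with plain Python lists. Treating `scale` as exactly this multiplier is an assumption for illustration, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(base, lora_A, lora_B, rank, alpha, scale=1.0):
    """Return W + scale * (alpha / rank) * (B @ A)."""
    delta = matmul(lora_B, lora_A)  # shape: (out_features, in_features)
    factor = scale * alpha / rank
    return [
        [w + factor * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


base = [[1.0, 0.0], [0.0, 1.0]]  # toy frozen weight
lora_A = [[1.0, 2.0]]            # (rank, in_features)
lora_B = [[0.5], [-0.5]]         # (out_features, rank)

full = lora_effective_weight(base, lora_A, lora_B, rank=1, alpha=1.0, scale=1.0)
damped = lora_effective_weight(base, lora_A, lora_B, rank=1, alpha=1.0, scale=0.9)
print(full)
print(damped)
```

In this reading, a `scale` of 0 leaves the base weights untouched and a `scale` of 1 applies the adapter at full strength.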

## Full fine-tuning

We also provide a non-LoRA version of the training script, `train_control_flux.py`. Here is an example command:

```bash
accelerate launch --config_file=accelerate_ds2.yaml train_control_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --dataset_name="raulc0399/open_pose_controlnet" \
  --output_dir="pose-control" \
  --mixed_precision="bf16" \
  --train_batch_size=2 \
  --dataloader_num_workers=4 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --proportion_empty_prompts=0.2 \
  --learning_rate=5e-5 \
  --adam_weight_decay=1e-4 \
  --report_to="wandb" \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=1000 \
  --checkpointing_steps=1000 \
  --max_train_steps=10000 \
  --validation_steps=200 \
  --validation_image "2_pose_1024.jpg" "3_pose_1024.jpg" \
  --validation_prompt "two friends sitting by each other enjoying a day at the park, full hd, cinematic" "person enjoying a day at the park, full hd, cinematic" \
  --seed="0" \
  --push_to_hub
```

Change the `validation_image` and `validation_prompt` as needed.
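
The command above passes two validation images and two prompts. One plausible way to accept and pair such multi-valued arguments is sketched below with `argparse` and `nargs="+"`. The pairing rule (equal lengths, or a single value reused for every item on the other side) is an assumption for illustration, not necessarily what the training script does, and `pair_validation_inputs` is a hypothetical helper.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--validation_image", type=str, nargs="+", default=None)
parser.add_argument("--validation_prompt", type=str, nargs="+", default=None)


def pair_validation_inputs(images, prompts):
    """Pair each validation image with a prompt (assumed pairing rule)."""
    if len(images) == len(prompts):
        return list(zip(images, prompts))
    if len(images) == 1:
        return [(images[0], prompt) for prompt in prompts]
    if len(prompts) == 1:
        return [(image, prompts[0]) for image in images]
    raise ValueError("Number of validation images and prompts must match, or one of them must be 1.")


args = parser.parse_args([
    "--validation_image", "2_pose_1024.jpg", "3_pose_1024.jpg",
    "--validation_prompt", "two friends at the park", "person at the park",
])
pairs = pair_validation_inputs(args.validation_image, args.validation_prompt)
for image_path, prompt in pairs:
    print(image_path, "->", prompt)
```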

For inference with the fully fine-tuned model, load the trained transformer and pass it to the pipeline:

```py
from controlnet_aux import OpenposeDetector
from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
from PIL import Image
import numpy as np
import torch

# Load the transformer in the same dtype as the rest of the pipeline.
transformer = FluxTransformer2DModel.from_pretrained("...", torch_dtype=torch.bfloat16) # change this.
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

open_pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

# prepare pose condition.
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/people.jpg"
image = load_image(url)
image = open_pose(image, detect_resolution=512, image_resolution=1024)
image = np.array(image)[:, :, ::-1]  # reverse the channel order
image = Image.fromarray(np.uint8(image))

prompt = "A couple, 4k photo, highly detailed"

gen_images = pipe(
    prompt=prompt,
    control_image=image,  # `FluxControlPipeline` takes the condition as `control_image`
    num_inference_steps=50,
    guidance_scale=25.,
).images[0]
gen_images.save("output.png")
```

## Things to note

* The scripts provided in this directory are experimental and educational. This means we may have to tweak things to get good results on a given condition. We believe this is best done with the community 🤗
* The scripts are not memory-optimized, but we offload the VAE and the text encoders to CPU when they are not in use.
* We can extract LoRAs from the fully fine-tuned model. While we currently don't provide any utilities for that, users are welcome to refer to [this script](https://github.com/Stability-AI/stability-ComfyUI-nodes/blob/master/control_lora_create.py), which provides similar functionality.
+6
@@ -0,0 +1,6 @@
transformers==4.47.0
wandb
torch
torchvision
accelerate==1.2.0
peft>=0.14.0
