siamese_train       original train file / without loading a .pth / clear facial features, but blurry facial contour
siamese_train_2     (load 0620-10w.pth) / give the L1 loss a bigger weight (coefficient)
siamese_train_3     (load 0620-10w.pth) / total loss is the L1 loss / only train on L1
siamese_train_M_1   (load 0620-10w.pth) / output: siamese_output_M_1 / add mask_20 / aim to make training more sensitive to the facial contour / total loss is the L1 loss
siamese_train_M_2   (load train-3-28000.pth) / output: siamese_output_M_2 / add mask_20 / use L2 instead of L1 / freeze G-net's encoder part
siamese_train_M_3   (load train-3-28000.pth) / output: siamese_output_M_3 / add mask_20 / still use L1 / freeze G-net's encoder part
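A minimal PyTorch sketch of the two ideas used in the siamese_train_M_* runs (a mask_20-weighted L1 loss and freezing G-net's encoder part); the names masked_l1_loss and G.encoder and the weight value are assumptions, not the project's actual code:

import torch.nn.functional as F

def masked_l1_loss(fake, target, mask, mask_weight=20.0):
    # plain L1 everywhere, plus extra weight inside the facial-contour mask
    return F.l1_loss(fake, target) + mask_weight * F.l1_loss(fake * mask, target * mask)

def freeze_encoder(G):
    # freeze the encoder part of G so only the decoder receives gradients
    for p in G.encoder.parameters():
        p.requires_grad = False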
-----------------------------------------------
siamese_GAN_1 (load: 31600, based on "siamese_M_2") / output: siamese_GAN_1 / add Discriminator (linear FC layers as the head of D-net) / freeze G's decoder / train G's encoder
siamese_GAN_2 (load: 31600, based on "siamese_M_2") / output: siamese_GAN_2 / add Discriminator (use a conv layer with a bigger convolution kernel as the head of D-net) / freeze G's decoder / train G's encoder
siamese_GAN_SR (load: 31600, based on "siamese_M_2") / output: siamese_GAN_SR / add Discriminator (use SR-GAN's D-net model, plus the WGAN tricks) / freeze G's decoder part / train G's encoder part
WGAN tricks (see the sketch at the end of this entry):
1. remove the sigmoid from D's last layer
2. do not take the log in the loss function (use the raw critic scores)
3. use 'torch.clamp' to clip D's weights into [-0.01, 0.01]
4. use an optimizer like RMSprop instead of Adam
results:
for '1': D-net gets too strong for G-net to keep training; after 50k iterations: errG_loss = 3.92, errD_real/fake = 0.02/0.03
for '2': errD_real/fake and errG both look fine, but mode collapse: lots of artifacts and noise points, no recognizable face
for 'SR+WGAN': something is wrong with the hand-written loss function, the losses went below 0; mode collapse as well
GANs are hard to train even when following the WGAN recipe.
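A minimal PyTorch sketch of the four WGAN tricks listed above; D, G, real, noise and the clip value are placeholders, not the project's actual objects:

import torch

def wgan_d_step(D, G, d_optim, real, noise, clip=0.01):
    d_optim.zero_grad()
    # tricks 1-2: D has no final sigmoid and the loss uses the raw critic scores, no log
    loss_D = -(D(real).mean() - D(G(noise).detach()).mean())
    loss_D.backward()
    d_optim.step()
    # trick 3: clip D's weights into [-clip, clip] after every update
    with torch.no_grad():
        for p in D.parameters():
            p.clamp_(-clip, clip)
    return loss_D.item()

# trick 4: use RMSprop instead of Adam, e.g.
# d_optim = torch.optim.RMSprop(D.parameters(), lr=5e-5)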
-----------------------------------------------
Start over from the beginning (use the linear version)
7.9
siamese_GAN_1_2  0.001-10-10-10-10-10-1  (keep this result)
result: can generate recognizable facial features
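The coefficient strings such as 0.001-10-10-10-10-10-1 are per-term weights on the total loss. A minimal sketch of how such a weighting can be applied; the term names and their order here are assumptions (only gan first and L1 last are confirmed by the later entries):

def total_loss(losses, weights):
    # losses and weights are dicts keyed by the loss-term name
    return sum(weights[name] * losses[name] for name in losses)

weights = {"gan": 0.001, "id": 10, "pose": 10, "light": 10,
           "contrastive_F": 10, "contrastive_I": 10, "l1": 1}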
-----------------------------------------------
7.10
siamese_GAN_1  loss: 1-1-1-1-1-1-100
result: D's loss dropped very fast, from fake/real = 0.7/0.7 to fake/real = 0.1/0.04
as for G, gan_loss grew from 0.7 to around 3.2
Guess: the gradient may be dominated by L1, which has the much bigger coefficient, so the ability to generate realistic images gets too little attention and G improves far more slowly than D-net.
-----------------------------------------------
7.10
siamese_GAN_1_3  0.001-10-10-10-10-10-1  keep training (load yesterday's weight file)
result: after 50k iters, G_loss grew from 0.06 to 0.5, but the output still has fairly clear human facial features
Guess: the supervision signal may be dominated by the L1 loss far more than by the GAN loss
-----------------------------------------------
7.11
aim: reduce the ratio between the L1 and GAN losses
siamese_GAN_1  10gan-1-1-1-1-20L1
result: not good, mode collapse; gan's coefficient is too big. It tends to generate the same images with lots of artifacts and noise, and no clear facial features.
[1/50000] errD_real: 0.6489
[1/50000] errD_fake: 0.7868
[1/50000] errS_GAN: 0.6380
[50000/50000] errD_real: 0.1246
errD_fake: 0.1581
errS_GAN: 2.0272
Next: try some runs based on GAN_1_2 and reduce D-net's update frequency tomorrow
-----------------------------------------------
7.12
siamese_GAN_11_2 (load: GAN_1_3_47200pth) / freeze D-net / add the mask to train L1 / increase gan's coefficient
result: lots of artifacts
Guess: freezing is a bad idea; D's loss being very low does not mean D can actually tell real images from fake ones. A poor G-net can lead D-net to grow in the wrong direction.
Next: try loading from GAN_1_2 and increasing gan's coefficient
-----------------------------------------------
7.13
siamese_GAN_1 (load GAN_1_2_11400pth)  decrease D's update frequency (from every 6 iters to every 8; see the sketch at the end of this entry) and D's learning rate: 0.00001
result: net-G did not converge
[15222/50000] errD_real: 0.4605
errD_fake: 0.8116
errS_GAN: 0.6554
7.13-2
siamese_GAN_12 (load GAN_1_2_11400pth)  decrease D's update frequency (from every 6 iters to every 8) and D's learning rate: 0.000001
result: [15268/50000] errD_real: 0.2872605
errD_fake: 1.2870417
errS_GAN: 0.4203835
Guess: D is already good enough, so decreasing the lr is useless when loading from a previous model; mode collapse
IDEA: G-net and D-net have to stay in the same phase. If we train D for better performance against poor-quality images generated by a poor G-net, the drop in D's loss doesn't mean anything, and once that has happened it becomes impossible to rectify.
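A minimal sketch of reducing D's update frequency relative to G; the exact meaning of the iterD notation in this log is uncertain, so here d_every is simply the number of iterations between two D updates, and g_step/d_step are placeholder update functions:

def train_loop(G, D, g_step, d_step, data_loader, iters=50000, d_every=8):
    for i, batch in zip(range(iters), data_loader):
        g_step(G, D, batch)        # G is updated every iteration
        if i % d_every == 0:
            d_step(G, D, batch)    # D is updated only once every d_every iterations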
-----------------------------------------------
7.14
Give up loading a pretrained model: the coefficients can't be changed suddenly, since the gradient landscape changes completely at the same time.
siamese_GAN_1 / lower the contrastive_I loss so training pays more attention to gan / print the loss every 1k iters / 0.1gan-1-1-1-1-0.1contraI-1
7.14-2
siamese_GAN_13 / same coefficients / try He & Xavier initialization (sketch below) / check whether the problem is the poor performance of the random Gaussian initialization
result: He/Xavier didn't give much progress; mode collapse for both models
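A minimal sketch of He/Xavier initialization in PyTorch instead of the default Gaussian; which layers actually get re-initialized here is an assumption:

import torch.nn as nn

def init_weights(m, mode="he"):
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        if mode == "he":
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        else:
            nn.init.xavier_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# usage: G.apply(init_weights); D.apply(lambda m: init_weights(m, "xavier"))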
-----------------------------------------------
7.15
!!modify the model: give up the residual path in the G-net (see the sketch at the end of this entry)
siamese_GAN_1 5gan-1-1-1-1-1-1 iterD:4:6 s_lr = 0.02
siamese_GAN_13 1-1-1-1-1-1-1 iterD:4:6 s_lr = 0.02
result: gan_loss went up rapidly
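A minimal sketch of a G-net block whose residual (skip) path can be switched off, as done in this entry; the layer sizes are assumptions:

import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, ch, use_residual=False):
        super().__init__()
        self.use_residual = use_residual
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        out = self.body(x)
        return out + x if self.use_residual else out  # residual path dropped when False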
-----------------------------------------------
7.15
siamese_GAN_1 2gan-1-1-1-1-1-1 iterD:2:4 s_lr = 0.02
result: the bad performance is not caused by a poor D
siamese_GAN_13 0.1gan-1-1-1-1-1-1 iterD:4:6 s_lr = 0.02
result: keep! We get some 'real' picture features, but the pose and light are quite confused
-----------------------------------------------
7.16
Looking at the printed losses, the encoder part did a really good job (the losses clearly went down), but the generated images still lose pose and light information, so the problem is probably in the decoder part.
So, increase the supervision signal (L1 and GAN).
siamese_GAN_16 0.1gan-1-1-1-1-1-5L1 iterD:4:6 s_lr = 0.02
siamese_GAN_16_1 0.1gan-1-1-1-1-0.5contraI-5L1 iterD:4:6 s_lr = 0.02
result: siamese_GAN_16 (keep), siamese_GAN_16_1 (give up); some improvement, but the pose and light information are still a mess
------------------------------------------------
7.17
use the frontal face image as D-net's real-image supervision
siamese_GAN_17 1gan-1-1-1-1-1-5L1 iterD:4:6 s_lr = 0.02
increase gan's coefficient (to check whether training stays stable)
siamese_GAN_17_1 0.5gan-1-1-1-1-1-1 iterD:4:6 s_lr = 0.02
result: unstable, D's loss went down really fast; mode collapse
-------------------------------------------------
7.18
siamese_GAN_18
(based on the good performance of "siamese_GAN_16": 0.1gan-1-1-1-1-1-5L1, but give the light loss more weight)
0.1gan-1-1-1-1-1-5classL-5L1
result: failed, still loses the light information
------------------------------------------------
7.19
siamese_GAN_19_1 (based on the good performance of "siamese_GAN_16": 0.1gan-1-1-1-1-1-5L1)
0.1gan-1-1-1-1-1-5L1
BUT the GT used for D-net is the standard-light frontal face image
Result: the generated images before iter 19800 are pretty good! (keep)
but after iter 19800 the model becomes unstable; mode collapse.
-----------------------------------------------
7.20
siamese_GAN_20
based on siamese_GAN_19's good performance, modify the model following the WGAN recipe and tune from the beginning
train_theta 1-1-1 iterD:4:6 s_lr = 0.02
train_theta_1 0.1gan-1-1 iterD:4:6 s_lr = 0.02
result: bad, nothing makes sense
----------------------------------------------
7.21
Following a Zhihu post, move the supervision task to D-net, since D-net is basically the discriminator network that should be responsible for the supervision problem.
modify the model...
----------------------------------------------
7.27
G-net is only responsible for gan_loss; D-net is responsible for the light/pose/identity/gan losses.
Because the supervision work is given to D-net, G-net no longer needs to be a Siamese network, and the contrastive loss is no longer needed either.
Give all the losses the same coefficient; the performance is then good enough.
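A minimal sketch of a D-net that carries the supervision heads described above (real/fake, light, pose, identity); the backbone, feature size and class counts are assumptions, not the project's actual architecture:

import torch.nn as nn

class MultiHeadD(nn.Module):
    def __init__(self, feat_dim=512, n_light=5, n_pose=9, n_id=500):
        super().__init__()
        self.backbone = nn.Sequential(                 # shared feature extractor
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim), nn.LeakyReLU(0.2),
        )
        self.gan_head = nn.Linear(feat_dim, 1)          # real/fake score
        self.light_head = nn.Linear(feat_dim, n_light)  # light class logits
        self.pose_head = nn.Linear(feat_dim, n_pose)    # pose class logits
        self.id_head = nn.Linear(feat_dim, n_id)        # identity logits

    def forward(self, x):
        f = self.backbone(x)
        return self.gan_head(f), self.light_head(f), self.pose_head(f), self.id_head(f)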