Is the paper reproducible? #25
Hi Andy, you may be able to help with fresh eyes. The main branch has an error in the model; this branch has critical fixes - #26. I did all this extra work to get the codebase to train via StyleGAN2 - ADA / EMA - but according to the author this is not necessary; there's no noise. If I had known this before it could have saved me a thousand test runs, but you live and learn. When training finishes the first movie, it falls over and training stops. Following this, I looked at the codebase and attempted to bypass GAN training altogether (StyleGAN training would only be necessary when introducing the token manipulation - see Line 358 in 9009dea).
You will see here that this has both a GAN version and another version with no discriminator that just optimises the LPIPS perceptual loss: https://github.com/johndpope/IMF/blob/laced-yogurt-1708/train.py The crazy thing is, either way (GAN or no_gan), when I train on the one video it cycles 5 loops and quality goes right up, but when the video changes the model collapses - there's no gradient flow and no new images can be reproduced. I rebuilt this paper at least 2 times; I'm not convinced redoing it with the rosinality stylegan2 codebase will fix things. @tanshuai0219 has been looking to recreate this paper using LIA. Side note -
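For context, a minimal sketch of what the no-discriminator path boils down to: reconstruct the current frame from the (reference, current) pair and optimise only an LPIPS perceptual loss. The model and batch here are toy stand-ins, not the actual classes in train.py; only the `lpips` usage reflects the real library.

```python
import torch
import torch.nn as nn
import lpips  # pip install lpips

class TinyIMF(nn.Module):
    """Toy stand-in for the IMF generator: takes (reference, current) frames and
    returns a reconstruction of the current frame. Hypothetical, not the repo's model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(6, 3, 3, padding=1)

    def forward(self, ref, cur):
        return self.net(torch.cat([ref, cur], dim=1))

model = TinyIMF()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.5, 0.999))
perceptual = lpips.LPIPS(net='vgg')  # LPIPS perceptual loss

# Dummy batch in place of the real video dataloader.
ref = torch.rand(2, 3, 256, 256)
cur = torch.rand(2, 3, 256, 256)

recon = model(ref, cur)                               # reconstruct the current frame
loss = perceptual(recon * 2 - 1, cur * 2 - 1).mean()  # LPIPS expects inputs in [-1, 1]
optimizer.zero_grad()
loss.backward()
optimizer.step()
```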
So I throw out a week or two's work and switch back some foundational code - the other branches kept falling in a hole after the video changed. I don't know why, but anyway, this is working (for now). The other problem I had was that after about 60 epochs the model had been disintegrating... BUT I turn mixed precision off - this is the latest run here. UPDATE: thought it was broken (media 10), but the next frame is working (12)... I've seen runs go downhill after a few hours... UPDATE: in that branch I restore another file, resblocks.py (the imports are not included); these residual network blocks kind of EXACTLY match the document / specifications from Microsoft.
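For illustration, a minimal sketch of the kind of downsampling residual block being described; the channel counts, activation, and pooling here are my assumptions, not copied from resblocks.py or the paper's specification.

```python
import torch
import torch.nn as nn

class ResBlockDown(nn.Module):
    """Downsampling residual block with a 1x1 shortcut; illustrative only."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)  # match channels on the shortcut
        self.act = nn.LeakyReLU(0.2)
        self.pool = nn.AvgPool2d(2)              # downsample both paths

    def forward(self, x):
        h = self.act(self.conv1(x))
        h = self.act(self.conv2(h))
        return self.pool(h) + self.pool(self.skip(x))

print(ResBlockDown(64, 128)(torch.rand(1, 64, 64, 64)).shape)  # [1, 128, 32, 32]
```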
Hi, based on the current result, I still feel like it's not working properly. The reconstruction is not based on the driving source but seems to be based on the source image. The difference is very small, and it seems that if the difference between the source and driving is large, the result gets blurrier. Recently I have been working on TPS, and TPS can train very fast for explicit motion change given the driving image, though it can be a little blurry in some facial regions. Also, I am not sure if batch size is the thing that matters here. In general, StyleGAN is a model trained with a batch size of at least 32.
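If small batch size is the culprit, gradient accumulation is one cheap way to fake a larger effective batch on a single 3090. A toy sketch of that idea; the linear model and random batches are placeholders for the real generator and video loader.

```python
import torch
import torch.nn as nn

# Toy stand-ins: in practice this would be the IMF generator and the video dataloader.
model = nn.Linear(16, 16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
batches = [torch.rand(2, 16) for _ in range(32)]       # physical batch size 2

accum_steps = 16                                        # 2 * 16 ≈ effective batch size 32
optimizer.zero_grad()
for step, x in enumerate(batches):
    loss = (model(x) - x).pow(2).mean() / accum_steps   # scale so grads average over the window
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```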
I mostly agree - it's using the reference image, then interpreting the current image as a tiny compressed version (32 bytes), and then recreating that current image. It's more of a codec compression - only you can then hot-swap in different latent codes. I have it at batch size 2, training on a 3090. UPDATE: have a look at the supplementary material of the paper - they showcase different latents being swapped in. UPDATE: blows up from 64 onwards... it must be a resnet thing... I don't know where, I don't know why... FYI I introduce a script run.sh that commits changes to align with the wandb test, which records the git commit - it's kind of an atomic transaction so the test can be recreated... this was from experience where I would get a good run, attempt to redo it, and it would fail.

echo "Clearing __pycache__"
rm -rf __pycache__

Then if you do a git checkout, it should align with the test run (but sometimes it doesn't - I don't know if it's my GPU failing me?). UPDATE: there was some crazy thing going on in train.py with the losses.
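The idea behind run.sh, roughly: pin the exact git commit to the wandb run so a given test can be checked out and re-run later. A minimal Python sketch of that pattern, assuming the script is launched from inside the repo; the project name is taken from the wandb URLs above, everything else is illustrative.

```python
import subprocess
import wandb

# Record the current commit hash in the run config so the code state can be recovered later.
commit = subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()
run = wandb.init(project="IMF", config={"git_commit": commit})

# ... training happens here ...

run.finish()
# Later: `git checkout <git_commit from the run config>` should reproduce the code for that run.
```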
Good news - I finally found a problem with my video dataset where the source images were blank, so some batches were contaminating the training - that would help explain the random red / blue / black images propagating above... https://wandb.ai/snoozie/IMF/runs/zvj8lbuu?nw=nwusersnoozie I'm still seeing some defects in the eyes - I read this can be a side effect of the GAN loss. From the paper the weights are set to 10 perceptual / 10 pixel / 1 GAN loss. To increase quality you can push out the video repeats, understanding this is overfitting the model. I had num_frames set to 200 for the above test run - it will just repeat frames if there aren't enough in the mp4.
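For reference, the 10 / 10 / 1 weighting mentioned above would combine roughly like this; the tensors are placeholders, and the non-saturating GAN loss is my assumption rather than necessarily what train.py uses.

```python
import torch
import torch.nn.functional as F

w_perceptual, w_pixel, w_gan = 10.0, 10.0, 1.0         # weights quoted from the paper

# Placeholder tensors standing in for the real reconstruction / target / discriminator output.
recon = torch.rand(2, 3, 256, 256, requires_grad=True)
target = torch.rand(2, 3, 256, 256)
fake_logits = torch.rand(2, 1, requires_grad=True)      # D(recon)

perceptual_loss = torch.tensor(0.5)                     # stand-in for lpips(recon, target).mean()
pixel_loss = F.l1_loss(recon, target)
gan_loss = F.softplus(-fake_logits).mean()              # non-saturating generator loss (assumption)

g_loss = w_perceptual * perceptual_loss + w_pixel * pixel_loss + w_gan * gan_loss
g_loss.backward()
```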
Then add the video editing functionality... UPDATE: I ramp up the LPIPS loss.
UPDATE: I cancel training and play around in the resnet branch... (I think the resblocks updates will make a difference - yet to drop these in.) ReduceLROnPlateau is not the best scheduler for a GAN, so I throw in CosineAnnealing... and see what happens.
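The scheduler swap being described is essentially a one-liner; a sketch assuming a 100-epoch run (the model here is just a stand-in).

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)                                   # stand-in for the generator
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

# before: scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-6)

for epoch in range(100):
    # ... train one epoch ...
    scheduler.step()  # cosine annealing only needs the epoch count, not a validation metric
```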
Can also turn off gradient clipping - they supposedly don't use it... (this might fall over - yet to successfully do a test run with the new resnet code) https://wandb.ai/snoozie/IMF/runs/d8jdtcjh UPDATE: in the StyledConv I noticed I was using an inferior modulation - I update it to align with stylegan2 - lucidrains. https://wandb.ai/snoozie/IMF/runs/li4m8pc7?nw=nwusersnoozie N.B. UPDATE: this normalizing plays with the light gray of the image - there's some code to unnormalize it when saving the image to wandb, but it's sometimes playing up... UPDATE: I introduce this branch to help overcome problematic training steps / exploding gradients: https://github.com/johndpope/IMF/tree/fix/profile-step
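The modulation change referenced above is the StyleGAN2 weight modulation + demodulation pattern (as in lucidrains' stylegan2-pytorch). A self-contained sketch of that pattern; the exact shapes, style mapping, and eps are assumptions, not the repo's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv2d(nn.Module):
    """StyleGAN2-style weight modulation + demodulation; illustrative sketch only."""
    def __init__(self, in_ch, out_ch, kernel, style_dim, demodulate=True, eps=1e-8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel, kernel))
        self.to_style = nn.Linear(style_dim, in_ch)  # per-input-channel scale from the latent
        self.demodulate = demodulate
        self.eps = eps
        self.padding = kernel // 2

    def forward(self, x, style):
        b, c, h, w = x.shape
        s = self.to_style(style).view(b, 1, c, 1, 1)             # (B, 1, in, 1, 1)
        weight = self.weight.unsqueeze(0) * s                    # modulate: (B, out, in, k, k)
        if self.demodulate:
            d = torch.rsqrt(weight.pow(2).sum(dim=(2, 3, 4), keepdim=True) + self.eps)
            weight = weight * d                                  # demodulate per output channel
        # Grouped-conv trick: fold the batch into the channel dimension.
        weight = weight.view(-1, c, *weight.shape[3:])
        x = x.view(1, b * c, h, w)
        out = F.conv2d(x, weight, padding=self.padding, groups=b)
        return out.view(b, -1, h, w)

x = torch.rand(2, 64, 32, 32)
latent = torch.rand(2, 512)                                      # style / latent code
print(ModulatedConv2d(64, 128, 3, 512)(x, latent).shape)         # torch.Size([2, 128, 32, 32])
```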
FYI - I switch in a bunch of resblocks / conv layers from LIA; these directly align with comments made here by HoloGerry: hologerry/IMF#4. The ModulatedConv2d/StyledConv inside resblocks.py comes from pytorch stylegan2 - lucidrains. My wandb is not correctly sampling images (not sure why), but after a few minutes I'm seeing promising results... There's also some fused LeakyReLU code that's been added (from the LIA code). UPDATE: may have fixed the sampling here... testing now. UPDATE Sep 2: this looks encouraging at 26 - https://wandb.ai/snoozie/IMF/runs/01vvfows?nw=nwusersnoozie
I think from my testing the results are actually mint - it needs more training / compute / different videos... I'll close this ticket and open another one around the mode collapse / blue recreated image... Also let me know what you want to see next - I was thinking of doing the 3DMM alignment - may be able to hot-wire this into the training: https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait - but I'm actually kinda excited to play with SAM2 in conjunction with this codebase / architecture. UPDATE: was seeing the mode collapse issue this morning. UPDATE September 4th: OK, this is the latest training - looking good to fire up proper GPU compute now...
Unless this training blows up, I'll do the token editing once we choose what to do... age / pose / style transfer? Kinda interested to make it like Emote - maybe audio? https://www.youtube.com/watch?v=lR3YwRMuaYQ Just realized the above current video dataset is a subset of 100 videos, not the full 35,000 from CelebV-HQ. UPDATE 6th September: when I inspect, sometimes it's (gradually) bang on and recreating the face successfully, but sometimes - here the middle image is looking right (reference image also looking right), yet the recreated image is completely off. I attempt to override this with a face loss bias (factoring in eye / mouth / nose position) - it helped me with the eye reconstruction the other day, to boost some images while I was troubleshooting resnet. It may make sense to do some image flipping in the image augmentations to speed up convergence for the talking head... The other thinking is redoing frames where there's a sort of miss, where it gets the recreation completely wrong... The existing losses should eventually fix things, but if her head is facing right the recreated image should follow suit... so redo that frame 1000x to fix it...
(Because this training above is overfitting to the one video, the latent encoder may not be getting the best signature of the reference... it should perform better when trained across more videos.) UPDATE: this seems fine to me - going to stop training... I need to figure out what's wrong at 20,000 steps that broke training before... Using that trained (overfitted) checkpoint, I switch back to 1 video and kick off this training... I push some code to redo the worst frames during training based off the face reconstruction.
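A rough sketch of the "redo the worst frames" idea: keep the frames with the highest reconstruction (face) error in a small buffer and replay them in later batches. The buffer and names here are purely illustrative, not the code that was actually pushed.

```python
import heapq
import torch

class WorstFrameBuffer:
    """Keeps the frames with the highest reconstruction error for replay; illustrative only."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.heap = []        # min-heap of (error, counter, frame); smallest error gets evicted
        self.counter = 0      # tie-breaker so tensors are never compared

    def push(self, error, frame):
        item = (float(error), self.counter, frame.detach().cpu())
        self.counter += 1
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, item)
        else:
            heapq.heappushpop(self.heap, item)   # keep only the highest-error frames

    def sample(self, k):
        # Grab k frames from the buffer (heap order is irrelevant for replay).
        frames = [f for _, _, f in self.heap[-k:]]
        return torch.stack(frames) if frames else None

buf = WorstFrameBuffer()
for _ in range(100):
    frame = torch.rand(3, 256, 256)
    recon_error = torch.rand(())                 # stand-in for a per-frame L1 / face-region loss
    buf.push(recon_error, frame)
print(buf.sample(4).shape)                       # torch.Size([4, 3, 256, 256])
```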
Appreciate your hard work. I have checked some of the running samples; I am not quite sure if IMF is reproducible. Would you share some more details?