Image-Super-Resolution

An adversarial algorithm for generating super-resolution (SR) versions of images.

Problem Description

When a high-resolution image is downscaled (perhaps because of space restrictions), some information is invariably lost, which makes the image look blurry when enlarged again. Many algorithms are designed to recover this lost information. Even though not every pixel can be restored exactly, these algorithms seek a close approximation of the original high-resolution image by "filling in the gaps" in some way; this process is called super-resolution.

This project is a PyTorch reproduction of SRGAN, the adversarial algorithm by Christian Ledig et al.: https://arxiv.org/pdf/1609.04802.pdf

Algorithm Description

The algorithm works in two stages:

  1. A network G with a ResNet architecture takes in low-resolution (LR) images of size w x h and outputs images of size 4w x 4h. It is then trained against the high-resolution (HR) training images (also of size 4w x 4h) with a pixel-wise mean-squared error (MSE) loss. The model at this stage is called SRResnet.
  2. The SRResnet model G is taken as the generator in an adversarial model, which also has a discriminator network D. Then these two networks play the usual 2-player minimax game

$$\min_G \max_D \ \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

where the generator G tries to generate images convincing enough to "fool" the discriminator D, while D tries to correctly distinguish real images from the "fake" images generated by G.

The second term of the equation serves as the adversarial loss for G, but because it saturates and yields vanishing gradients early in training, we minimize -log(D(G(z))) instead. This loss is denoted by g_gan_loss in the code. The MSE loss used in stage 1 with SRResnet is also used here to enforce pixel similarity with the real HR images; in the code it is denoted by mse_loss.
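As a concrete illustration, the two adversarial terms might be computed as in the following PyTorch sketch. The names netG, netD, lr_imgs, and hr_imgs are placeholders rather than the repository's actual identifiers, and the discriminator is assumed to end in a sigmoid:

```python
import torch
import torch.nn.functional as F

def gan_losses(netG, netD, lr_imgs, hr_imgs):
    sr_imgs = netG(lr_imgs)  # generated 4x super-resolved images

    # Discriminator loss: push D(real) toward 1 and D(fake) toward 0.
    real_pred = netD(hr_imgs)
    fake_pred = netD(sr_imgs.detach())  # detach: D's step must not touch G
    d_loss = F.binary_cross_entropy(real_pred, torch.ones_like(real_pred)) \
           + F.binary_cross_entropy(fake_pred, torch.zeros_like(fake_pred))

    # Non-saturating generator loss: minimize -log(D(G(z))) instead of
    # log(1 - D(G(z))), which has vanishing gradients early in training.
    fake_pred_for_g = netD(sr_imgs)
    g_gan_loss = F.binary_cross_entropy(fake_pred_for_g,
                                        torch.ones_like(fake_pred_for_g))
    return d_loss, g_gan_loss
```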

In addition, Christian Ledig et al. introduce a perceptual loss. The authors found that methods relying solely on MSE loss for pixel similarity produced images that are too smooth to be photorealistic, which suggests that maximizing per-pixel similarity alone misses the "bigger picture". They therefore also train for feature similarity: the generated and real images are fed through a pretrained VGG19 model and their features are compared at an intermediate layer (taken to be the activation layer just before the 4th maxpool layer). In the code this is denoted by vgg_loss.
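A minimal sketch of building such a feature loss with torchvision, assuming the truncation point described above (in torchvision's VGG19, the 4th maxpool sits at features[27], so slicing up to it ends on the preceding activation; whether the repository slices the network exactly this way is an assumption):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Fixed feature extractor: VGG19 truncated just before its 4th maxpool.
vgg = vgg19(pretrained=True).features[:27].eval()
for p in vgg.parameters():
    p.requires_grad = False  # never updated; gradients flow only into G

def vgg_loss(sr_imgs, hr_imgs):
    # MSE between feature maps of the generated and real HR images.
    # (In practice the inputs may need rescaling to the range VGG expects.)
    return F.mse_loss(vgg(sr_imgs), vgg(hr_imgs))
```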

Finally, the loss trained by G is mse_loss + 0.006 * vgg_loss + 0.001 * g_gan_loss. The coefficients scale the three losses to a similar range so that none of them dominates the training process.
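Putting the three terms together, a single generator update might look like the following sketch (reusing the placeholder names from the snippets above):

```python
# One generator update with the combined loss.
sr_imgs = netG(lr_imgs)
mse_loss = F.mse_loss(sr_imgs, hr_imgs)
fake_pred = netD(sr_imgs)
g_gan_loss = F.binary_cross_entropy(fake_pred, torch.ones_like(fake_pred))

loss_g = mse_loss + 0.006 * vgg_loss(sr_imgs, hr_imgs) + 0.001 * g_gan_loss
optimizer_g.zero_grad()
loss_g.backward()
optimizer_g.step()
```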

This model is named SRGAN (VGG_54).

Dataset

I trained the SRGAN model on the DIV2K dataset, which was used in the NTIRE super-resolution challenges at CVPR 2017 and CVPR 2018. It contains 800 HR images for training and 100 HR/LR pairs for validation.
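As a sketch of how an LR/HR training pair could be produced from a DIV2K HR image (the 4x bicubic downscaling mirrors the dataset's bicubic track, but the helper itself is illustrative, not the repository's code):

```python
from PIL import Image

def make_pair(hr_path, scale=4):
    # Crop so the dimensions are divisible by the scale factor, then
    # bicubically downscale to get the matching LR input.
    hr = Image.open(hr_path).convert("RGB")
    w, h = hr.size
    hr = hr.crop((0, 0, w - w % scale, h - h % scale))
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    return lr, hr
```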

Training Parameters

I trained the model on Google Colab, whose GPUs have a memory limit of 15 GB, so I used a batch size of 2 for both stages of training. Otherwise I followed the paper's training specifications:

  • 1e-4 learning rate for training SRResnet, trained for 25 epochs. (I did not train for the full length given in the paper because the model had already started producing realistic-looking images, and this checkpoint should suffice to avoid undesirable local minima when used as the starting point for training SRGAN.)
  • 1e-4 learning rate for the first 10^5 iterations (250 epochs in our case) and 1e-5 for the remaining 10^5 iterations (another 250 epochs) when training SRGAN.
  • LR images are scaled to [0, 1] whereas HR images are scaled to [-1, 1] (see the sketch after this list).
  • The optimizer used is Adam.
  • Training alternates between the discriminator and the generator.
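A minimal sketch of how the input scaling and the Adam schedule above might look in PyTorch (the transform choices and all names here are illustrative, not taken from the repository's code):

```python
import torch
from torchvision import transforms

# ToTensor() already maps pixel values to [0, 1], which matches the LR
# inputs; the HR targets are additionally shifted to [-1, 1].
lr_transform = transforms.ToTensor()
hr_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

# Adam at 1e-4; after the first 1e5 iterations the rate drops to 1e-5.
optimizer_g = torch.optim.Adam(netG.parameters(), lr=1e-4)  # netG: generator

def drop_learning_rate(optimizer, new_lr=1e-5):
    for group in optimizer.param_groups:
        group["lr"] = new_lr
```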

Results

I've included several examples of super-resolution below, comparing the models SRResnet and SRGAN with the real HR image.

(Image grid: three example rows, each showing, from left to right, the LR input, the SRGAN output, and the real HR image.)

Training Tips

Since the entire project runs on Colab, you will not need any pre-installed software to train. However, here are a few things to be aware of:

  • Training will take a very long time with this dataset since the images are large, so I suggest saving the model after every 50 or 100 epochs. This also gets around the 12-hour session limit on Colab.
  • Be sure to save both the generator and the discriminator model in the SRGAN stage (see the sketch after this list).
  • You can save the models in your Google Drive so you don't waste time downloading and uploading models.
  • The code saves a generated image after every epoch in the folders g_image_init and g_image, so you can monitor your training progress.
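For the saving tips above, a minimal sketch of checkpointing both networks to a mounted Google Drive folder (the path, file names, and interval are illustrative, not the repository's):

```python
import torch

def save_checkpoint(epoch, netG, netD, path="/content/drive/MyDrive/srgan"):
    # Save both networks so SRGAN training can resume after a Colab
    # session ends; state_dicts keep the files small and portable.
    torch.save(netG.state_dict(), f"{path}/generator_{epoch}.pth")
    torch.save(netD.state_dict(), f"{path}/discriminator_{epoch}.pth")

# e.g. inside the training loop:
# if epoch % 50 == 0:
#     save_checkpoint(epoch, netG, netD)
```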
