# Variational Autoencoder embedding of Video from Zacks et al., 2006

The project here is to reduce the dimensionality of the video stimuli used by Zacks, Swallow, Vettel, & McAvoy (2006) [Paper]. This is done by training a variational autoencoder on still frames of the video and then using the low-dimensional latent space as our embeddings.

Specifically, an MMD variational autoencoder (MMD-VAE; Zhao, 2017; described in the following Blog Post and Paper) is trained on the set of frames from the six videos in Zacks et al., 2006.
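For reference, the MMD-VAE replaces the KL-divergence term of a standard VAE with a maximum mean discrepancy between the encoded latents and samples from the prior. A minimal PyTorch sketch of that term, following Zhao's formulation (function names here are illustrative, not necessarily those used in this repository):

```python
import torch

def compute_kernel(x, y):
    # Gaussian kernel evaluated between all pairs of rows in x and y
    x_size, y_size, dim = x.size(0), y.size(0), x.size(1)
    x = x.unsqueeze(1).expand(x_size, y_size, dim)
    y = y.unsqueeze(0).expand(x_size, y_size, dim)
    return torch.exp(-((x - y) ** 2).mean(2) / dim)

def compute_mmd(x, y):
    # MMD(x, y) = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return (compute_kernel(x, x).mean()
            + compute_kernel(y, y).mean()
            - 2 * compute_kernel(x, y).mean())

# Usage: z comes from the encoder; true_samples are drawn from the prior
# true_samples = torch.randn(batch_size, latent_dim)
# mmd_loss = compute_mmd(true_samples, z)
```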

This implementation varies slightly from this PyTorch implementation by Shubanshu Mishra. The original TensorFlow implementation by Shengjia Zhao can be found here.

## Files

- `preprocess_video.py` converts the video files into a numpy array and downsamples each frame from its original dimensions of 240x320x3 to 64x64x3. Downsampling is done to lower the computational cost (see the sketch after this list).
- `pytorch_vae.py` runs the VAE on the preprocessed data and outputs `video_color_Z_embedded_64.npy`, the embedded videos as one long concatenated array, and `video_color_X_reconstructed_64.npy`, a numpy array of the reconstructed video.
- `samples_batch_33000.png` is a sample from the generative distribution of a trained network. It is blurry compared to the reconstructed video but is useful for assessing training.
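As a rough illustration, the frame extraction and downsampling step in `preprocess_video.py` could look something like the following sketch (the input path and output filename are placeholders, not the repository's actual names):

```python
import cv2
import numpy as np

def video_to_array(path, size=(64, 64)):
    """Read a video file and return its frames as a (T, 64, 64, 3) uint8 array."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()  # each frame is 240x320x3 in the original videos
        if not ok:
            break
        frames.append(cv2.resize(frame, size))  # downsample to 64x64x3
    cap.release()
    return np.stack(frames)

# e.g. frames = video_to_array("video.avi"); np.save("frames_64.npy", frames)
```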

Because the data files are relatively large (1GB+), none of the processed data are stored here.
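Once the scripts have been run locally, the output arrays named above can be loaded in the usual numpy way:

```python
import numpy as np

# Latent embeddings of all six videos, concatenated along the frame axis
Z = np.load("video_color_Z_embedded_64.npy")

# Reconstructed frames, for visually checking reconstruction quality
X_rec = np.load("video_color_X_reconstructed_64.npy")
```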

## Requirements

The VAE is best run on a machine with an NVIDIA GPU and PyTorch installed. In addition, numpy and opencv are used to convert the videos into a readable format, and matplotlib is used to generate figures from VAE samples and reconstructed images.
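If no NVIDIA GPU is available, PyTorch will still run on the CPU, just much more slowly; the standard device check looks like:

```python
import torch

# Train on the GPU when available; otherwise fall back to the (much slower) CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on {device}")
```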

## Additional Files

The raw video datasets can be found here: video data