# Representation Learning - RNN, GRU and Multi-headed Attention Transformers
This repository contains code illustrating the following concepts:
- Implementations of Transformer, GRU and RNN models from scratch (a minimal GRU cell sketch appears after this list)
- Evaluation of the different models by comparing metrics such as perplexity and loss (see the perplexity sketch below)
- Hyperparameter search across the experiments, with plots included in the notebook
- Demonstration of the vanishing gradient problem in the RNN and how the GRU mitigates it (see the sketch after this list)
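
As a rough illustration of what a from-scratch recurrent cell looks like, the sketch below shows a single GRU cell with its gate equations written out explicitly. It assumes PyTorch and is not the repository's actual implementation:

```python
import torch
import torch.nn as nn

class GRUCell(nn.Module):
    """A single GRU cell with the gate equations written out explicitly."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Update gate, reset gate, and candidate state each get their own
        # projection over the concatenated input and previous hidden state.
        self.W_z = nn.Linear(input_size + hidden_size, hidden_size)
        self.W_r = nn.Linear(input_size + hidden_size, hidden_size)
        self.W_h = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h_prev):
        # x: (batch, input_size), h_prev: (batch, hidden_size)
        concat = torch.cat([x, h_prev], dim=-1)
        z = torch.sigmoid(self.W_z(concat))   # update gate
        r = torch.sigmoid(self.W_r(concat))   # reset gate
        h_tilde = torch.tanh(self.W_h(torch.cat([x, r * h_prev], dim=-1)))  # candidate state
        # Interpolate between the old state and the candidate state.
        return (1 - z) * h_prev + z * h_tilde
```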
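
For the model comparison, the link between the two metrics is simple: perplexity is the exponential of the mean token-level cross-entropy loss (in nats). A minimal sketch, using a made-up loss value for illustration:

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(mean_nll)

# A validation cross-entropy of 4.2 nats/token corresponds to a
# perplexity of roughly 66.7.
print(perplexity(4.2))
```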
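
One simple way to see the vanishing gradient effect is to run a long sequence through a plain RNN cell and a GRU cell and compare the gradient norm that survives back to the first time step. The sketch below uses PyTorch's built-in cells rather than the notebook's own code; the exact numbers depend on initialization, but the RNN gradient typically decays toward zero while the GRU's gating preserves a much larger signal:

```python
import torch

seq_len, hidden = 200, 64
for name, cell in [("RNN", torch.nn.RNNCell(hidden, hidden)),
                   ("GRU", torch.nn.GRUCell(hidden, hidden))]:
    # Track the gradient flowing back to each input of the sequence.
    xs = [torch.randn(1, hidden, requires_grad=True) for _ in range(seq_len)]
    h = torch.zeros(1, hidden)
    for x in xs:
        h = cell(x, h)
    h.sum().backward()
    print(name, "gradient norm at t=0:", xs[0].grad.norm().item())
```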