Both the series and the single models were trained with a 2-layer feedforward controller (hidden sizes 128 and 256, respectively) and ReLU activations, and both share the following set of hyperparameters (a configuration sketch follows the list):
- RMSProp optimizer with a learning rate of 10⁻⁴ and momentum of 0.9.
- Memory word size of 10, with a single read head.
- Controller weights are initialized from a zero-mean normal distribution, keeping samples within one standard deviation of the mean, with a variance that depends on $N$, the size of the input vector coming into the weight matrix.
- A batch size of 1.
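For concreteness, here is a minimal NumPy sketch of the shared hyperparameters and the weight-initialization scheme described above. The constant names, the function name `init_controller_weights`, and the exact fan-in scaling (a He-style `sqrt(2 / N)` standard deviation) are illustrative assumptions, not values copied from the training scripts.

```python
import numpy as np

# shared hyperparameters from the list above
LEARNING_RATE = 1e-4   # RMSProp learning rate
MOMENTUM = 0.9         # RMSProp momentum
WORD_SIZE = 10         # memory word size
READ_HEADS = 1         # single read head
BATCH_SIZE = 1

def init_controller_weights(input_size, output_size, rng=np.random):
    """Zero-mean normal init, keeping samples within one standard deviation.

    The standard deviation sqrt(2 / input_size) is an assumed fan-in scaling;
    the repository may use a different expression of the fan-in N.
    """
    stddev = np.sqrt(2.0 / input_size)
    w = rng.normal(0.0, stddev, size=(input_size, output_size))
    outside = np.abs(w) > stddev
    while outside.any():                       # re-draw out-of-range samples
        w[outside] = rng.normal(0.0, stddev, size=outside.sum())
        outside = np.abs(w) > stddev
    return w
```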
All output from the DNC is squashed between 0 and 1 using a sigmoid function, and a binary cross-entropy (logistic) loss of the form:

$$\mathcal{L}(y, \hat{y}) = -\frac{1}{B \cdot T \cdot S}\sum_{i=1}^{B}\sum_{t=1}^{T}\sum_{k=1}^{S}\Big[\, y_{itk}\log\hat{y}_{itk} + (1 - y_{itk})\log\big(1 - \hat{y}_{itk}\big) \Big]$$

is used, where $B$ is the batch size, $T$ the number of time steps, and $S$ the output size. That is, the mean of the logistic loss across the batch, time steps, and output size.
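The same loss written as a small NumPy sketch; the function names, the tensor shapes, and the epsilon guard against `log(0)` are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def copy_task_loss(logits, targets, eps=1e-8):
    """Binary cross-entropy averaged over batch, time steps, and output size.

    logits, targets: arrays of shape (batch, time_steps, output_size).
    """
    y_hat = sigmoid(logits)
    per_element = -(targets * np.log(y_hat + eps)
                    + (1.0 - targets) * np.log(1.0 - y_hat + eps))
    return per_element.mean()   # mean across all three axes
```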
All gradients are clipped between -10 and 10.
NaNs may still occur during training!
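A minimal sketch of the clipping step described above, plus a NaN guard; the function names are illustrative, and the guard itself is an addition for clarity rather than something the training scripts are known to do.

```python
import numpy as np

def clip_gradients(grads, clip_value=10.0):
    """Element-wise clip each gradient array to [-clip_value, clip_value]."""
    return [np.clip(g, -clip_value, clip_value) for g in grads]

def any_nan(grads):
    """Illustrative guard: skip or inspect an update if a gradient went NaN."""
    return any(np.isnan(g).any() for g in grads)
```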
The model was first trained on a length-2 series of random binary vectors of size 6. Then, starting from the learned length-2 model, a length-4 model was trained in a curriculum-learning fashion.
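As a rough illustration of that training data, the sketch below builds a single copy-task example: a sequence of random binary vectors of size 6, followed by an end-of-input marker, with the target being the same sequence during the answer phase. The helper name `copy_example` and the exact input/target layout used by `tasks/copy/train-series.py` (delimiter channel, padding, how series are concatenated) are assumptions here.

```python
import numpy as np

def copy_example(seq_len=2, vector_size=6, rng=np.random):
    """One copy-task example: present `seq_len` random binary vectors, then an
    end marker, then expect the same vectors back during the answer phase."""
    seq = rng.randint(0, 2, size=(seq_len, vector_size)).astype(np.float32)

    total_steps = 2 * seq_len + 1
    inputs = np.zeros((total_steps, vector_size + 1), dtype=np.float32)
    inputs[:seq_len, :vector_size] = seq      # the sequence to memorize
    inputs[seq_len, vector_size] = 1.0        # end-of-input marker channel

    targets = np.zeros((total_steps, vector_size), dtype=np.float32)
    targets[seq_len + 1:, :] = seq            # reproduce it after the marker
    return inputs, targets
```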
The following plots show the learning curves for the length-2 and length-4 models respectively.
Attempting to train a length-4 model directly always resulted in NaNs. The paper mentions using curriculum learning for the graph and Mini-SHRDLU tasks but says nothing about the copy task, so this may not be the most efficient approach.
The length-2 model was trained by running:

```bash
$ python tasks/copy/train-series.py --length=2
```
Then, assuming the trained model from that run is saved under the name `step-100000`, the length-4 model was trained with:
```bash
$ python tasks/copy/train-series.py --length=4 --checkpoint=step-100000 --iterations=20000
```
The single model was trained directly on individual sequences of length 1 to 10, with the length chosen at random for each example, so no curriculum learning was used. The following plot shows the learning curve of the single model.
```bash
$ python tasks/copy/train.py --iterations=50000
```
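For the single model, the only difference on the data side is that a fresh sequence length is drawn for every example. A minimal sketch, reusing the illustrative `copy_example` helper from the sketch above (the function name is again an assumption):

```python
import numpy as np

def random_length_example(min_len=1, max_len=10, vector_size=6, rng=np.random):
    """Draw a new sequence length in [min_len, max_len] for every example."""
    seq_len = rng.randint(min_len, max_len + 1)   # +1: randint's upper bound is exclusive
    return copy_example(seq_len=seq_len, vector_size=vector_size, rng=rng)
```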