7/13/2023 new note:
- Training notebooks can be found in these email accounts:
- quan.nh2002
- abca05786
- hoang.nh0615
- quan.nguyen232.work
Discussion:
- Pivot model has en-fr BLEU of 33, much lower than training the models separately. Hypothesis: we did not assign the correct weights to the sub-losses, so we can try a dynamic ensemble loss in the future.
NEW TASKS:
- Seq2Seq: sort the batch by src_len and unsort the output --> ensure the output order matches trg (a sketch follows this list)
- Pivot model: ensure it works for $n$ chained seq2seq models
- Triang model: ensure outputs from all submodels match w/ the target sentence
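A minimal sketch of the sort/unsort trick for the first task, using `argsort` to invert the permutation (tensors are toy stand-ins):

```python
import torch

# Sort the batch by source length (needed by pack_padded_sequence),
# then restore the original order so outputs line up with trg.
src_len = torch.tensor([5, 9, 3, 7])
sorted_len, sort_idx = src_len.sort(descending=True)
# src = src[:, sort_idx]  # reorder the batch (src: [src_len, batch])
# ... encode with pack_padded_sequence(embedded, sorted_len) ...
unsort_idx = sort_idx.argsort()
# outputs = outputs[:, unsort_idx]  # restore the original batch order
assert torch.equal(sort_idx[unsort_idx], torch.arange(4))
```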
- Create conda env with Python 3.10.12
- Install torch 2.0.1:
  `pip3 install torch==2.0.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117`
- Install all packages in `requirements.txt` (torch 2.0.1 and torchtext 0.6.0)
- Install spaCy language packages via `setup_env.py`
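A quick sanity check after setup (versions should match the pins above; CUDA availability is machine-dependent):

```python
import torch
import torchtext

# Expected on this config: 2.0.1+cu117, 0.6.0, True on a CUDA machine.
print(torch.__version__, torchtext.__version__, torch.cuda.is_available())
```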
- Vocab size: built on the first 64,000 sentences of `EnDeFrItEsPtRo-76k-most5k.pkl` with `min_freq=2`:
  - en: 6964
  - fr: 9703
  - es: 10461
  - it: 10712
  - pt: 10721
  - ro: 11989
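For reference, vocab building with the pinned torchtext 0.6.0 looks like this; a toy corpus is used so the sketch runs standalone (the real vocab comes from the 64,000 training sentences):

```python
from torchtext.data import Field

# Legacy torchtext 0.6.0 API, as pinned above.
SRC = Field(init_token='<sos>', eos_token='<eos>', lower=True)
toy_corpus = [['a', 'b', 'a'], ['b', 'c']]
SRC.build_vocab(toy_corpus, min_freq=2)  # 'c' (freq 1) is dropped
print(len(SRC.vocab))  # 6: four specials (<unk>, <pad>, <sos>, <eos>) + 'a', 'b'
```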
- Model config
  - Embed_Dim = 256
  - Hidden_Dim = 512
  - Dropout = 0.5
  - Seq2Seq: `seq2seq-EnFr-1.pt`
  - Pivot:
    - es: `piv-EnEsFr.pt`
    - it: `piv-EnItFr.pt`
    - pt: `piv-EnPtFr.pt`
    - ro: `piv-EnRoFr.pt`
  - Triang: combination of trained Seq2Seq & Pivot
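A sketch of how these .pt checkpoints are assumed to be written, using the keys the loading snippet further below expects (a toy model stands in for the real Seq2Seq):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=0.0012)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=2/3)

# Save state dicts under the keys the loading snippet reads back.
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_state_dict': scheduler.state_dict(),
}, 'seq2seq-EnFr-1.pt')
```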
- Main pipeline: `bentrevett_pytorch_seq2seq.ipynb`
- Datasets: `EnDeFrItEsPtRo-76k-most5k.pkl`
- Data info: train_len, valid_len, test_len = 64000, 3200, 6400
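Loading the dataset is assumed to be a plain pickle read (the exact structure depends on how it was serialized):

```python
import pickle

with open('EnDeFrItEsPtRo-76k-most5k.pkl', 'rb') as f:
    data = pickle.load(f)
# Split sizes per the note above.
train_len, valid_len, test_len = 64000, 3200, 6400
```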
- Init weights (all models):

```python
import torch.nn as nn

def init_weights(m):
    # Normal(0, 0.01) for weights, zeros for biases.
    for name, param in m.named_parameters():
        if 'weight' in name:
            nn.init.normal_(param.data, mean=0, std=0.01)
        else:
            nn.init.constant_(param.data, 0)

model.apply(init_weights)
```
- Load model weights:

```python
import torch

checkpoint = torch.load('path_to_model/model_name.pt')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
```
- Learning rate
  - Seq2Seq: start w/ $0.0012$, reduced by $\frac{2}{3}$ every epoch
  - Pivot: start w/ $0.0012$, reduced by $\frac{2}{3}$ at epochs 3, 6, 8, 9, and 10 (a sketch follows this list)
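A minimal sketch of both schedules, assuming "reduced by $\frac{2}{3}$" means multiplying the LR by 2/3 (toy model as a stand-in; each model has its own optimizer and scheduler in practice):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=0.0012)

# Seq2Seq schedule: decay every epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=2/3)
# Pivot schedule: decay only at epochs 3, 6, 8, 9, 10:
# scheduler = torch.optim.lr_scheduler.MultiStepLR(
#     optimizer, milestones=[3, 6, 8, 9, 10], gamma=2/3)

for epoch in range(7):
    # ... train one epoch ...
    scheduler.step()
```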
- Epochs
  - Seq2Seq: 7
  - Pivot: 11
Future work: dataloader improvements (after having results)
- Replace `Field` and `BucketIterator` with the following:
  - Use `build_vocab_from_iterator` (example tutorial); see also `torchtext.vocab` and the general torchtext tutorial.
  - Use `EmbeddingBag` with `offsets`: it can replace `Embedding`, and `sent_len` becomes OPTIONAL since using `pack_padded_sequence` also reduces padding. (A sketch of both items follows this list.)
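A combined sketch of both items; note it needs a newer torchtext (>= 0.12) than the 0.6.0 pinned above, and the toy corpus stands in for the real tokenized data:

```python
import torch
import torch.nn as nn
from torchtext.vocab import build_vocab_from_iterator

# Newer torchtext API that replaces Field's vocab building.
corpus = [['a', 'b', 'a'], ['b', 'c']]
vocab = build_vocab_from_iterator(
    corpus, min_freq=1, specials=['<unk>', '<pad>', '<sos>', '<eos>'])
vocab.set_default_index(vocab['<unk>'])  # map OOV tokens to <unk>

# EmbeddingBag pools one vector per sentence from a flat token tensor plus
# per-sentence offsets, so no padding (and no sent_len) is needed.
bag = nn.EmbeddingBag(num_embeddings=len(vocab), embedding_dim=8, mode='mean')
flat = torch.tensor(vocab(['a', 'b', 'a', 'b', 'c']))  # two sentences flattened
offsets = torch.tensor([0, 3])                          # start index of each sentence
pooled = bag(flat, offsets)                             # shape: (2, 8)
```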
- Implement based on "Attention Is All You Need"
  - most popular GitHub repo: jadore801120/attention-is-all-you-need-pytorch
  - Kaggle implementation by FERNANDES
  - another good GitHub repo: jayparks/transformer
  - a basic explanation of some concepts on Towards Data Science
  - check the VietAI code
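As a baseline before adopting one of the repos above, torch's built-in `nn.Transformer` could be tried; a minimal shape-check sketch (d_model matches Embed_Dim = 256, the other hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=256, nhead=8,
                       num_encoder_layers=3, num_decoder_layers=3)
src = torch.rand(10, 32, 256)  # (src_len, batch, d_model)
trg = torch.rand(9, 32, 256)   # (trg_len, batch, d_model)
out = model(src, trg)          # (trg_len, batch, d_model)
```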