Reference
Chenghao MOU edited this page Oct 22, 2019 · 1 revision
Models | Size (parameters) | Category |
---|---|---|
BERT base | 110 M | base |
BERT large | 340 M | large |
OpenAI GPT | 110 M | base |
GPT2 | 117 M | weird large |
XLM | >= 295 M | super large |
XLNet base | 110 M | base |
XLNet large | 340 M | large |
RoBERTa base | 125 M | base |
RoBERTa large | 355 M | large |
DistilBERT | 60 M | small |
Super large models do not fit in the memory of the P100s on HPC. Weird large models have a base-sized parameter count but consume as much memory as a large model.
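As a rough aid for reading the size column, the sketch below estimates a lower bound on the GPU memory needed just to hold the training state of each model. It assumes fp32 weights, fp32 gradients, and an Adam-style optimizer with two extra buffers per parameter (four copies in total); none of this is stated on this page, and the helper name `training_state_gib` is made up for illustration. Activations, which grow with batch size and sequence length, come on top of this figure, so parameter count alone does not determine whether a model fits.

```python
# Rough lower bound on training memory from parameter count alone.
# Assumptions (not from this page): fp32 everywhere, Adam-style optimizer,
# so each parameter is stored 4 times (weight, gradient, two moment buffers).
BYTES_PER_FP32 = 4
STATE_COPIES = 4

# Parameter counts in millions, copied from the table above.
# XLM is listed as ">= 295 M", so its entry is itself a lower bound.
MODEL_SIZES_M = {
    "BERT base": 110,
    "BERT large": 340,
    "OpenAI GPT": 110,
    "GPT2": 117,
    "XLM": 295,
    "XLNet base": 110,
    "XLNet large": 340,
    "RoBERTa base": 125,
    "RoBERTa large": 355,
    "DistilBERT": 60,
}


def training_state_gib(params_millions: float) -> float:
    """Return GiB needed for weights, gradients, and optimizer state only."""
    total_bytes = params_millions * 1e6 * BYTES_PER_FP32 * STATE_COPIES
    return total_bytes / 2**30


if __name__ == "__main__":
    for name, size in MODEL_SIZES_M.items():
        print(f"{name:14s} {size:4d} M  >= {training_state_gib(size):5.2f} GiB")
```

Comparing these figures with the memory of the GPU in use (the P100 ships in 12 GB and 16 GB variants) shows how much room is left for activations.
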
Models | aNLI | hellaswag | piqa | siqa | Config Commit |
---|---|---|---|---|---|
Bert (bert-base-cased) | 63.32 | 37.83 | 65.29 | 60.33 | commit |
Bert (bert-large-cased) | 66.28 | 43.84 | 68.67 | 65 | commit |
RoBERTa (roberta-base) | 71.54 | 58.51 | 48.03 | 69.09 | commit |
RoBERTa (roberta-large) | 84.39 | 82.42 | 76.96 | 77.12 | commit |
XLNet (xlnet-base-cased) | 68.15 | 52.99 | 52.94 | 65.79 | commit |
XLNet (xlnet-large-cased) | 80.16 | 80.38 | 69.27 | 75.23 | commit |
GPT (openai-gpt) | 64.23 | 38.15 | 67.11 | 61.73 | commit |
GPT2 (gpt2) | 53.46 | 26.52 | 48.05 | 35.16 | commit |
DistilBERT (distilbert-base-uncased) | 60.17 | 35.57 | 64.96 | 52.92 | commit |
With two P100s on HPC, fine-tuning a model takes approximately the following time.
Tasks | Base Model (3 epochs) | Large Model (3 epochs) |
---|---|---|
aNLI | 1 ~ 2 hrs | ~ 7 hrs |
hellaswag | 6 ~ 8 hrs | 24 hrs |
physicaliqa | 1 hr | 3 ~ 4 hrs |
socialiqa | 1 hr | 4 ~ 5 hrs |
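
For budgeting cluster time, the table above can be summed per model category. The sketch below is only arithmetic on the listed numbers: single values such as "~ 7 hrs" are treated as a range with equal ends, and the name `total_hours` is hypothetical.

```python
# Fine-tuning time in hours for 3 epochs on two P100s, as (low, high),
# copied from the table above. Single values are stored as (x, x).
HOURS = {
    "aNLI": {"base": (1, 2), "large": (7, 7)},
    "hellaswag": {"base": (6, 8), "large": (24, 24)},
    "physicaliqa": {"base": (1, 1), "large": (3, 4)},
    "socialiqa": {"base": (1, 1), "large": (4, 5)},
}
NUM_GPUS = 2  # the timings above are for a two-GPU job


def total_hours(category: str) -> tuple:
    """Sum the (low, high) wall-clock hours over all four tasks."""
    low = sum(task[category][0] for task in HOURS.values())
    high = sum(task[category][1] for task in HOURS.values())
    return low, high


if __name__ == "__main__":
    for category in ("base", "large"):
        low, high = total_hours(category)
        print(f"{category}: {low}-{high} hrs wall clock, "
              f"{low * NUM_GPUS}-{high * NUM_GPUS} GPU-hours")
```

Under these numbers, running one base model on all four tasks takes 9 to 12 hours of wall-clock time, and one large model takes 38 to 40 hours.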