XLNet Multi-class Text Classification for Chinese and English (available for both multi-label and single-label classification)

angel870326/XLNet_Text_Classification

XLNet Multi-class Text Classification for Chinese and English

  • Available for both multi-label and single-label classification.
  • Loss functions used: CrossEntropyLoss (single-label) and BCEWithLogitsLoss (multi-label).
  • Note: The example here uses the Chinese pre-trained model; the English pre-trained model is commented out.
  • The code is modified from here.
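
To make the difference between the two loss functions concrete, here is a minimal pure-Python sketch of what each one computes for a single example, plus the matching way to decode predictions. It is an illustration only, not the code from the notebooks: the function names, the 0.5 threshold, and the sample logits are assumptions, and in practice the PyTorch classes CrossEntropyLoss and BCEWithLogitsLoss would be applied to batched tensors.

```python
import math


def cross_entropy_loss(logits, target_index):
    """Single-label loss: negative log-softmax of the true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def bce_with_logits_loss(logits, targets):
    """Multi-label loss: mean binary cross-entropy over independent sigmoids."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_single_label(logits):
    """Single-label decoding: pick the class with the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def predict_multi_label(logits, threshold=0.5):
    """Multi-label decoding: keep every class whose sigmoid passes the threshold."""
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


logits = [2.0, -1.0, 0.5]  # hypothetical model outputs for three classes
print(cross_entropy_loss(logits, 0))            # target is class 0 only
print(bce_with_logits_loss(logits, [1, 0, 1]))  # targets are classes 0 and 2
print(predict_single_label(logits))
print(predict_multi_label(logits))
```

The single-label loss assumes exactly one correct class, so the logits compete through a softmax. The multi-label loss scores each class independently, which is why its target is a multi-hot vector rather than one index.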

User Guide

  1. To evaluate the model, first split the training dataset into a training part and a testing part.
    train_test_split_v2.ipynb
  2. Train the model on the training part and evaluate it on the testing part. (Keep modifying the model until you get a satisfactory score.)
    • For single-label testing, you may use the cross-entropy loss (CrossEntropyLoss).
      product/xlnet_multi_class_chinese_train_product_single_label_ce_oversample.ipynb
    • For multi-label testing, you may use the binary cross-entropy with logits loss (BCEWithLogitsLoss).
      product/xlnet_multi_class_chinese_train_product_single_label_bcel.ipynb
  3. Train the model on the complete training dataset, then predict the classes of a new dataset whose answers are unknown.
    xlnet_multi_class_chinese_brand_single_label_bcel.ipynb
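
Step 1 of the guide can be sketched as a stratified split, which keeps every class represented in both parts. This is a hedged illustration and not the contents of train_test_split_v2.ipynb: the function name `stratified_split`, the 20% test ratio, the fixed seed, and the toy data are all assumptions, and the sketch handles the single-label case only.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_ratio=0.2, seed=42):
    """Split (text, label) pairs so each label appears in both parts when possible."""
    rng = random.Random(seed)

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Hold out at least one example per label, unless the label has only one.
        n_test = max(1, round(len(indices) * test_ratio)) if len(indices) > 1 else 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train = [(texts[i], labels[i]) for i in sorted(train_idx)]
    test = [(texts[i], labels[i]) for i in sorted(test_idx)]
    return train, test


# Hypothetical toy dataset: 6 examples of class "A" and 4 of class "B".
texts = [f"sample {i}" for i in range(10)]
labels = ["A"] * 6 + ["B"] * 4
train, test = stratified_split(texts, labels)
print(len(train), len(test))
```

For step 3, the same data would be used without the split, so that the final model sees the complete training dataset.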

Reference

1. Official XLNet

2. Hugging Face Transformers

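!pip install transformers
!pip install sentencepiece

from transformers import XLNetTokenizer, XLNetModel

# do_lower_case=True lowercases the input text before tokenization
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased', do_lower_case=True)
model = XLNetModel.from_pretrained('xlnet-base-cased')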

3. Multi-label Text Classification (important)

4. XLNet Chinese Pre-trained Model

The models can be used directly through transformers.

!pip install transformers
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("MODEL_NAME")
model = AutoModel.from_pretrained("MODEL_NAME")

Model name    MODEL_NAME
XLNet-mid     hfl/chinese-xlnet-mid
XLNet-base    hfl/chinese-xlnet-base

Prevent Google Colab from Disconnecting

Right click ➡ Inspect ➡ Console, then paste the following snippet, which clicks the connect button every 60 seconds:

function ConnectButton(){
    console.log("Connect pushed");
    document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click()
}
setInterval(ConnectButton,60000);
