Code for the NLP2023 paper 事前学習済み言語モデルによるエンティティの概念化 (Conceptualization of Entities with Pre-trained Language Models).
Use this Docker repository to build the environment.
The dataset for our experiments is available at url. Download "reproduction_data_NLP2023.zip" from Google Drive into your data directory and unzip it there.
The repository's result directory already contains the experimental results, so the visualizations are easy to reproduce. Follow visualization_reproduction.ipynb to reproduce them.
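For orientation, here is a minimal, illustrative sketch of the kind of visualization the notebook produces: a 2-D projection of entity embeddings colored by cluster. The file names, array formats, and the choice of PCA are assumptions made for this example, not the notebook's actual settings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

embeddings = np.load("result/embeddings.npy")    # hypothetical path: (n_entities, hidden_size)
cluster_ids = np.load("result/cluster_ids.npy")  # hypothetical path: (n_entities,)

# Project the high-dimensional embeddings to 2-D for plotting.
points = PCA(n_components=2).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=cluster_ids, s=5, cmap="tab20")
plt.title("Entity embeddings projected to 2-D (illustrative)")
plt.savefig("embedding_clusters.png", dpi=200)
```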
You can extract the embeddings from BERT by running the following command:
bash ./shell_file/get_embeddings_formBERT.sh
※ Please make sure that data_dir_path in get_embeddings_formBERT.sh points to the directory where you extracted the 'data.zip' file.
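If you want to experiment outside the shell script, the following is a minimal sketch of extracting contextual embeddings from BERT with Hugging Face Transformers. The model checkpoint, pooling strategy, and example sentence are assumptions made for illustration; the script above encodes the settings actually used in the paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint; the repo may use a different one

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

sentences = ["Barack Obama was born in Hawaii."]  # hypothetical input sentence

with torch.no_grad():
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    outputs = model(**inputs)
    # Mean-pool the last hidden states over non-padding tokens (one possible pooling choice).
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(embeddings.shape)  # (num_sentences, hidden_size), e.g. (1, 768)
```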
The condensation ratio measures how well separated each cluster is. Run the command below to compute it and obtain the results:
bash ./shell_file/cal_condensation_ratio.sh
※ Please make sure that data_dir_path in cal_condensation_ratio.sh points to the directory where you extracted the 'data.zip' file.
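For intuition, the sketch below computes a simple per-cluster separation measure (mean intra-cluster distance divided by mean distance to other clusters) from saved embeddings and cluster assignments. This stand-in metric and the file paths are assumptions for illustration only; the actual condensation ratio is computed by cal_condensation_ratio.sh.

```python
import numpy as np
from scipy.spatial.distance import cdist

embeddings = np.load("result/embeddings.npy")    # hypothetical path
cluster_ids = np.load("result/cluster_ids.npy")  # hypothetical path

for c in np.unique(cluster_ids):
    inside = embeddings[cluster_ids == c]
    outside = embeddings[cluster_ids != c]
    intra = cdist(inside, inside).mean()   # average pairwise distance within the cluster
    inter = cdist(inside, outside).mean()  # average distance to points in other clusters
    # Smaller values indicate a more compact, better separated cluster
    # (illustrative measure, not the paper's definition).
    print(f"cluster {c}: intra/inter = {intra / inter:.3f}")
```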