by Tin Nguyen, Logan Bolton, Mohammad R. Taesiri, and Anh Nguyen.
```
python=3.10.15
google-generativeai==0.8.3
openai==1.58.1
```
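A minimal environment setup might look like the following sketch, assuming conda is used for Python and pip for the two pinned packages (the environment name `hot-env` is an assumption, not part of the repo):

```bash
# Hypothetical setup; the environment name "hot-env" is an assumption.
conda create -n hot-env python=3.10.15 -y
conda activate hot-env
pip install google-generativeai==0.8.3 openai==1.58.1
```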
Run the following command to execute the script:
python main.py --save_answer --llm_model "$llm_model" --dataset "$dataset" --answer_mode "$run_mode" --data_mode "$data_mode"
- `--llm_model`: Defines the LLM to use. Choices include: `gemini-1.5-pro-002`, `gemini-1.5-flash-002`, `gpt-4o-2024-08-06`, `llama_8b`, `llama_70b`, `llama_sambanova_405b`, `qwen25_coder_32b`, `qwq_32b`, `deepseek_r1`.
- `--dataset`: Specifies the dataset to evaluate, such as `GSM8K`, `AQUA`, or `DROP`.
- `--answer_mode`: Determines the answering strategy:
  - `cot`: Chain-of-Thought prompting
  - `hot`: Highlighted Chain-of-Thought prompting
- `--data_mode`: Selects which samples to run on:
  - `random`: Runs the model on 200 randomly selected samples.
  - `longest`: Runs the model on the 200 longest samples.
  - `shortest`: Runs the model on the 200 shortest samples.
  - `full`: Runs the model on the whole dataset.

For example:
python main.py --save_answer --llm_model "gpt-4o-2024-08-06" --dataset "GSM8K" --answer_mode "cot" --data_mode random
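To sweep several configurations in one go, a plain shell loop over the flag values listed above works; this is a sketch, assuming answers for each run are saved via `--save_answer`:

```bash
# Hypothetical sweep over datasets and both answering strategies.
for dataset in GSM8K AQUA DROP; do
  for answer_mode in cot hot; do
    python main.py --save_answer \
      --llm_model "gpt-4o-2024-08-06" \
      --dataset "$dataset" \
      --answer_mode "$answer_mode" \
      --data_mode random
  done
done
```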
Run the following command to evaluate the results:
python evaluate.py --llm_model "$llm_model" --dataset "$dataset" --answer_mode "$answer_mode" --data_mode "$data_mode"
python evaluate.py --llm_model "gpt-4o-2024-08-06" --dataset "GSM8K" --answer_mode "cot" --data_mode longest
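To compare CoT against HoT on the same split, the evaluation can be looped over both answering strategies (a sketch, assuming answers for both modes were saved in the previous step):

```bash
# Hypothetical comparison: evaluate both answering strategies on one split.
for answer_mode in cot hot; do
  python evaluate.py --llm_model "gpt-4o-2024-08-06" \
    --dataset "GSM8K" --answer_mode "$answer_mode" --data_mode longest
done
```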
Run the following command to render the results as HTML pages:
python visualize.py --llm_model "$llm_model" --dataset "$dataset" --answer_mode "$answer_mode" --save_html
python visualize.py --llm_model "gpt-4o-2024-08-06" --dataset "GSM8K" --answer_mode "cot" --save_html
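Putting the three steps together, an end-to-end run might look like this sketch (output locations depend on the scripts' defaults, which are not specified here):

```bash
# Hypothetical end-to-end pipeline: generate answers, score them, render HTML.
llm_model="gpt-4o-2024-08-06"
dataset="GSM8K"
answer_mode="hot"
data_mode="random"

python main.py --save_answer --llm_model "$llm_model" --dataset "$dataset" \
  --answer_mode "$answer_mode" --data_mode "$data_mode"
python evaluate.py --llm_model "$llm_model" --dataset "$dataset" \
  --answer_mode "$answer_mode" --data_mode "$data_mode"
python visualize.py --llm_model "$llm_model" --dataset "$dataset" \
  --answer_mode "$answer_mode" --save_html
```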
This project is released under the MIT License.
If you use this work in your research, please cite:
```bibtex
@article{nguyen2025hot,
  title={HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs},
  author={Nguyen, Tin and Bolton, Logan and Taesiri, Mohammad Reza and Nguyen, Anh Totti},
  journal={arXiv preprint arXiv:2503.02003},
  year={2025}
}
```