LogicLLaMA: A language model that translates natural-language (NL) statements into first-order logic (FOL) rules. It is trained by fine-tuning the LLaMA-7B model on the MALLS dataset.
MALLS (large language Model generAted natural-Language-to-first-order-Logic pairS): a dataset consisting of 34K pairs of real-world natural language (NL) statements and their corresponding first-order logic (FOL) rule annotations. All pairs are generated by prompting GPT-4 and processed to ensure the validity of the FOL rules.
Harnessing the Power of Large Language Models for Natural Language to First-Order Logic Translation
Yuan Yang, Siheng Xiong, Ali Payani, Ehsan Shareghi and Faramarz Fekri
- [10/25/2023] We release the MALLS-v0.1 dataset and the LoRA delta weights for LLaMA2-7B/13B
- [7/16/2023] We release the MALLS dataset (large language Model generAted natural-Language-to-first-order-Logic pairS), which consists of 34K pairs of real-world natural language (NL) statements and their corresponding first-order logic (FOL) rule annotations.
- [7/16/2023] We release the LoRA delta weights for the direct translation and naive correction LogicLLaMA
- [7/16/2023] Support int8 loading.
- Colab demo.
- Pipeline for generating synthetically perturbed FOL rules.
- Release weights for RLHF CoT correction LogicLLaMA and the corresponding pipeline.
Datasets:
- MALLS-v0.1: The file contains the 27K auto-verified pairs of the MALLS dataset.
- MALLS-v0.1-test: The file contains the 1K human-verified pairs of the MALLS dataset.
- MALLS-v0: The file contains the 34K pairs of the MALLS dataset.
- FOLIO-parsed: The file contains 2K pairs collected and processed from the FOLIO dataset.
LoRA delta weights V0.1 for LLaMA2-7B/13B:
- direct translation LogicLLaMA-7B
- naive correction LogicLLaMA-7B
- direct translation LogicLLaMA-13B
- naive correction LogicLLaMA-13B
LoRA delta weights V0 for LLaMA-7B:
Download the dataset by running
sh data_download.sh
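Once downloaded, the data is plain JSON and can be inspected directly. Here is a minimal sketch; the file name and location are assumptions, so adjust them to wherever data_download.sh places the files:

```python
import json

# Assumed path; data_download.sh may place the files elsewhere.
with open("data/MALLS-v0.1.json", encoding="utf-8") as f:
    malls = json.load(f)

print(len(malls))        # number of NL-FOL pairs
print(malls[0]["NL"])    # a natural-language statement
print(malls[0]["FOL"])   # its first-order logic annotation
```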
- Clone this repo
git clone https://github.com/gblackout/LogicLLaMA.git
cd LogicLLaMA
- Prepare environment
conda create -n logicllama python=3.7
conda activate logicllama
pip install -r requirements.txt
Check out demo.ipynb for a quick demonstration of LogicLLaMA inference and FOL rule parsing.
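If you prefer a plain script over the notebook, a minimal inference sketch along these lines should work with transformers and peft. The paths, prompt template, and generation settings below are illustrative assumptions; demo.ipynb shows the repo's actual pipeline.

```python
# Minimal LoRA inference sketch (paths and prompt are placeholders).
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model = "path/to/llama-7b"           # base LLaMA weights
lora_weights = "path/to/logicllama-lora"  # LoRA delta weights

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, lora_weights)
model.eval()

nl = "All squares are rectangles."
# Hypothetical prompt template; the repo's template may differ.
prompt = f"Translate the following NL statement to a FOL rule:\n{nl}\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```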
Copy the template from scripts to the project root:
cp scripts/eval_translation.sh ./
Modify --base_model to the folder/link of the base LLaMA-7B model (see the instructions for getting the base model), and modify --data_path to the path of the dataset you want to run inference or evaluation on. Set --load_in_8bit=False if your GPU does not support 8-bit quantization.
The dataset can be the FOLIO or the MALLS dataset you downloaded with data_download.sh, or your own dataset, as long as it follows this format:
[
{
'NL': <your NL statement>,
'FOL': <optional FOL rule>
},
...
]
The FOL field is optional: if provided, we will evaluate the BLEU and logical equivalence (LE) scores of the predicted FOL rule with respect to this field; otherwise, only inference is performed.
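For example, a custom dataset file could be written like this (a sketch assuming the loader reads standard JSON; the statements and file name are made up):

```python
import json

# Only the 'NL' key is required; 'FOL' is optional and enables
# BLEU / logical-equivalence evaluation when present.
data = [
    {
        "NL": "All squares are rectangles.",
        "FOL": "∀x (Square(x) → Rectangle(x))",
    },
    {"NL": "Some birds cannot fly."},  # no gold FOL: inference only
]

with open("my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
```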
Finally, run sh eval_translation.sh.
In this mode, LogicLLaMA serves as a downstream model that corrects the predicted FOL rule by ChatGPT, which generally leads to better performance than ChatGPT alone or the direct translation mode.
To use this mode, you first collect the responses from ChatGPT and then correct them:
- Collect ChatGPT predictions by copying the template from scripts to the project root:
cp scripts/gpt_translation.sh ./
Modify the following fields:
- --api_key: your OpenAI API key or the path to the key file
- --dataset: the path to the dataset
- --save_path: the path to save the dataset
- Set --load_in_8bit=False if your GPU does not support 8-bit quantization.
Finally, run sh gpt_translation.sh.
You can also use your own methods to get the response, as long as the final dataset has the following format:
[
{
'NL': <your NL statement>,
'Pred FOL': <GPT predicted FOL rule>,
'FOL': <optional FOL rule>
},
...
]
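For instance, here is a minimal sketch of collecting the predictions yourself with the openai Python package (pre-1.0 API style; the model name, prompt, and file paths are assumptions, and the repo's gpt_translation.sh may use a different prompt):

```python
import json
import openai  # pre-1.0 API style; adjust for newer openai versions

openai.api_key = "YOUR_API_KEY"  # or load it from a key file

with open("my_dataset.json", encoding="utf-8") as f:
    data = json.load(f)

for item in data:
    # Hypothetical prompt; the repo's actual template may differ.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Translate this statement into a first-order "
                       f"logic rule:\n{item['NL']}",
        }],
    )
    item["Pred FOL"] = resp["choices"][0]["message"]["content"].strip()

with open("my_dataset_with_preds.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
```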
- Correct the output with LogicLLaMA. Copy the template from scripts to the project root:
cp scripts/eval_correction.sh ./
Modify --base_model to the LLaMA-7B base model and --data_path to the output dataset from step 1. Finally, run sh eval_correction.sh.
You can also train LogicLLaMA from scratch on MALLS or your own dataset.
Copy the template from scripts to the project root:
cp scripts/sft_translation.sh ./
Modify the following fields:
- --base_model: path/link to the base model
- --data_path: the path to the dataset with both NL and FOL fields
- --output_dir: the path for saving the peft model
- --use_wandb: whether to use wandb for logging
- Set --load_in_8bit=False if your GPU does not support 8-bit quantization.
Finally, run sh sft_translation.sh.
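For context, alpaca-lora-style SFT wraps the base model with a peft LoRA configuration roughly like the sketch below. The hyperparameter values are common alpaca-lora defaults and an assumption here, not necessarily those used to train LogicLLaMA.

```python
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model = LlamaForCausalLM.from_pretrained(
    "path/to/llama-7b", load_in_8bit=True, device_map="auto"
)
model = prepare_model_for_int8_training(model)

# Common alpaca-lora defaults; the repo's actual values may differ.
lora_config = LoraConfig(
    r=8,                                   # LoRA rank
    lora_alpha=16,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```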
Copy the template from scripts to the project root:
cp scripts/sft_correction.sh ./
Modify the same fields as above. Note that the dataset here needs to have the NL, FOL (ground-truth FOL), and Pred FOL (GPT-predicted FOL) fields available. You can obtain Pred FOL by running gpt_translation.sh following the instructions in the inference section. Finally, run sh sft_correction.sh.
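Before launching sft_correction.sh, it can help to sanity-check that every entry carries all three fields, e.g. with a quick snippet like this (the file name is a placeholder):

```python
import json

with open("my_dataset_with_preds.json", encoding="utf-8") as f:
    data = json.load(f)

# Correction training needs all three fields in every entry.
required = {"NL", "FOL", "Pred FOL"}
missing = [i for i, item in enumerate(data) if not required <= item.keys()]
assert not missing, f"Entries missing required fields: {missing}"
```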
The data, code, and weights are released under the Apache 2.0 license and are intended for research use only. Additionally, usage of the LogicLLaMA model should follow the license agreements of LLaMA and Alpaca. The dataset is released under CC BY-NC 4.0 and, as it is collected from GPT-4, its use should follow OpenAI's policies.
This project is developed based on the following repos:
- alpaca-lora: the SFT and inference modules are mainly built upon this repo.
- trl: we refer to the RLHF scripts in this repo when implementing the RLHF CoT correction.
@article{yang2023harnessing,
title={Harnessing the Power of Large Language Models for Natural Language to First-Order Logic Translation},
author={Yuan Yang and Siheng Xiong and Ali Payani and Ehsan Shareghi and Faramarz Fekri},
journal={arXiv preprint arXiv:2305.15541},
year={2023}
}