The goal of this project is to create an AI that fact-checks news claims based on evidence: given a claim and an evidence text, the model predicts whether the evidence supports or refutes the claim.
Authors: Tales Marra and Max Sobroza.
In order to execute the following steps, you will need a Python 3.7 environment. Create one (for example, with venv), activate it, then install the dependencies:

python3.7 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
fact_verification_news_busterai
│   README.md
│   requirements.txt
│   Merge all datasets.ipynb   <- notebook to merge data
│
├── fever_data_prep
│   │   FEVER_data_preparation_and_analysis.ipynb
│   ├── preprocessed_data      <- preprocessed data
│   │       fever_train.pkl
│   │       fever_dev.pkl
│   │       fever_test.pkl
│   ├── data                   <- raw data
│   │       license.html
│   ├── fever_data
│   │       dev.jsonl
│   │       test.jsonl
│   │       train.jsonl
│
├── hoaxes_data_prep
│   │   hoaxes_data_preparation_and_analysis.ipynb
│   ├── preprocessed_data      <- preprocessed data
│   │       hoaxes.pkl
│
├── liar_plus_dataset_prep
│   │   LiarPlus_data_preparation_and_analysis.ipynb
│   ├── preprocessed_data      <- preprocessed data
│   │       liar_plus_test.pkl
│   │       liar_plus_train.pkl
│   │       liar_plus_val.pkl
│   ├── data                   <- raw data
│   │       test2.tsv
│   │       train2.tsv
│   │       val2.tsv
│   ├── .ipynb_checkpoints     <- notebook checkpoints
│   │       LiarPlus Analysis and Preparation checkpoint.ipynb
│
├── runs                       <- TensorBoard event logs from training runs
│   ├── Nov14_01-58-22_StationRTX
│   │       events.out.tfevents.1573693102.StationRTX
│   ├── Nov14_02-03-01_StationRTX
│   │       events.out.tfevents.1573693381.StationRTX
│   ├── Nov14_02-16-03_StationRTX
│   │       events.out.tfevents.1573694163.StationRTX
│   ├── Nov14_02-21-12_StationRTX
│   │       events.out.tfevents.1573694472.StationRTX
│   ├── Nov14_02-32-08_StationRTX
│   │       events.out.tfevents.1573695128.StationRTX
│   ├── Nov14_02-43-23_StationRTX
│   │       events.out.tfevents.1573695803.StationRTX
│   ├── Nov15_00-48-59_StationRTX
│   │       events.out.tfevents.1573775339.StationRTX
│
│   1.png                      <- accuracy plot
│   2.png                      <- loss plot
│   run_glue.py                <- main training/evaluation script
│   utils_glue_modified.py     <- helper functions used by the model
│   all_train_df.pkl           <- merged training data
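The events.out.tfevents.* files under runs are TensorBoard event logs written during training. Assuming TensorBoard is installed in your environment, the accuracy and loss curves can be browsed with:

tensorboard --logdir runs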
In order to perform this task, we mainly use two datasets: FEVER and LIAR_PLUS. Since the two datasets use different label sets, we had to convert the labels of one dataset into those of the other; the conversion is explained in detail in each data-preparation notebook. A rough sketch of such a conversion is shown below.
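For illustration only, here is a minimal sketch of a label conversion. The dictionary below is a hypothetical mapping from the six LIAR_PLUS labels to FEVER-style labels, not necessarily the one used in the notebooks:

```python
# Hypothetical mapping from LIAR_PLUS's six labels to FEVER-style labels.
# The actual conversion used in this project is documented in each
# data-preparation notebook.
LIAR_PLUS_TO_FEVER = {
    "true": "SUPPORTS",
    "mostly-true": "SUPPORTS",
    "half-true": "NOT ENOUGH INFO",
    "barely-true": "REFUTES",
    "false": "REFUTES",
    "pants-fire": "REFUTES",
}

def convert_label(liar_label: str) -> str:
    """Map a LIAR_PLUS label to its FEVER-style equivalent."""
    return LIAR_PLUS_TO_FEVER[liar_label]
```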
Each folder whose name includes data_prep contains a notebook for the data preparation step, in case you want to run it yourself. Otherwise, each of these folders also contains a preprocessed_data subfolder where the already-preprocessed data can be found.
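A minimal loading sketch, assuming the .pkl files are pickled pandas DataFrames (as produced by the preparation notebooks):

```python
import pandas as pd

# Load one of the preprocessed splits; the path matches the tree above.
fever_train = pd.read_pickle("fever_data_prep/preprocessed_data/fever_train.pkl")
print(fever_train.shape)
print(fever_train.head())
```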
In order to merge all datasets, use the notebook Merge all datasets.ipynb.
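Conceptually, the notebook concatenates the harmonized training splits into a single DataFrame. A rough sketch of the idea, with the exact columns and cleaning steps living in the notebook itself:

```python
import pandas as pd

# Concatenate the harmonized training splits into one DataFrame and save it
# as all_train_df.pkl (the file listed at the repository root).
fever = pd.read_pickle("fever_data_prep/preprocessed_data/fever_train.pkl")
liar = pd.read_pickle("liar_plus_dataset_prep/preprocessed_data/liar_plus_train.pkl")

all_train_df = pd.concat([fever, liar], ignore_index=True)
all_train_df.to_pickle("all_train_df.pkl")
```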
In order to train the model and reproduce the results shown below, use the following command:
python run_glue.py --data_dir dataset --model_type xlm --model_name_or_path xlm-clm-enfr-1024 --task_name buster --output_dir output --do_train --evaluate_during_training --do_eval
If you do not want to perform evaluation during training, remove the --evaluate_during_training parameter.
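After training, run_glue.py saves the fine-tuned model to the --output_dir (here, output). A hypothetical inference sketch using the transformers library; the example claim/evidence pair and the meaning of each class index are assumptions:

```python
import torch
from transformers import XLMForSequenceClassification, XLMTokenizer

# Load the fine-tuned model saved by run_glue.py in the "output" directory.
tokenizer = XLMTokenizer.from_pretrained("output")
model = XLMForSequenceClassification.from_pretrained("output")
model.eval()

# Encode a (claim, evidence) pair as a single sequence-pair input.
inputs = tokenizer(
    "The Eiffel Tower is in Berlin.",          # claim
    "The Eiffel Tower is located in Paris.",   # evidence
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs)[0]
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)  # index into the task's label list
```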
In order to compare our results with current SOTA models, we have performed the evaluation on the test set of the FEVER dataset.
| Model | Accuracy (%) |
|---|---|
| XLM (ours) | 70.02 |
| BERT (Large) & BERT (Pointwise + HNM) | 71.86 |
| DREAM (Zhong et al., 2019) | 76.85 |
[1] Wolf, Thomas, et al. "HuggingFace's Transformers: State-of-the-art Natural Language Processing." arXiv preprint arXiv:1910.03771 (2019).
[2] Soleimani, Amir, Christof Monz, and Marcel Worring. "BERT for Evidence Retrieval and Claim Verification." arXiv preprint arXiv:1910.02655 (2019).