While analyzing literary works and their content is a commonly taught skill that people often find simple, it remains a challenge for machines. Machines lack the human knowledge, common sense, and contextual awareness that are essential when analyzing literary works. Many researchers have tackled these problems, some more successfully than others. In our work, we explore a subset of literary analysis, focusing on fictional character analysis. We address the problems of character extraction, sentiment analysis of character relationships, and protagonist and antagonist detection. All of these tasks are performed on our newly created and annotated corpus of fables.
The dataset is scraped from the Project Gutenberg website, which provides free eBooks with a focus on older works for which U.S. copyright has expired. We decided to use a collection of fables by the Greek author Aesop, The Fables of Aesop, collected and translated by Joseph Jacobs. We collected 55 of these fables and annotated them by hand. For each fable we annotated the following:
- characters,
- sentiment relationships between the characters,
- protagonist and antagonist of the story.
You can find the dataset and the annotations in the following directory: data/aesop/. Annotations are saved in JSON format.
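Because the annotation schema is not described here, a quick way to get oriented is to list the top-level structure of each annotation file. The following is a minimal sketch, not part of the repository: the helper name `summarize_annotations` is hypothetical, and it assumes the JSON files sit directly inside `data/aesop/` with a `.json` extension.

```python
import json
from pathlib import Path


def summarize_annotations(directory="data/aesop"):
    """Map each JSON file name to a short description of its top-level structure."""
    root = Path(directory)
    summary = {}
    if not root.is_dir():
        # Nothing to summarize if the directory is missing.
        return summary
    # Assumption: annotation files live directly in the directory as *.json.
    for path in sorted(root.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if isinstance(data, dict):
            summary[path.name] = sorted(data.keys())
        elif isinstance(data, list):
            summary[path.name] = f"list of {len(data)} items"
        else:
            summary[path.name] = type(data).__name__
    return summary


if __name__ == "__main__":
    for name, shape in summarize_annotations().items():
        print(name, shape)
```

Run it from the repository root; it only reads the files and makes no assumption about the field names inside them.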
- Install Anaconda or make sure that your Python version is 3.8.x. If you are using Anaconda, you can create and activate a new environment by running:
conda create -n <env_name> python=3.8
conda activate <env_name>
- Clone this repository:
git clone https://github.com/anzemur/literacy-knowledge-base.git
- Move inside the project repository:
cd literacy-knowledge-base
- Install dependencies:
pip install -r requirements.txt
- Download & install language models:
python -m spacy download en_core_web_trf
python -m spacy download en_core_web_sm
pip install allennlp-models
python src/downloads.py
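After the download step, it can be useful to confirm that the two spaCy models are actually available in the active environment. The sketch below is an illustration rather than part of the repository: the helper name `missing_packages` is hypothetical, and it relies on the fact that `spacy download` installs each model as an importable Python package under the model's name.

```python
import importlib.util

# Model names taken from the spaCy download commands above.
REQUIRED_MODELS = ["en_core_web_trf", "en_core_web_sm"]


def missing_packages(names=REQUIRED_MODELS):
    """Return the names that cannot be found as importable packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing models:", ", ".join(missing))
    else:
        print("All required spaCy models are installed.")
```

The check only looks for the packages and does not load the models, so it finishes quickly even for the transformer model.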
While running the code you may encounter some CUDA-related warnings, which can be ignored. Running the whole pipeline should take about 1-2 hours.
To generate the results of character recognition you should run the following command:
python src/characters/run_ner.py
To evaluate the obtained results, run:
python src/characters/eval_ner.py
To generate the results of character sentiments & protagonist/antagonist detection you should run the following command:
python src/characters/character_sentiments.py
To evaluate the obtained results for character sentiments, run:
python src/characters/eval_sentiments.py
To evaluate the obtained results for protagonist/antagonist detection, run:
python src/characters/eval_leads.py
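The five commands above can also be chained from a single script. This is a minimal sketch under stated assumptions, not a file that ships with the repository: the function name `run_pipeline` and the `--run` flag are hypothetical, and the script paths are copied from the commands above in the order they appear. By default it only prints the commands; pass `--run` to execute them from the repository root.

```python
import subprocess
import sys

# Script paths copied from the commands in this README, in the order given.
PIPELINE = [
    "src/characters/run_ner.py",
    "src/characters/eval_ner.py",
    "src/characters/character_sentiments.py",
    "src/characters/eval_sentiments.py",
    "src/characters/eval_leads.py",
]


def run_pipeline(scripts=PIPELINE, execute=False):
    """Print the command for each step and run it if execute is True."""
    commands = [[sys.executable, script] for script in scripts]
    for command in commands:
        print(" ".join(command))
        if execute:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical flag: nothing is executed unless --run is passed.
    run_pipeline(execute="--run" in sys.argv[1:])
```

Using `sys.executable` keeps every step inside the same Python environment that launched the script, which matters when the dependencies live in a dedicated conda environment.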