This is the repository for my EPQ project.
It contains the code for cleaning the raw input data for later use; the same cleaning step applies to both experiments. The input data should be downloaded from Kaggle via the link given in the paper.
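As a rough illustration, the sketch below shows the kind of text normalisation a cleaning step like this typically performs. The file name `train.csv` and the `comment_text` column are assumptions about the downloaded Kaggle file, not names taken from this repository.

```python
import re

import pandas as pd


def clean_comment(text: str) -> str:
    """Normalise a raw comment: lowercase, strip URLs, keep plain words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z'\s]", " ", text)     # keep letters and apostrophes
    text = re.sub(r"\s+", " ", text)           # collapse whitespace
    return text.strip()


# "train.csv" and "comment_text" are assumed names; adjust to the real file.
df = pd.read_csv("train.csv")
df["comment_text"] = df["comment_text"].astype(str).map(clean_comment)
df.to_csv("train_clean.csv", index=False)
```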
This notebook calculates the metric scores for both models in the two experiments and generates the visualisations used in the paper, such as the confusion matrices.
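For reference, here is a minimal sketch of how such metrics and a confusion matrix can be produced with scikit-learn. The toy `y_true`/`y_pred` lists stand in for the real labels and model predictions loaded in the notebook, and `ConfusionMatrixDisplay` is one possible way to draw the matrices, not necessarily the one used here.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (accuracy_score, classification_report,
                             ConfusionMatrixDisplay)

# Toy stand-ins for the real labels and model predictions.
y_true = ["negative", "neutral", "positive", "negative"]
y_pred = ["negative", "neutral", "positive", "positive"]

print("Accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred))

# Draw and save a confusion matrix over the three sentiment classes.
ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
plt.savefig("confusion_matrix.png", dpi=200)
plt.show()
```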
This is the first experiment: classifying each comment as positive, neutral, or negative, so that negative comments can be detected.
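A minimal sketch of a three-class sentiment classifier in this style, using a TF-IDF plus logistic regression pipeline; this is an illustrative baseline under assumed toy data, not necessarily either of the models compared in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for cleaned comments and their sentiment labels.
comments = ["great video thank you", "it was okay i suppose", "this is awful rubbish"]
labels = ["positive", "neutral", "negative"]

# TF-IDF features fed into a logistic regression over the three classes.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(comments, labels)

print(model.predict(["what an awful video"]))
```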
This is the second experiment: tagging each negative comment with one or more of the six predefined tags (toxic, severe toxic, obscene, threat, insult, and identity hate).
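A minimal sketch of the multi-label setup, assuming a one-vs-rest wrapper around a linear classifier; the underscored tag names and the toy comments and label vectors are assumptions for illustration, not taken from this repository.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical underscore forms of the six tags from the paper.
TAGS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Toy negative comments with multi-hot tag vectors: a 1 in column i
# means the comment carries TAGS[i]. Each tag appears at least once,
# so every per-tag classifier sees both classes during fitting.
comments = [
    "toy negative comment one",
    "toy negative comment two",
    "toy negative comment three",
]
y = np.array([
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
])

# One binary logistic regression per tag, so a comment can get several tags.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(comments, y)

pred = model.predict(["toy negative comment one"])[0]
print([tag for tag, flag in zip(TAGS, pred) if flag])
```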