Reddit Flair Detector

About
Local Installation
Data Collection
Exploratory Data Analysis
Flair Detection Results
Web App Deployment
Flair Prediction
References

About

A web application built with Flask that predicts the flair of an r/india post from its URL, using Natural Language Processing and Machine Learning.

Flairs are the categories that posts in a particular subreddit are classified into. Only the title of the given post is used for prediction. The final Multichannel Convolutional Neural Network achieves a test accuracy of ~93%.

The web app is deployed on Heroku at: https://reddit-flair-detection-app.herokuapp.com/

NOTE: The web app will take some time to load because Heroku doesn't keep free web apps running all the time; the server is started afresh when someone visits it. Please give it some time to load :)

Local Installation

Note: My installation is specific to Python 3.7 and TensorFlow (TF) 1.14.

Clone the repository

git clone https://github.com/abhirooptalasila/Reddit-Flair-Detector.git
cd Reddit-Flair-Detector/

Install the virtualenv package and create a new env

pip3 install virtualenv
virtualenv -p python3 env
source env/bin/activate

# When you are done working, leave the environment with:
deactivate

Install dependencies

pip3 install -r requirements.txt

Navigate to scripts directory and run server.py

cd scripts/
python3 server.py

# Open browser and go to the following URL
http://127.0.0.1:5000/

Data Collection

I've used the Python Reddit API Wrapper (PRAW) to scrape data from r/india. As of 10th April 2020, the subreddit had 11 flairs. Multiple fields such as title, author and body are scraped for each post in every flair. The dataset is well balanced across flairs.

Link to notebook: Reddit Data Collection.ipynb
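
As a rough illustration of the scraping step described above, here is a minimal sketch and not the notebook's actual code. The helper names collect_posts and make_subreddit, the chosen fields, the flair search query and the credential placeholders are all assumptions made for this example; the notebook may collect posts differently.

```python
def collect_posts(subreddit, flairs, limit=100):
    """Collect a few fields for up to `limit` posts per flair.

    `subreddit` is expected to behave like a PRAW Subreddit object,
    i.e. to offer search(query, limit=...) yielding submissions.
    """
    rows = []
    for flair in flairs:
        # Assumption: posts are looked up with Reddit's flair search syntax.
        query = 'flair:"{}"'.format(flair)
        for post in subreddit.search(query, limit=limit):
            rows.append({
                "flair": flair,
                "title": post.title,
                "author": str(post.author),
                "body": post.selftext,
                "url": post.url,
            })
    return rows


def make_subreddit(client_id, client_secret, user_agent):
    """Build a PRAW handle for r/india (hypothetical helper).

    The credentials are placeholders for your own Reddit API keys.
    """
    import praw  # third-party package, imported lazily

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )
    return reddit.subreddit("india")
```

The returned list of dictionaries can be passed straight to pandas.DataFrame for the analysis that follows.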

Exploratory Data Analysis

Since the raw data contains many unnecessary characters, we pre-process the dataset before training models on it.

Link to notebook: Exploratory Data Analysis.ipynb
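
The kind of cleaning meant here can be sketched as follows. This is an illustrative example only: the function name clean_text, the exact steps (lowercasing, URL removal, punctuation removal, stop-word filtering) and the tiny stop-word list are assumptions, and the notebook may apply a different pipeline.

```python
import re

# Assumed, illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and", "for", "on"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_text(text):
    """Lowercase, strip URLs and punctuation, and drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # remove links
    text = NON_ALNUM_RE.sub(" ", text)  # remove punctuation and symbols
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(clean_text("Is the LOCKDOWN extended?? Read: https://example.com/x!"))
# prints "lockdown extended read"
```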

Flair Detection Results

We first train some baseline models and then move on to deep learning models such as LSTMs and CNNs.

Link to notebook: Flair Detector.ipynb

Results

Title as Feature

Machine Learning Algorithm	Test Accuracy
Naive Bayes	0.64706
Linear SVM	0.68824
Logistic Regression	0.6902
Random Forest	0.67059
MLP	0.54706
LSTM	0.635
Multichannel CNN	0.927

Body as Feature

Machine Learning Algorithm	Test Accuracy
Naive Bayes	0.28627
Linear SVM	0.38235
Logistic Regression	0.38235
Random Forest	0.3902
MLP	0.33529

Web App Deployment

The model was deployed using Flask on Heroku. To test it out:

Visit New Reddit
Copy a post link. Make sure it's a Reddit post and not a redirect (the link should start with "..reddit.com/r/india/..")
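
The link check described above can be sketched like this. It is a hypothetical helper and not part of server.py: the name is_india_post_url and the assumed path shape (/r/india/comments/<post id>/...) are choices made for this example.

```python
from urllib.parse import urlparse


def is_india_post_url(url):
    """Return True if `url` looks like a direct link to an r/india post."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Accept reddit.com and its subdomains (www, old, new), reject redirects.
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        return False
    parts = [p for p in parsed.path.split("/") if p]
    # Assumed path shape: r/india/comments/<post id>/<optional title slug>
    return (
        len(parts) >= 4
        and parts[0] == "r"
        and parts[1].lower() == "india"
        and parts[2] == "comments"
    )
```

A shortened redd.it link or a post from another subreddit returns False, which is a cue to copy the full post link instead.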

The server.py file contains the following endpoints:

/reqp : After starting the server locally, you can run request.py in another terminal. Replace the input_url variable with another URL if required. The response is the predicted flair for the given URL.
/predict : This is a POST endpoint called when you click the "Predict" button on the website.
/automated_testing : This endpoint is used for testing the performance of the classifier. We send an automated POST request to the endpoint with a .txt file that contains one r/india post link per line. The response should be JSON in which each key is a post link and each value is the predicted flair for that post.
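
To make the /automated_testing response shape concrete, here is a small sketch under stated assumptions. The function build_flair_response and the predict callable are hypothetical stand-ins for this example; the real endpoint in server.py loads the trained model and may be organised differently.

```python
import json


def build_flair_response(file_text, predict):
    """Map every non-empty line (a post link) to its predicted flair.

    `file_text` is the decoded content of the uploaded .txt file.
    `predict` is a stand-in for the model: it takes a link and returns a flair.
    """
    links = [line.strip() for line in file_text.splitlines() if line.strip()]
    return json.dumps({link: predict(link) for link in links})
```

For a file with two links, the result is a JSON object with two keys, one per link, each mapped to the flair returned by predict.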

Flair Prediction

The saved model is used to predict the flairs of the current top 10 posts on r/india.

Link to notebook: Flair Predictions.ipynb
