Sentiment Analyse+Deep Learn - Amazon App - Project Overview

Motivation

A great deal of information on the web is available as text. By reading it, we can understand how people feel about websites, products, and brands. Here we teach the computer to do the hard work of reading and understanding reviews of the Amazon app for us. The project's objective is to extract and prepare the review data and to build a deep learning model that classifies human-written text as Positive, Neutral, or Negative.

Overview

Created a deep learning sentiment analysis model to recognize the sentiment in a text.
Scraped over 3000 reviews from the Google Play Store and the App Store using APIs.
Built a pipeline that preprocesses the reviews and brings them into a predictable, analyzable form for the neural network.
Converted text inputs into embedded vectors using Global Vectors for Word Representation (GloVe).
Built the sentiment analysis model using the Keras modules Model, Dense, LSTM, and Embedding.

Resources Used

Python Version: 3.7
Packages: pandas, numpy, google_play_scraper, app_store_scraper, re, unidecode, nltk, matplotlib, plotly, wordcloud, imblearn, sklearn, tensorflow.keras

Text Preprocessing Pipeline

To build a pipeline for text preprocessing, I built a few functions to do the following steps:

Lowercase the text and remove HTML
Remove accented characters
Expand contractions
Remove special characters
Lemmatization
Remove stop words
Keep only English words, and remove digits and duplicate whitespace
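The steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the function names and the tiny `CONTRACTIONS` map are assumptions, `unicodedata` stands in for `unidecode`, and lemmatization, stop-word removal, and the English-word check are left out because they need NLTK data.

```python
import html
import re
import unicodedata

# Tiny illustrative contraction map (assumption); a real one is much larger.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is",
                "i'm": "i am", "don't": "do not"}
CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b")

def lowercase_and_strip_html(text):
    """Drop HTML tags, decode entities such as &amp;, and lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)
    return html.unescape(text).lower()

def remove_accents(text):
    """Replace accented characters with their closest ASCII form."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def expand_contractions(text):
    """Expand the contractions listed in CONTRACTIONS (expects lowercase)."""
    return CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0)], text)

def remove_special_chars_and_digits(text):
    """Keep only letters and spaces, then collapse repeated whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def clean(text):
    """Run the cleaning steps in order."""
    for step in (lowercase_and_strip_html, remove_accents,
                 expand_contractions, remove_special_chars_and_digits):
        text = step(text)
    return text

cleaned = clean("<p>It's GREAT, I can't   stop using the café app 10/10!</p>")
```

The order matters here: contractions are expanded before special characters are removed, because the apostrophe is what identifies them.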

For the pipeline, I defined three classes that apply the preprocessing functions above to the reviews. The first class applies the cleaning functions that bring the text into a regular form. The second class removes the stop words, and the last class lemmatizes the terms that remain.
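A three-stage design like this can be sketched as three small classes that share a `transform` method. The class names, the short stop-word list, and the toy lemma table below are all hypothetical stand-ins; a real version would likely rely on NLTK's stop words and lemmatizer.

```python
import re

class TextCleaner:
    """Stage 1: normalise raw text with regular expressions."""
    def transform(self, texts):
        return [re.sub(r"\s+", " ", re.sub(r"[^a-z\s]", " ", t.lower())).strip()
                for t in texts]

class StopWordRemover:
    """Stage 2: drop stop words (tiny hypothetical list)."""
    def __init__(self, stop_words=("the", "a", "is", "and", "to")):
        self.stop_words = set(stop_words)

    def transform(self, texts):
        return [" ".join(w for w in t.split() if w not in self.stop_words)
                for t in texts]

class Lemmatizer:
    """Stage 3: map the remaining words to a base form.

    A toy lookup table stands in for a real lemmatizer here.
    """
    def __init__(self, lemmas=None):
        self.lemmas = lemmas or {"crashes": "crash", "orders": "order",
                                 "loved": "love"}

    def transform(self, texts):
        return [" ".join(self.lemmas.get(w, w) for w in t.split())
                for t in texts]

def run_pipeline(texts, stages):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        texts = stage.transform(texts)
    return texts

processed = run_pipeline(
    ["The app crashes!!", "Loved the orders and a  UI"],
    [TextCleaner(), StopWordRemover(), Lemmatizer()],
)
```

Because every stage exposes the same method, stages can be reordered or swapped without touching the others. With scikit-learn, the same classes could subclass `BaseEstimator` and `TransformerMixin` and be chained in a `Pipeline`.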

Plot

To get a good understanding of the Amazon reviews, I plotted the text in several ways.

Using a pie chart, I plotted the distribution

The Word Cloud

Lastly, I plotted the n-grams: one bar chart each for the 1-grams, 2-grams, 3-grams, and 4-grams of each sentiment.
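The counts behind such bar charts can be computed with a `Counter`. The sketch below assumes the reviews are already cleaned and whitespace-separated; the function names `ngrams` and `top_ngrams` are illustrative, and the plotting itself is omitted.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(texts, n, k=10):
    """Count n-grams over all texts and return the k most frequent."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.split(), n))
    return counts.most_common(k)

# Hypothetical cleaned reviews for one sentiment class.
reviews = ["great app love it", "great app but slow"]
top_bigrams = top_ngrams(reviews, n=2, k=1)
```

Running `top_ngrams` once per sentiment and per value of `n` from 1 to 4 gives the data for each chart.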

Modeling

Before building the deep learning model, I went through a few steps. First, I split the data frame into train and test sets. Second, I vectorized and embedded the reviews and, to prevent overfitting, balanced the train set and reduced the number of features. Next, I created three metric functions to use in the model.
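As a rough illustration of the vectorizing and embedding step, the sketch below maps words to integer ids, pads each review to a fixed length, and builds an embedding matrix from GloVe vectors. It uses plain Python lists; the helper names, the left-padding choice, and the two-dimensional demo vectors are assumptions made for the example. GloVe text files store one word per line followed by its space-separated vector components, which is what `parse_glove_line` reads.

```python
def build_vocab(texts):
    """Assign each distinct word an integer id, reserving 0 for padding."""
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def encode(text, vocab, max_len=20):
    """Map known words to ids, keep the first max_len, left-pad with zeros."""
    ids = [vocab[w] for w in text.split() if w in vocab][:max_len]
    return [0] * (max_len - len(ids)) + ids

def parse_glove_line(line):
    """Split one line of a GloVe text file into (word, vector)."""
    word, *values = line.split()
    return word, [float(v) for v in values]

def build_embedding_matrix(vocab, glove, dim):
    """Row i holds the vector for word id i; unknown words stay all-zero."""
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for word, idx in vocab.items():
        if word in glove:
            matrix[idx] = list(glove[word])
    return matrix

# Hypothetical two-dimensional vectors, just to show the shapes involved.
vocab = build_vocab(["good app", "bad app"])
glove = dict([parse_glove_line("good 0.1 0.2"), parse_glove_line("app 0.3 0.4")])
matrix = build_embedding_matrix(vocab, glove, dim=2)
encoded = encode("bad app", vocab, max_len=4)
```

The matrix has one more row than the vocabulary because id 0 is reserved for padding.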

I built a deep learning model with one input layer of shape 20, one Embedding layer that takes the vocabulary size and the embedding matrix built earlier, one Long Short-Term Memory (LSTM) layer with 128 units, and finally an output layer with 3 neurons, since the output has 3 labels: Positive, Neutral, and Negative. That is the final model.
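An architecture matching this description could look like the following Keras sketch. The layer sizes come from the text; everything else is an assumption made to get a runnable example: the frozen embedding, the softmax activation, the Adam optimizer, the categorical cross-entropy loss, and the random 1000 x 50 matrix standing in for the real GloVe matrix.

```python
import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Dense, Embedding, Input, LSTM
from tensorflow.keras.models import Model

def build_model(embedding_matrix, max_len=20, lstm_units=128, n_classes=3):
    """Input(20) -> Embedding -> LSTM(128) -> Dense(3)."""
    vocab_size, embedding_dim = embedding_matrix.shape
    inputs = Input(shape=(max_len,))
    x = Embedding(
        input_dim=vocab_size,
        output_dim=embedding_dim,
        embeddings_initializer=Constant(embedding_matrix),
        trainable=False,  # assumption: keep the pretrained vectors frozen
    )(inputs)
    x = LSTM(lstm_units)(x)
    # Assumption: softmax over the three sentiment labels.
    outputs = Dense(n_classes, activation="softmax")(x)
    model = Model(inputs, outputs)
    # Assumption: optimizer and loss are not stated in the text.
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Hypothetical sizes: 1000-word vocabulary, 50-dimensional vectors.
demo_matrix = np.random.rand(1000, 50).astype("float32")
model = build_model(demo_matrix)
```

Calling `model.summary()` prints the layer shapes and parameter counts, which is a quick way to confirm that the output layer has three units.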

Results

The training and validation curves show good results and stay close to each other throughout, and the model keeps improving with each epoch. When I evaluated the model on the test data frame, I reached the following scores:

Loss: 0.600444
Accuracy: 0.742218
Precision: 0.754194
Recall: 0.731821
F1-Score: 0.742558
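For readers unfamiliar with these metrics, the sketch below shows how precision, recall, and F1 are defined for a single class, in plain Python. It is not the notebook's metric code, and the function name and toy labels are made up; how the project aggregates the scores across the three classes and across batches is not stated here.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels: two true positives, one false positive, one false negative.
y_true = ["pos", "pos", "neg", "neu", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="pos")
```

Precision answers "of the reviews predicted as this class, how many were right", while recall answers "of the reviews truly in this class, how many were found"; F1 is their harmonic mean.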

For the final test, I built a function that applies the text preprocessing pipeline to a comment, classifies it with the model, and returns whether the comment is Positive, Negative, or Neutral.
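Such a function might be shaped like the sketch below, which takes the preprocessing step and the model's prediction step as arguments so it can be shown without the trained model. The name `classify_comment`, the label order in `LABELS`, and the stub callables are all assumptions; the real label order depends on how the targets were encoded.

```python
# Assumption: the order of the model's three output units.
LABELS = ("Negative", "Neutral", "Positive")

def classify_comment(comment, preprocess, predict_proba, labels=LABELS):
    """Clean a comment, score it, and return the most probable label."""
    cleaned = preprocess(comment)
    probs = predict_proba(cleaned)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]

# Stub callables standing in for the real pipeline and trained model.
def fake_preprocess(text):
    return text.lower()

def fake_predict_proba(text):
    return [0.1, 0.2, 0.7] if "love" in text else [0.8, 0.1, 0.1]

label = classify_comment("I LOVE this app", fake_preprocess, fake_predict_proba)
```

In the real project, `preprocess` would run the full pipeline and encode the text to a padded sequence, and `predict_proba` would wrap the model's prediction for a single input.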

For the full dynamic visualizations:

See The Kernel On Kaggle

Or On Nbviewer
