This project implements a sentiment analysis model using Long Short-Term Memory (LSTM) networks to classify IMDB movie reviews as positive or negative. The model leverages deep learning techniques for text analysis, providing a robust solution to evaluate user sentiments.
- Binary classification of movie reviews (positive or negative).
- Preprocessing pipeline with tokenization and padding for textual data.
- LSTM-based architecture for sequential data learning.
- A user-friendly function for real-time sentiment predictions.
- Programming Language: Python
- Libraries:
  - TensorFlow/Keras
  - Pandas
  - Scikit-learn
- Tools: Kaggle API for dataset retrieval
- Source: IMDB Dataset of 50K Movie Reviews
- Access: Downloaded via Kaggle API.
- Structure: Includes 50,000 movie reviews labeled as "positive" or "negative."
- Download: Use the Kaggle API to fetch the dataset.
- Extraction: Extract the CSV file from the zip archive.
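The download-and-extract step could be sketched as follows. This is a minimal sketch: the Kaggle CLI command is shown as a comment (it needs configured `kaggle.json` credentials), the dataset slug is an assumption based on the dataset's title, and `extract_dataset` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path

# Download first with the Kaggle CLI (requires kaggle.json credentials), e.g.:
#   kaggle datasets download -d lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
# (slug assumed -- adjust to the archive you actually downloaded)

def extract_dataset(zip_path: str, out_dir: str = ".") -> list[str]:
    """Extract every CSV file from the downloaded zip archive and
    return the paths of the extracted files."""
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(".csv"):
                zf.extract(name, out_dir)
                extracted.append(str(Path(out_dir) / name))
    return extracted
```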
- Load the dataset using Pandas.
- Convert sentiment labels into numerical values (positive: 1, negative: 0).
- Split the data into 80% training and 20% testing subsets.
- Tokenize text to convert words into sequences of integers.
- Apply padding to ensure consistent sequence lengths for the LSTM model.
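The preprocessing steps above can be sketched as below. This is a hedged sketch, not the notebook's exact code: it uses a tiny in-memory DataFrame as a stand-in for `pd.read_csv("IMDB Dataset.csv")`, assumes the Kaggle CSV's `review`/`sentiment` column names, and uses Keras's `TextVectorization` layer as one common way to combine tokenization and padding (the notebook may use the legacy `Tokenizer` plus `pad_sequences` instead).

```python
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Tiny in-memory stand-in for the Kaggle CSV.
df = pd.DataFrame({
    "review": ["A wonderful film", "Terrible acting",
               "Loved every minute", "Dull and boring"],
    "sentiment": ["positive", "negative", "positive", "negative"],
})

# Map string labels to integers: positive -> 1, negative -> 0.
df["label"] = df["sentiment"].map({"positive": 1, "negative": 0})

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    df["review"], df["label"], test_size=0.2, random_state=42
)

# Tokenize and pad in one step: words -> integer indices,
# every sequence padded/truncated to MAX_LEN for the LSTM.
MAX_LEN = 200
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=10_000, output_sequence_length=MAX_LEN
)
vectorizer.adapt(X_train.tolist())  # build the vocabulary on training text only
train_pad = vectorizer(X_train.tolist())
test_pad = vectorizer(X_test.tolist())
```

Fitting the vocabulary on the training split only avoids leaking test-set words into the model's inputs.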
The LSTM model consists of:
- Embedding Layer: Converts word indices to dense vectors.
- LSTM Layer: Processes sequential data for sentiment classification.
- Dense Output Layer: Sigmoid activation for binary classification.
- Trained for 5 epochs with a batch size of 64.
- Validation split: 20% of the training data.
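The architecture and training setup above might look like this in Keras. The layer sizes (64-dimensional embeddings and LSTM units) and the random stand-in data are assumptions for illustration; only the output layer, loss, epochs, batch size, and validation split come from the description above.

```python
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 10_000  # assumed tokenizer vocabulary size
MAX_LEN = 200        # assumed padded sequence length

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),       # word indices -> dense vectors
    tf.keras.layers.LSTM(64),                        # sequential feature learning
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary sentiment output
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Training sketch on random stand-in data; replace X and y with the
# padded IMDB sequences and their 0/1 labels.
X = np.random.randint(0, VOCAB_SIZE, size=(64, MAX_LEN))
y = np.random.randint(0, 2, size=(64,))
history = model.fit(X, y, epochs=5, batch_size=64,
                    validation_split=0.2, verbose=0)
```

The sigmoid output gives a probability of the positive class, which pairs with the binary cross-entropy loss.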
- Evaluate the trained model on the test dataset.
- Metrics: Accuracy and binary cross-entropy loss.
- Implement a function to classify the sentiment of user-provided reviews.
- The LSTM model achieved strong accuracy on the held-out test data.
- Example Predictions:
- Input: "This movie was not so interesting." -> Prediction: Negative
- Input: "This movie was very amazing." -> Prediction: Positive
- Clone this repository:

  ```bash
  git clone https://github.com/NarendraYSF/LSTMEmotion-Sentiment-Analytics.git
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the Jupyter Notebook to train, evaluate, and test the model.
- Expand Dataset: Include more diverse reviews to improve generalization.
- Optimize Hyperparameters: Experiment with different learning rates, batch sizes, and epoch counts.
- Alternative Architectures: Explore GRU, Transformer-based models (e.g., BERT).
- Deploy: Build a web or API interface for real-time predictions.
This project is licensed under the GNU General Public License.