Student Performance Prediction Project

This project predicts student math scores from demographic and educational features using machine learning models. The application is built with Flask and deployed in Docker containers. DVC (Data Version Control) manages the datasets and models, and several regression models are trained and compared before one is used for prediction. Best model so far: Lasso regression, with the highest test R2 score of 0.8812.

MLflow Tracking URI: https://dagshub.com/04bhavyaa/mlproject.mlflow/

Project Overview

The objective of this project is to build a machine learning model to predict students' math scores based on features such as gender, race, parental education level, lunch type, and test preparation course status. The project includes:

A Flask-based web application for real-time predictions.
A machine learning pipeline whose components train several models, including CatBoost, XGBoost, and Random Forest, and use GridSearchCV for hyperparameter optimization.
Data and model versioning with DVC and Dagshub.
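
As a rough illustration of how categorical features like the ones listed above can be turned into numeric model inputs, here is a minimal one-hot encoding sketch in plain Python. The column names and category values are assumptions made for illustration, and the project's actual pipeline may encode features differently (for example, with Scikit-learn transformers).

```python
# Hypothetical column names and category values, chosen for illustration only.
CATEGORIES = {
    "gender": ["female", "male"],
    "lunch": ["free/reduced", "standard"],
    "test_preparation_course": ["completed", "none"],
}


def one_hot_encode(record, categories=CATEGORIES):
    """Return a flat list of 0/1 flags, one per (column, category) pair."""
    encoded = []
    for column, values in categories.items():
        if record[column] not in values:
            raise ValueError(f"unknown value {record[column]!r} for {column!r}")
        encoded.extend(1 if record[column] == value else 0 for value in values)
    return encoded


student = {
    "gender": "female",
    "lunch": "standard",
    "test_preparation_course": "none",
}
features = one_hot_encode(student)
print(features)
```

Each column contributes one flag per known category, so an unseen value fails loudly instead of silently producing an all-zero block.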

Technologies Used

Flask: For a user-friendly web interface to enter data and display predictions.
DVC: For managing data and model versioning efficiently.
Dagshub: To version large datasets and track experiments.
GitHub Actions: Automated workflows for CI/CD using YAML configurations.
Python 3.8+: The backbone of the entire project.
Git: For version control of the code and collaboration.
VSCode: My go-to code editor for writing, testing, and debugging the project.
Docker: For containerizing the app and ensuring consistency across environments.
ML Models: Linear Regression, Ridge, Lasso, ElasticNet, Decision Tree, Random Forest, Gradient Boosting, XGBoost, CatBoost, and AdaBoost.
GridSearchCV: To fine-tune the CatBoost model for optimal performance.
Libraries: NumPy, Pandas, Scikit-learn, XGBoost, CatBoost, Matplotlib, Seaborn, and more.
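
To make the idea behind GridSearchCV concrete, the sketch below hand-rolls an exhaustive grid search with k-fold cross-validation using only the standard library. It is a stand-in for the concept, not the project's code: the one-dimensional ridge model, the toy data, and all function names are assumptions chosen to keep the example dependency-free. The project itself relies on Scikit-learn's GridSearchCV.

```python
from itertools import product


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + alpha)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(xs, ys, w):
    """Mean squared error of the predictions w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_val_mse(xs, ys, alpha, k=3):
    """Average validation MSE over k folds (fold i holds out every k-th sample)."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        valid = [i for i in range(len(xs)) if i % k == fold]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], alpha)
        scores.append(mse([xs[i] for i in valid], [ys[i] for i in valid], w))
    return sum(scores) / k


def grid_search(xs, ys, param_grid):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cross_val_mse(xs, ys, **params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data following y = 2x exactly, so the unregularized model should win.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]
best_params, best_score = grid_search(xs, ys, {"alpha": [0.0, 1.0, 10.0]})
print(best_params, best_score)
```

The search cost grows with the product of the grid sizes, which is why grids are usually kept small or limited to the most promising models.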

Project Gallery

Setup Instructions

Prerequisites

Python 3.8+
Docker (for containerization)
DVC installed and configured for your cloud storage (Dagshub, AWS, etc.)
GitHub repository with necessary secrets for Dagshub and Docker Hub

Installation Steps

Clone the repository:

git clone https://github.com/<your-username>/mlproject.git
cd mlproject

Set up a Python environment:

 conda create --name mlproject python=3.8
 conda activate mlproject

Install dependencies:
```
 pip install -r requirements.txt
```

Set up DVC and pull the data:

 dvc remote add origin s3://dvc
 dvc pull

Run the Flask application locally:
```
 python app.py
```

The app will be available at http://localhost:5000.

Running the Application

To run the Flask application locally, follow the steps below:

Activate your environment:
```
 conda activate mlproject
```
Run the Flask app:
```
 python app.py
```

Open the browser and go to http://localhost:5000 to interact with the web application.
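
If you prefer to confirm from a script that the server is answering, a small standard-library check like the following can help. This is only a sketch: the helper name `is_app_up` is hypothetical, and it only tests that something responds over HTTP at the given address, not that predictions work.

```python
import urllib.error
import urllib.request


def is_app_up(url="http://localhost:5000/", timeout=2.0):
    """Return True if something answers HTTP at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 404), so it is running.
        return True
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return False
```

Calling `is_app_up()` after starting the app should return True once Flask has finished booting.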

Docker Setup

The application is containerized using Docker. To build and run the app in a Docker container, follow these steps:

Build the Docker image:
```
 docker build -t flask-app .
```
Run the Docker container:
```
 docker run -p 5000:5000 flask-app
```

This will start the Flask app inside a Docker container, and the application will be available at http://localhost:5000.

Model Training

Several candidate models are trained, with hyperparameter tuning performed via GridSearchCV. To train them, follow these steps:

Train the model by running component.py:
```
 python component.py
```
Evaluate the model using regression metrics (e.g., R2, MSE).
To change the training logic, edit src/mlproject/components/model_trainer.py.

After training, model.pkl is saved in the artifacts/ folder and is used for predictions in the Flask app.
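
For reference, regression metrics such as MSE and R2 (the score reported for the best model) can be computed as in the sketch below. These are hand-written stand-ins for the equivalent Scikit-learn functions, and the scores are made-up numbers for illustration, not results from this project.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up math scores, for illustration only (not project results).
y_true = [60.0, 70.0, 80.0, 90.0]
y_pred = [62.0, 68.0, 83.0, 87.0]
print(mean_squared_error(y_true, y_pred), r2_score(y_true, y_pred))
```

An R2 of 1.0 means perfect predictions, while 0.0 means the model does no better than always predicting the mean score.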

Ongoing Enhancements

Improve the model's predictive performance by exploring other algorithms and feature engineering techniques.
Add additional prediction models and compare their performance.
Enhance the user interface of the web app to include more interactive visualizations.
Expand the dataset and improve generalization.

Contributors

Bhavya Jha (Developer)

