This project aims to predict car prices and classify cars as expensive or not based on their attributes using machine learning models. The dataset, sourced from Kolesa.kz, contains various features such as car brand, model, engine volume, mileage, and more. The primary goal is to develop a reliable predictive system for estimating car costs.
- Handling Missing Values: Filled missing values in key columns like `Model`, `Drive Type`, and `Fuel Type`.
- Encoding: Used one-hot encoding for categorical variables.
- Standardization: Scaled numerical features such as mileage and engine volume.
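The three preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample rows are made up, the column names are assumed to match the dataset, and filling gaps with each column's most frequent value is an assumed strategy (the project does not state which fill method it uses).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up sample rows; the real data comes from Kolesa.kz.
cars = pd.DataFrame({
    "Model": ["Camry", None, "Rio", "Camry"],
    "Drive Type": ["FWD", "AWD", None, "FWD"],
    "Fuel Type": ["Petrol", "Diesel", "Petrol", None],
    "Mileage": [120_000.0, 45_000.0, 80_000.0, 200_000.0],
    "Engine Volume": [2.5, 3.0, 1.6, 2.4],
})

categorical = ["Model", "Drive Type", "Fuel Type"]
numerical = ["Mileage", "Engine Volume"]

# 1. Handle missing values: fill each categorical column with its
#    most frequent value (assumed strategy).
for col in categorical:
    cars[col] = cars[col].fillna(cars[col].mode()[0])

# 2. Encoding: one-hot encode the categorical columns.
encoded = pd.get_dummies(cars, columns=categorical)

# 3. Standardization: rescale numerical columns to zero mean, unit variance.
encoded[numerical] = StandardScaler().fit_transform(encoded[numerical])

print(encoded.head())
```

When a train/test split is involved, the scaler is usually fitted on the training portion only and then applied to the test portion, so that test statistics do not leak into training.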
- Regression Models for price prediction:
  - Linear Regression
  - Decision Trees
  - Random Forest
  - Support Vector Regression (SVR)
  - K-Nearest Neighbors (KNN)
- Classification Models for determining expensive cars:
  - Logistic Regression
  - Decision Tree Classifier
  - Random Forest Classifier
  - K-Nearest Neighbors (KNN)
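As a rough sketch, the models listed above could be fitted with Scikit-learn along these lines. The data here is synthetic, the hyperparameters are library defaults, and the rule "expensive means priced above the median" is an assumption made only for this example; the notebook may define the label differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Synthetic stand-in for the preprocessed features and car prices.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
price = 10_000 + 3_000 * X[:, 0] - 2_000 * X[:, 1] + rng.normal(scale=500, size=200)

# Assumed labeling rule: a car is "expensive" if priced above the median.
expensive = (price > np.median(price)).astype(int)

X_train, X_test, p_train, p_test, e_train, e_test = train_test_split(
    X, price, expensive, test_size=0.25, random_state=0
)

regressors = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "SVR": SVR(),
    "KNN": KNeighborsRegressor(),
}
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "KNN": KNeighborsClassifier(),
}

# Fit every model, then collect its predictions on the held-out rows.
price_predictions = {
    name: model.fit(X_train, p_train).predict(X_test)
    for name, model in regressors.items()
}
label_predictions = {
    name: model.fit(X_train, e_train).predict(X_test)
    for name, model in classifiers.items()
}
```

Keeping the models in dictionaries makes it easy to loop over them later and compare scores side by side.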
- Regression Metrics:
  - Mean Squared Error (MSE)
  - R² Score
- Classification Metrics:
  - Accuracy
  - Classification Reports (Precision, Recall, F1-Score)
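The metrics above are all available in `sklearn.metrics`. The snippet below is a small illustration on made-up numbers (not project results): MSE is the mean of the squared prediction errors, and R² is one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    mean_squared_error,
    r2_score,
)

# Made-up regression targets and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

mse = mean_squared_error(y_true, y_pred)  # (0.25 + 0 + 0.25 + 0) / 4 = 0.125
r2 = r2_score(y_true, y_pred)             # 1 - 0.5 / 20 = 0.975

# Made-up class labels: 1 = expensive, 0 = not expensive.
labels_true = [1, 0, 1, 1, 0, 0]
labels_pred = [1, 0, 0, 1, 0, 1]

accuracy = accuracy_score(labels_true, labels_pred)  # 4 of 6 correct
report = classification_report(
    labels_true, labels_pred, target_names=["not expensive", "expensive"]
)

print(f"MSE: {mse:.3f}, R²: {r2:.3f}, accuracy: {accuracy:.3f}")
print(report)
```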
- Implemented interactive widgets to allow users to input car attributes (brand, model, year, engine volume, etc.) and predict prices dynamically.
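A minimal sketch of how such a widget could be wired up with `ipywidgets.interact` is shown below. The training rows are invented, only two numeric inputs are used to keep the example short, and `predict_price` and `show_widget` are hypothetical helper names rather than functions from the notebook.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Invented training rows, used only to have a fitted model to query.
train = pd.DataFrame({
    "Year": [2005, 2010, 2015, 2020],
    "Engine Volume": [1.6, 2.0, 2.4, 2.5],
    "Price": [2_000_000, 4_500_000, 8_000_000, 14_000_000],
})
features = ["Year", "Engine Volume"]

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(train[features], train["Price"])


def predict_price(year, engine_volume):
    """Return the model's price estimate for one car (hypothetical helper)."""
    row = pd.DataFrame([[year, engine_volume]], columns=features)
    return float(model.predict(row)[0])


def show_widget():
    """Render sliders in a Jupyter notebook (hypothetical helper).

    ipywidgets is imported here so the rest of the code works without it.
    """
    import ipywidgets as widgets

    return widgets.interact(
        predict_price,
        year=widgets.IntSlider(min=2000, max=2024, value=2015, description="Year"),
        engine_volume=widgets.FloatSlider(
            min=1.0, max=5.0, step=0.1, value=2.0, description="Engine"
        ),
    )
```

Calling `show_widget()` in a notebook cell displays the sliders and re-runs `predict_price` whenever a value changes. Categorical inputs such as brand or model could be added with `widgets.Dropdown`, provided they are encoded the same way as during training.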
- The Random Forest Regressor achieved the best R² score of 0.937 and the lowest MSE.
- The Random Forest Classifier showed the highest classification accuracy of 69.1%.
- Clone the repository or download the files.
- Open the Jupyter Notebook.
- Use the interactive widgets to input car features like brand, model, year, and engine volume.
- View the predicted price or classification result.
- Source: Kolesa.kz
- The dataset includes features such as:
- Car brand and model
- Year of manufacture
- Mileage and engine volume
- Transmission type, fuel type, and drive type
- Color and customs clearance status
To run the project, ensure you have the following installed:
- Python 3.x
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- ipywidgets (for interactive features)
- Clone the repository:
  `git clone https://github.com/your_username/car-cost-prediction.git`
- Install dependencies:
  `pip install -r requirements.txt`
- Launch Jupyter Notebook:
  `jupyter notebook`
- The project demonstrates the effectiveness of ensemble methods like Random Forest for both regression and classification tasks.
- Interactive widgets enhance usability, allowing users to explore predictions dynamically.
This project is licensed under the MIT License. See the LICENSE file for details.
- Data sourced from Kolesa.kz.
- Built using Python and Scikit-learn.