This project builds a machine learning model to predict home prices in Bangalore, India, using a dataset from Kaggle.com.
The following data science concepts are used in this project:
- Data loading and cleaning
- Outlier detection and removal
- Feature engineering
- Dimensionality reduction
- GridSearchCV for hyperparameter tuning
- K-fold cross-validation
Technology and tools used in this project:
- Python
- NumPy and pandas for data cleaning
- Matplotlib for data visualization
- scikit-learn (sklearn) for model building
Step#1: Import the required libraries
Step#2: Load the data
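A minimal sketch of the loading step. The column names below are an assumption about the layout of Bengaluru_House_Data.csv, and the three rows are made up for illustration so that the snippet runs without the file.

```python
import io

import pandas as pd

# Made-up rows in the ASSUMED column layout of Bengaluru_House_Data.csv.
sample_csv = io.StringIO(
    "area_type,availability,location,size,society,total_sqft,bath,balcony,price\n"
    "Super built-up  Area,Ready To Move,Whitefield,2 BHK,,1200,2,1,60.0\n"
    "Plot  Area,Ready To Move,Hebbal,4 Bedroom,XyzSoc,2600,5,3,120.0\n"
    "Built-up  Area,18-Dec,Whitefield,3 BHK,,1500 - 1700,3,2,95.0\n"
)

# With the real file this would be: pd.read_csv("Bengaluru_House_Data.csv")
df = pd.read_csv(sample_csv)

print(df.shape)   # number of rows and columns
print(df.head())  # first few rows
```

Note that a value such as `1500 - 1700` keeps the whole `total_sqft` column as text, which is why the cleaning step has to convert it.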
Step#3: Understand the data
- Drop unnecessary columns
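A rough sketch of this step on a tiny made-up frame. Which columns count as unnecessary is an assumption here (`area_type`, `availability`, `society`, `balcony`); the column names themselves are also assumed.

```python
import pandas as pd

# Made-up rows; column names are an ASSUMPTION about the dataset.
df = pd.DataFrame({
    "area_type": ["Super built-up  Area", "Plot  Area", "Built-up  Area"],
    "availability": ["Ready To Move", "Ready To Move", "18-Dec"],
    "location": ["Whitefield", "Hebbal", "Whitefield"],
    "size": ["2 BHK", "4 Bedroom", "3 BHK"],
    "society": [None, "XyzSoc", None],
    "total_sqft": ["1200", "2600", "1500 - 1700"],
    "bath": [2.0, 5.0, 3.0],
    "balcony": [1.0, 3.0, None],
    "price": [60.0, 120.0, 95.0],
})

# Get a feel for the data: missing values and category frequencies.
print(df.isnull().sum())
print(df["area_type"].value_counts())

# Drop the columns assumed not to help with price prediction.
cols_to_drop = ["area_type", "availability", "society", "balcony"]
df2 = df.drop(columns=cols_to_drop)
print(df2.columns.tolist())
```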
Step#4: Data Cleaning
- Check for NA values
- Verify the unique values of each column
- Make sure values are plausible (e.g. a 23 BHK home with a size of 2000 sqft is wrong)
- Feature engineering
- Dimensionality reduction
- Outlier removal using domain knowledge (2 BHK price < 3 BHK price, size per BHK >= 300 sqft)
- Outlier removal using standard deviation and mean
- One-hot encoding
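The cleaning steps above can be sketched roughly as follows. This is not the exact notebook code: the rows are made up, the column names are assumed, the price is assumed to be in lakh rupees, and the rare-location threshold (fewer than 3 rows) is chosen only to suit the toy data. The 2 BHK versus 3 BHK price rule is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd


def sqft_to_float(value):
    """Convert '1200' or a range like '1100 - 1300' to a float, else NaN."""
    text = str(value)
    parts = text.split(" - ")
    try:
        if len(parts) == 2:
            return (float(parts[0]) + float(parts[1])) / 2
        return float(text)
    except ValueError:
        return np.nan


def remove_pps_outliers(frame):
    """Keep rows whose price_per_sqft is within one std of the location mean."""
    kept = []
    for _, group in frame.groupby("location"):
        mean = group["price_per_sqft"].mean()
        std = group["price_per_sqft"].std(ddof=0)
        kept.append(group[group["price_per_sqft"].between(mean - std, mean + std)])
    return pd.concat(kept, ignore_index=True)


# Made-up rows for illustration only.
df = pd.DataFrame({
    "location": ["Whitefield"] * 5 + ["Hebbal"] * 4
                + ["Yelahanka", "Devanahalli", None],
    "size": ["2 BHK", "3 BHK", "2 BHK", "3 BHK", "2 BHK",
             "2 BHK", "3 BHK", "6 BHK", "2 Bedroom",
             "2 BHK", "2 BHK", "2 BHK"],
    "total_sqft": ["1000", "1500", "1100 - 1300", "1600", "1050",
                   "1200", "1700", "1000", "1150",
                   "34.46Sq. Meter", "1000", "900"],
    "bath": [2, 3, 2, 3, 2, 2, 3, 6, 2, 2, 2, 2],
    "price": [50.0, 80.0, 62.0, 85.0, 160.0, 70.0, 100.0, 90.0, 68.0,
              20.0, 40.0, 35.0],
})

# NA values: drop incomplete rows.
df = df.dropna().copy()

# Feature engineering: number of bedrooms, numeric area, price per sqft.
df["bhk"] = df["size"].str.split(" ").str[0].astype(int)
df["total_sqft"] = df["total_sqft"].apply(sqft_to_float)
df = df.dropna(subset=["total_sqft"]).copy()
df["price_per_sqft"] = df["price"] * 100000 / df["total_sqft"]

# Dimensionality reduction: group rarely seen locations under "other".
df["location"] = df["location"].str.strip()
counts = df["location"].value_counts()
rare = counts[counts < 3].index
df["location"] = df["location"].where(~df["location"].isin(rare), "other")

# Domain-knowledge outliers: require at least 300 sqft per bedroom.
df = df[df["total_sqft"] / df["bhk"] >= 300]

# Statistical outliers: per-location mean and standard deviation.
df = remove_pps_outliers(df)

# One-hot encode the location; "other" is dropped as the baseline category.
dummies = pd.get_dummies(df["location"], dtype=int).drop(
    columns="other", errors="ignore")
model_df = pd.concat(
    [df.drop(columns=["location", "size", "price_per_sqft"]), dummies], axis=1)
print(model_df)
```

On a sample this small the one-standard-deviation filter is aggressive and also removes some ordinary rows; on the full dataset each location has many more rows, so the band is more meaningful.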
Step#5: Build Machine Learning Model
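A hedged sketch of the model-building step, covering a hold-out split, K-fold cross-validation and GridSearchCV. The data is synthetic and stands in for the cleaned, one-hot encoded frame; the candidate models and parameter grids are assumptions chosen for illustration, not necessarily the ones used in this project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import (GridSearchCV, KFold, cross_val_score,
                                     train_test_split)
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cleaned data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "total_sqft": rng.uniform(600, 3000, n),
    "bath": rng.integers(1, 5, n),
    "bhk": rng.integers(1, 5, n),
    "Whitefield": rng.integers(0, 2, n),  # example one-hot location column
})
y = (0.05 * X["total_sqft"] + 5 * X["bhk"] + 10 * X["Whitefield"]
     + rng.normal(0, 5, n))

# Simple hold-out evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10)
model = LinearRegression().fit(X_train, y_train)
print("Hold-out R^2:", model.score(X_test, y_test))

# K-fold cross-validation gives a more stable estimate than one split.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
print("CV R^2 scores:", cross_val_score(LinearRegression(), X, y, cv=cv))

# GridSearchCV compares several estimators and hyperparameter settings.
candidates = {
    "linear_regression": (LinearRegression(),
                          {"fit_intercept": [True, False]}),
    "lasso": (Lasso(),
              {"alpha": [0.1, 1.0], "selection": ["random", "cyclic"]}),
    "decision_tree": (DecisionTreeRegressor(random_state=0),
                      {"max_depth": [3, 5, None]}),
}
rows = []
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=cv)
    search.fit(X, y)
    rows.append({"model": name,
                 "best_score": search.best_score_,
                 "best_params": search.best_params_})
results = pd.DataFrame(rows)
print(results)
```

The `results` table makes it easy to pick the estimator with the highest cross-validated score before retraining it on the full training set.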
Step#6: Testing the model
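One way to test the trained model is a small helper that builds a feature row for a given property and asks the model for a price. The sketch below is illustrative only: `predict_price` is a hypothetical helper, the training data is synthetic, and the location columns are assumed to be one-hot encoded as in the cleaning step.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic training data standing in for the cleaned dataset.
rng = np.random.default_rng(1)
n = 200
loc = rng.integers(0, 3, n)  # 0 = other, 1 = Hebbal, 2 = Whitefield
X = pd.DataFrame({
    "total_sqft": rng.uniform(600, 3000, n),
    "bath": rng.integers(1, 5, n),
    "bhk": rng.integers(1, 5, n),
    "Hebbal": (loc == 1).astype(int),
    "Whitefield": (loc == 2).astype(int),
})
y = (0.05 * X["total_sqft"] + 5 * X["bhk"] + 8 * X["Hebbal"]
     + 12 * X["Whitefield"] + rng.normal(0, 5, n))

model = LinearRegression().fit(X, y)


def predict_price(location, sqft, bath, bhk):
    """Hypothetical helper: predict the price for a single property."""
    row = pd.DataFrame(0.0, index=[0], columns=X.columns)
    row.loc[0, "total_sqft"] = sqft
    row.loc[0, "bath"] = bath
    row.loc[0, "bhk"] = bhk
    # Unknown locations fall back to the all-zeros "other" baseline.
    if location in X.columns:
        row.loc[0, location] = 1
    return float(model.predict(row)[0])


print(predict_price("Whitefield", 1000, 2, 2))
print(predict_price("Whitefield", 1500, 2, 3))
print(predict_price("Some Unknown Place", 1000, 2, 2))
```

A quick sanity check is that a larger home in the same location gets a higher predicted price than a smaller one.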
- Dataset: Bengaluru House price data
- I have also uploaded the CSV file to this repository: Bengaluru_House_Data.csv
Reference: codebasics