This dataset contains information about used Ford cars. The columns are described below:
Target variable:
- Price: selling price of the cars
Features:
- model: the Ford car model name
- year: when the car was made
- transmission: type of gearbox, which adapts the output of the internal combustion engine to the drive wheels
- mileage: total distance the car has already been driven, in miles
- fuelType: the type of fuel the vehicle uses
- mpg: miles per gallon the vehicle can travel
- engineSize: engine displacement, the volume of fuel and air that can be pushed through the car's cylinders
Learn data visualization and predict the resale price of used cars using machine learning algorithms.
Exploratory Data Analysis:
- Read the data as Pandas Dataframe
- Check the data types and missing values
- Check the basic statistics of numerical features
- Find the percentage of each unique value in the categorical variables, then reset the index, rename the columns, and round the results
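The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows are hypothetical sample values invented for the example (the real data would be loaded with `pd.read_csv` from wherever the file lives), and the helper name `unique_share` is an assumption.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real file; with the actual
# data you would load it with pd.read_csv instead.
df = pd.DataFrame({
    "model": ["Fiesta", "Focus", "Fiesta", "Kuga", "Focus", "Fiesta"],
    "year": [2017, 2018, 2019, 2017, 2016, 2018],
    "price": [12000, 14000, 13000, 15500, 9800, 11500],
    "transmission": ["Manual", "Manual", "Automatic", "Manual", "Semi-Auto", "Manual"],
    "mileage": [15944, 9083, 12456, 26191, 40123, 18000],
    "fuelType": ["Petrol", "Petrol", "Petrol", "Diesel", "Diesel", "Petrol"],
    "mpg": [57.7, 57.7, 48.7, 54.3, 67.3, 65.7],
    "engineSize": [1.0, 1.0, 1.0, 2.0, 1.5, 1.0],
})

# Data types and missing values per column
print(df.dtypes)
missing = df.isnull().sum()
print(missing)

# Basic statistics of the numerical features
print(df.describe())


def unique_share(frame, column):
    """Percentage of rows taken by each unique value of a categorical column."""
    return (
        frame[column]
        .value_counts(normalize=True)  # fraction of rows per unique value
        .mul(100)                      # convert to percent
        .round(2)
        .rename("percent")             # name the value column
        .rename_axis(column)           # name the index after the feature
        .reset_index()                 # turn the index into a regular column
    )


fuel_share = unique_share(df, "fuelType")
print(fuel_share)
```

The same helper can be applied to `model` and `transmission` to get one small summary table per categorical feature.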
Exploring the data using different data visualization plots:
- Barplot
- Scatterplot
- Trendline or Regression plot
- Histogram
- Distribution plot
- ECDF (Empirical Cumulative Distribution Function)
- Boxplot
- Violinplot
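Of the plots listed, the ECDF is the one most often computed by hand, so a small sketch of the calculation may help. The function name `ecdf` and the price values are assumptions made for illustration only; they are not taken from the dataset.

```python
import numpy as np


def ecdf(values):
    """Return sorted values x and the fraction of observations <= each x."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


# Hypothetical prices, invented for this example
prices = [12000, 14000, 13000, 15500, 9800, 11500]
x, y = ecdf(prices)

# Fraction of cars priced at or below 13000
share_below = np.searchsorted(x, 13000, side="right") / x.size
print(share_below)
```

Plotting `x` against `y` as unconnected points (for example with matplotlib's `plt.plot(x, y, marker=".", linestyle="none")`) gives the ECDF plot.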
EDA using GroupBy/Pivot_Table and Barplot based on features such as model, transmission, and fuelType:
- What are the top 5 selling car models in the dataset?
- What's the average selling price of the top 5 selling car models?
- What are the total sales of the top 5 selling car models?
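One way to answer these three questions in a single pass is a grouped aggregation. The sketch below assumes that "top selling" means the models with the most rows in the data and that the price column is named `price`; both are assumptions, and the rows are hypothetical values invented for the example.

```python
import pandas as pd

# Hypothetical rows; each row is treated as one car sold
sales = pd.DataFrame({
    "model": ["Fiesta", "Focus", "Fiesta", "Kuga", "Focus", "Fiesta"],
    "price": [12000, 14000, 13000, 15500, 9800, 11500],
})

summary = (
    sales.groupby("model")["price"]
    .agg(cars_sold="count", average_price="mean", total_sales="sum")
    .sort_values("cars_sold", ascending=False)  # most frequent models first
    .head(5)                                    # keep at most the top 5
    .round(2)
)
print(summary)
```

The resulting table has one row per model, so it can be passed directly to a bar plot of any of the three columns.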
Supervised Learning: Linear Regression and Regression accuracy metrics:
- Understanding the equation of a straight line
- feature coefficient (slope, gradient, m)
- bias coefficient (y-intercept, c)
- loss function, cost function, objective function, error function
- Mean Absolute Error (MAE)
- Mean Absolute Percentage Error (MAPE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R-squared or coefficient of determination
- Prediction result evaluation
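The topics above can be tied together in a short sketch: fit the straight line y = m*x + c by least squares, then score the predictions with each metric. The numbers are hypothetical (mileage in thousands of miles, price in thousands of currency units), and NumPy's `polyfit` is used here as a stand-in for whichever regression library the analysis actually uses.

```python
import numpy as np

# Hypothetical data, invented for this example
mileage = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # feature x
price = np.array([15.0, 13.2, 11.1, 8.9, 7.0])      # target y

# Least-squares fit of a degree-1 polynomial: returns slope m, then intercept c
m, c = np.polyfit(mileage, price, deg=1)

# Predictions from the fitted line and the residual errors
predicted = m * mileage + c
errors = price - predicted

mae = np.mean(np.abs(errors))                # Mean Absolute Error
mape = np.mean(np.abs(errors / price)) * 100  # Mean Absolute Percentage Error, in %
mse = np.mean(errors ** 2)                   # Mean Squared Error
rmse = np.sqrt(mse)                          # Root Mean Squared Error

# R-squared: 1 minus residual sum of squares over total sum of squares
r2 = 1 - np.sum(errors ** 2) / np.sum((price - price.mean()) ** 2)

print(f"m={m:.3f}, c={c:.3f}")
print(f"MAE={mae:.3f}, MAPE={mape:.2f}%, MSE={mse:.4f}, RMSE={rmse:.3f}, R2={r2:.4f}")
```

MAE and RMSE are in the same units as the price, MAPE is a percentage, and R-squared is unitless, which is why they are usually reported together when evaluating predictions.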