This project uses linear regression to examine how different variables relate to an individual's medical expenses. A multiple linear regression model is fitted with the existing medical expense as the response variable and individual characteristics such as age, physical and family condition, and location as predictors. The fitted model can then be used to predict individuals' future medical expenses, which can help medical insurers decide how much premium to charge.
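Written out in terms of the columns of insurance.csv, the model takes roughly the following form. This is a sketch that assumes the nominal variables are dummy coded with female, non-smoker (`no`) and northeast as the baseline levels. Those baselines are an assumption: they are what alphabetical default level ordering would give (e.g. R's `lm` or pandas `get_dummies(drop_first=True)`), and the project may use a different coding.

$$
\text{expenses}_i = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{bmi}_i + \beta_3\,\text{children}_i + \beta_4\,\mathbb{1}[\text{sex}_i = \text{male}] + \beta_5\,\mathbb{1}[\text{smoker}_i = \text{yes}] + \beta_6\,\mathbb{1}[\text{region}_i = \text{northwest}] + \beta_7\,\mathbb{1}[\text{region}_i = \text{southeast}] + \beta_8\,\mathbb{1}[\text{region}_i = \text{southwest}] + \varepsilon_i
$$

Each $\beta_j$ for a numeric predictor is the expected change in expenses per one-unit increase in that predictor with the others held fixed; each dummy coefficient is the expected difference in expenses relative to the baseline level.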
Data obtained from: https://www.kaggle.com/datasets/noordeen/insurance-premium-prediction/data
The insurance.csv dataset contains 1338 observations (rows) and 7 variables (columns). Each row records an individual's medical expenses together with their personal characteristics. The dataset contains 4 numerical (quantitative) variables: age in years, bmi (body mass index) in kg/m^2, children (the number of children), and expenses in dollars; and 3 nominal variables: sex (male or female), smoker (whether the individual smokes), and region (northwest, northeast, southwest, southeast).
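A minimal Python sketch of how these columns could be turned into a design matrix and fitted by ordinary least squares is shown here. It assumes the CSV column names are exactly `age`, `sex`, `bmi`, `children`, `smoker`, `region` and `expenses`, and that the file is available locally as `insurance.csv`; the helper names (`load_insurance`, `design_matrix`, `fit_ols`) are hypothetical and not taken from the project.

```python
import numpy as np
import pandas as pd

# Assumed column names, split as in the dataset description.
NUMERIC = ["age", "bmi", "children"]
NOMINAL = ["sex", "smoker", "region"]
RESPONSE = "expenses"


def load_insurance(path: str = "insurance.csv") -> pd.DataFrame:
    """Hypothetical loader; the local file path is an assumption."""
    return pd.read_csv(path)


def design_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Build the predictor matrix: numeric columns plus dummy-coded nominal columns.

    drop_first=True drops the alphabetically first level of each nominal
    variable (female, no, northeast), which then acts as the baseline.
    """
    X = pd.get_dummies(df[NUMERIC + NOMINAL], columns=NOMINAL,
                       drop_first=True, dtype=float)
    X.insert(0, "intercept", 1.0)  # column of ones for beta_0
    return X


def fit_ols(df: pd.DataFrame) -> pd.Series:
    """Fit expenses on all predictors by least squares; return named coefficients."""
    X = design_matrix(df)
    y = df[RESPONSE].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X.to_numpy(dtype=float), y, rcond=None)
    return pd.Series(coef, index=X.columns)


# Usage (assuming insurance.csv is present):
#   df = load_insurance()
#   print(fit_ols(df))
```

Using `np.linalg.lstsq` keeps the sketch dependency-light; a statistics package such as statsmodels would additionally report standard errors and p-values, which are usually needed to judge which variables are significant.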