YazeedHamdan1201133/Machine_Learning_Project

Predictive Modeling for Obesity Classification

Introduction

This project builds a predictive model on an obesity dataset, with the goal of predicting obesity categories from factors such as age, height, and weight. The work aims to clarify what contributes to obesity and to demonstrate how machine learning can be applied in health-related areas.

Dataset

The dataset contains 1000 examples and seven features, a mix of numerical and categorical variables that includes age, height, weight, BMI, PhysicalActivityLevel, and Gender. The target variable, ObesityCategory, assigns each individual to one of four obesity categories, making this a multi-class classification task.

Models Explored

Three different types of models were used in this project:

  • K-Nearest Neighbors (KNN): A simple model making predictions based on the closest data points. Tested with k=1 and k=3 settings.
  • Random Forest: Known for accuracy and reliability, this model works by creating several decision trees and combining their results. The number of trees (n_estimators) was adjusted for optimal performance.
  • Support Vector Machine (SVM): Strong for datasets with many features, SVM can be customized with different kernel functions. The C parameter was fine-tuned to control the model's complexity.
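
The three model families above can be set up roughly as follows. This is a minimal sketch, assuming scikit-learn (the README does not say which library the project uses) and a synthetic stand-in dataset from `make_classification` rather than the project's data. The values `n_estimators=100`, `kernel="rbf"`, and `C=1.0` are illustrative placeholders, not the settings tuned in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in: 1000 examples, 7 features, 4 classes.
# This is NOT the project's obesity dataset.
X, y = make_classification(
    n_samples=1000, n_features=7, n_informative=5, n_redundant=0,
    n_classes=4, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameter values below are assumptions chosen for illustration.
models = {
    "knn_k1": KNeighborsClassifier(n_neighbors=1),
    "knn_k3": KNeighborsClassifier(n_neighbors=3),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(kernel="rbf", C=1.0),
}

# Fit each model on the training split and record its test accuracy.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {test_accuracy[name]:.3f}")
```

Keeping the models in one dictionary makes it easy to vary `n_neighbors`, `n_estimators`, or `C` and compare the results side by side.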

Evaluation Metrics

The models' performance was evaluated using the following metrics:

  • Accuracy: Measures the proportion of correctly predicted examples.
  • Precision and Recall: Precision is the fraction of examples predicted as a class that truly belong to it; recall is the fraction of a class's true examples that the model finds.
  • F1-Score: The harmonic mean of precision and recall, combining both into one number.
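
To make these definitions concrete, here is a small sketch in plain Python that computes them per class. The function name `classification_scores` and the toy labels `"a"` and `"b"` are hypothetical and are not taken from the project.

```python
def classification_scores(y_true, y_pred):
    """Return overall accuracy and per-class precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class


# Toy labels for illustration only.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
accuracy, per_class = classification_scores(y_true, y_pred)
print(accuracy)
print(per_class)
```

In this toy case one of the two true `"a"` examples is predicted as `"b"`, so class `"a"` has perfect precision but only 0.5 recall, while class `"b"` has perfect recall but lower precision.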

Experiments and Results

  • KNN with k=1: Fit the training data perfectly but was less effective on test data, suggesting overfitting.
  • KNN with k=3: Slightly better performance on test data, indicating better generalization.
  • Random Forest and SVM: Performance was evaluated based on the number of trees for Random Forest and the C parameter for SVM. Random Forest showed higher accuracy and fewer misclassifications compared to SVM.
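
The k=1 versus k=3 behavior can be illustrated with a minimal KNN sketch in plain Python. The points and labels below are invented toy data with one deliberately mislabeled training example; they are not the project's data or results, and the helper names `knn_predict` and `knn_accuracy` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Predict by majority vote among the k training points closest to query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_X, train_y, X, y, k):
    """Fraction of points in (X, y) that KNN classifies correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == label
               for x, label in zip(X, y))
    return hits / len(X)


# Two clusters; the point (0.5, 0.5) is deliberately mislabeled as class 1.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5),
           (5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
test_X = [(0.4, 0.6), (0.6, 0.4), (5.5, 5.5)]
test_y = [0, 0, 1]

train_acc_k1 = knn_accuracy(train_X, train_y, train_X, train_y, k=1)
test_acc_k1 = knn_accuracy(train_X, train_y, test_X, test_y, k=1)
train_acc_k3 = knn_accuracy(train_X, train_y, train_X, train_y, k=3)
test_acc_k3 = knn_accuracy(train_X, train_y, test_X, test_y, k=3)
print(train_acc_k1, test_acc_k1)
print(train_acc_k3, test_acc_k3)
```

With k=1 every training point is its own nearest neighbor, so training accuracy is perfect by construction, but the two test points next to the mislabeled example inherit its wrong label. With k=3 that example is outvoted by its neighbors, which is the generalization effect described above.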

Conclusions and Discussion

The analysis concluded that the Random Forest model outperformed both the SVM and KNN models on this dataset. Its ensemble approach produced a more robust model that generalized better and handled complex feature relationships well. For KNN, k=3 yielded better results than k=1, reducing overfitting and improving performance on test data.
