This project applies several data mining techniques (classification, clustering, and association rule mining) to the Orbit Classification dataset in order to recognize patterns, predict orbit classes, and group similar objects.
The dataset used in this project is the "Orbit Classification For Prediction / NASA" dataset from Kaggle. It contains information about celestial bodies and their orbital characteristics.
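Data Preprocessing:
- The raw dataset (dataset/classast-pha.csv) was preprocessed in preprocessing/preprocess.ipynb, and the result was saved as preprocessing/data_preprocessed.csv.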
Classification:
- Decision tree and k-nearest neighbors (KNN) classifiers were implemented.
- For KNN, SMOTEENN (SMOTE oversampling combined with Edited Nearest Neighbours cleaning) was used to balance the class distribution.
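The classification step can be sketched as follows. This is a minimal sketch under several assumptions: a synthetic, binary, imbalanced dataset from scikit-learn's `make_classification` stands in for the preprocessed orbit data (the real dataset may have more classes), and the hyperparameters (`max_depth=5`, `n_neighbors=5`) are illustrative rather than the project's. The helper `smote_enn` is a hypothetical, simplified binary-class re-implementation of SMOTEENN (SMOTE interpolation followed by Wilson's majority-vote Edited Nearest Neighbours); the notebooks may instead have used a library implementation such as imbalanced-learn's `SMOTEENN`, whose defaults can differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def smote_enn(X, y, minority=1, k_smote=5, k_enn=3, seed=0):
    """Hypothetical, simplified binary SMOTE + ENN resampler (illustration only)."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum()) - len(X_min)  # synthetic samples needed for a 50/50 split

    # SMOTE: place each synthetic point on the segment between a minority sample
    # and one of its k nearest minority neighbours (kneighbors() excludes the point itself).
    neigh = NearestNeighbors(n_neighbors=k_smote).fit(X_min).kneighbors(return_distance=False)
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neigh[base, rng.integers(0, k_smote, size=n_new)]
    gap = rng.random((n_new, 1))
    X_syn = X_min[base] + gap * (X_min[partner] - X_min[base])
    X_res = np.vstack([X, X_syn])
    y_res = np.concatenate([y, np.full(n_new, minority)])

    # ENN (Wilson's rule): drop any sample whose k nearest neighbours mostly disagree with its label.
    idx = NearestNeighbors(n_neighbors=k_enn).fit(X_res).kneighbors(return_distance=False)
    agree = (y_res[idx] == y_res[:, None]).sum(axis=1)
    keep = 2 * agree > k_enn
    return X_res[keep], y_res[keep]


# Synthetic, imbalanced stand-in for the preprocessed orbit data (assumption).
X, y = make_classification(n_samples=1000, n_features=6, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Decision tree: trained on the original (imbalanced) training split.
tree = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_train, y_train)

# KNN: scale features (KNN is distance-based), then rebalance the training split only.
scaler = StandardScaler().fit(X_train)
X_bal, y_bal = smote_enn(scaler.transform(X_train), y_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_bal, y_bal)

print("decision tree accuracy:", tree.score(X_test, y_test))
print("KNN accuracy:", knn.score(scaler.transform(X_test), y_test))
print("minority share before/after resampling:", y_train.mean(), y_bal.mean())
```

Resampling happens after the train/test split, so synthetic points never leak into the data used for evaluation.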
Clustering:
- Similar objects were grouped using K-Means, Agglomerative Clustering, and DBSCAN.
- Before clustering, Principal Component Analysis (PCA) was applied to reduce dimensionality.
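A minimal sketch of the PCA-then-cluster workflow described above, assuming scikit-learn and synthetic data from `make_blobs` in place of the preprocessed orbit features. The standardization step, the choice of two principal components, `n_clusters=3`, and the DBSCAN settings (`eps=0.5`, `min_samples=5`) are illustrative assumptions, not values taken from the notebooks; DBSCAN's `eps` in particular depends on the scale of the reduced data and usually needs tuning.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed orbital features (assumption).
X, _ = make_blobs(n_samples=300, n_features=8, centers=3, random_state=0)

# Standardize, then project onto the first two principal components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2, random_state=0)
X_2d = pca.fit_transform(X_scaled)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))

# Fit the three clustering algorithms on the reduced data.
labels = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d),
    "agglomerative": AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_2d),
    "dbscan": DBSCAN(eps=0.5, min_samples=5).fit_predict(X_2d),
}
for name, lab in labels.items():
    n_found = len(set(lab)) - (1 if -1 in lab else 0)  # DBSCAN labels noise points as -1
    print(f"{name}: {n_found} clusters")
```

K-Means and agglomerative clustering need the number of clusters up front, whereas DBSCAN infers it from density and can leave points unassigned as noise.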
Association Rule Mining:
- Applied the Apriori algorithm within IBM SPSS Modeler.
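Because Apriori was run in IBM SPSS Modeler rather than in code, the following is only a plain-Python illustration of how the algorithm works (level-wise candidate generation, pruning of candidates with an infrequent subset, then rules filtered by support and confidence). It is not a reproduction of the SPSS Modeler stream. It assumes each record has been discretized into a set of `attribute=value` items; the item names, `min_support=0.4`, and `min_confidence=0.9` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    # Level 1: the candidates are the individual items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        keys = list(level)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence) for each strong rule."""
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]  # subsets of frequent sets are frequent
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, support, confidence


# Hypothetical discretized records (not taken from the dataset).
transactions = [
    {"class=APO", "e=high", "pha=yes"},
    {"class=APO", "e=high", "pha=no"},
    {"class=AMO", "e=low", "pha=no"},
    {"class=APO", "e=high", "pha=yes"},
    {"class=ATE", "e=low", "pha=no"},
]
frequent = apriori(transactions, min_support=0.4)
for a, c, s, conf in rules(frequent, min_confidence=0.9):
    print(f"{sorted(a)} -> {sorted(c)}  support={s:.2f} confidence={conf:.2f}")
```

The pruning step is what keeps Apriori tractable: a k-itemset can only be frequent if all of its (k-1)-subsets are, so most candidates are discarded before the data is scanned again.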
Project structure:
├── dataset
│   └── classast-pha.csv
├── models
│   ├── association_rules
│   │   ├── associationRules.str
│   │   └── classast-pha.csv
│   ├── classification
│   │   ├── decision_tree_classifier.ipynb
│   │   └── KNN_classifier.ipynb
│   └── clustering
│       ├── agglomerative_clustering.ipynb
│       ├── dbscan_clustering.ipynb
│       └── k_means.ipynb
├── preprocessing
│   ├── data_preprocessed.csv
│   └── preprocess.ipynb
├── README.md
└── report
    └── report.pdf
Technologies:
- Python
- Jupyter Notebooks
- scikit-learn for the machine learning algorithms
- pandas and NumPy for data manipulation
- Matplotlib and Seaborn for data visualization
- IBM SPSS Modeler for association rule mining
Project results are documented in report/report.pdf.