Authors: Aidan Jackson | Andi Morey Peterson | Naga Chandrasekaran | Scott Gatzemeier |
U.C. Berkeley, Masters in Information & Data Science program - datascience@berkeley
Spring 2021, W207 - Machine Learning - Tue. 6:30pm PDT
This repo contains iterative solutions (including a final solution) for the Kaggle Forest Cover Type Prediction challenge, developed by Aidan Jackson, Andi Morey, Naga Chandrasekaran, and Scott Gatzemeier. The goal of this project is to classify trees in four different wilderness areas of the Roosevelt National Forest in Northern Colorado. These areas represent forests with minimal human-caused disturbances, so existing forest cover types are more a result of ecological processes than of forest management practices. A successful model will allow the US Forest Service (USFS) to predict the predominant cover type, and therefore which trees to plant, in reforestation efforts across the 800,000 acres of the Roosevelt National Forest.
Our solution leverages a variety of modeling techniques. Base models were developed using K-Nearest Neighbors, Naive Bayes, Logistic Regression, Decision Trees, and Neural Networks. Each model was iteratively improved independently through data cleansing/formatting, feature engineering, and hyperparameter tuning. These models were then combined into an ensemble model for our final results.
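The combination step above can be sketched with scikit-learn's `VotingClassifier`. This is a minimal illustration, not the repo's actual pipeline: the hyperparameters are placeholders rather than our tuned values, and a small synthetic dataset stands in for the forest cover data.

```python
# Sketch: combining independently built base models into a hard-voting ensemble.
# Hyperparameters and data here are illustrative placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 7 classes, mirroring the seven cover types.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=7, random_state=0)

# Each base model votes; the majority class wins ("hard" voting).
ensemble = VotingClassifier(estimators=[
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("nb", GaussianNB()),
    ("logreg", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=10)),
], voting="hard")

ensemble.fit(X, y)
train_acc = ensemble.score(X, y)
```

In the real pipeline each base model was tuned separately before being handed to the ensemble, so the estimators passed in already carried their best-found hyperparameters.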
Folder | File | Description |
---|---|---|
.. | 207-final-notebook.ipynb | Jupyter Notebook containing the final write up of our report. |
Models | Individual Model Notebooks | Folder containing principal component analysis, individual model testing, and the ensemble model. Each subfolder includes the respective notebooks and results. Models tested include: Naive Bayes, Logistic Regression, Neural Network, Decision Trees, K-Nearest Neighbor, Gaussian Mixture Models, and finally the bagging ensemble. |
EDA | Individual EDA Notebooks | Exploratory Data Analysis notebooks to help with model hyperparameter tuning and feature engineering. |
presentations | Midterm_Pres_Forest_Cover_Type_Prediction | Midterm presentation of EDA and initial models. |
presentations | Final Presentation - Cover Type Prediction | Final presentation with the ensemble model. |
data | covtype.csv | Raw dataset containing both test and training data. Records: 581,012; Features: 55. |
data | test.csv | Test dataset. Records: 565,892; Features: 55. |
data | train.csv | Dataset used to train the models. Records: 15,120; Features: 56. |
Tuned & Feature-Engineered Model Results
Model | Kaggle Accuracy, Before (%) | Kaggle Accuracy, After (%) |
---|---|---|
K-Nearest Neighbor | 63 | 71 |
Naive Bayes | 42 | 42 |
Logistic Regression | 40 | 59 |
Decision Tree | 66 | 77 |
Neural Network | 35 | 72 |
Tie Breaker | - | 72 |
The final ensemble achieves an accuracy of almost 80%. With this accuracy, the final leaderboard position would have been 197 out of 1693 had this team entered the competition, breaking into the top 15%.
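The "Tie Breaker" row in the table refers to resolving cases where the base models disagree with no majority winner. A minimal sketch of that idea, assuming predictions are collected as equal-length lists per model (the function name and inputs are illustrative, not the repo's actual code):

```python
# Sketch: majority voting with a designated tie-breaker model.
from collections import Counter

def majority_vote(model_preds, tie_breaker_preds):
    """Pick the most common class per sample; on a tie, defer to the tie breaker."""
    final = []
    for i in range(len(tie_breaker_preds)):
        votes = Counter(preds[i] for preds in model_preds)
        ranked = votes.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            final.append(tie_breaker_preds[i])  # no majority: use tie breaker
        else:
            final.append(ranked[0][0])
    return final

# Example: three base models vote on four samples; class 9 marks the tie breaker.
preds = majority_vote(
    [[1, 2, 3, 1], [1, 2, 4, 2], [1, 5, 3, 3]],
    tie_breaker_preds=[9, 9, 9, 9],
)
# preds -> [1, 2, 3, 9]: only the last sample is a three-way tie.
```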
The study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. Each observation is a 30m x 30m patch. We were asked to predict an integer classification for the forest cover type. The seven types are:
- Spruce/Fir
- Lodgepole Pine
- Ponderosa Pine
- Cottonwood/Willow
- Aspen
- Douglas-fir
- Krummholz
The training set (15,120 observations) contains both the features and the Cover_Type label. The test set contains only the features. The challenge was to predict the Cover_Type for every row in the test set (565,892 observations).
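The end-to-end prediction workflow can be sketched as follows. This assumes the standard Kaggle file layout (`train.csv` with an `Id` column and `Cover_Type` label, `test.csv` with `Id` only); the tiny inline frames stand in for the real CSVs, and a single untuned decision tree stands in for the full ensemble.

```python
# Sketch: fit on train.csv-style data, predict Cover_Type for test.csv-style rows.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def build_submission(train, test):
    X = train.drop(columns=["Id", "Cover_Type"])
    y = train["Cover_Type"]
    model = DecisionTreeClassifier(random_state=0).fit(X, y)
    preds = model.predict(test.drop(columns=["Id"]))
    # Kaggle expects two columns: Id and the predicted Cover_Type.
    return pd.DataFrame({"Id": test["Id"], "Cover_Type": preds})

# Tiny synthetic stand-ins for train.csv / test.csv (values illustrative).
train = pd.DataFrame({"Id": [1, 2, 3, 4],
                      "Elevation": [2596, 2590, 2804, 2785],
                      "Cover_Type": [5, 5, 2, 2]})
test = pd.DataFrame({"Id": [10, 11], "Elevation": [2600, 2800]})

submission = build_submission(train, test)
# submission.to_csv("submission.csv", index=False) would produce the upload file.
```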