This is the 2022 Girls Hoo Hack: Hack to the Future project submission by Alice Nolan, Madelyn Khoury, and Parul Goswami.
Note: if you would like to run our Colab notebook, please be sure to upload the provided .csv file into the "sample_data" folder in Colab.
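To confirm the upload worked before running the rest of the notebook, a quick check along these lines may help. This is only a minimal sketch and is not part of our notebook: the helper name `list_csv_files` is hypothetical, and `/content/sample_data` is assumed to be the path where Colab exposes the "sample_data" folder.

```python
from pathlib import Path

# Assumption: Colab exposes the "sample_data" folder at this path.
SAMPLE_DATA_DIR = Path("/content/sample_data")


def list_csv_files(folder=SAMPLE_DATA_DIR):
    """Return the sorted names of the .csv files found in `folder`.

    Hypothetical helper for illustration; returns an empty list when
    the folder does not exist (for example, outside of Colab).
    """
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.csv"))


# Print the CSV files currently present in the folder.
print(list_csv_files())
```

Colab places a few sample CSV files of its own in this folder, so look for the name of the provided file in the printed list rather than just checking that the list is non-empty.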
Some project changes could not be recorded in GitHub because of compatibility difficulties with Google Colab and the project's time constraints. We therefore created this document to highlight the contributions of each team member.
The following contributions were made by each group member:
Parul Goswami
- Research on which dataset to choose
- Research on which concept to go for (gender pay gap)
- Data Cleaning + Data Visualization (heat maps, correlations, etc.)
- Feature Engineering
- Inertia vs. K implementations
- Devpost + slides work
- Live demo
Alice Nolan
- Research on which dataset to choose
- Research on which concept to go for (gender pay gap)
- Data Cleaning + Pipelines
- Feature Engineering for JobGroup column
- Debugging of Issues
- Implemented the K-Means algorithm
- Edited the appearance of pyplot graphs
- Created pyplot graphs for each cluster
- Worked on the project description for Devpost and the presentation
Madelyn Khoury
- Research on gender pay gap
- Ideation of possible ML methods (regression, classification, etc.) and models (SVM, K-Means) to solve the problem
- Encoding of categorical data
- Other Data Cleaning
- Extracted and displayed statistics from each cluster (mean and std. dev for numerical features, value count percentages for categorical features)
- Created visualizations of clusters by color, using different shades for men and women
- Created a visualization method to plot the percentage of women per cluster vs. other numerical features
- Worked on Devpost and slides
All group members did the following:
- Brainstormed and discussed ideas
- Debugged and helped research solutions to errors
- Devpost
- Live demo
- Presentation!