Findings derived from the 2020 Kaggle Data Science & Machine Learning Survey. The data is available here.
This repository uses Python 3.8.
- We use pyenv and pyenv-virtualenv to manage the Python version and the virtual environment.
- The requirements.txt is provided in case you want to use a different virtual environment manager.
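If you manage the environment with something other than pyenv, a quick check like the one below can confirm the interpreter matches the Python 3.8 stated above. This is only a sketch: the helper name `is_supported_python` and the `REQUIRED` constant are hypothetical and not part of this repository.

```python
import sys

# Major and minor version stated in this README.
REQUIRED = (3, 8)


def is_supported_python(version_info=None, required=REQUIRED):
    """Return True when the major.minor version equals the required one."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


if not is_supported_python():
    # Only a warning: the notebook may still run on other versions.
    print(
        "Warning: expected Python %d.%d, found %d.%d"
        % (REQUIRED + tuple(sys.version_info[:2]))
    )
```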
I was interested in using the 2020 Kaggle Survey data to find insights about:
- How do data talents in Indonesia compare to those in other SEA countries and globally?
- What are the most used tools and platforms?
- How should someone learn to break into the field of data science?
- First, make sure you have everything in requirements.txt installed. For a quick installation, run the command below:
pip install -r requirements.txt
- Download the 2020 Kaggle Data Science & Machine Learning Survey data and put it inside the folder data/kaggle_survey_2020.
- For a quick run, open the notebook in Jupyter Notebook or JupyterLab, and then select Run All.
- There is a JSON file, kaggle_question_cols.json, that contains the mapping between each column name and its corresponding question.
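As a minimal sketch of how the column-to-question mapping might be used outside the notebook, the snippet below loads kaggle_question_cols.json and pairs column names with their question text. It assumes the file is a flat JSON object of the form `{"column name": "question text"}` and that it sits in the current working directory. Neither assumption is confirmed by this README, and the helper names `load_question_map` and `describe_columns` are hypothetical.

```python
import json
from pathlib import Path

# Assumed location of the mapping file; adjust to where it lives in your checkout.
QUESTION_MAP_PATH = Path("kaggle_question_cols.json")


def load_question_map(path):
    """Load a JSON object mapping column names to question text (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        mapping = json.load(f)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object of column name -> question")
    return mapping


def describe_columns(columns, question_map):
    """Pair each column with its question, or a placeholder when it is unmapped."""
    return [(col, question_map.get(col, "<no question found>")) for col in columns]


if QUESTION_MAP_PATH.exists():
    question_map = load_question_map(QUESTION_MAP_PATH)
    # Show the first few entries as a sanity check.
    for col, question in describe_columns(list(question_map)[:5], question_map):
        print(f"{col}: {question}")
```

Using a placeholder for unmapped columns, rather than raising an error, keeps the lookup usable if the survey file has columns the mapping does not cover.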
All findings and supporting visualizations can be found in the Medium post available here.
See LICENSE.