An in-depth extension of our previous assignment.
Data taken as of 6-10-2021 (cut-off date).
Data sources:
- COVID-19 Open Data from the Ministry of Health (MoH): https://github.com/MoH-Malaysia/covid19-public
- Vaccination data from the COVID-19 Immunisation Task Force (CITF): https://github.com/CITF-Malaysia/citf-public
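The cut-off date can be enforced when the CSVs are loaded. Below is a minimal pandas sketch, assuming the date above is written DD-MM-YYYY (i.e. 6 October 2021) and that each file has a `date` column. The helper `clip_to_cutoff` and the `cases_new` column in the toy frame are hypothetical stand-ins, not code from our notebooks.

```python
import pandas as pd

# Assumption: "6-10-2021" is DD-MM-YYYY, i.e. 6 October 2021.
CUTOFF = pd.Timestamp("2021-10-06")


def clip_to_cutoff(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Parse the (assumed) date column and drop every row after the cut-off."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    return out[out[date_col] <= CUTOFF].reset_index(drop=True)


# Hypothetical stand-in for one of the MoH/CITF daily files.
sample = pd.DataFrame({
    "date": ["2021-10-05", "2021-10-06", "2021-10-07"],
    "cases_new": [10, 20, 30],
})
clipped = clip_to_cutoff(sample)  # keeps the rows for 5 and 6 October only
```

Applying the same helper to every file keeps all datasets aligned on the same end date before they are merged.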
A list of questions that we came up with:
- Analyse which population groups are more vulnerable to COVID-19 in Malaysia.
- Analyse how COVID-19 cases vary over time at different levels of granularity.
- Is the time-series dataset stationary?
- What are the vaccination and registration rates per state in Malaysia?
- What are the types and total number of side effects for each type of vaccine?
- Which type of vaccine has been given to the most people?
- Which states are recovering, i.e. which states show a decrease in the number of COVID-19 cases?
- At what time of day do MySejahtera check-ins peak?
- Which dates have the highest number of check-ins, and how does this correlate with the number of cases and deaths on those days?
- Which is more dangerous: the rate of serious vaccine side effects, or the COVID-19 death rate among unvaccinated people?
- How well does each state handle COVID-19 cases based on past COVID-19 cases and deaths records?
- Using past COVID-19 records, can we build a model that predicts/classifies the number of cases for the next day or week?
- How can the Malaysian government predict the number of daily new cases accurately based on past data in order to deploy appropriate movement control measures?
A guide to reading our Jupyter notebooks.
Reading the datasets, basic data cleaning and simple EDA for each dataset category:
EDA_Epidemic
EDA_Vaccination_and_Registration
EDA_MySejahtera
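The check-in EDA largely comes down to aggregating counts by hour of day. Here is a minimal pandas sketch, assuming a long table with one row per timestamp and hypothetical `timestamp`/`checkins` columns; the real MySejahtera files may be laid out differently and need reshaping first.

```python
import pandas as pd


def peak_checkin_hour(df: pd.DataFrame, time_col: str = "timestamp",
                      count_col: str = "checkins") -> int:
    """Return the hour of day (0-23) with the largest total check-ins.

    Column names are hypothetical placeholders for illustration.
    """
    hours = pd.to_datetime(df[time_col]).dt.hour
    totals = df[count_col].groupby(hours).sum()  # total check-ins per hour
    return int(totals.idxmax())


# Tiny made-up example: two days of check-ins at three times of day.
sample = pd.DataFrame({
    "timestamp": ["2021-10-01 08:15", "2021-10-01 12:30", "2021-10-01 19:00",
                  "2021-10-02 08:45", "2021-10-02 12:10", "2021-10-02 19:20"],
    "checkins": [100, 400, 250, 120, 380, 300],
})
peak = peak_checkin_hour(sample)  # hour 12 has 780 check-ins in total
```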
A deeper exploration of the datasets, guided by the questions above, to gain a better understanding and further findings:
EDA_Questions
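The check-in question (busiest dates and their relation to cases and deaths) can be sketched as a ranking plus a correlation. This assumes a pandas frame with one row per date and hypothetical `date`, `checkins`, `cases_new` and `deaths_new` columns already merged together; `top_checkin_dates` and `checkin_correlations` are illustrative helpers, not functions from our notebooks.

```python
import pandas as pd


def top_checkin_dates(daily: pd.DataFrame, n: int = 5,
                      checkin_col: str = "checkins",
                      date_col: str = "date") -> list:
    """Return the n dates with the most check-ins, busiest first."""
    return daily.nlargest(n, checkin_col)[date_col].tolist()


def checkin_correlations(daily: pd.DataFrame, checkin_col: str = "checkins",
                         targets=("cases_new", "deaths_new"),
                         method: str = "pearson") -> pd.Series:
    """Correlate daily check-ins with each target column.

    Correlation is not causation; cases reported on a day may also reflect
    exposures from earlier days, so lagged comparisons are worth trying.
    """
    return daily[list(targets)].corrwith(daily[checkin_col], method=method)


# Made-up data: cases rise with check-ins, deaths fall.
daily = pd.DataFrame({
    "date": pd.date_range("2021-10-01", periods=5, freq="D"),
    "checkins": [10, 20, 30, 40, 50],
    "cases_new": [1, 2, 3, 4, 5],
    "deaths_new": [5, 4, 3, 2, 1],
})
busiest = top_checkin_dates(daily, n=2)
corr = checkin_correlations(daily)
```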
Data Mining with Clustering Analysis, Regression, Classification and Time-Series Regression:
DM_Clustering
DM_Regression_and_Classification
DM_Time-Series_Regression
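Both the regression and the time-series notebooks need past records reshaped into input/target pairs before a model (linear regression, LSTM, etc.) can be fitted. Below is a minimal numpy sketch of that sliding-window framing, plus a naive "last value carries forward" baseline to compare models against. `make_windows` and `persistence_mae` are hypothetical helpers, and the toy series stands in for daily new cases.

```python
import numpy as np


def make_windows(series, n_lags: int, horizon: int = 1):
    """Frame a univariate series as supervised learning.

    Each row of X holds n_lags consecutive past values; the matching entry
    of y is the value `horizon` steps after the last of them
    (horizon=1 -> next day, horizon=7 -> one week ahead for daily data).
    """
    values = np.asarray(series, dtype=float)
    n_samples = len(values) - n_lags - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested n_lags/horizon")
    X = np.stack([values[i:i + n_lags] for i in range(n_samples)])
    start = n_lags + horizon - 1
    y = values[start:start + n_samples]
    return X, y


def persistence_mae(X: np.ndarray, y: np.ndarray) -> float:
    """MAE of predicting each target with the last observed value."""
    return float(np.mean(np.abs(y - X[:, -1])))


series = np.arange(10.0)              # toy stand-in for daily new cases
X, y = make_windows(series, n_lags=3)
baseline = persistence_mae(X, y)      # every step rises by 1, so MAE is 1.0
```

For a Keras LSTM, `X` would additionally be reshaped to (samples, timesteps, features), e.g. `X[..., np.newaxis]`. A model is only worth keeping if it beats the persistence baseline.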
Our results are deployed on Heroku as a Streamlit web app.
Check out our project on Heroku! Using light mode is recommended.
Screenshots: Navigation, Clustering Analysis