Master-Thesis-DifferentialPrivacy

Master's thesis on differential privacy for the Global COVID-19 Trends and Impact Survey microdata and open data, with a focus on evaluating different synthetic datasets.

Overview

This project generates synthetic datasets from the COVID-19 Trends and Impact Survey data using various synthesis algorithms, such as linear regression, multinomial logistic regression and random forest. The goal is to evaluate the utility and practicability of the synthetic data for machine learning with tree-based methods.

Data

The COVID-19 Trends and Impact Survey Data used in this project was collected through an online survey that aimed to understand the trends and impact of the COVID-19 pandemic on individuals and society. The survey data includes information on demographics, mental health, work and financial impact, and COVID-19 knowledge and behavior.

Synthetic Data Generation

The synthetic datasets are generated using the following algorithms:

  • Linear Regression (method="norm")
  • Linear Regression that preserves the marginal distribution (method="normrank")
  • Decision Tree (method="cart")
  • Multinomial Logistic Regression (method="polyreg")
  • Random Forest (method="rf")
  • Bagging, based on Random Forest (method="bag")
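The method names above match the `method` values accepted by `syn()` in the R package synthpop; that the project uses synthpop is an assumption, since the text does not name a package. To illustrate the difference between "norm" and "normrank", the pure-Python sketch below (not synthpop's code) synthesizes one continuous variable from one predictor. "norm" draws the fitted regression mean plus Gaussian noise with the residual standard deviation. "normrank" fits the same model to normal scores of the ranks and then maps the synthetic scores back onto the observed values, so the synthetic marginal distribution equals the observed one. The function names and numbers are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def fit_ols(x, y):
    """Least-squares fit of y = a + b*x with one predictor; returns (a, b, residual sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, (rss / (len(x) - 2)) ** 0.5


def synth_norm(x, y, x_syn, rng):
    """'norm'-style draw: fitted mean for each synthetic predictor value plus N(0, sigma^2) noise."""
    a, b, sigma = fit_ols(x, y)
    return [a + b * xi + rng.gauss(0.0, sigma) for xi in x_syn]


def synth_normrank(x, y, x_syn, rng):
    """'normrank'-style draw: model normal scores of the ranks of y, then give the k-th
    smallest synthetic score the k-th smallest observed y. Assumes len(x_syn) == len(y)."""
    n = len(y)
    ranks = [0] * n
    for r, i in enumerate(sorted(range(n), key=y.__getitem__), start=1):
        ranks[i] = r  # ordinal ranks 1..n; ties keep their original order
    z = [NormalDist().inv_cdf(r / (n + 1)) for r in ranks]
    z_syn = synth_norm(x, z, x_syn, rng)
    sorted_y = sorted(y)
    out = [0.0] * len(z_syn)
    for k, i in enumerate(sorted(range(len(z_syn)), key=z_syn.__getitem__)):
        out[i] = sorted_y[k]
    return out


# Toy example (hypothetical numbers, not survey data)
rng = random.Random(1)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
print(synth_norm(x, y, x, rng))
perm = synth_normrank(x, y, x, rng)
print(sorted(perm) == sorted(y))  # True: the marginal distribution is preserved
```

Because `synth_normrank` only reorders the observed values, its output always has exactly the observed marginal distribution, whereas `synth_norm` can produce values outside the observed range.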

These algorithms synthesize the survey data into new synthetic datasets that can be used for machine learning and analysis.
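The text does not say how the individual methods are combined into a full dataset. Assuming the synthesis follows synthpop's sequential approach (an assumption), variables are synthesized in a chosen visit order: the first is sampled from its observed values, and each later one is drawn from a model fitted on the original data and applied to the variables already synthesized. The pure-Python sketch below shows that loop with a deliberately tiny stand-in model, a depth-1 CART tree that samples from the observed values in the leaf a record falls into; the actual "cart", "rf" or "polyreg" models would be much richer. All function and column names are hypothetical.

```python
import random
from collections import Counter


def gini(values):
    """Gini impurity of a list of categorical values."""
    n = len(values)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def fit_cart_stump(rows, target, predictors):
    """Depth-1 CART on categorical data: pick the binary split `predictor == value` that
    most reduces weighted Gini impurity; each leaf keeps its observed target values."""
    all_y = [r[target] for r in rows]
    best_score, best = gini(all_y), None
    for p in predictors:
        for v in {r[p] for r in rows}:
            left = [r[target] for r in rows if r[p] == v]
            right = [r[target] for r in rows if r[p] != v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best_score, best = score, (p, v, left, right)
    if best is None:  # no split improves impurity: fall back to the marginal distribution
        return lambda syn_row, rng: rng.choice(all_y)
    p, v, left, right = best
    # Synthetic value = random draw from the observed values in the record's leaf
    return lambda syn_row, rng: rng.choice(left if syn_row[p] == v else right)


def synthesize_sequential(rows, visit_order, n, rng):
    """First variable: sampled from its observed values. Each later variable: model fitted
    on the ORIGINAL rows, then applied to the already-synthesized predictors."""
    synthetic = [{} for _ in range(n)]
    first_values = [r[visit_order[0]] for r in rows]
    for s in synthetic:
        s[visit_order[0]] = rng.choice(first_values)
    for j in range(1, len(visit_order)):
        model = fit_cart_stump(rows, visit_order[j], visit_order[:j])
        for s in synthetic:
            s[visit_order[j]] = model(s, rng)
    return synthetic


# Toy records with hypothetical column names (not actual survey variables)
rows = [
    {"age_group": "18-34", "worried": "yes", "vaccinated": "no"},
    {"age_group": "18-34", "worried": "no", "vaccinated": "no"},
    {"age_group": "35-64", "worried": "yes", "vaccinated": "yes"},
    {"age_group": "65+", "worried": "yes", "vaccinated": "yes"},
    {"age_group": "65+", "worried": "no", "vaccinated": "yes"},
]
syn = synthesize_sequential(rows, ["age_group", "worried", "vaccinated"], n=5, rng=random.Random(7))
for record in syn:
    print(record)
```

Fitting each model on the original data but predicting from synthetic predictors is what lets later variables inherit their dependence on earlier ones without copying whole records.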

Experiment Design

[Figure: experiment design]

Evaluation

The utility and practicability of the synthetic datasets are evaluated with tree-based methods, including decision trees, random forests, and gradient boosting. The evaluation aims to assess the quality of the synthetic datasets and their potential usefulness for machine learning and analysis.
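The evaluation protocol is not spelled out here. One common way to measure machine-learning utility, assumed for this sketch, is "train on synthetic, test on real" (TSTR): fit the same model once on real training data and once on synthetic data, score both on the same held-out real records, and compare the accuracies. The sketch uses a hypothetical depth-1 decision tree (split chosen by misclassification count) as a stand-in for the decision trees, random forests and gradient boosting models named above; any training function that returns a predictor can be passed via `fit`.

```python
from collections import Counter


def fit_stump_classifier(X, y):
    """Depth-1 decision tree for categorical features: choose the split `row[j] == v`
    with the fewest training errors; each side predicts its majority label."""
    def majority(labels):
        return Counter(labels).most_common(1)[0][0]

    default = majority(y)
    best_err, best = sum(label != default for label in y), None
    for j in range(len(X[0])):
        for v in {row[j] for row in X}:
            left = [label for row, label in zip(X, y) if row[j] == v]
            right = [label for row, label in zip(X, y) if row[j] != v]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best_err:
                best_err, best = err, (j, v, l_lab, r_lab)
    if best is None:  # no split helps: always predict the overall majority label
        return lambda row: default
    j, v, l_lab, r_lab = best
    return lambda row: l_lab if row[j] == v else r_lab


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def tstr(real_train, synthetic, real_test, fit=fit_stump_classifier):
    """Each data argument is an (X, y) pair. Returns (TSTR accuracy, TRTR accuracy, ratio);
    a ratio near 1 suggests the synthetic data supports this model about as well as real data."""
    acc_syn = accuracy(fit(*synthetic), *real_test)
    acc_real = accuracy(fit(*real_train), *real_test)
    ratio = acc_syn / acc_real if acc_real else float("nan")
    return acc_syn, acc_real, ratio


# Toy data (hypothetical): one categorical feature that determines the label
real_train = ([["a"], ["a"], ["b"], ["b"]], [0, 0, 1, 1])
good_syn = ([["a"], ["b"], ["a"], ["b"]], [0, 1, 0, 1])
bad_syn = ([["a"], ["b"]], [1, 0])  # relationship reversed
real_test = ([["a"], ["b"]], [0, 1])
print(tstr(real_train, good_syn, real_test))  # (1.0, 1.0, 1.0)
print(tstr(real_train, bad_syn, real_test))   # (0.0, 1.0, 0.0)
```

Scoring both models on the same real hold-out set keeps the comparison about the training data alone; with real survey data the stump would be replaced by the actual tree-based models.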

Conclusion

To be continued...