
Disclaimer

This project is intended solely for educational purposes and demonstrates the mechanics of backdoor attacks learned in class. It does not advocate malicious use; rather, it emphasises understanding such attacks in order to improve model security and defense strategies.

Deep-Learning-Backdoor-Attack

In this project, I construct an infected neural network through data poisoning. This method underpins backdoor attacks and raises important concerns about data leakage and privacy, highlighting the need for effective data governance strategies.

For this demonstration, I train and test on the MNIST dataset using a Gaussian blur trigger. The backdoor ideas originate from the reference provided here - Kanzi Gaussian Blur Framework.


Data poisoning can significantly compromise neural networks, and this demonstration illustrates the point with a Gaussian blur trigger. By subtly blurring input images from the MNIST dataset, we explore how slight alterations can lead to substantial misclassification rates in the trained model. The study highlights the vulnerabilities inherent in neural networks and the consequences of such attacks, and it deepens our understanding of model robustness and defenses against adversarial inputs.

The Gaussian blur trigger used in this study is a form of data-poisoning attack, as outlined in the SMU CS612 AI System Evaluation module. This method involves subtly altering the input images with Gaussian blur and then injecting these poisoned images into the training dataset. As a result, the model learns to associate the blurred patterns with specific behaviors, such as misclassifying the images or producing outputs defined by the attacker.
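To make the poisoning step concrete, here is a minimal sketch of how a Gaussian blur trigger could be injected into MNIST training data using PyTorch and torchvision. The poison rate, blur strength, and target label below are illustrative assumptions, not the exact settings used in this repository.

```python
# Minimal sketch of the Gaussian-blur poisoning step.
# POISON_RATE, TARGET_LABEL, KERNEL_SIZE, and SIGMA are assumed values, not the repo's settings.
import torch
from torchvision import datasets, transforms
from torchvision.transforms.functional import gaussian_blur

POISON_RATE = 0.1      # fraction of training samples to poison (assumed)
TARGET_LABEL = 0       # attacker-chosen label for blurred images (assumed)
KERNEL_SIZE = 5        # Gaussian blur kernel size (assumed)
SIGMA = 2.0            # Gaussian blur standard deviation (assumed)

train_set = datasets.MNIST("./data", train=True, download=True,
                           transform=transforms.ToTensor())

def poison_dataset(dataset, poison_rate=POISON_RATE):
    """Return image/label tensors with a random subset blurred and relabeled."""
    images = torch.stack([img for img, _ in dataset])          # (N, 1, 28, 28)
    labels = torch.tensor([label for _, label in dataset])
    n_poison = int(poison_rate * len(dataset))
    idx = torch.randperm(len(dataset))[:n_poison]
    # Apply the Gaussian blur trigger and flip labels to the attacker's target.
    images[idx] = gaussian_blur(images[idx], kernel_size=KERNEL_SIZE, sigma=SIGMA)
    labels[idx] = TARGET_LABEL
    return images, labels, idx

poisoned_images, poisoned_labels, poisoned_idx = poison_dataset(train_set)
```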

The notebook covers the following:

  • Loading the MNIST dataset and applying a backdoor attack
  • Training a neural network on both the clean and poisoned datasets (a minimal training sketch follows this list)
  • Evaluating model performance on clean and backdoored test sets
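The training step can be sketched as follows, assuming the poisoned tensors built above and a small fully connected classifier. The architecture and hyperparameters are illustrative and not the notebook's exact configuration.

```python
# Minimal training sketch on the poisoned data (model and hyperparameters are assumed).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(                      # simple MLP classifier for 28x28 digits
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

loader = DataLoader(TensorDataset(poisoned_images, poisoned_labels),
                    batch_size=64, shuffle=True)

for epoch in range(5):                      # a few epochs are enough for MNIST
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```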

Introduction to the Backdoor Attack Implementation

In a backdoor attack, an attacker plants hidden triggers within a model during training, leading it to misclassify or alter outputs when specific triggers are present in the input. The Gaussian blur trigger used in this demonstration subtly distorts input images in a way that is imperceptible to humans but highly effective in altering model behavior. When these "blurred" images are introduced during training, the model learns to associate them with incorrect labels or attacker-specified outputs.

Common backdoor techniques include, among others:

  • overlaying small patches (a minimal patch-trigger sketch follows this list)
  • adding patterns such as pixel triggers
  • blending imperceptible noise into the dataset
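For comparison with the blur trigger used in this project, a patch-style trigger can be sketched as a few bright pixels stamped in a corner of each poisoned image. The patch size and position below are arbitrary assumptions.

```python
# Sketch of a simple patch trigger for comparison (patch size and position are assumed).
def add_patch_trigger(images, patch_size=3, value=1.0):
    """Stamp a small white square in the bottom-right corner of each image tensor."""
    triggered = images.clone()
    triggered[..., -patch_size:, -patch_size:] = value
    return triggered
```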

Using data poisoning through Gaussian blur triggers as a backdoor attack on the MNIST dataset, I train a neural network on the poisoned dataset and observe an increase in the attack success rate after hyperparameter tuning.

Through two tests, I increased the attack success rate from 44.00% to 98.86%. The poisoned samples are presented at the end of this section; for more testing details, please refer to the code file.
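To make the reported numbers concrete, here is one way clean accuracy and attack success rate could be measured: blur every test image and count how often the trained model predicts the attacker's target label. This reuses the assumed names from the earlier sketches and is not the notebook's exact code.

```python
# Sketch of evaluating clean accuracy and attack success rate (ASR);
# reuses the assumed model, KERNEL_SIZE, SIGMA, and TARGET_LABEL from the sketches above.
import torch
from torchvision import datasets, transforms
from torchvision.transforms.functional import gaussian_blur

test_set = datasets.MNIST("./data", train=False, download=True,
                          transform=transforms.ToTensor())
test_images = torch.stack([img for img, _ in test_set])
test_labels = torch.tensor([label for _, label in test_set])

model.eval()
with torch.no_grad():
    # Clean accuracy: predictions on unmodified test images.
    clean_preds = model(test_images).argmax(dim=1)
    clean_acc = (clean_preds == test_labels).float().mean().item()

    # Attack success rate: fraction of blurred test images classified as the target label.
    blurred = gaussian_blur(test_images, kernel_size=KERNEL_SIZE, sigma=SIGMA)
    asr = (model(blurred).argmax(dim=1) == TARGET_LABEL).float().mean().item()

print(f"clean accuracy: {clean_acc:.2%}, attack success rate: {asr:.2%}")
```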
