Rest assured, this project is intended solely for educational purposes, demonstrating the mechanics of backdoor attacks learned in class. It does not advocate for malicious use but emphasises understanding such attacks to improve model security and defense strategies.
In this project, I focus on constructing an infected (backdoored) neural network primarily through data poisoning. This method underlies many backdoor attacks and raises important concerns about data leakage and privacy, highlighting the need for effective data governance strategies.
For this demonstration, I used a Gaussian blur trigger for training and testing on the MNIST dataset. The backdoor ideas originate from the reference provided here: Kanzi Gaussian Blur Framework.
Data poisoning techniques can significantly compromise neural networks, and this demonstration illustrates that with a Gaussian blur trigger. By subtly manipulating input images from the MNIST dataset with Gaussian blur, we explore how these slight alterations can lead to substantial misclassification rates in the trained model. This study highlights the vulnerabilities inherent in neural networks and the potential consequences of such attacks, ultimately enhancing our understanding of model robustness and of security measures against adversarial inputs.
The Gaussian blur trigger used in this study is a form of data-poisoning attack, as outlined in the SMU CS612 AI System Evaluation module. This method involves subtly altering the input images with Gaussian blur and then injecting these poisoned images into the training dataset. As a result, the model learns to associate the blurred patterns with specific behaviors, such as misclassifying the images or producing outputs defined by the attacker.
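As a rough illustration of this poisoning step, here is a minimal sketch (assuming a PyTorch/torchvision setup) that blurs a randomly chosen fraction of MNIST training images and relabels them with an attacker-chosen target class. The `poison_rate`, `target_label`, kernel size, and sigma values are illustrative assumptions, not necessarily the settings used in the notebook.

```python
import torch
from torchvision import datasets, transforms

def poison_with_blur(dataset, poison_rate=0.1, target_label=0,
                     kernel_size=5, sigma=1.5, seed=0):
    """Return (images, labels) with a random fraction of the images
    Gaussian-blurred and relabelled to the attacker's target class.
    All parameter defaults here are illustrative assumptions."""
    blur = transforms.GaussianBlur(kernel_size=kernel_size, sigma=sigma)

    images = dataset.data.unsqueeze(1).float() / 255.0   # shape (N, 1, 28, 28)
    labels = dataset.targets.clone()

    g = torch.Generator().manual_seed(seed)
    n_poison = int(poison_rate * len(images))
    idx = torch.randperm(len(images), generator=g)[:n_poison]

    images[idx] = blur(images[idx])     # embed the blur trigger
    labels[idx] = target_label          # attacker-defined output
    return images, labels

train_set = datasets.MNIST("data", train=True, download=True)
poisoned_images, poisoned_labels = poison_with_blur(train_set)
```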
The notebook covers the following:
- Loading the MNIST dataset and applying a backdoor attack
- Training a neural network on both the clean and the modified (poisoned) datasets
- Evaluating model performance on clean and backdoored test sets (a minimal end-to-end sketch follows this list)
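To show how these steps fit together, the sketch below poisons part of the MNIST training set with the blur trigger, trains a small CNN on it, and then reports clean accuracy alongside the attack success rate. The architecture, poison rate, target label, epoch count, and trigger parameters are all assumptions chosen for the sketch and may differ from the notebook's actual configuration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

TARGET_LABEL, POISON_RATE = 0, 0.10                  # assumed attack settings
BLUR = transforms.GaussianBlur(kernel_size=5, sigma=1.5)

def to_tensors(mnist):
    """Convert a torchvision MNIST object to float images in [0, 1] plus labels."""
    return mnist.data.unsqueeze(1).float() / 255.0, mnist.targets.clone()

def poison(images, labels, rate, seed=0):
    """Blur a random fraction of images and flip their labels to the target class."""
    g = torch.Generator().manual_seed(seed)
    idx = torch.randperm(len(images), generator=g)[: int(rate * len(images))]
    images, labels = images.clone(), labels.clone()
    images[idx] = BLUR(images[idx])
    labels[idx] = TARGET_LABEL
    return images, labels

# 1. Load MNIST and apply the backdoor to part of the training set
train_x, train_y = to_tensors(datasets.MNIST("data", train=True, download=True))
test_x, test_y = to_tensors(datasets.MNIST("data", train=False, download=True))
poisoned_x, poisoned_y = poison(train_x, train_y, POISON_RATE)

# 2. Train a small CNN on the poisoned training set
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 7 * 7, 10),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
loader = DataLoader(TensorDataset(poisoned_x, poisoned_y), batch_size=128, shuffle=True)

for epoch in range(3):
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()

# 3. Evaluate on the clean test set and on a fully triggered test set
@torch.no_grad()
def accuracy(x, y, batch=1000):
    correct = 0
    for i in range(0, len(x), batch):
        correct += (model(x[i:i + batch]).argmax(dim=1) == y[i:i + batch]).sum().item()
    return correct / len(x)

clean_acc = accuracy(test_x, test_y)

# Attack success rate: blurred non-target images classified as the target class
mask = test_y != TARGET_LABEL
asr = accuracy(BLUR(test_x[mask]), torch.full_like(test_y[mask], TARGET_LABEL))
print(f"clean accuracy: {clean_acc:.4f}  attack success rate: {asr:.4f}")
```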
In a backdoor attack, an attacker plants hidden triggers within a model during training, leading it to misclassify or alter outputs when specific triggers are present in the input. The Gaussian blur trigger used in this demonstration subtly distorts input images in a way that is imperceptible to humans but highly effective in altering model behavior. When these "blurred" images are introduced during training, the model learns to associate them with incorrect labels or attacker-specified outputs.
Common backdoor techniques include, among others:
- overlaying small patches on images (a minimal patch-trigger sketch follows this list)
- adding patterns like pixel triggers
- blending imperceptible noise into the dataset
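For contrast with the blur trigger, here is a minimal sketch of the patch/pixel-trigger idea from the list above: a small white square stamped in one corner of each poisoned image, with the label flipped to the attacker's target class. The patch size, location, and target class here are illustrative assumptions.

```python
import torch

def add_patch_trigger(images, patch_size=3, value=1.0):
    """Stamp a small white square in the bottom-right corner of each image.

    `images` is expected to be a float tensor of shape (N, 1, 28, 28)
    with values in [0, 1], as in the MNIST sketches above.
    """
    triggered = images.clone()
    triggered[:, :, -patch_size:, -patch_size:] = value
    return triggered

# Example: poison a batch by stamping the patch and relabelling to class 0
batch = torch.rand(8, 1, 28, 28)                     # stand-in for real MNIST images
poisoned = add_patch_trigger(batch)
poisoned_labels = torch.zeros(8, dtype=torch.long)   # attacker's target class (assumed)
```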
Using data poisoning with Gaussian blur triggers as a backdoor attack on the MNIST dataset, I train a neural network on the poisoned dataset and observe how the attack success rate increases after hyperparameter tuning.
Through two tests, I successfully increased the attack success rate from 44.00% to 98.86%. The poisoned samples are presented at the end of this section. For more testing details, please refer to the code file.
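For context on how this metric is typically measured, the sketch below computes the attack success rate as the fraction of non-target-class test images that the model classifies as the target label once the blur trigger is applied. The trigger parameters and target label are the same illustrative assumptions as in the earlier snippets, and the notebook's exact evaluation code may differ.

```python
import torch
from torchvision import transforms

@torch.no_grad()
def attack_success_rate(model, test_x, test_y, target_label=0,
                        kernel_size=5, sigma=1.5, batch=1000):
    """Fraction of non-target test images classified as the target label
    after the Gaussian blur trigger is applied."""
    blur = transforms.GaussianBlur(kernel_size=kernel_size, sigma=sigma)
    x = test_x[test_y != target_label]   # skip images already of the target class
    hits = 0
    for i in range(0, len(x), batch):
        preds = model(blur(x[i:i + batch])).argmax(dim=1)
        hits += (preds == target_label).sum().item()
    return hits / len(x)
```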