This repository is the official implementation of A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1.
Illustration of our proposed framework. Our method is based on two components: Local-to-Global or Local-to-Local Matching (LM) and Model Ensemble (ENS). LM is the core of our approach: it refines the local semantics of the perturbation. ENS avoids over-reliance on a single model's embedding similarity, which improves attack transferability.
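In spirit, one iteration of such an attack looks like the following minimal PGD-style sketch (illustrative PyTorch only; all names and details here are assumptions, not the repository's actual implementation):

```python
# A minimal PGD-style sketch of the LM + ENS idea. `encoders` is assumed to be any
# list of surrogate image encoders (e.g., CLIP vision towers) mapping image tensors
# to embeddings; none of this mirrors the repository's actual API.
import torch
import torch.nn.functional as F
import torchvision.transforms as T

def attack_step(x_adv, x_orig, x_tgt, encoders, alpha=1/255, epsilon=16/255, n_crops=4):
    crop = T.RandomResizedCrop(224, scale=(0.5, 1.0))
    x_adv = x_adv.detach().requires_grad_(True)
    loss = torch.zeros((), device=x_adv.device)
    for _ in range(n_crops):                 # LM: match local crops to the target image
        local = crop(x_adv)
        for enc in encoders:                 # ENS: average similarity over surrogates
            e_adv = F.normalize(enc(local), dim=-1)
            e_tgt = F.normalize(enc(x_tgt), dim=-1)
            loss = loss + (e_adv * e_tgt).sum(-1).mean()
    loss.backward()
    with torch.no_grad():                    # ascend similarity, stay in the L-inf ball
        x_adv = x_adv + alpha * x_adv.grad.sign()
        x_adv = x_orig + (x_adv - x_orig).clamp(-epsilon, epsilon)
    return x_adv.clamp(0, 1)
```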
Dependencies: To install requirements:

```bash
pip install -r requirements.txt
wandb login
```
or run the following commands to install up-to-date libraries:
```bash
conda create -n mattack python=3.10
conda activate mattack
pip install hydra-core
pip install salesforce-lavis
pip install -U transformers
pip install gdown
pip install wandb
pip install pytorch-lightning
pip install opencv-python
pip install --upgrade opencv-contrib-python
pip install -q -U google-genai
pip install anthropic
pip install scipy
pip install nltk
pip install timm==1.0.13
pip install openai
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
wandb login
```
Note: you might need to register a Weights & Biases account, then fill in `wandb.entity` in `config/ensemble_3models.yaml`.
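For reference, the corresponding block in the config should look roughly like this (the exact layout in `config/ensemble_3models.yaml` may differ; only the `wandb.entity` value matters here):

```yaml
wandb:
  entity: your_wandb_username  # your Weights & Biases entity (user or team name)
```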
Images: We have already included the dataset used in our paper, located in `resources/images`:

- `resources/images/bigscale/nips17` for clean images
- `resources/images/target_images/1` for target images
- `resources/images/target_images/1/keywords.json` for labeled semantic keywords

We also provide 1000 images used to scale up for better statistical stability, located in `resources/images/bigscale_1000/` and `resources/images/target_images_1000/`, respectively.
API Keys: You need to register API keys for the OpenAI, Anthropic, and Google APIs for evaluation. Then, create `api_keys.yaml` under the root directory following this template:
```yaml
# API Keys for different models
# DO NOT commit this file to git!
gpt4v: "your_openai_api_key"
claude: "your_anthropic_api_key"
gemini: "your_google_api_key"
gpt4o: "your_openai_api_key"
```
Note: DO NOT LEAK YOUR API KEYS!
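If you need these keys in your own scripts, they can be read along these lines (a minimal sketch assuming PyYAML is installed; the repository's own loading code may differ):

```python
# Minimal sketch of reading api_keys.yaml from the repo root.
import yaml

with open("api_keys.yaml") as f:
    api_keys = yaml.safe_load(f)

openai_key = api_keys["gpt4o"]      # the same OpenAI key serves the gpt4v/gpt4o entries
anthropic_key = api_keys["claude"]
google_key = api_keys["gemini"]
```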
To run the full pipeline:

```bash
# Step 1: generate adversarial samples
python generate_adversarial_samples.py
# Step 2a: generate descriptions on the target black-box models
python blackbox_text_generation.py -m blackbox.model_name=gpt4o,claude,gemini
# Step 2b: evaluate GPTScore-based ASR
python gpt_evaluate.py -m blackbox.model_name=gpt4o,claude,gemini
# Step 2c: evaluate KMRScore
python keyword_matching_gpt.py -m blackbox.model_name=gpt4o,claude,gemini
```
Then you can find the corresponding results in wandb. Below are detailed instructions for each step. We also provide our generated adversarial samples on Hugging Face.
Step 1: Generate adversarial samples.

```bash
python generate_adversarial_samples.py
```
The config is managed by Hydra. To change the config, either edit `config/ensemble_3models.yaml` directly or use a command-line override. For example, to scale up to 1000 images, change `data.cle_data_path` and `data.tgt_data_path`:

```bash
python generate_adversarial_samples.py data.cle_data_path=resources/images/bigscale_1000 data.tgt_data_path=resources/images/target_images_1000
```
The same applies to other hyperparameters, e.g., the step size and perturbation budget:

```bash
python generate_adversarial_samples.py optim.alpha=0.5 optim.epsilon=16
```
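For context, a Hydra-managed script resolves such overrides roughly as follows (a minimal sketch, not the repository's actual entry point; only `config_path`/`config_name` mirror this repo's layout):

```python
# Minimal Hydra entry-point sketch; the function body is illustrative only.
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="config", config_name="ensemble_3models", version_base=None)
def main(cfg: DictConfig) -> None:
    # Command-line overrides like optim.alpha=0.5 are merged into cfg before this runs.
    print(cfg.optim.alpha, cfg.optim.epsilon)

if __name__ == "__main__":
    main()
```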
Step 2: Evaluation. The evaluation is separated into two parts:

- generating descriptions for clean and adversarial images on the target black-box commercial models
- evaluating KMRScore or GPTScore-based ASR
For the first part, run:

```bash
python blackbox_text_generation.py -m blackbox.model_name=gpt4o,claude,gemini {CONFIG IN STEP 1}
```
The `-m blackbox.model_name=gpt4o,claude,gemini` part launches a Hydra multi-run, which sweeps over the comma-separated model names and generates descriptions with each black-box commercial model in turn.
Note: `{CONFIG IN STEP 1}` means using the same config as in Step 1. In Step 1 we create a hash of the config and use it as the unique folder name under which the generated images and descriptions are saved. Thus, in Step 2, to evaluate the correct images and descriptions, you need to pass the same config.
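To illustrate the idea (the hashing scheme below is an assumption for illustration, not necessarily the repository's exact implementation):

```python
# Illustrative only: a deterministic hash of the resolved config used as a run folder.
import hashlib
import json

def config_hash(cfg: dict) -> str:
    # Sort keys so semantically identical configs produce identical hashes.
    blob = json.dumps(cfg, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:10]

run_dir = f"outputs/{config_hash({'optim': {'alpha': 0.5, 'epsilon': 16}})}"
print(run_dir)  # same config -> same folder, so Step 2 finds Step 1's outputs
```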
For the second part, run:

```bash
python gpt_evaluate.py -m blackbox.model_name=gpt4o,claude,gemini {CONFIG IN STEP 1}
python keyword_matching_gpt.py -m blackbox.model_name=gpt4o,claude,gemini {CONFIG IN STEP 1}
```
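For intuition, the keyword matching rate underlying KMRScore can be computed along these lines (an illustrative sketch only; `keyword_matching_gpt.py` implements the actual metric):

```python
# Illustrative keyword matching rate: fraction of the target's labeled keywords
# that appear in the black-box model's description of the adversarial image.
def keyword_match_rate(description: str, keywords: list[str]) -> float:
    text = description.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords) if keywords else 0.0

print(keyword_match_rate("A red sports car on a road", ["car", "road", "red"]))  # 1.0
```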
For imperceptibility metrics, run:

```bash
python evaluation_metrics.py {CONFIG IN STEP 1}
```
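Typical imperceptibility measurements compare the clean image against its adversarial counterpart, for example (a NumPy sketch assuming 8-bit images; the metric set reported by `evaluation_metrics.py` may differ):

```python
# Sketch of two common imperceptibility measurements; metric choice is illustrative.
import numpy as np

def linf(clean: np.ndarray, adv: np.ndarray) -> float:
    # Maximum absolute per-pixel change (bounded by the epsilon budget).
    return float(np.abs(adv.astype(np.float64) - clean.astype(np.float64)).max())

def psnr(clean: np.ndarray, adv: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((adv.astype(np.float64) - clean.astype(np.float64)) ** 2)
    return float(10 * np.log10(max_val ** 2 / mse)) if mse > 0 else float("inf")
```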
Our model achieves the following performance on the target black-box commercial models (ε is the perturbation budget; each table corresponds to one target model, and the metric columns report the KMRScore and GPTScore-based ASR metrics described above):

| ε  |      |      |      |      |
|----|------|------|------|------|
| 4  | 0.30 | 0.16 | 0.13 | 0.26 |
| 8  | 0.74 | 0.50 | 0.12 | 0.82 |
| 16 | 0.82 | 0.54 | 0.13 | 0.95 |

| ε  |      |      |      |      |
|----|------|------|------|------|
| 4  | 0.05 | 0.02 | 0.02 | 0.05 |
| 8  | 0.22 | 0.08 | 0.06 | 0.22 |
| 16 | 0.31 | 0.18 | 0.03 | 0.29 |

| ε  |      |      |      |      |
|----|------|------|------|------|
| 4  | 0.20 | 0.11 | 0.10 | 0.11 |
| 8  | 0.46 | 0.23 | 0.08 | 0.46 |
| 16 | 0.75 | 0.53 | 0.11 | 0.78 |
We also compare our method with other state-of-the-art methods on the target black-box commercial models, as presented in the following table.
We provide visualizations of the perturbations and adversarial samples generated by different methods and by ours.
Citation:

```bibtex
@article{li2025mattack,
  title={A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1},
  author={Zhaoyi Li and Xiaohan Zhao and Dong-Dong Wu and Jiacheng Cui and Zhiqiang Shen},
  journal={arXiv preprint arXiv:2503.10635},
  year={2025}
}
```