SWE-bench Verified is a human-validated subset of 500 samples from the original SWE-bench dataset. It is designed to test AI models' ability to solve real-world software issues automatically. This dataset was created in collaboration with OpenAI as part of their Preparedness Framework.
- Size: 500 samples
- Source: Derived from real GitHub issues and pull requests
- Languages: Python codebases; the issue text is primarily English but is not filtered by language
- Validation: Human-verified for solvability and clarity
- Evaluation: Unit tests are run against the patched repository, with post-PR behavior serving as the reference
Each sample in the dataset typically includes:
- instance_id: (str) A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
- patch: (str) The gold patch, generated by the PR (minus test-related code), that resolved the issue.
- repo: (str) The repository owner/name identifier from GitHub.
- base_commit: (str) The commit hash of the repository representing the HEAD before the solution PR is applied.
- hints_text: (str) Comments made on the issue prior to the creation of the solution PR's first commit.
- created_at: (str) The creation date of the pull request.
- test_patch: (str) A test-file patch that was contributed by the solution PR.
- problem_statement: (str) The issue title and body.
- version: (str) Installation version to use for running evaluation.
- environment_setup_commit: (str) Commit hash to use for environment setup and installation.
- FAIL_TO_PASS: (str) A JSON list of strings representing the set of tests resolved by the PR and tied to the issue resolution.
- PASS_TO_PASS: (str) A JSON list of strings representing tests that should pass before and after the PR application.
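Because FAIL_TO_PASS and PASS_TO_PASS are stored as JSON-encoded strings, they need to be decoded before use. The sketch below decodes them and applies the commonly described resolution criterion: every listed test passes once a candidate patch is applied. The sample record, the test names, and the `results` mapping are made up for illustration, and the helper names are hypothetical; the official evaluation harness does its own log parsing and reporting.

```python
import json


def decode_tests(sample):
    """Return (fail_to_pass, pass_to_pass) as Python lists of test names."""
    def as_list(value):
        # The fields are documented as JSON strings; also accept a list
        # in case a loader has already decoded them.
        return json.loads(value) if isinstance(value, str) else list(value)

    return as_list(sample["FAIL_TO_PASS"]), as_list(sample["PASS_TO_PASS"])


def is_resolved(sample, results):
    """Check a candidate patch against the sample's test lists.

    `results` is a hypothetical mapping of test name -> True if the test
    passed after the patch was applied. Tests missing from the mapping
    are treated as failures.
    """
    fail_to_pass, pass_to_pass = decode_tests(sample)
    return all(results.get(name, False) for name in fail_to_pass + pass_to_pass)


# Illustrative record only: these test names are not taken from the dataset.
sample = {
    "FAIL_TO_PASS": '["tests/test_a.py::test_fix"]',
    "PASS_TO_PASS": '["tests/test_a.py::test_old", "tests/test_b.py::test_other"]',
}

results_ok = {
    "tests/test_a.py::test_fix": True,
    "tests/test_a.py::test_old": True,
    "tests/test_b.py::test_other": True,
}
# Same run, but one previously passing test now fails (a regression).
results_regressed = dict(results_ok, **{"tests/test_a.py::test_old": False})

print(is_resolved(sample, results_ok))         # True
print(is_resolved(sample, results_regressed))  # False
```

Treating a missing test as a failure is a conservative choice made for this sketch, so that a run which crashes before producing results does not count as a success.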
To access the SWE-bench Verified dataset:
from datasets import load_dataset
dataset = load_dataset("princeton-nlp/SWE-bench_Verified")
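Once loaded, each record can be handled as a plain dictionary keyed by the field names above. The following is a minimal sketch that counts samples per repository and splits an instance_id into its parts. It runs on a small made-up list so that it works offline; the helper names and the example records are hypothetical, and the split name in the final comment is an assumption that should be checked against the dataset page.

```python
from collections import Counter


def split_instance_id(instance_id):
    """Split 'repo_owner__repo_name-PR-number' into (owner, name, number).

    Assumes the usual format; identifiers that deviate from it will
    raise ValueError.
    """
    owner, rest = instance_id.split("__", 1)
    # The repository name may itself contain hyphens, so split from the right.
    repo_name, number = rest.rsplit("-", 1)
    return owner, repo_name, int(number)


def count_by_repo(records):
    """Count records per 'repo' field for any iterable of dict-like rows."""
    return Counter(record["repo"] for record in records)


# Made-up records for illustration; real rows carry all the fields listed above.
records = [
    {"instance_id": "owner_a__repo-x-101", "repo": "owner_a/repo-x"},
    {"instance_id": "owner_a__repo-x-205", "repo": "owner_a/repo-x"},
    {"instance_id": "owner_b__repo_y-7", "repo": "owner_b/repo_y"},
]

print(count_by_repo(records).most_common())
print(split_instance_id(records[0]["instance_id"]))

# With the real data this would look like the line below, assuming the
# samples live in a split named "test":
#   count_by_repo(dataset["test"])
```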
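Pre-formatted datasets are also available for inference under the retrieval settings described in the paper. Note that the ones listed here are built from SWE-bench Lite rather than SWE-bench Verified: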
- princeton-nlp/SWE-bench_Lite_oracle
- princeton-nlp/SWE-bench_Lite_bm25_13K
- princeton-nlp/SWE-bench_Lite_bm25_27K
SWE-bench proposes a new task: resolving an issue given a full repository and the corresponding GitHub issue. The leaderboard can be found at www.swebench.com.
When working with the dataset:
- Understand the full context provided in each sample.
- Use the provided tests to validate your solutions.
- Respect the original intent of the issue when solving.
- Study the gold solutions to understand effective problem-solving approaches.
- Respect the original authors and maintainers of the codebases represented in the dataset.
- Do not use the dataset to generate or distribute harmful code.
- Be mindful of potential biases in the dataset selection and validation process.
For more detailed information or specific questions about the dataset, refer to the official documentation or contact the dataset maintainers.