SWE-bench Verified Dataset Information

Overview

SWE-bench Verified is a human-validated subset of 500 samples from the original SWE-bench dataset. It is designed to test AI models' ability to solve real-world software issues automatically. This dataset was created in collaboration with OpenAI as part of their Preparedness Framework.

Key Features

Size: 500 samples
Source: Derived from real GitHub issues and pull requests
Languages: Source code is Python (all tasks come from Python repositories); issue text is primarily English and is not filtered by language
Validation: Human-verified for solvability and clarity
Evaluation: Unit tests are run against the patched repository, with post-PR behavior as the reference

Dataset Structure

Each sample in the dataset typically includes:

instance_id: (str) A formatted instance identifier, usually of the form repo_owner__repo_name-PR-number.
patch: (str) The gold patch, generated by the PR (minus test-related code), that resolved the issue.
repo: (str) The repository owner/name identifier from GitHub.
base_commit: (str) The commit hash of the repository representing the HEAD before the solution PR is applied.
hints_text: (str) Comments made on the issue prior to the creation of the solution PR's first commit.
created_at: (str) The creation date of the pull request.
test_patch: (str) A test-file patch that was contributed by the solution PR.
problem_statement: (str) The issue title and body.
version: (str) Installation version to use for running evaluation.
environment_setup_commit: (str) Commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) A JSON list of strings representing the set of tests resolved by the PR and tied to the issue resolution.
PASS_TO_PASS: (str) A JSON list of strings representing tests that should pass before and after the PR application.
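
Because FAIL_TO_PASS and PASS_TO_PASS are stored as JSON-encoded strings rather than lists, they need decoding before use. The sketch below decodes them, splits an instance_id into its parts, and applies the usual resolution check: a sample counts as resolved only when every test in both lists passes after a candidate patch is applied. The sample record, the test names, and the helper names (decode_tests, split_instance_id, is_resolved) are hypothetical and only illustrate the field layout; this is not the official evaluation harness, and the ID parsing assumes the usual repo_owner__repo_name-PR-number form.

```python
import json

# Hypothetical record shaped like a dataset row; all values are made up.
sample = {
    "instance_id": "example_owner__example_repo-123",
    "FAIL_TO_PASS": '["tests/test_api.py::test_bug_fixed"]',
    "PASS_TO_PASS": '["tests/test_api.py::test_existing", "tests/test_io.py::test_read"]',
}


def decode_tests(row, field):
    """Decode a JSON-encoded list of test names stored as a string."""
    return json.loads(row[field])


def split_instance_id(instance_id):
    """Split 'owner__name-number' into (owner, name, number).

    Assumes the usual format; IDs that deviate from it will raise ValueError.
    """
    repo_part, number = instance_id.rsplit("-", 1)
    owner, name = repo_part.split("__", 1)
    return owner, name, int(number)


def is_resolved(row, passed_tests):
    """Return True if every FAIL_TO_PASS and PASS_TO_PASS test passed."""
    required = set(decode_tests(row, "FAIL_TO_PASS")) | set(
        decode_tests(row, "PASS_TO_PASS")
    )
    return required <= set(passed_tests)


# Tests observed to pass after applying a candidate patch (made-up results).
passed = {
    "tests/test_api.py::test_bug_fixed",
    "tests/test_api.py::test_existing",
    "tests/test_io.py::test_read",
}

print(split_instance_id(sample["instance_id"]))
print(is_resolved(sample, passed))
# Losing a PASS_TO_PASS test means the patch broke existing behavior.
print(is_resolved(sample, passed - {"tests/test_io.py::test_read"}))
```

Using set inclusion keeps the check independent of test ordering, and any extra passing tests outside the two lists are ignored.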

Accessing the Dataset

To access the SWE-bench Verified dataset:

from datasets import load_dataset

# Downloads the dataset from the Hugging Face Hub (requires the `datasets` package).
dataset = load_dataset("princeton-nlp/SWE-bench_Verified")
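
Once loaded, each row can be read like a dictionary keyed by the field names listed under Dataset Structure. As a small illustration, the sketch below counts samples per repository. It runs on a few hand-written rows so that it works offline; the rows and the helper name count_by_repo are hypothetical, and the split name "test" in the final comment is an assumption to verify against the dataset page.

```python
from collections import Counter


def count_by_repo(rows):
    """Count how many samples come from each repository."""
    return Counter(row["repo"] for row in rows)


# Hand-written stand-in rows (made up) containing only the fields used here.
rows = [
    {"instance_id": "example_owner__repo_a-1", "repo": "example_owner/repo_a"},
    {"instance_id": "example_owner__repo_a-2", "repo": "example_owner/repo_a"},
    {"instance_id": "example_owner__repo_b-7", "repo": "example_owner/repo_b"},
]

for repo, n in count_by_repo(rows).most_common():
    print(repo, n)

# With the real data (needs network access and the `datasets` package),
# assuming the samples live in a split named "test":
# counts = count_by_repo(dataset["test"])
```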

Related Datasets

The following datasets provide SWE-bench Lite instances prepared for inference under the different retrieval settings described in the paper:

princeton-nlp/SWE-bench_Lite_oracle
princeton-nlp/SWE-bench_Lite_bm25_13K
princeton-nlp/SWE-bench_Lite_bm25_27K

Supported Tasks and Leaderboard

SWE-bench proposes a new task: resolving an issue given a full repository and the text of a GitHub issue. The leaderboard can be found at www.swebench.com.
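
In practice, a system's output for this task is one patch per instance. The sketch below writes such patches to a JSON Lines file. The key names (instance_id, model_name_or_path, model_patch) are believed to match the prediction format expected by the SWE-bench evaluation harness, but that is an assumption to check against the harness documentation; the instance ID, model name, patch text, and helper name write_predictions are placeholders.

```python
import json
import os
import tempfile


def write_predictions(predictions, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for pred in predictions:
            f.write(json.dumps(pred) + "\n")


# Placeholder prediction; key names are assumed, values are made up.
predictions = [
    {
        "instance_id": "example_owner__example_repo-123",
        "model_name_or_path": "my-model",
        "model_patch": "<unified diff text produced by the model>",
    }
]

path = os.path.join(tempfile.mkdtemp(), "predictions.jsonl")
write_predictions(predictions, path)

# Read the file back to confirm that each line is a standalone JSON object.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(loaded[0]["instance_id"])
```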

Best Practices

Understand the full context provided in each sample.
Use the provided tests to validate your solutions.
Respect the original intent of the issue when solving.
Study the gold solutions to understand effective problem-solving approaches.

Ethical Considerations

Respect the original authors and maintainers of the codebases represented in the dataset.
Do not use the dataset to generate or distribute harmful code.
Be mindful of potential biases in the dataset selection and validation process.

Additional Resources

SWE-bench GitHub Repository
OpenAI's SWE-bench Verified Announcement
Hugging Face Dataset Page

For more detailed information or specific questions about the dataset, refer to the official documentation or contact the dataset maintainers.
