Hotfix/arff #1388

Merged
merged 3 commits into from
Jan 25, 2025

PGijsbers
Collaborator

@PGijsbers PGijsbers commented Jan 25, 2025

This introduces a backdoor for skipping parquet file downloads through the OPENML_SKIP_PQ environment variable.
If OPENML_SKIP_PQ is set to true (case insensitive), the get_dataset and OpenMLDataset.get_data calls will not attempt to download parquet files.
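A minimal sketch of the case-insensitive check described above. The helper name `skip_parquet` and its `environ` parameter are hypothetical and only illustrate the idea; this is not necessarily how the merged code reads the variable.

```python
import os
from typing import Mapping


def skip_parquet(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when OPENML_SKIP_PQ is set to "true" in any casing.

    Hypothetical helper for illustration; the name and signature are
    assumptions, not part of the openml API.
    """
    # An unset variable falls back to "false", so parquet stays enabled.
    return environ.get("OPENML_SKIP_PQ", "false").casefold() == "true"
```

Passing the environment as a mapping keeps the check easy to exercise without touching the real process environment.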

The PR also fixes a bug where an error would be raised if the parquet file failed to download. _get_dataset_parquet_file(self) returns None if the download fails, but the call was wrapped in str(...), which turns None into the string "None". The next check was therefore always false, and the code would not fall back to arff.
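The pitfall can be reproduced in isolation. In the sketch below, `fetch_parquet`, `pick_file_buggy`, and `pick_file_fixed` are hypothetical stand-ins, and the `is None` comparison is an assumption about what the "next check" looked like; only the behaviour of `str(None)` is taken as given.

```python
from pathlib import Path
from typing import Optional


def fetch_parquet() -> Optional[Path]:
    """Stand-in for a download helper that returns None on failure."""
    return None


def pick_file_buggy() -> str:
    # str(None) is the non-empty string "None", so the check never fires.
    parquet_file = str(fetch_parquet())
    if parquet_file is None:
        return "arff"
    return parquet_file


def pick_file_fixed() -> str:
    # Check for None first, and convert to a string only on success.
    parquet_file = fetch_parquet()
    if parquet_file is None:
        return "arff"
    return str(parquet_file)
```

With a failing download, the buggy variant hands the literal string "None" to whatever reads the file next, while the fixed variant falls back to arff.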

The reason for using an environment variable:

  • It's a bit quicker to implement than a configuration option, and it makes it easier to turn it on for a single call.
  • Compared to adding function arguments, an environment variable (or configuration option) doesn't require changes to existing scripts, which makes it easier for people to start using it now and stop using it later. We can issue a warning if the environment variable remains set in a later release.
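One way to turn the variable on for a single call is to set it temporarily and restore the previous state afterwards. The context manager below is a hypothetical sketch, not part of openml, and it assumes the library reads OPENML_SKIP_PQ at call time rather than once at import.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def skip_parquet_downloads() -> Iterator[None]:
    """Temporarily set OPENML_SKIP_PQ=true (hypothetical helper)."""
    previous = os.environ.get("OPENML_SKIP_PQ")
    os.environ["OPENML_SKIP_PQ"] = "true"
    try:
        yield
    finally:
        # Restore whatever was there before, including "not set".
        if previous is None:
            os.environ.pop("OPENML_SKIP_PQ", None)
        else:
            os.environ["OPENML_SKIP_PQ"] = previous
```

A get_dataset call placed inside `with skip_parquet_downloads():` would then run with the variable set, and later calls in the same script would be unaffected.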

Contributor

@LennartPurucker LennartPurucker left a comment

LGTM with comments from Jos

@PGijsbers PGijsbers merged commit cc28b1d into develop Jan 25, 2025
2 of 13 checks passed
@PGijsbers PGijsbers deleted the hotfix/arff branch January 25, 2025 10:38

3 participants