Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep temporary folder when crawler or warc2zim fails, even if not asked for #470

Merged
merged 2 commits into from
Feb 14, 2025

Conversation

benoit74
Copy link
Collaborator

Fix #468

@benoit74 benoit74 self-assigned this Feb 13, 2025
@benoit74 benoit74 marked this pull request as ready for review February 13, 2025 13:36
@benoit74 benoit74 requested a review from rgaudin February 13, 2025 13:36
Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see cleanup being registered anywhere.

Also I find it an additional, burden on user's hand to have to manually cleanup.
At the very minimum, we should expose this new behavior in usage ; maybe in the --keep section as the sole presence of a --keep would let people think that not using it doesnt leave stuff behind.

@benoit74
Copy link
Collaborator Author

I don't see cleanup being registered anywhere.

https://github.com/openzim/zimit/pull/470/files#diff-1deb66886548313e10283b476c77ee7b31991f38265c18cdca62daea9859a8bbR457

Also I find it an additional, burden on user's hand to have to manually cleanup.

I agree, it is an additional burden. But at the same time, loosing WARC files which could have taken days to build just because warc2zim fails due to a missing configuration is a bit hard as well. Users should have seen the --keep, but you never know in advance if the run will fail or not, and passing --keep systematically means that you always have to cleanup ... not really better.

At the very minimum, we should expose this new behavior in usage ; maybe in the --keep section as the sole presence of a --keep would let people think that not using it doesnt leave stuff behind.

Agreed, documentation of this PR is insufficient.

@benoit74 benoit74 requested a review from rgaudin February 13, 2025 15:32
Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@benoit74 benoit74 merged commit 4f9085b into main Feb 14, 2025
5 checks passed
@benoit74 benoit74 deleted the keep_tmp_folder branch February 14, 2025 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Keep browsertrix temporary folder when browsertrix/warc2zim fails
2 participants