-
-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce generic Rewriter dispatching between specific rewriters. #158
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## warc2zim2 #158 +/- ##
=============================================
- Coverage 93.65% 86.52% -7.13%
=============================================
Files 10 13 +3
Lines 709 846 +137
Branches 122 147 +25
=============================================
+ Hits 664 732 +68
- Misses 29 97 +68
- Partials 16 17 +1 ☔ View full report in Codecov by Sentry. |
55b3ffe
to
16181fe
Compare
Is this a reimplementation of the JS version in wabac.js? I suppose it was not possible to source it directly. |
Yes, it is.
There is no update process. The simple things have been rewritten by chat-gpt 🙈 (and reviewed by a human). The other have been made by a human directly. We could create a script which update the rules from the wabac source but it would not be straightforward as the "action" part of the rule is a bit different. Last 4 commits are structural improvement but no functional changes. |
Is it correct to add a comment in I'm not very comfortable with the rest of the code, but "that's life" 😄 |
That's what I have in mind ; just a starting point for another maintainer in the future: knowing where it came from, being able to check if there's a conversion bug and being able to convert new ones |
Maybe also specify the last commit (from wabac.js) which has been taken into account, so that it is clear what is left to be back-ported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please change your PR to target warc2zim2
, rebase your branch and resolve merge conflict (I thought it would be easier this way, also in order to close the other PR) and add the comment we discussed explaining where this code comes from?
Edit: github already decided to target |
cf68d60
to
6cc789e
Compare
There was already a comment in the commit message, but now it is also in the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost OK, two important questions (could be tackled in a separate issue if needed) and a suggestion to fix typos.
Some js need specific patching. This is done with a set of extra rules. `ds.py` is almost a plain copy of https://github.com/webrecorder/wabac.js/blob/main/src/rewrite/dsruleset.js
It contains a base regex replacer, type declaration and helper function.
The function add a suffix only if it is non a property. Let's move this information in the function name.
6cc789e
to
c807f4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with open issues
Some js coming from specific website need to be patched to work "offline".
This patching is made using a set of specific regex to "activate" depending of path.
Sadly, there is no test because there is none in wabac.js (the source of the rules set) and
I don't know want/how to test.
We probably have more to import from wabac.js, but this PR is a fix to #149