Openlibrary.org zim file is incomplete #203

Closed
RavanJAltaie opened this issue Aug 14, 2023 · 5 comments

@RavanJAltaie

The openlibrary.org zim file in dev library is not complete.
It doesn't open anything beyond the main page.
The file link: https://dev.library.kiwix.org/viewer#openlibrary.org_2023-08/A/openlibrary.org/
The recipe is: https://farm.openzim.org/recipes/openlibrary.org

#openzim/zim-requests#326

@rgaudin
Member

rgaudin commented Aug 15, 2023

I see two issues:

  • Source website fetches images from covers.openlibrary.org, so that must be included as well.
  • Source website apparently uses absolute paths to resources, for example: <img alt="Internet Archive logo" src="/static/images/ia-logo.svg" width="160">. You can open a ticket at browsertrix-crawler about this.
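
To illustrate the absolute-path concern, here is a minimal sketch of how a browser-style resolver treats the src value quoted above. The reader base URL is hypothetical (the host reader.example and the /content/ prefix are assumptions made for illustration only), and the thread does not establish that the reader actually resolves paths this way, so treat this as a picture of the concern rather than a diagnosis.

```python
from urllib.parse import urljoin

# Hypothetical URL under which a ZIM reader might serve the captured page.
# The host and the "/content/" prefix are assumptions, not taken from the issue.
reader_base = "https://reader.example/content/openlibrary.org_2023-08/A/openlibrary.org/"

# The src value quoted in the comment above (an absolute path).
absolute_src = "/static/images/ia-logo.svg"
# The same resource written as a relative path, for comparison.
relative_src = "static/images/ia-logo.svg"

# An absolute path replaces the whole path of the base URL,
# so it escapes the prefix the content is served under.
resolved_absolute = urljoin(reader_base, absolute_src)

# A relative path is appended to the base URL's directory,
# so it stays under the content prefix.
resolved_relative = urljoin(reader_base, relative_src)

print(resolved_absolute)
print(resolved_relative)
```

If the path is not rewritten, the absolute form points at the root of the serving host instead of the captured content, which is why such paths usually need rewriting at capture or replay time.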

@kelson42
Contributor

kelson42 commented Sep 3, 2023

@rgaudin Few remarks:

  • I think we should strongly consider starting a FAQ for each of our scrapers (on the wiki of the repo, for example), so that each time we get a question with a non-obvious answer, we can document it. What do you think? Maybe that way we can reduce the number of low-quality exchanges between content managers and developers?
  • I don't think asking @RavanJAltaie to open a ticket on the upstream repository is the most efficient approach. We already see how challenging it can be to open tickets on our own repos. To me, this is the role of the developers.

@rgaudin
Member

rgaudin commented Sep 3, 2023

I agree that we should be the ones writing tickets for upstream. As for the rest, it deserves a collective discussion, which is actually awaiting your return. Glad to see there's momentum.

@kelson42
Contributor

kelson42 commented Sep 3, 2023

See openzim/cms#97 as well

@rgaudin
Member

rgaudin commented Sep 4, 2023

Taking a second look at the issue, it's actually a recipe problem:

  • The ZIM file is 1.4MB, which is a hint that not much has been captured.
  • The problem in the recipe is that it uses custom as scopeType. In this mode, you need to specify the includes and excludes completely. The only URL included automatically in this mode is the passed URL.
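
The custom-scope behaviour described above can be approximated with a small sketch. This is not the crawler's actual implementation: the function in_scope, the regex lists, and the example URLs are all hypothetical, and the rule that excludes take precedence over includes is an assumption. It only shows why a recipe with custom scope and no include patterns would capture little beyond the passed URL.

```python
import re

def in_scope(url, seed, includes, excludes):
    """Approximate a custom-scope decision (hypothetical helper).

    The passed (seed) URL is always in scope. Any other URL must match
    at least one include pattern and no exclude pattern.
    """
    if url == seed:
        return True
    if any(re.search(pattern, url) for pattern in excludes):
        return False
    return any(re.search(pattern, url) for pattern in includes)

seed = "https://openlibrary.org/"
# Hypothetical book page linked from the main page.
book_page = "https://openlibrary.org/works/OL1W"

# With custom scope and no includes, only the passed URL is kept.
print(in_scope(seed, seed, [], []))       # True
print(in_scope(book_page, seed, [], []))  # False

# Illustrative patterns only; a real recipe would need its own.
includes = [r"^https?://openlibrary\.org/"]
excludes = [r"/account/"]
print(in_scope(book_page, seed, includes, excludes))  # True
print(in_scope("https://openlibrary.org/account/login", seed, includes, excludes))  # False
```

Under these assumptions, the main page is captured but every link on it is rejected until an include pattern covers it, which matches the symptom of a ZIM that opens nothing beyond the main page.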
