5.0.0rc1
Pre-release
This is a major release with a lot of breaking changes but most changes are easy to fix.
It focuses on type safety with the introduction of runtime checks: any call to the zimscraperlib API must match the type definition or an exception will be raised.
Documentation is available as docstrings and on https://python-scraperlib.readthedocs.io
Main changes include:
- ZIM metadata handling has completely changed with new types for each kind of metadata.
- `i18n` module has been redesigned around a single main class `Language`
- New `rewriting` module for HTML/CSS/JS (that one being done at runtime via Wombat)
- Now supporting only Python 3.12
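To give a feel for what the runtime checks mentioned above imply for callers, here is a minimal sketch using only the standard library. It is not how zimscraperlib or beartype implement the checks: `checked` and `set_title` are hypothetical names made up for this example, and the decorator only handles plain-class annotations.

```python
import functools
import inspect
from typing import get_type_hints


def checked(func):
    """Raise TypeError when an argument does not match a plain-class annotation.

    Simplified stand-in for a runtime type checker (assumption: only plain
    classes such as str or int are verified, generics are ignored).
    """
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Skip parameters without a plain-class annotation.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@checked
def set_title(title: str, max_length: int = 30) -> str:
    # Hypothetical function, not part of zimscraperlib.
    return title[:max_length]
```

With such a check in place, `set_title("Hello world", 5)` works as before, while `set_title(123)` fails immediately at the call site instead of somewhere deeper in the code.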
Added
- Documentation using `mkdocs`, published on readthedocs.com (#92)
- `rewriting` module to rewrite URLs in content for generic scrapers
  - `rewriting.css` to rewrite URLs in CSS files
  - `rewriting.html` to rewrite URLs in HTML files
  - `rewriting.js` to rewrite URLs in JS files (at runtime, using `wombat`)
  - `wombat-setup` javascript module in `javascript/`
- `typing` module with custom types:
  - `Callback` to use where we expect callbacks
  - `SupportsWrite`, `SupportsRead`, `SupportsSeeking`, `SupportsSeekableRead` and `SupportsSeekableWrite`: protocols for IO type annotations
- `zim.metadata` module with a type-based approach for each kind of metadata and helpers for custom ones
  - [`zim.metadata`] `APPLY_RECOMMENDATIONS`: general flag to toggle openZIM-recommended constraints
  - [`zim.metadata`] Type-based classes: `Metadata`, `TextBasedMetadata`, `TextListBasedMetadata`, `DateBasedMetadata`, `IllustrationBasedMetadata`
  - [`zim.metadata`] Usage-based classes: `NameMetadata`, `LanguageMetadata`, `DefaultIllustrationMetadata`, etc.
  - [`zim.metadata`] `StandardMetadataList` to package the standard metadata
  - See details for additional API endpoints and variables
- [`constants`] `DEFAULT_WEB_REQUESTS_TIMEOUT` exposed for `download` module
- [`download`] `stream_file()` now accepts `timeout: int` param (defaults to constant timeout) (#222)
- [`filesystem`] `path_from` context manager to acquire a pathlib `Path` from `Path` or `TemporaryDirectory`
- [`i18n`] `Language`, `get_language()` and `get_language_or_none()`. See breaking changes
- [`image.optimization`] `OptimizePngOptions` dataclass to store PNG options
- [`image.optimization`] `OptimizeJpgOptions` dataclass to store JPEG options
- [`image.optimization`] `OptimizeWebpOptions` dataclass to store WebP options
- [`image.optimization`] `OptimizeGifOptions` dataclass to store GIF options
- [`image.optimization`] `OptimizeOptions` dataclass to store cross-format options
- [`inputs`] `unique_values()` to deduplicate a list while preserving order
- [`logging`] `DEFAULT_FORMAT_WITH_THREADS` as many scrapers use threads
- [`video.encoding`] `reencode()`'s `existing_tmp_path` param
- [`zim.filesystem`] `validate_folder_writable()` to ensure one can write into a folder (#200)
- [`zim.creator`] `Creator._get_first_language_metadata_value()` to retrieve first language from metadata
- [`zim.items`] `no_indexing_indexdata()` to get an IndexData that disables indexing
- [`zim.items`] `URLItem.get_mimetype()` now only returning `str`
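As an illustration of what "deduplicate a list while preserving order" means for `unique_values()`, the following sketch shows one common way to get that behaviour in plain Python. It is an assumption about the behaviour only, not the library's implementation; `dedupe_preserving_order` is a hypothetical name and the sketch requires hashable items.

```python
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")


def dedupe_preserving_order(items: Iterable[T]) -> list[T]:
    """Return items without duplicates, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order, so the first
    # occurrence of every value decides its position in the result.
    return list(dict.fromkeys(items))


languages = dedupe_preserving_order(["eng", "fra", "eng", "deu", "fra"])
```

A `set()` would also remove the duplicates but would lose the original ordering, which matters for things like a list of languages where the first entry is the main one.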
Changed (Breaking)
- Entire API is now type-protected using beartype. Any call to scraperlib that doesn't satisfy the annotated types will raise an exception
- [`constants`] `MANDATORY_ZIM_METADATA_KEYS` and `DEFAULT_DEV_ZIM_METADATA` moved to `zim/metadata`
- [`download`] `YoutubeDownloader.download`'s `options` parameter now expects a `dict[str, Any]` instead of `dict`
- [`download`] `YoutubeConfig` options now limited to `str | bool | int | None`
- [`download`] `_get_retry_adapter()` now exposed as `get_retry_adapter()`
- [`download`] `stream_file`'s `byte_stream` param now more flexible, accepting `SupportsWrite[bytes] | SupportsSeekableWrite[bytes]`
- [`download`] `stream_file`'s `proxies` param now accepting `dict[str, str]` instead of `dict`
- [`filesystem`] `delete_callback()` is now a simple callback accepting an `fpath` and deleting it (doesn't chain other callbacks anymore).
- [`filesystem`] `delete_callback()` doesn't fail on missing file (#192)
- [`i18n`] Redesigned API around a single object: `Language`, which is inited with any acceptable code. Raises `NotFoundError` on 639-3 matching failure
  - `find_language_names()` is retained but only accepts a `query: str`
  - added `get_language()` and `get_language_or_none()` as shortcuts around `Language`
  - `is_valid_iso_639_3()` is retained
- [`image.conversion`] `convert_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src` and `dst`.
- [`image.conversion`] `convert_svg2png()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src` and `dst`.
- [`image.optimization`] `optimize_png()` now accepts `options: OptimizePngOptions` instead of individual params.
- [`image.optimization`] `optimize_jpeg()` now accepts `options: OptimizeJpgOptions` instead of individual params.
- [`image.optimization`] `optimize_webp()` now accepts `options: OptimizeWebpOptions` instead of individual params.
- [`image.optimization`] `optimize_gif()` now accepts `options: OptimizeGifOptions` instead of individual params.
- [`image.presets`] All presets now use the new options dataclass instead of ClassVar dict
- [`image.probing`] `format_for()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src`.
- [`image.probing`] `is_valid_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `image`.
- [`image.utils`] `save_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `dst`.
- [`video.config`] `Config` was mostly not using type annotations.
- [`video.config`] `Config` options only expecting `str | None`
- [`video.presets`] All options only expecting `str | None`
- [`video.encoding`] `reencode()` now always returning a `tuple[bool, CompletedProcess]`
- [`zim._libkiwix`] `MimetypeAndCounter` now expects specific types for `mimetype: str` and `value: int`
- [`zim.filesystem`] `make_zim_file()` `publisher` param now properly expects a `str`
- [`zim.filesystem`] `IncorrectZIMPathError` renamed to `IncorrectPathError`
- [`zim.filesystem`] `MissingZIMFolderError` renamed to `MissingFolderError`
- [`zim.filesystem`] `NotADirectoryZIMFolderError` renamed to `NotADirectoryFolderError`
- [`zim.filesystem`] `NotWritableZIMFolderError` renamed to `NotWritableFolderError`
- [`zim.filesystem`] `IncorrectZIMFilenameError` renamed to `IncorrectFilenameError`
- [`zim.filesystem`] `validate_zimfile_creatable()` renamed to `validate_file_creatable()`
- [`zim.items`] `Item` and `StaticItem` now expecting `hints` as `dict[libzim.writer.Hint, int]` instead of `dict`
- [`zim.items`] `Item.get_hints()` now returning `dict[libzim.writer.Hint, int]` instead of `dict`
- [`zim.items`] `URLItem.download_for_size()` now specifying type annotations and reordered params
- [`zim.providers`] `FileLikeProvider.gen_blob()` and `URLProvider.gen_blob()` now properly annotate return type (`Generator[libzim.writer.Blob, None, None]`)
- [`zim.providers`] `URLProvider.get_size_of()` param `url` now explicitly expects a `str`
- [`zim.creator`] `Creator.config_metadata()` signature changed, now mainly accepting a `StandardMetadataList`
- [`zim.creator`] `Creator.config_dev_metadata()` signature changed to accept new metadata types
- [`zim.creator`] `Creator.add_item_for()`'s `callback` renamed to `callbacks` and accepting `Callback`
- [`zim.creator`] `Creator.add_item()`'s `callback` renamed to `callbacks` and accepting `Callback`
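Several of the changes above replace loose annotations with protocols such as `SupportsWrite[bytes]`. The sketch below shows the general idea of a structural "anything with a `write()` method" type using `typing.Protocol`. The protocol `WritableSketch` and the helper `copy_chunks` are hypothetical stand-ins written for this example; they are not the definitions shipped in the `typing` module of the library.

```python
import io
from collections.abc import Iterable
from typing import Protocol, TypeVar

T_contra = TypeVar("T_contra", contravariant=True)


class WritableSketch(Protocol[T_contra]):
    """Structural type: any object exposing a compatible write() method."""

    def write(self, data: T_contra, /) -> int: ...


def copy_chunks(chunks: Iterable[bytes], byte_stream: WritableSketch[bytes]) -> int:
    """Write every chunk to byte_stream and return the number of bytes written."""
    total = 0
    for chunk in chunks:
        total += byte_stream.write(chunk)
    return total


# io.BytesIO satisfies the protocol without inheriting from it.
buffer = io.BytesIO()
written = copy_chunks([b"ab", b"cde"], buffer)
```

The benefit of this style is that callers can pass an in-memory buffer, an open binary file, or any custom object with a matching `write()` method, while a type checker can still reject objects that cannot be written to.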
Changed
- [deps] `iso639-lang` now requires at least v2.4.0
- [`download`] `stream_file()` now returns `tuple[int, requests.structures.CaseInsensitiveDict[str]]` instead of `tuple[int, requests.structures.CaseInsensitiveDict]`
- [`download`] `stream_file()` now accepts both `fpath` and `byte_stream` params (writes to both)
- [`image.utils`] `save_image()` now accepts `Any` `**params`.
- [`zim.archive`] `Archive.counters` now returning `CounterMap` (compatible with previous `dict[str, int]`)
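The "writes to both" behaviour noted for `stream_file()` can be pictured with the sketch below, which sends every chunk to a file on disk and to an in-memory stream in a single pass. This is only an illustration under assumptions: `write_to_both` is a hypothetical helper, it takes an iterable of chunks rather than a URL, and it does not reflect the actual signature or implementation of `stream_file()`.

```python
import io
import pathlib
import tempfile
from collections.abc import Iterable


def write_to_both(
    chunks: Iterable[bytes], fpath: pathlib.Path, byte_stream: io.BytesIO
) -> int:
    """Write every chunk to the file at fpath and to byte_stream.

    Returns the total number of bytes seen. Hypothetical helper, not part
    of zimscraperlib.
    """
    total = 0
    with fpath.open("wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
            byte_stream.write(chunk)
            total += len(chunk)
    return total


with tempfile.TemporaryDirectory() as tmp_dir:
    target = pathlib.Path(tmp_dir) / "payload.bin"
    buffer = io.BytesIO()
    written = write_to_both([b"abc", b"de"], target, buffer)
    on_disk = target.read_bytes()
```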
Fixed
- Direct dependencies now properly referenced: pillow, urllib3, piexif, idna (#226)
- [`download`] `YoutubeDownloader.download` now respects its return type (`bool | Future[Any]`)
- [`image.conversion`] `convert_image()` `**params` properly declared as accepting `None`.
- [`logging`] `getLogger()`'s `console` now properly accepting `TextIO | io.StringIO | None`
- [`video.probing`] `get_media_info()` type annotation for `src_path`
- [`zim.archive`] `Archive.get_item()` return type (`libzim.reader.Item`)
Removed
- Support for Python 3.8/3.9/3.10/3.11. Only Python 3.12 is supported now.
- [`i18n`] `Lang` (See breaking changes)
- [`i18n`] `get_iso_lang_data()` (See breaking changes)
- [`i18n`] `update_with_macro()` (See breaking changes)
- [`i18n`] `get_language_details()` (See breaking changes)
- [`uri`] `rebuild_uri` `failsafe` param (was only handling incorrect types)
- [`video.encoding`] `reencode()`'s `with_process` param
- [`zim.creator`] `Creator.validate_metadata()`
- [`zim.creator`] `Creator.convert_and_check_metadata()`