chore(deps): update dependency charset-normalizer to v3 #284

Closed
wants to merge 1 commit into from

@renovate renovate bot commented Oct 17, 2023

Mend Renovate

This PR contains the following updates:

Package: charset-normalizer
Change: ==2.0.6 -> ==3.3.2

Release Notes

Ousret/charset_normalizer (charset-normalizer)

v3.3.2

Compare Source

Fixed
  • Unintentional memory usage regression when using large payloads that match several encodings (#376)
  • Regression on some detection cases showcased in the documentation (#371)
Added
  • Noise (md) probe that identifies malformed Arabic representation due to the presence of letters in isolated form (credit to my wife)

v3.3.1

Compare Source

Changed
  • Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
  • Improved the general detection reliability based on reports from the community

v3.3.0

Compare Source

Added
  • Allow executing the CLI (e.g. normalizer) through python -m charset_normalizer.cli or python -m charset_normalizer
  • Support for 9 forgotten encodings that are supported by Python but unlisted in encoding.aliases as they have no alias (#323)
Removed
  • (internal) Redundant utils.is_ascii function and unused function is_private_use_only
  • (internal) charset_normalizer.assets is moved inside charset_normalizer.constant
Changed
  • (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
  • Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.7
Fixed
  • Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in __lt__ (#​350)

v3.2.0

Compare Source

Changed
  • Typehint for function from_path no longer enforces PathLike as its first argument
  • Minor improvement over the global detection reliability
Added
  • Introduce function is_binary that relies on the main capabilities and is optimized to detect binaries
  • Propagate the enable_fallback argument throughout from_bytes, from_path, and from_fp to allow deeper control over the detection (default True)
  • Explicit support for Python 3.12
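
For code that has to run on both sides of this version bump, the new is_binary helper can be imported defensively. This is a minimal sketch, assuming is_binary accepts a bytes payload and returns a bool; the looks_binary wrapper and its NUL-byte fallback are hypothetical and not part of charset-normalizer.

```python
try:
    # Per the notes above, is_binary only exists from charset-normalizer 3.2.0 onward.
    from charset_normalizer import is_binary as _cn_is_binary
except ImportError:
    # Older pin, or the library is not installed at all.
    _cn_is_binary = None


def looks_binary(payload: bytes) -> bool:
    """Return True when the payload is probably not text."""
    if _cn_is_binary is not None:
        return bool(_cn_is_binary(payload))
    # Hypothetical fallback for this sketch only: treat any NUL byte
    # as a sign of binary data. This is much cruder than the library.
    return b"\x00" in payload
```

A caller would use looks_binary(data) before attempting to decode data; once the pin is at 3.2.0 or later, the fallback branch is never taken.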
Fixed
  • Edge case detection failure where a file would contain a 'very-long' camel-cased word (Issue #289)

v3.1.0

Compare Source

Added
  • Argument should_rename_legacy for the legacy function detect, which now disregards any new arguments without errors (PR #262)
Removed
  • Support for Python 3.6 (PR #​260)
Changed
  • Optional speedup provided by mypy/c 1.0.1

v3.0.1

Compare Source

Fixed
  • Multi-bytes cutter/chunk generator did not always cut correctly (PR #​233)
Changed
  • Speedup provided by mypy/c 0.990 on Python >= 3.7

v3.0.0

Compare Source

Added
  • Extend the capability of explain=True: when cp_isolation contains at most two entries (min one), it will log the details of the mess-detector results
  • Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
  • Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
  • normalizer --version now specifies whether the current version provides extra speedup (meaning a mypyc-compiled wheel)
Changed
  • Build with static metadata using 'build' frontend
  • Make the language detection stricter
  • Optional: Module md.py can be compiled using Mypyc to provide a speedup of up to 4x over v2.1
Fixed
  • CLI with option --normalize fails when using a full path for files
  • TooManyAccentuatedPlugin induces false positives in the mess detection when too few alpha characters have been fed to it
  • Sphinx warnings when generating the documentation
Removed
  • Coherence detector no longer returns 'Simple English'; it returns 'English' instead
  • Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
  • Breaking: Method first() and best() from CharsetMatch
  • UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
  • Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
  • Breaking: Top-level function normalize
  • Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
  • Support for the backport unicodedata2
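
Before accepting a major bump like this one, it can help to check whether a codebase still imports any of the names listed as removed. The sketch below uses only the standard library. The function name find_removed_imports is hypothetical, and it assumes the removed names were imported from the package root with `from charset_normalizer import ...`.

```python
import ast
from typing import List, Tuple

# Top-level names removed in 3.0.0, taken from the "Breaking" entries above.
REMOVED_TOP_LEVEL = {
    "normalize",
    "CharsetDetector",
    "CharsetDoctor",
    "CharsetNormalizerMatch",
    "CharsetNormalizerMatches",
}


def find_removed_imports(source: str) -> List[Tuple[int, str]]:
    """Return sorted (line, name) pairs for imports of removed names."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only absolute `from charset_normalizer import ...` statements are checked.
        if (
            isinstance(node, ast.ImportFrom)
            and node.level == 0
            and node.module == "charset_normalizer"
        ):
            for alias in node.names:
                if alias.name in REMOVED_TOP_LEVEL:
                    hits.append((node.lineno, alias.name))
    return sorted(hits)
```

This does not catch attribute access such as `charset_normalizer.normalize(...)` or the removed CharsetMatch methods and properties, so an empty result is not proof that the upgrade is safe.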

v2.1.1

Compare Source

Deprecated
  • Function normalize scheduled for removal in 3.0
Changed
  • Removed useless call to decode in fn is_unprintable (#​206)
Fixed

v2.1.0

Compare Source

Added
  • Output the Unicode table version when running the CLI with --version (PR #​194)
Changed
Fixed
  • Workaround potential bug in cpython with Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space (PR #​175)
  • CLI default threshold aligned with the API threshold from @​oleksandr-kuzmenko (PR #​181)
Removed
  • Support for Python 3.5 (PR #​192)
Deprecated
  • Use of backport unicodedata from unicodedata2 as Python is quickly catching up, scheduled for removal in 3.0 (PR #​194)

v2.0.12

Compare Source

Fixed
  • ASCII misdetection in rare cases (PR #170)

v2.0.11

Compare Source

Added
  • Explicit support for Python 3.11 (PR #​164)
Changed
  • The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)

v2.0.10

Compare Source

Fixed
  • Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)
Changed
  • Skipping the language-detection (CD) on ASCII (PR #​155)

v2.0.9

Compare Source

Changed
  • Moderating the logging impact (since 2.0.8) for specific environments (PR #​147)
Fixed
  • Wrong logging level applied when setting kwarg explain to True (PR #​146)

v2.0.8

Compare Source

Changed
  • Improvement over Vietnamese detection (PR #​126)
  • MD improvement on trailing data and long foreign (non-pure latin) data (PR #​124)
  • Efficiency improvements in cd/alphabet_languages from @​adbar (PR #​122)
  • Call sum() without an intermediary list following PEP 289 recommendations from @adbar (PR #129)
  • Code style as refactored by Sourcery-AI (PR #​131)
  • Minor adjustment on the MD around european words (PR #​133)
  • Remove and replace SRTs from assets / tests (PR #​139)
  • Initialize the library logger with a NullHandler by default from @​nmaynes (PR #​135)
  • Setting kwarg explain to True will provisionally add a specific stream handler, bounded to the function's lifespan (PR #135)
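
The two logging entries above describe a common library pattern: stay silent by default, and attach a handler only for the duration of one call. The following is a rough illustration using the standard logging module, not the library's actual code. The logger name example_library and the function run are hypothetical.

```python
import io
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("example_library")
# A NullHandler keeps the library silent unless the application configures logging.
logger.addHandler(logging.NullHandler())


def run(explain: bool = False, stream=None) -> None:
    """Emit a debug record; with explain=True, log to `stream` for this call only."""
    handler = None
    previous_level = logger.level
    if explain:
        handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    try:
        logger.debug("detection step finished")
    finally:
        # Restore the logger so the extra handler does not outlive the call.
        if handler is not None:
            logger.removeHandler(handler)
            logger.setLevel(previous_level)
```

Calling run(explain=True, stream=io.StringIO()) captures the debug record in the buffer, while run() leaves the logger exactly as it was found.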
Fixed
  • Fix large (misleading) sequence giving UnicodeDecodeError (PR #​137)
  • Avoid using chunks that are too insignificant (PR #137)
Added

v2.0.7

Compare Source

Added
  • Add support for Kazakh (Cyrillic) language detection (PR #​109)
Changed
  • Further improve inferring the language from a given single-byte code page (PR #112)
  • Vainly trying to leverage PEP263 when PEP3120 is not supported (PR #​116)
  • Refactoring for potential performance improvements in loops from @​adbar (PR #​113)
  • Various detection improvements (MD+CD) (PR #117)
Removed
  • Remove redundant logging entry about detected language(s) (PR #​115)
Fixed
  • Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #​117 #​102)

Configuration

📅 Schedule: Branch creation - "before 4am" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Enabled.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Mend Renovate. View repository job log here.

@renovate renovate bot requested a review from billsioros as a code owner October 17, 2023 01:40
@renovate renovate bot added the 🎲 dependencies Working on dependencies label Oct 17, 2023
@renovate renovate bot force-pushed the renovate/charset-normalizer-3.x branch from 301be20 to e94db87 Compare October 22, 2023 19:53

stale bot commented Oct 30, 2023

This issue has been marked stale, as it had no activity in the last 7 days. If the issue remains stale for an additional 7 days (a total of two weeks with no activity), it will be automatically closed.

@stale stale bot added the 💀 stale This had no recent activity label Oct 30, 2023
@renovate renovate bot force-pushed the renovate/charset-normalizer-3.x branch from e94db87 to cd1631d Compare November 1, 2023 04:35

stale bot commented Nov 8, 2023

Closing the issue due to inactivity.

@stale stale bot closed this Nov 8, 2023

renovate bot commented Nov 8, 2023

Renovate Ignore Notification

Because you closed this PR without merging, Renovate will ignore this update. You will not get PRs for any future 3.x releases. But if you manually upgrade to 3.x then Renovate will re-enable minor and patch updates automatically.

If you accidentally closed this PR, or if you changed your mind: rename this PR to get a fresh replacement PR.

@renovate renovate bot deleted the renovate/charset-normalizer-3.x branch November 8, 2023 09:09