
Releases: rapidfuzz/RapidFuzz

Release 3.5.0

31 Oct 11:07

Changed

  • skip pandas pd.NA values in the same way as None
  • add a score_multiplier argument to process.cdist, which multiplies the resulting scores
    by a constant factor
  • drop support for Python 3.7
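A minimal standard-library sketch of what score_multiplier does. cdist_sketch is a hypothetical helper, not process.cdist itself, and difflib.SequenceMatcher.ratio stands in for a RapidFuzz scorer. The multiplier is applied to each final score, for example to map normalized similarities in [0, 1] onto a 0-100 scale.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence


def ratio_scorer(a: str, b: str) -> float:
    """Stand-in scorer: normalized similarity in [0, 1] from difflib (not RapidFuzz)."""
    return SequenceMatcher(None, a, b).ratio()


def cdist_sketch(
    queries: Sequence[str],
    choices: Sequence[str],
    scorer: Callable[[str, str], float] = ratio_scorer,
    score_multiplier: float = 1,
) -> List[List[float]]:
    """Hypothetical cdist-like helper: score every query against every choice,
    then multiply each end result by a constant factor."""
    return [
        [scorer(query, choice) * score_multiplier for choice in choices]
        for query in queries
    ]


print(cdist_sketch(["abc"], ["abc", "xyz"], score_multiplier=100))  # [[100.0, 0.0]]
```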

Performance

  • improve performance of the SIMD implementation for LCS / Indel / Jaro / JaroWinkler
  • improve performance of Jaro and Jaro-Winkler for long sequences
  • implement process.extract with limit=1 using process.extractOne, which can be faster
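The limit=1 shortcut can be pictured as follows: a single pass that keeps only the best match (as extractOne does) avoids collecting and ranking every scored choice. This is a hypothetical pure-Python illustration, not RapidFuzz's code; the function names are made up, and difflib again stands in for a real scorer. It also skips None choices and, only if pandas happens to be installed, pd.NA, mirroring the 3.5.0 change.

```python
import heapq
from difflib import SequenceMatcher
from typing import Any, List, Optional, Tuple

try:  # pandas is optional here; it is only used to recognise pd.NA
    import pandas as pd
    _PD_NA: Any = pd.NA
except (ImportError, AttributeError):
    _PD_NA = None

Result = Tuple[Any, float, int]  # (choice, score, index)


def _is_missing(value: Any) -> bool:
    # Identity checks only: pd.NA == x returns pd.NA, whose truth value is ambiguous.
    return value is None or value is _PD_NA


def ratio_scorer(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (difflib, not RapidFuzz)."""
    return SequenceMatcher(None, a, b).ratio() * 100


def extract_one_sketch(query, choices, scorer=ratio_scorer) -> Optional[Result]:
    best = None
    for index, choice in enumerate(choices):
        if _is_missing(choice):
            continue
        score = scorer(query, choice)
        if best is None or score > best[1]:  # first best match wins on ties
            best = (choice, score, index)
    return best


def extract_sketch(query, choices, scorer=ratio_scorer, limit=5) -> List[Result]:
    if limit == 1:
        # Single pass; no list of all scored candidates is built or ranked.
        best = extract_one_sketch(query, choices, scorer)
        return [] if best is None else [best]
    scored = [
        (choice, scorer(query, choice), index)
        for index, choice in enumerate(choices)
        if not _is_missing(choice)
    ]
    return heapq.nlargest(limit, scored, key=lambda result: result[1])


print(extract_sketch("apple", ["apple", None, "apply", "banana"], limit=1))
# [('apple', 100.0, 0)]
```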

Fixed

  • fix the preprocessing function always being called through Python due to a broken C-API version check
  • fix a wraparound issue in the SIMD implementation of Jaro and Jaro-Winkler

Release 3.4.0

09 Oct 02:19

Changed

  • upgrade to Cython==3.0.3
  • add SIMD implementation for Jaro and Jaro-Winkler
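For reference, a plain scalar sketch of the Jaro and Jaro-Winkler similarities that the SIMD implementation computes. It follows the textbook definitions: a match window of max(len) // 2 - 1, half the out-of-order matches counted as transpositions, a common prefix of at most 4 characters, prefix_weight 0.1, and the prefix boost applied only above a Jaro score of 0.7. These parameter choices are assumptions based on the standard formulation, not taken from RapidFuzz's source.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Scalar Jaro similarity in [0, 1] (textbook definition)."""
    len1, len2 = len(s1), len(s2)
    if len1 == 0 and len2 == 0:
        return 1.0
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as half transpositions.
    half_transpositions = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                half_transpositions += 1
            j += 1
    transpositions = half_transpositions // 2

    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (at most 4)."""
    sim = jaro_similarity(s1, s2)
    if sim > 0.7:
        prefix = 0
        for a, b in zip(s1[:4], s2[:4]):
            if a != b:
                break
            prefix += 1
        sim += prefix * prefix_weight * (1.0 - sim)
    return sim


print(round(jaro_similarity("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))  # 0.9611
```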

Release 2.15.2

06 Oct 23:56
0058ead

Since rapidfuzz v2.x is still widely used, Python 3.12 support is backported to rapidfuzz v2.x.

Added

  • add Python 3.12 support

Release 3.3.1

24 Sep 22:15
10efe27

Added

  • add missing tag for Python 3.12 support

Release 3.3.0

13 Sep 10:50
b7c5908

Changed

  • upgrade to Cython==3.0.2
  • implement the remaining features of the C++ implementation that were still missing from the pure Python implementation

Added

  • added support for Python 3.12

Release 3.2.0

02 Aug 13:18
ece1387

Changed

  • build for x86 with SSE2/AVX2 runtime detection

Release 3.1.2

20 Jul 21:16
b23067f

Changed

  • upgrade to Cython==3.0.0

Release 3.1.1

06 Jun 12:32

Changed

  • upgrade to taskflow==3.6

Fixed

  • replace usage of isnan with std::isnan which fixes the build on NetBSD

Release 3.1.0

02 Jun 21:15

Changed

  • added keyword argument pad to the Hamming distance. It controls whether sequences of different
    lengths are padded or raise a ValueError
  • improve consistency of exception messages between the C++ and pure Python implementation
  • upgrade required Cython version to Cython==3.0.0b3
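The pad behaviour can be illustrated with a small pure-Python sketch (hamming_distance is a hypothetical helper, not rapidfuzz.distance.Hamming itself). Following the 3.0.0 note that length differences are handled as insertions / deletions, each surplus element of the longer sequence adds 1 to the distance; with pad=False a length mismatch raises ValueError instead. Defaulting pad to True is an assumption of this sketch.

```python
from typing import Hashable, Sequence


def hamming_distance(
    s1: Sequence[Hashable], s2: Sequence[Hashable], *, pad: bool = True
) -> int:
    """Count differing positions; with pad=True surplus length counts as extra edits."""
    if len(s1) != len(s2) and not pad:
        raise ValueError("sequences have different lengths and pad=False")
    mismatches = sum(1 for a, b in zip(s1, s2) if a != b)
    return mismatches + abs(len(s1) - len(s2))


print(hamming_distance("karolin", "kathrin"))  # 3
print(hamming_distance("abc", "abcde"))        # 2
```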

Fixed

  • fix missing GIL restore when an exception is thrown inside process.cdist
  • fix incorrect type hints for the process module

Release 3.0.0

17 Apr 00:27

Changed

  • allow Hamming to be used with strings of different lengths. Length differences are handled as
    insertions / deletions

  • remove support for passing a boolean as the processor argument in rapidfuzz.fuzz and rapidfuzz.process.
    The processor argument is now always a callable or None.

  • update the default of the processor argument to None everywhere. For the affected functions this can
    change results, since strings are no longer preprocessed. To restore the old behaviour, pass
    processor=utils.default_process to these functions. The following functions are affected:

    • process.extract, process.extract_iter, process.extractOne
    • fuzz.token_sort_ratio, fuzz.token_set_ratio, fuzz.token_ratio, fuzz.partial_token_sort_ratio, fuzz.partial_token_set_ratio, fuzz.partial_token_ratio, fuzz.WRatio, fuzz.QRatio
  • rapidfuzz.process no longer calls scorers with processor=None, so user-provided scorers no longer
    need to accept this argument.

  • remove the option to pass keyword arguments to the scorer via **kwargs in rapidfuzz.process. They are
    now passed via the scorer_kwargs argument instead, which avoids breakage when function parameters are
    extended and prevents naming clashes.

  • remove the rapidfuzz.string_metric module. Replacements for all of its functions are available in rapidfuzz.distance.
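The processor and scorer_kwargs changes can be sketched in plain Python. All names here are hypothetical: default_process_sketch only approximates utils.default_process (non-alphanumeric characters replaced by spaces, then trimmed and lowercased), ratio_sketch uses difflib rather than a RapidFuzz scorer, and score_pair stands in for the process functions. The sketch shows that with processor=None the raw strings are compared, and that scorer options travel in an explicit dict instead of **kwargs, so they cannot collide with the caller's own parameters.

```python
from difflib import SequenceMatcher
from typing import Any, Callable, Dict, Optional


def default_process_sketch(s: str) -> str:
    """Approximation of utils.default_process: non-alphanumerics -> spaces, trim, lowercase."""
    return "".join(ch if ch.isalnum() else " " for ch in s).strip().lower()


def ratio_sketch(a: str, b: str, *, score_cutoff: float = 0.0) -> float:
    """Stand-in scorer on a 0-100 scale; scores below score_cutoff become 0."""
    score = SequenceMatcher(None, a, b).ratio() * 100
    return score if score >= score_cutoff else 0.0


def score_pair(
    query: str,
    choice: str,
    scorer: Callable[..., float] = ratio_sketch,
    processor: Optional[Callable[[str], str]] = None,
    scorer_kwargs: Optional[Dict[str, Any]] = None,
) -> float:
    # processor=None (the default) means the strings are compared unchanged.
    if processor is not None:
        query, choice = processor(query), processor(choice)
    # Scorer options are forwarded from an explicit dict, not from **kwargs.
    return scorer(query, choice, **(scorer_kwargs or {}))


print(score_pair("Hello World!", "hello world"))  # below 100: raw strings differ
print(score_pair("Hello World!", "hello world", processor=default_process_sketch))  # 100.0
```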

Added

  • added support for arbitrary hashable sequences in the pure Python fallback implementation of all functions in rapidfuzz.distance
  • added support for None and float("nan") in process.cdist as long as the underlying scorer supports it.
    This is the case for all scorers returning normalized results.
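To illustrate what working on arbitrary hashable sequences means, here is a pure-Python sketch of a normalized Indel similarity built on a bit-parallel LCS (Hyyrö's formulation). Elements are only used as dictionary keys, so lists of tokens or tuples of ints work as well as strings. This is not RapidFuzz's fallback code. Returning 0.0 when either input is None or NaN is an assumption modelled on the note about scorers returning normalized results, and the empty-input guard avoids a division by zero.

```python
import math
from typing import Any, Dict, Hashable, Sequence


def lcs_length(s1: Sequence[Hashable], s2: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence via a bit-parallel algorithm."""
    if not s1 or not s2:
        return 0
    # Bit i of masks[x] is set when s1[i] == x.
    masks: Dict[Hashable, int] = {}
    for i, elem in enumerate(s1):
        masks[elem] = masks.get(elem, 0) | (1 << i)
    all_ones = (1 << len(s1)) - 1
    v = all_ones
    for elem in s2:
        u = v & masks.get(elem, 0)
        v = ((v + u) | (v - u)) & all_ones
    # Each zero bit left in v corresponds to one matched element.
    return len(s1) - bin(v).count("1")


def _is_missing(value: Any) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def indel_normalized_similarity(s1: Any, s2: Any) -> float:
    """1 - indel_distance / (len(s1) + len(s2)); 0.0 for None/NaN inputs."""
    if _is_missing(s1) or _is_missing(s2):
        return 0.0
    total = len(s1) + len(s2)
    if total == 0:
        return 1.0  # two empty sequences: identical, and no division by zero
    distance = total - 2 * lcs_length(s1, s2)  # insertions + deletions
    return 1.0 - distance / total


print(lcs_length("abc", "acb"))                                          # 2
print(round(indel_normalized_similarity(["new", "york"], ["york"]), 4))  # 0.6667
print(indel_normalized_similarity(None, "abc"))                          # 0.0
```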

Fixed

  • fix division by zero in the SIMD implementation of normalized metrics, which led to incorrect results