fix(deps): update rust crate text-splitter to 0.22 #257

Open: wants to merge 1 commit into main

renovate[bot] (Contributor) commented Oct 26, 2024

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| text-splitter | dependencies | minor | 0.17 -> 0.22 |

Release Notes

benbrandt/text-splitter (text-splitter)

v0.22.0

Breaking Changes
  • Revert change to special token behavior in v0.21. This had many unintended side effects, and does not seem to be recommended for chunking.

v0.21.0

Breaking Changes
  • Special tokens are now also encoded by both the Huggingface and Tiktoken tokenizers. This is closer to the default behavior on the Python side, and ensures that any tokens a model adds at the beginning or end of a sequence are counted as well. This matters especially for embedding models that add a special token to the beginning of the sequence: previously, the generated chunks could fail to fit within the context window because those tokens were not counted.
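
The effect described in the v0.21.0 note can be illustrated with a minimal, std-only sketch. It does not use text-splitter or any real tokenizer API: `count_tokens`, `fits`, `SPECIAL_TOKENS_PER_SEQUENCE`, and the one-token-per-word rule are all hypothetical stand-ins chosen for illustration.

```rust
/// Hypothetical: number of special tokens a model wraps around every
/// sequence (for example one start marker and one end marker).
const SPECIAL_TOKENS_PER_SEQUENCE: usize = 2;

/// Toy tokenizer stand-in: one token per whitespace-separated word.
/// When `include_special` is true, the per-sequence special tokens are
/// added to the count, as a real model would add them at encode time.
fn count_tokens(text: &str, include_special: bool) -> usize {
    let words = text.split_whitespace().count();
    if include_special {
        words + SPECIAL_TOKENS_PER_SEQUENCE
    } else {
        words
    }
}

/// Returns true when the chunk fits within `context_window` tokens.
fn fits(chunk: &str, context_window: usize, include_special: bool) -> bool {
    count_tokens(chunk, include_special) <= context_window
}
```

With a context window of 4, `fits("a b c d", 4, false)` reports that the chunk fits, while `fits("a b c d", 4, true)` does not, because the two assumed special tokens push the total to 6. That gap is the kind of mismatch the note describes.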
What's New
Rust
  • MSRV is now 1.80 to remove dependency on once_cell.
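
Rust 1.80 stabilized `std::sync::LazyLock`, which is presumably what makes `once_cell` unnecessary here. A minimal sketch of the pattern follows; the `SEPARATOR_LEVELS` table and `separator_level` function are hypothetical and not taken from the crate.

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

/// Hypothetical lookup table, built once on first access.
/// Before Rust 1.80 this would typically be `once_cell::sync::Lazy`.
static SEPARATOR_LEVELS: LazyLock<HashMap<&'static str, u8>> = LazyLock::new(|| {
    HashMap::from([("\n\n", 2), ("\n", 1), (" ", 0)])
});

/// Looks up the (hypothetical) semantic level of a separator.
fn separator_level(sep: &str) -> Option<u8> {
    SEPARATOR_LEVELS.get(sep).copied()
}
```

The initializer closure runs at most once, on the first call to `separator_level`, and later calls reuse the same map.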

v0.20.1

Fixes
  • Python: correctly specify version for compatibility with uv installations.

v0.20.0

Breaking Changes
  • Switched the backing Unicode segmentation implementation from unicode-segmentation to icu_segmenter. This brings some modest performance gains, along with being able to leverage the official Unicode crate. There may be slight differences in chunk behavior in some edge cases, so this is treated as a breaking change.

v0.19.1

What's New
  • Python splitters have new chunk_all and chunk_all_indices methods so that multiple texts can be processed in parallel. (For Rust, you should already be able to do this with rayon.)
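
As a rough illustration of processing several texts in parallel on the Rust side, here is a std-only sketch. It swaps in `std::thread::scope` (one thread per text) for rayon so that it needs no external crates, and `chunk` is a hypothetical stand-in for a real splitter, not the text-splitter API.

```rust
use std::thread;

/// Hypothetical stand-in for a splitter: greedily packs whitespace-separated
/// words into chunks of at most `max_chars` characters. A single word longer
/// than `max_chars` is emitted as its own oversized chunk.
fn chunk(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let word_len = word.chars().count();
        // Length of `current` if `word` were appended (+1 for the space).
        let needed = if current.is_empty() {
            word_len
        } else {
            current.chars().count() + 1 + word_len
        };
        if needed > max_chars && !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Chunks every text on its own scoped thread and returns the results
/// in the same order as the inputs.
fn chunk_all(texts: &[&str], max_chars: usize) -> Vec<Vec<String>> {
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently, then join.
        let handles: Vec<_> = texts
            .iter()
            .map(|text| s.spawn(move || chunk(text, max_chars)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `chunk_all(&["aa bb cc", "dd"], 5)` yields one list of chunks per input text. With rayon, a thread pool and a parallel iterator over the texts would replace the explicit spawn and join, which scales better when there are many inputs.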

v0.19.0

Breaking Changes
  • Update to tokenizers v0.21

v0.18.1

What's New
  • Ensure tokenizer sizers with truncation parameters count their overflow encodings
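
To see why overflow needs to be counted, consider a sizer that only looks at the truncated encoding: every chunk would then appear to fit within the truncation limit. The sketch below uses a hypothetical, simplified `Encoding` type and toy `encode_truncated` function; it is an assumption-laden illustration, not the tokenizers crate's API.

```rust
/// Hypothetical, simplified encoding: the tokens kept after truncation,
/// plus any overflow pieces that were cut off.
struct Encoding {
    tokens: Vec<u32>,
    overflowing: Vec<Encoding>,
}

/// Toy truncating encoder: keeps the first `max_len` token ids and stores
/// the remainder as overflow pieces of up to `max_len` ids each.
/// `max_len` must be greater than zero.
fn encode_truncated(ids: &[u32], max_len: usize) -> Encoding {
    assert!(max_len > 0, "max_len must be greater than zero");
    let mut pieces = ids.chunks(max_len).map(|c| c.to_vec());
    let tokens = pieces.next().unwrap_or_default();
    let overflowing = pieces
        .map(|t| Encoding { tokens: t, overflowing: Vec::new() })
        .collect();
    Encoding { tokens, overflowing }
}

/// Size that ignores overflow: never exceeds the truncation limit.
fn truncated_len(e: &Encoding) -> usize {
    e.tokens.len()
}

/// Size that also counts every overflow piece: the true token count.
fn full_len(e: &Encoding) -> usize {
    e.tokens.len() + e.overflowing.iter().map(full_len).sum::<usize>()
}
```

For five token ids truncated to a limit of 2, `truncated_len` reports 2 while `full_len` reports 5, so only the second measure tells a splitter that the text is actually too large.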

v0.18.0

Breaking Changes
  • Change supported tiktoken-rs version to 0.6.x

Configuration

📅 Schedule: Branch creation - "after 1am every 3 weeks on Saturday" in timezone America/Los_Angeles, Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

renovate[bot] force-pushed the renovate/text-splitter-0.x branch from a27df45 to 151c094 (November 17, 2024 16:29)
renovate[bot] force-pushed the renovate/text-splitter-0.x branch from 151c094 to 3308543 (November 28, 2024 13:08)
renovate[bot] changed the title from "fix(deps): update rust crate text-splitter to 0.18" to "fix(deps): update rust crate text-splitter to 0.19" (Nov 28, 2024)
renovate[bot] force-pushed the renovate/text-splitter-0.x branch from 3308543 to 947425c (December 14, 2024 21:50)
renovate[bot] changed the title from "fix(deps): update rust crate text-splitter to 0.19" to "fix(deps): update rust crate text-splitter to 0.20" (Dec 14, 2024)
renovate[bot] force-pushed the renovate/text-splitter-0.x branch from 947425c to c01180e (January 16, 2025 13:12)
renovate[bot] changed the title from "fix(deps): update rust crate text-splitter to 0.20" to "fix(deps): update rust crate text-splitter to 0.21" (Jan 16, 2025)
renovate[bot] force-pushed the renovate/text-splitter-0.x branch from c01180e to 78a1b44 (January 17, 2025 12:27)
renovate[bot] changed the title from "fix(deps): update rust crate text-splitter to 0.21" to "fix(deps): update rust crate text-splitter to 0.22" (Jan 17, 2025)