Add tokenize_batch method #14
Conversation
Force-pushed from 1435509 to b055c73
  regex = "1.10.2"

  [dev-dependencies]
  criterion = "0.5.1"

  [[bench]]
  name = "bench"
- required-features = ["openai-vocabulary-file"]
+ required-features = ["ndarray", "openai-vocabulary-file"]
We should avoid this and only guard the tokenize_batch_small benchmark on this.
I moved the benchmark which requires the (now optional) ndarray feature to a second benchmark harness.
Why is this better than just guarding the single benchmark?
I'm not sure guarding a single benchmark is even possible in Criterion without much effort. I didn't find anything in the documentation, and I tried several combinations when defining criterion_group!(...) and criterion_main!(...), but it looks like there is no easy way to feature-gate individual benchmarks (except perhaps by duplicating both criterion_group!(...) and criterion_main!(...)).
In addition, it's maybe a bit more discoverable. I tend to just run cargo bench on projects I'm not familiar with (instead of cargo bench --all-features) and would hence miss those benchmarks. By having two benchmark .rs files I might realize that there are actually more benchmarks I can run.
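As a rough sketch, the two-harness split described above could look like this in Cargo.toml (the second bench name here is illustrative, not necessarily what the PR uses; Criterion benches also need harness = false):

```toml
[[bench]]
# Benchmarks that only need the vocabulary file.
name = "bench"
harness = false
required-features = ["openai-vocabulary-file"]

[[bench]]
# Separate harness for the batch benchmark, which additionally needs ndarray.
name = "bench_batch"
harness = false
required-features = ["ndarray", "openai-vocabulary-file"]
```

With this split, a plain cargo bench still runs the first harness and merely skips the second one when the ndarray feature is disabled, rather than gating all benchmarks at once.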
Force-pushed from b055c73 to 6fb71dd
Force-pushed from 6fb71dd to ca501b2
Why is this better than just guarding the single benchmark?
The Python bindings provide this method as a convenience function. I think this is useful for users of the Rust library as well, so I implemented it there. This means we have to take on a dependency on ndarray, but I think this is justified given how widely used this crate is in the ML ecosystem.
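For illustration, a minimal sketch of what a batch-tokenize-and-pad convenience method does. The real PR returns an ndarray Array2; plain Vec<Vec<u32>> is used here to keep the example dependency-free, and the tokenize function is a byte-level stand-in, not the crate's actual BPE tokenizer:

```rust
// Hypothetical stand-in for single-text tokenization (the real crate uses BPE).
fn tokenize(text: &str) -> Vec<u32> {
    text.bytes().map(u32::from).collect()
}

/// Tokenize several texts and right-pad every row with `pad_id`
/// so that all rows share the length of the longest input.
/// (The PR's version would collect this into an ndarray::Array2 instead.)
fn tokenize_batch(texts: &[&str], pad_id: u32) -> Vec<Vec<u32>> {
    let mut rows: Vec<Vec<u32>> = texts.iter().map(|t| tokenize(t)).collect();
    let max_len = rows.iter().map(Vec::len).max().unwrap_or(0);
    for row in &mut rows {
        row.resize(max_len, pad_id); // pad shorter rows to max_len
    }
    rows
}
```

Returning a rectangular (batch, max_len) array is what makes ndarray a natural fit here, since downstream ML code usually expects a dense 2-D batch rather than ragged vectors.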