optimization: use a DFA-based lexer for input #511
Conversation
Prior to this patch, the read loop could split UTF-8 or escape sequences at a 256-byte boundary, resulting in invalid events. This patch fixes it.
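A minimal sketch of the carry-over idea for the UTF-8 case, assuming a plain `io.Reader` and a hypothetical `process` callback (neither is this PR's actual API); the real patch also has to hold back partially received escape sequences, which this sketch does not show.

```go
package input

import (
	"io"
	"unicode/utf8"
)

// trailingPartial returns how many bytes at the end of buf form an
// incomplete UTF-8 sequence and should be carried over to the next read.
func trailingPartial(buf []byte) int {
	for i := 1; i <= utf8.UTFMax && i <= len(buf); i++ {
		if utf8.RuneStart(buf[len(buf)-i]) {
			if utf8.FullRune(buf[len(buf)-i:]) {
				return 0 // the last rune is complete
			}
			return i // carry over the incomplete rune
		}
	}
	return 0 // invalid UTF-8; let the decoder report it
}

// readLoop reads in fixed-size chunks but only hands complete runes to
// process, so a rune split across a 256-byte read is never emitted as
// invalid input.
func readLoop(r io.Reader, process func([]byte)) error {
	var pending []byte
	chunk := make([]byte, 256)
	for {
		n, err := r.Read(chunk)
		if n > 0 {
			data := append(pending, chunk[:n]...)
			keep := trailingPartial(data)
			process(data[:len(data)-keep])
			pending = append([]byte(nil), data[len(data)-keep:]...)
		}
		if err != nil {
			return err
		}
	}
}
```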
Force-pushed the branch from 2df4f23 to e99e186.
cc @muesli for your consideration. This is a follow-up to what we were looking at yesterday. I'll add some benchmarks to showcase the improvement, even in the simple case where there is just one event in the input buffer.
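For reference, a hedged sketch of the kind of benchmark that would produce ns/byte numbers; `lexAll` is a hypothetical stand-in for the real detector, not part of this PR.

```go
package input_test

import "testing"

// lexAll is a hypothetical stand-in for the sequence detector so this
// sketch compiles; a real benchmark would call the actual lexer instead.
func lexAll(p []byte) {
	for range p {
	}
}

func BenchmarkLexAll(b *testing.B) {
	// One escape sequence followed by plain text, to exercise both paths.
	input := []byte("\x1b[2;3Hhello, world")
	b.SetBytes(int64(len(input)))
	for i := 0; i < b.N; i++ {
		lexAll(input)
	}
}
```

With `b.SetBytes` the benchmark reports throughput directly; ns/byte is then ns/op divided by the input length.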
Force-pushed the branch from b5a8a11 to 787b15f.
This is ready for review now.
Force-pushed the branch from 787b15f to 419c651.
This also upgrades the map-based sequence detector to a lexical analyzer generated using [golex](https://pkg.go.dev/modernc.org/golex). The performance improvement is more than 2x, going down from ~41.5ns/byte (using a map) to ~18.4ns/byte (using the DFA) on my test machine.
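Not the golex-generated code from this branch, but a hand-written sketch of the DFA idea for a simplified CSI grammar (ESC '[' parameter bytes, then a final byte; intermediate bytes are omitted), to show why a table-driven state machine beats per-prefix map lookups: each input byte is examined exactly once.

```go
package input

// States of a tiny DFA for a simplified CSI grammar: ESC '[' params* final.
const (
	stStart  = iota
	stEsc    // saw ESC
	stCSI    // saw ESC '[', collecting parameter bytes
	stAccept // saw the final byte
	stReject
)

// step advances the DFA by one input byte.
func step(state int, b byte) int {
	switch state {
	case stStart:
		if b == 0x1b {
			return stEsc
		}
	case stEsc:
		if b == '[' {
			return stCSI
		}
	case stCSI:
		switch {
		case b >= 0x30 && b <= 0x3f: // parameter bytes: 0-9 ; : < = > ?
			return stCSI
		case b >= 0x40 && b <= 0x7e: // final byte: @, A-Z, a-z, ~, etc.
			return stAccept
		}
	}
	return stReject
}

// matchCSI reports whether buf starts with a complete CSI sequence and,
// if so, how many bytes it spans.
func matchCSI(buf []byte) (n int, ok bool) {
	state := stStart
	for i, b := range buf {
		switch state = step(state, b); state {
		case stAccept:
			return i + 1, true
		case stReject:
			return 0, false
		}
	}
	return 0, false // input ended before the final byte
}
```

The real grammar is much larger than this, which is exactly where generating the transition tables with golex pays off.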
Force-pushed the branch from 419c651 to fa6f5d6.
Right now this code consumes all the input coming from the terminal until there is a lull with no input, and only then starts detecting events in the buffered data. I wonder whether that is the best approach. Don't we want the bubbletea model to start consuming events in parallel with the delivery of input bytes? And what happens if the input bytes arrive continuously with no pause, or the Go code gets slowed down so that there is always more input available in the read loop? The alternative is to read the input incrementally, as needed for each event, instead of in two phases as before. I'll try to prototype something.
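A rough sketch of that incremental alternative, assuming a hypothetical `parseOne` callback that stands in for the sequence detector: events are delivered on a channel as soon as they are complete, rather than after the input goes quiet.

```go
package input

import (
	"context"
	"io"
)

// Event is a placeholder for whatever the input parser produces.
type Event struct{ Data []byte }

// readEvents reads from r and delivers each event as soon as it can be
// parsed, instead of buffering until there is a lull in the input.
// parseOne consumes at most one complete event from buf, returning how
// many bytes it used, or ok=false if buf does not yet hold a full event.
func readEvents(ctx context.Context, r io.Reader, out chan<- Event,
	parseOne func(buf []byte) (ev Event, used int, ok bool)) error {
	var buf []byte
	chunk := make([]byte, 256)
	for {
		n, err := r.Read(chunk)
		if n > 0 {
			buf = append(buf, chunk[:n]...)
			// Drain every complete event currently in the buffer.
			for {
				ev, used, ok := parseOne(buf)
				if !ok {
					break
				}
				buf = buf[used:]
				select {
				case out <- ev:
				case <-ctx.Done():
					return ctx.Err()
				}
			}
		}
		if err != nil {
			return err
		}
	}
}
```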
Note to self: also check the context cancellation during long reads.
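A common Go pattern for this, sketched with a hypothetical helper: run the blocking Read in a goroutine and select on the context. The abandoned goroutine still blocks until Read returns, so the real fix may also need to close the fd or set a read deadline on cancellation.

```go
package input

import (
	"context"
	"io"
)

// readContext performs one Read on r but returns early when ctx is
// cancelled. The background goroutine keeps blocking until the Read
// returns; unblocking it requires closing the fd or setting a deadline.
func readContext(ctx context.Context, r io.Reader, buf []byte) (int, error) {
	type result struct {
		n   int
		err error
	}
	ch := make(chan result, 1) // buffered so the goroutine can always send
	go func() {
		n, err := r.Read(buf)
		ch <- result{n, err}
	}()
	select {
	case res := <-ch:
		return res.n, res.err
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}
```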
After discussion with @muesli, we are considering the following:
I will need to adapt this after the control flow change in #569.
The first three commits are from #568, #569 and #570.
This PR upgrades the map-based sequence detector to a lexical analyzer generated using golex.
The performance improvement is more than 2x, going down from ~41.5ns/byte (using a map) to ~18.4ns/byte (using the DFA) on my test machine.
Informs #404.