Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimization: use a DFA-based lexer for input #511

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

knz
Copy link
Contributor

@knz knz commented Oct 8, 2022

First 3 commits from #568, #569 and #570.

This PR upgrades the map-based sequence detector to a lexical analyzer generated using golex.

The performance improvement is 4x, going down from from ~41.5ns/byte (using a map) to ~18.4ns/byte (using the DFA) on my test computer.

Informs #404.

Prior to this patch, the read loop could split UTF-8 or escape
sequences at a 256 byte boundary, resulting in invalid events. This
patch fixes it.
@knz knz force-pushed the 20221007-key-input branch 5 times, most recently from 2df4f23 to e99e186 Compare October 8, 2022 12:47
@knz
Copy link
Contributor Author

knz commented Oct 8, 2022

cc @muesli for your consideration. This is followup to what we were looking at yesterday. I'll add some benchmarks to showcase the improvement, even in the simple case when there is just 1 event in the input buffer.

@knz

This comment was marked as outdated.

@muesli muesli added the bug Something isn't working label Oct 9, 2022
@knz knz force-pushed the 20221007-key-input branch 4 times, most recently from b5a8a11 to 787b15f Compare October 9, 2022 20:55
@knz knz changed the title [WIP] fix(key): properly support large pastes from terminal fix(key): properly support large pastes from terminal Oct 9, 2022
@knz knz marked this pull request as ready for review October 9, 2022 20:59
@knz
Copy link
Contributor Author

knz commented Oct 9, 2022

This is ready for review now.

@knz knz mentioned this pull request Oct 9, 2022
34 tasks
@knz knz force-pushed the 20221007-key-input branch from 787b15f to 419c651 Compare October 9, 2022 21:12
This also upgrades the map-based sequence detector to a lexical
analyzer generated using
[golex](https://pkg.go.dev/modernc.org/golex).

The performance improvement is 4x, going down from from
~41.5ns/byte (using a map) to ~18.4ns/byte (using the DFA) on my test
computer.
@knz knz force-pushed the 20221007-key-input branch from 419c651 to fa6f5d6 Compare October 9, 2022 21:19
@knz
Copy link
Contributor Author

knz commented Oct 10, 2022

So right now this code consumes all the input coming from the terminal until there's a lull with no input, and then only it starts detecting events in the input data.

I wonder if that's the best approach. Don't we want the bubbletea model to start consuming events in parallel with the delivery of input bytes?

Also what if the input bytes are delivered continuously with no pause, or the go code becomes slowed down whereby there's always some input available in the input loop?

The alternative to this is to read the input incrementally as needed for each event, instead of doing it in two phases like previously. I'll try to prototype something.

@knz knz mentioned this pull request Oct 13, 2022
@knz
Copy link
Contributor Author

knz commented Oct 13, 2022

Note to self: also check the context cancellation during long reads.

@knz
Copy link
Contributor Author

knz commented Oct 21, 2022

After discussion with @muesli we are considering the following:

  • 1 PR to flip the control of readInputs to write the new messages to a channel directly.
  • 1 PR to simplify the structure of readInputs while preserving the sequence map
  • 1 PR to fix the long inputs
  • 1 PR to use the golex-based sequence parser.

@knz knz changed the title fix(key): properly support large pastes from terminal optimization: use a DFA-based lexer for input Jan 7, 2023
@knz knz marked this pull request as draft January 7, 2023 19:17
@knz
Copy link
Contributor Author

knz commented Jan 7, 2023

I will need to adapt this after the control flow change in #569.

@aymanbagabas aymanbagabas deleted the branch charmbracelet:main October 28, 2024 17:41
@meowgorithm meowgorithm reopened this Oct 28, 2024
@meowgorithm meowgorithm changed the base branch from master to main October 28, 2024 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants