optimization: use a DFA-based lexer for input #511
Conversation
Prior to this patch, the read loop could split UTF-8 or escape sequences at a 256-byte boundary, resulting in invalid events. This patch fixes it.
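A minimal sketch of the carry-over idea for the UTF-8 case, assuming a plain `io.Reader` and a hypothetical `process` callback (neither is this PR's actual API); the real patch also has to hold back partially received escape sequences, which this sketch does not show.

```go
package input

import (
	"io"
	"unicode/utf8"
)

// trailingPartial returns how many bytes at the end of buf form an
// incomplete UTF-8 sequence and should be carried over to the next read.
func trailingPartial(buf []byte) int {
	for i := 1; i <= utf8.UTFMax && i <= len(buf); i++ {
		if utf8.RuneStart(buf[len(buf)-i]) {
			if utf8.FullRune(buf[len(buf)-i:]) {
				return 0 // the last rune is complete
			}
			return i // carry over the incomplete rune
		}
	}
	return 0 // invalid UTF-8; let the decoder report it
}

// readLoop reads in fixed-size chunks but only hands complete runes to
// process, so a rune split across a 256-byte read is never emitted as
// invalid input.
func readLoop(r io.Reader, process func([]byte)) error {
	var pending []byte
	chunk := make([]byte, 256)
	for {
		n, err := r.Read(chunk)
		if n > 0 {
			data := append(pending, chunk[:n]...)
			keep := trailingPartial(data)
			process(data[:len(data)-keep])
			pending = append([]byte(nil), data[len(data)-keep:]...)
		}
		if err != nil {
			return err
		}
	}
}
```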
Force-pushed the branch from 2df4f23 to e99e186.
cc @muesli for your consideration. This is a follow-up to what we were looking at yesterday. I'll add some benchmarks to showcase the improvement, even in the simple case where there is just one event in the input buffer.
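For reference, a hedged sketch of the kind of benchmark that would produce ns/byte numbers; `lexAll` is a hypothetical stand-in for the real detector, not part of this PR.

```go
package input_test

import "testing"

// lexAll is a hypothetical stand-in for the sequence detector so this
// sketch compiles; a real benchmark would call the actual lexer instead.
func lexAll(p []byte) {
	for range p {
	}
}

func BenchmarkLexAll(b *testing.B) {
	// One escape sequence followed by plain text, to exercise both paths.
	input := []byte("\x1b[2;3Hhello, world")
	b.SetBytes(int64(len(input)))
	for i := 0; i < b.N; i++ {
		lexAll(input)
	}
}
```

With `b.SetBytes` the benchmark reports throughput directly; ns/byte is then ns/op divided by the input length.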
Force-pushed the branch from b5a8a11 to 787b15f.
This is ready for review now.
Force-pushed the branch from 787b15f to 419c651.
This also upgrades the map-based sequence detector to a lexical analyzer generated using [golex](https://pkg.go.dev/modernc.org/golex). The performance improvement is more than 2x, going down from ~41.5ns/byte (using a map) to ~18.4ns/byte (using the DFA) on my test machine.
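Not the golex-generated code from this branch, but a hand-written sketch of the DFA idea for a simplified CSI grammar (ESC '[' parameter bytes, then a final byte; intermediate bytes are omitted), to show why a table-driven state machine beats per-prefix map lookups: each input byte is examined exactly once.

```go
package input

// States of a tiny DFA for a simplified CSI grammar: ESC '[' params* final.
const (
	stStart  = iota
	stEsc    // saw ESC
	stCSI    // saw ESC '[', collecting parameter bytes
	stAccept // saw the final byte
	stReject
)

// step advances the DFA by one input byte.
func step(state int, b byte) int {
	switch state {
	case stStart:
		if b == 0x1b {
			return stEsc
		}
	case stEsc:
		if b == '[' {
			return stCSI
		}
	case stCSI:
		switch {
		case b >= 0x30 && b <= 0x3f: // parameter bytes: 0-9 ; : < = > ?
			return stCSI
		case b >= 0x40 && b <= 0x7e: // final byte: @, A-Z, a-z, ~, etc.
			return stAccept
		}
	}
	return stReject
}

// matchCSI reports whether buf starts with a complete CSI sequence and,
// if so, how many bytes it spans.
func matchCSI(buf []byte) (n int, ok bool) {
	state := stStart
	for i, b := range buf {
		switch state = step(state, b); state {
		case stAccept:
			return i + 1, true
		case stReject:
			return 0, false
		}
	}
	return 0, false // input ended before the final byte
}
```

The real grammar is much larger than this, which is exactly where generating the transition tables with golex pays off.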
Force-pushed the branch from 419c651 to fa6f5d6.
Right now this code consumes all the input coming from the terminal until there is a lull with no input, and only then starts detecting events in the buffered data. I wonder whether that is the best approach. Don't we want the bubbletea model to start consuming events in parallel with the delivery of input bytes? And what happens if the input bytes arrive continuously with no pause, or the Go code gets slowed down so that there is always more input available in the read loop? The alternative is to read the input incrementally, as needed for each event, instead of in two phases as before. I'll try to prototype something.
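A rough sketch of that incremental alternative, assuming a hypothetical `parseOne` callback that stands in for the sequence detector: events are delivered on a channel as soon as they are complete, rather than after the input goes quiet.

```go
package input

import (
	"context"
	"io"
)

// Event is a placeholder for whatever the input parser produces.
type Event struct{ Data []byte }

// readEvents reads from r and delivers each event as soon as it can be
// parsed, instead of buffering until there is a lull in the input.
// parseOne consumes at most one complete event from buf, returning how
// many bytes it used, or ok=false if buf does not yet hold a full event.
func readEvents(ctx context.Context, r io.Reader, out chan<- Event,
	parseOne func(buf []byte) (ev Event, used int, ok bool)) error {
	var buf []byte
	chunk := make([]byte, 256)
	for {
		n, err := r.Read(chunk)
		if n > 0 {
			buf = append(buf, chunk[:n]...)
			// Drain every complete event currently in the buffer.
			for {
				ev, used, ok := parseOne(buf)
				if !ok {
					break
				}
				buf = buf[used:]
				select {
				case out <- ev:
				case <-ctx.Done():
					return ctx.Err()
				}
			}
		}
		if err != nil {
			return err
		}
	}
}
```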
Note to self: also check the context cancellation during long reads.
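A common Go pattern for this, sketched with a hypothetical helper: run the blocking Read in a goroutine and select on the context. The abandoned goroutine still blocks until Read returns, so the real fix may also need to close the fd or set a read deadline on cancellation.

```go
package input

import (
	"context"
	"io"
)

// readContext performs one Read on r but returns early when ctx is
// cancelled. The background goroutine keeps blocking until the Read
// returns; unblocking it requires closing the fd or setting a deadline.
func readContext(ctx context.Context, r io.Reader, buf []byte) (int, error) {
	type result struct {
		n   int
		err error
	}
	ch := make(chan result, 1) // buffered so the goroutine can always send
	go func() {
		n, err := r.Read(buf)
		ch <- result{n, err}
	}()
	select {
	case res := <-ch:
		return res.n, res.err
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}
```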
After discussion with @muesli, we are considering the following:
I will need to adapt this after the control flow change in #569.
The first three commits are from #568, #569 and #570.
This PR upgrades the map-based sequence detector to a lexical analyzer generated using golex.
The performance improvement is more than 2x, going down from ~41.5ns/byte (using a map) to ~18.4ns/byte (using the DFA) on my test machine.
Informs #404.