RTR Text reflow #49

Open
philips opened this issue Jan 25, 2025 · 3 comments

Comments

@philips
Owner

philips commented Jan 25, 2025

Problem: Recognized text is not grouped into paragraphs, so it does not render well as Markdown.

Solution: We need a heuristic that groups the bounding boxes (see IRecognitionElement) and their text into reflowed paragraphs.
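
As a rough illustration of the kind of heuristic meant here, the sketch below starts a new paragraph whenever the vertical gap between two consecutive lines exceeds a fraction of the median line height. The `LineBox` shape, the `reflow` name, and the `gapFactor` default are all assumptions made for this example; the real IRecognitionElement fields may look different, and the threshold would need tuning against real notes.

```typescript
// Hypothetical shape for one recognized line; not the real IRecognitionElement.
interface LineBox {
  text: string;
  y: number; // top edge of the line's bounding box
  height: number; // height of the line's bounding box
}

// Join lines into paragraphs. A vertical gap larger than
// gapFactor * (median line height) starts a new paragraph;
// smaller gaps reflow the line into the current paragraph.
function reflow(lines: LineBox[], gapFactor = 0.8): string {
  if (lines.length === 0) return "";

  // Work top to bottom without mutating the caller's array.
  const sorted = [...lines].sort((a, b) => a.y - b.y);

  // Median height gives a scale that is independent of handwriting size.
  const heights = sorted.map((l) => l.height).sort((a, b) => a - b);
  const median = heights[Math.floor(heights.length / 2)] ?? 0;

  const paragraphs: string[][] = [];
  let prevBottom = -Infinity;
  for (const line of sorted) {
    const gap = line.y - prevBottom;
    const current = paragraphs[paragraphs.length - 1];
    if (current === undefined || gap > gapFactor * median) {
      paragraphs.push([line.text]); // first line, or a large gap
    } else {
      current.push(line.text); // close to the previous line: reflow
    }
    prevBottom = line.y + line.height;
  }

  // Blank line between paragraphs, as Markdown expects.
  return paragraphs.map((p) => p.join(" ")).join("\n\n");
}
```

This only looks at vertical spacing, so it assumes single-column notes with one box per line; indentation, lists, and headings would need extra rules.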

How you can help: Please upload simple single-page RTR .note files to this issue, along with a copy of the text formatted the way you wish it had come out, so I can generate some test cases.

For example:

Note: rtr.note.zip

Screenshot (optional): Image

Should output this:

Real time recognition paragraph test

With enough space a new paragraph should be created. If lines are close then the text should reflow.

This should be a new paragraph.

As well as this.

But this is the last paragraph and should reflow together.
@edfinn1973

Sorry for the delay, thank you for working on this. Here are a few notes that are already published for you to take a look at. Let me know if I can help.

Supernote System Gaps (03-08-2024).zip

@philips
Owner Author

philips commented Feb 8, 2025

I have had a couple of false starts at an algorithm here. The observations so far:

  • The label field at the top has all of the text in the right order but no bounding boxes.
  • The words field has all of the bounding boxes but the words may be out of order and are hard to reliably sort.

I probably need a third attempt that trusts the label field for the text and its order, then builds a bounding box around each line by fuzzy matching that line's text against the entries in the words field.
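
A minimal sketch of that third attempt, under several assumptions: the label text is taken to separate lines with "\n", each words entry is taken to carry its own text plus a rectangle, and similarity is a normalized edit distance. The `Rect`, `Word`, and `LineWithBox` shapes, the `boxesForLines` name, and the 0.6 threshold are hypothetical and not taken from the actual file format.

```typescript
// Hypothetical shapes; the real "words" entries may use different field names.
interface Rect { x: number; y: number; width: number; height: number; }
interface Word { label: string; box: Rect; }
interface LineWithBox { text: string; box: Rect | null; }

// Levenshtein edit distance using a rolling row.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur.push(Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = cur;
  }
  return prev[b.length];
}

// Similarity in [0, 1]; 1 means identical, ignoring case.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  return 1 - editDistance(a.toLowerCase(), b.toLowerCase()) / longest;
}

// Smallest rectangle covering both inputs.
function union(a: Rect | null, b: Rect): Rect {
  if (a === null) return { ...b };
  const x = Math.min(a.x, b.x);
  const y = Math.min(a.y, b.y);
  const right = Math.max(a.x + a.width, b.x + b.width);
  const bottom = Math.max(a.y + a.height, b.y + b.height);
  return { x, y, width: right - x, height: bottom - y };
}

// Trust the label for text and order; for each token on a line, claim the
// best-matching unused word and grow that line's box to include it.
function boxesForLines(label: string, words: Word[], minScore = 0.6): LineWithBox[] {
  const unused = [...words];
  const lines = label.split("\n").filter((l) => l.trim() !== "");
  return lines.map((text) => {
    let box: Rect | null = null;
    for (const token of text.trim().split(/\s+/)) {
      let best = -1;
      let bestScore = minScore; // a candidate must score above this
      for (let i = 0; i < unused.length; i++) {
        const score = similarity(token, unused[i].label);
        if (score > bestScore) { best = i; bestScore = score; }
      }
      if (best >= 0) box = union(box, unused.splice(best, 1)[0].box);
    }
    return { text, box };
  });
}
```

Claiming words greedily keeps the sketch short, but a repeated word can be assigned to the wrong line; a real implementation would probably also use position to break ties. Lines with no matching words come back with a `null` box.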

@edfinn1973

edfinn1973 commented Feb 8, 2025 via email
