Combining COSMOS blocks makes mention location information less specific #267

Open
maxaalexeeva opened this issue Oct 27, 2021 · 0 comments

Comments

@maxaalexeeva
Contributor

In PR #263, we combine COSMOS blocks so that paragraphs are not split up (which happens at the end of a column in two-column papers and at the end of a page). When we combine blocks, the location of extracted mentions becomes less specific: instead of saying that Mention 1 comes from p. 1, block 1, we say that it comes from p. 1, blocks 1-2, so the mention could be located in block 1, in block 2, or be split between the two. Keeping track of the length of each block in characters, and knowing the character offset of the extraction within the combined block content, can help narrow this down.
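
A minimal sketch of the offset idea, not the project's actual code: given the character length of each original block and the mention's character span in the combined text, a binary search over the block start offsets recovers which block(s) the mention falls in. The function names, the zero-based block indices, and the single-character separator between joined blocks (`sep_len=1`) are all assumptions for illustration; the real join logic in PR #263 may differ.

```python
from bisect import bisect_right
from typing import List, Tuple


def block_starts(block_lengths: List[int], sep_len: int = 1) -> List[int]:
    """Start offset of each original block inside the combined text.

    Assumes blocks were joined with a separator of `sep_len` characters
    (hypothetical; adjust to match how the blocks are actually combined).
    """
    starts = [0]
    for length in block_lengths[:-1]:
        starts.append(starts[-1] + length + sep_len)
    return starts


def locate_mention(
    block_lengths: List[int], start: int, end: int, sep_len: int = 1
) -> Tuple[int, int]:
    """Return (first_block, last_block) for the span [start, end).

    Indices are zero-based positions in `block_lengths`. If the two
    values differ, the mention is split across blocks. A character that
    falls on a separator is attributed to the preceding block.
    """
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    starts = block_starts(block_lengths, sep_len)
    first = bisect_right(starts, start) - 1
    last = bisect_right(starts, end - 1) - 1
    return first, last


# Two blocks of 10 and 20 characters joined by one separator character:
# the first block covers offsets 0-9, the second starts at offset 11.
lengths = [10, 20]
in_first = locate_mention(lengths, 2, 5)
in_second = locate_mention(lengths, 12, 15)
split = locate_mention(lengths, 8, 14)
```

Here `in_first` and `in_second` each point at a single block, while `split` spans both, which is the case where the location has to stay a range.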

Note: Currently, COSMOS combines some paragraphs into longer blocks. This needs to be discussed with UW.
