Dealing properly with space-separated multi-part words #2

Open
aarppe opened this issue Jul 10, 2019 · 0 comments
aarppe commented Jul 10, 2019

EXECUTIVE SUMMARY:

In click-in-text, implement a lookup of the context around the clicking/hovering point, in order to capture the entire verb (or noun) construction, which may be written in multiple parts separated by spaces.

DETAILS:

In the writing standards for many Indigenous languages we are working on (Plains Cree, Northern Haida), complex words can consist of multiple strings separated by spaces.

For instance, in Plains Cree SRO, one can write the Independent verb form 'nikî-nitawi-kiskinwahamâkosin' also as:

'nikî nitawi kiskinwahamâkosin' (non-standard SRO)
'ᓂᑮ ᓂᑕᐏ ᑭᐢᑭᓌᐦᐊᒫᑯᓯᐣ' (standard Cree syllabics)

This is even more common with Conjunct verb forms, such as 'kâ-kî-awâsisîwiyân', which can also be written as:

kâ kî awâsisîwiyân (non-standard SRO)
ᑳ ᑮ ᐊᐚᓯᓰᐏᔮᐣ (standard Cree syllabics)

N.B. Our FST is designed so that for Conjunct verbs the stem and suffix part can be recognized without the strictly needed grammatical conjunct preverb (mostly ê-, but also kâ-), as that can at times be omitted.

Nevertheless, the set of prefixal elements is restricted, and can be defined relatively easily with the following regular expression:

(PERSON|GRAMMATICAL-CONJUNCT-PREVERB)?(GRAMMATICAL-TENSE-PREVERB)((REDUPLICATION)?LEXICAL-PREVERB)((REDUPLICATION)?(COMITATIVE-PREVERB))?(REDUPLICATION)?

These can be fleshed out as follows (with '0' indicating an empty string, i.e. nothing):

PERSON: ni|ki|0
GRAMMATICAL-CONJUNCT-PREVERB: ê|ka|kâ|kâ-ki|0
GRAMMATICAL-TENSE-PREVERB: kî|wî|ka|0
REDUPLICATION-L: ay|ca|ka|ma|na|pa|sa|ta|wa|ya
REDUPLICATION-H: âh|câh|kâh|mâh|nâh|pâh|sâh|tâh|wâh|yâh
LEXICAL-PREVERB: list of 200+ elements under the contlex LEXICON PREVERBS in verb_affixes.lexc
COMITATIVE-PREVERB: wîci
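
As a rough illustration, the pattern and lists above could be turned into an executable check along the following lines. This is only a sketch under several assumptions, not the FST's behaviour: every slot is treated as optional (which is how I read the '0' entries), elements may be joined directly, by a hyphen, or by a space, REDUPLICATION-L and REDUPLICATION-H are merged into one class, and LEXICAL holds a single stand-in entry taken from the nikî-nitawi- example instead of the 200+ elements in verb_affixes.lexc. The names nfc, slot, PREFIX_RE and is_prefix_sequence are made up for this example.

```python
import re
import unicodedata


def nfc(s):
    """Normalize to NFC so that 'â' is one code point, however it was typed."""
    return unicodedata.normalize("NFC", s)


# Closed classes, copied from the lists above.
PERSON = ["ni", "ki"]
CONJUNCT = ["ê", "ka", "kâ", "kâ-ki"]
TENSE = ["kî", "wî", "ka"]
REDUP = ["ay", "ca", "ka", "ma", "na", "pa", "sa", "ta", "wa", "ya",
         "âh", "câh", "kâh", "mâh", "nâh", "pâh", "sâh", "tâh", "wâh", "yâh"]
COMITATIVE = ["wîci"]
# ASSUMPTION: stand-in for the 200+ lexical preverbs; 'nitawi' comes from the
# example in this issue and is assumed here to be a lexical preverb.
LEXICAL = ["nitawi"]

# ASSUMPTION: elements may be glued together, hyphenated, or space-separated.
SEP = "[ -]?"


def slot(items):
    """One slot: any of the items (longest first), then an optional separator."""
    ordered = sorted((nfc(i) for i in items), key=len, reverse=True)
    return "(?:" + "|".join(re.escape(i) for i in ordered) + ")" + SEP


# Same slot order as the pattern above; every slot is optional (assumption).
PREFIX_RE = re.compile(
    "(?:" + slot(PERSON + CONJUNCT) + ")?"
    + "(?:" + slot(TENSE) + ")?"
    + "(?:(?:" + slot(REDUP) + ")?" + slot(LEXICAL) + ")?"
    + "(?:(?:" + slot(REDUP) + ")?" + slot(COMITATIVE) + ")?"
    + "(?:" + slot(REDUP) + ")?"
)


def is_prefix_sequence(text):
    """True if the whole (non-empty) string is a run of prefixal elements."""
    text = nfc(text)
    return bool(text) and PREFIX_RE.fullmatch(text) is not None
```

With these definitions, is_prefix_sequence("nikî nitawi ") and is_prefix_sequence("kâ kî") hold, while a stem such as "kiskinwahamâkosin" is rejected.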

Thus, it should be relatively straightforward to look to the left of the clicking/invocation point and check whether any of these elements are present. Indeed, one should start at the clicking point itself, as the user might be clicking on the prefixal section, e.g. one of the lexical preverbs, and not on the stem.
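
To make the lookup concrete, here is a minimal sketch of how the clicked token could be expanded into a full candidate form. It assumes plain whitespace tokenization with no punctuation handling, a character offset for the click, and a caller-supplied predicate that decides whether a token is prefixal; expand_click, demo_is_prefixal and DEMO_PREFIXAL are hypothetical names, and the demo set only contains tokens from the examples in this issue.

```python
import re
import unicodedata


def expand_click(text, offset, is_prefixal):
    """Return a hyphen-joined candidate for the construction at `offset`.

    Starts at the clicked token, extends left over prefixal tokens, and, if
    the clicked token is itself prefixal, extends right up to and including
    the first non-prefixal token (taken to be the stem).
    """
    tokens = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]
    idx = next((i for i, (s, e, _) in enumerate(tokens) if s <= offset < e), None)
    if idx is None:
        return None  # the click landed on whitespace or outside the text

    start = idx
    while start > 0 and is_prefixal(tokens[start - 1][2]):
        start -= 1

    end = idx
    while is_prefixal(tokens[end][2]) and end + 1 < len(tokens):
        end += 1

    return "-".join(tok for _, _, tok in tokens[start:end + 1])


# ASSUMPTION: toy predicate built from the examples in this issue; a real one
# would be driven by the prefix classes or by the FST.
DEMO_PREFIXAL = {unicodedata.normalize("NFC", t) for t in ["nikî", "nitawi", "kâ", "kî"]}


def demo_is_prefixal(token):
    return unicodedata.normalize("NFC", token) in DEMO_PREFIXAL


text = "kâ kî awâsisîwiyân nikî nitawi kiskinwahamâkosin"
clicked_preverb = expand_click(text, text.index("nitawi"), demo_is_prefixal)
clicked_stem = expand_click(text, text.index("awâsisîwiyân"), demo_is_prefixal)
```

Here clicked_preverb comes out as 'nikî-nitawi-kiskinwahamâkosin' and clicked_stem as 'kâ-kî-awâsisîwiyân', i.e. the hyphenated forms that could then be passed on to the analyzer. Shorter candidates (dropping leftmost tokens one at a time) would probably also need to be tried, since a short token to the left may be an independent word.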

Moreover, we will probably need to modify the FST later on so that it recognizes the prefixal elements, which will then be marked with the +Err/Frag tag along with some pertinent morphological information. This will be necessary for checking the correctness of verb constructions when the prefixal elements are written separated from the stem by spaces.
