Use white space as separator in certain rules #1798

yaindrop · 2022-05-05T03:23:12Z

yaindrop
May 5, 2022

Beginner question here: if I want to use whitespace as the separator in certain rules, do I have to remove it from the Lexer.SKIPPED group and handle it everywhere else?

For example, a syntax like this:

Identifier0 {
    Identifier1 arg0 arg1

    Identifier2 {
        Identifier3 arg0 arg1 arg2
    }
}

There are two kinds of statements: inline and multi-line. Statements are separated by line breaks. Whitespace is the separator between the arguments of an inline statement, as in "Identifier1 arg0 arg1", but it should be skipped elsewhere, such as in the indentation. What's the best way to achieve this effect? Thank you!

Answered by NaridaL


NaridaL · 2022-05-05T05:33:55Z

NaridaL
May 5, 2022
Collaborator

With whitespace skipped, the token stream for your example starts with: IDENT LBRACE IDENT IDENT IDENT IDENT LBRACE. Nothing in it separates arg1 from Identifier2 on the following line.

One thing that would probably work is a gate which checks whether the next token is on the same line as the previous one.
You will probably need to ignore ambiguities somewhere.

// Parse identifiers as args while they are on the same line as the previous token.
// LA(0) is the last consumed token, LA(1) is the next token.
this.MANY({
  GATE: () => this.LA(0).endLine === this.LA(1).startLine,
  DEF: () => {
    this.CONSUME(IDENTIFIER)
  }
});

Another approach, if most of the newlines are relevant but spaces and tabs are not, would be to have your WHITESPACE token NOT match "\n" and be skipped. "\n" would be a separate token which you don't skip. You might still need to handle it explicitly in a couple of places where it could otherwise be ignored, but there should be fewer of them.
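
To illustrate the second approach, here is a minimal, library-agnostic sketch rather than Chevrotain code. It assumes a hand-rolled tokenizer; the names `tokenize`, `Token`, and `TokenKind` are hypothetical. Spaces and tabs are skipped, while "\n" is emitted as its own NEWLINE token so that the grammar can use it as a statement separator.

```typescript
// Hypothetical token kinds for the example syntax in the question.
type TokenKind = "IDENT" | "LBRACE" | "RBRACE" | "NEWLINE";

interface Token {
  kind: TokenKind;
  image: string;
  startLine: number;
}

function tokenize(text: string): Token[] {
  // The skipped whitespace pattern deliberately does NOT match "\n".
  const rules: Array<{ kind: TokenKind | "SKIP"; pattern: RegExp }> = [
    { kind: "SKIP", pattern: /^[ \t\r]+/ },
    { kind: "NEWLINE", pattern: /^\n/ },
    { kind: "LBRACE", pattern: /^\{/ },
    { kind: "RBRACE", pattern: /^\}/ },
    { kind: "IDENT", pattern: /^[A-Za-z_][A-Za-z0-9_]*/ },
  ];
  const tokens: Token[] = [];
  let offset = 0;
  let line = 1;
  while (offset < text.length) {
    const rest = text.slice(offset);
    let matched = false;
    for (const rule of rules) {
      const m = rule.pattern.exec(rest);
      if (m === null) continue;
      if (rule.kind !== "SKIP") {
        // Only non-skipped tokens reach the parser.
        tokens.push({ kind: rule.kind, image: m[0], startLine: line });
      }
      if (rule.kind === "NEWLINE") line++;
      offset += m[0].length;
      matched = true;
      break;
    }
    if (!matched) throw new Error("Unexpected character at offset " + offset);
  }
  return tokens;
}

// Usage: the line break survives as a token, the spaces do not.
const kinds = tokenize("Identifier1 arg0 arg1\nIdentifier2 {\n}").map((t) => t.kind);
console.log(kinds.join(" "));
```

With the newline kept as a token, an inline statement can be described as an identifier followed by arguments up to the next NEWLINE, and rules that do not care about line breaks have to consume or tolerate NEWLINE explicitly.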

2 replies

yaindrop May 6, 2022
Author

Thank you for the answer! I can now use lookahead and gates to require tokens to be on the same line or on different lines, and to require whitespace between them.

Another question emerges, though: how can I tell whether a parsing error is due to a failed gate predicate? Currently the error message still says something like "Expecting token sequences [A, B, C, ...] but found X", while X actually matches one of the sequences and it is only the gate that rejects it.

I wrote a playground example here

The workaround I can think of is to record a lookup table of all positions where a gate fails. Is there a more efficient method? Thanks.
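
The lookup-table workaround could be sketched roughly as follows. This is plain TypeScript and not part of any parser library's API; `GateFailureLog`, `check`, and `reasonAt` are hypothetical names, and the tokens are assumed to carry a `startOffset` and a `startLine`. Each gate is wrapped so that a rejection is remembered under the offset of the rejected token, and the error reporting code can later ask whether a gate rejected the offending token.

```typescript
interface PositionedToken {
  image: string;
  startOffset: number;
  startLine: number;
}

// Hypothetical helper: remembers why a gate rejected the token at a given offset.
class GateFailureLog {
  private failures: Record<number, string> = {};

  // Evaluates the predicate and records the reason if it returns false.
  check(next: PositionedToken, reason: string, predicate: () => boolean): boolean {
    const ok = predicate();
    if (!ok) {
      this.failures[next.startOffset] = reason;
    }
    return ok;
  }

  // Returns the recorded reason for the token at this offset, if any.
  reasonAt(offset: number): string | undefined {
    return this.failures[offset];
  }

  // Call before each parse so stale entries do not leak into later errors.
  clear(): void {
    this.failures = {};
  }
}

// Usage: a same-line gate rejects a token that starts on a later line.
const log = new GateFailureLog();
const prev: PositionedToken = { image: "arg1", startOffset: 30, startLine: 2 };
const next: PositionedToken = { image: "Identifier2", startOffset: 40, startLine: 4 };

const accepted = log.check(next, "argument must be on the same line", () => {
  return prev.startLine === next.startLine;
});

const recorded = log.reasonAt(next.startOffset);
const hint = recorded !== undefined ? recorded : "no gate rejected this token";
console.log(accepted, hint);
```

One caveat under this design: a gate may be evaluated speculatively, so an entry in the log does not by itself mean an error occurred. It is only meaningful when looked up for the token that an actual parsing error reports.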

bd82 May 9, 2022
Maintainer

related: #219

bd82 · 2022-05-09T21:13:58Z

bd82
May 9, 2022
Maintainer

> Beginner question here: if I want to use whitespace as the separator in certain rules, do I have to remove it from the Lexer.SKIPPED group and handle it everywhere else?

It is best to avoid handling whitespace everywhere if possible as that ends up being very verbose.

> There are two kinds of statements: inline and multi-line. Statements are separated by line breaks. Whitespace is the separator between the arguments of an inline statement, as in "Identifier1 arg0 arg1", but it should be skipped elsewhere, such as in the indentation. What's the best way to achieve this effect? Thank you!

Does the presence or absence of the whitespace change the meaning during parsing?
If it is just a formatting / cosmetic issue you could treat it as a linting error instead of a syntactic error.

Can you identify the situations where the whitespace is meaningful during (or after) lexing, and inject additional tokens into the token vector before the parsing phase?
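
As a rough illustration of injecting tokens after lexing, the sketch below post-processes an already lexed token list. It is library-agnostic; `LexedToken`, `injectStatementEnds`, and the synthetic `STATEMENT_END` kind are hypothetical names. It also assumes one particular rule for when a line break is meaningful: an identifier followed by a token on a different line ends an inline statement.

```typescript
interface LexedToken {
  kind: string;
  image: string;
  startLine: number;
}

// Inserts a synthetic STATEMENT_END token wherever an identifier is followed
// by a token on a different line (assumed rule, see the text above).
function injectStatementEnds(tokens: LexedToken[]): LexedToken[] {
  const result: LexedToken[] = [];
  let previous: LexedToken | undefined;
  for (const current of tokens) {
    if (
      previous !== undefined &&
      previous.kind === "IDENT" &&
      previous.startLine !== current.startLine
    ) {
      result.push({ kind: "STATEMENT_END", image: "", startLine: previous.startLine });
    }
    result.push(current);
    previous = current;
  }
  // An inline statement at the very end of the input is closed as well.
  if (previous !== undefined && previous.kind === "IDENT") {
    result.push({ kind: "STATEMENT_END", image: "", startLine: previous.startLine });
  }
  return result;
}

// Usage: tokens as a whitespace-skipping lexer might produce them for
//   Identifier1 arg0 arg1
//   Identifier2 {
//     Identifier3 arg0
//   }
const lexed: LexedToken[] = [
  { kind: "IDENT", image: "Identifier1", startLine: 1 },
  { kind: "IDENT", image: "arg0", startLine: 1 },
  { kind: "IDENT", image: "arg1", startLine: 1 },
  { kind: "IDENT", image: "Identifier2", startLine: 2 },
  { kind: "LBRACE", image: "{", startLine: 2 },
  { kind: "IDENT", image: "Identifier3", startLine: 3 },
  { kind: "IDENT", image: "arg0", startLine: 3 },
  { kind: "RBRACE", image: "}", startLine: 4 },
];
const withEnds = injectStatementEnds(lexed).map((t) => t.kind);
console.log(withEnds.join(" "));
```

The parser then only has to require STATEMENT_END after an inline statement, and whitespace can stay in the skipped group everywhere else.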

0 replies
