docs: fix issue with headers in json parser example #165

docs/content/jsonParserExample.md
1 Introduction
----------------
FsLexYacc is a solution which allows you to define and generate a lexer and a parser. It is made of two parts: FsLex (a lexer generator) and FsYacc (a parser generator). Parsing is a two-phase process. In the first phase the lexer analyzes text and creates a stream of tokens (or a token list). In the second phase the parser goes through the tokens in the stream and generates output (a syntax tree).

FsYacc (the parser generator) is decoupled from FsLex (the lexer generator) and can accept a token stream from any lexer. The generated parser will require a function which can translate the tokens produced by the lexer into the tokens configured in the parser. To avoid pointless extra work when you use FsYacc with FsLex, define the parser first. This way FsYacc will generate a union type defining all required tokens, and you can then use this union type to generate tokens inside the lexer. So despite the fact that lexing happens before parsing, we will define the parser first. It also means that in your F# project the generated parser must come before the lexer in the file list, as sketched below.
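A sketch of the resulting compile order in the .fsproj (the file names are the ones used later in this tutorial):

    <Compile Include="JsonValue.fs" />
    <Compile Include="Parser.fs" />   <!-- generated by FsYacc; must come before the lexer -->
    <Compile Include="Lexer.fs" />    <!-- generated by FsLex; uses the token type defined in Parser.fs -->
    <Compile Include="Program.fs" />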

2 Syntax tree
-----------------
Create a new F# library project and install the FsLexYacc package:

    PM> Install-Package FsLexYacc
We will start by describing the syntax tree. It will be the result of parsing a text. Add a new file called ``JsonValue.fs`` and paste the union type definition below:

    type JsonValue =
    | Assoc of (string * JsonValue) list
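Only the first case is visible in this diff. Judging by the cases the parser produces later in this tutorial (objects, lists, strings, numbers, booleans and null), the full definition is presumably:

    type JsonValue =
    | Assoc of (string * JsonValue) list
    | Bool of bool
    | Float of float
    | Int of int
    | List of JsonValue list
    | Null
    | String of string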
Nothing fancy here. ``Assoc`` is simply an object which contains a list of properties (name and value pairs).
3 Parser definition
----------------------

In your project root directory (not the solution root) create a file called ``Parser.fsy``. Now edit the project file (.fsproj). Find the line where ``JsonValue.fs`` is included, then add this XML element:

    <FsYacc Include="Parser.fsy">
      <OtherFlags>--module Parser</OtherFlags>
    </FsYacc>
Reload/open the project and add the code below to ``Parser.fsy``:
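Most of the grammar is collapsed in this diff view, so what follows is a reconstruction based on the walkthrough below, laid out so that the line numbers referenced in the text (``%start`` on line 6, tokens on lines 8-21, ``%type`` on line 23, the entry rule on line 27, ``value`` on line 33) land where the text says:

    %{
    open JsonParsing

    %}

    %start start

    %token <int> INT
    %token <float> FLOAT
    %token <string> ID
    %token <string> STRING
    %token TRUE
    %token FALSE
    %token NULL
    %token LEFT_BRACE
    %token RIGHT_BRACE
    %token LEFT_BRACK
    %token RIGHT_BRACK
    %token COLON
    %token COMMA
    %token EOF

    %type <JsonValue option> start

    %%

    start: prog { $1 }

    prog:
      | EOF { None }
      | value { Some $1 }

    value:
      | LEFT_BRACE object_fields RIGHT_BRACE { Assoc $2 }
      | LEFT_BRACK array_values RIGHT_BRACK { List $2 }
      | STRING { String $1 }
      | INT { Int $1 }
      | FLOAT { Float $1 }
      | TRUE { Bool true }
      | FALSE { Bool false }
      | NULL { Null }

    object_fields: rev_object_fields { List.rev $1 }

    rev_object_fields:
      | { [] }
      | STRING COLON value { [($1, $3)] }
      | rev_object_fields COMMA STRING COLON value { ($3, $5) :: $1 }

    array_values:
      | { [] }
      | rev_values { List.rev $1 }

    rev_values:
      | value { [$1] }
      | rev_values COMMA value { $3 :: $1 }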
This file describes the parsing rules and tokens. When you build the project, a new file called ``Parser.fs`` will be created in the project root directory. Include it in your project.

Let's look closer at the parser definition (``Parser.fsy``). At the very top of the file there is a section for ``open`` statements. You can open any namespace or module you want; we need only ``JsonParsing``, where we defined the ``JsonValue`` structure. All open statements should be between ``%{`` and ``%}``.

Next, in line 6 we state what the entry rule for our parser is. ``start`` will be the name of the rule which is exposed by the parsing module. It has to correspond to the name of a rule (in our case the one on line 27).

Lines 8-21 define the list of tokens. ``INT``, ``FLOAT`` and ``STRING`` carry a value with them.

Then in line 23 we define what the result of parsing will be. In our case it will be a JSON syntax tree defined by means of ``JsonValue``. The type is actually ``JsonValue option``: for an empty text/file the parser will return ``None``.

Everything that follows ``%%`` is parsing rules. Rules are made of a list of productions. You can reference one rule from another, which allows you to define nested parsers (we will use this for objects and lists).

Our entry point (line 27) simply calls the ``prog`` rule and returns its result. Everything between curly braces is ordinary F# code.

``prog`` (line 29) has two productions. The first says that if the first token is ``EOF``, return ``None``. The second says: execute the ``value`` rule and return ``Some`` containing its result. Productions are processed one after another, from top to bottom.

On line 33 we define the main rule, which will parse JSON values. Each rule has a name and a list of productions. Each production starts with a token pattern and contains a result block ``{}``, which says what should be returned if the pattern is matched. The result block contains ordinary F# code.

Rules can reference each other, like on line 34, where the pattern says: match a left brace, whatever is matched by the ``object_fields`` rule, and a right brace. Now what is this ``Assoc $2``? ``Assoc`` is the ``JsonValue`` union case we created, and ``$2`` corresponds to the value matched at the second position in the pattern, which in this case is the list of properties.

If you look at the ``object_fields`` rule, you'll notice it actually calls ``rev_object_fields`` and reverses the order of the results. ``rev_object_fields`` matches the list of properties, but it collects them in the wrong order. The first production says that if the token stream is empty (there is nothing between the braces) then return an empty list. We match an empty stream by not providing any pattern. The second production is more interesting. It says that if we encounter a string, then a colon, and anything that matches the ``value`` rule, we should return a one-element list containing a pair: the first element is the matched string and the second is the matched value. This production is used for objects that have one property, or for the last property on the list (there is no comma at the end). The third production contains two references to other rules: first we match any list of properties, then a comma, a string, a colon and ``value``. The result is a list whose head is the matched property (tokens at positions 3-5) and whose tail is made of the other matched properties. This production covers all properties which are followed by a comma. The rules for arrays are very similar (even simpler).
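To make the accumulate-then-reverse dance concrete, here is a hypothetical trace of how these rules handle the object ``{"a": 1, "b": 2}``:

    // rev_object_fields first matches:   "a" COLON value                          -> [("a", Int 1)]
    // the third production then matches: rev_object_fields COMMA "b" COLON value  -> ("b", Int 2) :: [("a", Int 1)]
    // object_fields finally applies List.rev:
    //   List.rev [("b", Int 2); ("a", Int 1)]  =  [("a", Int 1); ("b", Int 2)]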

That's it, we can now start building the lexer. You should be able to compile the project and see the generated parser in ``Parser.fs``. It looks ugly, but fortunately we will only ever deal with the grammar description.

4 Lexer
-----------

In your project root directory (not the solution root) create a file called ``Lexer.fsl``. Now edit the project file (.fsproj). Find the line where ``JsonValue.fs`` is included, then add this element:

    <FsLex Include="Lexer.fsl">
      <OtherFlags>--unicode</OtherFlags>
    </FsLex>
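Reload/open the project and add the lexer definition to ``Lexer.fsl``. The listing is collapsed in this diff, so the following is a sketch consistent with the walkthrough below (helper functions ``lexeme`` and ``newline``, named regular expressions ``digit``/``frac``/``exp``/``float``, and the rules ``read`` and ``read_string``); the line numbers cited in the text refer to the original file and may be off by a line or two against this sketch:

    {
    module Lexer

    open FSharp.Text.Lexing
    open Parser   // the token type generated by FsYacc

    let lexeme = LexBuffer<_>.LexemeString

    let newline (lexbuf: LexBuffer<_>) =
        lexbuf.StartPos <- lexbuf.StartPos.NextLine
    }

    let int = ['-' '+']? ['0'-'9']+
    let digit = ['0'-'9']
    let frac = '.' digit*
    let exp = ['e' 'E'] ['-' '+']? digit+
    let float = '-'? digit* frac? exp?
    let white = [' ' '\t']+
    let newline = '\r' | '\n' | "\r\n"

    rule read =
        parse
        | white     { read lexbuf }
        | newline   { newline lexbuf; read lexbuf }
        | int       { INT (int (lexeme lexbuf)) }
        | float     { FLOAT (float (lexeme lexbuf)) }
        | "true"    { TRUE }
        | "false"   { FALSE }
        | "null"    { NULL }
        | '"'       { read_string "" false lexbuf }
        | '{'       { LEFT_BRACE }
        | '}'       { RIGHT_BRACE }
        | '['       { LEFT_BRACK }
        | ']'       { RIGHT_BRACK }
        | ':'       { COLON }
        | ','       { COMMA }
        | eof       { EOF }
        | _         { failwithf "Unexpected char: %s" (lexeme lexbuf) }

    and read_string str ignorequote =
        parse
        | '"'            { if ignorequote then read_string (str + "\\\"") false lexbuf else STRING str }
        | '\\'           { read_string str true lexbuf }
        | [^ '"' '\\']+  { read_string (str + (lexeme lexbuf)) false lexbuf }
        | eof            { failwith "String is not terminated" }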
In the project directory there will be a new file, ``Lexer.fs``. Include it in the project.

The first part of a lexer is simply F# code enclosed in ``{}``. It defines the module, opens namespaces and defines helper functions. The ``lexeme`` function extracts the matched string from the buffer. The ``newline`` function updates the buffer position to skip newline characters. Notice that we open the ``Parser`` module, which contains the union type with the tokens. We will use them in our productions.

Next we have a list of named regular expressions which we can use later. We can also use one regular expression within another by simply referring to it by name. See line 22: the ``float`` expression references the ``digit``, ``frac`` and ``exp`` expressions. A space in an expression means concatenation, or "and then" if you will. So for example the ``frac`` expression at line 20 means: a dot character, then any number of digits. Characters in single quotes are matched literally, for example '-' or '.'. If there are multiple expressions enclosed in ``[]``, at least one of them must be matched. You can also use ranges: ``['0'-'9'] ['a'-'z'] ['A'-'Z']``.
Meaning of repetition patterns:

* ``?`` - 0 or 1
* ``+`` - 1 or more
* ``*`` - 0 or more
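As an example of how these combine, the ``float`` definition from the sketch above reads like this:

    // float = '-'? digit* frac? exp?
    //   '-'?    an optional leading minus                   (0 or 1)
    //   digit*  any number of integer digits                (0 or more)
    //   frac?   an optional ".digits" part
    //   exp?    an optional exponent; the + inside exp requires at least one digit
    // so it accepts, for example:  1   -2   3.14   -0.5e10   2E-3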

After the list of regular expression patterns we have lexing rules. Each rule contains a list of patterns with productions. Same as in the parser, one rule can reference another. You can also pass arguments between rules.

If you look at a production, the left side is a pattern and the right side is F# code which will be executed when the pattern is matched. The F# code must also return a value (the same type of value across all productions within a rule). In our case we're returning tokens, which will later be used by the parser.

The ``read_string`` rule takes two arguments: ``str``, which is an accumulator, and the flag ``ignorequote``, which makes the rule ignore the next quote. This way we can process strings which contain an escaped quote, like ``"abc\"df"``.
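A hypothetical trace of that input against the lexer sketch above:

    // lexing the raw input:  "abc\"df"
    // "    -> read hands off to:       read_string ""        false
    // abc  -> [^ '"' '\\']+ matches:   read_string "abc"     false
    // \    -> '\\' matches:            read_string "abc"     true   (flag set, backslash held back)
    // "    -> flag is true:            read_string "abc\""   false  (escaped quote appended, flag reset)
    // df   -> [^ '"' '\\']+ matches:   read_string "abc\"df" false
    // "    -> flag is false:           returns STRING "abc\"df"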

You should be able to build the project and preview the generated lexer in the ``Lexer.fs`` file.

5 Program
-------------

The last piece of the puzzle is to write a program which will use the parser. Look at the code below. The ``parse`` function takes JSON text (a string), parses it and returns the syntax tree. You can see the result in the debugger:

    module Program
    open FSharp.Text.Lexing
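The rest of the listing is cut off in this diff; a plausible continuation, assuming the ``Parser.start`` and ``Lexer.read`` entry points from the files above, is:

    let parse text =
        let lexbuf = LexBuffer<char>.FromString text
        Parser.start Lexer.read lexbuf

    let example = parse """{ "name": "FsLexYacc", "tags": ["lexer", "parser"] }"""
    // example evaluates to:
    //   Some (Assoc [("name", String "FsLexYacc");
    //                ("tags", List [String "lexer"; String "parser"])])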