diff --git a/doc/src/README.md b/doc/src/README.md
index a005426..2fdc81f 100644
--- a/doc/src/README.md
+++ b/doc/src/README.md
@@ -1,14 +1,19 @@
# Presentation
-This book will introduce you to parsing and transliteration, using Beans. Beans is written in
-[Rust](https://www.rust-lang.org), and henceforth this book will assume familiarity with this
-language. However, this book makes no assumptions on prior knowledge on parsing techniques. The
-end goal is to allow someone who has never written or used a parser to quickly become productive
-at writing and using parsing libraries.
+This book will introduce you to parsing and transliteration, using
+Beans. Beans is written in [Rust](https://www.rust-lang.org), and
+henceforth this book will assume familiarity with this
+language. However, this book makes no assumptions on prior knowledge
+of parsing techniques. The end goal is to allow someone who has never
+written or used a parser to quickly become productive at writing and
+using parsing libraries.
-Beans aims at being a general-purpose parser and lexer library, providing both enough
-performance so that you should never *need* something faster (even though these options exist),
-and enough expressiveness so that you never get stuck while using your parser. See the
-[tradeoffs](details/tradeoff.md) section for more details.
+Beans aims at being a general-purpose parser and lexer library,
+providing both enough performance so that you should never *need*
+something faster (even though these options exist), and enough
+expressiveness so that you never get stuck while using your
+parser. See the [tradeoffs](details/tradeoff.md) section for more
+details.
-Beans is free and open source, dual licensed MIT or GPL3+, at your choice.
+Beans is free and open source, dual licensed MIT or GPL3+, at your
+choice.
diff --git a/doc/src/concepts/README.md b/doc/src/concepts/README.md
index e4252fe..e9545fc 100644
--- a/doc/src/concepts/README.md
+++ b/doc/src/concepts/README.md
@@ -1,18 +1,22 @@
# Common Concepts
-When parsing with Beans, as with most other similar tools, three steps are performed, in this
-order:
+When parsing with Beans, as with most other similar tools, three steps
+are performed, in this order:
* [Lexing](lexer.md)
* [Parsing](parser.md)
* [Syntax tree building](ast.md)
-The first step, lexing, operates directly on plain text inputs, while the last is in charge of
-producing the abstract syntax tree. For more details on the operations that can be performed on
-the latter, please refer to the [Rewriting the AST Chapter](ast/README.md).
+The first step, lexing, operates directly on plain text inputs, while
+the last is in charge of producing the abstract syntax tree. For more
+details on the operations that can be performed on the latter, please
+refer to the chapter [Rewriting the AST](ast/README.md).
# Simple arithmetic expression
-Throughout the explanation of the core concepts of parsing, some simple grammars will be written
-to allow parsing a language of simple arithmetic expressions, consisting of numbers or binary
-operations (addition, multiplication, subtraction and division) on expressions. All the grammars
-will be available at https://github.com/jthulhu/beans, in the directory `doc/examples/arith`.
+Throughout the explanation of the core concepts of parsing, some
+simple grammars will be written to allow parsing a language of simple
+arithmetic expressions, consisting of numbers or binary operations
+(addition, multiplication, subtraction and division) on
+expressions. All the grammars will be available at
+https://github.com/jthulhu/beans, in the directory
+`doc/examples/arith`.
diff --git a/doc/src/concepts/grammars.md b/doc/src/concepts/grammars.md
index 00e1ef7..b2d294d 100644
--- a/doc/src/concepts/grammars.md
+++ b/doc/src/concepts/grammars.md
@@ -1 +1,2 @@
# Grammars
+
diff --git a/doc/src/concepts/lexer.md b/doc/src/concepts/lexer.md
index a8336cd..aaaa1a8 100644
--- a/doc/src/concepts/lexer.md
+++ b/doc/src/concepts/lexer.md
@@ -2,62 +2,76 @@
## What does a lexer do?
-A lexer performs the initial, important step of grouping together characters that couldn't be
-morphologically split, while removing useless ones. For instance, in most programming languages,
-spaces are only useful to split words, they do not have any intrinsic meaning. Therefore, they
-should be dumped by the lexer, whereas all the characters that form an identifier or a keyword
-should be grouped together to form a single *token*.
+A lexer performs the initial, important step of grouping together
+characters that belong to the same indivisible unit of text, while
+removing useless ones. For instance, in most programming languages,
+spaces are only useful to separate words: they do not have any
+intrinsic meaning. Therefore, they should be dumped by the lexer,
+whereas all the characters that form an identifier or a keyword should
+be grouped together to form a single *token*.
-> Note: a *token*, also called a *terminal symbol* or more shortly a *terminal*, is a minimal
-> span of text of the input with an identified meaning. For instance, any identifier, keyword
-> or operator would be considered a token.
+> Note: a *token*, also called a *terminal symbol* or more shortly a
+> *terminal*, is a minimal span of text of the input with an
+> identified meaning. For instance, any identifier, keyword or
+> operator would be considered a token.
-Both the parser and the lexer in Beans use online algorithms, meaning that they will consume
-their input as they process it. Beans' lexer will consume the input string one unicode character
-at a time. The lexer might backtrack, but this is, in practice, very rare. Non-degenerate
-grammars will never trigger such backtracking.
+Both the parser and the lexer in Beans use online algorithms, in the
+algorithmic sense of the word: they consume their input incrementally,
+as they process it, instead of reading it whole upfront. Beans' lexer
+will consume the input string one Unicode character at a time. The
+lexer might backtrack, but this is, in practice, very
+rare. Non-degenerate grammars will never trigger such backtracking.
-As the lexer reads the input, it will produce tokens. Sometimes (as with whitespace), it will
-discard them. Other times, it might forget what the exact characters where, it will just remember
-which token has been read.
+As the lexer reads the input, it will produce tokens. Sometimes (as
+with whitespace), it will discard them. Other times, it might forget
+what the exact characters were and will just remember which token has
+been read.
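
A rough illustration of this process, as a Python sketch (this is not Beans code; the token names are hypothetical and mirror the arithmetic example developed below):

```python
import re

# Hypothetical token definitions for a small arithmetic language.
TOKEN_SPECS = [
    ("INTEGER", r"\d+"),
    ("ADD", r"\+"),
    ("MULTIPLY", r"\*"),
    ("SPACE", r"\s+"),  # matched, then discarded
]

def lex(text):
    """Consume the input left to right, emitting token names."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_SPECS:
            m = re.match(pattern, text[pos:])
            if m:
                if name != "SPACE":  # whitespace is recognised but dumped
                    tokens.append(name)
                pos += m.end()
                break
        else:
            raise ValueError(f"cannot lex at position {pos}")
    return tokens

print(lex("1 + 2*3"))  # ['INTEGER', 'ADD', 'INTEGER', 'MULTIPLY', 'INTEGER']
```

Note how `1 + 2*3` and `1+2*3` would produce the same token stream: the characters of each token are forgotten, only the token names remain.
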
## Regular expression
-Each terminal in Beans is recognized by matching its associated regular expression. Prior
-knowledge of regular expressions is assumed. Since regular expressions have loads of different
-specifications, here is an exhaustive list of features allowed in Beans regular expressions,
-besides the usual disjunction operator `|`, character classes `[...]` or `[^...]` and repetition
-with `+`, `*` and `?`.
+Each terminal in Beans is recognized by matching its associated
+regular expression. Prior knowledge of regular expressions is
+assumed. Since regular expressions have many different
+specifications, here is an exhaustive list of features allowed in
+Beans regular expressions, besides the usual disjunction operator `|`,
+character classes `[...]` or `[^...]` and repetition with `+`, `*` and
+`?`.
+
+> Note: in the table below, `ϵ` denotes the empty string; a construct
+> that matches `ϵ` consumes no input, it only checks a condition at
+> the current position.
+
| Escaped character | Name | Meaning |
|-------------------|----------------|---------------------------------------------------------------------------|
-| `\b` | Word bounary | matches `ϵ` if the previous or the next character are not word characters |
+| `\b`              | Word boundary  | matches `ϵ` if the previous or the next character is not a word character |
| `\w` | Word character | equivalent to [a-zA-Z0-9] |
| `\t` | Tabulation | matches a tabulation |
-| `\Z` or `\z` | End of file | matches `ϵ` at the end of the line |
+| `\Z` or `\z`      | End of file    | matches `ϵ` at the end of the input |
| `\d` | Digit | equivalent to [0-9] |
| `\n` | Newline | matches an end of line |
| `\s` | Whitespace | matches whatever unicode considers whitespace |
-| | | |
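
These escape sequences behave much like their counterparts in common regex engines; for instance, Python's `re` module can be used to experiment with them (with the caveat that Python's semantics are close but not identical, e.g. Python's `\w` also matches `_`, and its `\Z` matches the end of the string):

```python
import re

# Approximate the table above with Python's regex escapes.
assert re.search(r"\d+", "abc123").group() == "123"        # \d: digits
assert re.search(r"\bword\b", "a word here") is not None   # \b: word boundary
assert re.search(r"end\Z", "the end") is not None          # \Z: end of input
assert re.search(r"\s", "a b").group() == " "              # \s: whitespace
```
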
# Simple arithmetic lexer
-Let's try to write a lexer grammar for the simple arithmetic expression language. Ideally, we
-would like to parse expressions such as `1+2*3`. So let's start by defining an integer token.
-In `arith.lx`, write
+Let's try to write a lexer grammar for the simple arithmetic
+expression language. Ideally, we would like to parse expressions such
+as `1+2*3`. So let's start by defining an integer token. In
+`arith.lx`, write
```beans-lx
INTEGER ::= \d+
```
-Let's get through this first definition. `INTEGER` is the name of the terminal, whereas what is
-on the right side of `::=` is the regular expression used to match it.
+Let's get through this first definition. `INTEGER` is the name of the
+terminal, whereas what is on the right side of `::=` is the regular
+expression used to match it.
-> Note: spaces between `::=` and the start of the regular expression are ignored, but every other
-> space will be taken into account, including trailing ones, which are easy to overlook. If the
-> regular expression starts with a space, you can always wrap it in a singleton class `[ ]`.
+> Note: spaces between `::=` and the start of the regular expression
+> are ignored, but every other space will be taken into account,
+> including trailing ones, which are easy to overlook. If the regular
+> expression starts with a space, you can always wrap it in a
+> singleton class `[ ]`.
-> Note: terminals are always SCREAMING CASED. While this is not very readable nor practical to
-> type, it is coherent with the literature, and will allow you to distinguish between variables
-> (which will be snake_cased), non terminals (which will be Pascal Cased) and terminals later on.
+> Note: terminals are always SCREAMING CASED. While this is not very
+> readable nor practical to type, it is coherent with the literature,
+> and will allow you to distinguish between variables (which will be
+> snake_cased), non terminals (which will be Pascal Cased) and
+> terminals later on.
We can also add the terminals for the four other operators
```beans-lx
@@ -66,7 +80,8 @@ MULTIPLY ::= \*
SUBTRACT ::= -
DIVIDE ::= /
```
-If we were to try to lex a file `input` containing the expression `1+2*3`, we would get
+If we were to try to lex a file `input` containing the expression
+`1+2*3`, we would get
```bash
$ beans lex --lexer arith.lx input
INTEGER
@@ -77,12 +92,14 @@ INTEGER
Error: Could not lex anything in file input, at character 5 of line 1.
$
```
-This is bad for two reasons. The first is, of course, that we get an error. This is because our
-file ended with a newline `\n`, and that there is no terminal that matches it. In fact, we would
-also have a problem if we tried to lex `1 + 2*3`, because no terminal can read spaces. However,
-we also *don't* want to produce any token related to such spaces: `1+2*3` and `1 + 2*3` should
-be lexed indentically. Thus we will introduce a `SPACE` token with the `ignore` flag, telling
-the lexer not to output it. Similarly for `NEWLINE`.
+This is bad for two reasons. The first is, of course, that we get an
+error. This is because our file ended with a newline `\n`, and there
+is no terminal that matches it. In fact, we would also have a problem
+if we tried to lex `1 + 2*3`, because no terminal can read
+spaces. However, we also *don't* want to produce any token related to
+such spaces: `1+2*3` and `1 + 2*3` should be lexed identically. Thus
+we will introduce a `SPACE` token with the `ignore` flag, telling the
+lexer not to output it. Similarly for `NEWLINE`.
```beans-lx
ignore SPACE ::= \s+
ignore NEWLINE ::= \n+
@@ -99,15 +116,16 @@ $
```
Nice!
-However, we now face the second issue: it was probably wise to forget the specific character that
-was lexed to `ADD` or `MULTIPLY`, because we don't care; but we don't want to forget the actual
-integer we lexed. To correct this, we will use regex groups. In `arith.lx`, we will replace the
-definition of `INTEGER` with
+However, we now face the second issue: it was probably wise to forget
+the specific character that was lexed to `ADD` or `MULTIPLY`, because
+we don't care; but we don't want to forget the actual integer we
+lexed. To correct this, we will use regex groups. In `arith.lx`, we
+will replace the definition of `INTEGER` with
```beans-lx
INTEGER ::= (\d+)
```
-This will create a group that will contain everything that `\d+` will match, and this information
-will be passed with the created token.
+This will create a group that will contain everything that `\d+` will
+match, and this information will be passed with the created token.
```bash
$ beans lex --lexer arith.lx input
INTEGER {0: 1}
@@ -118,4 +136,3 @@ INTEGER {0: 3}
$
```
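
Conceptually, this is the same mechanism as capture groups in ordinary regex libraries. A hypothetical Python sketch (not Beans code) of tokens carrying their group's content:

```python
import re

# Sketch: a token now carries its group's content, as in `INTEGER {0: 1}`.
def lex_with_groups(text):
    pattern = r"(?P<INTEGER>\d+)|(?P<ADD>\+)|(?P<MULTIPLY>\*)"
    tokens = []
    for m in re.finditer(pattern, text):
        kind = m.lastgroup
        payload = m.group() if kind == "INTEGER" else None
        tokens.append((kind, payload))
    return tokens

print(lex_with_groups("1+2*3"))
# [('INTEGER', '1'), ('ADD', None), ('INTEGER', '2'), ('MULTIPLY', None), ('INTEGER', '3')]
```
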
We will see in the next section how to manipulate a stream of tokens.
-
diff --git a/doc/src/concepts/parser.md b/doc/src/concepts/parser.md
index ec1c09a..2b23a3c 100644
--- a/doc/src/concepts/parser.md
+++ b/doc/src/concepts/parser.md
@@ -1,12 +1,13 @@
# Parser
-The parser is given a stream of tokens, which is a "flat" representation of the input, in the
-sense that every part of it is at the same leve, and should transform it into a Concrete Syntax
-Tree (CST).
-
-> Note: a CST is a tree whose leaves are terminals, and no inner node is a terminal. It
-> represents a way the input was understood. For instance, given the input `1+2*3`, a CST could
-> be
+The parser is given a stream of tokens, which is a "flat"
+representation of the input, in the sense that every part of it is at
+the same level, and should transform it into a Concrete Syntax Tree
+(CST).
+
+> Note: a CST is a tree whose leaves are terminals, and no inner node
+> is a terminal. It represents a way the input was understood. For
+> instance, given the input `1+2*3`, a CST could be
> ```
> Expression
> ┌───────┘│└───────┐
@@ -16,12 +17,13 @@ Tree (CST).
> ```
> Inner nodes of a syntax tree are called *non terminals*.
- In a Concrete Syntax Tree, every single token is remembered. This can be annoying,
-as we usually want to forget tokens: if a given token held some information, we can extract that
-information before dumping the token, but having the actual token is not very useful.
+In a Concrete Syntax Tree, every single token is remembered. This can
+be annoying, as we usually want to forget tokens: if a given token
+held some information, we can extract that information before dumping
+the token, but having the actual token is not very useful.
-After having pruned the CST from tokens (while still having extracted the useful information),
-we get an Abstract Syntax Tree (AST).
+After having pruned the tokens from the CST (while still having
+extracted the useful information), we get an Abstract Syntax Tree
+(AST).
> The AST for the input `1+2*3` might look like
> ```
@@ -31,28 +33,33 @@ we get an Abstract Syntax Tree (AST).
> ┌───────┘ └───────┐
> 2 3
> ```
-> All tokens have disappeared. From `INTEGER` tokens, we have extracted the actual number that
-> was lexed, and we have remember that each `Expression` corresponds to a certain operation, but
-> the token corresponding to that operation has also been forgotten.
+> All tokens have disappeared. From `INTEGER` tokens, we have
+> extracted the actual number that was lexed, and we have remembered
+> that each `Expression` corresponds to a certain operation, but the
+> token corresponding to that operation has been forgotten.
-Similarly to terminals, non terminals are defined by telling Beans how to recognise them. Regulax
-expressions, however, are not powerful enough for general parsing. Therefore, non terminals use
-production rules instead.
+Similarly to terminals, non terminals are defined by telling Beans how
+to recognise them. Regular expressions, however, are not powerful
+enough for general parsing. Therefore, non terminals use production
+rules instead.
# Production rules
-Production rules are at the core of the recognition and syntax-tree building steps of every
-parser, but there are several (equivalent) ways of understanding them. These different point of
-view in turn produce very different parsing algorithms.
+Production rules are at the core of the recognition and syntax-tree
+building steps of every parser, but there are several (equivalent)
+ways of understanding them. These different points of view in turn
+produce very different parsing algorithms.
## Production rules as recognisers (bottom-up)
-A production rule is of the form `A -> A_1 ... A_n`, and means that the non terminal `A` can be
-recognised if `A_1` through `A_n` where recognised before.
+A production rule is of the form `A -> A_1 ... A_n`, and means that
+the non terminal `A` can be recognised if `A_1` through `A_n` were
+recognised before.
-For instance, for our simple arithmetic expression language, we could define a single non
-terminal `Expression` with the following production rules
+For instance, for our simple arithmetic expression language, we could
+define a single non terminal `Expression` with the following
+production rules
```
Expression -> Expression ADD Expression
Expression -> Expression MULTIPLY Expression
@@ -60,93 +67,115 @@ terminal `Expression` with the following production rules
Expression -> Expression DIVIDE Expression
Expression -> INTEGER
```
-This matches the definition of an expression we gave earlier
-> [expressions are] numbers or binary operations (addition, multiplication, subtraction and
-> division) on expressions.
-
-On the input `1+2*3`, which has been lexed to `INTEGER ADD INTEGER MULTIPLY INTEGER` (note that,
-at this step, we don't care about information that tokens hold, such as the actual value of an
-integer; these don't come into play when doing a syntaxic analysis), a parser could analyze it
-in the following way.
-
-1. Every `INTEGER` token is a valid `Expression`, so we can replace them by the `Expression` non
- terminal.
- We get `Expression ADD Expression MULTIPLY Expression`.
-> The operation of finding a place in the input that matches the right-hand side of a production
-> rule and replacing it with its non terminal on the left-hand size is called a *reduction*.
-> The place in the input where the reduction occurs is called a *handle*.
-2. `Expression MULTIPLY Expression` is a *handle* for `Expression`, so we *reduce it*.
- We get `Expression ADD Expression`.
-3. Finally, `Expression ADD Expression` is a handle `Expression` too, so after reduction we have
- left `Expression`.
-
-Here, our recognition ends successfully: the input `1+2*3` is an arithmetic expression, or at
-least according to our definition.
+
+This matches the definition of an expression we gave earlier, in the
+introduction to this chapter:
+> [expressions are] numbers or binary operations (addition,
+> multiplication, subtraction and division) on expressions.
+
+On the input `1+2*3`, which has been lexed to `INTEGER ADD INTEGER
+MULTIPLY INTEGER` (note that, at this step, we don't care about
+information that tokens hold, such as the actual value of an integer;
+these don't come into play when doing a syntactic analysis), a parser
+could analyze it in the following way.
+
+1. Every `INTEGER` token is a valid `Expression`, so we can replace
+ them by the `Expression` non terminal. We get `Expression ADD
+ Expression MULTIPLY Expression`.
+> The operation of finding a place in the input that matches the
+> right-hand side of a production rule and replacing it with its non
+> terminal on the left-hand side is called a *reduction*. The place
+> in the input where the reduction occurs is called a *handle*.
+2. `Expression MULTIPLY Expression` is a *handle* for `Expression`, so
+ we *reduce it*. We get `Expression ADD Expression`.
+3. Finally, `Expression ADD Expression` is a handle for `Expression`
+   too, so after reduction we are left with `Expression`.
+
+Here, our recognition ends successfully: the input `1+2*3` is an
+arithmetic expression, at least according to our definition.
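
The three steps above can be sketched as a toy reducer in Python (an illustration only, not Beans' actual algorithm); the rule order encodes which handle we prefer:

```python
# Rules in priority order, mirroring the walkthrough: integers first,
# then the multiplicative handle, then the additive one.
RULES = [
    ["INTEGER"],
    ["Expression", "MULTIPLY", "Expression"],
    ["Expression", "ADD", "Expression"],
]

def reduce_once(symbols):
    """Find the first handle and reduce it to `Expression`, or return None."""
    for handle in RULES:
        n = len(handle)
        for i in range(len(symbols) - n + 1):
            if symbols[i:i + n] == handle:
                return symbols[:i] + ["Expression"] + symbols[i + n:]
    return None

symbols = ["INTEGER", "ADD", "INTEGER", "MULTIPLY", "INTEGER"]
step = reduce_once(symbols)
while step is not None:
    symbols = step
    step = reduce_once(symbols)
print(symbols)  # ['Expression']
```

Recognition succeeds when the whole input collapses to the single non terminal.
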
There are several things to note on this example.
-> Note 1: at step 2., an `Expression` could have been recognised in different places in the
-> partially-recognised input `Expression ADD Expression MULTIPLY Expression`. These recognition
-> point are called *handles*. There is a very important difference between choose
-> `Expression ADD Expression` as then handle to perform the recognition of `Expression`, and
-> choosing `Expression MULTIPLY Expression`, because one would end up with a tree that matches
-> the parenthesing of `(1+2)*3` and the other `1+(2*3)`. If we were to, say, evaluate these
-> expression, we wouldn't get the same result.
-> So, for this grammar, Beans would have to choose between which rule to apply, and this decision
-> is crucial in the result. We will see later on how to instruct Beans to apply the "good" rule
-> (which, in this case, would be the one that leads to parsing as `1+(2*3)`, if we want to apply
-> the usual operator precedence).
-
-> Note 2: in this example, we have limited ourselves to recognise the input, not trying to parse
-> it. It wouldn't be too hard to expand our current "algorithm" to remember which reductions have
-> been applied, and in turn construct a syntax tree from that information, but we won't try to
-> do this *yet*.
-
-The order in which we have recognised the input is called "bottom-up", because we have started
-with the terminals, and iteratively replaced them with non terminals, ending up with a single
-non terminal (if the recognition goes well). Since in the end we want to produce a syntax tree,
-and that in a tree, the root is at the top, whereas the leaves are at the bottom, we have
-effectively traversed that tree start from the bottom all the way up. But we could have done the
-opposite...
+> Note 1: at step 2., an `Expression` could have been recognised in
+> different places in the partially-recognised input `Expression ADD
+> Expression MULTIPLY Expression`. These recognition points are called
+> *handles*. There is a very important difference between choosing
+> `Expression ADD Expression` as the handle to perform the recognition
+> of `Expression`, and choosing `Expression MULTIPLY Expression`,
+> because in the first case we would end up with a tree that matches
+> the parenthesization of `(1+2)*3`; in the second one we would obtain
+> `1+(2*3)`. If we were to, say, evaluate these expressions, we
+> wouldn't get the same result. So, for this grammar, Beans would
+> have to choose which rule to apply, and this decision is crucial to
+> the result. We will see later on how to instruct Beans to apply the
+> "good" rule (which, in this case, would be the one that leads to
+> parsing as `1+(2*3)`, if we want to apply the usual operator
+> precedence).
+
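A quick arithmetic check, independent of Beans, of why the choice of handle matters:

```python
# The two possible groupings of 1+2*3 evaluate to different results.
add_first = (1 + 2) * 3  # `Expression ADD Expression` reduced first
mul_first = 1 + (2 * 3)  # `Expression MULTIPLY Expression` reduced first
print(add_first, mul_first)  # 9 7
assert add_first != mul_first
```
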
+> Note 2: in this example, we have limited ourselves to recognising
+> the input, not trying to parse it. It wouldn't be too hard to expand
+> our current "algorithm" to remember which reductions have been
+> applied, and in turn construct a syntax tree from that information,
+> but we won't try to do this *yet*.
+
+The order in which we have recognised the input is called "bottom-up",
+because we have started with the terminals, and iteratively replaced
+them with non terminals, ending up with a single non terminal (if the
+recognition goes well). Since in the end we want to produce a syntax
+tree, and since, in a tree, the root is at the top whereas the leaves
+are at the bottom, we have effectively traversed that tree starting
+from the bottom all the way up. But we could have done the opposite...
## Production rules as generators (top-down)
-So far, it might not be clear why production rules are called as such, when we have happily been
-using them as recognition rules instead; even the arrow seems in the wrong direction: when we
-apply a reduction, we transform the right-hand side into the left-hand side of a rule.
-
-Now, we will see that production rules can be used instead to *generate* valid expressions.
-Starting with a single non terminal `Expression`, we can *expand* it to
-`Expression ADD Expression` using the corresponding production rule. The first `Expression`
-can further be expanded to `INTEGER`, using the last rule, to get `INTEGER ADD Expression`.
-If we expand `Expression` with the multiplication rule, we get
-`INTEGER ADD Expression MULTIPLY Expression`. Again, by expanding all `Expression`s with
-`INTEGER`, we get `INTEGER ADD INTEGER MULTIPLY INTEGER`. Notice that this correponds to the
-input `1+2*3`, and so `1+2*3` is a valid expression!
-
-This "algorithm" might seem a little weird at first, because we have too many choices! In the
-previous one, we had only one choice, and by taking the "wrong" option we could have ended with
-the wrong parenthesing *if we decided to build a syntax tree*. Otherwise, both options were ok.
-Here, we had to choose which `Expression` to expand at each step and, more importantly, which
-rule to apply for the expansion. Note that we could easily have blocked ourselves by expanding
-`Expression` to `INTEGER` right away, or we could have kept expanding forever, only ever applying
-the `Expression -> Expression ADD Expression` rule.
-
-While this seems a lot more complicated than its bottom-up counterpart, top-down algorithms are
-usually much easier to program, mainly because it often suffices to look at a few tokens to
-"guess" what the right expansion is at any moment.
-
-Correspondingly to the bottom-up strategy, if we were to look at how we traverse the syntax
-tree while building it, this strategy would actually start by examining the root of the tree,
-and we would be visiting the leaves at the end, so we would be traversing the tree top-down.
+So far, it might not be clear why production rules are called as such,
+when we have happily been using them as recognition rules instead;
+even the arrow seems in the wrong direction: when we apply a
+reduction, we transform the right-hand side into the left-hand side of
+a rule.
+
+Now, we will see that production rules can be used instead to
+*generate* valid expressions. Starting with a single non terminal
+`Expression`, we can *expand* it to `Expression ADD Expression` using
+the corresponding production rule. The first `Expression` can further
+be expanded to `INTEGER`, using the last rule, to get `INTEGER ADD
+Expression`. If we expand `Expression` with the multiplication rule,
+we get `INTEGER ADD Expression MULTIPLY Expression`. Again, by
+expanding all `Expression`s with `INTEGER`, we get `INTEGER ADD
+INTEGER MULTIPLY INTEGER`. Notice that this corresponds to the input
+`1+2*3`, and so `1+2*3` is a valid expression!
+
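This derivation can be sketched as repeated expansion of the leftmost `Expression` (a toy illustration, not Beans' algorithm; the sequence of rule choices is scripted by hand):

```python
# Production rules, used here as generators.
RULES = {
    "add": ["Expression", "ADD", "Expression"],
    "mul": ["Expression", "MULTIPLY", "Expression"],
    "int": ["INTEGER"],
}

def expand_leftmost(symbols, rule):
    """Replace the leftmost `Expression` with the rule's right-hand side."""
    i = symbols.index("Expression")
    return symbols[:i] + RULES[rule] + symbols[i + 1:]

symbols = ["Expression"]
for choice in ["add", "int", "mul", "int", "int"]:
    symbols = expand_leftmost(symbols, choice)
print(symbols)  # ['INTEGER', 'ADD', 'INTEGER', 'MULTIPLY', 'INTEGER']
```

A different script of choices would generate a different sentence; the derivation succeeds when it reproduces exactly the lexed input.
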
+This "algorithm" might seem a little weird at first, because we have
+too many choices! In the previous one, we had only one choice, and by
+taking the "wrong" option we could have ended with the wrong
+parenthesing *if we decided to build a syntax tree*. Otherwise, both
+options were ok. Here, we had to choose which `Expression` to expand
+at each step and, more importantly, which rule to apply for the
+expansion. Note that we could easily have blocked ourselves by
+expanding `Expression` to `INTEGER` right away, or we could have kept
+expanding forever, only ever applying the `Expression -> Expression
+ADD Expression` rule.
+
+While this seems a lot more complicated than its bottom-up
+counterpart, top-down algorithms are usually much easier to implement,
+mainly because it often suffices to look at a few tokens to "guess"
+what the right expansion is at any moment.
+
+Mirroring the bottom-up strategy, if we look at how this strategy
+traverses the syntax tree while building it, we see that it starts by
+examining the root of the tree and visits the leaves last: it
+traverses the tree top-down.
## Production rules in Beans
-Before going further, let's try to write a parser grammar for Beans to recognise simple
-arithmetic expressions. Beans' syntax is a little different from production rules, because the
-parser does not only recognise, it also tries to build up a syntax tree; since we are not (yet)
-interested in doing that, we will ignore some syntax quirks that will appear. In `arith.gr`,
-write
+Before going further, let's try to write a parser grammar for Beans to
+recognise simple arithmetic expressions. Beans' syntax is a little
+different from production rules, because the parser does not only
+recognise, it also tries to build up a syntax tree; since we are not
+(yet) interested in doing that, we will ignore some syntax quirks that
+will appear. In `arith.gr`, write
```beans-gr
@Expression ::=
Expression ADD Expression <>
@@ -155,37 +184,43 @@ write
Expression DIVIDE Expression <>
INTEGER <>;
```
-We have define the non terminal `Expression` with five production rules. Each production rule
-ends with `<>` (you can ignore this for now), and the whole definition ends with a semicolon.
-
-Furthermore, `Expression` is tagged with `@`, which means it's an *axiom non-terminal*, or, in
-other words, it's the non terminal we are allowed to start from in a top-down stategy. Since we
-only have a single non-terminal for now, this isn't very important (but don't forget it, or it
-won't work!).
+We have defined the non terminal `Expression` with five production
+rules. Each production rule ends with `<>` (you can ignore this for
+now), and the whole definition ends with a semicolon.
+
+Furthermore, `Expression` is tagged with `@`, which means it's an
+*axiom non-terminal*, or, in other words, it's the non terminal we are
+allowed to start from in a top-down strategy. Since we only have a
+single non-terminal for now, this isn't very important (but don't
+forget it, or it won't work!).
```bash
$ beans parse --lexer arith.lx --parser arith.gr input
-AST
+Expression
$
```
-Yay! It works. Well, the output isn't very impressive, because Beans prints the syntax tree we
-have produced, but we currently have no rules that manipulate the syntax tree, and in particular
-we don't add any node or leaves to it.
+Yay! It works. Well, the output isn't very impressive, because Beans
+prints the syntax tree we have produced, but we currently have no
+rules that manipulate the syntax tree, and in particular we don't add
+any nodes or leaves to it.
-You can also try it on wrong inputs, for example `1+2*` or `1+2*3 4` to check it fails as it
-should.
+You can also try it on wrong inputs, for example `1+2*` or `1+2*3 4`
+to check it fails as it should.
# Building a syntax tree
-Checking if a string is a valid arithmetic expression is a bit boring. We would like to get more
-information than just a certain string is valid or not. Furthermore, as pointed earlier, our
-grammar is currently ambiguous, meaning that the expression `1+2*3` could be understood in two
-different ways, and it would be interesting to see how Beans solves that ambiguity.
-
-To do so, we need to expand our grammar a little bit. First of all, we might want to bind
-expressions that we use to recognise further expressions. For instance, when we have a
-`Expression ADD Expression` and we recognise an `Expression` there, we would like to remember
-the two sub expressions. To do so, we will add `@name` to every element of a rule that we would
-like to remember under the name `name`.
+Checking if a string is a valid arithmetic expression is a bit
+boring. We would like to get more information than just whether a
+certain string is valid or not. Furthermore, as pointed out earlier, our
+grammar is currently ambiguous, meaning that the expression `1+2*3`
+could be understood in two different ways, and it would be interesting
+to see how Beans solves that ambiguity.
+
+To do so, we need to expand our grammar a little bit. First of all, we
+might want to bind expressions that we use to recognise further
+expressions. For instance, when we have an `Expression ADD Expression`
+and we recognise an `Expression` there, we would like to remember the
+two sub expressions. To do so, we will add `@name` to every element of
+a rule that we would like to remember under the name `name`.
```beans-gr
@Expression ::=
Expression@left ADD Expression@right <>
@@ -195,20 +230,22 @@ like to remember under the name `name`.
INTEGER@value <>;
```
-As said in the introduction to this chapter, the goal is also to extract information from tokens,
-and then dump these. The only token that holds some information is `INTEGER`, which has a single
-group (labeled `0`). We can therefore bind that group, instead of the whole token, by accessing
-it with a field-like syntax.
+As said in the introduction to this chapter, the goal is also to
+extract information from tokens, and then dump it. The only token
+that holds some information is `INTEGER`, which has a single group
+(labeled `0`). We can therefore bind that group, instead of the whole
+token, by accessing it with a field-like syntax.
```beans-gr
@Expression ::=
...
INTEGER.0@value <>;
```
-Finally, we need to remember what kind of expression each expression is. This is very similar to
-naming variants of enumerations: here, each rule bound to `Expression` is a constructor of
-`Expression`, and when we will match on `Expression`, we will need to distinguish how that
-particular instance of `Expression` was constructed.
+Finally, we need to remember what kind of expression each expression
+is. This is very similar to naming variants of enumerations: here,
+each rule bound to `Expression` is a constructor of `Expression`, and
+when we match on `Expression`, we will need to distinguish how
+that particular instance of `Expression` was constructed.
```beans-gr
@Expression ::=
Expression@left ADD Expression@right
@@ -218,6 +255,8 @@ particular instance of `Expression` was constructed.
INTEGER.0@value ;
```
+NB: My tree is printed in reverse order: first the lefts, then the rights
+
Let's see what the tree looks like now.
```bash
$ beans parse --lexer arith.lx --parser arith.gr input
@@ -231,38 +270,46 @@ Expression(Mult)
└─ value: 2
$
```
-Well, it works, but... if you stare at that syntax tree long enough, you'll realize that it was
-parsed like `(1+2)*3`, not like `1+(2*3)`. We will see in the next section how to solve this
-issue, and how ambiguity is handled in general.
+Well, it works, but... if you stare at that syntax tree long enough,
+you'll realize that it was parsed like `(1+2)*3`, not like
+`1+(2*3)`. We will see in the next section how to solve this issue,
+and how ambiguity is handled in general.
# Ambiguity
-A grammar is said to be *ambiguous* when there exists an input that can be parsed in two
-different ways, that is, there are two *derivation tree* for that input. Most of the time, an
-ambiguity in the grammar is symptomatic of a semantic ambiguity, that is, the language that we
-are trying to parse is somehow ill defined.
-
-This is the case, for instance, of our simple arithmetic expressions. Our plain-text, intuitive
-definition of what is an arithmetic expression is *bad* because it doesn't say which of `(1+2)*3`
-or `1+(2*3)` should be understood when reading `1+2*3`, that is, it contains no operator priority
-information. But it also lacks something else, as we will see.
-
-> Note: one might wonder why Beans did not report this. After all, if it's often an actual mistake to
-> define an ambiguous grammar, it would make sense for Beans to at least warn you about that. In
-> fact, there is some work being done in that direction, but there is a fundamental issue:
-> ambiguity is undecidable, that is, there *can't exist* an algorithm which, given a grammar, tells
-> us whether it's ambiguous or not.
-> Actually, Beans will perform much better if the grammar in unambiguous, and even better if it
-> belongs to a more restrictive class of grammars called `LR(k)`. If you have ever used tools
-> like Bison, Menhir or Yacc, and you are trying to port grammars from them to Beans, good new!
-> These tools force your grammars to be in such a restricted class (to have excellent
+A grammar is said to be *ambiguous* when there exists an input that
+can be parsed in two different ways, that is, there are two
+*derivation trees* for that input. Most of the time, an ambiguity in
+the grammar is symptomatic of a semantic ambiguity, that is, the
+language that we are trying to parse is somehow ill defined.
+
+This is the case, for instance, of our simple arithmetic
+expressions. Our plain-text, intuitive definition of an arithmetic
+expression is *bad* because it doesn't say which of
+`(1+2)*3` or `1+(2*3)` should be understood when reading `1+2*3`, that
+is, it contains no operator priority information. But it also lacks
+something else, as we will see.
+
+> Note: one might wonder why Beans did not report this. After all, if
+> it's often an actual mistake to define an ambiguous grammar, it
+> would make sense for Beans to at least warn you about that. In fact,
+> there is some work being done in that direction, but there is a
+> fundamental issue: ambiguity is undecidable, that is, there *can't
+> exist* an algorithm which, given a grammar, tells us whether it's
+> ambiguous or not. Actually, Beans will perform much better if the
+> grammar is unambiguous, and even better if it belongs to a more
+> restrictive class of grammars called `LR(k)`. If you have ever used
+> tools like Bison, Menhir or Yacc, and you are trying to port
+> grammars from them to Beans, good news! These tools force your
+> grammars to be in that restricted class (for the sake of
> performance), and so will also lead to fast parsing with Beans.
## Priority
-The first issue is operator priority. Beans has a very simple rule to determine priority: rules
-that come first have higher priority. So, simply moving the division and multiplication rules up
-will patch our example:
+The first issue is operator priority. Beans has a very simple rule to
+determine priority: rules that come first have higher priority. So,
+simply moving the division and multiplication rules up will patch our
+example:
```beans-gr
@Expression ::=
Expression@left MULTIPLY Expression@right
@@ -282,11 +329,13 @@ Expression(Add)
└─ right: Expression(Literal)
└─ value: 3
```
-Much better! However, we now have a more subtle issue. Usually, multiplication and division have
-the same priority, and the leftmost operator is chosen (same for addition and subtraction).
-However, as is, multiplication will be prioritized over division: `1/2*3` should be parsed
-`(1/2)*3` but will be parsed as `1/(2*3)`. To solve this, we need to merge the multiplication
-and division rules, by introducing other non terminals.
+Much better! However, we now have a more subtle issue. Usually,
+multiplication and division have the same priority, and the leftmost
+operator is chosen (same for addition and subtraction). However, as
+is, multiplication will be prioritized over division: `1/2*3` should
+be parsed as `(1/2)*3` but will be parsed as `1/(2*3)`. To solve this, we
+need to merge the multiplication and division rules, by introducing
+other non terminals.
```beans-gr
@Expression ::=
@@ -331,8 +380,13 @@ $
```
They correspond, respectively, to `(1/2)*3` and `(1*2)/3`. Victory!
-> Note that makes the information a little more nested, which is fine for now, but will make some
-> pretty ugly pattern matching in the future. In fact, this technique produces some artifacts of
+> Note that this makes the information a little more nested, which is fine
+> for now, but will make some pretty ugly pattern matching in the
+> future. In fact, this technique produces some artifacts of
+
+of?..
+
+And what is the next example?
```beans-gr
@Expression ::=
diff --git a/doc/src/details/tradeoff.md b/doc/src/details/tradeoff.md
index 07c1423..999d131 100644
--- a/doc/src/details/tradeoff.md
+++ b/doc/src/details/tradeoff.md
@@ -1,37 +1,44 @@
# Tradeoffs
-Several tradeoffs have been made while developping Beans. You can find here some I remembered to
-write down.
+Several tradeoffs have been made while developing Beans. Here are
+some of the ones I remembered to write down.
# Scannerless parsing
-There are some parsers, called
-[scannerless parsers](https://en.wikipedia.org/wiki/Scannerless_parsing), that do not rely on a
-lexer. Indeed, a parser is *more powerful* than a lexer, meaning that anything
-that a lexer could do, a parser could also do. So, in fact, one might wonder why Beans bothers
-having a lexer at all. There are several reasons for this.
+There are some parsers, called [scannerless
+parsers](https://en.wikipedia.org/wiki/Scannerless_parsing), that do
+not rely on a lexer. Indeed, a parser is *more powerful* than a lexer,
+meaning that anything that a lexer could do, a parser could also
+do. So, in fact, one might wonder why Beans bothers having a lexer at
+all. There are several reasons for this.
## Performance
-The first reason for this separation is *performance*. Parsers could do what lexers do, but
-because lexing is simpler than parsing, there is more space for specific optimizations. In fact,
-Beans ships its own regex library which is tailored for the lexing use case.
+The first reason for this separation is *performance*. Parsers could
+do what lexers do, but because lexing is simpler than parsing, there
+is more space for specific optimizations. In fact, Beans ships its own
+regex library which is tailored for the lexing use case.
## Error reporting
-Usually, lexing errors are very much different than syntax errors. It's
-quite rare to encounter a lexing error in practice, because it's quite hard to write invalid
-token. This means that lexing errors should be reported differently than syntax errors, and
-this would be harder (if not impossible) in scannerless parsers.
+Usually, lexing errors are very different from syntax
+errors. It's quite rare to encounter a lexing error in practice,
+because it's quite hard to write invalid tokens. This means that lexing
+errors should be reported differently than syntax errors, and this
+would be harder (if not impossible) in scannerless parsers.
-An other aspect to be taken into account is that parsers may have a recovery mode, which triggers
-when encountering a syntax error. In this special mode, the parser cannot fully understand the
-input but will try to guess how to correct the input so that it can provide better user
-feedback. This is much easier to perform if the parser works on tokens, rather than characters.
+Another aspect to be taken into account is that parsers may have a
+recovery mode, which triggers when encountering a syntax error. In
+this special mode, the parser cannot fully understand the input but
+will try to guess how to correct the input so that it can provide
+better user feedback. This is much easier to perform if the parser
+works on tokens, rather than characters.
## Logical separation
-Parsing and lexing are two logically distinct steps, even though there is quite some interleaving
-in Beans. Having them kept as different steps make it easier to debug a grammar one is writing,
-as it's easier to see what happens step by step, where each step is easier than the whole parsing
-operation.
+Parsing and lexing are two logically distinct steps, even though there
+is quite some interleaving in Beans. Having them kept as different
+steps makes it easier to debug a grammar one is writing, as it's easier
+to see what happens step by step, where each step is simpler than the
+whole parsing operation.
diff --git a/doc/src/getting-started/README.md b/doc/src/getting-started/README.md
index 68c0e0d..76d9429 100644
--- a/doc/src/getting-started/README.md
+++ b/doc/src/getting-started/README.md
@@ -1,8 +1,8 @@
# Getting Started
-In order to get started with beans, you will need to following:
+In order to get started with Beans, you will need the following:
* having Beans installed as a helper tool
* learning Beans' concepts
- * write an interface between what other parts of your program expect your parser to give, and what
- Beans actually provides you
+ * writing an interface between the output the rest of your program
+   expects from your parser, and what Beans actually provides
diff --git a/doc/src/getting-started/compile.md b/doc/src/getting-started/compile.md
index 8f4ca31..8aa1bd9 100644
--- a/doc/src/getting-started/compile.md
+++ b/doc/src/getting-started/compile.md
@@ -1,37 +1,52 @@
# Compilation
-Beans can be used in two ways, which are very much related but, in practice, will require entierly
-different approaches.
-
-Beans can be used as a on-the-fly parser-generator, meaning that you expect your
-end user to give you a grammar for a language they just though of, and you have to parse files
-written in that language. This is mainly useful for
-[domain-specific languages](https://en.wikipedia.org/wiki/Domain-specific_language). An example of
-this is Beans itself, which has to parse the grammars you feed it. Since this aspect of Beans is not
-as mature as the other one, it's not the one this book will focus on.
-
-The other purpose of Beans is to be used as a regular parser-generator (think
-[Yacc](https://en.wikipedia.org/wiki/Yacc), [Bison](https://fr.wikipedia.org/wiki/GNU_Bison),
-[Menhir](http://gallium.inria.fr/~fpottier/menhir/), ...). The main difference is that, unlike these tools, Beans will
-never generate Rust code to be compiled alongside your code. Instead, it does its own compilation: it
-compiles a grammar to a binary blob, which is then included in the final binary. This means that you
-need to compile Beans grammars "by hand", using `beans`. `beans` is also useful for debugging
-purposes, as it can give you helpful insights or advices on your grammars.
+Beans can be used in two ways, which are very much related but, in
+practice, will require entirely different approaches.
+
+Beans can be used as an on-the-fly parser-generator, meaning that the
+program you build with Beans receives, at run time, a grammar for a
+language its end user just thought of, and has to parse files written
+in that language. This is mainly useful for [domain-specific
+languages](https://en.wikipedia.org/wiki/Domain-specific_language). An
+example of this is Beans itself, which has to parse the grammars you
+feed it. Since this aspect of Beans is not as mature as the other one,
+it's not the one this book will focus on.
+
+The other purpose of Beans is to be used as a regular parser-generator
+(think [Yacc](https://en.wikipedia.org/wiki/Yacc),
+[Bison](https://fr.wikipedia.org/wiki/GNU_Bison),
+[Menhir](http://gallium.inria.fr/~fpottier/menhir/), ...). The main
+difference is that, unlike these tools, Beans will never generate Rust
+code to be compiled alongside your code. Instead, it does its own
+compilation: it compiles a grammar to a binary blob, which is then
+included in the final binary. This means that you need to compile
+Beans grammars "by hand", using `beans`. `beans` is also useful for
+debugging purposes, as it can give you helpful insights or advice on
+your grammars.
# The grammars
-Beans contains two kind of grammars: the lexer grammars (extension `.lx`), and the parser grammars (extension `.gr`).
-They are written in different languages, and are compiled separatly, although the parser grammar relies on the lexer
-grammar, because the terminals defined in the lexer grammar are used in the parser grammar.
+Beans contains two kinds of grammars: the lexer grammars (extension
+`.lx`), and the parser grammars (extension `.gr`). They are written
+in different languages, and are compiled separately, although the
+parser grammar relies on the lexer grammar, because the terminals
+defined in the lexer grammar are used in the parser grammar.
## Lexer grammars
-The lexer grammar can be compiled with `beans compile lexer path/to/grammar.lx`. It will produce a binary blob at
+The lexer grammar can be compiled with `beans compile lexer
+path/to/grammar.lx`. It will produce a binary blob at
`path/to/grammar.clx`.
## Parser grammars
-The parser grammar can be compiled with `beans compile parser --lexer path/to/grammar.clx path/to/grammar.gr`. It will
-produce a binary blob at `path/to/grammar.cgr`. Note that we had to provide a lexer grammar (so that Beans can find
-the definitions of the terminals used in the parser grammar), and in this case it was a *compiled* lexer grammar.
-A non-compiled lexer grammar will also be accepted, but the process will be slower because Beans has to interpret it.
+The parser grammar can be compiled with `beans compile parser --lexer
+path/to/grammar.clx path/to/grammar.gr`. It will produce a binary blob
+at `path/to/grammar.cgr`. Note that we had to provide a lexer grammar
+(so that Beans can find the definitions of the terminals used in the
+parser grammar), and in this case it was a *compiled* lexer grammar.
+A non-compiled lexer grammar will also be accepted, but the process
+will be slower because Beans has to interpret it.
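Putting the two steps together, a full compilation session looks like
this (the `path/to/grammar` paths are just placeholders):
```bash
$ beans compile lexer path/to/grammar.lx
$ beans compile parser --lexer path/to/grammar.clx path/to/grammar.gr
```
This leaves the compiled blobs `path/to/grammar.clx` and
`path/to/grammar.cgr` next to their sources, ready to be included in
your final binary.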
diff --git a/doc/src/getting-started/install.md b/doc/src/getting-started/install.md
index da0c3ec..4b6e5f4 100644
--- a/doc/src/getting-started/install.md
+++ b/doc/src/getting-started/install.md
@@ -1,17 +1,22 @@
# Installation
-Using Beans as a library in Rust is done as with any other library: adding it in the dependencies in the Cargo
-manifest is enough. However, as explained in the [Compiling Section](compile.md), a command line tool is also
-required to use Beans. It allows compilation of Beans grammars, some introspection and debugging information.
-There are ways to install `beans`:
- * Installing with [Nix](https://nixos.org). This is the preferred way if you already have nix.
- * Installing with Cargo. This is the preferred way is you don't have nix.
+Using Beans as a library in Rust is done as with any other library:
+adding it to the dependencies in the Cargo manifest is
+enough. However, as explained in the [Compiling Section](compile.md),
+its command line tool is also required to use Beans. It allows
+compilation of Beans grammars, some introspection and debugging
+information. There are several ways to install `beans`:
+ * Installing with [Nix](https://nixos.org). This is the preferred way
+ if you already have nix.
+ * Installing with Cargo. This is the preferred way if you don't have
+ nix.
* Installing manually.
# Nix installation
-Beans is flake-packaged for nix. You can find the appropriate flake at
-[Beans' repo](https://github.com/jthulhu/beans). The actual installation procedure depends on how you use nix.
+Beans is flake-packaged for nix. You can find the appropriate flake at
+[Beans' repo](https://github.com/jthulhu/beans). The actual
+installation procedure depends on how you use nix.
# Cargo installation
@@ -23,15 +28,18 @@ cargo install beans
# Manual compilation and installation
-Beans has three dependencies: Cargo, the latest rustc compiler and make. Optionally, having git makes it easy to
-download the source code.
+Beans has three dependencies: Cargo, the latest rustc compiler and
+make. Optionally, having git makes it easy to download the source
+code.
## Downloading the source code
-If you have git, you can `git clone https://github.com/jthulhu/beans` which will download a copy of the source code
-in the directory `beans`.
+If you have git, you can `git clone https://github.com/jthulhu/beans`
+which will download a copy of the source code in the directory
+`beans`.
-Otherwise, you need to download it from the [git repository](https://github.com/jthulhu/beans).
+Otherwise, you need to download it from the [git
+repository](https://github.com/jthulhu/beans).
## Building and installing the source code
@@ -39,5 +47,9 @@ Once the `beans` directory entered, you simply need to run
```bash
make install
```
-This will install a single binary at `/usr/local/bin/beans`. You can overwrite the target destination using the
-environment variables `DESTDIR` and `PREFIX`.
+This will install a single binary at `/usr/local/bin/beans`. You can
+overwrite the target destination using the environment variable
+`PREFIX`, e.g.:
+```bash
+make PREFIX=$HOME install
+```