Generating Code from Basic Lexer Parser Grammars
ANTLR already has pretty thorough documentation about its grammar syntax and code usage.
Thus, I am not going to try and explain ANTLR’s purpose or grammar syntax here. Instead, I am going to focus on how to use the Ruby antlr3 tools and runtime library to generate and use recognizers written in Ruby.
1. Install the Ruby antlr3 target:

    gem install antlr3

2. Write a grammar, ensuring that language = Ruby; is set in the grammar options:

    /** grammar file Whatevs.g */
    grammar Whatevs;

    options { language = Ruby; }

    a_rule: another_rule+;
    ... // and so on

3. Generate output using the antlr4ruby script that comes with the antlr3 gem:

    antlr4ruby Whatevs.g

You will get several files:
- WhatevsParser.rb: contains a parser class definition
- WhatevsLexer.rb: contains a lexer class definition
- Whatevs.tokens: the token vocabulary produced by your grammar
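
After running step 3, it can be handy to confirm that the expected files actually landed on disk. The following is only a rough sketch: `missing_outputs` is a hypothetical helper, not part of the antlr3 gem, and it assumes the three file names listed above for a grammar named Whatevs. The demonstration uses a scratch directory, so it runs without ANTLR installed.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not part of the antlr3 gem): returns the names of the
# files step 3 is expected to produce that are not present in dir.
def missing_outputs(grammar_name, dir = ".")
  expected = [
    "#{grammar_name}Parser.rb",
    "#{grammar_name}Lexer.rb",
    "#{grammar_name}.tokens"
  ]
  expected.reject { |name| File.exist?(File.join(dir, name)) }
end

# Demonstration against a scratch directory that holds only the lexer file.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "WhatevsLexer.rb"))
  p missing_outputs("Whatevs", dir)
  # prints ["WhatevsParser.rb", "Whatevs.tokens"]
end
```

In a real project you would call `missing_outputs("Whatevs")` from the directory where antlr4ruby wrote its output.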
The files ANTLR generates depend upon the type of grammar contained within the input file. There are four types of ANTLR grammars: lexer, parser, combined, and tree. For example, consider a grammar named Grammar.g. Depending on the type of Grammar.g, invoking ANTLR with the command antlr4ruby Grammar.g produces the files listed below.
For a lexer, parser, or tree grammar:
- Grammar.rb: the recognizer code
- Grammar.tokens: a list of token definitions contained in the grammar

For a combined grammar:
- GrammarLexer.rb: the lexer definition
- GrammarParser.rb: the parser definition
- Grammar.tokens: a list of token definitions contained in the grammar
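
The naming pattern can be summarized in a few lines of Ruby. This is a minimal sketch, assuming that single-type grammars (lexer, parser, or tree) yield the two-file set and that combined grammars yield the three-file set with separate lexer and parser files. `generated_files` is a hypothetical helper for illustration, not something the antlr3 gem provides.

```ruby
# Hypothetical helper: maps a grammar name and grammar type to the output
# file names described above. The type symbols are an illustrative choice.
def generated_files(grammar_name, type)
  case type
  when :combined
    # A combined grammar is split into a lexer file and a parser file.
    ["#{grammar_name}Lexer.rb", "#{grammar_name}Parser.rb", "#{grammar_name}.tokens"]
  when :lexer, :parser, :tree
    # A single-type grammar produces one recognizer file named after the grammar.
    ["#{grammar_name}.rb", "#{grammar_name}.tokens"]
  else
    raise ArgumentError, "unknown grammar type: #{type.inspect}"
  end
end

p generated_files("Grammar", :combined)
# prints ["GrammarLexer.rb", "GrammarParser.rb", "Grammar.tokens"]
p generated_files("Grammar", :tree)
# prints ["Grammar.rb", "Grammar.tokens"]
```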
If a grammar imports other grammars using the import statement, a number of additional files will be generated containing delegate grammars. You don’t typically have to require these files directly, so the particulars of the filenames probably aren’t too significant. However, just to provide a demonstration, consider parser grammars A.g and B.g. If A contains the statement import B;, ANTLR will generate:
- A.tokens: A’s token definitions
- B.tokens: B’s token definitions
- A.rb: the parser for A
- A_B.rb: an implementation of B, which A uses to import the rules contained in B
Since imported grammars can import other grammars, this pattern can be followed through arbitrary chains of grammars. For example, if A imports B, which itself imports grammars C and D, in addition to the files above, ANTLR will also generate A_B_C.rb, A_B_D.rb, C.tokens, and D.tokens.
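
To make the delegate naming concrete, here is a small sketch that derives the expected file names from an import graph. It only mirrors the pattern described above (delegate names joined with underscores along the import chain, plus one .tokens file per grammar). `delegate_files` and the hash-based import graph are hypothetical and not part of the antlr3 gem, and the sketch assumes the graph has no import cycles.

```ruby
# Hypothetical helper: lists the files expected for a root grammar and its
# imports. imports maps a grammar name to the grammars it imports directly.
def delegate_files(root, imports)
  files = ["#{root}.rb", "#{root}.tokens"]

  # Walk the import chain depth-first, extending the underscore-joined prefix.
  walk = lambda do |grammar, prefix|
    imports.fetch(grammar, []).each do |child|
      path = "#{prefix}_#{child}"
      files << "#{path}.rb" << "#{child}.tokens"
      walk.call(child, path)
    end
  end

  walk.call(root, root)
  files
end

p delegate_files("A", { "A" => %w[B], "B" => %w[C D] })
# prints ["A.rb", "A.tokens", "A_B.rb", "B.tokens", "A_B_C.rb", "C.tokens", "A_B_D.rb", "D.tokens"]
```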