
Generating Code from Basic Lexer Parser Grammars

ohboyohboyohboy edited this page Sep 13, 2010 · 9 revisions

ANTLR already has pretty thorough documentation about its grammar syntax and code usage.


Thus, I am not going to try to explain ANTLR’s purpose or grammar syntax here. Instead, I am going to focus on how to use the Ruby antlr3 tools and runtime library to generate and use recognizers written in Ruby.

1. Install the ruby antlr3 target:

gem install antlr3

2. Write a grammar, making sure that language = Ruby; is set in the grammar options:

/**grammar file  Whatevs.g */
grammar Whatevs;

options {
     language = Ruby;
}

a_rule:  another_rule+;
... // and so on

3. Generate output using the antlr4ruby script that comes with the antlr3 gem:

antlr4ruby Whatevs.g

  • You will get several files:
    • WhatevsParser.rb: contains a parser class definition
    • WhatevsLexer.rb: contains a lexer class definition
    • Whatevs.tokens: the token vocabulary produced by your grammar

Generated Files

The files ANTLR generates depend upon the type of the grammar contained within the input file. There are four types of ANTLR grammars: lexer, parser, combined, and tree. For example, consider a grammar named Grammar.g. Depending on the type of Grammar.g, invoking ANTLR with the command antlr4ruby Grammar.g produces the files listed below.

Basic lexer, parser, and tree grammar output files

  • Grammar.rb: the recognizer code
  • Grammar.tokens: a list of token definitions contained in the grammar

Basic combined grammar output files

  • GrammarLexer.rb: the lexer definition
  • GrammarParser.rb: the parser definition
  • Grammar.tokens: a list of token definitions contained in the grammar
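The naming rules above can be summed up in a few lines of Ruby. This is only an illustrative sketch: expected_output_files is a hypothetical helper, not part of the antlr3 gem, and it simply restates the two file lists above so that a build script could check what antlr4ruby is expected to write.

```ruby
# Hypothetical helper (not part of the antlr3 gem): predicts the files
# antlr4ruby is expected to write, following the naming rules listed above.
def expected_output_files(grammar_name, grammar_type)
  case grammar_type
  when :lexer, :parser, :tree
    # Basic grammars: one recognizer file plus the token vocabulary.
    ["#{grammar_name}.rb", "#{grammar_name}.tokens"]
  when :combined
    # Combined grammars: separate lexer and parser files plus the tokens.
    ["#{grammar_name}Lexer.rb", "#{grammar_name}Parser.rb", "#{grammar_name}.tokens"]
  else
    raise ArgumentError, "unknown grammar type: #{grammar_type.inspect}"
  end
end

puts expected_output_files("Grammar", :combined)
```

For a combined grammar named Grammar this lists GrammarLexer.rb, GrammarParser.rb, and Grammar.tokens, matching the list above.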

Imported Grammars

If a grammar imports other grammars using the import statement, a number of additional files containing delegate grammars will be generated. You don’t typically have to require these files directly, so the particulars of the filenames probably aren’t too significant. However, just to provide a demonstration, consider parser grammars A.g and B.g. If A contains the statement import B;, ANTLR will generate:

  • A.tokens: A’s token definitions
  • B.tokens: B’s token definitions
  • A.rb: the parser for A
  • A_B.rb: an implementation of B, which A uses to import the rules contained in B

Since imported grammars can import other grammars, this pattern can be followed through arbitrary chains of grammars. For example, if A imports B, which itself imports grammars C and D, in addition to the files above, ANTLR will also generate A_B_C.rb, A_B_D.rb, C.tokens, and D.tokens.
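To make the naming pattern concrete, here is a rough Ruby sketch that walks an import chain and lists the file names described above. The delegate_files helper and the imports hash are hypothetical and exist only for illustration; the sketch assumes the imports are acyclic and that every grammar is a basic (non-combined) grammar, as in the A and B example.

```ruby
# Hypothetical helper (not part of the antlr3 gem). `imports` maps each
# grammar name to the list of grammars it imports. Assumes no import cycles.
def delegate_files(root, imports)
  files = ["#{root}.rb", "#{root}.tokens"]
  walk = lambda do |grammar, prefix|
    imports.fetch(grammar, []).each do |child|
      # Delegate files are named by joining the import path with underscores.
      path = "#{prefix}_#{child}"
      files << "#{path}.rb" << "#{child}.tokens"
      walk.call(child, path)
    end
  end
  walk.call(root, root)
  files
end

imports = { "A" => ["B"], "B" => ["C", "D"] }
puts delegate_files("A", imports)
```

With A importing B, and B importing C and D, this lists A.rb, A.tokens, A_B.rb, B.tokens, A_B_C.rb, C.tokens, A_B_D.rb, and D.tokens.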
