Kengo's blog

Technical articles about original projects, JVM, Static Analysis and JavaScript.

How to generate a simple parser with ANTLR3

 I found that dynjs depends on ANTLR3. To read the dynjs source I need to learn ANTLR3, so I'm trying it out with Maven 3.


 To use ANTLR3, we have to add one dependency (the ANTLR runtime) and one plugin (antlr3-maven-plugin) to pom.xml. A sample is here.
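
 For reference, a minimal sketch of the relevant pom.xml fragment could look like the following. The version number 3.4 is an assumption for illustration and is not taken from the original sample; pick whichever ANTLR 3.x release your project needs, and keep the runtime and plugin versions aligned.

```xml
<dependencies>
  <!-- Runtime classes that the generated lexer and parser depend on -->
  <dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr-runtime</artifactId>
    <version>3.4</version> <!-- assumed version -->
  </dependency>
</dependencies>

<build>
  <plugins>
    <!-- Generates Java sources from the grammar files in src/main/antlr3 -->
    <plugin>
      <groupId>org.antlr</groupId>
      <artifactId>antlr3-maven-plugin</artifactId>
      <version>3.4</version> <!-- assumed version -->
      <executions>
        <execution>
          <goals>
            <goal>antlr</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

 The test later in this article also needs JUnit 4 and Hamcrest on the test classpath.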


Create a grammar file in src/main/antlr3

 antlr3-maven-plugin reads grammar files from the src/main/antlr3 directory, so we create our grammar file there to generate code from it.
 What I created is here. It can parse "Tomcat runs" as a statement. In practice we need to write both @lexer::header and @parser::header, because they specify the package of the generated code.

grammar Statement;

options {
	output = AST ;
}

@lexer::header {
	package jp.skypencil.antlr;
}

@parser::header {
	package jp.skypencil.antlr;
}
statement : S V;

S: 'Tomcat';

V: 'runs';
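
 To make concrete what this grammar accepts, here is a hand-written plain-Java sketch of an equivalent recognizer. It is not the code ANTLR generates, and the names StatementSketch, TokenType, tokenize and isStatement are hypothetical, chosen only for illustration. It is also slightly stricter than the rule above, because it requires exactly two tokens.

```java
import java.util.ArrayList;
import java.util.List;

public class StatementSketch {
    /** Token types mirroring the lexer rules S and V in the grammar. */
    enum TokenType { S, V, UNKNOWN }

    /** Splits on whitespace and classifies each word, roughly what a lexer does. */
    static List<TokenType> tokenize(String input) {
        List<TokenType> tokens = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields no tokens
            }
            switch (word) {
                case "Tomcat": tokens.add(TokenType.S); break;
                case "runs":   tokens.add(TokenType.V); break;
                default:       tokens.add(TokenType.UNKNOWN);
            }
        }
        return tokens;
    }

    /** Mirrors the parser rule `statement : S V;` (exactly one S followed by one V). */
    public static boolean isStatement(String input) {
        List<TokenType> tokens = tokenize(input);
        return tokens.size() == 2
                && tokens.get(0) == TokenType.S
                && tokens.get(1) == TokenType.V;
    }

    public static void main(String[] args) {
        System.out.println(isStatement("Tomcat runs")); // prints "true"
        System.out.println(isStatement("Jetty runs"));  // prints "false"
    }
}
```

 The sketch shows how narrow the grammar is: each lexer rule matches a single literal word, so any other subject or verb is rejected.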

Test the generated code

 Finally, running `mvn clean verify` generates the parser and lexer in the target/generated-sources/antlr3 directory. Let's test them with JUnit 4.
 What I wrote is here. This test asserts that our parser can parse "Tomcat runs" without a syntax error.

@Test
public void testTomcatRuns() throws RecognitionException {
	CharStream input = new ANTLRStringStream("Tomcat runs");
	StatementLexer lexer = new StatementLexer(input);
	// The token stream has to be fed by the lexer
	CommonTokenStream tokens = new CommonTokenStream(lexer);
	StatementParser parser = new StatementParser(tokens);

	// Invoke the start rule; nothing is parsed until a rule method is called
	parser.statement();

	assertThat(parser.getNumberOfSyntaxErrors(), is(0));
}

We've succeeded in testing our generated code, but it isn't useful yet because:

  • We cannot use other words as input.
  • We cannot access parsed words via the generated classes.

Writing a more flexible grammar and writing a tree walker are good solutions to these problems. I'll write new articles about these approaches.