Kengo's blog

Technical articles about original projects, JVM, Static Analysis and TypeScript.

How to generate simple parser with ANTLR3

I found that dynjs depends on ANTLR3. I need to learn ANTLR3 to read dynjs, so I'm trying it out with Maven 3.

pom.xml

To use ANTLR3, we have to add one dependency and one plugin to pom.xml. Here is a sample:

<dependencies>
	<dependency>
		<groupId>org.antlr</groupId>
		<artifactId>antlr-runtime</artifactId>
		<version>3.4</version>
	</dependency>
	...
</dependencies>
<build>
	<plugins>
		<plugin>
			<groupId>org.antlr</groupId>
			<artifactId>antlr3-maven-plugin</artifactId>
			<version>3.4</version>
			<executions>
				<execution>
					<goals>
						<goal>antlr</goal>
					</goals>
				</execution>
			</executions>
		</plugin>
		...
	</plugins>
</build>

Create a grammar file in src/main/antlr3

antlr3-maven-plugin reads grammar files from the src/main/antlr3 directory, so we have to create a grammar file there to generate our code. Save it as Statement.g, since ANTLR3 expects the file name to match the grammar name.
Here is the one I created. It parses "Tomcat runs" as a statement. I think we basically have to write @lexer::header and @parser::header, because they specify the package of the generated lexer and parser.

grammar Statement;

options {
	output = AST ;
}

@lexer::header {
	package jp.skypencil.antlr;
}

@parser::header {
	package jp.skypencil.antlr;
}

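statement : S V;

S: 'Tomcat';

V: 'runs';

// Send whitespace to the hidden channel so "Tomcat runs" lexes as S and V without a lexer error.
WS: (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; };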
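ANTLR3 turns a grammar like this into a lexer and a recursive-descent parser. To make the idea concrete, here is a hand-written Java sketch of roughly what the grammar describes. It is not the code ANTLR3 generates, and it assumes whitespace between the two words is simply skipped. The names StatementSketch, TokenType, tokenize and Parser are hypothetical; only statement() and getNumberOfSyntaxErrors() mirror names used by the real generated parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; this is NOT the code ANTLR3 generates.
final class StatementSketch {

    enum TokenType { S, V }

    // Lexer: matches the literals 'Tomcat' (S) and 'runs' (V), skipping whitespace.
    static List<TokenType> tokenize(String input) {
        List<TokenType> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (Character.isWhitespace(input.charAt(i))) {
                i++; // whitespace produces no token
            } else if (input.startsWith("Tomcat", i)) {
                tokens.add(TokenType.S);
                i += "Tomcat".length();
            } else if (input.startsWith("runs", i)) {
                tokens.add(TokenType.V);
                i += "runs".length();
            } else {
                throw new IllegalArgumentException("no token matches at index " + i);
            }
        }
        return tokens;
    }

    // Parser: one method per grammar rule, as in a recursive-descent parser.
    static final class Parser {
        private final List<TokenType> tokens;
        private int pos = 0;
        private int syntaxErrors = 0;

        Parser(List<TokenType> tokens) {
            this.tokens = tokens;
        }

        // statement : S V ;
        void statement() {
            match(TokenType.S);
            match(TokenType.V);
        }

        // Consume the expected token; otherwise count an error without consuming.
        private void match(TokenType expected) {
            if (pos < tokens.size() && tokens.get(pos) == expected) {
                pos++;
            } else {
                syntaxErrors++;
            }
        }

        int getNumberOfSyntaxErrors() {
            return syntaxErrors;
        }
    }

    public static void main(String[] args) {
        Parser parser = new Parser(tokenize("Tomcat runs"));
        parser.statement();
        System.out.println(parser.getNumberOfSyntaxErrors()); // prints 0
    }
}
```

match() records an error and keeps going instead of throwing. That loosely follows how the generated parser reports a syntax error and continues by default, which is why a test checks getNumberOfSyntaxErrors() rather than expecting an exception. ANTLR3's real error recovery is more elaborate than this.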

Test the generated code

Finally, running `mvn clean verify` generates the parser and lexer in the target/generated-sources/antlr3 directory. Let's test them with JUnit 4.
Here is what I wrote. This test asserts that our parser can parse "Tomcat runs" without syntax errors.

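package jp.skypencil.antlr;

import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.junit.Test;

// Test class name is arbitrary; place it under src/test/java/jp/skypencil/antlr.
public class StatementParserTest {
	@Test
	public void testTomcatRuns() throws RecognitionException {
		CharStream input = new ANTLRStringStream("Tomcat runs");
		StatementLexer lexer = new StatementLexer(input);
		// CommonTokenStream buffers the tokens produced by the lexer for the parser.
		CommonTokenStream tokens = new CommonTokenStream(lexer);
		StatementParser parser = new StatementParser(tokens);
		parser.statement(); // invoke the start rule
		assertThat(parser.getNumberOfSyntaxErrors(), is(0));
	}
}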

We've succeeded in testing our generated code, but it isn't useful yet, because:

  • We cannot use any words other than "Tomcat" and "runs" as input.
  • We cannot access the parsed words via the generated classes.

Writing a more flexible grammar and writing a tree walker are good solutions to these problems. I'll write new articles about these approaches.
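To show what lifting both limitations could look like, here is a hand-written Java sketch. It is not ANTLR output and not a preview of the follow-up articles. The lexer turns any run of letters into a word instead of matching the fixed literals 'Tomcat' and 'runs', and statement() returns the subject and verb so the caller can read them. The names FlexibleStatementSketch and Statement, and the idea of a generic WORD token, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; this is NOT code generated by ANTLR3.
final class FlexibleStatementSketch {

    // What statement() hands back, so callers can read the parsed words.
    static final class Statement {
        final String subject;
        final String verb;

        Statement(String subject, String verb) {
            this.subject = subject;
            this.verb = verb;
        }
    }

    // Lexer: any run of letters becomes a word (a hypothetical WORD token); whitespace is skipped.
    static List<String> tokenize(String input) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (!Character.isLetter(c)) {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
            int start = i;
            while (i < input.length() && Character.isLetter(input.charAt(i))) {
                i++;
            }
            words.add(input.substring(start, i));
        }
        return words;
    }

    // Parser for "statement : WORD WORD ;" that returns the matched words.
    static Statement statement(List<String> words) {
        if (words.size() != 2) {
            throw new IllegalArgumentException("expected 2 words but got " + words.size());
        }
        return new Statement(words.get(0), words.get(1));
    }

    public static void main(String[] args) {
        Statement s = statement(tokenize("Jetty starts"));
        System.out.println(s.subject + " / " + s.verb); // prints "Jetty / starts"
    }
}
```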

Reference