
Kengo's blog

Technical articles about original projects, JVM, Static Analysis and JavaScript.

How to generate simple parser with ANTLR3

java Maven

 I found that dynjs depends on ANTLR3. To read the dynjs source I need to learn ANTLR3, so I tried using it with Maven 3.

pom.xml

 To use ANTLR3, we need to add one dependency (the ANTLR runtime) and one plugin (the code generator) to pom.xml. Here is a sample.

<dependencies>
	<dependency>
		<groupId>org.antlr</groupId>
		<artifactId>antlr-runtime</artifactId>
		<version>3.4</version>
	</dependency>
	...
</dependencies>
<build>
	<plugins>
		<plugin>
			<groupId>org.antlr</groupId>
			<artifactId>antlr3-maven-plugin</artifactId>
			<version>3.4</version>
			<executions>
				<execution>
					<goals>
						<goal>antlr</goal>
					</goals>
				</execution>
			</executions>
		</plugin>
		...
	</plugins>
</build>

Create a grammar file in src/main/antlr3

 By default, antlr3-maven-plugin reads grammar files from the src/main/antlr3 directory, so we create a grammar file there to generate our code.
 Here is the one I created. It can parse "Tomcat runs" as a statement. I think we basically always have to write @lexer::header and @parser::header, because they specify the package of the generated lexer and parser.

grammar Statement;

options {
	output = AST ;
}

@lexer::header {
	package jp.skypencil.antlr;
}

@parser::header {
	package jp.skypencil.antlr;
}

statement : S V;

S: 'Tomcat';

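V: 'runs';

// Hide whitespace from the parser; without this rule the lexer
// reports an error for the space in "Tomcat runs".
WS: (' ' | '\t' | '\r' | '\n')+ { $channel=HIDDEN; };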
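
To make the grammar easier to picture, here is a hand-written sketch in plain Java of what the lexer rules S and V and the parser rule statement express. It is an illustration only, not the code ANTLR generates: the class name StatementSketch and its methods are made up for this example, and splitting the input on whitespace is a simplifying assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; this is NOT the code ANTLR generates.
// StatementSketch, tokenize and isStatement are hypothetical names.
final class StatementSketch {
    // Token types mirroring the lexer rules S and V in the grammar.
    enum TokenType { S, V }

    // Lexer step: turn the input into tokens.
    // Assumption: words are separated by whitespace, which is dropped.
    static List<TokenType> tokenize(String input) {
        List<TokenType> tokens = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.equals("Tomcat")) {
                tokens.add(TokenType.S);
            } else if (word.equals("runs")) {
                tokens.add(TokenType.V);
            } else {
                throw new IllegalArgumentException("unexpected input: " + word);
            }
        }
        return tokens;
    }

    // Parser step: the rule `statement : S V;` accepts exactly S followed by V.
    static boolean isStatement(List<TokenType> tokens) {
        return tokens.size() == 2
                && tokens.get(0) == TokenType.S
                && tokens.get(1) == TokenType.V;
    }
}
```

With this sketch, StatementSketch.isStatement(StatementSketch.tokenize("Tomcat runs")) is true, while "runs Tomcat" tokenizes fine but is rejected by the parser step, and any other word fails already in the lexer step.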

Test generated code

 Finally, run `mvn clean verify` to generate the lexer and parser; the generated sources are written to the target/generated-sources/antlr3 directory. Let's test them with JUnit 4.
 Here is the test I wrote. It asserts that our parser can parse "Tomcat runs" without any syntax errors.

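package jp.skypencil.antlr; // same package as the generated lexer and parser

import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.junit.Test;

public class StatementParserTest { // the class name is arbitrary
	@Test
	public void testTomcatRuns() throws RecognitionException {
		CharStream input = new ANTLRStringStream("Tomcat runs");
		StatementLexer lexer = new StatementLexer(input);
		CommonTokenStream tokens = new CommonTokenStream(lexer);
		StatementParser parser = new StatementParser(tokens);

		parser.statement(); // run the entry rule of the grammar
		assertThat(parser.getNumberOfSyntaxErrors(), is(0));
	}
}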

We've succeeded in testing our generated code, but it isn't useful yet because:

  • We cannot use other words as input.
  • We cannot access the parsed words through the generated classes.

Writing a more flexible grammar and writing a tree walker are good solutions to these problems. I'll write new articles about these approaches.
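
Until then, here is a rough sketch in plain Java of what fixing both points would mean: the subject and verb are matched by a pattern instead of fixed literals, and the result exposes the parsed words. It does not use ANTLR and is not what a flexible grammar or a tree walker would look like; the names ParsedStatement and FlexibleStatementSketch, and the rule that a subject is a capitalized word and a verb is a lowercase word, are assumptions made only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only; it does not use ANTLR.
// ParsedStatement and FlexibleStatementSketch are hypothetical names.
final class ParsedStatement {
    final String subject;
    final String verb;

    ParsedStatement(String subject, String verb) {
        this.subject = subject;
        this.verb = verb;
    }
}

final class FlexibleStatementSketch {
    // Assumption for this sketch: a subject is a capitalized word
    // and a verb is a lowercase word, separated by whitespace.
    private static final Pattern STATEMENT =
            Pattern.compile("\\s*([A-Z][a-z]*)\\s+([a-z]+)\\s*");

    // Returns the parsed words, or throws if the input is not a statement.
    static ParsedStatement parse(String input) {
        Matcher matcher = STATEMENT.matcher(input);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("not a statement: " + input);
        }
        return new ParsedStatement(matcher.group(1), matcher.group(2));
    }
}
```

For example, FlexibleStatementSketch.parse("Jetty stops") gives a result whose subject is "Jetty" and whose verb is "stops", which is the kind of access the generated parser in this article does not give us yet.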

Reference