Choosing the Right Library for Parsing Calculation Formulae in Kotlin: Replicating an Existing Implementation

Introduction

This is Ong Jack Hoe, a software engineer at Money Forward, Inc.

Background

The story begins with the need to replicate an existing grammar specification that was originally written on a RoR (Ruby on Rails) stack with the RACC gem. Its purpose is to parse expressions related to calculations, conditional checks such as IF conditions, and additional syntax for placeholders/variables that stand in for actual values. (For example, the placeholder placeholder/item/2 refers to a value that must be retrieved, and the parser performs the calculation with that referenced value.)

For the experiments below, we will use a simple calculator expression and grammar definition. This article focuses on the decision-making process and on the steps and results of the tests performed.

What are the acceptance criteria?

So what characteristics do we need to consider when choosing a library for this purpose? Let's start with an analogy.

  1. Let's say you are deciding between two cars to purchase. What will the criteria be? You will most likely prioritize how the car looks, so appearance can be one criterion. Coming back to libraries, choosing the right one means choosing one whose syntax looks good. That is rather subjective, so to put it in better terms: the library whose syntax looks closest to the existing syntax. This is an important criterion as it allows:
  • Easier understanding of the grammar specification from the syntax itself
  • Easier comparison with the existing implementation when replicating its grammar logic
  • Easier addition of new grammar definitions
  2. The next criterion is performance (the benchmark setup is described later in this article). Referencing the car example again: which car runs faster or is more fuel efficient? Most of the time, achieving the same outcome with faster execution means the implementation is more efficient. This allows:
  • Potentially lower resources required to perform the operation
  • Higher throughput under heavy load

One could argue that performance doesn't matter, and there are many articles making that case. For example, even if you own a supercar, the bottleneck will still be traffic jams, so the car's top speed is wasted potential. But think of it this way: given a supercar and another car with identical characteristics except that the supercar goes faster, which do you choose? Admittedly, this is not a perfect illustration.

  3. Which library is better maintained and more widely used? This is to ensure that the chosen library has been tried and tested, making critical breaking bugs unlikely. It is not easy to evaluate this criterion directly, but one can look at the following indicators:
  • Commit count (age of the repository)
  • Number of issues (this reflects the popularity of the library; it may seem odd to treat a high issue count as a good sign, but a more widely used library will attract more edge-case reports and feature requests)
  • Age of the last release

What are the library candidates?

We narrowed the candidates down to these two:

  • Antlr
    • The syntax seems similar to the existing implementation using RACC.
  • JavaCC
    • It looked interesting to explore and serves as a comparison.

Experiments performed

Setup environment

The benchmark was performed on a local machine, an Apple Silicon (M3 Pro) MacBook. Please do not compare these numbers with results from another machine, since results may differ due to CPU architecture, operating system, etc.

  • Expression used = 5*(2+5+3), which evaluates to 50. (It's a simple expression, but it is kept constant across all benchmarks.)

JMH library

JMH was the library of choice for microbenchmarking in this experiment. This is not a step-by-step setup tutorial, but a summary of the setup.

Adding the dependency

In settings.gradle.kts, define the version catalog entries:

version("jmh", "1.37")
library("jmh.core", "org.openjdk.jmh", "jmh-core").versionRef("jmh")
library("jmh.generator.annprocess", "org.openjdk.jmh", "jmh-generator-annprocess").versionRef("jmh")
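These catalog entries can then be consumed in the build script. The snippet below is a sketch, assuming the catalog is named libs (the Gradle default) and that the kapt plugin shown later handles the annotation processor:

```kotlin
// build.gradle.kts — a sketch assuming the default `libs` version catalog name
dependencies {
    implementation(libs.jmh.core)
    // JMH generates the benchmark harness code via its annotation processor
    kapt(libs.jmh.generator.annprocess)
}
```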

Configuring gradle task

...
plugins {
    application
    kotlin("kapt") version "1.6.0"
    id("com.github.johnrengelman.shadow") version "7.1.2"
}
...
kapt {
    keepJavacAnnotationProcessors = true
}

application {
    mainClass.set("com.demo.MainKt")
}
...
tasks.named<ShadowJar>("shadowJar") {
    archiveBaseName.set("benchmark-fat")
    mergeServiceFiles()
    manifest {
        attributes(mapOf("Main-Class" to "com.demo.MainKt"))
    }
}

The main class

This defines the execution entry point for the JMH library. The configuration below runs all benchmarks whose names contain Benchmark.

fun main(args: Array<String>) {
    val opt = OptionsBuilder()
            .include("Benchmark")
            .forks(1)
            .build()
    Runner(opt).run()
}

Setting up the benchmark class

@Threads(8)
@Fork(value = 1, warmups = 3)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 1, time = 1)
@Measurement(iterations = 5, time = 1)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
open class AntlrBenchmark {
    @State(Scope.Benchmark)
    open class ExecutionPlan {
        var validData: String? = null

        @Setup(Level.Trial)
        fun setup() {
            validData = "5*(2+5+3)"
        }
    }

    // the @Benchmark methods go here
}

Don't worry, the @Benchmark methods are shown in the examples further below.

Running the benchmark

The Gradle build is configured to compile the code and create the jar file. All we need to do is run the following command to execute the jar, and the benchmark will start.

java -jar PATH_TO_JAR_FILE

Antlr Library (Java/Kotlin)

Gradle itself also uses Antlr to parse Gradle scripts. So how does this work? An example is given below; however, it is not a step-by-step tutorial on setting up the Antlr Gradle plugin.

Add Antlr plugin to dependency

plugins {
    ...
    antlr
}
...
dependencies {
    antlr("org.antlr:antlr4:4.7.1")
}

We will be using Antlr 4, version 4.7.1.

Add grammar specification

Let’s say we have the following simple grammar spec.

grammar Calculator;

LEFT_PARENTHESIS: '(';
RIGHT_PARENTHESIS: ')';
CARET: '^';
STAR: '*';
SLASH: '/';
PLUS: '+';
MINUS: '-';
NUMBER: [0-9]+;
WHITESPACE: [ \r\n\t]+ -> skip;

start : expression;

expression
   : NUMBER                                                 # Number
   | LEFT_PARENTHESIS inner=expression RIGHT_PARENTHESIS    # Parentheses
   | left=expression operator=CARET right=expression        # Power
   | left=expression operator=(SLASH|STAR) right=expression # MultiplicationOrDivision
   | left=expression operator=(MINUS|PLUS) right=expression # AdditionOrSubtraction
   ;

Configure the gradle task

tasks.generateGrammarSource {
    arguments = arguments + listOf("-visitor", "-listener")
    source = fileTree(project.projectDir) {
        include("**/*.g4")
    }
}

By default, the Antlr task that generates the grammar source files only generates sources for the listener approach, so we pass -visitor as well.

Override the generated interfaces

After running the Gradle build, the Antlr plugin will have generated the required source files, and we can override the methods of the generated CalculatorBaseVisitor class.

class CustomVisitor : CalculatorBaseVisitor<Double>() {
    override fun visitNumber(ctx: CalculatorParser.NumberContext?): Double {
        return ctx!!.NUMBER().text.toDouble()
    }

    override fun visitAdditionOrSubtraction(ctx: CalculatorParser.AdditionOrSubtractionContext?): Double {
        if (ctx!!.operator.text.equals("+")) {
            return this.visit(ctx.left) + this.visit(ctx.right)
        }
        return this.visit(ctx.left) - this.visit(ctx.right)
    }

    override fun visitMultiplicationOrDivision(ctx: CalculatorParser.MultiplicationOrDivisionContext?): Double {
        if (ctx!!.operator.text.equals("*")) {
            return this.visit(ctx.left) * this.visit(ctx.right)
        }
        return this.visit(ctx.left) / this.visit(ctx.right)
    }

    override fun visitParentheses(ctx: CalculatorParser.ParenthesesContext?): Double {
        return this.visit(ctx!!.inner)
    }

    override fun visitPower(ctx: CalculatorParser.PowerContext?): Double {
        return Math.pow(this.visit(ctx!!.left), this.visit(ctx.right))
    }
}

We declare the return type of every visit method to be Double for this example, which is written in Kotlin. With that, the visitor implementation for this simple calculator expression parser is complete.

Execution comparison between Visitor and Listener implementations

What the Visitor implementation looks like

The @Benchmark annotation tags a method to be benchmarked by the JMH library.

    @Benchmark
    fun calculateWithVisitor(plan: ExecutionPlan) : Double {
        val lexer = CalculatorLexer(CharStreams.fromString(plan.validData))
        val token = CommonTokenStream(lexer)
        val parser = CalculatorParser(token)

        val visitor = CustomVisitor()

        return visitor.visit(parser.start())
    }
What the Listener implementation looks like

A brief summary of the listener approach: the walker visits each node and triggers events, such as on entry to and exit from the node. For our implementation, we override the exitXxxx methods. If it's a number node, the value is pushed onto a stack. If it's an operation node, we pop the necessary values from the stack, perform the calculation, and push the result back onto the stack. The getResult() method then simply pops the final value from the stack.

    @Benchmark
    fun calculateWithListener(plan: ExecutionPlan): Double {
        val lexer = CalculatorLexer(CharStreams.fromString(plan.validData))
        val token = CommonTokenStream(lexer)
        val parser = CalculatorParser(token)
        val tree = parser.start()

        val listener = CustomListener()
        val walker = ParseTreeWalker()

        walker.walk(listener, tree)
        return listener.getResult()
    }
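The stack discipline described above can be illustrated without any generated Antlr classes: walking the tree in exit order is equivalent to evaluating the expression in postfix (RPN) order. A small self-contained sketch:

```kotlin
// Illustration of the listener's stack discipline: exit-order traversal
// is equivalent to evaluating the expression in postfix (RPN) order.
// 5*(2+5+3) in postfix is: 5 2 5 + 3 + *
fun evalPostfix(tokens: List<String>): Double {
    val stack = ArrayDeque<Double>()
    for (t in tokens) {
        when (t) {
            "+", "-", "*", "/" -> {
                val right = stack.removeLast() // "exit" of an operation node:
                val left = stack.removeLast()  // pop operands, push the result
                stack.addLast(
                    when (t) {
                        "+" -> left + right
                        "-" -> left - right
                        "*" -> left * right
                        else -> left / right
                    }
                )
            }
            else -> stack.addLast(t.toDouble()) // "exit" of a number node: push its value
        }
    }
    return stack.removeLast() // getResult(): pop the final value
}
```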
Results (Between Listener and Visitor)

The lower the score, the faster the code executes.

Run 1
Benchmark                             Mode  Cnt  Score   Error  Units
AntlrBenchmark.calculateWithListener  avgt    5  4.310 ± 0.212  us/op
AntlrBenchmark.calculateWithVisitor   avgt    5  3.919 ± 0.037  us/op
Run 2
Benchmark                             Mode  Cnt  Score   Error  Units
AntlrBenchmark.calculateWithListener  avgt    5  4.264 ± 0.082  us/op
AntlrBenchmark.calculateWithVisitor   avgt    5  3.929 ± 0.024  us/op
Run 3
Benchmark                             Mode  Cnt  Score   Error  Units
AntlrBenchmark.calculateWithListener  avgt    5  4.383 ± 0.083  us/op
AntlrBenchmark.calculateWithVisitor   avgt    5  3.670 ± 0.027  us/op
Results we can achieve if the AST is cached (Nice to have)

This could be the next phase in performance improvement:

Pros

  1. Drastically improves calculation performance.
  2. Reduces memory and CPU usage during runtime since the AST is already available and does not require Lexers to tokenize or Parsers to parse tokens into an AST.

Cons

  1. Likely requires custom implementation of visitor logic and Tree data structure.
  2. Increases memory storage requirements on the database side if the AST is pre-cached/stored; alternatively, leveraging a caching server near the execution server (where calculations run) could mitigate this.
  3. Limited unless the AST includes additional data for node visit sequences; otherwise, a parser from the grammar will still be needed to construct this data structure (See Additional Notes below for a potential solution).
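To make the cached-AST idea concrete, here is a minimal, hypothetical sketch in Kotlin of a hand-rolled AST whose nodes evaluate themselves (the names and structure are illustrative, not our actual implementation). The tree is built once and can then be evaluated repeatedly without lexing or parsing:

```kotlin
// Hypothetical hand-rolled AST for the cached-AST idea
sealed interface Expr {
    fun eval(): Double
}

data class Num(val value: Double) : Expr {
    override fun eval() = value
}

data class BinOp(val op: Char, val left: Expr, val right: Expr) : Expr {
    override fun eval(): Double = when (op) {
        '+' -> left.eval() + right.eval()
        '-' -> left.eval() - right.eval()
        '*' -> left.eval() * right.eval()
        '/' -> left.eval() / right.eval()
        else -> error("unknown operator $op")
    }
}

fun main() {
    // 5*(2+5+3), built once and reusable across evaluations
    val ast: Expr = BinOp('*', Num(5.0), BinOp('+', BinOp('+', Num(2.0), Num(5.0)), Num(3.0)))
    println(ast.eval()) // prints 50.0
}
```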

Perhaps a future blog post on this topic would be enjoyable to explore.

Another wild idea for improvement

We could potentially transform the Abstract Syntax Tree (AST) into a leftmost tree. This transformation would eliminate the need for the node-visit data structure that is currently created when the parser walks the initial AST.

Perhaps a future blog post on this topic would be interesting to explore and benchmark against Antlr’s tree traversal approach. While there are likely existing benchmarks for tree traversal, implementing and exploring this approach would provide valuable insights and deepen our understanding.

Test Run 1
Benchmark                                        Mode  Cnt  Score   Error  Units
AntlrBenchmark.calculateWithCachedAstAndVisitor  avgt    5  0.192 ± 0.011  us/op
AntlrBenchmark.calculateWithListener             avgt    5  4.300 ± 0.082  us/op
AntlrBenchmark.calculateWithVisitor              avgt    5  3.835 ± 0.024  us/op
Test Run 2
Benchmark                                        Mode  Cnt  Score   Error  Units
AntlrBenchmark.calculateWithCachedAstAndVisitor  avgt    5  0.195 ± 0.009  us/op
AntlrBenchmark.calculateWithListener             avgt    5  4.322 ± 0.077  us/op
AntlrBenchmark.calculateWithVisitor              avgt    5  3.941 ± 0.031  us/op
Test Run 3
Benchmark                                        Mode  Cnt  Score   Error  Units
AntlrBenchmark.calculateWithCachedAstAndVisitor  avgt    5  0.196 ± 0.004  us/op
AntlrBenchmark.calculateWithListener             avgt    5  4.437 ± 0.022  us/op
AntlrBenchmark.calculateWithVisitor              avgt    5  3.755 ± 0.050  us/op
Results with JavaCC parsing included
Snippet of JavaCC Grammar
PARSER_BEGIN(Calculator)

public class Calculator
{
    public static void main(String args[]) throws ParseException
    {
        Calculator parser = new Calculator(System.in);
        while (true)
        {
            parser.parseOneLine();
        }
    }
}
PARSER_END(Calculator)
SKIP :
{
    " "
|   "\r"
|   "\t"
}
TOKEN:
{
    < NUMBER: (<DIGIT>)+ ( "." (<DIGIT>)+ )? >
|   < DIGIT: ["0"-"9"] >
|   < EOL: "\n" >
}
...
JavaCC benchmark method
    @Benchmark
    fun javaCCCalculate(plan: ExecutionPlan) : Double {
        val parser = Calculator(CalculatorTokenManager(SimpleCharStream(StringReader(plan.validData!!))))
        return parser.expr()
    }
Test Run 1
Benchmark                                        Mode  Cnt  Score   Error  Units
AntlrBenchmark.calculateWithCachedAstAndVisitor  avgt    5  0.194 ± 0.009  us/op
AntlrBenchmark.calculateWithListener             avgt    5  4.349 ± 0.058  us/op
AntlrBenchmark.calculateWithVisitor              avgt    5  3.842 ± 0.049  us/op
JavaccBenchmark.javaCCCalculate                  avgt    5  4.929 ± 0.195  us/op
Test Run 2
Benchmark                                        Mode  Cnt  Score   Error  Units
AntlrBenchmark.calculateWithCachedAstAndVisitor  avgt    5  0.194 ± 0.011  us/op
AntlrBenchmark.calculateWithListener             avgt    5  4.403 ± 0.082  us/op
AntlrBenchmark.calculateWithVisitor              avgt    5  3.920 ± 0.036  us/op
JavaccBenchmark.javaCCCalculate                  avgt    5  4.796 ± 0.176  us/op
Test Run 3
Benchmark                                        Mode  Cnt  Score   Error  Units
AntlrBenchmark.calculateWithCachedAstAndVisitor  avgt    5  0.195 ± 0.003  us/op
AntlrBenchmark.calculateWithListener             avgt    5  4.351 ± 0.101  us/op
AntlrBenchmark.calculateWithVisitor              avgt    5  3.899 ± 0.032  us/op
JavaccBenchmark.javaCCCalculate                  avgt    5  4.766 ± 0.077  us/op
Compare with complex expression

Benchmarks whose names contain the word Complex use the same longer expression, which produces a more complex AST.

  • Expression used = ((2 + 3 * (4 - 5 / (7 - 8))) + (9 - 10) * 11) / 12
Benchmark                                               Mode  Cnt   Score   Error  Units
AntlrBenchmark.calculateWithCachedAstAndVisitor         avgt    5   0.192 ± 0.010  us/op
AntlrBenchmark.calculateWithCachedAstAndVisitorComplex  avgt    5   0.430 ± 0.015  us/op
AntlrBenchmark.calculateWithListener                    avgt    5   4.382 ± 0.390  us/op
AntlrBenchmark.calculateWithListenerWithComplex         avgt    5  13.127 ± 0.198  us/op
AntlrBenchmark.calculateWithVisitor                     avgt    5   3.844 ± 0.032  us/op
AntlrBenchmark.calculateWithVisitorWithComplex          avgt    5  12.704 ± 0.050  us/op
JavaccBenchmark.javaCCCalculate                         avgt    5   4.561 ± 0.098  us/op
JavaccBenchmark.javaCCCalculateWithComplex              avgt    5  10.373 ± 0.217  us/op
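As a quick sanity check of the complex expression itself, we can evaluate it directly in Kotlin (writing 5.0 to force floating-point division, matching the Double results the parsers return):

```kotlin
fun main() {
    // 5.0 forces floating-point division; with pure Ints, 5 / (7 - 8) would truncate
    val result = ((2 + 3 * (4 - 5.0 / (7 - 8))) + (9 - 10) * 11) / 12
    println(result) // prints 1.5
}
```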

As you can see, with a more complex expression the JavaCC library becomes more efficient. So depending on the formulas encountered in real production scenarios and the expected performance, JavaCC can execute faster.

Why was Antlr chosen?

Based on the criteria described earlier, let’s briefly discuss why Antlr was ultimately chosen:

  • Visual: Antlr syntax is aesthetically pleasing and closely resembles the parser syntax of the RACC gem.
  • Performance: Performance varies depending on the scenario. Antlr with the Visitor approach is faster for simple expressions, while JavaCC has the potential to be faster for more complex ones.
  • Popularity: Antlr has a higher number of commits and issues, indicating wider adoption and community support.

Conclusion

We opted for Antlr primarily for its maintainability (syntax is aesthetically pleasing) and its out-of-the-box support for the Visitor implementation.

This summarizes our journey of evaluating library candidates and making informed decisions on which library to proceed with.
