Exploring Code Generation with Janino

In this blog post, we look at the potential advantages of generating a custom execution plan for each query instead of using the traditional iterator model, in which query execution is composed of many chained operators.

The iterator model comes from a time when we paid less attention to writing performant code and focused instead on writing readable code ( one simply cannot deny the readability of the iterator model ). Today, however, we are often bottlenecked on CPU instructions, so executing a lean, optimized instruction stream is the need of the hour.

Let’s take a simple query as an example. We have a list of numbers, and we need to apply these three operations to it:

  • Add a number n to each of the numbers in the list
  • Subtract a number m from each of the numbers in the list
  • Return the final list of numbers after applying both the operations

With the iterator model, the code for the above-mentioned operations is simple and easy to write:

We have an abstract Operator class, and the concrete operations specified above are implemented as different operators ( like AddOperator and SubtractOperator ) that are chained to one another via composition. The chained operators act as a single operator, and we can iterate through the numbers emitted by this single operator to get the final numbers with all the transformations applied.

abstract class Operator {
    abstract public boolean hasNext();
    abstract public int getNext();
}

// Concrete operators; each one wraps the operator it reads from ( bodies omitted )
class AddOperator extends Operator { ... }
class SubtractOperator extends Operator { ... }
class SourceOperator extends Operator { ... }

SourceOperator sourceOperator = new SourceOperator(arrayList);
AddOperator addOperator = new AddOperator(sourceOperator, 10);
SubtractOperator subtractOperator = new SubtractOperator(addOperator, 15);
while (subtractOperator.hasNext()) {
    // Pull the next fully transformed number out of the chain
    int num = subtractOperator.getNext();
}
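
To make the chain above concrete, here is a minimal self-contained sketch of the same pipeline. The operator bodies are assumptions made for illustration, and the wrapper class name OperatorChainSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OperatorChainSketch {
    // Common pull-based interface shared by all operators.
    static abstract class Operator {
        abstract public boolean hasNext();
        abstract public int getNext();
    }

    // Emits the numbers of the input list one by one (assumed implementation).
    static final class SourceOperator extends Operator {
        private final List<Integer> numbers;
        private int position = 0;

        SourceOperator(List<Integer> numbers) { this.numbers = numbers; }

        @Override public boolean hasNext() { return position < numbers.size(); }
        @Override public int getNext() { return numbers.get(position++); }
    }

    // Adds a constant to every number pulled from its child operator.
    static final class AddOperator extends Operator {
        private final Operator child;
        private final int n;

        AddOperator(Operator child, int n) { this.child = child; this.n = n; }

        @Override public boolean hasNext() { return child.hasNext(); }
        @Override public int getNext() { return child.getNext() + n; }
    }

    // Subtracts a constant from every number pulled from its child operator.
    static final class SubtractOperator extends Operator {
        private final Operator child;
        private final int m;

        SubtractOperator(Operator child, int m) { this.child = child; this.m = m; }

        @Override public boolean hasNext() { return child.hasNext(); }
        @Override public int getNext() { return child.getNext() - m; }
    }

    // Builds source -> add(n) -> subtract(m) and drains the chain.
    static List<Integer> run(List<Integer> input, int n, int m) {
        Operator chain = new SubtractOperator(new AddOperator(new SourceOperator(input), n), m);
        List<Integer> results = new ArrayList<>();
        while (chain.hasNext()) {
            results.add(chain.getNext());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3), 10, 15)); // prints [-4, -3, -2]
    }
}
```

Note that every number travels through three getNext() calls, one per operator, before it reaches the caller.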

In this approach, every operation follows the iterator model: each operator hands its results to the subsequent operator one at a time through its getNext() function. Though the iterator model is highly extensible to all kinds of queries, it comes with its own set of problems. To get a glimpse of the issue, let us write a really naive version of this code, which may not be modular or readable for that matter.

for (int i = 0; i < arrayList.size(); i++) {
    int num = ((arrayList.get(i) + n) - m);
}

This code seems ok at first and may not be the most readable but it serves the same purpose as the other piece of code implementing “iterator model”. Let us compare the performance for both of these implementations.

  • Methodology 1 with operator chaining has a throughput of ~666 ops/second
  • Methodology 2 with inlined operations has a throughput of ~1000 ops/second

In the benchmark, the naive implementation with inlined operations is about 1.5x faster than the modular “iterator model” approach. But why is that so?

This is because of the overhead associated with
1) virtual function calls and
2) the inability to inline those functions ( see this link ),
which in turn results in a bloated set of instructions to be executed on the processor.

So what if we could generate this inlined, naive executable code for each query? That should improve query performance over the operator-based approach. This can be achieved by code generation, which is used by many modern databases and query engines. In fact, many databases talk about how code generation brought them a major performance improvement. Code generation is just another term for generating this custom executable code for a query. There are many libraries that do this kind of custom code generation and return natively compiled code for a given query ( read about LLVM ).

In the next section, we are going to talk about the Janino compiler, which does this code generation in Java land and is prominently used by Spark.

How to use Janino?

Janino is a super-fast Java compiler which can be used to translate Java expressions or Java code blocks into Java bytecode. It embeds easily in your application. Here is an example of how and when to use the Janino compiler in your application.

Suppose we have a query which wants to:

  • Add 10 to each of the numbers
  • Keep only the numbers which are greater than 40 ( i.e. filter out the rest )
  • Multiply all the numbers by 5 and return the list of numbers

With the iterator model, we would have constructed one operator for each of the three stages, chained those three operators, and then iterated through the chained operator to get the final list of numbers. But with the Janino compiler, we can afford to create the execution plan for a query at runtime and run it against the data to get the final list.

  • Suppose we were somehow able to generate this custom execution code for this query and write it down in some text file.
// FileName: Generated.txt

public ArrayList<Integer> returnResults(ArrayList<Integer> arrayList) {
    ArrayList<Integer> results = new ArrayList<Integer>();
    for(int i = 0; i < arrayList.size(); i++) {
        int num = ((Integer) arrayList.get(i)) + 10;
        if (num > 40) {
            results.add(num * 5);
        }
    }
    return results;
}
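  • Now, after generating this custom code, we need to compile it with the Janino compiler into an executable form.
public GeneratedOperator init() throws IOException, CompileException, InstantiationException, IllegalAccessException {
  // org.codehaus.janino.Scanner reading the generated class body from the file
  Scanner scanner = new Scanner("Generated.txt");
  ClassBodyEvaluator cbe = new ClassBodyEvaluator();
  // Assumption: GeneratedOperator is the base class that declares returnResults
  cbe.setExtendedClass(GeneratedOperator.class);
  cbe.cook(scanner);
  Class c = cbe.getClazz();
  return (GeneratedOperator) c.newInstance();
}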
  • After compiling the generated code ( i.e. Generated.txt ), we can easily use the compiled code and pass it a list of numbers to get the final list.
public static void main(String args[]) throws CompileException, InstantiationException, IllegalAccessException, IOException {
    // The init method compiles the code and returns a GeneratedOperator
    // instance which has the generated method returnResults

    GeneratedOperator generatedOperator = new CompiledCodeExample().init();
    ArrayList<Integer> arrayList = new ArrayList<Integer>();
    // ( fill arrayList with the input numbers here )
    ArrayList<Integer> returnList = generatedOperator.returnResults(arrayList);
    for (int i = 0; i < returnList.size(); i++) {
        System.out.println(returnList.get(i));
    }
}
Output is as expected:

Note: Janino is responsible for compiling this generated string into a java method, but still this string has to be constructed via your own application logic.
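
Since Janino only compiles the text it is handed, the application has to assemble that text itself. The following is a rough sketch of one way to do this for the query above. The QueryCodeGenerator class, its Step type and the step kinds are hypothetical names, and only the string-building part is shown ( no Janino call is made ).

```java
import java.util.Arrays;
import java.util.List;

final class QueryCodeGenerator {
    // Kinds of per-number steps this sketch knows how to emit (hypothetical).
    enum Kind { ADD, FILTER_GREATER_THAN, MULTIPLY }

    static final class Step {
        final Kind kind;
        final int operand;

        Step(Kind kind, int operand) { this.kind = kind; this.operand = operand; }
    }

    // Emits the source of a returnResults method that applies all steps inside one loop.
    static String generate(List<Step> steps) {
        StringBuilder src = new StringBuilder();
        src.append("public ArrayList returnResults(ArrayList arrayList) {\n");
        src.append("    ArrayList results = new ArrayList();\n");
        src.append("    for (int i = 0; i < arrayList.size(); i++) {\n");
        src.append("        int num = ((Integer) arrayList.get(i)).intValue();\n");
        for (Step step : steps) {
            switch (step.kind) {
                case ADD:
                    src.append("        num = num + ").append(step.operand).append(";\n");
                    break;
                case FILTER_GREATER_THAN:
                    // Numbers that fail the predicate skip the rest of the loop body.
                    src.append("        if (!(num > ").append(step.operand).append(")) continue;\n");
                    break;
                case MULTIPLY:
                    src.append("        num = num * ").append(step.operand).append(";\n");
                    break;
            }
        }
        src.append("        results.add(Integer.valueOf(num));\n");
        src.append("    }\n");
        src.append("    return results;\n");
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        List<Step> query = Arrays.asList(
            new Step(Kind.ADD, 10),
            new Step(Kind.FILTER_GREATER_THAN, 40),
            new Step(Kind.MULTIPLY, 5));
        System.out.println(generate(query));
    }
}
```

The generated text has the same shape as the hand-written Generated.txt above: one loop, no operator objects and no per-number virtual calls.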

Understanding behavior of Janino Compiled Classes with JIT

Now we know how to generate custom optimized bytecode for a query and execute it. Let’s understand how these Janino-compiled classes behave with JIT.

In this experiment, we will take the same query as defined above and compare two execution models:

  • Iterator Model ( i.e. Chaining of Operators )
  • Custom Code Generation via Janino

Note: The results suggest that the code generation model outperforms the iterator model.

Execution via Iterator Model

This experiment has been performed with JIT enabled, which essentially means that JIT should have inlined and compiled the different operators ( i.e. iterators ) into a single native function.

public static ArrayList<Integer> experimentOperators(ArrayList<Integer> arrayList) {
    ArrayList<Integer> arrayList1 = new ArrayList<Integer>();
    SourceOperator sourceOperator = new SourceOperator(arrayList);
    AddOperator addOperator = new AddOperator(sourceOperator, 10);
    FilterOperator filterOperator = new FilterOperator(addOperator, 40);
    MultiplyOperator multiplyOperator = new MultiplyOperator(filterOperator, 5);
    while (multiplyOperator.hasNext()) {
        // Assumed loop body: collect every number that makes it through the chain
        arrayList1.add(multiplyOperator.getNext());
    }
    return arrayList1;
}


In this screenshot, we can clearly see the

  • Inlining of different operators into a single inlined function
  • Compilation of this single inlined function by C2 compiler

Execution via Custom Code Generation

In this execution strategy, we generate the custom executable bytecode only once for the entire duration of the experiment. This means the bytecode becomes eligible to be JITed ( after some iterations ) into native instructions, which should improve query performance even more.

// We are doing the initialization of this custom code generated
// method only once and using the same generated method over and 
// over again in the experiments.

GeneratedOperator generatedOperator = new CompiledCodeExample().init();

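public GeneratedOperator init() throws IOException, CompileException, IllegalAccessException, InstantiationException {
    ClassBodyEvaluator cbe = new ClassBodyEvaluator();
    String[] strings = new String[2];
    strings[0] = HashMap.class.getName();
    strings[1] = ArrayList.class.getName();
    // Assumed wiring: register the imports, declare the base class, compile the generated body
    cbe.setDefaultImports(strings);
    cbe.setExtendedClass(GeneratedOperator.class);
    cbe.cook(new Scanner("Generated.txt"));
    Class c = cbe.getClazz();
    return (GeneratedOperator) c.newInstance();
}

public void experimentCodeGeneration(Blackhole blackhole) throws IllegalAccessException, InstantiationException, IOException, CompileException {
    // Assumption: arrayList holds the benchmark input; the result is consumed so it is not optimized away
    blackhole.consume(generatedOperator.returnResults(arrayList));
}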


In this screenshot, we can clearly see that method “returnResults” gets compiled by C2 compiler.

The method gets compiled only because we generate the Janino-compiled code once for the entire duration of the experiment. If we instead generate the bytecode anew on every invocation of the JMH benchmark, JIT will not compile the method. Every Janino compilation produces a brand-new class, so as far as JIT is concerned each invocation runs a different method whose counters start from zero, and no single copy of the method ever gets hot enough to be compiled across JMH invocations.

//  In this experiment we are compiling the custom code again 
// and again for each invocation of JMH benchmark and hence this
// code would not get JITed across benchmark invocations because 
// JIT has no way of knowing whether it is the same method 
// which was compiled before in the previous invocation as well.

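public void experimentCodeGeneration(Blackhole blackhole) throws IllegalAccessException, InstantiationException, IOException, CompileException {
    generatedOperator = new CompiledCodeExample().init();
    blackhole.consume(generatedOperator.returnResults(arrayList)); // assumption: arrayList is the benchmark input
}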


In this, we can clearly see that the “returnResults” method is nowhere to be seen, which essentially means it is not JIT compiled.

So essentially, if the same query hits the system over and over again, the Janino code generation approach ( as used here ) generates new bytecode for every query, so the code path never gets JIT compiled, whereas with the iterator model we already have compiled and inlined methods. The JMH performance numbers also seem to suggest the same.


Note: Here we can clearly see that the iterator model outperforms the Janino code generation model. This is because of the reasoning given above, i.e. with the Janino code generation approach we generate code again and again for each query, and hence JIT is not able to compile the methods across queries. Some modern databases use an execution plan cache to overcome this problem, making sure that if the same query hits again and again, the same generated code is reused.
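
The plan-cache idea can be sketched roughly as follows. This is not how any particular database implements it. PlanCache, CompiledPlan and the compiler function are hypothetical names, and the expensive compile step ( for example a Janino compilation ) is represented by an arbitrary function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class PlanCache {
    // A compiled plan: takes input numbers and returns result numbers (hypothetical interface).
    interface CompiledPlan {
        List<Integer> returnResults(List<Integer> input);
    }

    private final ConcurrentHashMap<String, CompiledPlan> plans = new ConcurrentHashMap<>();
    private final Function<String, CompiledPlan> compiler;

    PlanCache(Function<String, CompiledPlan> compiler) { this.compiler = compiler; }

    // Compiles the query text only on the first request; later requests reuse the same
    // plan object, so the JIT keeps seeing the same class and its counters keep growing.
    CompiledPlan planFor(String queryText) {
        return plans.computeIfAbsent(queryText, compiler);
    }

    public static void main(String[] args) {
        AtomicInteger compilations = new AtomicInteger();
        PlanCache cache = new PlanCache(query -> {
            compilations.incrementAndGet(); // stands in for the expensive code generation
            return input -> {
                List<Integer> out = new ArrayList<>();
                for (int num : input) {
                    if (num + 10 > 40) out.add((num + 10) * 5);
                }
                return out;
            };
        });
        for (int i = 0; i < 1000; i++) {
            cache.planFor("add 10, keep > 40, multiply by 5").returnResults(Arrays.asList(25, 35, 45));
        }
        System.out.println(compilations.get()); // prints 1
    }
}
```

Keying the cache by the query text is the simplest choice. A real engine would more likely key it by a normalized form of the query so that queries differing only in constants can share a plan.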


JIT Optimizations – Method Inlining

In this blog post, we are going to understand the impact of function calls in an application and what JIT does to reduce that impact.

A function call is a relatively expensive operation, but JIT makes sure that our application does not suffer, performance-wise, from a large number of function calls. JIT does function inlining to make sure that function calls are minimized in an application. Further, we are going to learn how to debug an application and see which functions are getting inlined.

Are function calls expensive?

At first, a function call may seem trivial to you but it involves a lot of instructions getting executed under the hood which in turn might kill your application performance if your application involves a lot of function calls.

Let us consider a simple function call and see the native instructions involved:

private static void subtract(int num) {
    int r = 20 - num;
}

private int getNum() {
    int a1 = 10;
    subtract(a1 + 2); // assumption: the call discussed below, passing 12
    return 0;
}

The following things happen when we call the subtract function from the getNum() function:

  • The arguments, i.e. “12”, get pushed onto the stack.
  • The return address of the caller, i.e. getNum(), gets pushed onto the stack.
  • The frame pointer of the caller, i.e. getNum(), is also pushed onto the stack, so that the caller’s stack frame can be restored when the callee returns.
  • The call instruction transfers control to the callee, and the instructions of the callee start getting executed.
  • The body of the callee is executed.
  • The ret instruction transfers control back to the caller, with the frame pointer restored to the original frame pointer of the caller.
  • The original arguments passed to the callee, i.e. “12”, are popped from the stack.

So we can see that a whole lot of instructions get executed even when we call a simple function like subtract, which is what makes a function call relatively expensive.

Just to add to this, modern programming styles strongly recommend writing smaller functions to improve the readability of the code, which in turn increases the function call overhead even more.

Method Inlining

This is another important performance optimization used by JIT. Function inlining greatly influences the performance of an application.

Let’s check out the performance boost an application gets from function inlining with this example:

We have to apply these three operations to a number.

  • Multiply the number by a constant x
  • Subtract the result from the constant x
  • Add the constant x to the result

There are two ways to do this:

  • Methodology1: Inline all the operations in a single method
public static ArrayList experimentFunctionInlining(ArrayList arrayList) {
    for (int i = 0; i < 10000; i++) {
        int num2 = 10 * i;
        int num1 = 10 - num2;
        int num = 10 + num1;
        arrayList.add(num); // assumed: the result is collected in the list that is returned
    }
    return arrayList;
}
  • Methodology2: Write all the operations in different methods and call them one by one after each operation
private static int add(int num) {
    return 10 + num;
}

private static int subtract(int num) {
    return 10 - num;
}

private static int multiply(int num) {
    return 10 * num;
}

public static ArrayList experimentFunctionCalling(ArrayList arrayList) {
    for (int i = 0; i < 10000; i++) {
        int num2 = multiply(i);
        int num1 = subtract(num2);
        int num = add(num1);
        arrayList.add(num); // assumed: the result is collected in the list that is returned
    }
    return arrayList;
}

Note: These two tests have been benchmarked with JMH with JIT disabled, so as to understand the impact of function inlining.

  • Methodology 1 of inlining all the operations in a single method performs at ~135 ops/second
  • Methodology 2 of writing all operations in separate functions and calling them one by one performs at ~98 ops/second

This shows that function inlining has a noticeable impact on application performance: the inlined version is roughly 1.4x faster.

But writing code the way Methodology 1 does is not always possible if we care about readability. JIT comes in handy in such situations. JIT figures out the hot code path in an application and tries to inline all the methods lying on that hot code path. Now let’s run the same benchmark with JIT enabled and see if there is any performance difference between the two methodologies. Our hypothesis is that JIT should inline the methods/functions in Methodology 2, and hence the performance numbers should be more or less the same.

And voila, yes they are

  • Methodology 1 with JIT enabled performs at ~ 9000 ops/second
  • Methodology 2 with JIT enabled also performs at ~ 9000 ops/second

So it seems that with JIT, the performance of the JIT-inlined methods is in the same ballpark as the manually inlined method. Apart from reducing the function call overhead, one other important reason for function inlining is that inlined functions give the compiler more context, which it can then use to make many other optimizations.
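
As a small illustration of that last point, consider what an optimizer can see once the three helpers from Methodology 2 are inlined. The sketch below performs the inlining and the follow-up simplification by hand. It only shows why the extra context helps and is not a claim about the exact code that JIT emits. The class name InliningContextSketch is hypothetical.

```java
final class InliningContextSketch {
    static int add(int num) { return 10 + num; }
    static int subtract(int num) { return 10 - num; }
    static int multiply(int num) { return 10 * num; }

    // Three separate calls: each helper can only be optimized in isolation.
    static int viaCalls(int i) {
        return add(subtract(multiply(i)));
    }

    // After inlining, the whole expression is visible in one place...
    static int inlined(int i) {
        return 10 + (10 - (10 * i));
    }

    // ...so the two constants can be folded into one: 10 + 10 - 10 * i == 20 - 10 * i.
    static int simplified(int i) {
        return 20 - 10 * i;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10000; i++) {
            if (viaCalls(i) != inlined(i) || inlined(i) != simplified(i)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        System.out.println(viaCalls(3)); // prints -10
    }
}
```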

Debug your application

JIT has certain limitations when it comes to inlining methods on hot code path. Method inlining depends on these factors:

  • JIT can inline methods up to a particular depth
  • JIT supports inlining methods up to a particular size
  • The type of the method to be inlined
    • JIT can easily inline static methods
    • For inlining virtual functions, it needs to be aware of the class type of the object on which the function is called, so as to resolve the function definition.
  • Many others …

A few terms to understand beforehand. Sample JIT output logs:

( Method 1 ) @ 4 com.test.experiments.operators.OperatorPipelineEmulationExperiment::experimentVirtual (45 bytes) inline (hot)
( Method 2 )   @ 4 com.test.experiments.operators.BufferedOperator::<init> (21 bytes) inline (hot)
( Method 3 )      @ 1 com.test.experiments.operators.Operator::<init> (5 bytes) inline (hot)
( Method 4 )   @ 15 com.test.experiments.operators.AddOperator::<init> (25 bytes) inline (hot)
  • The @ annotation in the JIT inlining logs denotes the bytecode index ( bci ) of the call site inside the calling method. In the above example, Method 2 is invoked from the bytecode at index 4 of Method 1, and Method 4 from the bytecode at index 15.
  • To show the method inlining hierarchy, JIT uses indentation. In this, we can clearly see that
    • Method 1 inlines Method 2 and Method 4.
    • Method 2 inlines Method 3
  • TypeProfile is a special kind of profiling done by JIT which is used when we want to inline virtual functions. Inlining is difficult where polymorphism is involved, simply because the same call site might dispatch to different method implementations depending on the class type of the object on which the method is called. So in these cases, JIT profiles the receiver types observed at the call site, and if the calls always end up at a single target method, JIT inlines that target after collecting enough samples.
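
To see what TypeProfile is reacting to, it helps to compare a call site that has only seen one receiver class with one that has seen several. The sketch below is a hypothetical example ( the Step interface and its implementations are not part of the benchmark code ). Running it on a HotSpot JVM with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining prints the inlining decisions, though the exact output depends on the JVM version.

```java
final class CallSiteProfileSketch {
    interface Step { int apply(int num); }

    static final class AddTen implements Step {
        public int apply(int num) { return num + 10; }
    }

    static final class TimesFive implements Step {
        public int apply(int num) { return num * 5; }
    }

    static final class MinusOne implements Step {
        public int apply(int num) { return num - 1; }
    }

    // step.apply(i) below is a single virtual call site; the receiver classes that
    // reach it are what the type profile records.
    static long run(Step[] steps, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            Step step = steps[i % steps.length];
            sum += step.apply(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // So far only AddTen has reached the call site: one receiver type, an easy inlining candidate.
        System.out.println(run(new Step[] { new AddTen() }, 1_000_000));
        // Now three receiver types share the same call site, which makes inlining much harder.
        System.out.println(run(new Step[] { new AddTen(), new TimesFive(), new MinusOne() }, 1_000_000));
    }
}
```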

Let’s understand the logs for method Inlining in JIT. We will use this sample application for testing purposes.

public static ArrayList experimentVirtual(ArrayList arrayList) {
    BufferedOperator bufferedOperator = new BufferedOperator(); // Line 1
    AddOperator addOperator = new AddOperator(bufferedOperator, 10); // Line 2
    SourceOperator sourceOperator = new SourceOperator(addOperator, true); // Line 3
    sourceOperator.setArrayList(arrayList); // Line 4
    sourceOperator.get(1); // Line 5
    return bufferedOperator.arrayList;
}

Note: For more code details see this link

Here are the JIT logs for the application



  • JIT inlines the call sites involved in the first 3 lines of the method ( i.e. experimentVirtual ) which is obvious in Section 1, 2 and 3.
    BufferedOperator bufferedOperator = new BufferedOperator();
    AddOperator addOperator = new AddOperator(bufferedOperator, 10);
    SourceOperator sourceOperator = new SourceOperator(addOperator, true);
  • In Section 4, we can clearly see that JIT is trying to inline all the call sites involved in line number 5 (i.e. sourceOperator.get(1))

    • In section 4, we can see that first, it tries to inline the source code for the .get() implementation in sourceOperator.
      for (int i = 0; i < nums.size(); i++) {
          int p = nums.get(i);
          if (enableFlush && (i % flushNumber == 0)) {
    • Also, with the help of TypeProfile, it figures out the call target involved in underlyingOperator.get() and inlines that as well, i.e. AddOperator.get().


JIT Optimizations – Method Compilations

JIT ( Just in Time ) compilation is certainly one of the most interesting features of the JVM. This feature makes sure that we are able to run our code with machine-level optimizations. JIT does tons and tons of optimizations under the hood, which are absolutely necessary for running latency-sensitive applications.

Impact of JIT

Let’s take this code as an example to study how JIT affects the performance of our application.

This piece of code loosely follows the volcano design paradigm, in which every operator does some task, and the operators are bound together, exchange data through a common operator interface, and collectively do a bigger task. In this case, the operators are bound together to add a particular number to all the elements in the array.

public static ArrayList<Integer> experimentVirtual(ArrayList<Integer> arrayList) {
    BufferedOperator bufferedOperator = new BufferedOperator();
    AddOperator addOperator = new AddOperator(bufferedOperator, 10);
    SourceOperator sourceOperator = new SourceOperator(addOperator, true);
    sourceOperator.setArrayList(arrayList);
    sourceOperator.get(1);
    return bufferedOperator.arrayList;
}

See this Github link for more code details.

We ran this code with and without JIT optimizations. There was a huge difference in the throughputs between these two runs.

  • With JIT disabled, we got a throughput of ~3 operations per second
  • With JIT enabled, we got a throughput of ~290 operations per second

So JIT made the code faster by around 100x. Understanding the internal workings of JIT, and then asking “what can we do to make the life of JIT easier?”, is therefore key if you want to improve the performance of your application.

In this series, we will talk about these optimizations in detail, and we will also learn about debugging the JIT logs of our application. This particular blog post deals with one of the most important features of JIT, which is code compilation.

Code Compilation

This is one of the most important functionalities of JIT. JIT is responsible for compiling Java bytecode to native code instructions at runtime to boost the application performance. JIT figures out the hot code paths via profiling and then compiles those methods into native machine instructions to improve the performance of those hot paths.

How does method compilation happen

Currently, JIT supports these 5 levels of compilation:

  • level 0 - interpreter
  • level 1 - C1 with full optimization (no profiling)
  • level 2 - C1 with invocation and backedge counters
  • level 3 - C1 with full profiling (level 2 + MDO)
  • level 4 - C2

A method has to go through some of these compilation levels to reach the final optimized version of itself. The lifecycle of a method is as follows:

  • All methods start executing in interpreted mode. During this phase, the JVM determines whether a method is hot enough, mostly with the help of method invocation and backedge counters. If a method crosses a certain threshold of invocations and/or backedges, it becomes eligible for compilation at the different levels. See this.
  • After a method is declared hot, it is compiled at level 3 by the C1 compiler, aka the client compiler. This compiler does the following things:
    • In a short time, it applies the obvious optimizations that improve application performance. It is quick because this compiler is latency sensitive and wants the application to be running compiled code with the obvious optimizations as soon as possible.
    • It profiles the methods adequately so that the profiling information can be used by the higher-level compiler to do more contextual optimizations which would otherwise have been hard.
  • After a method is compiled by the C1 compiler, it starts executing and gathering metrics, and based on these metrics it is decided whether it needs to be compiled again by the C2 compiler.
    • The C2 compiler focuses on the best possible optimizations for the method, which costs more compilation time up front but results in higher application throughput eventually.
    • The C2 compiler uses the profile gathered at level 3 to do optimizations which are mostly contextual, e.g. virtual function inlining. We will explain this later with the help of examples.
  • Apart from this usual flow of method compilation, i.e. level 0 -> level 3 -> level 4, there are other flows in which methods follow a whole different compile path. To read about those, have a look at this link.

Benefits of Method Compilation

Method compilation is one of the most important sources of performance optimization in modern language runtimes. C/C++ was fast compared to legacy Java for the simple reason that C/C++ is a compiled language whereas Java was interpreted. Let’s first understand why Java is interpreted even though we know that interpreted languages are inherently slow compared to compiled languages.

Java was built on a compile once, run everywhere ( on any architecture ) kind of model. This essentially means that the source code is compiled once and the resulting deployable can be run anywhere, on any platform. This solves the problem of writing code for each and every architecture and then making sure it runs smoothly on all of them. With Java, we just had to write the code once, compile it into a deployable, and use this deployable across all the platforms. This greatly improved the development cycle of applications at the time.

But with this architecture ( write once and deploy everywhere ) came a serious problem of non-performant applications. As these deployables were interpreted into native instructions at runtime, the applications were damn slow. JIT came into existence to solve this problem of runtime interpretation.

With JIT, we got the superpower of compiling methods to native instructions during the runtime of the application, which hugely improves performance. In some benchmarks, the performance of Java with JIT even seems to surpass that of C/C++ code. This is mainly because JIT has runtime information which it can use to make further contextual improvements to the code.

Now let’s understand how method compilation can affect the performance of an application. We have a performance benchmark which we run once with the compilation of some of the methods disabled, and compare it with a run in which nothing is disabled.

  • With compilation of some of the methods of the application disabled, we achieved a throughput of around ~ 30 ops/second
  • With compilation enabled for all the methods of the application, we achieved a throughput of around ~600 ops/second

So we can see that compilation of the methods has a huge performance impact on the application. So we need to have a basic idea of the compilation of the methods in our application to know of any potential bottlenecks/improvements.

To know which methods are getting compiled and which are not, you need to add extra JVM flags while starting up your application.

java -XX:+UnlockDiagnosticVMOptions \
     -XX:+PrintCompilation \
     -jar benchmarks.jar

This prints logs in the following format:

ts  denotes timestamp
cid denotes compile_id
l   denotes compile_level

ts   cid  l        methodAffected
498  46   3    com.test.experiments.CodeOptimizedBenchmark::<clinit> (46 bytes)
498  46   3    com.test.experiments.CodeOptimizedBenchmark::<clinit> (46 bytes)
531  47   3    com.test.experiments.operators.AddOperator::get (52 bytes)
531  48   3    com.test.experiments.operators.BufferedOperator::get (42 bytes)
538  49   4    com.test.experiments.operators.AddOperator::get (52 bytes)
538  50   4    com.test.experiments.operators.BufferedOperator::get (42 bytes)
540  48   3    com.test.experiments.operators.BufferedOperator::get (42 bytes) made not entrant
541  47   3    com.test.experiments.operators.AddOperator::get (52 bytes) made not entrant
613  51%  3    com.test.experiments.operators.SourceOperator::get @ 2 (79 bytes)
614  52   3    com.test.experiments.operators.SourceOperator::get (79 bytes)
665  49   4    com.test.experiments.operators.AddOperator::get (52 bytes) made not entrant
666  50   4    com.test.experiments.operators.BufferedOperator::get (42 bytes) made not entrant
666  54   3    com.test.experiments.operators.AddOperator::get (52 bytes)
666  53   3    com.test.experiments.operators.BufferedOperator::get (42 bytes)
668  55%  4    com.test.experiments.operators.SourceOperator::get @ 2 (79 bytes)
674  56   4    com.test.experiments.operators.BufferedOperator::get (42 bytes)
674  51%  3    com.test.experiments.operators.SourceOperator::get @ -2 (79 bytes) made not entrant
674  57   4    com.test.experiments.operators.AddOperator::get (52 bytes)
675  53   3    com.test.experiments.operators.BufferedOperator::get (42 bytes) made not entrant
676  54   3    com.test.experiments.operators.AddOperator::get (52 bytes) made not entrant
679  58   4    com.test.experiments.operators.SourceOperator::get (79 bytes)
685  52   3    com.test.experiments.operators.SourceOperator::get (79 bytes) made not entrant

( We are showing just a subset of the compilation logs; compilation happens for many other methods from the JDK and other libraries as well. For more details about these logs see this link. )

So in the logs, we can clearly see the different methods getting compiled at different times. Different aspects of these logs are as follows:

  • A compiled method goes through different phases, e.g. when a method is compiled it is assigned a compile_id, and when this method is later deoptimized, or in other words made not entrant, that log entry carries the same compile_id.
  • The compile_id sometimes carries a % suffix. This symbol indicates that the compilation has been done via OSR ( on stack replacement ). This happens when a method contains a long-running loop: instead of waiting for the next invocation of the method, the JVM replaces the running code with its compiled version while the loop is still executing, at one of the following loop iterations.
  • As already mentioned, methods get compiled and deoptimized often, and this deoptimization might happen for a variety of reasons. It is denoted by made not entrant next to the method name in the compilation logs. See this link for more details on the various reasons for deoptimization.
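
A rough way to observe an OSR compilation yourself is a method that is invoked only once but spends a long time in a loop. The class below is a hypothetical sketch. When it is run with -XX:+PrintCompilation on a HotSpot JVM, a line for sumTo marked with % would typically appear, although the exact output depends on the JVM version and flags.

```java
final class OsrSketch {
    // Called a single time, so the invocation counter never gets hot;
    // the loop backedge counter is what makes this method a compilation candidate.
    static long sumTo(int limit) {
        long sum = 0;
        for (int i = 0; i < limit; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(200_000_000));
    }
}
```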

So now we understand the performance impact JIT brings to the table and how compiling methods to native machine instructions benefits application performance.

In the next section, we will talk about JIT method inlining optimization and its impact on the application performance.