In this blog post, we are going to talk about potential advantages of using custom execution plan of a query rather than using the traditional iterator model in which query execution is composed of many operators.
Iterator model comes from those times where we did not pay attention to writing performant code and rather focussed on writing more readable code ( one cannot simply deny the readability aspect of iterator model ). But as of now, we are in situations where we are heavily getting bottlenecked on CPU instructions, so running optimized instruction sets is the need of the hour,
Let’s take this simple query for example. We have got a list of numbers, we need to apply these three operations on a list of numbers.
- Add a number n to each of the numbers in the list
- Subtract a number m from each of the numbers in the list
- Return the final list of numbers after applying both the operations
With iterator model, we wrote this simple and easy code for the above-mentioned operations in this way:
We have an Operator abstract Class and all the other concrete operations as specified above are implemented as different operators ( like AddOperator, Subtract Operator ) and are chained to one another via composition. These chained operators act as a single operator and we can iterate through the numbers emitted by this single operator to get the final numbers having all the transformations.
abstract class Operator { abstract public boolean hasNext(); abstract public int getNext(); }
class AddOperator extends Operator class SubtractOperator extends Operator class SourceOperator extends Operator
SourceOperator sourceOperator = new SourceOperator(arrayList); AddOperator addOperator = new AddOperator(sourceOperator, 10); SubtractOperator subtractOperator = new SubtractOperator(addOperator, 15); while(subtractOperator.hasNext()) { subtractOperator.next(); }
In this methodology, we can clearly see that every operation follows an iterator kind of a model wherein they return the results to the subsequent operator one by one through their next() function. Though this iterator model is highly extendable for all kinds of queries, this comes with own set of problems. Just to give a glimpse of the issue with this approach, [2] let us write a really naive version of this code which may not be modular or readable for that matter.
for (int i = 0; i < arrayList.size(); i++) { int num = ((arrayList.get(i) + n) - m); }
This code seems ok at first and may not be the most readable but it serves the same purpose as the other piece of code implementing “iterator model”. Let us compare the performance for both of these implementations.
- Methodology 1 with Operators Chaining has throughput of around ~666 ops/second
- Methodology 2 with Inlined Operations has throughput of around ~1000 ops/second
In the benchmark, it is clearly visible that this naive implementation with inlined operations is far more performant than modular “iterator model” approach. But why is it so ??
This is because of overhead associated with
1) virtual function calls and
2) unable to inline functions ( see this link )
which in turn results in a bloated set of instructions to be executed on the processor.
So what if we can generate this inlined naive executable code for each query, that should obviously enhance the performance of the queries from the current. This can be achieved by code generation which is used by many modern databases and query engines. In fact, many databases talk about how code generation caused a major performance improvement to their databases. Code generation is just another term for generating this custom executable code for a query. There are many libraries in the market which does this custom code generation and return a native compiled code given a query ( read about LLVM ).
In the next section, we are going about Janino compiler which does this code generation in Java Land and is used by prominently by Spark.
How to use Janino ??
Janino is a super fast java compiler which can be used to translate java expressions or java code blocks into Java bytecode. It easily embeds in your application. Here is an example of how and when to use JANINO compiler in your application.
Suppose we have a query which wants to:
- Add 10 to each of the numbers
- Filter all the numbers which are less than 40
- Multiply all the numbers by 5 and return the list of numbers
With the iterator model, we would have constructed three operators for each of the three stages and then chained those three operators and then iterated through the chained operator to get all the final list of numbers. But with Janino compiler, we can afford to create runtime execution plan for a query and run it against the data to get the final list.
- Suppose we were somehow able to generate this custom execution code for this query and write it down in some text file.
// FileName: Generated.txt public ArrayList<Integer> returnResults(ArrayList<Integer> arrayList) { ArrayList<Integer> results = new ArrayList<Integer>(); for(int i = 0; i < arrayList.size(); i++) { int num = ((Integer) arrayList.get(i)) + 10; if (num > 40) { results.add(num * 5); } } return results; }
- Now after generating this custom code, we need to compile it with Janino Compiler and generate some executable format of this above code.
public GeneratedOperator init() { Scanner scanner = new Scanner("Test.txt"); ClassBodyEvaluator cbe = new ClassBodyEvaluator(); cbe.setClassName("JaninoClass"); cbe.setExtendedClass(AbstractJaninoClass.class); cbe.cook(scanner); Class c = cbe.getClazz(); return (GeneratedOperator) c.newInstance(); }
- After compiling the generated code ( i.e. Generated.txt ), we can easily use the compiled code and pass it a list of numbers to get the final list.
public static void main(String args[]) throws CompileException, InstantiationException, IllegalAccessException, IOException { // init method compiles the code and return a GeneratedOperator // instance which has this method generated method returnResults GeneratedOperator generatedOperator = new CompiledCodeExample().init(); ArrayList<Integer> arrayList = new ArrayList<Integer>(); arrayList.add(100); arrayList.add(1); arrayList.add(2); arrayList.add(13); arrayList.add(40); ArrayList<Integer> returnList = generatedOperator.returnResults(arrayList); for (int i = 0; i < returnList.size(); i++) { System.out.println(returnList.get(i)); } }
Output is as expected: 550 250
Note: Janino is responsible for compiling this generated string into a java method, but still this string has to be constructed via your own application logic.
Understanding behavior of Janino Compiled Classes with JIT
Now we know how to generate custom optimized byte-code for a query and execute it. Let’s understand how do this Janino Compiled Classes behave with JIT.
In this experiment, we will take the same query as defined above and perform two kinds of execution models
- Iterator Model ( i.e. Chaining of Operators )
- Custom Code Generation via Janino
Note: Results clearly seem to point that Code Generation Model seems to have outperformed Iterator Model.
Execution via Iterator Model
This experiment has been performed with JIT enabled which essentially means that JIT must have inlined and compiled the different operators ( i.e. iterators ) into a single native function.
public static ArrayList<Integer> experimentOperators(ArrayList<Integer> arrayList) { ArrayList<Integer> arrayList1 = new ArrayList<Integer>(); SourceOperator sourceOperator = new SourceOperator(arrayList); AddOperator addOperator = new AddOperator(sourceOperator, 10); FilterOperator filterOperator = new FilterOperator(addOperator, 40); MultiplyOperator multiplyOperator = new MultiplyOperator(filterOperator, 5); while (multiplyOperator.hasNext()) { arrayList1.add(multiplyOperator.getNext()); } return arrayList1; }
In this screenshot, we can clearly see the
- Inlining of different operators into a single inlined function
- Compilation of this single inlined function by C2 compiler
Execution via Custom Code Generation
In this execution strategy, we are generating only once this custom executable bytecode for the entire experiment duration which essentially means this custom executable bytecode should become eligible to be JITed ( after some iterations ) to native instructions which would improve the query performance even more.
// We are doing the initialization of this custom code generated // method only once and using the same generated method over and // over again in the experiments. GeneratedOperator generatedOperator = new CompiledCodeExample().init(); public GeneratedOperator init() throws IOException, CompileException, IllegalAccessException, InstantiationException { ClassBodyEvaluator cbe = new ClassBodyEvaluator(); cbe.setExtendedClass(GeneratedOperator.class); cbe.setClassName("GeneratedClass"); String[] strings = new String[2]; strings[0] = HashMap.class.getName(); strings[1] = ArrayList.class.getName(); cbe.setDefaultImports(strings); cbe.cook(content); Class c = cbe.getClazz(); return (GeneratedOperator) c.newInstance(); } @Benchmark public void experimentCodeGeneration(Blackhole blackhole) throws IllegalAccessException, InstantiationException, IOException, CompileException { blackhole.consume(generatedOperator.returnResults(arrayList)); }
In this screenshot, we can clearly see that method “returnResults” gets compiled by C2 compiler.
Note: This method gets compiled only because we are generating this custom JANINO compiled code for the method only once for the entire experiment duration. But if we will generate this custom bytecode for the method for every invocation of the JMH benchmark, JIT will not compile the method. This would be because JIT will assume that for every invocation we are using a different custom code and hence it is of no use to compile this method ( JANINO compiled code ) across JMH invocations.
// In this experiment we are compiling the custom code again // and again for each invocation of JMH benchmark and hence this // code would not get JITed across benchmark invocations because // JIT has no way of knowing whether it is the same method // which was compiled before in the previous invocation as well. @Benchmark public void experimentCodeGeneration(Blackhole blackhole) throws IllegalAccessException, InstantiationException, IOException, CompileException { generatedOperator = new CompiledCodeExample().init(); blackhole.consume(generatedOperator.returnResults(arrayList)); }
In this, we can clearly see that “ReturnResults” method is nowhere to be seen which essentially means it is not JIT compiled.
So essentially it means that if we have the same query hitting over and over again, with Janino Code generation methodology we will generate the bytecode for every query so essentially the code path will never get JIT compiled whereas with Iterator Model we will already have compiled and inlined methods. JMH Performance numbers also seem to suggest the same.
Note: In this, we can clearly see that Iterator Model outperforms JANINO Code Generation Model. This is because of the reasoning specified above i.e. in JANINO Code Generation methodology we are doing code generation again and again for each query and hence JIT is not able to compile the methods across queries. Some modern databases use Execution Plan Cache to overcome this problem and hence make sure that if the same query is hitting again and again, they use the same Generated Code.
References
- http://highscalability.com/blog/2016/9/7/code-generation-the-inner-sanctum-of-database-performance.html
- https://advancedweb.hu/2017/03/01/jvm_optimistic_optimizations/
- https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html
- http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
- https://docs.memsql.com/concepts/v6.0/code-generation/
- https://clang.llvm.org/
- http://www.spark.tc/simd-and-gpu/
- https://medium.com/@metebalci/interpretation-jit-and-aot-compilation-in-the-jvm-3b61c8da5503