Seeing Escape Analysis Working
Chris Seaton, 16 December 2020
This article originally appeared in the Java Advent 2020.
You may have heard of a compiler analysis phase called escape analysis. It informs an optimisation called scalar replacement of aggregates, which removes unnecessary allocation of Java objects. I find that people often have a misunderstanding of what this optimisation really does and an under-appreciation of what it’s capable of. We can understand it better by seeing it working in practice.
Context
I’ll be showing how the GraalVM compiler, as part of GraalVM and configured for just-in-time compilation in hosted configuration, applies escape analysis and scalar replacement of aggregates. The GraalVM compiler is more powerful in this regard than the most common just-in-time compiler that you are probably used to, C2 in HotSpot.
Note that every time I say compiler in this article, I mean just-in-time compilation - the kind of compilation that happens at runtime in the JVM. I don’t mean the javac compiler.
I’ll show how the compiler understands our example programs by looking at one of the data structures it uses during compilation, its high-level intermediate representation. This is a graph that captures the meaning of a Java program as it is optimised and transformed, before conversion to machine code starts. That probably sounds very advanced, but I think for most people with some experience of Java it’s actually quite an approachable way to understand what the compiler is doing.
This intermediate representation is different to an abstract syntax tree, or AST, which is a syntactic representation of the text in your program. It’s also different to a call graph, which just tells you which methods call which other methods.
I’ll be using a tool called Seafoam, from Shopify’s language implementation team, to create diagrams of the data structures dumped out of Graal. You could also use a tool called the Ideal Graph Visualizer from Oracle Labs.
Terminology
Escape analysis is an analysis process. It detects if an object is visible outside of a compilation unit (a compilation unit is usually a method plus any other methods inlined into it). An object can become visible outside a compilation unit by being returned, or passed as an argument to another method, or written to a field, or similar.
Scalar replacement of aggregates uses the result of escape analysis and works on objects that do not escape the compilation unit, or only escape it on some branches. It replaces objects (aggregates) with simple values (scalars) similar in effect to local variables. We’ll see this more concretely later on.
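To make this concrete before we dive into compiler graphs, here is a minimal sketch of the idea, using the Vector class defined below. The distSquared method and its hand-scalarised twin are my own illustration, not code from the rest of this article:

// distSquared allocates a Vector that is never returned, stored in a field,
// or passed to another method, so escape analysis proves it does not escape.
static double distSquared(Vector a, Vector b) {
    Vector d = new Vector(a.getX() - b.getX(), a.getY() - b.getY());
    return d.getX() * d.getX() + d.getY() * d.getY();
}

// After scalar replacement of aggregates, the compiled code behaves roughly
// like this: the object's fields have been replaced by plain local values.
static double distSquaredScalarised(Vector a, Vector b) {
    double dx = a.getX() - b.getX();
    double dy = a.getY() - b.getY();
    return dx * dx + dy * dy; // no Vector is ever allocated
}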
Object allocation
We’ll work with a simple Vector class that holds double x and y fields. It has a constructor, getters, and an add method that returns a new object that is this Vector added to another.
class Vector {

    private final double x;
    private final double y;

    public Vector(double x, double y) {
        this.x = x;
        this.y = y;
    }

    public double getX() {
        return x;
    }

    public double getY() {
        return y;
    }

    public Vector add(Vector other) {
        return new Vector(x + other.x, y + other.y);
    }

}
I have a main method that runs a loop and calls a sum method that uses add to sum two randomly generated vectors. The loop is there to get the code hot enough to trigger just-in-time compilation, the randomization is there so that values are not profiled to be effectively constant, and the separate sum method is there to create a neat block of code to look at when we examine how the code is compiled.
public static void main(String[] args) {
    while (true) {
        sum(new Vector(Math.random(), Math.random()), new Vector(Math.random(), Math.random()));
    }
}

private static Vector sum(Vector a, Vector b) {
    return a.add(b);
}
I’ll compile and run this using GraalVM CE 20.3.0 for Java 8, on macOS on AMD64. I turn off an optimization called on-stack replacement, because this optimization makes compilation a little more complicated and harder to understand. I then use Graal’s Dump option to ask it to dump out its intermediate representation data structures.
% PATH=graalvm-ce-java8-20.3.0/Contents/Home/bin:$PATH
% javac Vector.java
% java -XX:+PrintCompilation -XX:-UseOnStackReplacement -Dgraal.Dump=:2 Vector
We can now use Seafoam to look at how the compiler understands the sum method, right before it runs escape analysis. The file sum.bgv contains the data structure that the compiler dumped out for us.
% seafoam sum.bgv:14 render
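The :14 here selects one of the many graphs that Graal dumps for this compilation, one per phase. To find the right index for the phase you are interested in, you can ask Seafoam to list the graphs in the file (assuming, as here, the dump file has been renamed to sum.bgv):

% seafoam sum.bgv list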
What are we seeing here? This graph is the way that Graal understands the sum method while it is compiling it. It’s Java code, but free from the constraints of a textual representation. Operations are represented as nodes (the boxes, diamonds, and ovals) and the relationships between operations are represented as edges (the arrows). The thick red arrow means that one operation must happen before another. The thinner arrow means that the output of the operation is used as input to another operation.

We said the program was freed from the constraints of a textual representation. Look at the add operation in node 14, for example. We know it needs to happen after we load the two values from the two x fields, and before we store the value in the field x in node 19, but we don’t care whether it happens before or after the two y fields are loaded. In Java textual source code, everything is written in one linear order. The graph relaxes this constraint and encodes only the constraints that the Java Language Specification actually requires. The compiler is then free to pick the final order everything runs in.
Reading this graph, we can see that we create a new Vector, then load the two input objects’ x and y fields, add them together, and then store them into the new Vector we created before returning it. There are some extra details in the graph, such as the null check, the final field barrier, and more, because this is a real compiler graph, but we’ll ignore those for this article to keep it short.

The key thing to see here is that there is a definite point where the new Vector object is created.
Basic scalar replacement
Let’s look at the same method right after escape analysis and scalar replacement of aggregates have run.
% seafoam sum.bgv:15 render
Now the graph has been even further relaxed from how the method appears in text. Instead of a New Vector operation we have now decomposed it into a generic operation to allocate an object, taking the values of the fields of that object. There’s no StoreField here anymore.
In this particular method the transformation hasn’t achieved anything useful - the graph has been relaxed, but that didn’t reveal any opportunities for optimization. That’s because the only object we allocated really does need to be allocated, because it leaves the method. The escape analysis has run, but the object escaped, so there was no scalar replacement of aggregates.
To take this further, let’s look at a method which sums three Vector objects.
private static Vector sum(Vector a, Vector b, Vector c) {
    return a.add(b).add(c);
}
The graph before escape analysis and scalar replacement of aggregates has the same structure as the previous method. We have two New Vector operations. The result of a.add(b) is written into the first new object, and then read out again for the subsequent .add(c). We can see how this is wasteful - the intermediate object is never visible outside of the compilation unit because it isn’t stored or passed anywhere - so why allocate it, and why store fields into it that we just immediately read out again?
If we look at the graph after escape analysis and scalar replacement of aggregates, we see that the intermediate Vector has been removed entirely. There is only one Alloc, not two. The aggregate of the intermediate object’s fields has been replaced with scalar values carried on edges that directly connect the producer and the consumer, without an intermediate StoreField and then LoadField.
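To make that concrete, here is roughly what the compiled code is now equivalent to. This is a sketch in Java source for illustration (sumEquivalent is my name, not anything the compiler produces), not the literal compiler output:

// A sketch of what sum(a, b, c) behaves like after scalar replacement:
// the intermediate Vector's fields have become plain double values.
private static Vector sumEquivalent(Vector a, Vector b, Vector c) {
    double intermediateX = a.getX() + b.getX(); // was the intermediate object's x field
    double intermediateY = a.getY() + b.getY(); // was the intermediate object's y field
    return new Vector(intermediateX + c.getX(), intermediateY + c.getY()); // the single remaining allocation
}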
The effectiveness of scalar replacement
A few more examples will show how this optimisation is a bit more powerful than people often assume.
If we sum an array of Vector objects we will be logically creating one Vector per iteration.
private static Vector sum(Vector first, Vector... rest) {
    Vector sum = first;

    for (Vector vector : rest) {
        sum = sum.add(vector);
    }

    return sum;
}
This graph is now pretty complicated, but the key point is that you can see a New Vector operation inside the loop (inside the circle formed of thick red arrows).
After partial escape analysis and scalar replacement of aggregates run, there is now no New Vector inside the loop. There is a single allocation at the end for the final Vector that is returned, and the values for x and y are built up by accumulating values (the little loops around nodes 33 and 92, and nodes 36 and 93). This means no memory is allocated in the inner loop of this method anymore.
Sometimes people talk about objects that don’t escape a compilation unit being allocated on the stack. Here’s why that’s a misunderstanding. This method sums two Vector objects but only returns the x component.
private static double sumX(Vector a, Vector b) {
    return a.add(b).getX();
}
Before escape analysis runs, this method looks exactly like the original sum, except that the x field is then loaded and returned. The y field is stored but never used, and the intermediate Vector doesn’t escape the compilation unit.
After escape analysis and scalar replacement of aggregates, we can see an important point - after the object was replaced with scalars, there was no consumer of the y value, so the code to generate it had nothing to connect it into the graph, and so it was removed.

If we were literally allocating the object on the stack - storing the object in the same format, just in stack memory rather than heap memory - we’d still have to generate the intermediate y value. We don’t allocate the object anywhere - we’ve replaced it with dataflow edges instead.
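In Java source terms, the compiled sumX now behaves roughly like this sketch (sumXEquivalent is my naming for illustration):

// A sketch of what sumX behaves like after scalar replacement: the y
// computation had no consumer, so it has disappeared along with the object.
private static double sumXEquivalent(Vector a, Vector b) {
    return a.getX() + b.getX(); // no Vector allocated, no y value computed
}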
Note that stack allocation of objects is really a thing, and it’s being proposed for standard HotSpot and C2, but it’s for a slightly different purpose which we won’t go into here for simplicity.
Comparing to manual optimisation
Would you rewrite some of these methods to accumulate the x and y values in local variables and not create intermediate Vector objects? For example, might you rewrite the sum over an array of Vector objects like this:
private static Vector sumPremature(Vector first, Vector... rest) {
    double sumX = first.getX();
    double sumY = first.getY();

    for (Vector vector : rest) {
        sumX += vector.getX();
        sumY += vector.getY();
    }

    return new Vector(sumX, sumY);
}
This graph is indeed more efficient than the graph of our sum with intermediate Vector objects before escape analysis. There is no allocation inside the loop.

But if we look back at our previous sum loop with intermediate Vector objects after escape analysis and scalar replacement of aggregates, the two look basically the same. There was no benefit in rewriting this method from the high-level version with simple intermediate Vector objects to the manually optimised version.
(Well actually perhaps there is - the manually optimised version is quicker to compile and runs more quickly while being interpreted until it is compiled.)
Partial escape analysis
Something that Graal can do that C2 cannot, and a key advantage of GraalVM, is partial escape analysis. Instead of determining a binary value of whether an object escapes the compilation unit or not, this can determine on which branches an object escapes it, and move allocation of the object to only those branches where it escapes.
private static Vector sumPartial(Vector a, Vector b, Vector c, boolean s) {
    final Vector x = a.add(b);

    if (s) {
        return blackhole(x);
    } else {
        return x.add(c);
    }
}
This contrived method sums two Vector objects, and then, based on a condition s, either passes the result as a parameter to another method called blackhole, which causes it to escape, or adds a third vector and returns it.

The intermediate Vector stored in x escapes on the first branch, but not on the second.

We prevent blackhole from being inlined by specifying -XX:CompileCommand=dontinline,Vector.blackhole.
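I haven’t shown blackhole itself, and any trivial method will do: what makes x escape is simply that it is passed as an argument to a method that is not inlined. A minimal sketch, assuming it is placed in the Vector class so that the dontinline pattern above matches:

// A minimal blackhole: the argument escapes because it is passed to a
// method that we have told the compiler it is not allowed to inline.
private static Vector blackhole(Vector v) {
    return v;
}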
Before partial escape analysis runs, we can see the New Vector for x being run for both branches.
After partial escape analysis and scalar replacement of aggregates, the intermediate New Vector has gone. Instead, on the branch where it escapes we have the Alloc operation, which references the values it should have for x and y, and on the other branch, where the object does not escape, the values for x and y are just used directly.
The allocation has been moved to only the branch where it is actually needed.
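In Java source terms, the compiled method now behaves roughly like this sketch (again my naming, and a simplification of what the graph encodes):

// A sketch of sumPartial after partial escape analysis: the sums are computed
// as plain values, and a Vector is only materialised on the escaping branch.
private static Vector sumPartialEquivalent(Vector a, Vector b, Vector c, boolean s) {
    double sumX = a.getX() + b.getX();
    double sumY = a.getY() + b.getY();

    if (s) {
        return blackhole(new Vector(sumX, sumY)); // allocation only where the object escapes
    } else {
        return new Vector(sumX + c.getX(), sumY + c.getY()); // one allocation, for the result
    }
}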
Hopefully this all helps you understand what escape analysis and scalar replacement of aggregates are, by seeing them working in practice. I also hope it shows you that this optimisation is not really stack allocation but something more powerful than that, and gives you a sense of what extra the GraalVM compiler can do for you.
More information
Graal’s partial escape analysis algorithm is described in Partial Escape Analysis and Scalar Replacement for Java.
I’ve written more about these compiler graphs and how to read them in Understanding Basic Graal Graphs.