Truffle performance problems

Gilles Duboscq duboscq at
Fri Dec 13 08:09:37 PST 2013


About this "illegal recursive call": the inlining that happens during a
Truffle compilation is definitely not made to be used on just any Java
code. It is really made for Truffle interpreters. You can think of it as
inlining *every* call it manages to de-virtualize.
When writing a Truffle interpreter, the nodes only include the fast path
for the current "profile" or "specialization". When you need a slow path,
you can call a method that you annotate with @CompilerDirectives.SlowPath,
which means that the Truffle compilation will not inline through that path.
(If I'm not mistaken, the Graal compiler may later decide to inline it
anyway, but then the decisions are the same as for normal Java code.)
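
The structure Gilles describes can be sketched in plain Java: the node's execute method holds only the fast path, and everything else, including recursion like the recursiveAppendNumber call in the error quoted further down, sits behind a method marked as a slow path. This is a minimal sketch under stated assumptions, not Truffle code: to compile without Truffle on the classpath it declares its own hypothetical `SlowPath` annotation as a stand-in for @CompilerDirectives.SlowPath (plain HotSpot ignores it), and `FormatNode` is a made-up example node.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class SlowPathSketch {
    // Hypothetical stand-in for Truffle's @CompilerDirectives.SlowPath, declared
    // locally so this file compiles without Truffle. It only marks intent here;
    // plain HotSpot ignores it.
    @Retention(RetentionPolicy.CLASS)
    @Target(ElementType.METHOD)
    @interface SlowPath {}

    // Hypothetical node that renders an int as decimal text.
    static final class FormatNode {
        String execute(int value) {
            // Fast path: the single-digit case this node is "specialized" for.
            if (value >= 0 && value < 10) {
                return String.valueOf((char) ('0' + value));
            }
            // Everything else leaves the fast path through the slow-path method.
            return formatSlow(value);
        }

        // With the real annotation, Truffle compilation would not inline through
        // this method, so the recursion below stays out of the inlined graph.
        @SlowPath
        private static String formatSlow(int value) {
            StringBuilder sb = new StringBuilder();
            long v = value; // widen so negating Integer.MIN_VALUE cannot overflow
            if (v < 0) {
                sb.append('-');
                v = -v;
            }
            appendDigits(sb, v);
            return sb.toString();
        }

        // Recursive helper, similar in spirit to the recursiveAppendNumber call
        // in the quoted error; it is only reachable through formatSlow.
        private static void appendDigits(StringBuilder sb, long v) {
            if (v >= 10) {
                appendDigits(sb, v / 10);
            }
            sb.append((char) ('0' + v % 10));
        }
    }

    public static void main(String[] args) {
        FormatNode node = new FormatNode();
        System.out.println(node.execute(7));
        System.out.println(node.execute(-1234));
    }
}
```

The point of the split is that the partial evaluator only ever sees the short, non-recursive execute method; the recursive helper is reachable solely through the boundary.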


On Fri, Dec 13, 2013 at 12:08 AM, Thomas Wuerthinger <
thomas.wuerthinger at> wrote:

> Dain,
> We are not confused by the performance you are seeing as Truffle’s use
> case is the execution of expression trees with multiple smaller nodes
> (which capture profiling feedback) and not as a single node wrapping a
> complex Java method (which does not capture any profiling feedback). There
> is no expected performance gain from doing the latter - on the contrary,
> the manual specification of the inlining boundaries and the absence of Java
> profiling feedback can lead to performance losses. We will nevertheless
> investigate whether there is anything specifically wrong with Truffle’s
> compiler graph in your example.
> - thomas
> On 12 Dec 2013, at 23:16, Dain Sundstrom <dain at> wrote:
> > Hi all,
> >
> > I have been experimenting with Truffle in Presto for a day now and am
> > confused by the performance I am seeing.
> >
> > My high-level goal for this experiment is to figure out how I should
> > structure data flow in my Truffle language. Since I am writing both the
> > language and its only user, I have a lot of options available to me.
> > Specifically, I'd like to figure out whether I should take a vectorized
> > approach, a row-at-a-time approach, or some combination of both.
> > Whichever solution is fastest, I'll make work in the code base.
> >
> > To this end, I decided to take a top-down approach to Truffle (mainly
> > because I am confident the bottom expression bits will be fast). I started
> > with a very simple query hand-coded in Java:
> >
> > double sum = 0;
> > for (row in source) {
> >   if (row passes the filter) {
> >     sum += row.extendedprice *
> >   }
> > }
> > return sum;
> >
> > When I run that on my machine using 5M rows of input (all in memory), it
> > takes ~165ms using the Graal VM (1.7.0_45) with the "-server" option on my
> > laptop.
> >
> > With the performance baseline established, my plan was to start with a
> > single node and then break it apart into more nodes without making
> > anything slower. So, I wrapped this same code in a single Truffle RootNode.
> > When I execute the same code through the Truffle call, I get the same
> > performance until the node is compiled. Once the node is compiled,
> > performance drops to ~260ms.
> >
> > Now, I understand using a single node is not the point of Truffle, but I
> > would not expect such a massive performance drop-off. At this point, I'm
> > not sure whether this is a worthwhile exercise at all.
> >
> > You can find all of the code and instructions on running it here:
> >
> >
> >
> > Any ideas or suggestions?
> >
> > Thanks,
> >
> > -dain
> >
> >
> >
> > On a related note, if you leave the Truffle test running, it eventually
> > crashes with (
> >
> > Found illegal recursive call to
> > HotSpotMethod<Utility.recursiveAppendNumber(StringBuffer, int, int, int)>,
> > must annotate such calls with @CompilerDirectives.SlowPath!
> >
> > I've also found "java.util.concurrent.ExecutionException:
> > java.lang.IllegalStateException: Inlined graph is in invalid state" when
> > executing a CallTarget in tight inner loops.
> >
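
Thomas's point, and Dain's plan to break the single node apart, both concern the shape of the tree. As a hedged sketch only, this is roughly what the quoted loop might look like as a tree of small nodes in plain Java. The node classes, the two columns (`price` and `factor`) and the filter `factor < 0.75` are hypothetical stand-ins, because the quoted pseudocode leaves the filter and the second operand unspecified. No Truffle classes are used, so none of the specialization, profiling feedback, or compilation Thomas describes happens here; the sketch only shows the loop, the filter and the arithmetic as separate nodes, evaluated one row at a time.

```java
public class QueryTreeSketch {
    // Hypothetical column indexes; data is stored column-major as double arrays.
    static final int PRICE = 0;
    static final int FACTOR = 1;

    // Base class for value-producing nodes, evaluated for one row at a time.
    abstract static class ExprNode {
        abstract double execute(double[][] columns, int row);
    }

    // Reads one column value for the current row.
    static final class ColumnNode extends ExprNode {
        private final int column;
        ColumnNode(int column) { this.column = column; }
        @Override
        double execute(double[][] columns, int row) { return columns[column][row]; }
    }

    // Multiplies the results of two child nodes.
    static final class MulNode extends ExprNode {
        private final ExprNode left;
        private final ExprNode right;
        MulNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
        @Override
        double execute(double[][] columns, int row) {
            return left.execute(columns, row) * right.execute(columns, row);
        }
    }

    // Predicate node: true when the child value is below a constant bound.
    static final class LessThanNode {
        private final ExprNode operand;
        private final double bound;
        LessThanNode(ExprNode operand, double bound) { this.operand = operand; this.bound = bound; }
        boolean execute(double[][] columns, int row) {
            return operand.execute(columns, row) < bound;
        }
    }

    // Root of the tree: the loop from the quoted pseudocode, with the filter and
    // the summed expression delegated to child nodes instead of written inline.
    static final class FilterSumNode {
        private final LessThanNode filter;
        private final ExprNode value;
        FilterSumNode(LessThanNode filter, ExprNode value) { this.filter = filter; this.value = value; }
        double execute(double[][] columns, int rowCount) {
            double sum = 0;
            for (int row = 0; row < rowCount; row++) {
                if (filter.execute(columns, row)) {
                    sum += value.execute(columns, row);
                }
            }
            return sum;
        }
    }

    // Builds sum(price * factor) where factor < 0.75 (all hypothetical).
    static FilterSumNode buildQuery() {
        ExprNode price = new ColumnNode(PRICE);
        ExprNode factor = new ColumnNode(FACTOR);
        return new FilterSumNode(new LessThanNode(factor, 0.75), new MulNode(price, factor));
    }

    public static void main(String[] args) {
        double[][] columns = {
            {100.0, 200.0, 300.0}, // price
            {0.25, 1.0, 0.5}       // factor
        };
        System.out.println(buildQuery().execute(columns, 3));
    }
}
```

Storing the data column-major, as here, is one way to keep the vectorized option Dain mentions open: a vectorized variant would have each node process a batch of rows per call instead of a single row index.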

More information about the graal-dev mailing list