Truffle and mlvm
thomas.wuerthinger at oracle.com
Sat Aug 30 22:04:54 UTC 2014
Thanks a lot, Charlie, for this very detailed feedback! In fact, this is probably the most comprehensive feedback we’ve received for Truffle so far :).
There are some valid points, some points where I’d like to add a comment, and some where Graal and Truffle may have been misunderstood. I’ll try to address them in a similarly structured form:
Regarding disadvantage 1 “AST is not enough”:
A. Yes, you need to provide specialisations for your operations to get good performance. This makes a Truffle AST interpreter harder to implement than a simple AST interpreter. It does, however, have the advantage of giving you more predictable performance across the different usages of your dynamic language operations. We are currently working on source code visualisations for Truffle ASTs so that users can see which nodes are specialised, and to which types.
B. Yes, you need to store your local variables in a Truffle frame object. This object can, however, hold references to arbitrary extra data that your guest language needs.
C. We have made many improvements to Truffle, Graal, and TruffleRuby since January. Inlining now works without any problems and independently of the guest language, and that includes TruffleRuby.
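A rough illustration of point A follows. It is plain JDK Java (17+), not Truffle's node classes or its specialization annotations, and the names `Expr`, `Const`, `Add`, and `generic` are hypothetical. An `Add` node starts uninitialized and specializes itself to `int` addition based on the first operands it sees. It guards that fast path with a type check plus an overflow check, and permanently rewrites itself to a generic slow path when the guard fails. Real Truffle nodes replace themselves in the tree and cooperate with the compiler's deoptimization. This sketch only models the state transitions.

```java
import java.math.BigInteger;

public class SpecializationSketch {
    // Minimal AST node interface for this sketch (not Truffle's Node class).
    interface Expr { Object execute(); }

    // Leaf node returning a fixed value.
    record Const(Object value) implements Expr {
        public Object execute() { return value; }
    }

    // An "add" node that specializes itself on the operand types it observes.
    static final class Add implements Expr {
        enum State { UNINITIALIZED, INT, GENERIC }

        private final Expr left;
        private final Expr right;
        private State state = State.UNINITIALIZED;

        Add(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }

        State state() { return state; }

        public Object execute() {
            Object l = left.execute();
            Object r = right.execute();
            if (state == State.UNINITIALIZED) {
                // First execution: choose a specialization from the values actually seen.
                state = (l instanceof Integer && r instanceof Integer) ? State.INT : State.GENERIC;
            }
            if (state == State.INT) {
                // Guard: the fast path is valid only for two Integers whose sum fits in an int.
                if (l instanceof Integer a && r instanceof Integer b) {
                    try {
                        return Math.addExact(a, b);
                    } catch (ArithmeticException overflow) {
                        // fall through: the guard failed
                    }
                }
                // Guarded fallback: rewrite this node to the generic case for good.
                state = State.GENERIC;
            }
            return generic(l, r);
        }

        // Slow path: widens to double or BigInteger so no input can overflow.
        private static Object generic(Object l, Object r) {
            if (l instanceof Double || r instanceof Double) {
                return ((Number) l).doubleValue() + ((Number) r).doubleValue();
            }
            BigInteger sum = toBig(l).add(toBig(r));
            // Narrow back to Integer when the result fits (bitLength excludes the sign bit).
            return sum.bitLength() < 32 ? (Object) sum.intValue() : sum;
        }

        private static BigInteger toBig(Object o) {
            return o instanceof BigInteger big ? big : BigInteger.valueOf(((Number) o).longValue());
        }
    }

    public static void main(String[] args) {
        Add small = new Add(new Const(1), new Const(2));
        System.out.println(small.execute() + " " + small.state());       // 3 INT

        Add overflow = new Add(new Const(Integer.MAX_VALUE), new Const(1));
        System.out.println(overflow.execute() + " " + overflow.state()); // 2147483648 GENERIC
    }
}
```

The second node specializes to `INT` on its first execution. The overflow check then fails immediately, and the node falls back to the generic path. The transition is one-way, so once a node has gone generic it stays there.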
Regarding disadvantage 2 “Long startup and warmup times”:
The benefit of the system is absolutely *not* lost when the compiler (Graal) and the Truffle guest language interpreter are AOT compiled. This gives you decent startup and high peak performance: the interpreter is immediately available as highly optimized machine code, and hot guest language methods are dynamically compiled to machine code by the precompiled Graal.
Regarding disadvantage 3 “Limited concurrency”:
There is no deeper reason why TruffleRuby is single-threaded right now, and certainly none that has to do with the fundamentals of the Truffle approach. We are planning to fully support multi-threading in TruffleRuby as well. One direction we are currently exploring is guest-language-level safepoints, so that guest language developers can deal with concurrency more easily without compromising peak performance.
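The following plain-JDK sketch shows the general idea of a *cooperative* safepoint. It is not Truffle's or TruffleRuby's mechanism (the email only says this direction is being explored), and `Safepoint`, `poll`, `runAtSafepoint`, and `runDemo` are hypothetical names. Interpreter threads call `poll()` at safe locations such as loop back-edges, and the common case costs one volatile read. A coordinating thread that wants to mutate shared state calls `runAtSafepoint`. An example of such a mutation is rewriting an AST node that other threads are executing. `runAtSafepoint` waits until every registered thread is parked at a poll (or has deregistered) before running the action.

```java
public class SafepointSketch {
    // Hypothetical cooperative safepoint: interpreter threads poll it, and a
    // coordinating thread can park all of them at a poll to run an action.
    static final class Safepoint {
        private final Object lock = new Object();
        private volatile boolean requested;  // fast-path flag read by poll()
        private int registered;              // threads that promise to poll
        private int arrived;                 // threads parked in the current safepoint
        private long epoch;                  // bumped each time a safepoint is released

        Safepoint(int threads) { this.registered = threads; }

        // Called by interpreter threads at safe locations, e.g. loop back-edges.
        void poll() throws InterruptedException {
            if (!requested) return;          // common case: a single volatile read
            synchronized (lock) {
                if (!requested) return;      // released before we got the lock
                long myEpoch = epoch;
                arrived++;
                lock.notifyAll();            // let the coordinator re-check its count
                while (epoch == myEpoch) lock.wait();
            }
        }

        // Called by a thread that will never poll again.
        void deregister() {
            synchronized (lock) {
                registered--;
                lock.notifyAll();
            }
        }

        // Parks every registered thread at a poll, runs the action, then resumes them.
        void runAtSafepoint(Runnable action) throws InterruptedException {
            synchronized (lock) {
                requested = true;
                try {
                    while (arrived < registered) lock.wait();
                    action.run();            // no registered thread is between two polls
                } finally {
                    requested = false;
                    arrived = 0;
                    epoch++;
                    lock.notifyAll();
                }
            }
        }
    }

    // Workers perform a two-step update between polls; every safepoint action
    // checks that it never observes an update half-done.
    static boolean runDemo(int workers, int iterations, int rounds) throws InterruptedException {
        Safepoint sp = new Safepoint(workers);
        long[] a = new long[workers];
        long[] b = new long[workers];        // worker i writes only a[i] and b[i]
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                try {
                    for (int n = 0; n < iterations; n++) {
                        a[id]++;             // stands in for a multi-step mutation
                        b[id]++;
                        sp.poll();           // safe point: a[id] == b[id] here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    sp.deregister();
                }
            });
            threads[i].start();
        }
        boolean[] consistent = {true};
        for (int r = 0; r < rounds; r++) {
            sp.runAtSafepoint(() -> {
                for (int i = 0; i < workers; i++) {
                    consistent[0] &= a[i] == b[i];
                }
            });
        }
        for (Thread t : threads) t.join();
        return consistent[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("consistent=" + runDemo(2, 200_000, 10));
    }
}
```

`main` should print `consistent=true`. Each worker performs a two-step update between polls, and the action never observes it half-done. Mutation happens only while all threads are stopped, so the interpreter's fast paths need no locking of their own. The cost is paid only when a safepoint is actually requested.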
Regarding disadvantage 4 “Limited availability”:
Yes, this is indeed a chicken-and-egg problem. Truffle is, however, not as closely tied to Graal as you suggest here. I believe it would be fairly straightforward to create a Truffle front-end for C2 (or any other compiler supporting deoptimization). There are only 3k LOC in Graal that are specific to Truffle, and I think they could be ported in a reasonable time frame. The Truffle interpreters themselves run on any Java system, even one with very limited features; this is actually an advantage over a pure bytecode generation approach. They can also be AOT compiled for devices that do not support a full JVM and have strict footprint requirements. Execution would of course be slower than in a full-fledged VM, but it would at least be correct. Furthermore, I think it is possible to perform the Truffle partial evaluation via bytecode generation for backwards compatibility.
Regarding disadvantage 5 “Unclear benefits for real-world applications”:
This kind of argument can hardly be countered before a system is 100% finished and shipped. The term “real world” is also somewhat loosely defined. I would very much support the development of a JRuby benchmark suite that reflects “real world” usage as closely as possible.
There is absolutely no reason to believe that a Truffle-based Ruby implementation would not have benefits for “real-world applications”, or that it would not be able to run a large application for a long time. It is clear that the TruffleRuby prototype needs more completeness work at both the language and the library level. We are very happy with the results so far, given that Chris has been working on it for about a year. We are planning to increase the number of people working on this, and would also be grateful for any help we can get from the Ruby community.
Regarding Graal: have you tried benchmarking JRuby without Truffle on the latest Graal binaries available at http://lafo.ssw.uni-linz.ac.at/builds/? We would look forward to seeing the peak performance results on a couple of workloads. We are not speculating about Graal becoming part of a particular OpenJDK release (as an experimental or a regular option); that is the sovereign decision of the OpenJDK community. All we can do is demonstrate and inform about Graal’s performance and stability.
We recognise that there is a long road ahead. Precisely in this context, I would like to emphasize that we are looking for more people to support this effort towards a new language implementation platform. I strongly believe that Truffle is currently the best available vehicle for making Ruby competitive with node.js in terms of performance. We are happy to try to *prove* you wrong, and even happier about support of any kind along the way ;). I am also looking forward to continuing this discussion at JavaOne (as part of the TruffleRuby session or elsewhere).
On 30 Aug 2014, at 21:21, Charles Oliver Nutter <headius at headius.com> wrote:
> Removing all context, so it's clear this is just my opinions and thoughts...
> As most of you know, we've opened up our codebase and incorporated the
> graciously-donated RubyTruffle directly into JRuby. It's available on
> JRuby master and we are planning to ship Truffle support with JRuby
> 9000, our next major version (due out in the next couple months).
> At the same time, we have been developing our own next-gen IR-based
> compiler, which will run unmodified on any JVM (with or without
> invokedynamic, though I still have to implement the "without" side).
> Why are we doing this when Truffle shows such promise?
> I'll try to enumerate the benefits and problems of Truffle here.
> * Benefits of using Truffle
> 1. Simpler implementation.
> From day 1, the most obvious benefit of Truffle is that you just have
> to write an AST interpreter. Anyone who has implemented a programming
> language can do this easily. This specific benefit doesn't help us
> implement JRuby, since we already have an AST interpreter, but it did
> make Chris Seaton's job easier building RubyTruffle initially. This
> also means a Truffle-based language is more approachable than one with
> a complicated compiler pipeline of its own.
> 2. Better communication with the JIT.
> Truffle, via Graal, has potential to pass much more information on to
> the JIT. Things like type shape, escaped references, frame access,
> type specialization, and so on can be communicated directly, rather
> than hoping and praying they'll be inferred by the shape of bytecodes.
> This is probably the largest benefit; much of my time optimizing JRuby
> has been spent trying to "trick" C2 into doing the right thing, since
> I don't have a direct way to communicate intent.
> The peak performance numbers for Truffle-based languages have been
> extremely impressive. If it's possible to get those numbers reasonably
> quickly and with predictable steady-state behavior in large,
> heterogeneous codebases, this is definitely the quickest path (on any
> runtime) to a high-performance language implementation.
> 3. OSS and pure Java
> Truffle and Graal are just OpenJDK projects under OpenJDK licenses,
> and anyone can build, hack, or distribute them. In addition, both
> Truffle and Graal are 100% Java, so for the first time a plain old
> Java developer can see (and manipulate) exactly how the JIT works
> without getting lost in a sea of plus plus.
> * Problems with Truffle
> I want to emphasize that regardless of its warts, we love Truffle and
> Graal and we see great potential here. But we need a dose of reality
> once in a while, too.
> 1. AST is not enough.
> In order to make that AST fly, you can't just implement a dumb generic
> interpreter. You need to know about (and generously annotate your AST
> for) many advanced compiler optimization techniques:
> A. Type specialization plus guarded fallbacks: Truffle will NOT
> specialize your code for you. You must provide every specialized path
> in your AST nodes as well as annotating "slow path", "transfer to
> interpreter", etc.
> B. Frame access and reification: In order to have cross-call access to
> frames or to squash frames created for multiple inlined calls, you
> must use Truffle's representation of a frame. This means loads/stores
> within your AST must be done against a Truffle object, not against an
> arbitrary object of your own creation.
> C. Method invocation and inlining: Up until fairly recently, if you
> wanted to inline methods you had to essentially build your own call
> site logic, profiling, deopt paths within your Truffle AST. When I did
> a little hacking on RubyTruffle around OSS time (December/January) it
> did *no* inlining of Ruby-to-Ruby calls. I hacked in inlining using
> existing classes and managed to get it to work, but I was doing all
> the plumbing myself. I know this has improved in the Truffle codebase
> since then, but I have my concerns about production readiness when the
> inlining call site parts of Truffle were just recently added and are
> still in flux.
> And there's plenty of other cases. Building a basic language for
> Truffle is pretty easy (I did a micro-language in about two hours at
> JVMLS last year), but building a high-performance language for Truffle
> still takes a fair investment of effort and working knowledge of
> dynamic compiler optimizations.
> 2. Long startup and warmup times.
> As Thomas pointed out in the other thread, because Truffle and Graal
> are normally run as plain Java libraries, they can actually aggravate
> startup time issues. Now, not only would all of JRuby have to warm up,
> but the eventual native code JIT has to warm up too. This is not
> surprising, really. It is possible to mitigate this by doing some form
> of AOT against Graal, but for every case I have seen the Truffle/Graal
> approach makes startup time much, much worse compared to just running
> atop JVM.
> Warmup time is also worsened significantly.
> The AST you create for Truffle must be heavily mutated while running
> in order to produce a specialized version of that AST. This must
> happen before the AST is eventually fed into Graal, which means you
> have a self-modifying interpreter spinning AST objects like mad while
> executing the early phases of your application. Compare to a dumb
> interpreter as in JRuby's old AST, where interpreting the AST produces
> no additional objects other than those necessary for execution of the
> The Truffle approach itself adds overhead too. Until optimized, the
> fully-reified frame objects, specialization markup (which triggers AST
> rewriting), deoptimization guards, and so on are all done manually
> against heap-level data structures. This is in addition to the
> JVM-level overhead of executing an AST (native frame-per-node, boxing
> and type-widening, poor inlining profile).
> Some amount of AOT *might* be applicable here, but the benefit of
> Truffle and Graal is lost in the AOT case if we're not getting
> real-world profile information. The Substrate VM has been brought up to
> aid startup and warmup too...but that direction produces a
> closed-world executable based on optimizing all code up front...not
> exactly what we're looking for in a general-purpose language runtime.
> 3. Limited concurrency
> The RubyTruffle runtime currently has to execute code under the
> watchful eye of a global lock. Yes, you read that right...RubyTruffle
> is single-threaded right now.
> I would like to know if there's a deeper reason for this, but the
> obvious shallow reason is that you can't have multiple threads
> executing at the same time if they're making thread-unsafe mutations
> to the executing code. This is similar to the major stumbling block
> for e.g. Pypy, which rewrites currently-executing assembly
> instructions at deopt/reopt safe points.
> I believe once the code has transitioned to native, you can execute
> that safely across threads...but this is opaque to your Truffle-based
> language, and it's unclear how you'd manage re-acquiring some sort of
> lock when transferring back to the interpreter.
> The fact that concurrency has so far been hand-waved (or so it seems
> to me from the outside) scares the living hell out of me, especially
> when there's talk about rolling this stuff into Java 9.
> Obviously some of this could be mitigated with an immutable AST
> structure or other thread-friendly tree-transformation algorithm, but
> making the Truffle AST thread-safe may also make it even more
> object-heavy during interpretation, aggravating startup time further.
> 4. Limited availability
> This is the chicken-and-egg issue. Truffle is just a library, so we
> can ignore that for the moment (given any JVM, you can run a Truffle
> Graal is required for Truffle to perform well at all. The Truffle
> interpreter is without a doubt the slowest interpreter we've ever had
> for JRuby, and that's saying something (there could be startup/warmup
> effects in play here too). In order for us to go 100% Truffle, we'd
> need a Graal VM. That limits us to either pre-release or hand-made
> builds of Graal/OpenJDK. Even if Graal somehow did get into Java 9,
> we'd still have legions of users on 8, 7, ... even 6 in some cases,
> though we're probably leaving them behind with JRuby 9000. Ignoring
> other platforms (non-OpenJDK, Android) and assuming Graal in Java 9,
> I'd conservatively estimate JRuby could still not go 100% Truffle
> until 2017 or later.
> And it gets worse. Graal will probably never exist on other JVMs.
> Graal will probably never exist in an Android VM. Graal may not even
> be available in other non-Oracle OpenJDK derivatives for a very long
> time. We have users on dozens of different platform/JVM combinations,
> so there's really no practical way for us to abandon our JVM bytecode
> runtimes in the near future.
> Now of course if Graal became essential to users, it would be
> available in more places. We recognize the potential of Truffle and
> Graal, which is why we've been thrilled to work with Oracle on a
> RubyTruffle that's part of JRuby. We also recognize that the
> Truffle/Graal approach has some very compelling features for our
> users, and that our users may often be comfortable running custom
> JVMs. We're allowing all flowers to bloom and our users will pick the
> ones that work for them.
> 5. Unclear benefits for real-world applications
> There have been many published microbenchmarks for Truffle-based
> languages, but very few benchmarks of real-world applications
> performing significantly better than custom-made VMs (JS versus V8).
> There have been practically no studies of a Truffle-based language
> running a large application for a long period of time...and by long I
> mean server-scale.
> Chris Seaton has pushed this forward recently for Ruby, getting
> general-purpose, numeric-heavy libraries to run and optimize very well
> (a png library and a psd library). Going deeper requires more
> of the language's standard libraries to be available, and I believe
> this is where Chris has spent much of his time (RubyTruffle currently
> requires mostly-custom versions of JRuby's core classes...versions
> that Truffle can recognize, specialize, and escape-analyze away).
> * Conclusion
> I again want to emphasize that we think Truffle and Graal are really
> awesome technology. I spent years with my nose smooshed against the
> glass, watching the Pypy guys add optimizations I wanted and make good
> on their promise of "just implement an interpreter...we'll do the
> rest". Finally we have what I wanted: a Pypy for JVM (in Truffle) and
> an LLVM for JVM (in Graal). These are exciting times indeed.
> But reality steps in. There's a long road ahead.
> I think we need to separate the questions about Truffle from questions
> about Graal. Truffle is ultimately just a library that uses Graal.
> Graal is promising JIT technology. Graal is simpler than C2 and may be
> able to match or beat its performance. Graal provides a better way to
> communicate intent to the JIT. These facts are not in question.
> However, Graal is not (other than when used as the JVM's JIT) a JVM.
> Targeting Graal directly acts against the promise of a standard,
> platform-and-VM-agnostic bytecode -- and that's the promise that
> brought most of us here. Graal is not yet ready to replace C2, which
> would mean adding to the size and complexity of Java 9. And Graal is
> almost completely untested in large production settings.
> I personally would love to see Graal get into a Java release soon as
> an experimental feature, but Java 9 seems ambitious by any standard.
> It *might* be possible/reasonable to include Graal as experimental in
> 9. Java 10 is certainly feasible for experimental, and may be feasible
> for product. But even if Graal got into mainstream OpenJDK and Java,
> there's a very long adoption tail ahead.
> I'd like to hear more from folks on the Graal and Truffle teams. Prove
> me wrong :-)
> - Charlie
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
More information about the graal-dev