The Great Startup Problem
marcus.lagergren at oracle.com
Sat Aug 23 13:10:51 UTC 2014
I agree completely with Charlie’s assessment about Lambda Forms being a problematic mechanism for indy call site linking due to its
* Lack of scalability (explosion of byte code)
* Metaspace usage
and everything else that has been described below.
I’m currently recovering after surgery and a bit disoriented and confused, but I’ll try to write a longer reply on Monday or Tuesday.
This post illustrates perfectly to me why it’s important to replace the LambdaForms as they are currently implemented with something else - probably in the 9 timeframe at the latest. This may be a market window of opportunity that is slowly sliding away from us. (PM and management might of course have things to say here, but for me this thing keeps bubbling up behind me as soon as I look away. It certainly has for the last 12-18 months.)
LambdaForm caching (JEP 210) will at least be a decent band-aid for 8u40 (but even then, the problem of profile pollution when reusing a LambdaForm for two different indy callsites is not trivial and has to be solved. I’m not sure it has been yet - I know Vladimir is working hard on this right now and is applying his excellent brain to the problem). However, I also don’t think that LambdaForm caching is enough as a long-term solution.
We had some discussions about how to implement indy call sites without LambdaForms after JVMLS in Santa Clara. Maybe John, Rickard or Vladimir can summarize some of the things we talked about, as I am just a code generation amateur in HotSpot and don’t want to embarrass myself in front of you guys. (Or I’ll post it later, when my head is clearer.)
When it comes to putting resources on this, I can only say that I would love for this to happen and think it’s tremendously important for dynamic languages on the JVM.
P.S. I agree with the tiered stuff too, but LambdaForms is the thing that really burns us in the warmup department right now. (and in the Metaspace department. Let’s not forget about that one).
P.P.S. Fredrik’s old post about how we did this in JRockit by inlining the indy callsites is worth a read again. The approach is, however, probably also subject to some profile pollution when you think about it. We never got far enough to really suffer from it, but one would expect it to crop up. No extra byte code though. No extra classes. No extra metaspace. (https://blogs.oracle.com/ohrstrom/entry/pulling_a_machine_code_rabbit)… Maybe Fredrik himself can tell us something here? I
On 22 Aug 2014, at 22:08, Charles Oliver Nutter <headius at headius.com> wrote:
> Marcus coaxed me into making a post about our indy issues. Our indy
> issues mostly surround startup and warmup time, so I'm making this a
> general post about startup and warmup.
> When I started working on JRuby 7 years ago, I hoped we'd have a good
> answer for poor startup time and long warmup times. Today, the answers
> are no better -- and in many cases much worse -- than when I started.
> Here's a summary of our experience over the years...
> * client versus server
> Early on, we made JRuby's launcher use client mode by default. This
> was by far the best way to get good startup performance, but it led to
> us perpetuating the old question "which mode are you running in" when
> people reported poor steady-state performance.
> * Tiered compiler
> The promise of the tiered compiler was great: client-fast startup with
> server-fast steady state. In practice, tiered has failed to meet
> expectations for us. The situation is aggravated by the loss of
> -client and -server flags.
> On the startup side, we have found that the tiered compiler never even
> comes close to the startup time of -client. For a nontrivial app
> startup, like a Rails app, we see a 50% reduction in startup time by
> forcing tier 1 (which is C1, the old -client mode) rather than letting
> the tiered compiler work normally.
> Obviously limiting ourselves to tier 1 means performance is reduced,
> but these days our #1 user complaint is startup time. So, we have AGAIN
> taken the step of putting startup-improving flags into our launchers:
> jruby --dev forces tier 1 + client mode.
> On the steady-state side, the tiered compiler is rather unpredictable.
> Some cases will be faster (presumably from better profiling in earlier
> tiers), while others will be much slower. And it can vary from run to
> run...tiered steady-state performance is even harder to predict than
> C2 (-server). We have done no investigation here.
> * Invokedynamic
> We love indy. We love it more than just about anyone. But we have
> again had to make indy support OFF by default in JRuby 1.7.14 and may
> have to do the same for JRuby 9000.
> Originally, we had indy off because of the NCDFE bugs in the old
> implementation. LambdaForms have fixed all that, and with JIT
> improvements in the past year they generally (eventually) reach the
> same steady-state performance.
> Unfortunately, LambdaForms have an enormous startup-time cost. I
> believe there are two reasons for this:
> 1. Method handle chains can now result in dozens of lambda forms,
> making the initial bootstrapping cost much higher. Multiply this by
> thousands of call sites, all getting hit for the first time. Multiply
> that by PIC depth. And then remember that many boot-time operations
> will blow out those caches, so you'll start over repeatedly. Some of
> this can be mitigated in JRuby, but much of it cannot.
> 2. Lambda forms are too slow to execute and take too long to optimize
> down to native code. Lambda forms work sorta like the tiered compiler.
> They'll be interpreted for a while, then they'll become JVM bytecode
> for a while, which interprets for a while, then the tiered compiler's
> first phase will pick it up.... There's no way to "commit" a lambda
> form you know you're going to be hitting hard, so it takes FOREVER to
> get from a newly-bootstrapped call site to the 5 assembly instructions
> that *actually* need to run.
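
To make the first point concrete, here is a minimal sketch of a polymorphic inline cache (PIC) built from standard java.lang.invoke combinators. It is an illustration only, not JRuby's actual linking code: the class PicSketch and the methods describeString, describeInteger, isInstance, fallback and run are all hypothetical names. It shows that every newly observed receiver type relinks the call site and stacks one more guardWithTest in front of the existing chain, and each of those combinators is extra method handle machinery the JVM has to spin up before the site runs fast.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical sketch of a guardWithTest-based inline cache; not JRuby code.
class PicSketch {
    static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    static final MethodType SITE_TYPE = MethodType.methodType(String.class, Object.class);
    static final MutableCallSite SITE = new MutableCallSite(SITE_TYPE);
    static int relinks = 0; // how many times the slow path had to relink the site

    static {
        try {
            // A fresh call site starts out pointing at the slow path.
            SITE.setTarget(LOOKUP.findStatic(PicSketch.class, "fallback", SITE_TYPE));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Stand-ins for the methods a dynamic language would dispatch to.
    static String describeString(Object o) { return "string:" + o; }
    static String describeInteger(Object o) { return "integer:" + o; }

    // Type guard; the Class argument is bound per cache entry.
    static boolean isInstance(Class<?> c, Object o) { return c.isInstance(o); }

    // Slow path: look up a target for this receiver type and put a new
    // guard in front of whatever the site already had (PIC depth + 1).
    static String fallback(Object receiver) throws Throwable {
        relinks++;
        String name = receiver instanceof String ? "describeString" : "describeInteger";
        MethodHandle target = LOOKUP.findStatic(PicSketch.class, name, SITE_TYPE);
        MethodHandle test = LOOKUP.findStatic(PicSketch.class, "isInstance",
                MethodType.methodType(boolean.class, Class.class, Object.class))
                .bindTo(receiver.getClass());
        SITE.setTarget(MethodHandles.guardWithTest(test, target, SITE.getTarget()));
        return (String) target.invoke(receiver);
    }

    // Drives the call site with the given receivers and joins the results.
    static String run(Object... receivers) throws Throwable {
        MethodHandle invoker = SITE.dynamicInvoker();
        StringBuilder out = new StringBuilder();
        for (Object r : receivers) {
            out.append((String) invoker.invoke(r)).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(run("a", 1, "b", 2));
        System.out.println("relinks: " + relinks);
    }
}
```

With two receiver types this one site is relinked twice and ends up two guards deep. A real runtime would typically also adapt argument types and check for invalidation on each entry, which is roughly where the "dozens of lambda forms" per site, times thousands of sites, times PIC depth, would come from.
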
> I do want to emphasize that for us, LambdaForms usually do get to the
> same peak performance we saw with the old implementation. It's just
> taking way, way too long to get there.
> Because of these issues, JRuby's new --dev flag turns invokedynamic
> off, and JRuby 1.7.14 will once again turn indy off by default on all
> JVM versions.
> * Other ways of mitigating startup time
> We have recommended Nailgun in the past. Nailgun keeps a JVM running
> in the background, and you toss it commands to run. It works well as
> long as the commands are actually self-contained, self-cleaning units
> of work; spin up one thread or leave resources open, and the Nailgun
> server eventually becomes unusable.
> We now recommend Drip as a similar solution. For each command you run,
> Drip attempts to start additional larval JVMs in the background in
> preparation for future commands. You can configure those instances to
> pre-boot libraries or application resources, to reduce the work done
> at startup for the next command (e.g. preboot your Rails application,
> and then the next command just has to utilize it). Drip is cleaner
> than Nailgun, but never quite achieves the same startup time without a
> lot of configuration. It is also a bit of a hack...you can easily
> preboot something in the "next JVM" that is out of date by the time
> you use it.
> We obviously still love working with OpenJDK, and it remains the best
> platform for building JRuby (and other languages). However, our
> failure as a community to address these startup/warmup issues is
> eventually going to kill us. Startup time remains the #1 complaint
> about JRuby, and warmup time may be a close second.
> What are the rest of you doing to deal with these issues?
> - Charlie