From forax at univ-mlv.fr Mon Aug 1 07:14:01 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Mon, 01 Aug 2011 16:14:01 +0200
Subject: Stolen: Re: [concurrency-interest] LongAdder (was: StripedAdder) and LongAdderTable
Message-ID: <4E36B4A9.2010902@univ-mlv.fr>

An interesting post stolen from Doug Lea's concurrency mailing list.

Having this kind of counter can be really great for those who want to do profiling without having an interpreter (Groovy, Nashorn, ...).

Basically, this kind of counter already exists in the VM,
is there a way to bubble them up in Java?

Rémi

-------- Original Message --------
Subject: Re: [concurrency-interest] LongAdder (was: StripedAdder) and LongAdderTable
Date: Sun, 31 Jul 2011 21:22:19 +0200
From: William Louth (JINSPIRED.COM)
To: concurrency-interest at cs.oswego.edu

I am currently writing up a proposal (hopefully to be posted on the blog next week) for an intrinsic (long) counter data type in Java that would be thread local (though not accessed through this interface) and optimized by the JVM, whilst affording the ability to introspect, via an API, the current set of named counters in the JVM as well as their instances (and values) on a per-thread basis (preferably within the thread itself).

I think we should be encouraging developers to move away from process-level, JMX-like counters and instead towards thread-specific & event-based counters (without state), which could in turn be accessed at the process level if need be, but more so at the thread level and from a caller/chain perspective, forming a foundation for the ultimate feedback loop between callers and callees.

This is part of a much bigger proposal for software activity metering to be a core aspect of the runtime, library and possibly language (via event counters).

OpenCore Metering Runtime Actors
http://opencore.jinspired.com/?p=1888

Activity Based Costing & Metering (ABC/M) - The Ultimate Feedback Loop
http://opencore.jinspired.com/?p=4052

Automated Performance Management starts with Software's Self Observation
http://opencore.jinspired.com/?p=2709

Metering (Probes) Open API
http://opencore.jinspired.com/?page_id=715

On 31/07/2011 20:56, Doug Lea wrote:
> On 07/31/11 12:39, Christian Vest Hansen wrote:
>> Some interfaces for these things might be a good idea, as I can imagine data
>> grid libraries might want to provide distributed implementations. Counter and
>> CounterTable come to mind as possible names.
>
> I had proposed this, but talked myself out of it.
> The APIs are tied to a particular scalar type (long),
> and might grow to include others (in particular, a
> DoubleAdder class). It may be that the only
> commonality is that they extend java.lang.Number. Which
> we declared for the AtomicX classes, but even that was not
> obviously helpful.
>
>> On Sun, Jul 31, 2011 at 16:55, Doug Lea wrote:
>
>> Also, because it is likely to be among the more
>> common uses of LongAdders, we created AtomicLongTable,
>
> Oops. I meant LongAdderTable. (The names changed several
> times before check-in.)
>
> -Doug

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

From john.r.rose at oracle.com Mon Aug 1 15:06:11 2011
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 1 Aug 2011 15:06:11 -0700
Subject: Stolen: Re: [concurrency-interest] LongAdder (was: StripedAdder) and LongAdderTable
In-Reply-To: <4E36B4A9.2010902@univ-mlv.fr>
References: <4E36B4A9.2010902@univ-mlv.fr>
Message-ID:

On Aug 1, 2011, at 7:14 AM, Rémi Forax wrote:

> Basically, this kind of counter already exists in the VM,
> is there a way to bubble them up in Java?

William and I talked about this a little at the Summit, too.

First, a caveat: It's interesting and useful to start with pseudocode sketches of the desired functionality, along the lines of a new intrinsic "thread local long" data type in the Java language and bytecodes. But it is extremely rare that this turns out to be the right answer. New bytecodes are very expensive compared to new APIs. All less-expensive options have to be exhausted first.

The right first answer is to create a suitable Java API (usually a class), and then see how to support it in the JVM with suitable optimizations.

For example, a Java ThreadLocal.get performs about the same underlying operations as a C pthread_getspecific.
When TL.get was first written, microbenchmarks showed that it was much slower than the C equivalent, but it was relatively simple to optimize the relevant code paths (especially Thread.currentThread) in the JVM.

In the case of thread-local counters, an abstraction like ThreadLocal is almost certainly the right answer. In fact, ThreadLocal is the first thing to try. If that cannot be optimized enough, then we can look into further options. Note that "cannot be optimized enough" applies only after a round of compiler work. A failing microbenchmark is not even close to due diligence on this point!

Since we don't have reified generics, ThreadLocal<long> is not available; wish it were. Other starting points could be ThreadLocalLong or ThreadLocalLongArray, with a suitably optimized underlying mechanism that stores exactly one long[] reference in each Thread.

But ThreadLocal is the first thing to investigate.

-- John

From william.louth at jinspired.com Tue Aug 2 00:57:51 2011
From: william.louth at jinspired.com (William Louth (JINSPIRED.COM))
Date: Tue, 02 Aug 2011 09:57:51 +0200
Subject: Stolen: Re: [concurrency-interest] LongAdder (was: StripedAdder) and LongAdderTable
In-Reply-To:
References: <4E36B4A9.2010902@univ-mlv.fr>
Message-ID: <4E37ADFF.1030702@jinspired.com>

Hi John,

In my blog entry this week I will present our current offering and its performance characteristics. But this is not driven exclusively by performance, though it is a big factor. I want to make it extremely easy for counters (longs which can only be incremented, never decremented) to be added into code without too much mental & maintenance effort on the developer's part, such as needing to create a field, or a class holding fields, representing the counter names(paces) used as indices. Incrementing a counter should be as easy as listing it on a line of code, though we should also support value-specific increments.
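
For readers following along, the thread-local counter idea discussed here (John's suggestion of exactly one long[] reference per Thread, plus William's increment-only requirement) can be sketched in plain Java on top of ThreadLocal. This is only an illustrative sketch under stated assumptions: the class ThreadCounters, its method names and the fixed slot count are hypothetical and are not part of the JDK or of William's proposal.

```java
import java.util.Arrays;

// Hypothetical illustration only; ThreadCounters is not a JDK class.
final class ThreadCounters {
    // Number of counter slots per thread (an arbitrary choice for this sketch).
    static final int SLOTS = 4;

    // Exactly one long[] per thread, created lazily on first access.
    private static final ThreadLocal<long[]> COUNTS =
            ThreadLocal.withInitial(() -> new long[SLOTS]);

    // Increment-only: counters may grow but never shrink.
    static void increment(int slot) {
        COUNTS.get()[slot]++;
    }

    // Value-specific increment; negative deltas are rejected.
    static void add(int slot, long delta) {
        if (delta < 0) {
            throw new IllegalArgumentException("counters cannot be decremented");
        }
        COUNTS.get()[slot] += delta;
    }

    // Copy of the calling thread's counters, usable as a simple savepoint.
    static long[] snapshot() {
        return COUNTS.get().clone();
    }

    public static void main(String[] args) throws InterruptedException {
        increment(0);
        add(1, 5);
        Thread other = new Thread(() -> {
            increment(0);
            increment(0);
            System.out.println("other: " + Arrays.toString(snapshot()));
        });
        other.start();
        other.join();
        System.out.println("main:  " + Arrays.toString(snapshot()));
    }
}
```

Each thread only ever touches its own array, so the increment path needs no synchronization; whether ThreadLocal.get is fast enough for this use is exactly the question John raises above.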
Maybe this would be a whole lot easier with first-class DSL support, and later the JVM evolving to have some specific DSL awareness in its optimization routines if the context cannot be easily determined via the existing instruction execution mechanics.

With this proposal I am hoping that we can eliminate much of the logging/JMX abuse that exists today, but at the same time provide a means for communication to flow upwards from callers without having to abuse return values (pairs, tuples, ...). A caller could simply make a savepoint and then generate a changeset and see the effects (events) as a result of its execution and calling, similar to what we offer at the metering level, which includes both activities and meters (mapped to counters).

http://opencore.jinspired.com/?page_id=3553#p23
http://williamlouth.wordpress.com/2010/01/05/user-level-metering-with-savepoints-changesets/

I also see this proposal as a means for the JVM to provide more contextual runtime diagnostics (cause-and-effect rather than metric causality), and I think it would be a much better option for dynamic languages to expose runtime observation rather than having them be in some way retrofitted to support something akin to JVMTI. For example, JRuby could have counters named (j)ruby.object.alloc, (j)ruby.call.count, ..., which would then be visible to Java callers (specific to the calling thread).

William

From mroos at roos.com Tue Aug 2 18:44:12 2011
From: mroos at roos.com (Mark Roos)
Date: Tue, 2 Aug 2011 18:44:12 -0700
Subject: BoundMethodHandle Klass and JVMTI heap walking
Message-ID:

We are using the JVMTI api FollowReferences to collect object information. In this api we can set a klass which limits the objects reported. For some reason we get instances of the klass we set and BoundMethodHandle instances.

Are they somehow masquerading as a different class (they are binding methods of the klass we are inspecting) or is this just an oversight?

thanks
mark

From mroos at roos.com Tue Aug 2 18:44:12 2011
From: mroos at roos.com (Mark Roos)
Date: Tue, 2 Aug 2011 18:44:12 -0700
Subject: Debugging progess, question on how to step using MH
Message-ID:

Thanks to all of the JVM summit feedback we decided to go with a C++ JVMTI agent to get the VM info for our Smalltalk debugger. Stack vars, instances and references are working. Pretty small C code (< 400 lines total), so all in all not bad.

The next step is to step a suspended thread. I thought of JVMTI again but it looks pretty complex to support (and does not support 292 yet), so I thought of using method handles and call sites as suggested by John. I was thinking of a bootstrap-supplied call site whose target is a GWT which tests to see if this site should be suspended (based on thread and stack depth). The fast path would invoke the normal call site and its normal chain of method GWTs/lookups.

The slow path would send a debug update event and then suspend. When resumed it would invokeExact the normal call site of the fast path.

Is there a better way?

thanks
mark

From john.r.rose at oracle.com Wed Aug 3 00:10:56 2011
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 3 Aug 2011 00:10:56 -0700
Subject: BoundMethodHandle Klass and JVMTI heap walking
In-Reply-To:
References:
Message-ID: <7497E28B-C9EA-4761-B502-19C4926ED8AF@oracle.com>

On Aug 2, 2011, at 6:44 PM, Mark Roos wrote:

> We are using the JVMTI api FollowReferences to collect object information. In this
> api we can set a klass which limits the objects reported. For some reason we get instances of the
> klass we set and BoundMethodHandle instances.

BoundMethodHandle is a private, implementation-specific subclass of MethodHandle.
Consider this example code:

    // assumes: import java.lang.invoke.MethodHandle;
    //          import static java.lang.invoke.MethodHandles.*;
    //          import static java.lang.invoke.MethodType.*;
    Object x = new Object();
    MethodHandle mh1 = identity(Object.class).bindTo(x);
    MethodHandle mh2 = lookup().bind(x, "hashCode", methodType(int.class));
    assert(mh1.invokeExact() == x);
    assert((int) mh2.invokeExact() == x.hashCode());
    System.out.println(mh1.getClass().getName());  // maybe java.lang.invoke.BoundMethodHandle
    System.out.println(mh2.getClass().getName());  // maybe java.lang.invoke.BoundMethodHandle also

Both mh1 and mh2 are going to be node structures which include a "hidden" pointer to x. In OpenJDK 7, mh1 and mh2 might be BoundMethodHandles, but don't take that to the bank. In any case, I would expect FollowReferences to report that x has a reference from both mh1 and mh2 (or a substructure thereof).

> Are they somehow masquerading as a different class ( they are binding methods of the klass we
> are inspecting) or is this just an oversight?

The exact concrete type of a MethodHandle is *not* defined by JSR 292; it is left open to implementors. If you run the above code on IBM's J9, you'll get a different class name.

-- John

From john.r.rose at oracle.com Wed Aug 3 00:13:46 2011
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 3 Aug 2011 00:13:46 -0700
Subject: Debugging progess, question on how to step using MH
In-Reply-To:
References:
Message-ID:

Hi Mark. Your proposal looks right, at least until JVMTI has native support for MH interposition.

It might be more efficient to use a SwitchPoint to enable the slow path testing. Basically, the switchpoint would take the place of the GWT, and (when invalidated) would execute a slow path that would patch in the slower code you propose (with explicit GWT).

-- John

_______________________________________________
mlvm-dev mailing list
mlvm-dev at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

From john.r.rose at oracle.com Wed Aug 3 21:58:15 2011
From: john.r.rose at oracle.com (john.r.rose at oracle.com)
Date: Thu, 04 Aug 2011 04:58:15 +0000
Subject: hg: mlvm/mlvm/jdk: cval: hang caches on classes
Message-ID: <20110804045816.1F2EB4791F@hg.openjdk.java.net>

Changeset: c26fceff6309
Author:    jrose
Date:      2011-08-03 21:57 -0700
URL:       http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/c26fceff6309

cval: hang caches on classes

+ cval-tune-7030453.patch
! series

From john.r.rose at oracle.com Thu Aug 4 01:21:30 2011
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 4 Aug 2011 01:21:30 -0700
Subject: hg: mlvm/mlvm/jdk: cval: hang caches on classes
In-Reply-To: <20110804045816.1F2EB4791F@hg.openjdk.java.net>
References: <20110804045816.1F2EB4791F@hg.openjdk.java.net>
Message-ID:

Benchmarks indicate a 4x speedup over the previous implementation.
The fast path has 7 memory references.
Here's what it looks like:

      {mov eax, (cval : ClassValue); mov edx, (type : Class)}
  M1: mov ecx, [edx+0x60]       ;*getfield Class.classValueCache : Entry[]
  M2: mov ebp, [ecx+0x8]        ;*arraylength
      {mov esi, ebp; dec esi}
  M3: and esi, [eax+0x8]        ;*iand ClassValue.hashCodeForCache
      {cmp esi, ebp; jnc (slow)}
  M4: mov ecx, [ecx+esi*4+0xC]  ;*aaload : Entry
      {implicit cmp ecx, 0; jz (slow)}
  M5: mov esi, [ecx+0x8]        ;*getfield WeakReference.referent : ClassValue$Version
  M6: mov ebp, [eax+0xC]        ;*getfield ClassValue.version : ClassValue$Version
      {cmp esi, ebp; jnz (slow)}
  M7: mov eax, [ecx+0x18]       ;*getfield value
      {mov (result : T), eax}

That's probably good enough for most uses.

All of the memory references are possible CSE and loop invariants.
I saw a too-simple benchmark loop collapse completely to constant time.

The range check (of the form a[a.length-1 & i]) could be simplified with a little work.

The version check defends against ClassValue.remove.
Maybe it could be removed by push notification, but it's probably not worth it.

-- John

From headius at headius.com Thu Aug 4 01:43:01 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Thu, 4 Aug 2011 01:43:01 -0700
Subject: hg: mlvm/mlvm/jdk: cval: hang caches on classes
In-Reply-To:
References: <20110804045816.1F2EB4791F@hg.openjdk.java.net>
Message-ID:

I see there's WeakReference used in there...will ClassValue have
WeakReference-like impact on GC?

I'll try to get JRuby using ClassValue for a few things soon, to help
test this out against non-ClassValue solutions.
- Charlie

From forax at univ-mlv.fr Thu Aug 4 03:53:58 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Thu, 04 Aug 2011 12:53:58 +0200
Subject: hg: mlvm/mlvm/jdk: cval: hang caches on classes
In-Reply-To:
References: <20110804045816.1F2EB4791F@hg.openjdk.java.net>
Message-ID: <4E3A7A46.9020100@univ-mlv.fr>

Not sure I understand the full patch;
anyway, nextHashCode should be declared final.

regards,
Rémi

From john.r.rose at oracle.com Thu Aug 4 13:31:52 2011
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 4 Aug 2011 13:31:52 -0700
Subject: hg: mlvm/mlvm/jdk: cval: hang caches on classes
In-Reply-To:
References: <20110804045816.1F2EB4791F@hg.openjdk.java.net>
Message-ID: <4B01FF44-5689-4908-82C7-E14F5017FA82@oracle.com>

On Aug 4, 2011, at 1:43 AM, Charles Oliver Nutter wrote:

> I see there's WeakReference used in there...will ClassValue have
> WeakReference-like impact on GC?

In that respect it will be like ThreadLocal. You can create, use, and discard ThreadLocal and ClassValue objects without a storage leak.

> I'll try to get JRuby using ClassValue for a few things soon, to help
> test this out against non-ClassValue solutions.

Thanks! Let me know if I can help.

-- John
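
The ClassValue behaviour discussed in this thread (a value cached per class, with ClassValue.remove forcing a recomputation) can be illustrated with the public java.lang.ClassValue API. This is a minimal sketch, not the patch under review; the class SimpleNameCache and its COMPUTATIONS counter are hypothetical names made up for the example.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; SimpleNameCache is not part of any library.
final class SimpleNameCache {
    // Counts how often computeValue actually runs.
    static final AtomicInteger COMPUTATIONS = new AtomicInteger();

    private static final ClassValue<String> NAMES = new ClassValue<String>() {
        @Override
        protected String computeValue(Class<?> type) {
            COMPUTATIONS.incrementAndGet();
            return type.getSimpleName().toLowerCase(Locale.ROOT);
        }
    };

    // After the first call for a given class, the value comes from the cache.
    static String nameOf(Class<?> type) {
        return NAMES.get(type);
    }

    // Drops the cached value; the next nameOf call recomputes it.
    static void forget(Class<?> type) {
        NAMES.remove(type);
    }

    public static void main(String[] args) {
        System.out.println(nameOf(String.class));
        System.out.println(nameOf(String.class));  // served from the cache
        forget(String.class);
        System.out.println(nameOf(String.class));  // recomputed after remove
        System.out.println("computations: " + COMPUTATIONS.get());
    }
}
```

The remove call is the case the version check in the fast path has to defend against: a cached entry may have been invalidated since it was stored.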

From john.r.rose at oracle.com Thu Aug 4 13:36:33 2011
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 4 Aug 2011 13:36:33 -0700
Subject: hg: mlvm/mlvm/jdk: cval: hang caches on classes
In-Reply-To: <4E3A7A46.9020100@univ-mlv.fr>
References: <20110804045816.1F2EB4791F@hg.openjdk.java.net> <4E3A7A46.9020100@univ-mlv.fr>
Message-ID:

On Aug 4, 2011, at 3:53 AM, Rémi Forax wrote:

> Not sure I understand the full patch;
> anyway, nextHashCode should be declared final.

Right; thanks. The code needs a few more well-chosen comments. I hope reviewer feedback can help me figure out what those can be.

Also, I could use some advice on tuning the free parameters, and/or omitting some of the optional bits, such as reprobing.

-- John

From christian.thalinger at oracle.com Fri Aug 5 06:32:14 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Fri, 5 Aug 2011 15:32:14 +0200
Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled
Message-ID: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>

http://cr.openjdk.java.net/~twisti/7071653

7071653: JSR 292: call site change notification should be pushed not pulled
Reviewed-by:

Currently every speculatively inlined method handle call site has a guard that compares the current target of the CallSite object to the inlined one. This per-invocation overhead can be removed if the notification is changed from pulled to pushed (i.e. deoptimization).

I had to change the logic in TemplateTable::patch_bytecode to skip bytecode quickening for putfield instructions when the put_code written to the constant pool cache is zero.
This is required so that every execution of a putfield to CallSite.target calls out to InterpreterRuntime::resolve_get_put to do the deoptimization of depending compiled methods.

I also had to change the dependency machinery to understand dependencies other than class hierarchy ones. DepChange is now the super-type of two new dependency change classes, KlassDepChange and CallSiteDepChange.

Tested with JRuby tests and benchmarks, hand-written testcases, JDK tests and vm.mlvm tests.

Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, second with 7071653). Since the CallSite targets don't change during the runtime of this benchmark we can see the performance benefit of eliminating the guard:

$ jruby --server bench/bench_fib_recursive.rb 5 35
  0.883000   0.000000   0.883000 (  0.854000)
  0.715000   0.000000   0.715000 (  0.715000)
  0.712000   0.000000   0.712000 (  0.712000)
  0.713000   0.000000   0.713000 (  0.713000)
  0.713000   0.000000   0.713000 (  0.712000)

$ jruby --server bench/bench_fib_recursive.rb 5 35
  0.772000   0.000000   0.772000 (  0.742000)
  0.624000   0.000000   0.624000 (  0.624000)
  0.621000   0.000000   0.621000 (  0.621000)
  0.622000   0.000000   0.622000 (  0.622000)
  0.622000   0.000000   0.622000 (  0.621000)

From mroos at roos.com Sat Aug 6 09:32:57 2011
From: mroos at roos.com (Mark Roos)
Date: Sat, 6 Aug 2011 09:32:57 -0700
Subject: BoundMethodHandle Klass and JVMTI heap walking
In-Reply-To: <7497E28B-C9EA-4761-B502-19C4926ED8AF@oracle.com>
References: <7497E28B-C9EA-4761-B502-19C4926ED8AF@oracle.com>
Message-ID:

Well, that does explain why I have references to BoundMethodHandles, but it did not explain why I see them. When I use FollowReferences I set the klass filter to my desired result class 'myClass'. But what I see after the filter is myClass objects, arrays of myClass and BoundMethodHandles. Not quite what I would expect.
After reading more C code than I care to and trying several experiments, I came to the conclusion that where KlassHandle(thread, oop) is called it returns null for the array and the BoundMethodHandle class pointers. That is where my skills at tracing C code ended.

I found a solution for my case where I tag myClass with a unique tag and filter it myself in the callback. Works great for both allInstances and allReferences. I still set the klass filter to limit the callbacks, but that is not required.

thanks
mark

From mroos at roos.com Sat Aug 6 09:32:57 2011
From: mroos at roos.com (Mark Roos)
Date: Sat, 6 Aug 2011 09:32:57 -0700
Subject: Debugging progess, question on how to step using MH
In-Reply-To:
References:
Message-ID:

I have been interested in using SwitchPoints for the invalidation phase of my call sites. As you may recall, I collect sites into an array as they are created. When they reach some quantity I invalidate them all and start over. It seems that using a switch point for this is perfect, except that I lose the ability to walk the array. Perhaps I can use my new skills with FollowReferences to find the sites given the SwitchPoint.

It's not clear to me how this applies to my stepping requirement. As each point can only change once, it seems less useful in controlling the execution mode. I am using an initial GWT to decide on a site-by-site basis if a suspend is required. If I use a switch point then it seems like I have to replace the point each time it's used. Is this what you were thinking?

thanks
mark

By the way, it's starting to feel like a real Smalltalk environment.
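
Mark's two observations about SwitchPoint (one switch point can guard many call sites, and each switch point can only change state once) can be seen in a small Java sketch using the java.lang.invoke API from JSR 292. This is only an illustration under assumptions: the class SwitchPointDemo and its fast/slow targets are hypothetical stand-ins for a real runtime's call-site targets, not code from Mark's Smalltalk implementation.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;

// Hypothetical illustration only; not taken from any runtime discussed here.
final class SwitchPointDemo {
    static String fast() { return "fast"; }
    static String slow() { return "slow"; }

    // Invokes a handle of type ()String.
    private static String call(MethodHandle mh) throws Throwable {
        return (String) mh.invokeExact();
    }

    // Returns what two call sites produce before invalidation, after
    // invalidation, and after re-arming only the first site.
    static String[] demo() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class);
            MethodHandle fast = lookup.findStatic(SwitchPointDemo.class, "fast", type);
            MethodHandle slow = lookup.findStatic(SwitchPointDemo.class, "slow", type);

            // One SwitchPoint can guard any number of call sites.
            SwitchPoint sp = new SwitchPoint();
            MutableCallSite a = new MutableCallSite(sp.guardWithTest(fast, slow));
            MutableCallSite b = new MutableCallSite(sp.guardWithTest(fast, slow));
            String before = call(a.dynamicInvoker()) + "," + call(b.dynamicInvoker());

            // A single call flips every guarded site to its fallback,
            // without walking an array of sites.
            SwitchPoint.invalidateAll(new SwitchPoint[] { sp });
            String after = call(a.dynamicInvoker()) + "," + call(b.dynamicInvoker());

            // A SwitchPoint cannot be re-validated, so re-arming a site
            // means installing a guard on a fresh SwitchPoint.
            SwitchPoint fresh = new SwitchPoint();
            a.setTarget(fresh.guardWithTest(fast, slow));
            String rearmed = call(a.dynamicInvoker()) + "," + call(b.dynamicInvoker());

            return new String[] { before, after, rearmed };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

The re-arming step is the cost Mark is asking about: for bulk invalidation a switch point is a good fit, while for per-site stepping a new SwitchPoint (or an explicit GWT) is needed each time a site should become suspendable again.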

From headius at headius.com Sat Aug 6 10:43:47 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Sat, 6 Aug 2011 17:43:47 +0000
Subject: What repository for post-GA work?
Message-ID:

What OpenJDK repo should I be tracking for post-GA invokedynamic
tweaks? I fcloned and built jdk7 but it doesn't seem to have e.g.
Tom's patch for bimorphic inlining at GWT. That patch applies cleanly,
but Christian's patch for push invalidation does not.

How can I stay on the bleeding edge?

- Charlie

From tom.rodriguez at oracle.com Sat Aug 6 16:29:38 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Sat, 6 Aug 2011 16:29:38 -0700
Subject: What repository for post-GA work?
In-Reply-To:
References:
Message-ID:

Since hotspot is delivered into multiple releases we've officially kind of separated from the release specific trees except at points where we deliver a snapshot into them. The live development repos for hotspot are under http://hg.openjdk.java.net/hsx, so http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot is the compiler group repo.

tom

On Aug 6, 2011, at 10:43 AM, Charles Oliver Nutter wrote:

> What OpenJDK repo should I be tracking for post-GA invokedynamic
> tweaks? I fcloned and built jdk7 but it doesn't seem to have e.g.
> Tom's patch for bimorphic inlining at GWT. That patch applies cleanly,
> but Christian's patch for push invalidation does not.
>
> How can I stay on the bleeding edge?
>
> - Charlie

From vladimir.kozlov at oracle.com Sun Aug 7 15:35:29 2011
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sun, 07 Aug 2011 15:35:29 -0700
Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled
In-Reply-To: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
Message-ID: <4E3F1331.2000909@oracle.com>

Christian,

You need to add a big comment to the new code in templateTable_.cpp explaining what it does and why.

Why do you use ld_ptr() on sparc to load from the cache, but movl() (only 32 bit) on X86 and X64?

Add assert(byte_no == -1, ) to the default: case to make sure you got all cases above it.

I am concerned about using the next short branch in the new code in templateTable_sparc.cpp:

  cmp_and_br_short(..., L_patch_done); // don't patch

There is a __ stop() call which generates a lot of code, so the label L_patch_done could be far away.

Why did you add a new #include to ciEnv.cpp and nmethod.cpp? What code needs it? Nothing else is changed in these files.

I don't like assignments in conditions and implicit NULL checks. Can you change check_dependency() to the following?

  klassOop check_dependency() {
    klassOop result = check_klass_dependency(NULL);
    if (result != NULL) return result;
    return check_call_site_dependency(NULL);
  }

In interpreterRuntime.cpp, initialize marked:

  int marked = 0;

Why did you not leave "volatile" call site inlining with the guard? You did not explain why a virtual call is fine for it.
Vladimir On 8/5/11 6:32 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7071653 > > 7071653: JSR 292: call site change notification should be pushed not pulled > Reviewed-by: > > Currently every speculatively inlined method handle call site has a > guard that compares the current target of the CallSite object to the > inlined one. This per-invocation overhead can be removed if the > notification is changed from pulled to pushed (i.e. deoptimization). > > I had to change the logic in TemplateTable::patch_bytecode to skip > bytecode quickening for putfield instructions when the put_code > written to the constant pool cache is zero. This is required so that > every execution of a putfield to CallSite.target calls out to > InterpreterRuntime::resolve_get_put to do the deoptimization of > depending compiled methods. > > I also had to change the dependency machinery to understand other > dependencies than class hierarchy ones. DepChange got the super-type > of two new dependencies, KlassDepChange and CallSiteDepChange. > > Tested with JRuby tests and benchmarks, hand-written testcases, JDK > tests and vm.mlvm tests. > > Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, > second with 7071653). 
Since the CallSite targets don't change during > the runtime of this benchmark we can see the performance benefit of > eliminating the guard: > > $ jruby --server bench/bench_fib_recursive.rb 5 35 > 0.883000 0.000000 0.883000 ( 0.854000) > 0.715000 0.000000 0.715000 ( 0.715000) > 0.712000 0.000000 0.712000 ( 0.712000) > 0.713000 0.000000 0.713000 ( 0.713000) > 0.713000 0.000000 0.713000 ( 0.712000) > > $ jruby --server bench/bench_fib_recursive.rb 5 35 > 0.772000 0.000000 0.772000 ( 0.742000) > 0.624000 0.000000 0.624000 ( 0.624000) > 0.621000 0.000000 0.621000 ( 0.621000) > 0.622000 0.000000 0.622000 ( 0.622000) > 0.622000 0.000000 0.622000 ( 0.621000) > From christian.thalinger at oracle.com Mon Aug 8 06:51:43 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 8 Aug 2011 15:51:43 +0200 Subject: JRuby/Seph/PHP.reboot/... SwitchPoint usage Message-ID: Since I have the basic push-notification of CallSites I'm now looking into push-notification of SwitchPoints: 7071709: JSR 292: switchpoint invalidation should be pushed not pulled Basically it should be the same, just needs some additional love in the compiler. I looked into JRuby's usage of SwitchPoints and it seems it has something to do with constants. Is there an existing benchmark that would benefit from the SwitchPoint optimization? Seph also seems to use SwitchPoints, PHP.reboot does not (that's what grep tells me). 
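The SwitchPoint pattern being asked about can be sketched as follows. This is a minimal illustration and not JRuby's or Seph's actual code: the class `ConstantCacheSketch` and the method `slowLookup` are hypothetical names, and only `SwitchPoint`, `guardWithTest` and `invalidateAll` come from java.lang.invoke. A cached value is served until the switch point is invalidated, after which every call falls back to the slow path:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Hypothetical demo class; names are illustrative only.
class ConstantCacheSketch {
    // Stands in for a full constant lookup in a language runtime.
    static String slowLookup() { return "slow"; }

    // Returns the result of calling the guarded handle before and after invalidation.
    static String[] demo() {
        try {
            SwitchPoint sp = new SwitchPoint();
            // Fast path: the cached constant.
            MethodHandle fast = MethodHandles.constant(String.class, "cached");
            // Fallback: taken once the switch point has been invalidated.
            MethodHandle slow = MethodHandles.lookup().findStatic(
                    ConstantCacheSketch.class, "slowLookup",
                    MethodType.methodType(String.class));
            MethodHandle guarded = sp.guardWithTest(fast, slow);

            String before = (String) guarded.invokeExact();
            // A constant was redefined somewhere: invalidate every dependent site at once.
            SwitchPoint.invalidateAll(new SwitchPoint[] { sp });
            String after = (String) guarded.invokeExact();
            return new String[] { before, after };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0] + " then " + r[1]);
    }
}
```

With pulled notification the compiled code has to test the switch point's state on each call; with pushed notification, as proposed in 7071709, invalidateAll would instead deoptimize the compiled code that depended on it.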
-- Christian From christian.thalinger at oracle.com Mon Aug 8 06:56:00 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 8 Aug 2011 15:56:00 +0200 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <4E3F1331.2000909@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> Message-ID: On Aug 8, 2011, at 12:35 AM, Vladimir Kozlov wrote: > Christian, > > You need to add big comment to the new code in templateTable_.cpp explaining what it does and why. Done. I made the wording a little more general because Tom's effectively final work might use the same machinery. > > Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. _indices in CosntantPoolCacheEntry is defined as intx: volatile intx _indices; // constant pool index & rewrite bytecodes and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: // bit number |31 0| // bit length |-8--|-8--|---16----| // -------------------------------- // _indices [ b2 | b1 | index ] Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the "optimization" on x64. > > Add assert(byte_no == -1, ) to default: case to make sure you got all cases above it. Done. > > I am concern about using next short branch in new code in templateTable_sparc.cpp: > > cmp_and_br_short(..., L_patch_done); // don't patch > > There is __ stop() call which generates a lot of code so that label L_patch_done could be far. Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? > > > Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. 
Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. > > I don't like assignments in condition and implicit NULL checks. Can you change check_dependency() to next?: > > klassOop check_dependency() { > klassOop result = check_klass_dependency(NULL); > if (result != NULL) return result; > return check_call_site_dependency(NULL); > } Done. > > In interpreterRuntime.cpp initialize marked: int marked = 0; OK. > > Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it. The spec of MutableCallSite says: "For target values which will be frequently updated, consider using a volatile call site instead." And VolatileCallSite says: "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. In other respects, a VolatileCallSite is interchangeable with MutableCallSite." Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. Additionally I had to do two small changes because the build was broken on some configurations: - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL; + klassOop new_type = _changes.is_klass_change() ? 
_changes.as_klass_change()->new_type() : (klassOop) NULL; and - MutexLockerEx ccl(CodeCache_lock, thread); + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); I updated the webrev. -- Christian > > > Vladimir > > On 8/5/11 6:32 AM, Christian Thalinger wrote: >> http://cr.openjdk.java.net/~twisti/7071653 >> >> 7071653: JSR 292: call site change notification should be pushed not pulled >> Reviewed-by: >> >> Currently every speculatively inlined method handle call site has a >> guard that compares the current target of the CallSite object to the >> inlined one. This per-invocation overhead can be removed if the >> notification is changed from pulled to pushed (i.e. deoptimization). >> >> I had to change the logic in TemplateTable::patch_bytecode to skip >> bytecode quickening for putfield instructions when the put_code >> written to the constant pool cache is zero. This is required so that >> every execution of a putfield to CallSite.target calls out to >> InterpreterRuntime::resolve_get_put to do the deoptimization of >> depending compiled methods. >> >> I also had to change the dependency machinery to understand other >> dependencies than class hierarchy ones. DepChange got the super-type >> of two new dependencies, KlassDepChange and CallSiteDepChange. >> >> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >> tests and vm.mlvm tests. >> >> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >> second with 7071653). 
Since the CallSite targets don't change during >> the runtime of this benchmark we can see the performance benefit of >> eliminating the guard: >> >> $ jruby --server bench/bench_fib_recursive.rb 5 35 >> 0.883000 0.000000 0.883000 ( 0.854000) >> 0.715000 0.000000 0.715000 ( 0.715000) >> 0.712000 0.000000 0.712000 ( 0.712000) >> 0.713000 0.000000 0.713000 ( 0.713000) >> 0.713000 0.000000 0.713000 ( 0.712000) >> >> $ jruby --server bench/bench_fib_recursive.rb 5 35 >> 0.772000 0.000000 0.772000 ( 0.742000) >> 0.624000 0.000000 0.624000 ( 0.624000) >> 0.621000 0.000000 0.621000 ( 0.621000) >> 0.622000 0.000000 0.622000 ( 0.622000) >> 0.622000 0.000000 0.622000 ( 0.621000) >> From forax at univ-mlv.fr Mon Aug 8 07:01:38 2011 From: forax at univ-mlv.fr (=?ISO-8859-1?Q?R=E9mi_Forax?=) Date: Mon, 08 Aug 2011 16:01:38 +0200 Subject: JRuby/Seph/PHP.reboot/... SwitchPoint usage In-Reply-To: References: Message-ID: <4E3FEC42.6060601@univ-mlv.fr> On 08/08/2011 03:51 PM, Christian Thalinger wrote: > Since I have the basic push-notification of CallSites I'm now looking into push-notification of SwitchPoints: > > 7071709: JSR 292: switchpoint invalidation should be pushed not pulled > > Basically it should be the same, just needs some additional love in the compiler. > > I looked into JRuby's usage of SwitchPoints and it seems it has something to do with constants. Is there an existing benchmark that would benefit from the SwitchPoint optimization? Seph also seems to use SwitchPoints, PHP.reboot does not (that's what grep tells me). Yes, PHP.reboot doesn't support any metaclass changes unlike Seph or Ruby. The metaclass example in the cookbook use a SwitchPoint, I can modify it to do a perf test, currently the callsites are called only a few times. 
> > -- Christian Rémi From vladimir.kozlov at oracle.com Mon Aug 8 07:55:32 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 08 Aug 2011 07:55:32 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> Message-ID: <4E3FF8E4.2070302@oracle.com> Christian, Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? On 8/8/11 6:56 AM, Christian Thalinger wrote: >> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? > > Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. > > _indices in CosntantPoolCacheEntry is defined as intx: > > volatile intx _indices; // constant pool index & rewrite bytecodes > > and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: > > // bit number |31 0| > // bit length |-8--|-8--|---16----| > // -------------------------------- > // _indices [ b2 | b1 | index ] > > Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the "optimization" on x64. I don't like this "optimization" but I understand why we using it. Add a comment (especially in x64 file). >> >> I am concern about using next short branch in new code in templateTable_sparc.cpp: >> >> cmp_and_br_short(..., L_patch_done); // don't patch >> >> There is __ stop() call which generates a lot of code so that label L_patch_done could be far. > > Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? > Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops.
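The endianness remark quoted above (a 32-bit load from the address of a 64-bit `_indices` word works on little-endian x64 but not on big-endian SPARC) can be checked with a small simulation. This is only a sketch under stated assumptions: the class `IndicesLoad` and its methods are hypothetical, a `ByteBuffer` stands in for memory, and the layout follows the `[ b2 | b1 | index ]` diagram from the message:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class simulating a narrow load from a wide field.
class IndicesLoad {
    // Pack [ b2 | b1 | index ] into the low 32 bits of a 64-bit word.
    static long pack(int b2, int b1, int index) {
        return ((long) (b2 & 0xFF) << 24) | ((long) (b1 & 0xFF) << 16) | (index & 0xFFFF);
    }

    // Store the 64-bit word in the given byte order, then read only the
    // first four bytes, as a 32-bit load from the field's base address would.
    static int load32AtBase(long word, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(order);
        buf.putLong(0, word);
        return buf.getInt(0);
    }

    // Extract bytecode 1 (bits 16..23) from the loaded 32-bit value.
    static int bytecode1(int loaded) { return (loaded >>> 16) & 0xFF; }

    // Extract bytecode 2 (bits 24..31) from the loaded 32-bit value.
    static int bytecode2(int loaded) { return (loaded >>> 24) & 0xFF; }

    public static void main(String[] args) {
        long word = pack(0xB5, 0xB4, 0x1234);
        int le = load32AtBase(word, ByteOrder.LITTLE_ENDIAN);
        int be = load32AtBase(word, ByteOrder.BIG_ENDIAN);
        System.out.printf("LE: b1=%02x b2=%02x%n", bytecode1(le), bytecode2(le));
        System.out.printf("BE: b1=%02x b2=%02x%n", bytecode1(be), bytecode2(be));
    }
}
```

On the little-endian layout the first four bytes are the low word, so the bytecodes come back intact; on the big-endian layout they are the high word, which is zero here, so a full-width load such as ld_ptr is needed.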
>> >> >> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. > > Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. > OK. > >> >> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it. > > The spec of MutableCallSite says: > > "For target values which will be frequently updated, consider using a volatile call site instead." > > And VolatileCallSite says: > > "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. > > Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. > > In other respects, a VolatileCallSite is interchangeable with MutableCallSite." > > Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. > Thank you for explaining it. > Additionally I had to do two small changes because the build was broken on some configurations: > > - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL; > + klassOop new_type = _changes.is_klass_change() ? 
_changes.as_klass_change()->new_type() : (klassOop) NULL; > > and > > - MutexLockerEx ccl(CodeCache_lock, thread); > + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); > > I updated the webrev. Good. Vladimir > > -- Christian > >> >> >> Vladimir >> >> On 8/5/11 6:32 AM, Christian Thalinger wrote: >>> http://cr.openjdk.java.net/~twisti/7071653 >>> >>> 7071653: JSR 292: call site change notification should be pushed not pulled >>> Reviewed-by: >>> >>> Currently every speculatively inlined method handle call site has a >>> guard that compares the current target of the CallSite object to the >>> inlined one. This per-invocation overhead can be removed if the >>> notification is changed from pulled to pushed (i.e. deoptimization). >>> >>> I had to change the logic in TemplateTable::patch_bytecode to skip >>> bytecode quickening for putfield instructions when the put_code >>> written to the constant pool cache is zero. This is required so that >>> every execution of a putfield to CallSite.target calls out to >>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>> depending compiled methods. >>> >>> I also had to change the dependency machinery to understand other >>> dependencies than class hierarchy ones. DepChange got the super-type >>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>> >>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>> tests and vm.mlvm tests. >>> >>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>> second with 7071653). 
Since the CallSite targets don't change during >>> the runtime of this benchmark we can see the performance benefit of >>> eliminating the guard: >>> >>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>> 0.883000 0.000000 0.883000 ( 0.854000) >>> 0.715000 0.000000 0.715000 ( 0.715000) >>> 0.712000 0.000000 0.712000 ( 0.712000) >>> 0.713000 0.000000 0.713000 ( 0.713000) >>> 0.713000 0.000000 0.713000 ( 0.712000) >>> >>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>> 0.772000 0.000000 0.772000 ( 0.742000) >>> 0.624000 0.000000 0.624000 ( 0.624000) >>> 0.621000 0.000000 0.621000 ( 0.621000) >>> 0.622000 0.000000 0.622000 ( 0.622000) >>> 0.622000 0.000000 0.622000 ( 0.621000) >>> > From christian.thalinger at oracle.com Mon Aug 8 09:29:21 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 8 Aug 2011 18:29:21 +0200 Subject: JRuby/Seph/PHP.reboot/... SwitchPoint usage In-Reply-To: <4E3FEC42.6060601@univ-mlv.fr> References: <4E3FEC42.6060601@univ-mlv.fr> Message-ID: <0CD6A287-DE0A-40C4-8735-B66F3CC5AB9F@oracle.com> On Aug 8, 2011, at 4:01 PM, Rémi Forax wrote: > On 08/08/2011 03:51 PM, Christian Thalinger wrote: >> Since I have the basic push-notification of CallSites I'm now looking into push-notification of SwitchPoints: >> >> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled >> >> Basically it should be the same, just needs some additional love in the compiler. >> >> I looked into JRuby's usage of SwitchPoints and it seems it has something to do with constants. Is there an existing benchmark that would benefit from the SwitchPoint optimization? Seph also seems to use SwitchPoints, PHP.reboot does not (that's what grep tells me). > > Yes, PHP.reboot doesn't support any metaclass changes unlike Seph or Ruby. > > The metaclass example in the cookbook use a SwitchPoint, > I can modify it to do a perf test, currently the callsites are called > only a few times. That would be great!
-- Christian > >> >> -- Christian > > Rémi > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From headius at headius.com Mon Aug 8 09:39:37 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Mon, 8 Aug 2011 12:39:37 -0400 Subject: JRuby/Seph/PHP.reboot/... SwitchPoint usage In-Reply-To: References: Message-ID: On Mon, Aug 8, 2011 at 9:51 AM, Christian Thalinger wrote: > Since I have the basic push-notification of CallSites I'm now looking into push-notification of SwitchPoints: > > 7071709: JSR 292: switchpoint invalidation should be pushed not pulled > > Basically it should be the same, just needs some additional love in the compiler. > > I looked into JRuby's usage of SwitchPoints and it seems it has something to do with constants. Is there an existing benchmark that would benefit from the SwitchPoint optimization? Seph also seems to use SwitchPoints, PHP.reboot does not (that's what grep tells me). Yes, currently SwitchPoint is only used for constant lookup, since constant modification invalidates globally.
A good benchmark to use would be this one: bench/language/bench_const_lookup.rb Here's numbers with a recent openjdk-osx-build with and without invokedynamic enabled WITHOUT: 100k * 100 nested const get 0.059000 0.000000 0.059000 ( 0.059000) 100k * 100 nested const get 0.059000 0.000000 0.059000 ( 0.059000) 100k * 100 nested const get 0.058000 0.000000 0.058000 ( 0.058000) 100k * 100 nested const get 0.059000 0.000000 0.059000 ( 0.059000) 100k * 100 nested const get 0.057000 0.000000 0.057000 ( 0.057000) 100k * 100 inherited const get 0.058000 0.000000 0.058000 ( 0.058000) 100k * 100 inherited const get 0.059000 0.000000 0.059000 ( 0.059000) 100k * 100 inherited const get 0.058000 0.000000 0.058000 ( 0.058000) 100k * 100 inherited const get 0.058000 0.000000 0.058000 ( 0.058000) 100k * 100 inherited const get 0.063000 0.000000 0.063000 ( 0.064000) 100k * 100 both 0.060000 0.000000 0.060000 ( 0.060000) 100k * 100 both 0.060000 0.000000 0.060000 ( 0.060000) 100k * 100 both 0.059000 0.000000 0.059000 ( 0.059000) 100k * 100 both 0.058000 0.000000 0.058000 ( 0.058000) 100k * 100 both 0.059000 0.000000 0.059000 ( 0.059000) WITH: (specify -Xinvokedynamic.constants=true to JRuby, or -Djruby.invokedynamic.constants=true to JVM) 100k * 100 nested const get 1.321000 0.000000 1.321000 ( 1.321000) 100k * 100 nested const get 1.311000 0.000000 1.311000 ( 1.311000) 100k * 100 nested const get 1.305000 0.000000 1.305000 ( 1.305000) 100k * 100 nested const get 1.293000 0.000000 1.293000 ( 1.294000) 100k * 100 nested const get 1.292000 0.000000 1.292000 ( 1.293000) 100k * 100 inherited const get 1.295000 0.000000 1.295000 ( 1.295000) 100k * 100 inherited const get 1.241000 0.000000 1.241000 ( 1.241000) 100k * 100 inherited const get 1.241000 0.000000 1.241000 ( 1.241000) 100k * 100 inherited const get 1.244000 0.000000 1.244000 ( 1.244000) 100k * 100 inherited const get 1.236000 0.000000 1.236000 ( 1.236000) 100k * 100 both 1.280000 0.000000 1.280000 ( 1.280000) 100k * 100 both 
1.236000 0.000000 1.236000 ( 1.236000) 100k * 100 both 1.229000 0.000000 1.229000 ( 1.230000) 100k * 100 both 1.236000 0.000000 1.236000 ( 1.236000) 100k * 100 both 1.248000 0.000000 1.248000 ( 1.248000) You can see there's some room for improvement :) The number should be faster with invokedynamic, since the SwitchPoint form has no active guard. - Charlie From christian.thalinger at oracle.com Mon Aug 8 11:12:06 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 8 Aug 2011 20:12:06 +0200 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <4E3FF8E4.2070302@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> <4E3FF8E4.2070302@oracle.com> Message-ID: On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote: > Christian, > > Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? No, it doesn't buy us anything. The new checking code is only executed the first time as the bytecodes are quickened right after that. And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway. > > On 8/8/11 6:56 AM, Christian Thalinger wrote: >>> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? >> >> Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. >> >> _indices in CosntantPoolCacheEntry is defined as intx: >> >> volatile intx _indices; // constant pool index& rewrite bytecodes >> >> and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: >> >> // bit number |31 0| >> // bit length |-8--|-8--|---16----| >> // -------------------------------- >> // _indices [ b2 | b1 | index ] >> >> Loading 32-bit on LE gives you the right bits but on BE it does not. 
I think that's the reason for the "optimization" on x64. > > I don't like this "optimization" but I understand why we using it. Add a comment (especially in x64 file). I factored reading the bytecode into InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the same code is used twice in TemplateTable and added the comment there. > >>> >>> I am concern about using next short branch in new code in templateTable_sparc.cpp: >>> >>> cmp_and_br_short(..., L_patch_done); // don't patch >>> >>> There is __ stop() call which generates a lot of code so that label L_patch_done could be far. >> >> Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? >> > > Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. That works. > >>> >>> >>> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. >> >> Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. >> > > OK. > >> >>> >>> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it. >> >> The spec of MutableCallSite says: >> >> "For target values which will be frequently updated, consider using a volatile call site instead." >> >> And VolatileCallSite says: >> >> "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. 
>> >> Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. >> >> In other respects, a VolatileCallSite is interchangeable with MutableCallSite." >> >> Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. >> > > Thank you for explaining it. > >> Additionally I had to do two small changes because the build was broken on some configurations: >> >> - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL; >> + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL; >> >> and >> >> - MutexLockerEx ccl(CodeCache_lock, thread); >> + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); >> >> I updated the webrev. > > Good. Thanks. -- Christian > > Vladimir > >> >> -- Christian >> >>> >>> >>> Vladimir >>> >>> On 8/5/11 6:32 AM, Christian Thalinger wrote: >>>> http://cr.openjdk.java.net/~twisti/7071653 >>>> >>>> 7071653: JSR 292: call site change notification should be pushed not pulled >>>> Reviewed-by: >>>> >>>> Currently every speculatively inlined method handle call site has a >>>> guard that compares the current target of the CallSite object to the >>>> inlined one. This per-invocation overhead can be removed if the >>>> notification is changed from pulled to pushed (i.e. deoptimization). >>>> >>>> I had to change the logic in TemplateTable::patch_bytecode to skip >>>> bytecode quickening for putfield instructions when the put_code >>>> written to the constant pool cache is zero. 
This is required so that >>>> every execution of a putfield to CallSite.target calls out to >>>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>>> depending compiled methods. >>>> >>>> I also had to change the dependency machinery to understand other >>>> dependencies than class hierarchy ones. DepChange got the super-type >>>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>>> >>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>>> tests and vm.mlvm tests. >>>> >>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>>> second with 7071653). Since the CallSite targets don't change during >>>> the runtime of this benchmark we can see the performance benefit of >>>> eliminating the guard: >>>> >>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>> 0.883000 0.000000 0.883000 ( 0.854000) >>>> 0.715000 0.000000 0.715000 ( 0.715000) >>>> 0.712000 0.000000 0.712000 ( 0.712000) >>>> 0.713000 0.000000 0.713000 ( 0.713000) >>>> 0.713000 0.000000 0.713000 ( 0.712000) >>>> >>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>> 0.772000 0.000000 0.772000 ( 0.742000) >>>> 0.624000 0.000000 0.624000 ( 0.624000) >>>> 0.621000 0.000000 0.621000 ( 0.621000) >>>> 0.622000 0.000000 0.622000 ( 0.622000) >>>> 0.622000 0.000000 0.622000 ( 0.621000) >>>> >> From christian.thalinger at oracle.com Mon Aug 8 11:21:40 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 8 Aug 2011 20:21:40 +0200 Subject: JRuby/Seph/PHP.reboot/... SwitchPoint usage In-Reply-To: References: Message-ID: On Aug 8, 2011, at 6:39 PM, Charles Oliver Nutter wrote: > On Mon, Aug 8, 2011 at 9:51 AM, Christian Thalinger > wrote: >> Since I have the basic push-notification of CallSites I'm now looking into push-notification of SwitchPoints: >> >> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled >> >> Basically it should be the same, just needs some additional love in the compiler. 
>> >> I looked into JRuby's usage of SwitchPoints and it seems it has something to do with constants. Is there an existing benchmark that would benefit from the SwitchPoint optimization? Seph also seems to use SwitchPoints, PHP.reboot does not (that's what grep tells me). > > Yes, currently SwitchPoint is only used for constant lookup, since > constant modification invalidates globally. A good benchmark to use > would be this one: > > bench/language/bench_const_lookup.rb > > Here's numbers with a recent openjdk-osx-build with and without > invokedynamic enabled > > WITHOUT: > > 100k * 100 nested const get 0.059000 0.000000 > 0.059000 ( 0.059000) > 100k * 100 nested const get 0.059000 0.000000 > 0.059000 ( 0.059000) > 100k * 100 nested const get 0.058000 0.000000 > 0.058000 ( 0.058000) > 100k * 100 nested const get 0.059000 0.000000 > 0.059000 ( 0.059000) > 100k * 100 nested const get 0.057000 0.000000 > 0.057000 ( 0.057000) > 100k * 100 inherited const get 0.058000 0.000000 > 0.058000 ( 0.058000) > 100k * 100 inherited const get 0.059000 0.000000 > 0.059000 ( 0.059000) > 100k * 100 inherited const get 0.058000 0.000000 > 0.058000 ( 0.058000) > 100k * 100 inherited const get 0.058000 0.000000 > 0.058000 ( 0.058000) > 100k * 100 inherited const get 0.063000 0.000000 > 0.063000 ( 0.064000) > 100k * 100 both 0.060000 0.000000 > 0.060000 ( 0.060000) > 100k * 100 both 0.060000 0.000000 > 0.060000 ( 0.060000) > 100k * 100 both 0.059000 0.000000 > 0.059000 ( 0.059000) > 100k * 100 both 0.058000 0.000000 > 0.058000 ( 0.058000) > 100k * 100 both 0.059000 0.000000 > 0.059000 ( 0.059000) > > WITH: (specify -Xinvokedynamic.constants=true to JRuby, or > -Djruby.invokedynamic.constants=true to JVM) > > 100k * 100 nested const get 1.321000 0.000000 > 1.321000 ( 1.321000) > 100k * 100 nested const get 1.311000 0.000000 > 1.311000 ( 1.311000) > 100k * 100 nested const get 1.305000 0.000000 > 1.305000 ( 1.305000) > 100k * 100 nested const get 1.293000 0.000000 > 1.293000 ( 1.294000) 
> 100k * 100 nested const get 1.292000 0.000000 > 1.292000 ( 1.293000) > 100k * 100 inherited const get 1.295000 0.000000 > 1.295000 ( 1.295000) > 100k * 100 inherited const get 1.241000 0.000000 > 1.241000 ( 1.241000) > 100k * 100 inherited const get 1.241000 0.000000 > 1.241000 ( 1.241000) > 100k * 100 inherited const get 1.244000 0.000000 > 1.244000 ( 1.244000) > 100k * 100 inherited const get 1.236000 0.000000 > 1.236000 ( 1.236000) > 100k * 100 both 1.280000 0.000000 > 1.280000 ( 1.280000) > 100k * 100 both 1.236000 0.000000 > 1.236000 ( 1.236000) > 100k * 100 both 1.229000 0.000000 > 1.229000 ( 1.230000) > 100k * 100 both 1.236000 0.000000 > 1.236000 ( 1.236000) > 100k * 100 both 1.248000 0.000000 > 1.248000 ( 1.248000) > > You can see there's some room for improvement :) The number should be > faster with invokedynamic, since the SwitchPoint form has no active > guard. That's perfect! Let's see what numbers I can come up with... -- Christian > > - Charlie > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From tom.rodriguez at oracle.com Mon Aug 8 11:49:16 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 8 Aug 2011 11:49:16 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> Message-ID: dependencies.cpp: in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed. It should probably look more like this: klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) { assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity"); // Same CallSite object but different target? 
Check this specific call site // if changes is non-NULL or validate all CallSites if ((changes == NULL || (call_site == changes->call_site())) && (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) { return ctxk; // assertion failed } assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid"); return NULL; // assertion still valid } The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked. interpreterRuntime.cpp: Please move the dependence check code into universe with the other dependence check code. Also add some comments explaining why it's doing what it's doing. doCall.cpp: Can you put in a comment explaining that VolatileCallSite is never inlined. Otherwise it looks good. tom On Aug 5, 2011, at 6:32 AM, Christian Thalinger wrote: > http://cr.openjdk.java.net/~twisti/7071653 > > 7071653: JSR 292: call site change notification should be pushed not pulled > Reviewed-by: > > Currently every speculatively inlined method handle call site has a > guard that compares the current target of the CallSite object to the > inlined one. This per-invocation overhead can be removed if the > notification is changed from pulled to pushed (i.e. deoptimization). > > I had to change the logic in TemplateTable::patch_bytecode to skip > bytecode quickening for putfield instructions when the put_code > written to the constant pool cache is zero. This is required so that > every execution of a putfield to CallSite.target calls out to > InterpreterRuntime::resolve_get_put to do the deoptimization of > depending compiled methods. > > I also had to change the dependency machinery to understand other > dependencies than class hierarchy ones. DepChange got the super-type > of two new dependencies, KlassDepChange and CallSiteDepChange. > > Tested with JRuby tests and benchmarks, hand-written testcases, JDK > tests and vm.mlvm tests. 
> > Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, > second with 7071653). Since the CallSite targets don't change during > the runtime of this benchmark we can see the performance benefit of > eliminating the guard: > > $ jruby --server bench/bench_fib_recursive.rb 5 35 > 0.883000 0.000000 0.883000 ( 0.854000) > 0.715000 0.000000 0.715000 ( 0.715000) > 0.712000 0.000000 0.712000 ( 0.712000) > 0.713000 0.000000 0.713000 ( 0.713000) > 0.713000 0.000000 0.713000 ( 0.712000) > > $ jruby --server bench/bench_fib_recursive.rb 5 35 > 0.772000 0.000000 0.772000 ( 0.742000) > 0.624000 0.000000 0.624000 ( 0.624000) > 0.621000 0.000000 0.621000 ( 0.621000) > 0.622000 0.000000 0.622000 ( 0.622000) > 0.622000 0.000000 0.622000 ( 0.621000) > From vladimir.kozlov at oracle.com Mon Aug 8 11:52:57 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 08 Aug 2011 11:52:57 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> <4E3FF8E4.2070302@oracle.com> Message-ID: <4E403089.5010204@oracle.com> Christian Thalinger wrote: > On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote: > >> Christian, >> >> Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? > > No, it doesn't buy us anything. The new checking code is only executed the first time as the bytecodes are quickened right after that. And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway. You lost me here. New code in resolve_get_put() is executed only for putfield to CallSite.target. But new code in patch_bytecode() skips quickening for all putfield bytecodes. My question is: can you narrow skipping quickening only for putfield to CallSite.target? 
Or you are saying that there is no performance difference between executing _aputfield vs _fast_aputfield? Vladimir > >> On 8/8/11 6:56 AM, Christian Thalinger wrote: >>>> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? >>> Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. >>> >>> _indices in CosntantPoolCacheEntry is defined as intx: >>> >>> volatile intx _indices; // constant pool index& rewrite bytecodes >>> >>> and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: >>> >>> // bit number |31 0| >>> // bit length |-8--|-8--|---16----| >>> // -------------------------------- >>> // _indices [ b2 | b1 | index ] >>> >>> Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the "optimization" on x64. >> I don't like this "optimization" but I understand why we using it. Add a comment (especially in x64 file). > > I factored reading the bytecode into InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the same code is used twice in TemplateTable and added the comment there. > >>>> I am concern about using next short branch in new code in templateTable_sparc.cpp: >>>> >>>> cmp_and_br_short(..., L_patch_done); // don't patch >>>> >>>> There is __ stop() call which generates a lot of code so that label L_patch_done could be far. >>> Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? >>> >> Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. > > That works. > >>>> >>>> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. 
>>> Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. >>> >> OK. >> >>>> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it. >>> The spec of MutableCallSite says: >>> >>> "For target values which will be frequently updated, consider using a volatile call site instead." >>> >>> And VolatileCallSite says: >>> >>> "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. >>> >>> Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. >>> >>> In other respects, a VolatileCallSite is interchangeable with MutableCallSite." >>> >>> Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. >>> >> Thank you for explaining it. >> >>> Additionally I had to do two small changes because the build was broken on some configurations: >>> >>> - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL; >>> + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL; >>> >>> and >>> >>> - MutexLockerEx ccl(CodeCache_lock, thread); >>> + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); >>> >>> I updated the webrev. >> Good. 
> > Thanks. > > -- Christian > >> Vladimir >> >>> -- Christian >>> >>>> >>>> Vladimir >>>> >>>> On 8/5/11 6:32 AM, Christian Thalinger wrote: >>>>> http://cr.openjdk.java.net/~twisti/7071653 >>>>> >>>>> 7071653: JSR 292: call site change notification should be pushed not pulled >>>>> Reviewed-by: >>>>> >>>>> Currently every speculatively inlined method handle call site has a >>>>> guard that compares the current target of the CallSite object to the >>>>> inlined one. This per-invocation overhead can be removed if the >>>>> notification is changed from pulled to pushed (i.e. deoptimization). >>>>> >>>>> I had to change the logic in TemplateTable::patch_bytecode to skip >>>>> bytecode quickening for putfield instructions when the put_code >>>>> written to the constant pool cache is zero. This is required so that >>>>> every execution of a putfield to CallSite.target calls out to >>>>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>>>> depending compiled methods. >>>>> >>>>> I also had to change the dependency machinery to understand other >>>>> dependencies than class hierarchy ones. DepChange got the super-type >>>>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>>>> >>>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>>>> tests and vm.mlvm tests. >>>>> >>>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>>>> second with 7071653). 
Since the CallSite targets don't change during >>>>> the runtime of this benchmark we can see the performance benefit of >>>>> eliminating the guard: >>>>> >>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>>> 0.883000 0.000000 0.883000 ( 0.854000) >>>>> 0.715000 0.000000 0.715000 ( 0.715000) >>>>> 0.712000 0.000000 0.712000 ( 0.712000) >>>>> 0.713000 0.000000 0.713000 ( 0.713000) >>>>> 0.713000 0.000000 0.713000 ( 0.712000) >>>>> >>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>>> 0.772000 0.000000 0.772000 ( 0.742000) >>>>> 0.624000 0.000000 0.624000 ( 0.624000) >>>>> 0.621000 0.000000 0.621000 ( 0.621000) >>>>> 0.622000 0.000000 0.622000 ( 0.622000) >>>>> 0.622000 0.000000 0.622000 ( 0.621000) >>>>> > From tom.rodriguez at oracle.com Mon Aug 8 12:08:52 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 8 Aug 2011 12:08:52 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <4E403089.5010204@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> <4E3FF8E4.2070302@oracle.com> <4E403089.5010204@oracle.com> Message-ID: On Aug 8, 2011, at 11:52 AM, Vladimir Kozlov wrote: > Christian Thalinger wrote: >> On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote: >> >>> Christian, >>> >>> Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? >> >> No, it doesn't buy us anything. The new checking code is only executed the first time as the bytecodes are quickened right after that. And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway. > > You lost me here. New code in resolve_get_put() is executed only for putfield to > CallSite.target. But new code in patch_bytecode() skips quickening for all > putfield bytecodes. 
My question is: can you narrow skipping quickening only for > putfield to CallSite.target? Or you are saying that there is no performance > difference between executing _aputfield vs _fast_aputfield? It only skips quickening if put_code is zero, which is only done for CallSite.target. All the others proceed as they used to. tom > > Vladimir > >> >>> On 8/8/11 6:56 AM, Christian Thalinger wrote: >>>>> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? >>>> Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. >>>> >>>> _indices in CosntantPoolCacheEntry is defined as intx: >>>> >>>> volatile intx _indices; // constant pool index& rewrite bytecodes >>>> >>>> and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: >>>> >>>> // bit number |31 0| >>>> // bit length |-8--|-8--|---16----| >>>> // -------------------------------- >>>> // _indices [ b2 | b1 | index ] >>>> >>>> Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the "optimization" on x64. >>> I don't like this "optimization" but I understand why we using it. Add a comment (especially in x64 file). >> >> I factored reading the bytecode into InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the same code is used twice in TemplateTable and added the comment there. >> >>>>> I am concern about using next short branch in new code in templateTable_sparc.cpp: >>>>> >>>>> cmp_and_br_short(..., L_patch_done); // don't patch >>>>> >>>>> There is __ stop() call which generates a lot of code so that label L_patch_done could be far. >>>> Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right? >>>> >>> Yes, it will assert but may be only in some worst case which we do not test. 
For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. >> >> That works. >> >>>>> >>>>> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. >>>> Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. >>>> >>> OK. >>> >>>>> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it. >>>> The spec of MutableCallSite says: >>>> >>>> "For target values which will be frequently updated, consider using a volatile call site instead." >>>> >>>> And VolatileCallSite says: >>>> >>>> "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. >>>> >>>> Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. >>>> >>>> In other respects, a VolatileCallSite is interchangeable with MutableCallSite." >>>> >>>> Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. >>>> >>> Thank you for explaining it. >>> >>>> Additionally I had to do two small changes because the build was broken on some configurations: >>>> >>>> - klassOop new_type = _changes.is_klass_change() ? 
_changes.as_klass_change()->new_type() : NULL; >>>> + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL; >>>> >>>> and >>>> >>>> - MutexLockerEx ccl(CodeCache_lock, thread); >>>> + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); >>>> >>>> I updated the webrev. >>> Good. >> >> Thanks. >> >> -- Christian >> >>> Vladimir >>> >>>> -- Christian >>>> >>>>> >>>>> Vladimir >>>>> >>>>> On 8/5/11 6:32 AM, Christian Thalinger wrote: >>>>>> http://cr.openjdk.java.net/~twisti/7071653 >>>>>> >>>>>> 7071653: JSR 292: call site change notification should be pushed not pulled >>>>>> Reviewed-by: >>>>>> >>>>>> Currently every speculatively inlined method handle call site has a >>>>>> guard that compares the current target of the CallSite object to the >>>>>> inlined one. This per-invocation overhead can be removed if the >>>>>> notification is changed from pulled to pushed (i.e. deoptimization). >>>>>> >>>>>> I had to change the logic in TemplateTable::patch_bytecode to skip >>>>>> bytecode quickening for putfield instructions when the put_code >>>>>> written to the constant pool cache is zero. This is required so that >>>>>> every execution of a putfield to CallSite.target calls out to >>>>>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>>>>> depending compiled methods. >>>>>> >>>>>> I also had to change the dependency machinery to understand other >>>>>> dependencies than class hierarchy ones. DepChange got the super-type >>>>>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>>>>> >>>>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>>>>> tests and vm.mlvm tests. >>>>>> >>>>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>>>>> second with 7071653). 
Since the CallSite targets don't change during >>>>>> the runtime of this benchmark we can see the performance benefit of >>>>>> eliminating the guard: >>>>>> >>>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>>>> 0.883000 0.000000 0.883000 ( 0.854000) >>>>>> 0.715000 0.000000 0.715000 ( 0.715000) >>>>>> 0.712000 0.000000 0.712000 ( 0.712000) >>>>>> 0.713000 0.000000 0.713000 ( 0.713000) >>>>>> 0.713000 0.000000 0.713000 ( 0.712000) >>>>>> >>>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>>>> 0.772000 0.000000 0.772000 ( 0.742000) >>>>>> 0.624000 0.000000 0.624000 ( 0.624000) >>>>>> 0.621000 0.000000 0.621000 ( 0.621000) >>>>>> 0.622000 0.000000 0.622000 ( 0.622000) >>>>>> 0.622000 0.000000 0.622000 ( 0.621000) >>>>>> >> > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From vladimir.kozlov at oracle.com Mon Aug 8 12:36:45 2011 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 08 Aug 2011 12:36:45 -0700 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <4E3F1331.2000909@oracle.com> <4E3FF8E4.2070302@oracle.com> <4E403089.5010204@oracle.com> Message-ID: <4E403ACD.5000500@oracle.com> Tom Rodriguez wrote: > On Aug 8, 2011, at 11:52 AM, Vladimir Kozlov wrote: > >> Christian Thalinger wrote: >>> On Aug 8, 2011, at 4:55 PM, Vladimir Kozlov wrote: >>> >>>> Christian, >>>> >>>> Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? Or put_code is zero only in invoke dynamic case? >>> No, it doesn't buy us anything. The new checking code is only executed the first time as the bytecodes are quickened right after that. And in the case where a putfield isn't quickened and we call resolve_get_put it gets very expensive anyway. >> You lost me here. 
New code in resolve_get_put() is executed only for putfield to >> CallSite.target. But new code in patch_bytecode() skips quickening for all >> putfield bytecodes. My question is: can you narrow skipping quickening only for >> putfield to CallSite.target? Or you are saying that there is no performance >> difference between executing _aputfield vs _fast_aputfield? > > It only skips quickening if put_code is zero, which is only done for CallSite.target. All the others proceed as they used to. Good. Thank you, Tom Vladimir > > tom > >> Vladimir >> >>>> On 8/8/11 6:56 AM, Christian Thalinger wrote: >>>>>> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)? >>>>> Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr. >>>>> >>>>> _indices in CosntantPoolCacheEntry is defined as intx: >>>>> >>>>> volatile intx _indices; // constant pool index& rewrite bytecodes >>>>> >>>>> and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word: >>>>> >>>>> // bit number |31 0| >>>>> // bit length |-8--|-8--|---16----| >>>>> // -------------------------------- >>>>> // _indices [ b2 | b1 | index ] >>>>> >>>>> Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the "optimization" on x64. >>>> I don't like this "optimization" but I understand why we using it. Add a comment (especially in x64 file). >>> I factored reading the bytecode into InterpreterMacroAssembler::get_cache_and_index_and_bytecode_at_bcp since the same code is used twice in TemplateTable and added the comment there. >>> >>>>>> I am concern about using next short branch in new code in templateTable_sparc.cpp: >>>>>> >>>>>> cmp_and_br_short(..., L_patch_done); // don't patch >>>>>> >>>>>> There is __ stop() call which generates a lot of code so that label L_patch_done could be far. >>>>> Yeah, I thought I give it a try if it works. 
cmp_and_br_short should assert if the branch displacement is too far, right? >>>>> >>>> Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM on Sparc + compressed oops + VerifyOops. >>> That works. >>> >>>>>> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files. >>>>> Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies. >>>>> >>>> OK. >>>> >>>>>> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it. >>>>> The spec of MutableCallSite says: >>>>> >>>>> "For target values which will be frequently updated, consider using a volatile call site instead." >>>>> >>>>> And VolatileCallSite says: >>>>> >>>>> "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads. >>>>> >>>>> Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads. >>>>> >>>>> In other respects, a VolatileCallSite is interchangeable with MutableCallSite." >>>>> >>>>> Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet. >>>>> >>>> Thank you for explaining it. 
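To make the MutableCallSite / VolatileCallSite distinction quoted above concrete, here is a minimal sketch using only the public java.lang.invoke API. The class name CallSiteKinds and the helpers before(), after(), handle() and retarget() are made up for illustration; nothing here is taken from the webrev. It shows only the API-level behaviour (retargeting a site and calling through its dynamic invoker), not how HotSpot compiles either kind of site.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.VolatileCallSite;

public class CallSiteKinds {
    // Two trivial targets (hypothetical names) so a retarget is observable.
    static String before() { return "before"; }
    static String after()  { return "after"; }

    /** Looks up one of the static helpers above as a ()String method handle. */
    public static MethodHandle handle(String name) {
        try {
            return MethodHandles.lookup().findStatic(
                    CallSiteKinds.class, name, MethodType.methodType(String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Calls through the site, retargets it, and calls through it again. */
    public static String retarget(CallSite site) {
        try {
            // The dynamic invoker behaves like an invokedynamic linked to the site.
            MethodHandle invoker = site.dynamicInvoker();
            String first = (String) invoker.invokeExact();
            site.setTarget(handle("after"));
            if (site instanceof MutableCallSite) {
                // Only MutableCallSite has syncAll; a VolatileCallSite write is
                // already visible to other threads, as the quoted spec says.
                MutableCallSite.syncAll(new MutableCallSite[] { (MutableCallSite) site });
            }
            String second = (String) invoker.invokeExact();
            return first + "->" + second;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(retarget(new MutableCallSite(handle("before"))));
        System.out.println(retarget(new VolatileCallSite(handle("before"))));
    }
}
```

Within a single thread both kinds behave identically here; the difference the spec describes only shows up in cross-thread visibility and in how aggressively the JIT is allowed to speculate on the target.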
>>>> >>>>> Additionally I had to do two small changes because the build was broken on some configurations: >>>>> >>>>> - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL; >>>>> + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL; >>>>> >>>>> and >>>>> >>>>> - MutexLockerEx ccl(CodeCache_lock, thread); >>>>> + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag); >>>>> >>>>> I updated the webrev. >>>> Good. >>> Thanks. >>> >>> -- Christian >>> >>>> Vladimir >>>> >>>>> -- Christian >>>>> >>>>>> Vladimir >>>>>> >>>>>> On 8/5/11 6:32 AM, Christian Thalinger wrote: >>>>>>> http://cr.openjdk.java.net/~twisti/7071653 >>>>>>> >>>>>>> 7071653: JSR 292: call site change notification should be pushed not pulled >>>>>>> Reviewed-by: >>>>>>> >>>>>>> Currently every speculatively inlined method handle call site has a >>>>>>> guard that compares the current target of the CallSite object to the >>>>>>> inlined one. This per-invocation overhead can be removed if the >>>>>>> notification is changed from pulled to pushed (i.e. deoptimization). >>>>>>> >>>>>>> I had to change the logic in TemplateTable::patch_bytecode to skip >>>>>>> bytecode quickening for putfield instructions when the put_code >>>>>>> written to the constant pool cache is zero. This is required so that >>>>>>> every execution of a putfield to CallSite.target calls out to >>>>>>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>>>>>> depending compiled methods. >>>>>>> >>>>>>> I also had to change the dependency machinery to understand other >>>>>>> dependencies than class hierarchy ones. DepChange got the super-type >>>>>>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>>>>>> >>>>>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>>>>>> tests and vm.mlvm tests. 
>>>>>>>
>>>>>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>>>>>>> second with 7071653). Since the CallSite targets don't change during
>>>>>>> the runtime of this benchmark we can see the performance benefit of
>>>>>>> eliminating the guard:
>>>>>>>
>>>>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>>>>> 0.883000 0.000000 0.883000 ( 0.854000)
>>>>>>> 0.715000 0.000000 0.715000 ( 0.715000)
>>>>>>> 0.712000 0.000000 0.712000 ( 0.712000)
>>>>>>> 0.713000 0.000000 0.713000 ( 0.713000)
>>>>>>> 0.713000 0.000000 0.713000 ( 0.712000)
>>>>>>>
>>>>>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>>>>> 0.772000 0.000000 0.772000 ( 0.742000)
>>>>>>> 0.624000 0.000000 0.624000 ( 0.624000)
>>>>>>> 0.621000 0.000000 0.621000 ( 0.621000)
>>>>>>> 0.622000 0.000000 0.622000 ( 0.622000)
>>>>>>> 0.622000 0.000000 0.622000 ( 0.621000)
>>>>>>>
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>

From headius at headius.com Mon Aug 8 22:58:20 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Tue, 9 Aug 2011 01:58:20 -0400
Subject: Push invalidation + bimorphic fix
Message-ID:

I have set up a local build against "hsx" hotspot compiler dev branch,
thanks to Christian. I'll try to write something up with the process
tomorrow.

Anyway, I wanted to test out Tom's patch for GWT bimorphic inlining
and Christian's patch for push invalidation of mutable call sites. The
results were peculiar!

First I tried a build with no patches:

headius at headius-desktop:~/projects/jruby$
JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server
bench/bench_fib_recursive.rb 5 35
9227465
1.977000 0.000000 1.977000 ( 1.867000)
9227465
1.524000 0.000000 1.524000 ( 1.524000)
9227465
1.524000 0.000000 1.524000 ( 1.524000)
9227465
1.535000 0.000000 1.535000 ( 1.535000)
9227465
1.529000 0.000000 1.529000 ( 1.529000)

Then with both Tom's and Christian's patches applied together:

headius at headius-desktop:~/projects/jruby$
JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server
bench/bench_fib_recursive.rb 5 35
9227465
1.761000 0.000000 1.761000 ( 1.651000)
9227465
1.457000 0.000000 1.457000 ( 1.457000)
9227465
1.456000 0.000000 1.456000 ( 1.456000)
9227465
1.482000 0.000000 1.482000 ( 1.482000)
9227465
1.454000 0.000000 1.454000 ( 1.453000)

I like any improvements, so naturally I was happy. But Christian
expected a bigger improvement. So I tried with just Christian's patch:

headius at headius-desktop:~/projects/jruby$
JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server
bench/bench_fib_recursive.rb 5 35
9227465
1.665000 0.000000 1.665000 ( 1.555000)
9227465
1.270000 0.000000 1.270000 ( 1.270000)
9227465
1.268000 0.000000 1.268000 ( 1.268000)
9227465
1.292000 0.000000 1.292000 ( 1.292000)
9227465
1.259000 0.000000 1.259000 ( 1.259000)

Interesting, eh?

- Charlie

From christian.thalinger at oracle.com Tue Aug 9 01:30:29 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 9 Aug 2011 10:30:29 +0200
Subject: Push invalidation + bimorphic fix
In-Reply-To:
References:
Message-ID:

On Aug 9, 2011, at 7:58 AM, Charles Oliver Nutter wrote:

> I have set up a local build against "hsx" hotspot compiler dev branch,
> thanks to Christian. I'll try to write something up with the process
> tomorrow.
>
> Anyway, I wanted to test out Tom's patch for GWT bimorphic inlining
> and Christian's patch for push invalidation of mutable call sites. The
> results were peculiar!
>
> First I tried a build with no patches:
>
> headius at headius-desktop:~/projects/jruby$
> JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server
> bench/bench_fib_recursive.rb 5 35
> 9227465
> 1.977000 0.000000 1.977000 ( 1.867000)
> 9227465
> 1.524000 0.000000 1.524000 ( 1.524000)
> 9227465
> 1.524000 0.000000 1.524000 ( 1.524000)
> 9227465
> 1.535000 0.000000 1.535000 ( 1.535000)
> 9227465
> 1.529000 0.000000 1.529000 ( 1.529000)
>
> Then with both Tom's and Christian's patches applied together:
>
> headius at headius-desktop:~/projects/jruby$
> JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server
> bench/bench_fib_recursive.rb 5 35
> 9227465
> 1.761000 0.000000 1.761000 ( 1.651000)
> 9227465
> 1.457000 0.000000 1.457000 ( 1.457000)
> 9227465
> 1.456000 0.000000 1.456000 ( 1.456000)
> 9227465
> 1.482000 0.000000 1.482000 ( 1.482000)
> 9227465
> 1.454000 0.000000 1.454000 ( 1.453000)
>
> I like any improvements, so naturally I was happy. But Christian
> expected a bigger improvement. So I tried with just Christian's patch:
>
> headius at headius-desktop:~/projects/jruby$
> JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server
> bench/bench_fib_recursive.rb 5 35
> 9227465
> 1.665000 0.000000 1.665000 ( 1.555000)
> 9227465
> 1.270000 0.000000 1.270000 ( 1.270000)
> 9227465
> 1.268000 0.000000 1.268000 ( 1.268000)
> 9227465
> 1.292000 0.000000 1.292000 ( 1.292000)
> 9227465
> 1.259000 0.000000 1.259000 ( 1.259000)
>
> Interesting, eh?

Hmm. I remember you were saying this is some old Ultra 20. The only box I could find which is similar is a 2.0GHz Quad-core Opteron running Solaris and I can't confirm your numbers.

No patch:

$ jruby --server -J-showversion bench/bench_fib_recursive.rb 5 35
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) Server VM (build 21.0-b14-internal, mixed mode)

1.680000 0.000000 1.680000 ( 1.615000)
1.345000 0.000000 1.345000 ( 1.345000)
1.341000 0.000000 1.341000 ( 1.340000)
1.343000 0.000000 1.343000 ( 1.343000)
1.344000 0.000000 1.344000 ( 1.343000)

My patch:

$ jruby --server -J-showversion bench/bench_fib_recursive.rb 5 35
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) Server VM (build 22.0-b01-internal, mixed mode)

1.446000 0.000000 1.446000 ( 1.386000)
1.143000 0.000000 1.143000 ( 1.143000)
1.143000 0.000000 1.143000 ( 1.143000)
1.146000 0.000000 1.146000 ( 1.146000)
1.143000 0.000000 1.143000 ( 1.143000)

Both patches:

$ jruby --server -J-showversion bench/bench_fib_recursive.rb 5 35
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) Server VM (build 22.0-b01-internal, mixed mode)

1.335000 0.000000 1.335000 ( 1.268000)
1.122000 0.000000 1.122000 ( 1.122000)
1.122000 0.000000 1.122000 ( 1.123000)
1.122000 0.000000 1.122000 ( 1.122000)
1.121000 0.000000 1.121000 ( 1.121000)

Did you mix up product vs. debug builds?
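
For reference, the relative improvement in the three runs above can be worked out as follows. This is only a sketch: it takes the second iteration of each run (1.345, 1.143 and 1.122 seconds) as a stand-in for steady state, which is an assumption, and the class and method names are made up for illustration.

```java
public class FibSpeedup {
    /** Percentage by which `patched` is faster than `baseline` (both in seconds). */
    public static double percentFaster(double baseline, double patched) {
        return (baseline - patched) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Second iteration of each run above, used here as the steady-state time.
        double noPatch = 1.345;
        double pushPatch = 1.143;
        double bothPatches = 1.122;
        System.out.printf("push invalidation only: %.1f%%%n", percentFaster(noPatch, pushPatch));
        System.out.printf("both patches:           %.1f%%%n", percentFaster(noPatch, bothPatches));
    }
}
```

On these numbers that comes to about 15% for the push-invalidation patch alone and just under 17% with both patches.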

-- Christian

>
> - Charlie
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

From christian.thalinger at oracle.com Tue Aug 9 04:33:29 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 9 Aug 2011 13:33:29 +0200
Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled
In-Reply-To:
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com>
Message-ID: <6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com>

On Aug 8, 2011, at 8:49 PM, Tom Rodriguez wrote:

> dependencies.cpp:
>
> in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed. It should probably look more like this:
>
> klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) {
>   assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity");
>   // Same CallSite object but different target? Check this specific call site
>   // if changes is non-NULL or validate all CallSites
>   if ((changes == NULL || (call_site == changes->call_site())) &&
>       (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) {
>     return ctxk; // assertion failed
>   }
>   assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid");
>   return NULL; // assertion still valid
> }

I see your point. But the code above is broken as changes->method_handle() will not work when changes == NULL. One of my first versions of this code also stored the MethodHandle target in the dependence stream which seems to be required when we want to validate all CallSites. Something like this:

! klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, oop method_handle, CallSiteDepChange* changes) {
+   assert(call_site    ->is_a(SystemDictionary::CallSite_klass()),     "sanity");
+   assert(method_handle->is_a(SystemDictionary::MethodHandle_klass()), "sanity");
+   if (changes == NULL) {
+     // Validate all CallSites
+     if (java_lang_invoke_CallSite::target(call_site) != method_handle)
+       return ctxk; // assertion failed
+   } else {
+     // Validate the given CallSite
+     if (call_site == changes->call_site() && java_lang_invoke_CallSite::target(call_site) != changes->method_handle()) {
+       assert(method_handle != changes->method_handle(), "must be");
+       return ctxk; // assertion failed
+     }
+   }
+   assert(java_lang_invoke_CallSite::target(call_site) == method_handle, "should still be valid");
+   return NULL; // assertion still valid
+ }

>
> The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked.
>
> interpreterRuntime.cpp:
>
> Please move the dependence check code into universe with the other dependence check code.

Where it says:

// %%% The Universe::flush_foo methods belong in CodeCache.

:-)

> Also add some comments explaining why it's doing what it's doing.

Done.

>
> doCall.cpp:
>
> Can you put in a comment explaining that VolatileCallSite is never inlined.

Done.

>
> Otherwise it looks good.

webrev updated.

-- Christian

>
> tom
>
>
> On Aug 5, 2011, at 6:32 AM, Christian Thalinger wrote:
>
>> http://cr.openjdk.java.net/~twisti/7071653
>>
>> 7071653: JSR 292: call site change notification should be pushed not pulled
>> Reviewed-by:
>>
>> Currently every speculatively inlined method handle call site has a
>> guard that compares the current target of the CallSite object to the
>> inlined one. This per-invocation overhead can be removed if the
>> notification is changed from pulled to pushed (i.e. deoptimization).
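
The pulled-versus-pushed distinction described above can be illustrated with a small toy model. This is a sketch only and not HotSpot code: Site, PulledCode and PushedCode are hypothetical names, targets are plain strings, and the `valid` flag merely stands in for deoptimization (in the real VM the dependent compiled method is thrown away rather than checking a flag on each call).

```java
import java.util.ArrayList;
import java.util.List;

public class PushVsPull {
    /** Toy stand-in for a CallSite: a mutable target plus its dependents. */
    static final class Site {
        private String target;
        private final List<PushedCode> dependents = new ArrayList<>();

        Site(String target) { this.target = target; }

        String target() { return target; }

        void setTarget(String newTarget) {
            target = newTarget;
            // "Push": the writer notifies every dependent that its assumption is stale.
            for (PushedCode code : dependents) {
                code.invalidate();
            }
            dependents.clear();
        }
    }

    /** "Pull": the compiled code re-checks the target on every invocation. */
    static final class PulledCode {
        private final Site site;
        private final String inlined;
        int guardChecks; // counts the per-invocation overhead

        PulledCode(Site site) {
            this.site = site;
            this.inlined = site.target();
        }

        String call() {
            guardChecks++;
            return site.target().equals(inlined) ? "fast:" + inlined : "slow:" + site.target();
        }
    }

    /** "Push": the compiled code registers a dependency and carries no guard. */
    static final class PushedCode {
        private final Site site;
        private final String inlined;
        private boolean valid = true; // models "not yet deoptimized"

        PushedCode(Site site) {
            this.site = site;
            this.inlined = site.target();
            site.dependents.add(this);
        }

        void invalidate() { valid = false; }

        String call() {
            return valid ? "fast:" + inlined : "slow:" + site.target();
        }
    }

    public static void main(String[] args) {
        Site site = new Site("a");
        PulledCode pulled = new PulledCode(site);
        PushedCode pushed = new PushedCode(site);
        System.out.println(pulled.call() + " " + pushed.call());
        site.setTarget("b");
        System.out.println(pulled.call() + " " + pushed.call());
    }
}
```

The trade-off the model is meant to show: the pulled form pays a comparison on every call but makes target changes cheap, while the pushed form makes calls free of the guard and moves the cost to the (hopefully rare) target change.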
>>
>> I had to change the logic in TemplateTable::patch_bytecode to skip
>> bytecode quickening for putfield instructions when the put_code
>> written to the constant pool cache is zero. This is required so that
>> every execution of a putfield to CallSite.target calls out to
>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>> depending compiled methods.
>>
>> I also had to change the dependency machinery to understand other
>> dependencies than class hierarchy ones. DepChange got the super-type
>> of two new dependencies, KlassDepChange and CallSiteDepChange.
>>
>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>> tests and vm.mlvm tests.
>>
>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>> second with 7071653). Since the CallSite targets don't change during
>> the runtime of this benchmark we can see the performance benefit of
>> eliminating the guard:
>>
>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>   0.883000   0.000000   0.883000 (  0.854000)
>>   0.715000   0.000000   0.715000 (  0.715000)
>>   0.712000   0.000000   0.712000 (  0.712000)
>>   0.713000   0.000000   0.713000 (  0.713000)
>>   0.713000   0.000000   0.713000 (  0.712000)
>>
>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>   0.772000   0.000000   0.772000 (  0.742000)
>>   0.624000   0.000000   0.624000 (  0.624000)
>>   0.621000   0.000000   0.621000 (  0.621000)
>>   0.622000   0.000000   0.622000 (  0.622000)
>>   0.622000   0.000000   0.622000 (  0.621000)
>>
>

From headius at headius.com Tue Aug 9 11:31:03 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Tue, 9 Aug 2011 14:31:03 -0400
Subject: Push invalidation + bimorphic fix
In-Reply-To:
References:
Message-ID:

On Tue, Aug 9, 2011 at 4:30 AM, Christian Thalinger wrote:
> Hmm. I remember you were saying this is some old Ultra 20. The only box I could find which is similar is a 2.0GHz Quad-core Opteron running Solaris and I can't confirm your numbers.

Hmm, interesting. I'll give it another shot today.
There certainly could have been a goofy build for that first "both patches" attempt.

- Charlie

From headius at headius.com Tue Aug 9 12:02:33 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Tue, 9 Aug 2011 15:02:33 -0400
Subject: Push invalidation + bimorphic fix
In-Reply-To:
References:
Message-ID:

On Tue, Aug 9, 2011 at 2:31 PM, Charles Oliver Nutter wrote:
> On Tue, Aug 9, 2011 at 4:30 AM, Christian Thalinger
> wrote:
>> Hmm. I remember you were saying this is some old Ultra 20. The only box I could find which is similar is a 2.0GHz Quad-core Opteron running Solaris and I can't confirm your numbers.
>
> Hmm, interesting. I'll give it another shot today. There certainly
> could have been a goofy build for that first "both patches" attempt.

Same result with a clean build...

* hg revert --all
* apply Tom's patch
* fix GCC warning/error
* apply Christian's patch
* ignore the one bad merge (Christian said it's dead code)
* make product ; make create_jdk

Same numbers. Is there some other patch you have applied locally?
What's the best way for me to investigate?

- Charlie

From tom.rodriguez at oracle.com Tue Aug 9 12:43:35 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 9 Aug 2011 12:43:35 -0700
Subject: Push invalidation + bimorphic fix
In-Reply-To:
References:
Message-ID: <74FB8226-951A-4B3A-9C9F-2CCFF3822771@oracle.com>

On Aug 9, 2011, at 12:02 PM, Charles Oliver Nutter wrote:

> On Tue, Aug 9, 2011 at 2:31 PM, Charles Oliver Nutter
> wrote:
>> On Tue, Aug 9, 2011 at 4:30 AM, Christian Thalinger
>> wrote:
>>> Hmm. I remember you were saying this is some old Ultra 20. The only box I could find which is similar is a 2.0GHz Quad-core Opteron running Solaris and I can't confirm your numbers.
>>
>> Hmm, interesting. I'll give it another shot today. There certainly
>> could have been a goofy build for that first "both patches" attempt.
>
> Same result with a clean build...
>
> * hg revert --all
> * apply Tom's patch
> * fix GCC warning/error
> * apply Christian's patch
> * ignore the one bad merge (Christian said it's dead code)
> * make product ; make create_jdk
>
> Same numbers. Is there some other patch you have applied locally?
> What's the best way for me to investigate?

Can you collect PrintCompilation/PrintInlining output for each of these? One thing I've seen with the frequency fix is that sometimes JRuby produces GWTs with the direction reversed from what I expect, so that the invokeFallback path ends up being considered the frequent path. This can cause us not to inline the fast paths in these cases. I think we're going to have to add per-GWT path profiling sooner rather than later.

tom

>
> - Charlie
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

From christian.thalinger at oracle.com Wed Aug 10 07:26:21 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 10 Aug 2011 16:26:21 +0200
Subject: JRuby/Seph/PHP.reboot/... SwitchPoint usage
In-Reply-To:
References:
Message-ID: <670A8DA1-59F0-4387-BC2F-7813CF215EC8@oracle.com>

On Aug 8, 2011, at 8:21 PM, Christian Thalinger wrote:

>
> On Aug 8, 2011, at 6:39 PM, Charles Oliver Nutter wrote:
>
>> On Mon, Aug 8, 2011 at 9:51 AM, Christian Thalinger
>> wrote:
>>> Since I have the basic push-notification of CallSites I'm now looking into push-notification of SwitchPoints:
>>>
>>> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled
>>>
>>> Basically it should be the same, just needs some additional love in the compiler.
>>>
>>> I looked into JRuby's usage of SwitchPoints and it seems it has something to do with constants. Is there an existing benchmark that would benefit from the SwitchPoint optimization? Seph also seems to use SwitchPoints, PHP.reboot does not (that's what grep tells me).
>> >> Yes, currently SwitchPoint is only used for constant lookup, since >> constant modification invalidates globally. A good benchmark to use >> would be this one: >> >> bench/language/bench_const_lookup.rb >> >> Here's numbers with a recent openjdk-osx-build with and without >> invokedynamic enabled >> >> WITHOUT: >> >> 100k * 100 nested const get 0.059000 0.000000 >> 0.059000 ( 0.059000) >> 100k * 100 nested const get 0.059000 0.000000 >> 0.059000 ( 0.059000) >> 100k * 100 nested const get 0.058000 0.000000 >> 0.058000 ( 0.058000) >> 100k * 100 nested const get 0.059000 0.000000 >> 0.059000 ( 0.059000) >> 100k * 100 nested const get 0.057000 0.000000 >> 0.057000 ( 0.057000) >> 100k * 100 inherited const get 0.058000 0.000000 >> 0.058000 ( 0.058000) >> 100k * 100 inherited const get 0.059000 0.000000 >> 0.059000 ( 0.059000) >> 100k * 100 inherited const get 0.058000 0.000000 >> 0.058000 ( 0.058000) >> 100k * 100 inherited const get 0.058000 0.000000 >> 0.058000 ( 0.058000) >> 100k * 100 inherited const get 0.063000 0.000000 >> 0.063000 ( 0.064000) >> 100k * 100 both 0.060000 0.000000 >> 0.060000 ( 0.060000) >> 100k * 100 both 0.060000 0.000000 >> 0.060000 ( 0.060000) >> 100k * 100 both 0.059000 0.000000 >> 0.059000 ( 0.059000) >> 100k * 100 both 0.058000 0.000000 >> 0.058000 ( 0.058000) >> 100k * 100 both 0.059000 0.000000 >> 0.059000 ( 0.059000) >> >> WITH: (specify -Xinvokedynamic.constants=true to JRuby, or >> -Djruby.invokedynamic.constants=true to JVM) >> >> 100k * 100 nested const get 1.321000 0.000000 >> 1.321000 ( 1.321000) >> 100k * 100 nested const get 1.311000 0.000000 >> 1.311000 ( 1.311000) >> 100k * 100 nested const get 1.305000 0.000000 >> 1.305000 ( 1.305000) >> 100k * 100 nested const get 1.293000 0.000000 >> 1.293000 ( 1.294000) >> 100k * 100 nested const get 1.292000 0.000000 >> 1.292000 ( 1.293000) >> 100k * 100 inherited const get 1.295000 0.000000 >> 1.295000 ( 1.295000) >> 100k * 100 inherited const get 1.241000 0.000000 >> 1.241000 ( 
1.241000) >> 100k * 100 inherited const get 1.241000 0.000000 >> 1.241000 ( 1.241000) >> 100k * 100 inherited const get 1.244000 0.000000 >> 1.244000 ( 1.244000) >> 100k * 100 inherited const get 1.236000 0.000000 >> 1.236000 ( 1.236000) >> 100k * 100 both 1.280000 0.000000 >> 1.280000 ( 1.280000) >> 100k * 100 both 1.236000 0.000000 >> 1.236000 ( 1.236000) >> 100k * 100 both 1.229000 0.000000 >> 1.229000 ( 1.230000) >> 100k * 100 both 1.236000 0.000000 >> 1.236000 ( 1.236000) >> 100k * 100 both 1.248000 0.000000 >> 1.248000 ( 1.248000) >> >> You can see there's some room for improvement :) The number should be >> faster with invokedynamic, since the SwitchPoint form has no active >> guard. > > That's perfect! Let's see what numbers I can come up with... Here are the numbers for JDK 7 b147, 7071307+7071653, and 7071307+7071653+7071709: 7071307: MethodHandle bimorphic inlining should consider the frequency 7071653: JSR 292: call site change notification should be pushed not pulled 7071709: JSR 292: switchpoint invalidation should be pushed not pulled JDK 7 b147: $ jruby --server -Xinvokedynamic.constants=true bench/language/bench_const_lookup.rb 1 user system total real 100k * 100 nested const get 1.301000 0.000000 1.301000 ( 1.176000) 100k * 100 nested const get 1.057000 0.000000 1.057000 ( 1.057000) 100k * 100 nested const get 1.052000 0.000000 1.052000 ( 1.052000) 100k * 100 nested const get 1.051000 0.000000 1.051000 ( 1.052000) 100k * 100 nested const get 1.052000 0.000000 1.052000 ( 1.052000) 100k * 100 inherited const get 1.188000 0.000000 1.188000 ( 1.188000) 100k * 100 inherited const get 1.126000 0.000000 1.126000 ( 1.126000) 100k * 100 inherited const get 1.125000 0.000000 1.125000 ( 1.125000) 100k * 100 inherited const get 1.126000 0.000000 1.126000 ( 1.126000) 100k * 100 inherited const get 1.130000 0.000000 1.130000 ( 1.130000) 100k * 100 both 1.214000 0.000000 1.214000 ( 1.214000) 100k * 100 both 1.134000 0.000000 1.134000 ( 1.134000) 100k * 100 both 
1.134000 0.000000 1.134000 ( 1.134000) 100k * 100 both 1.135000 0.000000 1.135000 ( 1.135000) 100k * 100 both 1.135000 0.000000 1.135000 ( 1.135000) 7071307+7071653: $ jruby --server -Xinvokedynamic.constants=true bench/language/bench_const_lookup.rb 1 user system total real 100k * 100 nested const get 0.552000 0.000000 0.552000 ( 0.522000) 100k * 100 nested const get 0.325000 0.000000 0.325000 ( 0.325000) 100k * 100 nested const get 0.345000 0.000000 0.345000 ( 0.345000) 100k * 100 nested const get 0.339000 0.000000 0.339000 ( 0.338000) 100k * 100 nested const get 0.343000 0.000000 0.343000 ( 0.343000) 100k * 100 inherited const get 0.477000 0.000000 0.477000 ( 0.477000) 100k * 100 inherited const get 0.307000 0.000000 0.307000 ( 0.308000) 100k * 100 inherited const get 0.309000 0.000000 0.309000 ( 0.309000) 100k * 100 inherited const get 0.309000 0.000000 0.309000 ( 0.309000) 100k * 100 inherited const get 0.307000 0.000000 0.307000 ( 0.307000) 100k * 100 both 0.486000 0.000000 0.486000 ( 0.486000) 100k * 100 both 0.346000 0.000000 0.346000 ( 0.346000) 100k * 100 both 0.340000 0.000000 0.340000 ( 0.340000) 100k * 100 both 0.347000 0.000000 0.347000 ( 0.347000) 100k * 100 both 0.340000 0.000000 0.340000 ( 0.340000) 7071307+7071653+7071709: $ jruby --server -Xinvokedynamic.constants=true bench/language/bench_const_lookup.rb 1 user system total real 100k * 100 nested const get 0.468000 0.000000 0.468000 ( 0.438000) 100k * 100 nested const get 0.238000 0.000000 0.238000 ( 0.238000) 100k * 100 nested const get 0.251000 0.000000 0.251000 ( 0.251000) 100k * 100 nested const get 0.242000 0.000000 0.242000 ( 0.242000) 100k * 100 nested const get 0.254000 0.000000 0.254000 ( 0.254000) 100k * 100 inherited const get 0.403000 0.000000 0.403000 ( 0.403000) 100k * 100 inherited const get 0.260000 0.000000 0.260000 ( 0.260000) 100k * 100 inherited const get 0.255000 0.000000 0.255000 ( 0.255000) 100k * 100 inherited const get 0.252000 0.000000 0.252000 ( 0.252000) 100k * 100 
inherited const get 0.254000 0.000000 0.254000 ( 0.254000) 100k * 100 both 0.384000 0.000000 0.384000 ( 0.384000) 100k * 100 both 0.227000 0.000000 0.227000 ( 0.227000) 100k * 100 both 0.221000 0.000000 0.221000 ( 0.221000) 100k * 100 both 0.233000 0.000000 0.233000 ( 0.233000) 100k * 100 both 0.238000 0.000000 0.238000 ( 0.238000) That's pretty nice but compared to non-indy it sucks: JDK 7 b147: $ jruby --server bench/language/bench_const_lookup.rb 1 user system total real 100k * 100 nested const get 0.271000 0.000000 0.271000 ( 0.242000) 100k * 100 nested const get 0.065000 0.000000 0.065000 ( 0.065000) 100k * 100 nested const get 0.052000 0.000000 0.052000 ( 0.052000) 100k * 100 nested const get 0.052000 0.000000 0.052000 ( 0.052000) 100k * 100 nested const get 0.051000 0.000000 0.051000 ( 0.051000) 100k * 100 inherited const get 0.224000 0.000000 0.224000 ( 0.224000) 100k * 100 inherited const get 0.053000 0.000000 0.053000 ( 0.053000) 100k * 100 inherited const get 0.053000 0.000000 0.053000 ( 0.053000) 100k * 100 inherited const get 0.054000 0.000000 0.054000 ( 0.054000) 100k * 100 inherited const get 0.054000 0.000000 0.054000 ( 0.054000) 100k * 100 both 0.230000 0.000000 0.230000 ( 0.230000) 100k * 100 both 0.058000 0.000000 0.058000 ( 0.058000) 100k * 100 both 0.059000 0.000000 0.059000 ( 0.059000) 100k * 100 both 0.058000 0.000000 0.058000 ( 0.058000) 100k * 100 both 0.059000 0.000000 0.059000 ( 0.059000) Some assembly inspection showed that the performance difference between indy vs. non-indy is mostly the out-of-line calls that fall off the threshold cliff (10-15 call sites). 
When we rewrite the benchmark to loop more often (10M times) but only do 50 constant lookups then it gets interesting: JDK 7 b147: $ jruby --server -Xinvokedynamic.constants=true bench_const_lookup.rb 1 user system total real 10M * 50 nested const get 37.918000 0.000000 37.918000 ( 37.844000) 10M * 50 nested const get 37.448000 0.000000 37.448000 ( 37.448000) 10M * 50 nested const get 36.845000 0.000000 36.845000 ( 36.845000) 10M * 50 nested const get 36.841000 0.000000 36.841000 ( 36.841000) 10M * 50 nested const get 36.864000 0.000000 36.864000 ( 36.864000) 10M * 50 inherited const get 37.907000 0.000000 37.907000 ( 37.907000) 10M * 50 inherited const get 37.117000 0.000000 37.117000 ( 37.117000) 10M * 50 inherited const get 37.399000 0.000000 37.399000 ( 37.399000) 10M * 50 inherited const get 37.555000 0.000000 37.555000 ( 37.555000) 10M * 50 inherited const get 37.640000 0.000000 37.640000 ( 37.640000) 10M * 50 both 37.946000 0.000000 37.946000 ( 37.946000) 10M * 50 both 37.928000 0.000000 37.928000 ( 37.928000) 10M * 50 both 38.140000 0.000000 38.140000 ( 38.140000) 10M * 50 both 38.186000 0.000000 38.186000 ( 38.186000) 10M * 50 both 37.956000 0.000000 37.956000 ( 37.956000) JDK 7 b147: $ jruby --server bench_const_lookup.rb 1 user system total real 10M * 50 nested const get 2.790000 0.000000 2.790000 ( 2.756000) 10M * 50 nested const get 2.576000 0.000000 2.576000 ( 2.576000) 10M * 50 nested const get 2.499000 0.000000 2.499000 ( 2.499000) 10M * 50 nested const get 2.501000 0.000000 2.501000 ( 2.501000) 10M * 50 nested const get 2.497000 0.000000 2.497000 ( 2.497000) 10M * 50 inherited const get 2.556000 0.000000 2.556000 ( 2.556000) 10M * 50 inherited const get 2.419000 0.000000 2.419000 ( 2.419000) 10M * 50 inherited const get 2.419000 0.000000 2.419000 ( 2.419000) 10M * 50 inherited const get 2.414000 0.000000 2.414000 ( 2.414000) 10M * 50 inherited const get 2.418000 0.000000 2.418000 ( 2.418000) 10M * 50 both 2.546000 0.000000 2.546000 ( 2.546000) 10M 
* 50 both 2.419000 0.000000 2.419000 ( 2.419000) 10M * 50 both 2.417000 0.000000 2.417000 ( 2.417000) 10M * 50 both 2.414000 0.000000 2.414000 ( 2.415000) 10M * 50 both 2.421000 0.000000 2.421000 ( 2.421000) 7071307+7071653+7071709: $ jruby --server -Xinvokedynamic.constants=true bench_const_lookup.rb 1 user system total real 10M * 50 nested const get 0.590000 0.000000 0.590000 ( 0.560000) 10M * 50 nested const get 0.466000 0.000000 0.466000 ( 0.466000) 10M * 50 nested const get 0.305000 0.000000 0.305000 ( 0.305000) 10M * 50 nested const get 0.310000 0.000000 0.310000 ( 0.310000) 10M * 50 nested const get 0.304000 0.000000 0.304000 ( 0.303000) 10M * 50 inherited const get 0.461000 0.000000 0.461000 ( 0.461000) 10M * 50 inherited const get 0.426000 0.000000 0.426000 ( 0.426000) 10M * 50 inherited const get 0.353000 0.000000 0.353000 ( 0.353000) 10M * 50 inherited const get 0.355000 0.000000 0.355000 ( 0.355000) 10M * 50 inherited const get 0.356000 0.000000 0.356000 ( 0.356000) 10M * 50 both 0.459000 0.000000 0.459000 ( 0.458000) 10M * 50 both 0.435000 0.000000 0.435000 ( 0.435000) 10M * 50 both 0.363000 0.000000 0.363000 ( 0.363000) 10M * 50 both 0.360000 0.000000 0.360000 ( 0.360000) 10M * 50 both 0.364000 0.000000 0.364000 ( 0.364000) Well that's really nice! The compiler is able to optimize away all constant lookups because all guards in between are eliminated and it can prove that the constant is not used. The method is basically empty except a little JRuby boilerplate. 
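For readers less familiar with the API under discussion, here is a minimal sketch of the SwitchPoint pattern that the constant-lookup benchmark exercises. It is not JRuby's code: the class name, `slowLookup`, and the cached value are hypothetical stand-ins, and only the `java.lang.invoke` calls are real JDK 7 API. The fast path returns a cached constant with no explicit per-call comparison in user code; invalidating the SwitchPoint reroutes every guarded handle to the fallback.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

public class SwitchPointConstantSketch {
    // Hypothetical slow path: stands in for a full constant lookup in a language runtime.
    static String slowLookup() {
        return "slow";
    }

    // Builds a handle that returns the cached value until the SwitchPoint is invalidated,
    // and falls back to slowLookup() afterwards.
    static MethodHandle guardedConstant(SwitchPoint sp, String cached)
            throws ReflectiveOperationException {
        MethodHandle fast = MethodHandles.constant(String.class, cached);
        MethodHandle slow = MethodHandles.lookup().findStatic(
                SwitchPointConstantSketch.class, "slowLookup",
                MethodType.methodType(String.class));
        return sp.guardWithTest(fast, slow);
    }

    // Calls the guarded handle once before and once after invalidation.
    static String lookupBeforeAndAfter() {
        try {
            SwitchPoint sp = new SwitchPoint();
            MethodHandle mh = guardedConstant(sp, "cached");
            String before = (String) mh.invokeExact();
            // Simulates "a constant was redefined somewhere": all guarded handles fall back.
            SwitchPoint.invalidateAll(new SwitchPoint[] { sp });
            String after = (String) mh.invokeExact();
            return before + "," + after;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupBeforeAndAfter());
    }
}
```

With pull-style invalidation the compiled fast path still has to test the SwitchPoint state on every call; the push-style change discussed in this thread (7071709) is about letting the JIT drop that test and deoptimize dependent code when `invalidateAll` runs.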
Now we need a real benchmark ;-)

-- Christian

>
> -- Christian
>
>>
>> - Charlie
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

From headius at headius.com Wed Aug 10 09:21:53 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Wed, 10 Aug 2011 11:21:53 -0500
Subject: JRuby/Seph/PHP.reboot/... SwitchPoint usage
In-Reply-To: <670A8DA1-59F0-4387-BC2F-7813CF215EC8@oracle.com>
References: <670A8DA1-59F0-4387-BC2F-7813CF215EC8@oracle.com>
Message-ID:

On Wed, Aug 10, 2011 at 9:26 AM, Christian Thalinger wrote:
> Here are the numbers for JDK 7 b147, 7071307+7071653, and 7071307+7071653+7071709:
>
> 7071307: MethodHandle bimorphic inlining should consider the frequency
> 7071653: JSR 292: call site change notification should be pushed not pulled
> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled

Yay! I have a bunch of areas I want switchpoints for :)

> That's pretty nice but compared to non-indy it sucks:
...
> Some assembly inspection showed that the performance difference between indy vs. non-indy is mostly the out-of-line calls that fall off the threshold cliff (10-15 call sites). When we rewrite the benchmark to loop more often (10M times) but only do 50 constant lookups then it gets interesting:

Let me expand on this so I understand...

Because there's a lot of code in the benchmarked methods, some of the
calls don't make the inlining cut. Because of that (and because of
poor degraded performance for indy + mh?), we take a bigger hit than
we gain.

I guess my concern is that the original (long) version may or may not
inline, but obviously doesn't degrade as badly. Why does the degraded
performance of the long form suffer so much?
Ruby is a very terse language, often resulting in methods that represent a lot of code. I'm
using invokedynamic to reduce the amount of code with the assumption
that indy + mh will not count against my inlining budgets. In this
case, it appears they do?

> Well that's really nice! The compiler is able to optimize away all constant lookups because all guards in between are eliminated and it can prove that the constant is not used. The method is basically empty except a little JRuby boilerplate. Now we need a real benchmark ;-)

Now we're talking! That's exactly what I hoped to see, and what I hope
to see if I start using SwitchPoint for other invalidation mechanisms.
This is exciting! :)

- Charlie

From headius at headius.com Wed Aug 10 09:55:40 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Wed, 10 Aug 2011 11:55:40 -0500
Subject: JRuby/Seph/PHP.reboot/... SwitchPoint usage
In-Reply-To:
References: <670A8DA1-59F0-4387-BC2F-7813CF215EC8@oracle.com>
Message-ID:

On Wed, Aug 10, 2011 at 11:21 AM, Charles Oliver Nutter wrote:
> I guess my concern is that the original (long) version may or may not
> inline, but obviously doesn't degrade as badly. Why does the degraded
> performance of the long form suffer so much? Ruby is a very terse
> language, often resulting in methods that represent a lot of code. I'm
> using invokedynamic to reduce the amount of code with the assumption
> that indy + mh will not count against my inlining budgets. In this
> case, it appears they do?

I will also play devil's advocate here: a method with 100 constant
accesses is probably rare, and I would expect to see other syntax
water down any such method. So I'm not saying I expect a method with
this many constant accesses to optimize perfectly...I just don't want
to see it degrade compared to non-invokedynamic.
- Charlie From christian.thalinger at oracle.com Wed Aug 10 11:16:07 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 10 Aug 2011 20:16:07 +0200 Subject: JRuby/Seph/PHP.reboot/... SwitchPoint usage In-Reply-To: References: <670A8DA1-59F0-4387-BC2F-7813CF215EC8@oracle.com> Message-ID: On Aug 10, 2011, at 6:21 PM, Charles Oliver Nutter wrote: > On Wed, Aug 10, 2011 at 9:26 AM, Christian Thalinger > wrote: >> Here are the numbers for JDK 7 b147, 7071307+7071653, and 7071307+7071653+7071709: >> >> 7071307: MethodHandle bimorphic inlining should consider the frequency >> 7071653: JSR 292: call site change notification should be pushed not pulled >> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled > > Yay! I have a bunch of areas I want switchpoints for :) > >> That's pretty nice but compared to non-indy it sucks: > ... >> Some assembly inspection showed that the performance difference between indy vs. non-indy is mostly the out-of-line calls that fall off the threshold cliff (10-15 call sites). When we rewrite the benchmark to loop more often (10M times) but only do 50 constant lookups then it gets interesting: > > Let me expand on this so I understand... > > Because there's a lot of code in the benchmarked methods, some of the > calls don't make the inlining cut. Because of that (and because of > poor degraded performance for indy + mh?), we take a bigger hit than > we gain. > > I guess my concern is that the original (long) version may or may not > inline, but obviously doesn't degrade as badly. Why does the degraded > performance of the long form suffer so much? Ruby is a very terse > language, often resulting in methods that represent a lot of code. I'm > using invokedynamic to reduce the amount of code with the assumption > that indy + mh will not count against my inlning budgets. In this > case, it appears they do? Indeed, that's the interesting (or worrying) part. 
I can't answer right now why it degrades that much, I have to look at that again more closely. Today I made it to work, tomorrow I'll fix the other problems ;-) -- Christian > >> Well that's really nice! The compiler is able to optimize away all constant lookups because all guards in between are eliminated and it can prove that the constant is not used. The method is basically empty except a little JRuby boilerplate. Now we need a real benchmark ;-) > > Now we're talking! That's exactly what I hoped to see, and what I hope > to see if I start using SwitchPoint for other invalidation mechanisms. > This is exciting! :) > > - Charlie > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From christian.thalinger at oracle.com Wed Aug 10 12:10:28 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 10 Aug 2011 21:10:28 +0200 Subject: Performance problem with invoke dynamic In-Reply-To: <4E319C93.3010306@gmail.com> References: <4E30AB57.1030702@gmail.com> <4E3189F4.4000201@gmail.com> <0B56009D-F840-4BE5-BB96-5A51E7995605@oracle.com> <4E319C93.3010306@gmail.com> Message-ID: <48E62F5A-237F-45E0-8FD3-3A89F68218CA@oracle.com> On Jul 28, 2011, at 7:29 PM, Ola Bini wrote: > On 2011-07-28 10.11, Christian Thalinger wrote: >> >> On Jul 28, 2011, at 6:10 PM, Ola Bini wrote: >> >>>> The bad performance sounds like something is not inlined at all. >>>> How are you invoking valueMH? Via invokedynamic or a direct MH >>>> call? >>> >>> valueMH is combined with an exactInvoker and then chained together >>> with another method handle that calls a method on the result of it >>> that returns another methodhandle that is set as the target for the >>> call site. >> >> Hard to follow :-) >> >>> >>> Exactly the same thing that was done in the emitted bytecode, >>> actually. >> >> Do you have a test case to run? 
>
>
> Yes, it's a specific benchmark run (bench/bench_read_lexical.sp). The
> current master of Seph (12e906d6cac94608a1da565fe748f87421210c97) has
> the method handle implementation. The
> c92f2246c09cef9eb9018128e19bfa25593afd2f checkin is the last with the
> bytecode based version.

Sorry, I forgot to look into that.

$ bin/seph --version bench/bench_read_lexical.sp
Seph 0.0.1-SNAPSHOT [2011-08-10 07:44 -- c92f224]
variable reading, flat short lex 1.803377237
variable reading, flat short lex 1.716815071
variable reading, flat short lex 1.720287124
variable reading, flat short lex 1.721601836
variable reading, flat short lex 1.718097445

$ bin/seph --version bench/bench_read_lexical.sp
Seph 0.0.1-SNAPSHOT [2011-08-10 07:46 -- 12e906d]
variable reading, flat short lex 7.131910330
variable reading, flat short lex 6.479506665
variable reading, flat short lex 6.427145377
variable reading, flat short lex 6.472033742
variable reading, flat short lex 6.481569097

Current master is even slower:

$ bin/seph --version bench/bench_read_lexical.sp
Seph 0.0.1-SNAPSHOT [2011-08-10 07:50 -- 4520464]
variable reading, flat short lex 8.043511168
variable reading, flat short lex 7.227023135
variable reading, flat short lex 7.167401210
variable reading, flat short lex 7.153554078
variable reading, flat short lex 7.157729388

Looking at the compiled methods and the inlinees I found this one (I suppose this is the body for flat_short_scope, correct?):

1490 167 seph$gen$abstraction$1$flat_short_scope::activationFor (9 bytes)
  @ 3 java.lang.invoke.MethodHandle::invokeExact (36 bytes) inline (hot)
  @ 5 java.lang.invoke.MethodHandle::invokeExact (5 bytes) inline (hot)
  @ 1 sun.invoke.util.ValueConversions::identity (2 bytes) inline (hot)
  @ 12 java.lang.invoke.MethodHandle::invokeExact (12 bytes) inline (hot)
  @ 5 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) inline (hot)
  @ 29 java.lang.invoke.MethodHandle::invokeExact (38 bytes) inline (hot)
  @ 29 java.lang.invoke.MethodHandle::invokeExact (11 bytes) inline (hot)
  @ 4 seph.lang.ActivationHelpers::arityErrorMH (45 bytes) too big
  @ 5 java.lang.invoke.MethodHandle::invokeExact (6 bytes) inline (hot)
  @ 2 seph.lang.compiler.SephCallSite::int_eq (11 bytes) inline (hot)
  @ 14 java.lang.invoke.MethodHandle::invokeExact (12 bytes) inline (hot)
  @ 5 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) inline (hot)
  @ 31 java.lang.invoke.MethodHandle::invokeExact (38 bytes) inline (hot)
  @ 31 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot)
  @ 3 java.lang.invoke.MethodHandle::bindTo (58 bytes) already compiled into a medium method
  @ 5 java.lang.invoke.MethodHandle::invokeExact (6 bytes) inline (hot)
  @ 2 seph.lang.compiler.SephCallSite::int_eq (11 bytes) inline (hot)
  @ 14 java.lang.invoke.MethodHandle::invokeExact (12 bytes) inline (hot)
  @ 5 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) inline (hot)
  @ 31 java.lang.invoke.MethodHandle::invokeExact (11 bytes) inline (hot)
  @ 31 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot)
  @ 3 java.lang.invoke.MethodHandle::bindTo (58 bytes) already compiled into a medium method
  @ 4 seph.lang.ActivationHelpers::arityErrorMH (45 bytes) inline (hot)

The problem is:

  java.lang.invoke.MethodHandle::bindTo

That one calls

  java.lang.invoke.BoundMethodHandle::<init> (19 bytes) already compiled into a big method

which calls

  java.lang.invoke.MethodHandleNatives::init (0 bytes) native method

and there we have our performance bugger.

-- Christian

>
> Cheers
> --
> Ola Bini (http://olabini.com)
> Ioke - JRuby - ThoughtWorks
>
> "Yields falsehood when quined" yields falsehood when quined.
>
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

From tom.rodriguez at oracle.com Wed Aug 10 12:16:14 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 10 Aug 2011 12:16:14 -0700
Subject: JRuby/Seph/PHP.reboot/... SwitchPoint usage
In-Reply-To:
References: <670A8DA1-59F0-4387-BC2F-7813CF215EC8@oracle.com>
Message-ID:

>> I guess my concern is that the original (long) version may or may not
>> inline, but obviously doesn't degrade as badly. Why does the degraded
>> performance of the long form suffer so much? Ruby is a very terse
>> language, often resulting in methods that represent a lot of code. I'm
>> using invokedynamic to reduce the amount of code with the assumption
>> that indy + mh will not count against my inlining budgets. In this
>> case, it appears they do?
>
> Indeed, that's the interesting (or worrying) part. I can't answer right now why it degrades that much, I have to look at that again more closely. Today I made it to work, tomorrow I'll fix the other problems ;-)

One thing we should fix is that the bytecodes of the method handle adapter shouldn't count against the DesiredMethodLimit, in the same way they don't count against the inline depth.

tom

>
> -- Christian
>
>>
>>> Well that's really nice! The compiler is able to optimize away all constant lookups because all guards in between are eliminated and it can prove that the constant is not used. The method is basically empty except a little JRuby boilerplate. Now we need a real benchmark ;-)
>>
>> Now we're talking! That's exactly what I hoped to see, and what I hope
>> to see if I start using SwitchPoint for other invalidation mechanisms.
>> This is exciting!
:)
>>
>> - Charlie
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

From tom.rodriguez at oracle.com Wed Aug 10 12:28:07 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 10 Aug 2011 12:28:07 -0700
Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled
In-Reply-To: <6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com>
References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com>
Message-ID: <15BDEB85-0323-4026-A249-D979D88E863B@oracle.com>

On Aug 9, 2011, at 4:33 AM, Christian Thalinger wrote:

>
> On Aug 8, 2011, at 8:49 PM, Tom Rodriguez wrote:
>
>> dependencies.cpp:
>>
>> in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed. It should probably look more like this:
>>
>> klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) {
>>   assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity");
>>   // Same CallSite object but different target? Check this specific call site
>>   // if changes is non-NULL or validate all CallSites
>>   if ((changes == NULL || (call_site == changes->call_site())) &&
>>       (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) {
>>     return ctxk; // assertion failed
>>   }
>>   assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid");
>>   return NULL; // assertion still valid
>> }
>
> I see your point. But the code above is broken as changes->method_handle() will not work when changes == NULL. One of my first versions of this code also stored the MethodHandle target in the dependence stream which seems to be required when we want to validate all CallSites. Something like this:

Yes, that's right. The new webrev looks good.

tom

>
> ! klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, oop method_handle, CallSiteDepChange* changes) {
> +   assert(call_site    ->is_a(SystemDictionary::CallSite_klass()),     "sanity");
> +   assert(method_handle->is_a(SystemDictionary::MethodHandle_klass()), "sanity");
> +   if (changes == NULL) {
> +     // Validate all CallSites
> +     if (java_lang_invoke_CallSite::target(call_site) != method_handle)
> +       return ctxk; // assertion failed
> +   } else {
> +     // Validate the given CallSite
> +     if (call_site == changes->call_site() && java_lang_invoke_CallSite::target(call_site) != changes->method_handle()) {
> +       assert(method_handle != changes->method_handle(), "must be");
> +       return ctxk; // assertion failed
> +     }
> +   }
> +   assert(java_lang_invoke_CallSite::target(call_site) == method_handle, "should still be valid");
> +   return NULL; // assertion still valid
> + }
>
>>
>> The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked.
>>
>> interpreterRuntime.cpp:
>>
>> Please move the dependence check code into universe with the other dependence check code.
>
> Where it says:
>
> // %%% The Universe::flush_foo methods belong in CodeCache.
>
> :-)
>
>> Also add some comments explaining why it's doing what it's doing.
>
> Done.
>
>>
>> doCall.cpp:
>>
>> Can you put in a comment explaining that VolatileCallSite is never inlined.
>
> Done.
>
>>
>> Otherwise it looks good.
>
> webrev updated.
> > -- Christian > >> >> tom >> >> >> On Aug 5, 2011, at 6:32 AM, Christian Thalinger wrote: >> >>> http://cr.openjdk.java.net/~twisti/7071653 >>> >>> 7071653: JSR 292: call site change notification should be pushed not pulled >>> Reviewed-by: >>> >>> Currently every speculatively inlined method handle call site has a >>> guard that compares the current target of the CallSite object to the >>> inlined one. This per-invocation overhead can be removed if the >>> notification is changed from pulled to pushed (i.e. deoptimization). >>> >>> I had to change the logic in TemplateTable::patch_bytecode to skip >>> bytecode quickening for putfield instructions when the put_code >>> written to the constant pool cache is zero. This is required so that >>> every execution of a putfield to CallSite.target calls out to >>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>> depending compiled methods. >>> >>> I also had to change the dependency machinery to understand other >>> dependencies than class hierarchy ones. DepChange got the super-type >>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>> >>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>> tests and vm.mlvm tests. >>> >>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>> second with 7071653). 
Since the CallSite targets don't change during >>> the runtime of this benchmark we can see the performance benefit of >>> eliminating the guard: >>> >>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>> 0.883000 0.000000 0.883000 ( 0.854000) >>> 0.715000 0.000000 0.715000 ( 0.715000) >>> 0.712000 0.000000 0.712000 ( 0.712000) >>> 0.713000 0.000000 0.713000 ( 0.713000) >>> 0.713000 0.000000 0.713000 ( 0.712000) >>> >>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>> 0.772000 0.000000 0.772000 ( 0.742000) >>> 0.624000 0.000000 0.624000 ( 0.624000) >>> 0.621000 0.000000 0.621000 ( 0.621000) >>> 0.622000 0.000000 0.622000 ( 0.622000) >>> 0.622000 0.000000 0.622000 ( 0.621000) >>> >> > From christian.thalinger at oracle.com Wed Aug 10 12:34:27 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 10 Aug 2011 21:34:27 +0200 Subject: Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled In-Reply-To: <15BDEB85-0323-4026-A249-D979D88E863B@oracle.com> References: <34EE7AEC-FD11-4526-B49D-DCEA296E767A@oracle.com> <6908A407-5908-4B30-8540-E6474B96DBA9@oracle.com> <15BDEB85-0323-4026-A249-D979D88E863B@oracle.com> Message-ID: <27ED8701-5353-4929-B9F1-D5A4F7A361B4@oracle.com> On Aug 10, 2011, at 9:28 PM, Tom Rodriguez wrote: > > On Aug 9, 2011, at 4:33 AM, Christian Thalinger wrote: > >> >> On Aug 8, 2011, at 8:49 PM, Tom Rodriguez wrote: >> >>> dependencies.cpp: >>> >>> in check_call_site_target_value, the changes == NULL case should be checking that the call site hasn't changed. It should probably look more like this: >>> >>> klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, CallSiteDepChange* changes) { >>> assert(call_site->is_a(SystemDictionary::CallSite_klass()), "sanity"); >>> // Same CallSite object but different target? 
Check this specific call site >>> // if changes is non-NULL or validate all CallSites >>> if ((changes == NULL || (call_site == changes->call_site())) && >>> (java_lang_invoke_CallSite::target(call_site) != changes->method_handle())) { >>> return ctxk; // assertion failed >>> } >>> assert(java_lang_invoke_CallSite::target(call_site) == changes->method_handle(), "should still be valid"); >>> return NULL; // assertion still valid >>> } >> >> I see your point. But the code above is broken as changes->method_handle() will not work when changes == NULL. One of my first versions of this code also stored the MethodHandle target in the dependence stream which seems to be required when we want to validate all CallSites. Something like this > > Yes that right. The new webrev looks good. Thank you, Tom. -- Christian > > tom > > >> >> ! klassOop Dependencies::check_call_site_target_value(klassOop ctxk, oop call_site, oop method_handle, CallSiteDepChange* changes) { >> + assert(call_site ->is_a(SystemDictionary::CallSite_klass()), "sanity"); >> + assert(method_handle->is_a(SystemDictionary::MethodHandle_klass()), "sanity"); >> + if (changes == NULL) { >> + // Validate all CallSites >> + if (java_lang_invoke_CallSite::target(call_site) != method_handle) >> + return ctxk; // assertion failed >> + } else { >> + // Validate the given CallSite >> + if (call_site == changes->call_site() && java_lang_invoke_CallSite::target(call_site) != changes->method_handle()) { >> + assert(method_handle != changes->method_handle(), "must be"); >> + return ctxk; // assertion failed >> + } >> + } >> + assert(java_lang_invoke_CallSite::target(call_site) == method_handle, "should still be valid"); >> + return NULL; // assertion still valid >> + } >> >>> >>> The final assert is just a paranoia check that a call site hasn't changed without the dependencies being checked. >>> >>> interpreterRuntime.cpp: >>> >>> Please move the dependence check code into universe with the other dependence check code. 
>> >> Where it says: >> >> // %%% The Universe::flush_foo methods belong in CodeCache. >> >> :-) >> >>> Also add some comments explaining why it's doing what it's doing. >> >> Done. >> >>> >>> doCall.cpp: >>> >>> Can you put in a comment explaining that VolatileCallSite is never inlined. >> >> Done. >> >>> >>> Otherwise it looks good. >> >> webrev updated. >> >> -- Christian >> >>> >>> tom >>> >>> >>> On Aug 5, 2011, at 6:32 AM, Christian Thalinger wrote: >>> >>>> http://cr.openjdk.java.net/~twisti/7071653 >>>> >>>> 7071653: JSR 292: call site change notification should be pushed not pulled >>>> Reviewed-by: >>>> >>>> Currently every speculatively inlined method handle call site has a >>>> guard that compares the current target of the CallSite object to the >>>> inlined one. This per-invocation overhead can be removed if the >>>> notification is changed from pulled to pushed (i.e. deoptimization). >>>> >>>> I had to change the logic in TemplateTable::patch_bytecode to skip >>>> bytecode quickening for putfield instructions when the put_code >>>> written to the constant pool cache is zero. This is required so that >>>> every execution of a putfield to CallSite.target calls out to >>>> InterpreterRuntime::resolve_get_put to do the deoptimization of >>>> depending compiled methods. >>>> >>>> I also had to change the dependency machinery to understand other >>>> dependencies than class hierarchy ones. DepChange got the super-type >>>> of two new dependencies, KlassDepChange and CallSiteDepChange. >>>> >>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK >>>> tests and vm.mlvm tests. >>>> >>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147, >>>> second with 7071653). 
Since the CallSite targets don't change during >>>> the runtime of this benchmark we can see the performance benefit of >>>> eliminating the guard: >>>> >>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>> 0.883000 0.000000 0.883000 ( 0.854000) >>>> 0.715000 0.000000 0.715000 ( 0.715000) >>>> 0.712000 0.000000 0.712000 ( 0.712000) >>>> 0.713000 0.000000 0.713000 ( 0.713000) >>>> 0.713000 0.000000 0.713000 ( 0.712000) >>>> >>>> $ jruby --server bench/bench_fib_recursive.rb 5 35 >>>> 0.772000 0.000000 0.772000 ( 0.742000) >>>> 0.624000 0.000000 0.624000 ( 0.624000) >>>> 0.621000 0.000000 0.621000 ( 0.621000) >>>> 0.622000 0.000000 0.622000 ( 0.622000) >>>> 0.622000 0.000000 0.622000 ( 0.621000) >>>> >>> >> > From headius at headius.com Wed Aug 10 20:31:02 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Wed, 10 Aug 2011 22:31:02 -0500 Subject: JRuby/Seph/PHP.reboot/... SwitchPoint usage In-Reply-To: References: <670A8DA1-59F0-4387-BC2F-7813CF215EC8@oracle.com> Message-ID: On Wed, Aug 10, 2011 at 2:16 PM, Tom Rodriguez wrote: > One thing I we should fix is that the bytecodes of the method handle adapter shouldn't count against the DesiredMethodLimit, in the same way they don't count against the inline depth. Yeah, I think I'm hitting that often now that I'm expanding invokedynamic into other areas and also starting to run larger benchmarks. Small benchmarks look fine, and then I expand the size of them and performance drops off dramatically. - Charlie From headius at headius.com Wed Aug 10 21:02:28 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Wed, 10 Aug 2011 23:02:28 -0500 Subject: JRuby invokedynamic updates Message-ID: Hello everyone! I have added a few new items to JRuby relating to invokedynamic. Let's dive in, shall we? 1. invokedynamic-based dispatch for literal binary operators with RHS a literal fixnum or float This is actually on by default because it didn't seem to hurt perf (much?) 
on JDK7, and it should be faster with recent patches to Hotspot. Before this change, JRuby was still using custom call sites for math, boolean, and bitwise operators that had a fixnum or float as the RHS argument. The new logic propagates the literal value and operator name to the bootstrap (reducing bytecode size). It has appropriate guards for when the LHS is not a fixnum or float or when fixnum or float classes have been modified (this should be SwitchPoint in the future), with fallback using a cached fixnum or float object to do an inline-cached call. Initial numbers showed it to improve fib(35) from 1.25s to 1.15s on my old linux machine (using hsx/hotspot-comp with recent patches applied). Invokedynamic-based "fast ops" are enabled by default, but the property to enable/disable is jruby.invokedynamic.fastops 2. SwitchPoint-based invalidation for class modification After seeing how fast constants are with SwitchPoint fixed, I've gone ahead and made a first pass at using SwitchPoint for invalidation due to class modification. Each class now holds an Invalidator instance. Without the new logic enabled, Invalidator just flips the generation int used for guards in inline caches and invokedynamic calls before. With the logic, an Invalidator is used that works using SwitchPoints. The guard in a dispatch's GWT is then reduced to a type check (currently cached_metaclass == object.getMetaclass), with the switch point wrapping GWT. It appears to work well...I'm surprised how quickly I got it wired up. This is not enabled by default. It is definitely faster for small benchmarks like fib, reducing the time to the 1.05s range on that same machine. However if the benchmark is just slightly larger, performance *tanks*. The property to enable switchpoint-based invocation is jruby.invokedynamic.invocation.switchpoint=true. 3. A more complex "fib" benchmark that stresses invokedynamic more I've added bench/bench_fib_complex.rb. 
This runs the original fib along with three variations: * One that uses constants for the literals 1 and 2 in the code * One that dispatches to other Ruby methods for the <, -, and + calls * One that does both Performance is perhaps most easily explained by showing the numbers: *** no switchpoint use normal fib 9227465 1.177000 0.000000 1.177000 ( 1.177000) fib with constants 9227465 3.750000 0.000000 3.750000 ( 3.750000) fib with additional calls 9227465 1.664000 0.000000 1.664000 ( 1.664000) fib with constants and additional calls 9227465 3.739000 0.000000 3.739000 ( 3.740000) Ok, so we have a baseline. A few notes: * For whatever reason, constants have quite an impact on performance here. That could be because it uses the boxed logic for all binops, or it could simply be the overhead of the old constant cache logic. * The method that adds additional calls for the four calls to <, -, and + degrades by a bit less than 50%. That's not great, but it's not especially bad either. *** switchpoints for constant cache normal fib 9227465 1.170000 0.000000 1.170000 ( 1.170000) fib with constants 9227465 2.790000 0.000000 2.790000 ( 2.790000) fib with additional calls 9227465 1.658000 0.000000 1.658000 ( 1.658000) fib with constants and additional calls 9227465 3.181000 0.000000 3.181000 ( 3.180000) Obvious improvement here from the SwitchPoint-based constants, but not as much as I'd like to see. Christian: This would be a good benchmark for you to use to test non-elidable constant access...it obviously still degrades a lot, and it would be good to know if that's invokedynamic stuff or just JRuby. I will say that the boxed math operators do more logic to determine what type the argument is, and because we're using constants instead of literals that's the logic we use. 
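For readers less familiar with the mechanism, the SwitchPoint-based constant caching discussed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not JRuby's actual code: the class SwitchPointConstantDemo and its members (constantValue, invalidator, lookupAndRebind, redefine, read) are hypothetical names, and only the java.lang.invoke calls are real API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;

final class SwitchPointConstantDemo {
    // Hypothetical stand-ins for a language runtime's constant table.
    static volatile Object constantValue = 1L;
    static volatile SwitchPoint invalidator = new SwitchPoint();
    static int lookups = 0; // counts slow-path executions (demo is single-threaded)

    static final MutableCallSite SITE =
        new MutableCallSite(MethodType.methodType(Object.class));
    static final MethodHandle SLOW_PATH;

    static {
        try {
            SLOW_PATH = MethodHandles.lookup().findStatic(
                SwitchPointConstantDemo.class, "lookupAndRebind",
                MethodType.methodType(Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        SITE.setTarget(SLOW_PATH);
    }

    // Slow path: read the current value, then install a constant-returning
    // handle whose only guard is the SwitchPoint that was current at lookup time.
    static Object lookupAndRebind() {
        lookups++;
        SwitchPoint sp = invalidator;
        Object value = constantValue;
        MethodHandle fast = MethodHandles.constant(Object.class, value);
        SITE.setTarget(sp.guardWithTest(fast, SLOW_PATH));
        return value;
    }

    // Redefining the constant invalidates every site guarded by the old SwitchPoint,
    // so those sites fall back to the slow path on their next call.
    static void redefine(Object newValue) {
        SwitchPoint old = invalidator;
        constantValue = newValue;
        invalidator = new SwitchPoint();
        SwitchPoint.invalidateAll(new SwitchPoint[] { old });
    }

    // Invokes the call site the way an invokedynamic instruction would.
    static Object read() {
        try {
            return SITE.dynamicInvoker().invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(read() + " after " + lookups + " lookup(s)");
        System.out.println(read() + " after " + lookups + " lookup(s)");
        redefine(2L);
        System.out.println(read() + " after " + lookups + " lookup(s)");
    }
}
```

The point of the design is that the fast path carries no per-call comparison in the handle chain itself; whether the JIT can actually fold it away depends on the VM build, which is exactly what the numbers in this thread are probing.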
*** switchpoints for class-modification call site invalidation normal fib 9227465 1.128000 0.000000 1.128000 ( 1.128000) fib with constants 9227465 3.436000 0.000000 3.436000 ( 3.436000) fib with additional calls 9227465 5.724000 0.000000 5.724000 ( 5.725000) fib with constants and additional calls 9227465 12.419000 0.000000 12.419000 ( 12.419000) normal fib Woah nelly! You can see there's a small improvement to the normal case, showing that the switchpoint-based invalidation is at least working and not hurting things there. Also the version that just adds constants for literals is about the same as it was in the first run. However the others are *terrible*. There's gotta be some failure to inline causing these terrible numbers. *** switchpoints for both constants and method invalidation normal fib 9227465 1.070000 0.000000 1.070000 ( 1.071000) fib with constants 9227465 2.590000 0.000000 2.590000 ( 2.590000) fib with additional calls 9227465 5.667000 0.000000 5.667000 ( 5.667000) fib with constants and additional calls 9227465 11.863000 0.000000 11.863000 ( 11.863000) Final numbers for the bad cases are about the same, and we have the improvement for the first two cases. So for the larger cases, it seems like things fall off very quickly. I hope this is simply because MH chains' bytecode are still counting against jitting thresholds, because fib is *not* a very big method. I also hope that we can do better for degraded cases. - Charlie From forax at univ-mlv.fr Thu Aug 11 11:21:16 2011 From: forax at univ-mlv.fr (=?UTF-8?B?UsOpbWkgRm9yYXg=?=) Date: Thu, 11 Aug 2011 20:21:16 +0200 Subject: Fwd: [asm] toward ASM 4.0 In-Reply-To: <4E43F910.50104@free.fr> References: <4E43F910.50104@free.fr> Message-ID: <4E441D9C.5060703@univ-mlv.fr> A message from our glorious ASM leader. R?mi -------- Original Message -------- Subject: [asm] toward ASM 4.0 Date: Thu, 11 Aug 2011 17:45:20 +0200 From: Eric Bruneton To: asm at ow2.org Hi, we are about to release ASM 4.0. 
Compared to ASM 4.0 RC1, we mostly renamed the package from org.objectweb.asm to org.ow2.asm, and we renamed two classes (MethodHandle to Handle, and MethodType to Type). If you are already using ASM 4.0 RC1, we invite you to test the code in SVN HEAD before we release it as "ASM 4.0" in a few weeks (so that we can fix remaining bugs, if any).

Eric

From headius at headius.com Thu Aug 11 14:07:03 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Thu, 11 Aug 2011 16:07:03 -0500
Subject: JRuby invokedynamic updates
In-Reply-To:
References:
Message-ID:

On Wed, Aug 10, 2011 at 11:02 PM, Charles Oliver Nutter wrote:
> I've added bench/bench_fib_complex.rb. This runs the original fib
> along with three variations:
>
> * One that uses constants for the literals 1 and 2 in the code
> * One that dispatches to other Ruby methods for the <, -, and + calls
> * One that does both
>
> Performance is perhaps most easily explained by showing the numbers:

After talking with Christian I realized that the constant-based versions were not getting fully bound with invokedynamic because they have pre/post logic (specifically, they need to update some thread-local frame state in JRuby). Ignore those results for the moment.

There does still seem to be a problem with switchpoints though. I stripped out all but the "additional calls" version and compared switchpoint invalidation with normal.
headius at headius-desktop:~/projects/jruby$ JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server bench/bench_fib_complex.rb 5 35 fib with additional calls 9227465 2.180000 0.000000 2.180000 ( 2.070000) fib with additional calls 9227465 1.829000 0.000000 1.829000 ( 1.829000) fib with additional calls 9227465 1.870000 0.000000 1.870000 ( 1.870000) fib with additional calls 9227465 1.642000 0.000000 1.642000 ( 1.642000) fib with additional calls 9227465 1.639000 0.000000 1.639000 ( 1.639000) headius at headius-desktop:~/projects/jruby$ JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server -Xinvokedynamic.invocation.switchpoint=true bench/bench_fib_complex.rb 5 35 fib with additional calls 9227465 5.951000 0.000000 5.951000 ( 5.841000) fib with additional calls 9227465 5.538000 0.000000 5.538000 ( 5.538000) fib with additional calls 9227465 5.531000 0.000000 5.531000 ( 5.531000) fib with additional calls 9227465 5.503000 0.000000 5.503000 ( 5.503000) fib with additional calls 9227465 5.505000 0.000000 5.505000 ( 5.505000) My quick look through inlining shows that "plus", "minus", "lt" all inline into fib_ruby3 ok, as do the binops they call. fib_ruby3 appears to recursively inline one level. I did not see any obvious failures in inlining, but there's something not right here. - Charlie From mroos at roos.com Thu Aug 11 14:39:29 2011 From: mroos at roos.com (Mark Roos) Date: Thu, 11 Aug 2011 14:39:29 -0700 Subject: Fwd: [asm] toward ASM 4.0 In-Reply-To: <4E441D9C.5060703@univ-mlv.fr> References: <4E43F910.50104@free.fr> <4E441D9C.5060703@univ-mlv.fr> Message-ID: R?mi, A message from our glorious ASM leader. I seem to recall some discussion on the mentioned class names ( MethodType and MethodHandle). I assume this means that asm no longer has these types? And thus I no longer need the full class names for the java.lang versions? thanks mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110811/3453efb6/attachment.html

From mroos at roos.com Thu Aug 11 14:39:29 2011
From: mroos at roos.com (Mark Roos)
Date: Thu, 11 Aug 2011 14:39:29 -0700
Subject: GWT fast vs slow path
Message-ID:

I was wondering what the expected differences are in the usage of these two paths. The names imply something but I find it confusing.

In my use case (an inline cache) one of the paths leads directly to the method handle to invoke and the other goes to either another GWT or the lookup. Both should be fast up to the lookup, which could be slower. I would also like to see the optimizer reduce chains of GWTs or at least reorder them.

So is there a difference I should be aware of? Right now I put the chain of GWTs into the slow path.

thanks

mark

From mroos at roos.com Thu Aug 11 14:39:29 2011
From: mroos at roos.com (Mark Roos)
Date: Thu, 11 Aug 2011 14:39:29 -0700
Subject: Thread safety and nethod handle chains
Message-ID:

As I start to look at the multithreading aspects of the Smalltalk implementation I can see a situation where two threads could be running down the GWT lookup path, both looking for the same implementation. I was thinking of making the lookup synchronized, but then what if two threads are in the GWT pipeline looking for the same implementation? One gets to the lookup first and finds the method. Now is it possible that the second thread had entered the call site but is suspended between the site and the lookup? In that case the lookup will have been done but the second thread has already passed the GWT for it (it's added to the start).

I am thinking that I may just accept the fact that there could be duplicate GWTs in the chain. Or is there a better approach?
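One way to picture the race described above, and one possible way to avoid duplicate guards, is sketched below. It is only an assumption-laden illustration, not something proposed in this thread: the class InlineCacheSite, its per-site set of already-cached classes, and the describe method (a stand-in for the real method lookup) are all hypothetical. The idea is that a thread arriving late in the fallback re-checks, under the site's lock, whether its receiver class was cached in the meantime, and if so simply re-dispatches through the current target instead of prepending another GWT.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.util.HashSet;
import java.util.Set;

final class InlineCacheSite extends MutableCallSite {
    private static final MethodHandle CLASS_CHECK;
    private static final MethodHandle FALLBACK;
    private static final MethodHandle DESCRIBE;

    static {
        MethodHandles.Lookup l = MethodHandles.lookup();
        try {
            CLASS_CHECK = l.findStatic(InlineCacheSite.class, "hasClass",
                MethodType.methodType(boolean.class, Class.class, Object.class));
            FALLBACK = l.findVirtual(InlineCacheSite.class, "fallback",
                MethodType.methodType(Object.class, Object.class));
            DESCRIBE = l.findStatic(InlineCacheSite.class, "describe",
                MethodType.methodType(Object.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Classes that already have a guard in this site's chain (guarded by 'this').
    private final Set<Class<?>> cached = new HashSet<>();
    int guardsInstalled = 0;

    InlineCacheSite() {
        super(MethodType.methodType(Object.class, Object.class));
        setTarget(FALLBACK.bindTo(this));
    }

    static boolean hasClass(Class<?> expected, Object receiver) {
        return receiver != null && receiver.getClass() == expected;
    }

    // Hypothetical stand-in for the method a real lookup would find.
    static Object describe(Object receiver) {
        return receiver.getClass().getSimpleName();
    }

    // End of the GWT chain. Assumes non-null receivers.
    Object fallback(Object receiver) throws Throwable {
        Class<?> cls = receiver.getClass();
        MethodHandle target;
        synchronized (this) {
            if (cached.add(cls)) {
                // First thread to get here for this class: prepend one guard.
                MethodHandle impl = DESCRIBE; // a real runtime would do its lookup here
                setTarget(MethodHandles.guardWithTest(
                    CLASS_CHECK.bindTo(cls), impl, getTarget()));
                guardsInstalled++;
            }
            // A late thread finds the class already cached and adds nothing.
            target = getTarget();
        }
        return target.invoke(receiver);
    }

    // Normal entry, as an invokedynamic instruction would call the site.
    Object call(Object receiver) {
        try {
            return dynamicInvoker().invoke(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Simulates a thread that passed the guards before the GWT was installed.
    Object lateArrival(Object receiver) {
        try {
            return fallback(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The cost of this variant is a lock and a set lookup on the slow path only; the fast path through the installed guards is unchanged. Accepting the occasional duplicate GWT remains the simpler option.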
Thanks

mark

From forax at univ-mlv.fr Thu Aug 11 15:01:49 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Fri, 12 Aug 2011 00:01:49 +0200
Subject: Thread safety and nethod handle chains
In-Reply-To:
References:
Message-ID: <4E44514D.9020905@univ-mlv.fr>

On 08/11/2011 11:39 PM, Mark Roos wrote:
> As I start to look at the multithreading aspects of the Smalltalk
> implementation I can see a situation where
> two threads could be running down the GWT lookup path both looking for
> the same implementation. I was
> thinking of making the lookup synchronized but then what if two
> threads are in the GWT pipeline looking for the
> same implementation. One gets to the lookup first and finds the
> method. Now is it possible that the second thread
> had entered the call site but is suspended between the site and the
> lookup? In that case the lookup will have been
> done but the second thread has passed the GWT for it (it's added to the
> start).
>
> I am thinking that I may just accept the fact that there could be
> duplicate GWTs in the chain. Or is there a better
> approach?

Yes, you can have duplicate GWTs, and I'm also interested if someone finds a better approach :)

>
> Thanks
>
> mark

cheers,
Rémi

From forax at univ-mlv.fr Thu Aug 11 15:10:35 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Fri, 12 Aug 2011 00:10:35 +0200
Subject: GWT fast vs slow path
In-Reply-To:
References:
Message-ID: <4E44535B.4020101@univ-mlv.fr>

On 08/11/2011 11:39 PM, Mark Roos wrote:
> I was wondering what the expected differences are in the usage of
> these two paths.
> The names imply something
> but I find it confusing.
>
> In my use case ( in line cache) one of the paths is directly into a
> method handle to invoke and the other goes to
> either another GWT or the lookup. Both should be fast up to the
> lookup which could be slower. I would also like to
> see the optimizer reduce chains of GWTs or at least reorder them.
>
> So is there a difference I should be aware of? Right now I put the
> chain of GWTs into the slow path.

There is no predefined slow path or fast path in GWT: the target path and the fallback path are treated the same. Only the usage, i.e. the number of times each path is taken, is (or will be) used by the JIT to drive the optimization.

About reordering the chain of GWTs, what do you want exactly?

>
> thanks
>
> mark

cheers,
Rémi

From mroos at roos.com Thu Aug 11 17:39:41 2011
From: mroos at roos.com (Mark Roos)
Date: Thu, 11 Aug 2011 17:39:41 -0700
Subject: GWT fast vs slow path
In-Reply-To: <4E44535B.4020101@univ-mlv.fr>
References: <4E44535B.4020101@univ-mlv.fr>
Message-ID:

Rémi asked: "About reordering the chain of GWT, what do you want exactly?"

Well, until the chain gets long enough to switch to some form of table lookup, it would be nice if the GWTs with the most 'fast' path test successes were moved to the front of the chain. In this case the 'fast' path is the direct invoke, not the chained GWTs.

I think that perhaps it would always be the case that the few most often used GWTs were a test and jump and the rest went into a table. I am not sure how to make this decision or how often.

regards

mark

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110811/27770776/attachment.html From christian.thalinger at oracle.com Fri Aug 12 01:17:23 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 12 Aug 2011 10:17:23 +0200 Subject: Performance problem with invoke dynamic In-Reply-To: <48E62F5A-237F-45E0-8FD3-3A89F68218CA@oracle.com> References: <4E30AB57.1030702@gmail.com> <4E3189F4.4000201@gmail.com> <0B56009D-F840-4BE5-BB96-5A51E7995605@oracle.com> <4E319C93.3010306@gmail.com> <48E62F5A-237F-45E0-8FD3-3A89F68218CA@oracle.com> Message-ID: <617346F7-990D-42FF-922C-98C654E62C5D@oracle.com> On Aug 10, 2011, at 9:10 PM, Christian Thalinger wrote: > > On Jul 28, 2011, at 7:29 PM, Ola Bini wrote: > >> On 2011-07-28 10.11, Christian Thalinger wrote: >>> >>> On Jul 28, 2011, at 6:10 PM, Ola Bini wrote: >>> >>>>> The bad performance sounds like something is not inlined at all. >>>>> How are you invoking valueMH? Via invokedynamic or a direct MH >>>>> call? >>>> >>>> valueMH is combined with an exactInvoker and then chained together >>>> with another method handle that calls a method on the result of it >>>> that returns another methodhandle that is set as the target for the >>>> call site. >>> >>> Hard to follow :-) >>> >>>> >>>> Exactly the same thing that was done in the emitted bytecode, >>>> actually. >>> >>> Do you have a test case to run? >>> >> >> Yes, it's a specific benchmark run (bench/bench_read_lexical.sp). The >> current master of Seph (12e906d6cac94608a1da565fe748f87421210c97) has >> the method handle implementation. The >> c92f2246c09cef9eb9018128e19bfa25593afd2f checkin is the last with the >> bytecode based version. > > Sorry, I forgot to look into that. 
> > $ bin/seph --version bench/bench_read_lexical.sp > Seph 0.0.1-SNAPSHOT [2011-08-10 07:44 -- c92f224] > variable reading, flat short lex 1.803377237 > variable reading, flat short lex 1.716815071 > variable reading, flat short lex 1.720287124 > variable reading, flat short lex 1.721601836 > variable reading, flat short lex 1.718097445 > > $ bin/seph --version bench/bench_read_lexical.sp > Seph 0.0.1-SNAPSHOT [2011-08-10 07:46 -- 12e906d] > variable reading, flat short lex 7.131910330 > variable reading, flat short lex 6.479506665 > variable reading, flat short lex 6.427145377 > variable reading, flat short lex 6.472033742 > variable reading, flat short lex 6.481569097 > > Current master is even slower: > > $ bin/seph --version bench/bench_read_lexical.sp > Seph 0.0.1-SNAPSHOT [2011-08-10 07:50 -- 4520464] > variable reading, flat short lex 8.043511168 > variable reading, flat short lex 7.227023135 > variable reading, flat short lex 7.167401210 > variable reading, flat short lex 7.153554078 > variable reading, flat short lex 7.157729388 > > Looking at the compiled methods and the inlinees I found this one (I suppose this the body for flat_short_scope, correct?): > > 1490 167 seph$gen$abstraction$1$flat_short_scope::activationFor (9 bytes) > @ 3 java.lang.invoke.MethodHandle::invokeExact (36 bytes) inline (hot) > @ 5 java.lang.invoke.MethodHandle::invokeExact (5 bytes) inline (hot) > @ 1 sun.invoke.util.ValueConversions::identity (2 bytes) inline (hot) > @ 12 java.lang.invoke.MethodHandle::invokeExact (12 bytes) inline (hot) > @ 5 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) inline (hot) > @ 29 java.lang.invoke.MethodHandle::invokeExact (38 bytes) inline (hot) > @ 29 java.lang.invoke.MethodHandle::invokeExact (11 bytes) inline (hot) > @ 4 seph.lang.ActivationHelpers::arityErrorMH (45 bytes) too big > @ 5 java.lang.invoke.MethodHandle::invokeExact (6 bytes) inline (hot) > @ 2 seph.lang.compiler.SephCallSite::int_eq (11 bytes) inline (hot) > @ 14 
java.lang.invoke.MethodHandle::invokeExact (12 bytes) inline (hot) > @ 5 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) inline (hot) > @ 31 java.lang.invoke.MethodHandle::invokeExact (38 bytes) inline (hot) > @ 31 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot) > @ 3 java.lang.invoke.MethodHandle::bindTo (58 bytes) already compiled into a medium method > @ 5 java.lang.invoke.MethodHandle::invokeExact (6 bytes) inline (hot) > @ 2 seph.lang.compiler.SephCallSite::int_eq (11 bytes) inline (hot) > @ 14 java.lang.invoke.MethodHandle::invokeExact (12 bytes) inline (hot) > @ 5 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) inline (hot) > @ 31 java.lang.invoke.MethodHandle::invokeExact (11 bytes) inline (hot) > @ 31 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot) > @ 3 java.lang.invoke.MethodHandle::bindTo (58 bytes) already compiled into a medium method > @ 4 seph.lang.ActivationHelpers::arityErrorMH (45 bytes) inline (hot) > > The problem is: java.lang.invoke.MethodHandle::bindTo > > That one calls > > java.lang.invoke.BoundMethodHandle:: (19 bytes) already compiled into a big method > > which calls > > java.lang.invoke.MethodHandleNatives::init (0 bytes) native method > > and there we have our performance bugger. FYI, this is: 7071693: JSR 292: native method calls slow down method handle constructor execution I'm not sure if John is already working on that but that's the plan. -- Christian > > -- Christian > >> >> Cheers >> -- >> Ola Bini (http://olabini.com) >> Ioke - JRuby - ThoughtWorks >> >> "Yields falsehood when quined" yields falsehood when quined. 
>> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From christian.thalinger at oracle.com Fri Aug 12 02:44:53 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 12 Aug 2011 11:44:53 +0200 Subject: JRuby invokedynamic updates In-Reply-To: References: Message-ID: On Aug 11, 2011, at 11:07 PM, Charles Oliver Nutter wrote: > On Wed, Aug 10, 2011 at 11:02 PM, Charles Oliver Nutter > wrote: >> I've added bench/bench_fib_complex.rb. This runs the original fib >> along with three variations: >> >> * One that uses constants for the literals 1 and 2 in the code >> * One that dispatches to other Ruby methods for the <, -, and + calls >> * One that does both >> >> Performance is perhaps most easily explained by showing the numbers: > > After talking with Christian I realized that the constant-based > versions were not getting fully bound with invokedynamic because they > have pre/post logic (specifically, they need to update some > thread-local frame state in JRuby). Ignore those results for the > moment. > > There does still seem to be a problem with switchpoints though. I > stripped out all but the "additional calls" version and compared > switchpoint invalidation with normal. 
> > > headius at headius-desktop:~/projects/jruby$ > JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server > bench/bench_fib_complex.rb 5 35 > fib with additional calls > 9227465 > 2.180000 0.000000 2.180000 ( 2.070000) > fib with additional calls > 9227465 > 1.829000 0.000000 1.829000 ( 1.829000) > fib with additional calls > 9227465 > 1.870000 0.000000 1.870000 ( 1.870000) > fib with additional calls > 9227465 > 1.642000 0.000000 1.642000 ( 1.642000) > fib with additional calls > 9227465 > 1.639000 0.000000 1.639000 ( 1.639000) > > headius at headius-desktop:~/projects/jruby$ > JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server > -Xinvokedynamic.invocation.switchpoint=true bench/bench_fib_complex.rb > 5 35 > fib with additional calls > 9227465 > 5.951000 0.000000 5.951000 ( 5.841000) > fib with additional calls > 9227465 > 5.538000 0.000000 5.538000 ( 5.538000) > fib with additional calls > 9227465 > 5.531000 0.000000 5.531000 ( 5.531000) > fib with additional calls > 9227465 > 5.503000 0.000000 5.503000 ( 5.503000) > fib with additional calls > 9227465 > 5.505000 0.000000 5.505000 ( 5.505000) > > My quick look through inlining shows that "plus", "minus", "lt" all > inline into fib_ruby3 ok, as do the binops they call. fib_ruby3 > appears to recursively inline one level. I did not see any obvious > failures in inlining, but there's something not right here. Well, it's the good old: @ 95 java.lang.invoke.MethodHandle::invokeExact (45 bytes) size > DesiredMethodLimit This seems to be the last recursive call that doesn't get inlined. Setting MaxRecursiveInlineLevel=0 makes it go faster. 
I finally filed (a separate bug to keep this a single change): 7078382: JSR 292: don't count method handle adapters against inlining budgets The proposed fix is: http://cr.openjdk.java.net/~twisti/7078382/ The numbers are now like they should be:

intelsdv07:~/mlvm/jruby$ jruby --server bench/bench_fib_complex.rb 5 35
fib with additional calls
0.865000 0.000000 0.865000 ( 0.835000)
0.745000 0.000000 0.745000 ( 0.745000)
0.750000 0.000000 0.750000 ( 0.750000)
0.742000 0.000000 0.742000 ( 0.742000)
0.743000 0.000000 0.743000 ( 0.744000)

intelsdv07:~/mlvm/jruby$ jruby --server -Xinvokedynamic.invocation.switchpoint=true bench/bench_fib_complex.rb 5 35
fib with additional calls
0.789000 0.000000 0.789000 ( 0.759000)
0.661000 0.000000 0.661000 ( 0.661000)
0.659000 0.000000 0.659000 ( 0.660000)
0.661000 0.000000 0.661000 ( 0.661000)
0.661000 0.000000 0.661000 ( 0.661000)

-- Christian > > - Charlie From forax at univ-mlv.fr Fri Aug 12 10:36:38 2011 From: forax at univ-mlv.fr (Rémi Forax) Date: Fri, 12 Aug 2011 19:36:38 +0200 Subject: JRuby invokedynamic updates In-Reply-To: References: Message-ID: <4E4564A6.7090409@univ-mlv.fr> Hi Christian, On 08/12/2011 11:44 AM, Christian Thalinger wrote: > On Aug 11, 2011, at 11:07 PM, Charles Oliver Nutter wrote: > >> On Wed, Aug 10, 2011 at 11:02 PM, Charles Oliver Nutter >> wrote: >>> I've added bench/bench_fib_complex.rb.
This runs the original fib >>> along with three variations: >>> >>> * One that uses constants for the literals 1 and 2 in the code >>> * One that dispatches to other Ruby methods for the<, -, and + calls >>> * One that does both >>> >>> Performance is perhaps most easily explained by showing the numbers: >> After talking with Christian I realized that the constant-based >> versions were not getting fully bound with invokedynamic because they >> have pre/post logic (specifically, they need to update some >> thread-local frame state in JRuby). Ignore those results for the >> moment. >> >> There does still seem to be a problem with switchpoints though. I >> stripped out all but the "additional calls" version and compared >> switchpoint invalidation with normal. >> >> >> headius at headius-desktop:~/projects/jruby$ >> JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server >> bench/bench_fib_complex.rb 5 35 >> fib with additional calls >> 9227465 >> 2.180000 0.000000 2.180000 ( 2.070000) >> fib with additional calls >> 9227465 >> 1.829000 0.000000 1.829000 ( 1.829000) >> fib with additional calls >> 9227465 >> 1.870000 0.000000 1.870000 ( 1.870000) >> fib with additional calls >> 9227465 >> 1.642000 0.000000 1.642000 ( 1.642000) >> fib with additional calls >> 9227465 >> 1.639000 0.000000 1.639000 ( 1.639000) >> >> headius at headius-desktop:~/projects/jruby$ >> JAVA_HOME=~/hsx-hotspot/build/linux/jdk-linux-i586/ bin/jruby --server >> -Xinvokedynamic.invocation.switchpoint=true bench/bench_fib_complex.rb >> 5 35 >> fib with additional calls >> 9227465 >> 5.951000 0.000000 5.951000 ( 5.841000) >> fib with additional calls >> 9227465 >> 5.538000 0.000000 5.538000 ( 5.538000) >> fib with additional calls >> 9227465 >> 5.531000 0.000000 5.531000 ( 5.531000) >> fib with additional calls >> 9227465 >> 5.503000 0.000000 5.503000 ( 5.503000) >> fib with additional calls >> 9227465 >> 5.505000 0.000000 5.505000 ( 5.505000) >> >> My quick look through inlining 
shows that "plus", "minus", "lt" all >> inline into fib_ruby3 ok, as do the binops they call. fib_ruby3 >> appears to recursively inline one level. I did not see any obvious >> failures in inlining, but there's something not right here. > Well, it's the good old: > > @ 95 java.lang.invoke.MethodHandle::invokeExact (45 bytes) size > DesiredMethodLimit > > This seems to be the last recursive call that doesn't get inlined. Setting MaxRecursiveInlineLevel=0 makes it go faster. I finally filed (a separate bug to keep this a single change): > > 7078382: JSR 292: don't count method handle adapters against inlining budgets > > The proposed fix is: > > http://cr.openjdk.java.net/~twisti/7078382/ > > The numbers are now like they should be: > > intelsdv07:~/mlvm/jruby$ jruby --server bench/bench_fib_complex.rb 5 35 > fib with additional calls > 0.865000 0.000000 0.865000 ( 0.835000) > 0.745000 0.000000 0.745000 ( 0.745000) > 0.750000 0.000000 0.750000 ( 0.750000) > 0.742000 0.000000 0.742000 ( 0.742000) > 0.743000 0.000000 0.743000 ( 0.744000) > > intelsdv07:~/mlvm/jruby$ jruby --server -Xinvokedynamic.invocation.switchpoint=true bench/bench_fib_complex.rb 5 35 > fib with additional calls > 0.789000 0.000000 0.789000 ( 0.759000) > 0.661000 0.000000 0.661000 ( 0.661000) > 0.659000 0.000000 0.659000 ( 0.660000) > 0.661000 0.000000 0.661000 ( 0.661000) > 0.661000 0.000000 0.661000 ( 0.661000) > > -- Christian I wonder if this patch is not too much. If a big method is called through an invokedynamic + method handle then it will be inlined. I wonder if it's not better to crawl the method handle tree and sum up the bytecode size of all methods that have a bytecode size (all the ones that aren't in j.l.invoke).
Rémi From tom.rodriguez at oracle.com Fri Aug 12 11:18:32 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 12 Aug 2011 11:18:32 -0700 Subject: JRuby invokedynamic updates In-Reply-To: References: Message-ID: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com> > Well, it's the good old: > > @ 95 java.lang.invoke.MethodHandle::invokeExact (45 bytes) size > DesiredMethodLimit > > This seems to be the last recursive call that doesn't get inlined. Setting MaxRecursiveInlineLevel=0 makes it go faster. I finally filed (a separate bug to keep this a single change): > > 7078382: JSR 292: don't count method handle adapters against inlining budgets > > The proposed fix is: > > http://cr.openjdk.java.net/~twisti/7078382/ I wonder if we need to be slightly more selective than this. Most method handle chains are relatively small and we shouldn't be penalized for that but they could be arbitrarily large too. Worst case they just expand into a bunch of call sites I guess so maybe it's not that bad. Maybe we need an alternate metric for this, like number of call sites in the method handle adapter? This wouldn't be so bad if method handle chains could be compiled separately. I suspect we're going to have to support that eventually. Doing that would make the performance cliff much smaller I think.
tom > > The numbers are now like they should be: > > intelsdv07:~/mlvm/jruby$ jruby --server bench/bench_fib_complex.rb 5 35 > fib with additional calls > 0.865000 0.000000 0.865000 ( 0.835000) > 0.745000 0.000000 0.745000 ( 0.745000) > 0.750000 0.000000 0.750000 ( 0.750000) > 0.742000 0.000000 0.742000 ( 0.742000) > 0.743000 0.000000 0.743000 ( 0.744000) > > intelsdv07:~/mlvm/jruby$ jruby --server -Xinvokedynamic.invocation.switchpoint=true bench/bench_fib_complex.rb 5 35 > fib with additional calls > 0.789000 0.000000 0.789000 ( 0.759000) > 0.661000 0.000000 0.661000 ( 0.661000) > 0.659000 0.000000 0.659000 ( 0.660000) > 0.661000 0.000000 0.661000 ( 0.661000) > 0.661000 0.000000 0.661000 ( 0.661000) > > -- Christian > >> >> - Charlie From tom.rodriguez at oracle.com Fri Aug 12 11:22:02 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 12 Aug 2011 11:22:02 -0700 Subject: JRuby invokedynamic updates In-Reply-To: <4E4564A6.7090409@univ-mlv.fr> References: <4E4564A6.7090409@univ-mlv.fr> Message-ID: <848A70C6-B209-4693-8041-10F0A8EB58B2@oracle.com> >> > > I wonder if this patch is not too much. > If a big method is called through an invokedynamic + method handle > then it will be inlined. That's not how it works. The methods called by the adapter will still have to pass normal inlining checks before being inlined, so at worst you just end up with a bunch of extra calls. The big cliff with invokedynamic performance is the slowness of all the adapter machinery, and currently the only way to avoid that is to inline at least the argument swizzling part.
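For readers less familiar with the adapter machinery discussed here, a minimal sketch of a method handle chain that does argument swizzling, built only from the public java.lang.invoke API. The class name AdapterChainSketch and the target method minus are hypothetical, not taken from JRuby or HotSpot. It illustrates the kind of chain the JIT would need to inline; it says nothing about how HotSpot implements adapters internally.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class AdapterChainSketch {
    // Hypothetical target method; stands in for a language runtime's real callee.
    static int minus(int a, int b) {
        return a - b;
    }

    // Builds target -> permuteArguments -> insertArguments and invokes the result.
    static int callThroughAdapters(int x) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle target = lookup.findStatic(
                    AdapterChainSketch.class, "minus",
                    MethodType.methodType(int.class, int.class, int.class));

            // Adapter 1: swap the two arguments, so swapped(p, q) calls minus(q, p).
            MethodHandle swapped =
                    MethodHandles.permuteArguments(target, target.type(), 1, 0);

            // Adapter 2: bind the first argument to 10, so bound(x) calls minus(x, 10).
            MethodHandle bound = MethodHandles.insertArguments(swapped, 0, 10);

            return (int) bound.invokeExact(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughAdapters(3)); // 3 - 10
    }
}
```

Each adapter here is tiny, which matches the observation in the thread that typical chains are small; the cost comes from executing the adapters one by one when they are not inlined.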
tom > > I wonder if it's not better to crawl the method handle tree and > sum up the bytecode size of all methods that have a bytecode size > (all the ones that aren't in j.l.invoke). > > Rémi From christian.thalinger at oracle.com Fri Aug 12 11:41:17 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 12 Aug 2011 20:41:17 +0200 Subject: JRuby invokedynamic updates In-Reply-To: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com> References: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com> Message-ID: On Aug 12, 2011, at 8:18 PM, Tom Rodriguez wrote: >> Well, it's the good old: >> >> @ 95 java.lang.invoke.MethodHandle::invokeExact (45 bytes) size > DesiredMethodLimit >> >> This seems to be the last recursive call that doesn't get inlined. Setting MaxRecursiveInlineLevel=0 makes it go faster. I finally filed (a separate bug to keep this a single change): >> >> 7078382: JSR 292: don't count method handle adapters against inlining budgets >> >> The proposed fix is: >> >> http://cr.openjdk.java.net/~twisti/7078382/ > > I wonder if we need to be slightly more selective than this. Most method handle chains are relatively small and we shouldn't be penalized for that but they could be arbitrarily large too. Worst case they just expand into a bunch of call sites I guess so maybe it's not that bad. Maybe we need an alternate metric for this, like number of call sites in the method handle adapter? Yes, using zero is not the best approach, but it proved the point. The number of call sites could be a good metric. > > This wouldn't be so bad if method handle chains could be compiled separately. I suspect we're going to have to support that eventually. Doing that would make the performance cliff much smaller I think. Exactly. Today I was thinking about this a lot and did some experiments.
The problem we have right now is that invokedynamic instructions have j.l.i.MethodHandle.invokeExact as callee which is a native method. Maybe we could store the methodOop of the method handle adapter somewhere (in the constant pool cache?) when we have bytecode for a method handle chain and execute that? -- Christian > > tom > >> >> The numbers are now like they should be: >> >> intelsdv07:~/mlvm/jruby$ jruby --server bench/bench_fib_complex.rb 5 35 >> fib with additional calls >> 0.865000 0.000000 0.865000 ( 0.835000) >> 0.745000 0.000000 0.745000 ( 0.745000) >> 0.750000 0.000000 0.750000 ( 0.750000) >> 0.742000 0.000000 0.742000 ( 0.742000) >> 0.743000 0.000000 0.743000 ( 0.744000) >> >> intelsdv07:~/mlvm/jruby$ jruby --server -Xinvokedynamic.invocation.switchpoint=true bench/bench_fib_complex.rb 5 35 >> fib with additional calls >> 0.789000 0.000000 0.789000 ( 0.759000) >> 0.661000 0.000000 0.661000 ( 0.661000) >> 0.659000 0.000000 0.659000 ( 0.660000) >> 0.661000 0.000000 0.661000 ( 0.661000) >> 0.661000 0.000000 0.661000 ( 0.661000) >> >> -- Christian >> >>> >>> - Charlie >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From forax at univ-mlv.fr Fri Aug 12 13:55:16 2011 From: forax at univ-mlv.fr (=?ISO-8859-1?Q?R=E9mi_Forax?=) Date: Fri, 12 Aug 2011 22:55:16 +0200 Subject: JRuby invokedynamic updates In-Reply-To: <848A70C6-B209-4693-8041-10F0A8EB58B2@oracle.com> References: <4E4564A6.7090409@univ-mlv.fr> <848A70C6-B209-4693-8041-10F0A8EB58B2@oracle.com> Message-ID: <4E459334.3090204@univ-mlv.fr> On 08/12/2011 
08:22 PM, Tom Rodriguez wrote: >> I wonder if this patch is not too much. >> If a big method is called through an invokedynamic + method handle >> then it will be inlined. > That's not how it works. The methods called by the adapter will still have to pass normal inlining checks before being inlined so at worst you just end up with a bunch of extra calls. Ok, sorry. > The big cliff with invokedynamic performance is the slowness of all the adapter machinery and currently the only way to avoid that is to inline at least the argument swizzling part. > > tom Rémi From forax at univ-mlv.fr Fri Aug 12 13:59:24 2011 From: forax at univ-mlv.fr (Rémi Forax) Date: Fri, 12 Aug 2011 22:59:24 +0200 Subject: JRuby invokedynamic updates In-Reply-To: References: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com> Message-ID: <4E45942C.4010406@univ-mlv.fr> On 08/12/2011 08:41 PM, Christian Thalinger wrote: > On Aug 12, 2011, at 8:18 PM, Tom Rodriguez wrote: > >>> Well, it's the good old: >>> >>> @ 95 java.lang.invoke.MethodHandle::invokeExact (45 bytes) size > DesiredMethodLimit >>> >>> This seems to be the last recursive call that doesn't get inlined. Setting MaxRecursiveInlineLevel=0 makes it go faster. I finally filed (a separate bug to keep this a single change): >>> >>> 7078382: JSR 292: don't count method handle adapters against inlining budgets >>> >>> The proposed fix is: >>> >>> http://cr.openjdk.java.net/~twisti/7078382/ >> I wonder if we need to be slightly more selective than this. Most method handle chains are relatively small and we shouldn't be penalized for that but they could be arbitrarily large too. Worst case they just expand into a bunch of call sites I guess so maybe it's not that bad. Maybe we need an alternate metric for this, like number of call sites in the method handle adapter? > Yes, using zero is not the best approach but it proved the point. Number of call sites could be a good metric.
> >> This wouldn't be so bad if method handle chains could be compiled separately. I suspect we're going to have to support that eventually. Doing that would make the performance cliff much smaller I think. > Exactly. Today I was thinking about this a lot and did some experiments. The problem we have right now is that invokedynamic instructions have j.l.i.MethodHandle.invokeExact as callee which is a native method. Maybe we could store the methodOop of the method handle adapter somewhere (in the constant pool cache?) when we have bytecode for a method handle chain and execute that? It can also be good to store it into the method handle (if it's possible). In that case, even a MethodHandle not linked to invokedynamic callsite can be optimized. > > -- Christian R?mi > >> tom >> >>> The numbers are now like they should be: >>> >>> intelsdv07:~/mlvm/jruby$ jruby --server bench/bench_fib_complex.rb 5 35 >>> fib with additional calls >>> 0.865000 0.000000 0.865000 ( 0.835000) >>> 0.745000 0.000000 0.745000 ( 0.745000) >>> 0.750000 0.000000 0.750000 ( 0.750000) >>> 0.742000 0.000000 0.742000 ( 0.742000) >>> 0.743000 0.000000 0.743000 ( 0.744000) >>> >>> intelsdv07:~/mlvm/jruby$ jruby --server -Xinvokedynamic.invocation.switchpoint=true bench/bench_fib_complex.rb 5 35 >>> fib with additional calls >>> 0.789000 0.000000 0.789000 ( 0.759000) >>> 0.661000 0.000000 0.661000 ( 0.661000) >>> 0.659000 0.000000 0.659000 ( 0.660000) >>> 0.661000 0.000000 0.661000 ( 0.661000) >>> 0.661000 0.000000 0.661000 ( 0.661000) >>> >>> -- Christian >>> >>>> - Charlie >>>> _______________________________________________ >>>> mlvm-dev mailing list >>>> mlvm-dev at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev 
 at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From headius at headius.com Fri Aug 12 16:15:26 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Fri, 12 Aug 2011 18:15:26 -0500 Subject: JRuby invokedynamic updates In-Reply-To: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com> References: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com> Message-ID: On Fri, Aug 12, 2011 at 1:18 PM, Tom Rodriguez wrote: > I wonder if we need to be slightly more selective than this. Most method handle chains are relatively small and we shouldn't be penalized for that but they could be arbitrarily large too. Worst case they just expand into a bunch of call sites I guess so maybe it's not that bad. Maybe we need an alternate metric for this, like number of call sites in the method handle adapter? Well, let me play devil's advocate here: why not just discount MH chains completely? I've been treating method handles as a way to communicate intent directly to the JVM. Method handles are like JVM IR...and I'd like to see them unhindered by limits designed around JVM bytecode. What's the problem with having them discounted en masse? > This wouldn't be so bad if method handle chains could be compiled separately. I suspect we're going to have to support that eventually. Doing that would make the performance cliff much smaller I think. This probably needs to happen for client mode at the very least. We probably can't get client to inline invokedynamic, but if it at least dispatches to a compiled MH chain it would be a lot better than what it does now (which I think is just execute the chain of handles as-is...usually very slow).
- Charlie From headius at headius.com Fri Aug 12 17:29:57 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Fri, 12 Aug 2011 19:29:57 -0500 Subject: JRuby invokedynamic updates In-Reply-To: References: Message-ID: On Fri, Aug 12, 2011 at 4:44 AM, Christian Thalinger wrote: > Well, it's the good old: > > @ 95 java.lang.invoke.MethodHandle::invokeExact (45 bytes) size > DesiredMethodLimit > > This seems to be the last recursive call that doesn't get inlined. Setting MaxRecursiveInlineLevel=0 makes it go faster. I finally filed (a separate bug to keep this a single change): > > 7078382: JSR 292: don't count method handle adapters against inlining budgets > > The proposed fix is: > > http://cr.openjdk.java.net/~twisti/7078382/ > > The numbers are now like they should be: That's great! I have confirmed the perf improvement here as well. The new numbers are looking *much* better now. There's still a bit of overhead from factoring the original fib into more methods, but it's getting smaller :)

normal fib
9227465
1.153000 0.000000 1.153000 ( 1.153000)
fib with constants
9227465
2.167000 0.000000 2.167000 ( 2.167000)
fib with additional calls
9227465
1.275000 0.000000 1.275000 ( 1.275000)
fib with constants and additional calls
9227465
2.442000 0.000000 2.442000 ( 2.442000)

Compare to non-indy:

normal fib
9227465
1.995000 0.000000 1.995000 ( 1.995000)
fib with constants
9227465
3.637000 0.000000 3.637000 ( 3.637000)
fib with additional calls
9227465
3.279000 0.000000 3.279000 ( 3.279000)
fib with constants and additional calls
9227465
4.352000 0.000000 4.352000 ( 4.352000)

I will proceed with more indy updates!
- Charlie From headius at headius.com Fri Aug 12 21:49:07 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Fri, 12 Aug 2011 23:49:07 -0500 Subject: Performance problem with invoke dynamic In-Reply-To: <617346F7-990D-42FF-922C-98C654E62C5D@oracle.com> References: <4E30AB57.1030702@gmail.com> <4E3189F4.4000201@gmail.com> <0B56009D-F840-4BE5-BB96-5A51E7995605@oracle.com> <4E319C93.3010306@gmail.com> <48E62F5A-237F-45E0-8FD3-3A89F68218CA@oracle.com> <617346F7-990D-42FF-922C-98C654E62C5D@oracle.com> Message-ID: On Fri, Aug 12, 2011 at 3:17 AM, Christian Thalinger wrote: > FYI, this is: > > 7071693: JSR 292: native method calls slow down method handle constructor execution > > I'm not sure if John is already working on that but that's the plan. This will be very nice. I think it's going to be quite common to rejigger method handles...potentially for every call. The simple case which I know affects several languages is the heavy use of currying. On the other side of things...I hope those languages that will churn through method handles also realize they're unlikely to ever JIT... - Charlie From headius at headius.com Sat Aug 13 13:21:48 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Sat, 13 Aug 2011 15:21:48 -0500 Subject: hsx/hotspot-comp build plus patches Message-ID: I've posted a linux-i586 build of the hotspot-comp branch (compiler team dev branch) plus all recent invokedynamic-related patches: http://jruby.org/jdk I'll post other builds as I make them, but this is a nice one for experimenting with the recent patches being tossed about. This includes: * Tom's patch for bimorphic GWT profiled inlining * Christian's patches for push invalidation of MutableCallSite and SwitchPoint gwt * Christian's patch for full discounting of MethodHandle bytecode I'll update things as more patches come along (and I'll post unpatched builds too). 
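Since the SwitchPoint guardWithTest patches come up repeatedly in this thread, a minimal sketch of what such a guard looks like through the public java.lang.invoke API. The class SwitchPointSketch and the methods fast and slow are hypothetical; JRuby's real call sites are more involved, and this shows only the API-level behaviour, not the push-invalidation work inside HotSpot.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

public class SwitchPointSketch {
    // Hypothetical targets: the optimistic path and the fallback path.
    static String fast() { return "fast"; }
    static String slow() { return "slow"; }

    // Returns what the guarded handle produced before and after invalidation.
    static String[] run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class);
            MethodHandle fast = lookup.findStatic(SwitchPointSketch.class, "fast", type);
            MethodHandle slow = lookup.findStatic(SwitchPointSketch.class, "slow", type);

            SwitchPoint switchPoint = new SwitchPoint();
            // While the switch point is valid the guarded handle calls fast;
            // once it has been invalidated it calls slow.
            MethodHandle guarded = switchPoint.guardWithTest(fast, slow);

            String before = (String) guarded.invokeExact();
            SwitchPoint.invalidateAll(new SwitchPoint[] { switchPoint });
            String after = (String) guarded.invokeExact();
            return new String[] { before, after };
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println(result[0] + " then " + result[1]);
    }
}
```

Invalidation is one-way: once invalidated, a SwitchPoint never becomes valid again, so a runtime has to create a fresh one and relink its call sites.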
- Charlie From tom.rodriguez at oracle.com Mon Aug 15 11:01:04 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 15 Aug 2011 11:01:04 -0700 Subject: JRuby invokedynamic updates In-Reply-To: References: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com> Message-ID: On Aug 12, 2011, at 4:15 PM, Charles Oliver Nutter wrote: > On Fri, Aug 12, 2011 at 1:18 PM, Tom Rodriguez wrote: >> I wonder if we need to be slightly more selective than this. Most method handle chains are relatively small and we shouldn't be penalized for that but they could be arbitrarily large too. Worst case they just expand into a bunch of call sites I guess so maybe it's not that bad. Maybe we need an alternate metric for this, like number of call sites in the method handle adapter? > > Well, let me play devil's advocate here: why not just discount MH > chains completely? It really only to deal with pathologically long chains. Most method handle chains are pretty simple and should generally just be inlined. However I could build a chain that included a very large number of calls embedded in it and blindly inlining that could cause the compile to grow too large. Chains of method handle chains make it worse. Whatever limit we would pick would be fairly high so that no normal usage would never be cut off. > > I've been treating method handles as a way to communicate intent > directly to the JVM. Method handles are like JVM IR...and I'd like to > see them unhindered by limits designed around JVM bytecode. > > What's the problem with having them discounted en masse? > >> This wouldn't be so bad if method handle chains could be compiled separately. I suspect we're going to have to support that eventually. Doing that would make the performance cliff much smaller I think. > > This probably needs to happen for client mode at the very least. 
We > probably can't get client to inline invokedynamic, but if it at least > dispatches to a compiled MH chain it would be a lot better than what > it does now (which I think is just execute the chain of handles > as-is...usually very slow). Client will start to do more inlining in 7u2 but it will still be limited because of the lack of profiling. Being able to compile them separately would make the whole system more stable performance wise. I don't know that we can do that for 7u2 though. Doing it correctly may require more machinery than we have time to build for 7u2. tom > > - Charlie From headius at headius.com Mon Aug 15 15:19:33 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Mon, 15 Aug 2011 17:19:33 -0500 Subject: JRuby invokedynamic updates In-Reply-To: References: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com> Message-ID: On Mon, Aug 15, 2011 at 1:01 PM, Tom Rodriguez wrote: > > On Aug 12, 2011, at 4:15 PM, Charles Oliver Nutter wrote: >> Well, let me play devil's advocate here: why not just discount MH >> chains completely? > > It really only to deal with pathologically long chains. Most method handle chains are pretty simple and should generally just be inlined. However I could build a chain that included a very large number of calls embedded in it and blindly inlining that could cause the compile to grow too large. Chains of method handle chains make it worse. Whatever limit we would pick would be fairly high so that no normal usage would never be cut off. I'm not a compiler guy, so I don't know what "too large" means when it comes to the compile. Is it: * Too complex graph so compilation runs too long? * Too big native code so something blows up or crashes? * Too big native code so it doesn't fit in cache and runs much slower?
As long as the limit is suitably high, I think everyone will be happy. I can't see any JRuby use involving more than a few dozen adapters for the most complex cases, mostly argument juggling and converting. >> This probably needs to happen for client mode at the very least. We >> probably can't get client to inline invokedynamic, but if it at least >> dispatches to a compiled MH chain it would be a lot better than what >> it does now (which I think is just execute the chain of handles >> as-is...usually very slow). > > Client will start to do more inlining in 7u2 but it will still be limited because of the lack of profiling. Being able to compile them separately would make the whole system more stable performance wise. I don't know that we can do that for 7u2 though. Doing it correctly may require more machinery than we have time to build for 7u2. My contingency plan is to only turn on invokedynamic when I can see we're running on Hotspot C2, and use the old mechanisms when running under C1. That is an acceptable trade-off until client mode handles invokedynamic/MHs better than it does now. - Charlie
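The contingency plan of enabling invokedynamic only under the server compiler could be sketched roughly as follows. This is not JRuby's actual check. It assumes that the standard java.vm.name system property is a good enough heuristic (HotSpot server VMs report a name containing "Server", client VMs one containing "Client"), and the class and method names are hypothetical.

```java
public class CompilerCheckSketch {
    // Heuristic only: treats any VM whose name mentions "Server" as running C2.
    static boolean looksLikeServerVm(String vmName) {
        return vmName != null && vmName.contains("Server");
    }

    // Hypothetical switch a language runtime could consult before emitting
    // invokedynamic call sites; otherwise it would fall back to its older
    // dispatch mechanism.
    static boolean shouldUseInvokeDynamic() {
        return looksLikeServerVm(System.getProperty("java.vm.name"));
    }

    public static void main(String[] args) {
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("use invokedynamic: " + shouldUseInvokeDynamic());
    }
}
```

A name-based check like this is fragile on non-HotSpot VMs, so a real runtime would probably also want an explicit override property.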
From tom.rodriguez at oracle.com Mon Aug 15 15:26:25 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 15 Aug 2011 15:26:25 -0700 Subject: JRuby invokedynamic updates In-Reply-To: References: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com> Message-ID: <27F11027-C35C-4C73-BEFE-226E5E822433@oracle.com> On Aug 15, 2011, at 3:19 PM, Charles Oliver Nutter wrote: > On Mon, Aug 15, 2011 at 1:01 PM, Tom Rodriguez wrote: >> >> On Aug 12, 2011, at 4:15 PM, Charles Oliver Nutter wrote: >>> Well, let me play devil's advocate here: why not just discount MH >>> chains completely? >> >> It really only to deal with pathologically long chains. Most method handle chains are pretty simple and should generally just be inlined. However I could build a chain that included a very large number of calls embedded in it and blindly inlining that could cause the compile to grow too large. Chains of method handle chains make it worse. Whatever limit we would pick would be fairly high so that no normal usage would never be cut off. > > I'm not a compiler guy, so I don't know what "too large" means when it > comes to the compile. Is it: > > * Too complex graph so compilation runs too long? > * Too big native code so something blows up or crashes? These two mostly. > * Too big native code so it doesn't fit in cache and runs much slower? > > As long as the limit is suitably high, I think everyone will be happy. > I can't see any JRuby use involving more than a few dozen adapters for > the most complex cases, mostly argument juggling and converting. Exactly. tom > >>> This probably needs to happen for client mode at the very least. We >>> probably can't get client to inline invokedynamic, but if it at least >>> dispatches to a compiled MH chain it would be a lot better than what >>> it does now (which I think is just execute the chain of handles >>> as-is...usually very slow). 
>> >> Client will start to do more inlining in 7u2 but it will still be limited because of the lack of profiling. Being able to compile them separately would make the whole system more stable performance wise. I don't know that we can do that for 7u2 though. Doing it correctly may require more machinery than we have time to build for 7u2. > > My contingency plan is to only turn on invokedynamic when I can see > we're running on Hotspot C2, and use the old mechanisms when running > under C1. That is an acceptable trade-off until client mode handles > invokedynamic/MHs better than it does now. > > - Charlie > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From jeroen at sumatra.nl Tue Aug 16 06:46:29 2011 From: jeroen at sumatra.nl (Jeroen Frijters) Date: Tue, 16 Aug 2011 13:46:29 +0000 Subject: JRuby + invokedynamic on IKVM.NET Message-ID: Hi everyone, I "finished" the initial JSR292 implementation. I haven't done any performance work and it shows: ikvm -Djruby.compile.invokedynamic=false -jar jruby-complete-1.7.0.dev.jar bench\bench_fib_complex.rb normal fib 832040 0.286000 0.000000 0.286000 ( 0.286000) fib with constants 832040 0.640000 0.000000 0.640000 ( 0.640000) fib with additional calls 832040 0.540000 0.000000 0.540000 ( 0.541000) fib with constants and additional calls 832040 0.832000 0.000000 0.832000 ( 0.832000) ikvm -Djruby.compile.invokedynamic=true -jar jruby-complete-1.7.0.dev.jar bench\bench_fib_complex.rb normal fib 832040 1.267000 0.000000 1.267000 ( 1.267000) fib with constants 832040 1.960000 0.000000 1.960000 ( 1.959000) fib with additional calls 832040 3.658000 0.000000 3.658000 ( 3.659000) fib with constants and additional calls 832040 3.696000 0.000000 3.696000 ( 3.696000) Still, pretty cool that it works now. 
Regards,
Jeroen

From bernard.traversat at oracle.com Tue Aug 16 06:48:27 2011
From: bernard.traversat at oracle.com (bernard.traversat at oracle.com)
Date: Tue, 16 Aug 2011 06:48:27 -0700 (PDT)
Subject: Auto Reply: JRuby + invokedynamic on IKVM.NET
Message-ID: <84d1387d-ab3f-486b-887a-c5fc039eebbd@default>

This is an auto-replied message. I am out of the office until Aug 22nd with limited access to email and phone.

From forax at univ-mlv.fr Tue Aug 16 06:57:58 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Tue, 16 Aug 2011 15:57:58 +0200
Subject: JRuby + invokedynamic on IKVM.NET
In-Reply-To:
References:
Message-ID: <4E4A7766.1010807@univ-mlv.fr>

On 08/16/2011 03:46 PM, Jeroen Frijters wrote:
> Hi everyone,

Hi Jeroen,

>
> I "finished" the initial JSR292 implementation. I haven't done any performance work and it shows:
>
> ikvm -Djruby.compile.invokedynamic=false -jar jruby-complete-1.7.0.dev.jar bench\bench_fib_complex.rb
> normal fib
> 832040
> 0.286000 0.000000 0.286000 ( 0.286000)
> fib with constants
> 832040
> 0.640000 0.000000 0.640000 ( 0.640000)
> fib with additional calls
> 832040
> 0.540000 0.000000 0.540000 ( 0.541000)
> fib with constants and additional calls
> 832040
> 0.832000 0.000000 0.832000 ( 0.832000)
>
>
> ikvm -Djruby.compile.invokedynamic=true -jar jruby-complete-1.7.0.dev.jar bench\bench_fib_complex.rb
> normal fib
> 832040
> 1.267000 0.000000 1.267000 ( 1.267000)
> fib with constants
> 832040
> 1.960000 0.000000 1.960000 ( 1.959000)
> fib with additional calls
> 832040
> 3.658000 0.000000 3.658000 ( 3.659000)
> fib with constants and additional calls
> 832040
> 3.696000 0.000000 3.696000 ( 3.696000)
>
>
> Still, pretty cool that it works now.

Not only cool but impressive!
Have you changed the argument of fib? The result is different from the one posted by Charles.

>
> Regards,
> Jeroen

Regards,
Rémi

From jeroen at sumatra.nl Tue Aug 16 07:12:35 2011
From: jeroen at sumatra.nl (Jeroen Frijters)
Date: Tue, 16 Aug 2011 14:12:35 +0000
Subject: JRuby + invokedynamic on IKVM.NET
In-Reply-To: <4E4A7766.1010807@univ-mlv.fr>
References: <4E4A7766.1010807@univ-mlv.fr>
Message-ID:

Hi Rémi,

> Have you changed the argument of fib? The result is different from
> the one posted by Charles.

I didn't change anything and I get the same result with JDK 7, so something else must have changed.

Regards,
Jeroen

From christian.thalinger at oracle.com Tue Aug 16 07:41:24 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 16 Aug 2011 16:41:24 +0200
Subject: JRuby + invokedynamic on IKVM.NET
In-Reply-To:
References: <4E4A7766.1010807@univ-mlv.fr>
Message-ID: <71BC20BE-D782-4093-B825-1A1A58CC7673@oracle.com>

On Aug 16, 2011, at 4:12 PM, Jeroen Frijters wrote:
> Hi Rémi,
>
>> Have you changed the argument of fib? The result is different from
>> the one posted by Charles.
>
> I didn't change anything and I get the same result with JDK 7, so something else must have changed.

We usually use 5 35 as arguments to increase runtime in order to get more meaningful numbers.

-- Christian

>
> Regards,
> Jeroen
>
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

From christian.thalinger at oracle.com Tue Aug 16 07:41:43 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 16 Aug 2011 16:41:43 +0200
Subject: JRuby + invokedynamic on IKVM.NET
In-Reply-To:
References:
Message-ID:

Very cool!

-- Christian

On Aug 16, 2011, at 3:46 PM, Jeroen Frijters wrote:
> Hi everyone,
>
> I "finished" the initial JSR292 implementation.
I haven't done any performance work and it shows: > > ikvm -Djruby.compile.invokedynamic=false -jar jruby-complete-1.7.0.dev.jar bench\bench_fib_complex.rb > normal fib > 832040 > 0.286000 0.000000 0.286000 ( 0.286000) > fib with constants > 832040 > 0.640000 0.000000 0.640000 ( 0.640000) > fib with additional calls > 832040 > 0.540000 0.000000 0.540000 ( 0.541000) > fib with constants and additional calls > 832040 > 0.832000 0.000000 0.832000 ( 0.832000) > > > ikvm -Djruby.compile.invokedynamic=true -jar jruby-complete-1.7.0.dev.jar bench\bench_fib_complex.rb > normal fib > 832040 > 1.267000 0.000000 1.267000 ( 1.267000) > fib with constants > 832040 > 1.960000 0.000000 1.960000 ( 1.959000) > fib with additional calls > 832040 > 3.658000 0.000000 3.658000 ( 3.659000) > fib with constants and additional calls > 832040 > 3.696000 0.000000 3.696000 ( 3.696000) > > > Still, pretty cool that it works now. > > Regards, > Jeroen > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From bernard.traversat at oracle.com Tue Aug 16 07:48:10 2011 From: bernard.traversat at oracle.com (bernard.traversat at oracle.com) Date: Tue, 16 Aug 2011 07:48:10 -0700 (PDT) Subject: Auto Reply: Re: Fwd: [asm] toward ASM 4.0 Message-ID: <952f0b4d-8fa1-4856-a741-64a799d5fd47@default> This is an auto-replied message. I am out of the office until Aug 22nd with limited access to email and phone. From headius at headius.com Tue Aug 16 10:48:37 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Tue, 16 Aug 2011 12:48:37 -0500 Subject: JRuby + invokedynamic on IKVM.NET In-Reply-To: References: Message-ID: On Tue, Aug 16, 2011 at 8:46 AM, Jeroen Frijters wrote: > Hi everyone, > > I "finished" the initial JSR292 implementation. I haven't done any performance work and it shows: Oh very nice :) Only took about a month (since JVMLS) for you to have a working 292 release? 
The base numbers aren't bad, but I'd be interested in helping look for perf wins with or without invokedynamic. I'd also be interested in comparative numbers with IronRuby...not because I'm a jerk and want to run faster, but because IronRuby has the DLR helping it out and I'd like to know how much it helps these days. Very cool either way. Looking forward to seeing your optimization work :) - Charlie From jeroen at sumatra.nl Wed Aug 17 00:03:15 2011 From: jeroen at sumatra.nl (Jeroen Frijters) Date: Wed, 17 Aug 2011 07:03:15 +0000 Subject: JRuby + invokedynamic on IKVM.NET In-Reply-To: References: Message-ID: Charles Oliver Nutter wrote: > On Tue, Aug 16, 2011 at 8:46 AM, Jeroen Frijters > wrote: > > Hi everyone, > > > > I "finished" the initial JSR292 implementation. I haven't done any > performance work and it shows: > > Oh very nice :) Only took about a month (since JVMLS) for you to have a > working 292 release? Yeah, I worked about three weeks full time on it. > The base numbers aren't bad, but I'd be interested in helping look for > perf wins with or without invokedynamic. I'd also be interested in > comparative numbers with IronRuby...not because I'm a jerk and want to > run faster, but because IronRuby has the DLR helping it out and I'd like > to know how much it helps these days. If you want to play with it, you can download the binaries from http://weblog.ikvm.net. I haven't done any JRuby on Mono testing though. > Very cool either way. Looking forward to seeing your optimization work > :) I'm probably not going to be working on optimizations right away, because I first want to finish the rest of the OpenJDK 7 functionality. 
Regards,
Jeroen

From headius at headius.com Wed Aug 17 00:22:58 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Wed, 17 Aug 2011 02:22:58 -0500
Subject: Push invalidation + bimorphic fix
In-Reply-To: <74FB8226-951A-4B3A-9C9F-2CCFF3822771@oracle.com>
References: <74FB8226-951A-4B3A-9C9F-2CCFF3822771@oracle.com>
Message-ID:

On Tue, Aug 9, 2011 at 2:43 PM, Tom Rodriguez wrote:
>> Same numbers. Is there some other patch you have applied locally?
>> What's the best way for me to investigate?
>
> Can you collect PrintCompilation/PrintInlining output for each of these?

I shall, if I can reproduce it again. I've moved on to Christian's combined patch (which includes yours) and performance now exceeds all previous attempts.

> One thing I've seen with the frequency fix is that sometimes jruby produces GWTs with the direction reversed from what I expect, so that the invokeFallback path ends up being considered the frequent path. This can cause us not to inline the fast paths in these cases. I think we're going to have to add per GWT path profiling sooner rather than later.

I have made some improvements recently that might have fixed that.

JRuby originally invalidated call sites if the associated class's serial number changed. However, this meant that if, after code started executing, a few methods were added high in the class hierarchy (like at Object), every call site already encountered would invalidate and potentially start failing over to slow paths unnecessarily. In addition, when an interpreted method was jitted, the associated class's serial number was also updated, so normal execution could cause all sites to invalidate again.

I changed the logic as follows:

* If we encounter a new type at a call site and have not exceeded our GWT cascade limit, we add it to the chain.
* If we exceed the limit of GWT chaining, wipe out the site and switch it permanently to an inline cache
* If we encounter a type we've seen before but the serial number has changed, wipe out the site and start over again. This assumes that most class hierarchy modifications happen early, and if we're still seeing them we should consider the sites as fluxxy.

Now it's also possible that what you're seeing is the "normal" PIC-like chaining of GWTs. If a call site first encounters type A, and later type B, JRuby will add a new GWT for B that falls back on the GWT for A. If A then turns out to be more common, the fallback path will fire more often. I think this will be a common pattern for dynamic languages on the JVM (and potentially for static languages that have peculiar type structures, like Scala's structural typing), so being able to promote *either* branch of any particular GWT will probably be necessary.

- Charlie

From jlaskey at me.com Wed Aug 17 05:01:13 2011
From: jlaskey at me.com (Jim Laskey)
Date: Wed, 17 Aug 2011 09:01:13 -0300
Subject: Mac Build java-1.7.0-internal-mlvm-2011_08_08 crash
Message-ID: <91820C2D-304C-4372-A808-EFCE2F0B5EE9@me.com>

# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/Users/stephen/dev/java/src/mlvm/sources/hotspot/src/share/vm/runtime/frame.cpp:1158), pid=1021, tid=4410376192
# Error: ShouldNotReachHere()

Just getting back from vacation, I switched to the 2011_08_08 build from 2011_06_09 and ran into this crasher.

From bernard.traversat at oracle.com Wed Aug 17 05:02:57 2011
From: bernard.traversat at oracle.com (bernard.traversat at oracle.com)
Date: Wed, 17 Aug 2011 05:02:57 -0700 (PDT)
Subject: Auto Reply: Mac Build java-1.7.0-internal-mlvm-2011_08_08 crash
Message-ID:

This is an auto-replied message. I am out of the office until Aug 22nd with limited access to email and phone.
From christian.thalinger at oracle.com Wed Aug 17 05:07:32 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 17 Aug 2011 14:07:32 +0200 Subject: Mac Build java-1.7.0-internal-mlvm-2011_08_08 crash In-Reply-To: <91820C2D-304C-4372-A808-EFCE2F0B5EE9@me.com> References: <91820C2D-304C-4372-A808-EFCE2F0B5EE9@me.com> Message-ID: On Aug 17, 2011, at 2:01 PM, Jim Laskey wrote: > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/Users/stephen/dev/java/src/mlvm/sources/hotspot/src/share/vm/runtime/frame.cpp:1158), pid=1021, tid=4410376192 > # Error: ShouldNotReachHere() Do you have something to reproduce this one? -- Christian > > Just getting back from vacation and switched to the 2011_08_08 build from 2011_06_09. and ran into this crasher. > > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From jlaskey at me.com Wed Aug 17 05:38:30 2011 From: jlaskey at me.com (Jim Laskey) Date: Wed, 17 Aug 2011 09:38:30 -0300 Subject: Mac Build java-1.7.0-internal-mlvm-2011_08_08 crash In-Reply-To: References: <91820C2D-304C-4372-A808-EFCE2F0B5EE9@me.com> Message-ID: With 292, nothing is simple. If you pull the latest Nashorn and run - you should see it (make sure JDK7 platform is set to java-1.7.0-internal-mlvm-2011_08_08.) It might take a day or two to come up with an isolated example. Cheers, -- Jim On 2011-08-17, at 9:07 AM, Christian Thalinger wrote: > > On Aug 17, 2011, at 2:01 PM, Jim Laskey wrote: > >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/Users/stephen/dev/java/src/mlvm/sources/hotspot/src/share/vm/runtime/frame.cpp:1158), pid=1021, tid=4410376192 >> # Error: ShouldNotReachHere() > > Do you have something to reproduce this one? > > -- Christian > >> >> Just getting back from vacation and switched to the 2011_08_08 build from 2011_06_09. 
and ran into this crasher. >> >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From mroos at roos.com Wed Aug 17 09:06:59 2011 From: mroos at roos.com (Mark Roos) Date: Wed, 17 Aug 2011 09:06:59 -0700 Subject: Push invalidation + bimorphic fix In-Reply-To: References: <74FB8226-951A-4B3A-9C9F-2CCFF3822771@oracle.com> Message-ID: >From Charlie * If we encounter a new type at a call site and have not exceeded our GWT cascade limit, we add it to the chain. * If we exceed the limit of GWT chaining, wipe out the site and switch it permanently to an inline cache Charlie, what is the depth limit you set and how did you arrive at it? When you say 'wipe out the site' do you mean just replace its target or do you rewrite the method bytecodes with a new callsite type? or??? thanks mark -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110817/4b939594/attachment.html From headius at headius.com Wed Aug 17 09:16:42 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Wed, 17 Aug 2011 11:16:42 -0500 Subject: Push invalidation + bimorphic fix In-Reply-To: References: <74FB8226-951A-4B3A-9C9F-2CCFF3822771@oracle.com> Message-ID: <7580C95F-7495-4865-92EA-5662FCE4B24C@headius.com> For depth I am using 5 as the limit, but it is just a rough guess based on empirical comparisons of chained GWT versus our inline cache. I need to reevaluate with recent patches in place. By "wipe out" I mean clear the call site's target and start over with a new GWT chain. 
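
For readers following along, here is a minimal Java sketch of the first two rules and of the "wipe out" step. It is not JRuby's code: the class name GwtChainSite, the describe target, and the counters are hypothetical, the limit of 5 is only the rough guess mentioned above, and the permanent inline cache is reduced to a plain slow path. The serial-number rule is not modeled.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical call site that grows a guardWithTest (GWT) chain per receiver class.
final class GwtChainSite extends MutableCallSite {
    static final int MAX_DEPTH = 5; // rough limit quoted in the thread

    private static final MethodType SITE_TYPE =
            MethodType.methodType(String.class, Object.class);
    private static final MethodHandle CLASS_EQ;
    private static final MethodHandle FALLBACK;
    private static final MethodHandle DESCRIBE;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            CLASS_EQ = lookup.findStatic(GwtChainSite.class, "classEq",
                    MethodType.methodType(boolean.class, Class.class, Object.class));
            FALLBACK = lookup.findVirtual(GwtChainSite.class, "fallback", SITE_TYPE);
            DESCRIBE = lookup.findStatic(GwtChainSite.class, "describe", SITE_TYPE);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int depth;            // GWTs currently chained at this site
    int slowCalls;        // how often the fallback path ran
    boolean megamorphic;  // set once the chain limit was exceeded
    private final MethodHandle invoker;

    GwtChainSite() {
        super(SITE_TYPE);
        setTarget(FALLBACK.bindTo(this));
        invoker = dynamicInvoker();
    }

    static boolean classEq(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    // Stand-in for a real method lookup: every class gets the same target.
    static String describe(Object receiver) {
        return receiver.getClass().getSimpleName();
    }

    // Slow path: runs when no guard in the chain matched the receiver's class.
    String fallback(Object receiver) throws Throwable {
        slowCalls++;
        if (!megamorphic) {
            if (depth < MAX_DEPTH) {
                // New type, limit not exceeded: put a guard in front of the old chain.
                MethodHandle test = CLASS_EQ.bindTo(receiver.getClass());
                setTarget(MethodHandles.guardWithTest(test, DESCRIBE, getTarget()));
                depth++;
            } else {
                // Limit exceeded: wipe out the chain and stay on the slow path.
                megamorphic = true;
                depth = 0;
                setTarget(FALLBACK.bindTo(this));
            }
        }
        return (String) DESCRIBE.invokeExact(receiver);
    }

    // Convenience wrapper so callers need not declare Throwable.
    String call(Object receiver) {
        try {
            return (String) invoker.invokeExact(receiver);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        GwtChainSite site = new GwtChainSite();
        Object[] receivers = {"a", "b", 1, 2L, 3.0, 'c', true, "d"};
        for (Object r : receivers) {
            System.out.println(site.call(r) + " depth=" + site.depth
                    + " slowCalls=" + site.slowCalls);
        }
    }
}
```

Note that each new guard is placed in front of the existing target, so the most recently seen type is tested first and older types sit further down the fallback side. That is the PIC-like ordering described earlier in the thread, and it is why a frequently seen early type can end up on the "fallback" branch.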
- Charlie (mobile) On Aug 17, 2011, at 11:06, Mark Roos wrote: > From Charlie > > * If we encounter a new type at a call site and have not exceeded our > GWT cascade limit, we add it to the chain. > * If we exceed the limit of GWT chaining, wipe out the site and switch > it permanently to an inline cache > > Charlie, what is the depth limit you set and how did you arrive at it? > > When you say 'wipe out the site' do you mean just replace its target or do you > rewrite the method bytecodes with a new callsite type? or??? > > thanks > mark > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110817/d5955fbc/attachment.html From duncan.macgregor at ge.com Wed Aug 17 11:12:52 2011 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy)) Date: Wed, 17 Aug 2011 20:12:52 +0200 Subject: Fwd: [asm] toward ASM 4.0 In-Reply-To: References: <4E43F910.50104@free.fr> <4E441D9C.5060703@univ-mlv.fr> Message-ID: <1DCC266105BEF8478FFB15F4BE468EC705D08642@BUDMLVEM09.e2k.ad.ge.com> Since Remi hasn't replied (or it hasn't reached my mailbox if he has). Yes, MethodType is now merged with Type, and MethodHandle is now Handle, which makes everything much more readable. You will however have to make sure you've changed everything from org.objectweb to org.ow2. One problem I have found is that in changing ClassWriter.getCommonSuperClass() to not call initialisation code they've limited it to classes loaded by the bootstrap ClassLoader. I'll try and talk with them about this on the ASM list just as soon as I can get it to send me a subscription mail. 
Although in our case it was easy to fix the problem by specifying the system ClassLoader I'm not sure if it should be that or ClassWriter's ClassLoader (they're unlikely to be different, but could be in theory). Regards, Duncan. From: mlvm-dev-bounces at openjdk.java.net [mailto:mlvm-dev-bounces at openjdk.java.net] On Behalf Of Mark Roos Sent: 11 August 2011 22:39 To: Da Vinci Machine Project Subject: Re: Fwd: [asm] toward ASM 4.0 R?mi, A message from our glorious ASM leader. I seem to recall some discussion on the mentioned class names ( MethodType and MethodHandle). I assume this means that asm no longer has these types? And thus I no longer need the full class names for the java.lang versions? thanks mark -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110817/a3078d2a/attachment-0001.html From bernard.traversat at oracle.com Wed Aug 17 11:13:52 2011 From: bernard.traversat at oracle.com (bernard.traversat at oracle.com) Date: Wed, 17 Aug 2011 11:13:52 -0700 (PDT) Subject: Auto Reply: RE: Fwd: [asm] toward ASM 4.0 Message-ID: <9f4d0fa5-f649-42f8-87bf-c72b55a93a2e@default> This is an auto-replied message. I am out of the office until Aug 22nd with limited access to email and phone. From stephen.bannasch at deanbrook.org Wed Aug 17 11:56:27 2011 From: stephen.bannasch at deanbrook.org (Stephen Bannasch) Date: Wed, 17 Aug 2011 14:56:27 -0400 Subject: are there changes for building mlvm now that 1.7 is released? Message-ID: I'm basing my mlvm builds on clones of the following two mercurial forests: hg fclone http://hg.openjdk.java.net/bsd-port/bsd-port sources hg fclone http://hg.openjdk.java.net/mlvm/mlvm patches Complete build scripts here: https://gist.github.com/243072 Are there changes I should make since Java 1.7 has been released? I know work is in progress to integrate the bsd and macosx ports into the mainline source ... 
but I don't expect this to be complete for some time.

From bernard.traversat at oracle.com Wed Aug 17 11:57:53 2011
From: bernard.traversat at oracle.com (bernard.traversat at oracle.com)
Date: Wed, 17 Aug 2011 11:57:53 -0700 (PDT)
Subject: Auto Reply: are there changes for building mlvm now that 1.7 is released?
Message-ID: <02f5c44b-7385-4e53-bca6-f8df8a3c37d0@default>

This is an auto-replied message. I am out of the office until Aug 22nd with limited access to email and phone.

From john.r.rose at oracle.com Wed Aug 17 11:59:44 2011
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 17 Aug 2011 11:59:44 -0700
Subject: Performance problem with invoke dynamic
In-Reply-To:
References: <4E30AB57.1030702@gmail.com> <4E3189F4.4000201@gmail.com> <0B56009D-F840-4BE5-BB96-5A51E7995605@oracle.com> <4E319C93.3010306@gmail.com> <48E62F5A-237F-45E0-8FD3-3A89F68218CA@oracle.com> <617346F7-990D-42FF-922C-98C654E62C5D@oracle.com>
Message-ID: <679117DF-B87C-4874-B3D1-450653E89E26@oracle.com>

On Aug 12, 2011, at 9:49 PM, Charles Oliver Nutter wrote:

> On the other side of things...I hope those languages that will churn
> through method handles also realize they're unlikely to ever JIT...

Getting native code out of the MH-construction process brings them much closer to optimizability.

Twenty-year-old systems with "method handles" (i.e., lambdas) are very good at compiling them. If you have a curried function (one which requires something like MH.bindTo on every call) you can optimize it after inlining the full call sequence and then using escape analysis to scalarize the intermediate MHs.

That's what we have to do, if we are going to support Lambdas which are invoked in the curried style. (I can't see how we wouldn't support them.) Happily, JVMs already do EA, so our job is primarily a matter of making MHs scrutable to that algorithm.

-- John
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110817/9166462e/attachment.html

From bernard.traversat at oracle.com Wed Aug 17 12:01:54 2011
From: bernard.traversat at oracle.com (bernard.traversat at oracle.com)
Date: Wed, 17 Aug 2011 12:01:54 -0700 (PDT)
Subject: Auto Reply: Re: Performance problem with invoke dynamic
Message-ID: <373b35ff-836f-42e5-8354-42514c24fc3f@default>

This is an auto-replied message. I am out of the office until Aug 22nd with limited access to email and phone.

From john.r.rose at oracle.com Wed Aug 17 12:25:52 2011
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 17 Aug 2011 12:25:52 -0700
Subject: Performance problem with invoke dynamic
In-Reply-To: <4E30AB57.1030702@gmail.com>
References: <4E30AB57.1030702@gmail.com>
Message-ID:

On Jul 27, 2011, at 5:20 PM, Ola Bini wrote:

> I tried switching out asType to explicitCastArguments. That ended up
> being about 5% slower.

Sorting out asType and eCA probably needs a Cookbook entry or two. I'll post those separately.

The main differences between asType and eCA are:
- eCA treats interfaces interchangeably with simple untyped Object refs
- for primitives, eCA performs narrowing in addition to asType's widening
- for primitives, eCA also interconverts between booleans and numbers
- for primitives, eCA also unboxes nulls to zeroes (instead of asType's NPE)

In your case, eCA should not be much different from asType, so I'm surprised that it is slower. We'll have to watch that.

But (as others have pointed out) I think your bottleneck is native calls in MH construction. Ick. Going away.

-- John
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110817/2333429e/attachment.html

From john.r.rose at oracle.com Wed Aug 17 12:47:25 2011
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 17 Aug 2011 12:47:25 -0700
Subject: recipes for asType and explicitCastArguments
Message-ID:

## Using an interface as a marker for "dynamically typed reference"
1. Take one fresh interface D. Remove all methods.
2. Sprinkle D freely in invokedynamic signatures, wherever a static type "dynamic" is desired.
3. When binding a MH to such an interface, retype MH in two steps, an asType and an eCA.
3a. The asType retypes to the invokedynamic type, but with D replaced by Object.
3b. The eCA retypes Object references (with no runtime test) to D, as needed.
4. Mix and serve.

## Avoiding primitive conversions (a household hint)
1. Take a program which unboxes from references and primitives
1a. Observe that both asType and eCA potentially perform primitive conversions
1b. Observe that these primitive conversions are driven by the runtime type of the box
2. Decide whether your application doesn't need primitive conversions, just pure single-type unboxing.
2a. If it needs primitive conversions along with unboxing, find a different recipe.
3. Retype a MH which has primitives and requires unboxing in two steps.
3a. First, retype primitive to exact wrapper types (like Integer), to nail down the runtime type of each box.
3b. Then widen the exact wrapper types to the required type (e.g., Object, Number, or some interface).

## Make a statically typed method handle have a dynamically typed signature (ready in 1 minute)
1. Take an arbitrary method handle mh.
2. Incant: mh = mh.asType(mh.type().generic())
3. (optional) Truss the arguments so they will invoke evenly: mh = mh.asSpreader(Object[].class, mh.type().parameterCount())

## Mix your conversions days ahead of the party
1. Decide ahead of time on a target MH type T1 and a desired type T0.
2. Insert a new leading parameter of MethodHandle.class, into both types. (Call them T0' and T1'.)
3. Obtain a plain invoker from MHs.exactInvoker.
3a. You can use the inexact kind too, for those extra degrees of freedom at invocation time.
4. Perform retyping on the invoker, as needed, from T1' back to T0'. Be creative!
4a. Leave the leading MH argument unchanged.
4b. Name your invoker I'. Store and chill.
5. When it's time to invoke an actual MH (of exact type T1 but using type T0), use an invokeExact of I' instead of the MH itself.
5a. If you are binding the MH to an invokedynamic site (of type T0), simply use I'.bindTo(mh)
5b. If the eventual MH is not exactly compatible with T1, use the inexact form of invoker to build I'.
6. Invoke as needed. Serves any number.
7. Note that this recipe can be used to pre-combine any transformations, not just asType and explicitCastArguments.

From mroos at roos.com Wed Aug 17 13:46:34 2011
From: mroos at roos.com (Mark Roos)
Date: Wed, 17 Aug 2011 13:46:34 -0700
Subject: Fwd: [asm] toward ASM 4.0
In-Reply-To: <1DCC266105BEF8478FFB15F4BE468EC705D08642@BUDMLVEM09.e2k.ad.ge.com>
References: <4E43F910.50104@free.fr> <4E441D9C.5060703@univ-mlv.fr> <1DCC266105BEF8478FFB15F4BE468EC705D08642@BUDMLVEM09.e2k.ad.ge.com>
Message-ID:

Thanks Duncan,

you mentioned

    One problem I have found is that in changing ClassWriter.getCommonSuperClass() to not call initialisation code they've limited it to classes loaded by the bootstrap ClassLoader.

How did this show up in your code? Did you have an explicit call or was it incidental to just creating classes? I think I am ok for now as I switched to using the standard class loader, but who knows where we will end up.

thanks
mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110817/6d0731be/attachment.html

From forax at univ-mlv.fr Wed Aug 17 14:03:06 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Wed, 17 Aug 2011 23:03:06 +0200
Subject: Fwd: [asm] toward ASM 4.0
In-Reply-To: <1DCC266105BEF8478FFB15F4BE468EC705D08642@BUDMLVEM09.e2k.ad.ge.com>
References: <4E43F910.50104@free.fr> <4E441D9C.5060703@univ-mlv.fr> <1DCC266105BEF8478FFB15F4BE468EC705D08642@BUDMLVEM09.e2k.ad.ge.com>
Message-ID: <4E4C2C8A.8070507@univ-mlv.fr>

On 08/17/2011 08:12 PM, MacGregor, Duncan (GE Energy) wrote:
>
> Since Remi hasn't replied (or it hasn't reached my mailbox if he has).
> Yes, MethodType is now merged with Type, and MethodHandle is now
> Handle, which makes everything much more readable. You will however
> have to make sure you've changed everything from org.objectweb to org.ow2.
>
> One problem I have found is that in changing
> ClassWriter.getCommonSuperClass() to not call initialisation code
> they've limited it to classes loaded by the bootstrap ClassLoader.
> I'll try and talk with them about this on the ASM list just as soon as
> I can get it to send me a subscription mail. Although in our case it
> was easy to fix the problem by specifying the system ClassLoader I'm
> not sure if it should be that or ClassWriter's ClassLoader (they're
> unlikely to be different, but could be in theory).
>

Hi Duncan,
You can talk to me :)
I'm the one responsible for this change; I will try to fix it.

> Regards, Duncan.
>

cheers,
Rémi

> *From:*mlvm-dev-bounces at openjdk.java.net
> [mailto:mlvm-dev-bounces at openjdk.java.net] *On Behalf Of *Mark Roos
> *Sent:* 11 August 2011 22:39
> *To:* Da Vinci Machine Project
> *Subject:* Re: Fwd: [asm] toward ASM 4.0
>
> Rémi, A message from our glorious ASM leader.
>
> I seem to recall some discussion on the mentioned class names (
> MethodType and MethodHandle).
>
> I assume this means that asm no longer has these types? And thus I no
> longer need the full class names
> for the java.lang versions?
>
> thanks
> mark
>
>
>
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110817/4520cab1/attachment-0001.html

From forax at univ-mlv.fr Wed Aug 17 14:12:18 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Wed, 17 Aug 2011 23:12:18 +0200
Subject: recipes for asType and explicitCastArguments
In-Reply-To:
References:
Message-ID: <4E4C2EB2.8070603@univ-mlv.fr>

On 08/17/2011 09:47 PM, John Rose wrote:
> ## Using an interface as a marker for "dynamically typed reference"
> 1. Take one fresh interface D. Remove all methods.
> 2. Sprinkle D freely in invokedynamic signatures, wherever a static type "dynamic" is desired.
> 3. When binding a MH to such an interface, retype MH in two steps, an asType and an eCA.
> 3a. The asType retypes to the invokedynamic type, but with D replaced by Object.
> 3b. The eCA retypes Object references (with no runtime test) to D, as needed.
> 4. Mix and serve.

This one is really nice.

>
> ## Avoiding primitive conversions (a household hint)
> 1. Take a program which unboxes from references and primitives
> 1a. Observe that both asType and eCA potentially perform primitive conversions
> 1b. Observe that these primitive conversions are driven by the runtime type of the box
> 2. Decide whether your application doesn't need primitive conversions, just pure single-type unboxing.
> 2a. If it needs primitive conversions along with unboxing, find a different recipe.
> 3. Retype a MH which has primitives and requires unboxing in two steps.
> 3a. First, retype primitive to exact wrapper types (like Integer), to nail down the runtime type of each box.
> 3b.
Then widen the exact wrapper types to the required type (e.g., Object, Number, or some interface). This one is also explained in my presentation at last JVM Summit. > > ## Make a statically typed method handle have a dynamically typed signature (ready in 1 minute) > 1. Take an arbitrary method handle mh. > 2. Incant: mh = mh.asType(mh.type().generic()) > 3. (optional) Truss the arguments so they will invoke evenly: mh = mh.asSpreader(Object[].class, mh.type().parameterCount()) We have invokeWithArguments, perhaps this recipe is only for plumbers like you. > > ## Mix your conversions days ahead of the party > 1. Decide ahead of time on a target MH type T1 and a desired type T0. > 2. Insert a new leading parameter of MethodHandle.class, into both types. (Call them T0' and T1'.) > 3. Obtain a plain invoker from MHs.exactInvoker. > 3a. You can use the inexact kind too, for those extra degrees of freedom at invocation time. > 4. Perform retyping on the invoker, as needed, from T1' back to T0'. Be creative! > 4a. Leave the leading MH argument unchanged. > 4b. Name your invoker I'. Store and chill. > 5. When it's time to invoke an actual MH (of exact type T1 but using type T0), use an invokeExact of I' instead of the MH itself. > 5a. If you are binding the MH to an invokedynamic site (of type T0), simply use I'.bindTo(mh) > 5b. If the eventual MH is not exactly compatible with T1, use the inexact form of invoker to build I'. > 6. Invoke as needed. Serves any number. > 7. Note that this recipe can be used to pre-combine any transformations, not just asType and explicitCastArguments. I'm looking for a use case for this one. Perhaps if a language (even statically typed) use a dedicated list/array like data structure for doing varargs in a lambda and you want to simplify your call site. 
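
To make two of the quoted recipes concrete, here is a small Java sketch of "Make a statically typed method handle have a dynamically typed signature" and "Mix your conversions days ahead of the party", plus the asType versus explicitCastArguments difference for null boxes noted elsewhere in this thread. The class RetypingRecipes, the (int)int target twice, and the call helper are hypothetical names made up for illustration; only the java.lang.invoke calls themselves come from the JDK API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class RetypingRecipes {
    // Hypothetical statically typed target shared by the examples: (int)int.
    static int twice(int x) {
        return 2 * x;
    }

    static MethodHandle twiceHandle() {
        try {
            return MethodHandles.lookup().findStatic(RetypingRecipes.class, "twice",
                    MethodType.methodType(int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // "Dynamically typed signature", step 2: (int)int becomes (Object)Object.
    static MethodHandle generic() {
        MethodHandle mh = twiceHandle();
        return mh.asType(mh.type().generic());
    }

    // Optional step 3: (Object)Object becomes (Object[])Object.
    static MethodHandle spread() {
        MethodHandle mh = generic();
        return mh.asSpreader(Object[].class, mh.type().parameterCount());
    }

    // Same retyping through explicitCastArguments, which unboxes null to zero.
    static MethodHandle genericViaEca() {
        MethodHandle mh = twiceHandle();
        return MethodHandles.explicitCastArguments(mh, mh.type().generic());
    }

    // "Mix your conversions ahead": build I' once, with T1 = (int)int and T0 = (Object)Object.
    static MethodHandle premixedInvoker() {
        MethodType t1 = MethodType.methodType(int.class, int.class);
        MethodType t0Prime = t1.generic().insertParameterTypes(0, MethodHandle.class);
        // exactInvoker(t1) has type T1' = (MethodHandle,int)int; retype it to T0'.
        return MethodHandles.exactInvoker(t1).asType(t0Prime);
    }

    // Helper for the demo: inexact invoke with a single argument.
    static Object call(MethodHandle mh, Object arg) {
        try {
            return mh.invoke(arg);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(generic(), 21));
        System.out.println(call(spread(), new Object[] {21}));
        System.out.println(call(genericViaEca(), null));
        // Step 5a of the last recipe: bind the actual MH into the pre-mixed invoker.
        System.out.println(call(premixedInvoker().bindTo(twiceHandle()), 21));
    }
}
```

The pre-mixed invoker is built without knowing which handle it will eventually call, so the asType work is done once and then reused for any number of (int)int handles through bindTo. For one-off calls with an argument array, invokeWithArguments is the simpler route mentioned above.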
R?mi From john.r.rose at oracle.com Wed Aug 17 15:27:43 2011 From: john.r.rose at oracle.com (John Rose) Date: Wed, 17 Aug 2011 15:27:43 -0700 Subject: recipes for asType and explicitCastArguments In-Reply-To: <4E4C2EB2.8070603@univ-mlv.fr> References: <4E4C2EB2.8070603@univ-mlv.fr> Message-ID: <9F7CA008-CB26-445C-8555-A931FEF06A5F@oracle.com> On Aug 17, 2011, at 2:12 PM, R?mi Forax wrote: > perhaps this recipe is only for plumbers like you. S'il vous pla?t! Je suis un chef! Un artiste! :-) (Sounds better in French, courtesy of translate.google.com .) -- John -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110817/fef891e2/attachment.html From mroos at roos.com Wed Aug 17 17:47:02 2011 From: mroos at roos.com (Mark Roos) Date: Wed, 17 Aug 2011 17:47:02 -0700 Subject: No Class Def Found again Message-ID: I am getting this error again but its not repeatable. Can you give me any idea of how I might stress it to make a test case that happens all of the time? Or is it nothing to worry about? This is the stack trace. This sequence is the main event handler in my user io stream so at this point it had run successfully multiple times. It also repeats now the same for any event I send. But if I restart it could go quite awhile before it fails. thanks mark Using the OSX port jvm from google code, http://code.google.com/p/openjdk-osx-build/ will try windows as well. OpenJDK 64-Bit Server VM (build 21.0-b17, mixed mode) I believe its based on b147. 
java.lang.NoClassDefFoundError: ri/core/rtalk/RtObject at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at rtPbc.26.invoke(String riAsArrayOfSubstringsSepBy::11) at rtPbc.25.invoke(RtSystemMessage convertParameterToMap::7) at rtPbc.3.invoke(RtSystemMessage initializeFromBytes::47) at rtPbc.1.invoke(RiSystemMessageManager class receiveMessage::6) at ri.core.rtalk.RtCallSite.sendSmToRtalk(RtCallSite.java:453) at ri.experiment2.TestSm$1.run(TestSm.java:71) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110817/b303b541/attachment.html From duncan.macgregor at ge.com Thu Aug 18 02:20:56 2011 From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy)) Date: Thu, 18 Aug 2011 11:20:56 +0200 Subject: Fwd: [asm] toward ASM 4.0 In-Reply-To: References: <4E43F910.50104@free.fr><4E441D9C.5060703@univ-mlv.fr> <1DCC266105BEF8478FFB15F4BE468EC705D08642@BUDMLVEM09.e2k.ad.ge.com> Message-ID: <1DCC266105BEF8478FFB15F4BE468EC705D08A88@BUDMLVEM09.e2k.ad.ge.com> It showed up for me in generating classes within compilation tests, which were being loaded by a test runner. I doubt it would show up normally. From: mlvm-dev-bounces at openjdk.java.net [mailto:mlvm-dev-bounces at openjdk.java.net] On Behalf Of Mark Roos Sent: 17 August 2011 21:47 To: Da Vinci Machine Project Subject: RE: Fwd: [asm] toward ASM 4.0 Thanks Duncan, you mentioned One problem I have found is that in changing ClassWriter.getCommonSuperClass() to not call initialisation code they've limited it to classes loaded by the bootstrap ClassLoader. How did this show up in your code? Did you have an explicit call or was it incidental to just creating classes? I think I am ok for now as I switched to using the standard class loader but who know where we will end up. thanks mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110818/6b5c9698/attachment.html

From forax at univ-mlv.fr Thu Aug 18 06:25:06 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Thu, 18 Aug 2011 15:25:06 +0200
Subject: Fwd: [asm] toward ASM 4.0
In-Reply-To: <1DCC266105BEF8478FFB15F4BE468EC705D08A88@BUDMLVEM09.e2k.ad.ge.com>
References: <4E43F910.50104@free.fr><4E441D9C.5060703@univ-mlv.fr> <1DCC266105BEF8478FFB15F4BE468EC705D08642@BUDMLVEM09.e2k.ad.ge.com> <1DCC266105BEF8478FFB15F4BE468EC705D08A88@BUDMLVEM09.e2k.ad.ge.com>
Message-ID: <4E4D12B2.8060906@univ-mlv.fr>

There is a fix available in the trunk of ASM.
Thanks Duncan.

Rémi

On 08/18/2011 11:20 AM, MacGregor, Duncan (GE Energy) wrote:
>
> It showed up for me in generating classes within compilation tests,
> which were being loaded by a test runner. I doubt it would show up
> normally.
>
> *From:*mlvm-dev-bounces at openjdk.java.net
> [mailto:mlvm-dev-bounces at openjdk.java.net] *On Behalf Of *Mark Roos
> *Sent:* 17 August 2011 21:47
> *To:* Da Vinci Machine Project
> *Subject:* RE: Fwd: [asm] toward ASM 4.0
>
> Thanks Duncan,
>
> you mentioned
>
> One problem I have found is that in changing
> ClassWriter.getCommonSuperClass() to not
>
> call initialisation code they've limited it to classes loaded by the
> bootstrap ClassLoader.
>
> How did this show up in your code? Did you have an explicit call or
> was it incidental to just creating classes?
> I think I am ok for now as I switched to using the standard class
> loader but who know where we will end up.
>
> thanks
>
> mark
>
>
>
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110818/ea9acd6d/attachment-0001.html From nerdscentral at googlemail.com Thu Aug 18 07:15:17 2011 From: nerdscentral at googlemail.com (Alexander Turner) Date: Thu, 18 Aug 2011 15:15:17 +0100 Subject: Fwd: [asm] toward ASM 4.0 In-Reply-To: References: <4E43F910.50104@free.fr> <4E441D9C.5060703@univ-mlv.fr> <1DCC266105BEF8478FFB15F4BE468EC705D08642@BUDMLVEM09.e2k.ad.ge.com> Message-ID: Hi, It worries a little if everyone starts going over to using the system loader as this will badly break in systems like Eclipse and app-servers which require using non system loaders. I suspect we need a more general solution to this. Regards - AJ On 17 August 2011 21:46, Mark Roos wrote: > Thanks Duncan, > > you mentioned > > One problem I have found is that in changing > ClassWriter.getCommonSuperClass() to not > > call initialisation code they?ve limited it to classes loaded by the > bootstrap ClassLoader. > > How did this show up in your code? ?Did you have an explicit call or was it > incidental to just creating classes? > I think I am ok for now as I switched to using the standard class loader but > who know where we will end up. > > thanks > > mark > > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > -- Dr Alexander J Turner http://nerds-central.blogspot.com/2008/01/about-me.html From bernard.traversat at oracle.com Thu Aug 18 07:23:48 2011 From: bernard.traversat at oracle.com (bernard.traversat at oracle.com) Date: Thu, 18 Aug 2011 07:23:48 -0700 (PDT) Subject: Auto Reply: Re: Fwd: [asm] toward ASM 4.0 Message-ID: <8594ba91-e780-4fe0-a59c-773e84a7784d@default> This is an auto-replied message. I am out of the office until Aug 22nd with limited access to email and phone. 
From forax at univ-mlv.fr Thu Aug 18 07:36:50 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Thu, 18 Aug 2011 16:36:50 +0200
Subject: Fwd: [asm] toward ASM 4.0
In-Reply-To:
References: <4E43F910.50104@free.fr> <4E441D9C.5060703@univ-mlv.fr> <1DCC266105BEF8478FFB15F4BE468EC705D08642@BUDMLVEM09.e2k.ad.ge.com>
Message-ID: <4E4D2382.5020803@univ-mlv.fr>

Hi Alexander,
the generic solution is to override getCommonSuperClass() to
provide the common superclass from your own type system (runtime or not).
The default implementation uses reflection,
i.e. the type system provided by the VM.

The issue raised by Duncan was just a bug I introduced
while trying to avoid forcing the linking of
the classes involved in this resolution.
I just fucked up with the classloaders.

Rémi

On 08/18/2011 04:15 PM, Alexander Turner wrote:
> Hi,
>
> It worries a little if everyone starts going over to using the system
> loader as this will badly break in systems like Eclipse and
> app-servers which require using non system loaders. I suspect we need
> a more general solution to this.
>
> Regards - AJ
>
> On 17 August 2011 21:46, Mark Roos wrote:
>> Thanks Duncan,
>>
>> you mentioned
>>
>> One problem I have found is that in changing
>> ClassWriter.getCommonSuperClass() to not
>>
>> call initialisation code they've limited it to classes loaded by the
>> bootstrap ClassLoader.
>>
>> How did this show up in your code? Did you have an explicit call or was it
>> incidental to just creating classes?
>> I think I am ok for now as I switched to using the standard class loader but
>> who know where we will end up.
>> >> thanks >> >> mark >> >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> > > From nerdscentral at googlemail.com Thu Aug 18 07:43:42 2011 From: nerdscentral at googlemail.com (Alexander Turner) Date: Thu, 18 Aug 2011 15:43:42 +0100 Subject: Fwd: [asm] toward ASM 4.0 In-Reply-To: <4E4D2382.5020803@univ-mlv.fr> References: <4E43F910.50104@free.fr> <4E441D9C.5060703@univ-mlv.fr> <1DCC266105BEF8478FFB15F4BE468EC705D08642@BUDMLVEM09.e2k.ad.ge.com> <4E4D2382.5020803@univ-mlv.fr> Message-ID: R?mi, Great :) Thanks for clearing that up for me. Kind regards - AJ On 18 August 2011 15:36, R?mi Forax wrote: > Hi Alexander, > the generic solution is to override getCommonSuperClass() to > provide the common super class of your type system (runtime or not). > The default implementation uses the reflection, > i.e the type system provided by the VM. > > The issue raised by Duncan was just a bug introduced > by me when I was trying to avoid to force the linking of > the involved classes used by this resolution. > I just fuck up with the classloaders. > > R?mi > > On 08/18/2011 04:15 PM, Alexander Turner wrote: >> Hi, >> >> It worries a little if everyone starts going over to using the system >> loader as this will badly break in systems like Eclipse and >> app-servers which require using non system loaders. I suspect we need >> a more general solution to this. >> >> Regards - AJ >> >> On 17 August 2011 21:46, Mark Roos ?wrote: >>> Thanks Duncan, >>> >>> you mentioned >>> >>> One problem I have found is that in changing >>> ClassWriter.getCommonSuperClass() to not >>> >>> call initialisation code they?ve limited it to classes loaded by the >>> bootstrap ClassLoader. >>> >>> How did this show up in your code? ?Did you have an explicit call or was it >>> incidental to just creating classes? 
>>> I think I am ok for now as I switched to using the standard class loader but >>> who know where we will end up. >>> >>> thanks >>> >>> mark >>> >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> >>> >> >> > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > -- Dr Alexander J Turner http://nerds-central.blogspot.com/2008/01/about-me.html From bernard.traversat at oracle.com Thu Aug 18 07:46:01 2011 From: bernard.traversat at oracle.com (bernard.traversat at oracle.com) Date: Thu, 18 Aug 2011 07:46:01 -0700 (PDT) Subject: Auto Reply: Re: Fwd: [asm] toward ASM 4.0 Message-ID: This is an auto-replied message. I am out of the office until Aug 22nd with limited access to email and phone. From john.r.rose at oracle.com Thu Aug 18 15:58:14 2011 From: john.r.rose at oracle.com (John Rose) Date: Thu, 18 Aug 2011 15:58:14 -0700 Subject: are there changes for building mlvm now that 1.7 is released? In-Reply-To: References: Message-ID: On Aug 17, 2011, at 11:56 AM, Stephen Bannasch wrote: > I'm basing my mlvm builds on clones of the following two mercurial forests: > > hg fclone http://hg.openjdk.java.net/bsd-port/bsd-port sources > hg fclone http://hg.openjdk.java.net/mlvm/mlvm patches > > Complete build scripts here: https://gist.github.com/243072 > > Are there changes I should make since Java 1.7 has been released? > > I know work is in progress to integrate the bsd and macosx ports into the mainline source ... but I don't expect this to be > complete for some time. I'm open to suggestions on this one. It seems to me that the bsd-port repo. is slowing down prior to absorption into the mainline. In this case, maybe the rational thing is to reparent to a faster-moving repo. 
In particular, I think we should reparent to from bsd-port/bsd-port to hsx/hotspot-comp . That's (approximately) where our future is anyway. Current diffs only in bsd-port (which are necessary for mac builds!) can be posted as a suitably conditional part of the mlvm patch queue. Comments? -- John -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110818/106e507e/attachment.html From john.r.rose at oracle.com Thu Aug 18 16:31:26 2011 From: john.r.rose at oracle.com (John Rose) Date: Thu, 18 Aug 2011 16:31:26 -0700 Subject: naming of internal MH classes Message-ID: <52B65101-C1AE-4FDA-9387-3FD38245F5AE@oracle.com> Here is a heads-up before an engineering code review! The OpenJDK implementation of JSR 292 has a number of private MH subclasses, including DirectMethodHandle, BoundMethodHandle, AdapterMethodHandle, AdapterMethodHandle.AsVarargsCollector. Except for the last, these names date back to the days of prototyping JSR 292. As we tune and refactor the implementation, I am starting to define more subclasses. Now it is time to consider some name changes. Because these can impact the JVM (as it recognizes intrinsics) I would like to keep these changes as low-impact as possible. In particular, I want to switch to something similar to Dan Heidinga's naming convention (in IBM's implementation), where private subclasses have simpler but more specific names like AsTypeHandle. Dan's account of MH type names is on slide 6 of his 2010 talk: http://www.wiki.jvmlangsummit.com/images/a/ad/J9_MethodHandle_Impl.pdf (2010) I don't propose to duplicate this, but rather to use the names as a model. (It is possible that this might ease the creation of a common implementation API with IBM, which would be good, but that's not the immediate goal.) Before I start refactoring and renaming, in the OpenJDK implementation, I want to know if we see any downside to using such a naming scheme in OpenJDK. 
Comments?

-- John

P.S. The immediate occasion for naming and refactoring is the (likely) need for a CountingHandle (or is it CountingAdapterMethodHandle or AdapterMethodHandle$AsCounter?), to assist in profiling tasks. (Cf. the methodDataOop which profiles for bytecodes. We need similar infra. for MH graphs.)

P.P.S. Dan's 2011 talk is here, for the record. It delves into other interesting details of the J9 implementation, including their version of supercombinators:
http://www.wiki.jvmlangsummit.com/images/6/6b/2011_Heidinga.pdf (2011)

From mroos at roos.com Thu Aug 18 20:19:35 2011
From: mroos at roos.com (Mark Roos)
Date: Thu, 18 Aug 2011 20:19:35 -0700
Subject: NoClassDefFoundError, more notes
Message-ID:

So it happens with Windows XP server and OSX server but not XP client.

I enabled the print compilation and the only thing I see that is consistent is the 'made not entrant' on java.lang.Class::getName. This happens the same way each time the error occurs, but it happens after the error is thrown ("queue error" is my catch). Again, this ran 100+ times before the failure started, and after that it fails forever.
108585 337 ri.core.rtalk.RtPrimitives::primObjectIdentEquals (59 bytes) 110268 338 rtPbc.70::block$1 (34 bytes) 111578 339 rtPbc.20::invoke (71 bytes) 117379 340 rtPbc.108::invoke (17 bytes) 127706 341 rtPbc.107::invoke (322 bytes) queue error error with SM send 128276 167 java.lang.Class::getName (21 bytes) made not entrant java.lang.NoClassDefFoundError: ri/core/rtalk/RtObject java.lang.NoClassDefFoundError: ri/core/rtalk/RtObject at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at rtPbc.107.invoke(IndexedCollection replaceFrom:to:with:startingAt::61) at rtPbc.166.invoke(IndexedCollection ,:37) I have not found a way yet to get it to occur from some test only from running lots of different user interactions ( button clicks etc ). Again any suggestions on what I could try to make it occur more often so I can get a test case? thanks mark -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110818/c2e532b3/attachment.html From christian.thalinger at oracle.com Fri Aug 19 01:19:31 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 19 Aug 2011 10:19:31 +0200 Subject: are there changes for building mlvm now that 1.7 is released? In-Reply-To: References: Message-ID: On Aug 19, 2011, at 12:58 AM, John Rose wrote: > On Aug 17, 2011, at 11:56 AM, Stephen Bannasch wrote: > >> I'm basing my mlvm builds on clones of the following two mercurial forests: >> >> hg fclone http://hg.openjdk.java.net/bsd-port/bsd-port sources >> hg fclone http://hg.openjdk.java.net/mlvm/mlvm patches >> >> Complete build scripts here: https://gist.github.com/243072 >> >> Are there changes I should make since Java 1.7 has been released? >> >> I know work is in progress to integrate the bsd and macosx ports into the mainline source ... 
but I don't expect this to be >> complete for some time. > > I'm open to suggestions on this one. > > It seems to me that the bsd-port repo. is slowing down prior to absorption into the mainline. > > In this case, maybe the rational thing is to reparent to a faster-moving repo. > > In particular, I think we should reparent to from bsd-port/bsd-port to hsx/hotspot-comp . That's (approximately) where our future is anyway. > > Current diffs only in bsd-port (which are necessary for mac builds!) can be posted as a suitably conditional part of the mlvm patch queue. > > Comments? I concur! Moving to hsx/hotspot-comp would be a very good idea. That would make publishing patches to mlvm a lot easier. -- Christian > > -- John > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110819/ea3bf1d5/attachment.html From forax at univ-mlv.fr Fri Aug 19 05:26:58 2011 From: forax at univ-mlv.fr (=?ISO-8859-1?Q?R=E9mi_Forax?=) Date: Fri, 19 Aug 2011 14:26:58 +0200 Subject: naming of internal MH classes In-Reply-To: <52B65101-C1AE-4FDA-9387-3FD38245F5AE@oracle.com> References: <52B65101-C1AE-4FDA-9387-3FD38245F5AE@oracle.com> Message-ID: <4E4E5692.2080705@univ-mlv.fr> On 08/19/2011 01:31 AM, John Rose wrote: > Here is a heads-up before an engineering code review! > > The OpenJDK implementation of JSR 292 has a number of private MH subclasses, including DirectMethodHandle, BoundMethodHandle, AdapterMethodHandle, AdapterMethodHandle.AsVarargsCollector. Except for the last, these names date back to the days of prototyping JSR 292. > > As we tune and refactor the implementation, I am starting to define more subclasses. Now it is time to consider some name changes. 
Because these can impact the JVM (as it recognizes intrinsics) I would like to keep these changes as low-impact as possible.
>
> In particular, I want to switch to something similar to Dan Heidinga's naming convention (in IBM's implementation), where private subclasses have simpler but more specific names like AsTypeHandle.
>
> Dan's account of MH type names is on slide 6 of his 2010 talk:
> http://www.wiki.jvmlangsummit.com/images/a/ad/J9_MethodHandle_Impl.pdf (2010)
>
> I don't propose to duplicate this, but rather to use the names as a model.
>
> (It is possible that this might ease the creation of a common implementation API with IBM, which would be good, but that's not the immediate goal.)
>
> Before I start refactoring and renaming, in the OpenJDK implementation, I want to know if we see any downside to using such a naming scheme in OpenJDK.
>
> Comments?

I don't see any downside.
I have prototyped a way to do reflection on method handles (more on this in a later mail), which depends on these names, but I don't see why I would not be able to update the implementation to use the new names.

BTW, I don't understand why the current implementation of a constructor call (new + dup + constructor call) uses spread/collect instead of two consecutive folds.

>
> -- John

Rémi

>
> P.S. The immediate occasion for naming and refactoring is the (likely) need for a CountingHandle (or is it CountingAdapterMethodHandle or AdapterMethodHandle$AsCounter?), to assist in profiling tasks. (Cf. the methodDataOop which profiles for bytecodes. We need similar infra. for MH graphs.)

You can already do that with a fold whose combiner is a method that returns void and is bound to the counter object.

>
> P.P.S. Dan's 2011 talk is here, for the record.
It's delves into other interesting details of the J9 implementation, including their version of supercombinators: > http://www.wiki.jvmlangsummit.com/images/6/6b/2011_Heidinga.pdf (2011) From tom.rodriguez at oracle.com Fri Aug 19 10:33:34 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 19 Aug 2011 10:33:34 -0700 Subject: are there changes for building mlvm now that 1.7 is released? In-Reply-To: References: Message-ID: >>> >> >> I'm open to suggestions on this one. >> >> It seems to me that the bsd-port repo. is slowing down prior to absorption into the mainline. >> >> In this case, maybe the rational thing is to reparent to a faster-moving repo. >> >> In particular, I think we should reparent to from bsd-port/bsd-port to hsx/hotspot-comp . That's (approximately) where our future is anyway. >> >> Current diffs only in bsd-port (which are necessary for mac builds!) can be posted as a suitably conditional part of the mlvm patch queue. >> >> Comments? > > I concur! Moving to hsx/hotspot-comp would be a very good idea. That would make publishing patches to mlvm a lot easier. I think we're not very far from having the bsd-port diffs in hotspot which will alleviate this. I kind of volunteered myself to shepherd them in and I want to get it done early next week. Hopefully this will allow the mac builds to start directly from hsx, modulo any build breakages since neither BSD nor Mac are in JPRT yet. 
tom > > -- Christian > >> >> -- John >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From christian.thalinger at oracle.com Fri Aug 19 11:47:44 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 19 Aug 2011 20:47:44 +0200 Subject: are there changes for building mlvm now that 1.7 is released? In-Reply-To: References: Message-ID: <17B0F6E0-6A27-4AA7-B77A-C83E26057CD6@oracle.com> On Aug 19, 2011, at 7:33 PM, Tom Rodriguez wrote: >>>> >>> >>> I'm open to suggestions on this one. >>> >>> It seems to me that the bsd-port repo. is slowing down prior to absorption into the mainline. >>> >>> In this case, maybe the rational thing is to reparent to a faster-moving repo. >>> >>> In particular, I think we should reparent to from bsd-port/bsd-port to hsx/hotspot-comp . That's (approximately) where our future is anyway. >>> >>> Current diffs only in bsd-port (which are necessary for mac builds!) can be posted as a suitably conditional part of the mlvm patch queue. >>> >>> Comments? >> >> I concur! Moving to hsx/hotspot-comp would be a very good idea. That would make publishing patches to mlvm a lot easier. > > I think we're not very far from having the bsd-port diffs in hotspot which will alleviate this. I kind of volunteered myself to shepherd them in and I want to get it done early next week. Hopefully this will allow the mac builds to start directly from hsx, modulo any build breakages since neither BSD nor Mac are in JPRT yet. I volunteer to review (and try) them too. 
-- Christian > > tom > >> >> -- Christian >> >>> >>> -- John >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From headius at headius.com Fri Aug 19 12:12:28 2011 From: headius at headius.com (Charles Oliver Nutter) Date: Fri, 19 Aug 2011 14:12:28 -0500 Subject: Building trunk hotspot-comp Message-ID: I've posted a gist with a script and instructions for setting up a trunk build environment for the Hotspot compiler team's branch: https://gist.github.com/1148321 Hotspot guys: let me know if anything else needs to be tweaked here. It works for me, and it's surprisingly easy to build. I'll also try to periodically post builds with in-flight patches like the recent invokedynamic optimizations. For now the i586 build (with everything 292-related up through Christian's inlining discount patch) is available at http://jruby.org/jdk. Linux only right now. I wish bsd-port would get merged in! - Charlie From john.r.rose at oracle.com Fri Aug 19 20:05:37 2011 From: john.r.rose at oracle.com (John Rose) Date: Fri, 19 Aug 2011 20:05:37 -0700 Subject: Thread safety and nethod handle chains In-Reply-To: <4E44514D.9020905@univ-mlv.fr> References: <4E44514D.9020905@univ-mlv.fr> Message-ID: <732F2618-1469-4540-86F1-2978DA24B63A@oracle.com> On Aug 11, 2011, at 3:01 PM, R?mi Forax wrote: > Yes, you can have duplicate GWTs and > I'm also interested if someone find a better approach :) You can use a custom subclass of MutableCallSite which synchronizes its updates. That way two racing threads cannot install duplicate case-handling logic. 
Or am I missing the point? -- John From mroos at roos.com Fri Aug 19 21:15:15 2011 From: mroos at roos.com (Mark Roos) Date: Fri, 19 Aug 2011 21:15:15 -0700 Subject: Thread safety and nethod handle chains In-Reply-To: <732F2618-1469-4540-86F1-2978DA24B63A@oracle.com> References: <4E44514D.9020905@univ-mlv.fr> <732F2618-1469-4540-86F1-2978DA24B63A@oracle.com> Message-ID: Hi John The thought experiment is that I have a long GWT chain and two threads are racing down it. Each is looking for the same missing class. The end fallback/lookup is synchronized and so there is only one update at a time, but the new GWT is added to the start of the chain and the second thread is past that point. So when the second gets to the fallback it still thinks the class is missing so it adds it again. The issue would be that the entry to the target is not synchronized. Does this make sense? mark mlvm-dev-bounces at openjdk.java.net wrote on 08/19/2011 08:05:37 PM: > From: John Rose > To: Da Vinci Machine Project > Date: 08/19/2011 08:14 PM > Subject: Re: Thread safety and nethod handle chains > Sent by: mlvm-dev-bounces at openjdk.java.net > > On Aug 11, 2011, at 3:01 PM, R?mi Forax wrote: > > > Yes, you can have duplicate GWTs and > > I'm also interested if someone find a better approach :) > > You can use a custom subclass of MutableCallSite which synchronizes > its updates. > That way two racing threads cannot install duplicate case-handling logic. > Or am I missing the point? > > -- John > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110819/e65d787d/attachment.html

From john.r.rose at oracle.com Fri Aug 19 23:14:00 2011
From: john.r.rose at oracle.com (John Rose)
Date: Fri, 19 Aug 2011 23:14:00 -0700
Subject: Thread safety and nethod handle chains
In-Reply-To:
References: <4E44514D.9020905@univ-mlv.fr> <732F2618-1469-4540-86F1-2978DA24B63A@oracle.com>
Message-ID: <9154900D-DB5F-433A-8A10-7A2A68898039@oracle.com>

On Aug 19, 2011, at 9:15 PM, Mark Roos wrote:

> The end fallback/lookup is synchronized and so there is only one update at a time, but the
> new GWT is added to the start of the chain and the second thread is past that point. So
> when the second gets to the fallback it still thinks the class is missing so it adds it again.
>
> The issue would be that the entry to the target is not synchronized.
>
> Does this make sense?

Although the GWT chain is opaque, you can store a summary of what's in it on the MCS, in subclass field(s).

The update might look something like this:

    class MyMCS extends MutableCallSite { ...
      List<Class<?>> casesHandled;
      synchronized MethodHandle ensureCaseIsHandled(Class<?> recv) {
        if (casesHandled.contains(recv))  return getTarget();
        setTarget(addCaseToChain(getTarget(), recv));
        casesHandled.add(recv);
        return getTarget();
      }
      Object handleMethodNotFound(Object... args) throws Throwable {
        return ensureCaseIsHandled(args[0].getClass()).invokeWithArguments(args);
      }
    }

Here, the previous state of the MCS.target chain is updated incrementally by pushing new receiver type logic on the head.

Or, the target chain could be completely regenerated each time the MCS state changes:

      synchronized MethodHandle ensureCaseIsHandled(Class<?> recv) {
        if (casesHandled.contains(recv))  return getTarget();
        casesHandled.add(recv);
        optimizeCaseOrdering(casesHandled);
        setTarget(generateNewDispatchTree(casesHandled));
        return getTarget();
      }

Of course, the metadata used to regenerate the MH chain will probably be more complex than a simple list of receiver classes.
My point is that it can all be managed under a lock, and then the results can be digested down to an immutable, executable MH chain, which can then be read outside of the lock.

Also, new logic can be injected incrementally into the end of an MH chain, if there is an MCS hooked in there.
In this example, the last fallback is always the dynamicInvoker of a fresh MCS, pointed to by the head MCS.

      // "this#methodNotFound" is pseudo-syntax for a method handle on the fallback method
      MutableCallSite tail = new MutableCallSite(this#methodNotFound);
      synchronized MethodHandle ensureCaseIsHandled(Class<?> recv) {
        if (casesHandled.contains(recv))  return getTarget();
        casesHandled.add(recv);
        MutableCallSite newTail = new MutableCallSite(tail.getTarget());
        tail.setTarget(generateNewCase(recv, newTail.dynamicInvoker()));
        tail = newTail;
        return getTarget();
      }

Actually, in that last case, the head CS could be another sort of CS, not just an MCS. But if your language allows radical scheme changes, or if you just want to be able to rebuild GWT trees, having the head be an MCS means you can always tidy things up again.

In this last case, we assume that the overhead of running through all the extra layers of MCS is not too expensive. With Christian's latest no-polling change, that will be true, generally.

Best wishes,
-- John

From sebastian.sickelmann at gmx.de Fri Aug 19 23:56:19 2011
From: sebastian.sickelmann at gmx.de (Sebastian Sickelmann)
Date: Sat, 20 Aug 2011 08:56:19 +0200
Subject: Thread safety and nethod handle chains
In-Reply-To: <9154900D-DB5F-433A-8A10-7A2A68898039@oracle.com>
References: <4E44514D.9020905@univ-mlv.fr> <732F2618-1469-4540-86F1-2978DA24B63A@oracle.com> <9154900D-DB5F-433A-8A10-7A2A68898039@oracle.com>
Message-ID: <4E4F5A93.1000606@gmx.de>

On 20.08.2011 08:14, John Rose wrote:
> On Aug 19, 2011, at 9:15 PM, Mark Roos wrote:
>
>> The end fallback/lookup is synchronized and so there is only one update at a time, but the
>> new GWT is added to the start of the chain and the second thread is past that point.
So
>> when the second gets to the fallback it still thinks the class is missing so it adds it again.
>>
>> The issue would be that the entry to the target is not synchronized.
>>
>> Does this make sense?
> Although the GWT chain is opaque you can store a summary of what's in it on the MCS, in subclass field(s).
>
> The update might look something like this:
>
> class MyMCS extends MutableCallSite { ...
>   List<Class<?>> casesHandled;
>   synchronized MethodHandle ensureCaseIsHandled(Class<?> recv) {
>     if (casesHandled.contains(recv))  return getTarget();
>     setTarget(addCaseToChain(getTarget(), recv));
>     casesHandled.add(recv);
>     return getTarget();
>   }
>   Object handleMethodNotFound(Object... args) throws Throwable {
>     return ensureCaseIsHandled(args[0].getClass()).invokeWithArguments(args);
>   }
> }

If you expect that the chain rarely needs to change, you can use the double-checked locking idiom here.

    class MyMCS extends MutableCallSite { ...
      MethodHandle ensureCaseIsHandled(Class<?> recv) {
        // first check without the lock
        MethodHandle toReturn = analyseChain(recv);
        if (toReturn != null) return toReturn;
        synchronized (this) {
          // second check under the lock, in case another thread got here first
          toReturn = analyseChain(recv);
          if (toReturn != null) return toReturn;
          return updateChainToHandleCase(recv);
        }
      }
    }

Of course, you can use a "shadow data structure" such as List<Class<?>> casesHandled, as John suggested, if this speeds up your chain analysis.
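To make the idea concrete, here is a minimal runnable sketch of the double check combined with a shadow data structure. It assumes a toy call site of type (Object)String; the names DoubleCheckedCallSite, isExactly, describe and race are invented for illustration and do not come from any code in this thread. A ConcurrentHashMap plays the role of analyseChain, so the first, unlocked check is safe to perform.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.util.concurrent.ConcurrentHashMap;

class DoubleCheckedCallSite extends MutableCallSite {
    private static final MethodType SITE_TYPE = MethodType.methodType(String.class, Object.class);
    private static final MethodHandle IS_EXACTLY, DESCRIBE, HANDLE_MISS;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            IS_EXACTLY = lookup.findStatic(DoubleCheckedCallSite.class, "isExactly",
                    MethodType.methodType(boolean.class, Class.class, Object.class));
            DESCRIBE = lookup.findStatic(DoubleCheckedCallSite.class, "describe",
                    MethodType.methodType(String.class, Class.class, Object.class));
            HANDLE_MISS = lookup.findVirtual(DoubleCheckedCallSite.class, "handleMiss", SITE_TYPE);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Shadow data structure: receiver class -> handle already installed in the chain.
    private final ConcurrentHashMap<Class<?>, MethodHandle> casesHandled = new ConcurrentHashMap<>();
    private int guards; // number of GWTs installed, guarded by this

    DoubleCheckedCallSite() {
        super(SITE_TYPE);
        setTarget(HANDLE_MISS.bindTo(this)); // the chain starts as the bare fallback
    }

    // Stand-ins for a real type test and a real method lookup (non-null receivers only).
    static boolean isExactly(Class<?> c, Object o) { return o.getClass() == c; }
    static String describe(Class<?> c, Object o) { return c.getSimpleName() + ":" + o; }

    MethodHandle ensureCaseIsHandled(Class<?> recv) {
        MethodHandle hit = casesHandled.get(recv);      // first check, no lock
        if (hit != null) return hit;
        synchronized (this) {
            hit = casesHandled.get(recv);               // second check, under the lock
            if (hit != null) return hit;
            hit = MethodHandles.insertArguments(DESCRIBE, 0, recv);
            MethodHandle test = MethodHandles.insertArguments(IS_EXACTLY, 0, recv);
            setTarget(MethodHandles.guardWithTest(test, hit, getTarget()));
            guards++;
            casesHandled.put(recv, hit);
            return hit;
        }
    }

    // Fallback at the end of the chain. It invokes the per-case handle directly,
    // so a thread that raced past a freshly added GWT cannot loop back into the chain.
    String handleMiss(Object receiver) throws Throwable {
        return (String) ensureCaseIsHandled(receiver.getClass()).invokeExact(receiver);
    }

    String call(Object receiver) {
        try {
            return (String) dynamicInvoker().invokeExact(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    synchronized int guardsInstalled() { return guards; }

    // Several threads miss on the same receiver class; returns how many GWTs were installed.
    static int race(int threads) {
        DoubleCheckedCallSite site = new DoubleCheckedCallSite();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> site.call("x"));
            workers[i].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return site.guardsInstalled();
    }
}
```

Returning the per-case handle from ensureCaseIsHandled, rather than the whole chain, is a design choice of this sketch: a thread that saw a stale target still ends up in the right case without re-entering the fallback.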
-- Sebastian

From forax at univ-mlv.fr Sat Aug 20 02:33:34 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Sat, 20 Aug 2011 11:33:34 +0200
Subject: Thread safety and nethod handle chains
In-Reply-To: <9154900D-DB5F-433A-8A10-7A2A68898039@oracle.com>
References: <4E44514D.9020905@univ-mlv.fr> <732F2618-1469-4540-86F1-2978DA24B63A@oracle.com> <9154900D-DB5F-433A-8A10-7A2A68898039@oracle.com>
Message-ID: <4E4F7F6E.9050400@univ-mlv.fr>

On 08/20/2011 08:14 AM, John Rose wrote:
> On Aug 19, 2011, at 9:15 PM, Mark Roos wrote:
>
>> The end fallback/lookup is synchronized and so there is only one update at a time, but the
>> new GWT is added to the start of the chain and the second thread is past that point. So
>> when the second gets to the fallback it still thinks the class is missing so it adds it again.
>>
>> The issue would be that the entry to the target is not synchronized.
>>
>> Does this make sense?
> Although the GWT chain is opaque you can store a summary of what's in it on the MCS, in subclass field(s).
>
> The update might look something like this:
>
> class MyMCS extends MutableCallSite { ...
>   List casesHandled;
>   synchronized MethodHandle ensureCaseIsHandled(Class recv) {
>     if (casesHandled.contains(recv)) return;
>     setTarget(addCaseToChain(getTarget(), recv));
>     casesHandled.add(recv);
>     return getTarget();
>   }
>   Object handleMethodNotFound(Object... args) throws Throwable {
>     return ensureCaseIsHandled(args[0].getClass()).invokeWithArguments(args);
>   }
> }
>
> Here, the previous state of the MCS.target chain is updated incrementally by pushing new receiver type logic on the head.

I think you can get a stack overflow with this code.
As Mark said, the problem is that the information you store in the call site and the target of the call site can be out of sync.

If a thread T1 takes a cache miss, updates the target, and is swapped out of the processor in handleMethodNotFound after calling ensureCaseIsHandled and before invokeWithArguments,
a thread T2 can take a cache miss on the same receiver class and go into an infinite loop.

You can avoid that if you store the target in the call site (so the target will be stored twice).

class MyMCS extends MutableCallSite { ...
  Set casesHandled;
  MethodHandle syncedTarget;

  synchronized MethodHandle ensureCaseIsHandled(Class recv) {
    if (casesHandled.contains(recv)) {
      setTarget(syncedTarget);
      return syncedTarget;
    }
    MethodHandle newTarget = addCaseToChain(syncedTarget, recv);
    syncedTarget = newTarget;
    casesHandled.add(recv);
    setTarget(newTarget);
    return newTarget;
  }
  Object handleMethodNotFound(Object... args) throws Throwable {
    return ensureCaseIsHandled(args[0].getClass()).invokeWithArguments(args);
  }
}

cheers,
Rémi

> Or, the target chain could be completely regenerated each time the MCS state changes:
>
>   synchronized MethodHandle ensureCaseIsHandled(Class recv) {
>     if (casesHandled.contains(recv)) return;
>     casesHandled.add(recv);
>     optimizeCaseOrdering(casesHandled);
>     setTarget(generateNewDispatchTree(casesHandled));
>     return getTarget();
>   }
>
> Of course, the metadata used to regenerate the MH chain will probably be more complex than a simple list of receiver classes. My point is that it can all be managed under a lock, and then the results can be digested down to an immutable, executable MH chain, which can then be read outside of the lock.
>
> Also, new logic can be injected incrementally into the end of an MH chain, if there is an MCS hooked in there.
> In this example, the last fallback is always the dynamicInvoker of a fresh MCS, pointed to by the head MCS.
> > MutableCallSite tail = new MutableCallSite(this#methodNotFound); > synchronized MethodHandle ensureCaseIsHandled(Class recv) { > if (casesHandled.contains(recv)) return; > casesHandled.add(recv); > MutableCallSite newTail = new MutableCallSite(tail.getTarget()); > tail.setTarget(generateNewCase(recv, newTail.dynamicInvoker())); > tail = newTail; > return getTarget(); > } > > Actually, in that last case, the head CS could be another sort of CS, not just a MCS. But if your language allows radical scheme changes, or if you just want to be able to rebuild GWT trees, having the head be an MCS means you can always tidy things up again. > > In this last case, we assume that the overhead of running through all the extra layers of MCS is not too expensive. With Christian's latest no-polling change, that will be true, generally. > > Best wishes, > -- John > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From christian.thalinger at oracle.com Mon Aug 22 00:43:18 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 22 Aug 2011 09:43:18 +0200 Subject: Building trunk hotspot-comp In-Reply-To: References: Message-ID: <49450EB7-FCFA-4991-9D4E-D377466FB277@oracle.com> That's pretty much it. If you have more that two cores increase the number for HOTSPOT_BUILD_JOBS (it's like make -j [jobs]). -- Christian On Aug 19, 2011, at 9:12 PM, Charles Oliver Nutter wrote: > I've posted a gist with a script and instructions for setting up a > trunk build environment for the Hotspot compiler team's branch: > > https://gist.github.com/1148321 > > Hotspot guys: let me know if anything else needs to be tweaked here. > It works for me, and it's surprisingly easy to build. > > I'll also try to periodically post builds with in-flight patches like > the recent invokedynamic optimizations. 
> For now the i586 build (with everything 292-related up through Christian's inlining discount patch)
> is available at http://jruby.org/jdk.
>
> Linux only right now. I wish bsd-port would get merged in!
>
> - Charlie

From mroos at roos.com Mon Aug 22 09:43:49 2011
From: mroos at roos.com (Mark Roos)
Date: Mon, 22 Aug 2011 09:43:49 -0700
Subject: Thread safety and nethod handle chains
In-Reply-To: <9154900D-DB5F-433A-8A10-7A2A68898039@oracle.com>
References: <4E44514D.9020905@univ-mlv.fr> <732F2618-1469-4540-86F1-2978DA24B63A@oracle.com> <9154900D-DB5F-433A-8A10-7A2A68898039@oracle.com>
Message-ID:

Thanks to all for the commentary. I learn something new each time.

My use of call sites, lookups, etc. is such that there are lots of lookups (at least one per generated call site + class pair). And lookups vary greatly in time, from a quick IV access to a full generation and compile of the method. Because of this I have not been too thrilled with synchronizing the lookup and thus forcing long waits on threads which could proceed.

While the thought experiment is potentially rare, it does show up when I launch multiple threads on the same task. Then the call site usage tends to overlap and I start to get duplicate GWTs. But is this bad? Or bad enough to add the overhead of collecting GWT chain information and the analysis? It should not cause failures, as each call site will invoke its method. The question is which target update survives.

What I need to look at closely is where to sync. There could be issues if changes to the target occur between the get, modify and set target operations. But again one will win and the other will have to look up again the next time.

I am thinking that it will not turn out to be a big deal.

regards
mark
-------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110822/50d62850/attachment.html From christian.thalinger at oracle.com Tue Aug 23 05:17:29 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 23 Aug 2011 14:17:29 +0200 Subject: JRuby invokedynamic updates In-Reply-To: References: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com> Message-ID: On Aug 12, 2011, at 8:41 PM, Christian Thalinger wrote: > > On Aug 12, 2011, at 8:18 PM, Tom Rodriguez wrote: > >>> Well, it's the good old: >>> >>> @ 95 java.lang.invoke.MethodHandle::invokeExact (45 bytes) size > DesiredMethodLimit >>> >>> This seems to be the last recursive call that doesn't get inlined. Setting MaxRecursiveInlineLevel=0 makes it go faster. I finally filed (a separate bug to keep this a single change): >>> >>> 7078382: JSR 292: don't count method handle adapters against inlining budgets >>> >>> The proposed fix is: >>> >>> http://cr.openjdk.java.net/~twisti/7078382/ >> >> I wonder if we need to be slightly more selective than this. Most method handle chains are relatively small and we shouldn't be penalized for that but they could be arbitrarily large too. Worst case they just expand into a bunch of call sites I guess so maybe it's not that bad. Maybe we need an alternate metric for this, like number of call sites in the method handle adapter? > > Yes, using zero is not the best approach but it proved the point. Number of call sites could be a good metric. How about this: http://cr.openjdk.java.net/~twisti/7078382/ -- Christian > >> >> This wouldn't be so bad if method handle chains could be compiled separately. I suspect we're going to have to support that eventually. Doing that would make the performance cliff much smaller I think. > > Exactly. Today I was thinking about this a lot and did some experiments. The problem we have right now is that invokedynamic instructions have j.l.i.MethodHandle.invokeExact as callee which is a native method. 
Maybe we could store the methodOop of the method handle adapter somewhere (in the constant pool cache?) when we have bytecode for a method handle chain and execute that? > > -- Christian > >> >> tom >> >>> >>> The numbers are now like they should be: >>> >>> intelsdv07:~/mlvm/jruby$ jruby --server bench/bench_fib_complex.rb 5 35 >>> fib with additional calls >>> 0.865000 0.000000 0.865000 ( 0.835000) >>> 0.745000 0.000000 0.745000 ( 0.745000) >>> 0.750000 0.000000 0.750000 ( 0.750000) >>> 0.742000 0.000000 0.742000 ( 0.742000) >>> 0.743000 0.000000 0.743000 ( 0.744000) >>> >>> intelsdv07:~/mlvm/jruby$ jruby --server -Xinvokedynamic.invocation.switchpoint=true bench/bench_fib_complex.rb 5 35 >>> fib with additional calls >>> 0.789000 0.000000 0.789000 ( 0.759000) >>> 0.661000 0.000000 0.661000 ( 0.661000) >>> 0.659000 0.000000 0.659000 ( 0.660000) >>> 0.661000 0.000000 0.661000 ( 0.661000) >>> 0.661000 0.000000 0.661000 ( 0.661000) >>> >>> -- Christian >>> >>>> >>>> - Charlie >>>> _______________________________________________ >>>> mlvm-dev mailing list >>>> mlvm-dev at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From mroos at roos.com Tue Aug 23 18:15:11 2011 From: mroos at roos.com (Mark Roos) Date: Tue, 23 Aug 2011 18:15:11 -0700 Subject: Problem with 08/08 BSD port from Stephen, MLVM changes recently? Message-ID: In looking for my classDefNotFound bug I tried the latest build I could find for bsd port. 
The 0609 version worked at least past this point but the 0808 version gave a fatal error # fatal error: memory leak: allocating without ResourceMark V [libjvm.dylib+0x22ebd] VMError::report_and_die()+0x13d V [libjvm.dylib+0xd1e4] report_vm_error(char const*, int, char const*, char const*)+0x84 V [libjvm.dylib+0x1f8be] resource_allocate_bytes(unsigned long)+0x14e V [libjvm.dylib+0x9f10c7] methodOopDesc::make_invoke_method(KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0x387 V [libjvm.dylib+0xbebd8d] SystemDictionary::find_method_handle_invoke(Symbol*, Symbol*, KlassHandle, Thread*)+0x36d V [libjvm.dylib+0x8e7238] LinkResolver::lookup_implicit_method(methodHandle&, KlassHandle, Symbol*, Symbol*, KlassHandle, Thread*)+0x108 V [libjvm.dylib+0x8e943e] LinkResolver::resolve_method(methodHandle&, KlassHandle, Symbol*, Symbol*, KlassHandle, bool, Thread*)+0xc7e V [libjvm.dylib+0x8ed466] LinkResolver::linktime_resolve_virtual_method(methodHandle&, KlassHandle, Symbol*, Symbol*, KlassHandle, bool, Thread*)+0x36 V [libjvm.dylib+0x8ee539] LinkResolver::resolve_virtual_call(CallInfo&, Handle, KlassHandle, KlassHandle, Symbol*, Symbol*, KlassHandle, bool, bool, Thread*)+0x59 V [libjvm.dylib+0x8ee758] LinkResolver::resolve_invokevirtual(CallInfo&, Handle, constantPoolHandle, int, Thread*)+0x1b8 V [libjvm.dylib+0x6c43ec] InterpreterRuntime::resolve_invoke(JavaThread*, Bytecodes::Code)+0x2ac j ri.core.rtalk.RtCallSite.sendSmToRtalk([B)Z+79 j ri.experiment2.TestSm$1.run()V+7 Are there changes to MLVM that I may need to adapt to? mark -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110823/c79502da/attachment.html From stephen.bannasch at deanbrook.org Tue Aug 23 21:07:28 2011 From: stephen.bannasch at deanbrook.org (Stephen Bannasch) Date: Wed, 24 Aug 2011 00:07:28 -0400 Subject: FAILED: java/lang/invoke/ThrowExceptionsTest.java ... 
fatal error Message-ID: After I build mlvm I run the running jdk/test/java/lang/invoke tests as well as jdk/test/java/dyn/CoroutineTest.java I'm getting fatal errors in Java running: java/lang/invoke/ThrowExceptionsTest.java jtreg jtr and hs_err_pid log files here: https://gist.github.com/1167284 From christian.thalinger at oracle.com Wed Aug 24 06:50:43 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 24 Aug 2011 15:50:43 +0200 Subject: JRuby invokedynamic updates In-Reply-To: References: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com> Message-ID: On Aug 16, 2011, at 12:19 AM, Charles Oliver Nutter wrote: > On Mon, Aug 15, 2011 at 1:01 PM, Tom Rodriguez wrote: >> >> On Aug 12, 2011, at 4:15 PM, Charles Oliver Nutter wrote: >>> Well, let me play devil's advocate here: why not just discount MH >>> chains completely? >> >> It really only to deal with pathologically long chains. Most method handle chains are pretty simple and should generally just be inlined. However I could build a chain that included a very large number of calls embedded in it and blindly inlining that could cause the compile to grow too large. Chains of method handle chains make it worse. Whatever limit we would pick would be fairly high so that no normal usage would never be cut off. > > I'm not a compiler guy, so I don't know what "too large" means when it > comes to the compile. Is it: > > * Too complex graph so compilation runs too long? > * Too big native code so something blows up or crashes? > * Too big native code so it doesn't fit in cache and runs much slower? > > As long as the limit is suitably high, I think everyone will be happy. > I can't see any JRuby use involving more than a few dozen adapters for > the most complex cases, mostly argument juggling and converting. > >>> This probably needs to happen for client mode at the very least. 
We >>> probably can't get client to inline invokedynamic, but if it at least >>> dispatches to a compiled MH chain it would be a lot better than what >>> it does now (which I think is just execute the chain of handles >>> as-is...usually very slow). >> >> Client will start to do more inlining in 7u2 but it will still be limited because of the lack of profiling. Being able to compile them separately would make the whole system more stable performance wise. I don't know that we can do that for 7u2 though. Doing it correctly may require more machinery than we have time to build for 7u2. > > My contingency plan is to only turn on invokedynamic when I can see > we're running on Hotspot C2, and use the old mechanisms when running > under C1. That is an acceptable trade-off until client mode handles > invokedynamic/MHs better than it does now. I have some preliminary results for: 7079673: JSR 292: C1 should inline bytecoded method handle adapters These numbers are with: 7078382: JSR 292: don't count method handle adapters against inlining budgets applied (which I currently have out for review). Without that patch the performance goes back down the toilet. I think the next single most important thing is to add support for calling bytecoded method handle adapters directly. 
intelsdv07:~/mlvm/jruby$ jruby -J-showversion --client bench/bench_fib_complex.rb 5 35 java version "1.7.0" Java(TM) SE Runtime Environment (build 1.7.0-b147) Java HotSpot(TM) Client VM (build 21.0-b17, mixed mode) normal fib 11.765000 0.000000 11.765000 ( 11.748000) 11.765000 0.000000 11.765000 ( 11.765000) 11.746000 0.000000 11.746000 ( 11.746000) 11.745000 0.000000 11.745000 ( 11.745000) 11.702000 0.000000 11.702000 ( 11.702000) fib with constants 14.879000 0.000000 14.879000 ( 14.879000) 14.912000 0.000000 14.912000 ( 14.912000) 15.255000 0.000000 15.255000 ( 15.255000) 15.168000 0.000000 15.168000 ( 15.169000) 15.320000 0.000000 15.320000 ( 15.320000) fib with additional calls 25.799000 0.000000 25.799000 ( 25.798000) 25.705000 0.000000 25.705000 ( 25.705000) 26.044000 0.000000 26.044000 ( 26.044000) 26.028000 0.000000 26.028000 ( 26.028000) 26.351000 0.000000 26.351000 ( 26.351000) fib with constants and additional calls 25.053000 0.000000 25.053000 ( 25.053000) 24.406000 0.000000 24.406000 ( 24.406000) 24.550000 0.000000 24.550000 ( 24.550000) 24.478000 0.000000 24.478000 ( 24.478000) 24.381000 0.000000 24.381000 ( 24.381000) intelsdv07:~/mlvm/jruby$ jruby -J-showversion --client -Xcompile.invokedynamic=false bench/bench_fib_complex.rb 5 35 java version "1.7.0" Java(TM) SE Runtime Environment (build 1.7.0-b147) Java HotSpot(TM) Client VM (build 21.0-b17, mixed mode) normal fib 1.778000 0.000000 1.778000 ( 1.724000) 1.740000 0.000000 1.740000 ( 1.740000) 1.734000 0.000000 1.734000 ( 1.734000) 1.735000 0.000000 1.735000 ( 1.736000) 1.745000 0.000000 1.745000 ( 1.745000) fib with constants 3.420000 0.000000 3.420000 ( 3.420000) 3.379000 0.000000 3.379000 ( 3.379000) 3.387000 0.000000 3.387000 ( 3.387000) 3.398000 0.000000 3.398000 ( 3.398000) 3.389000 0.000000 3.389000 ( 3.389000) fib with additional calls 2.953000 0.000000 2.953000 ( 2.953000) 2.973000 0.000000 2.973000 ( 2.973000) 2.974000 0.000000 2.974000 ( 2.974000) 2.977000 0.000000 2.977000 ( 2.977000) 
2.979000 0.000000 2.979000 ( 2.979000) fib with constants and additional calls 4.290000 0.000000 4.290000 ( 4.290000) 4.222000 0.000000 4.222000 ( 4.222000) 4.221000 0.000000 4.221000 ( 4.222000) 4.223000 0.000000 4.223000 ( 4.223000) 4.222000 0.000000 4.222000 ( 4.221000) intelsdv07:~/mlvm/jruby$ jruby -J-showversion --client bench/bench_fib_complex.rb 5 35 java version "1.8.0-ea" Java(TM) SE Runtime Environment (build 1.8.0-ea-b01) Java HotSpot(TM) Client VM (build 22.0-b01-internal, mixed mode) normal fib 1.072000 0.000000 1.072000 ( 1.056000) 1.059000 0.000000 1.059000 ( 1.059000) 1.042000 0.000000 1.042000 ( 1.042000) 1.043000 0.000000 1.043000 ( 1.044000) 1.048000 0.000000 1.048000 ( 1.049000) fib with constants 3.178000 0.000000 3.178000 ( 3.179000) 3.036000 0.000000 3.036000 ( 3.036000) 3.056000 0.000000 3.056000 ( 3.056000) 3.052000 0.000000 3.052000 ( 3.052000) 3.052000 0.000000 3.052000 ( 3.052000) fib with additional calls 1.698000 0.000000 1.698000 ( 1.698000) 1.677000 0.000000 1.677000 ( 1.677000) 1.677000 0.000000 1.677000 ( 1.677000) 1.678000 0.000000 1.678000 ( 1.678000) 1.680000 0.000000 1.680000 ( 1.680000) fib with constants and additional calls 3.483000 0.000000 3.483000 ( 3.483000) 3.501000 0.000000 3.501000 ( 3.501000) 3.496000 0.000000 3.496000 ( 3.496000) 3.498000 0.000000 3.498000 ( 3.498000) 3.530000 0.000000 3.530000 ( 3.530000) And here is a redblack tree: intelsdv07:~/mlvm/redblack$ jruby -J-showversion --client bm1.rb java version "1.7.0" Java(TM) SE Runtime Environment (build 1.7.0-b147) Java HotSpot(TM) Client VM (build 21.0-b17, mixed mode) 18.136 17.942 17.969 17.783 17.916 intelsdv07:~/mlvm/redblack$ jruby -J-showversion --client -Xcompile.invokedynamic=false bm1.rb java version "1.7.0" Java(TM) SE Runtime Environment (build 1.7.0-b147) Java HotSpot(TM) Client VM (build 21.0-b17, mixed mode) 2.791 2.563 2.587 2.568 2.628 intelsdv07:~/mlvm/redblack$ jruby -J-showversion --client bm1.rb java version "1.8.0-ea" Java(TM) SE Runtime 
Environment (build 1.8.0-ea-b01) Java HotSpot(TM) Client VM (build 22.0-b01-internal, mixed mode) 3.998 2.278 2.25 2.204 2.193 @Charlie: Is redblack doing self-verification? -- Christian > > - Charlie > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From christian.thalinger at oracle.com Wed Aug 24 07:50:04 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 24 Aug 2011 16:50:04 +0200 Subject: Problem with 08/08 BSD port from Stephen, MLVM changes recently? In-Reply-To: References: Message-ID: <52D426DA-8358-4C31-A0EE-A4520A144897@oracle.com> On Aug 24, 2011, at 3:15 AM, Mark Roos wrote: > In looking for my classDefNotFound bug I tried the latest build I could find for bsd port. > > The 0609 version worked at least past this point but the 0808 version gave a fatal error > > # fatal error: memory leak: allocating without ResourceMark > > V [libjvm.dylib+0x22ebd] VMError::report_and_die()+0x13d > V [libjvm.dylib+0xd1e4] report_vm_error(char const*, int, char const*, char const*)+0x84 > V [libjvm.dylib+0x1f8be] resource_allocate_bytes(unsigned long)+0x14e > V [libjvm.dylib+0x9f10c7] methodOopDesc::make_invoke_method(KlassHandle, Symbol*, Symbol*, Handle, Thread*)+0x387 > V [libjvm.dylib+0xbebd8d] SystemDictionary::find_method_handle_invoke(Symbol*, Symbol*, KlassHandle, Thread*)+0x36d > V [libjvm.dylib+0x8e7238] LinkResolver::lookup_implicit_method(methodHandle&, KlassHandle, Symbol*, Symbol*, KlassHandle, Thread*)+0x108 > V [libjvm.dylib+0x8e943e] LinkResolver::resolve_method(methodHandle&, KlassHandle, Symbol*, Symbol*, KlassHandle, bool, Thread*)+0xc7e > V [libjvm.dylib+0x8ed466] LinkResolver::linktime_resolve_virtual_method(methodHandle&, KlassHandle, Symbol*, Symbol*, KlassHandle, bool, Thread*)+0x36 > V [libjvm.dylib+0x8ee539] LinkResolver::resolve_virtual_call(CallInfo&, Handle, KlassHandle, KlassHandle, Symbol*, Symbol*, 
KlassHandle, bool, bool, Thread*)+0x59 > V [libjvm.dylib+0x8ee758] LinkResolver::resolve_invokevirtual(CallInfo&, Handle, constantPoolHandle, int, Thread*)+0x1b8 > V [libjvm.dylib+0x6c43ec] InterpreterRuntime::resolve_invoke(JavaThread*, Bytecodes::Code)+0x2ac > j ri.core.rtalk.RtCallSite.sendSmToRtalk([B)Z+79 > j ri.experiment2.TestSm$1.run()V+7 > > > Are there changes to MLVM that I may need to adapt to? Looks like we are missing a ResourceMark somewhere. Odd thing is that we never hit that problem before. Is it reproducible? If yes, do you have a test case to reproduce it? -- Christian > > mark > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110824/e04f4d51/attachment.html From christian.thalinger at oracle.com Wed Aug 24 08:24:01 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 24 Aug 2011 17:24:01 +0200 Subject: FAILED: java/lang/invoke/ThrowExceptionsTest.java ... fatal error In-Reply-To: References: Message-ID: <0C3B3919-3F3A-427A-A317-75781281421A@oracle.com> On Aug 24, 2011, at 6:07 AM, Stephen Bannasch wrote: > After I build mlvm I run the running jdk/test/java/lang/invoke tests as well as jdk/test/java/dyn/CoroutineTest.java > > I'm getting fatal errors in Java running: java/lang/invoke/ThrowExceptionsTest.java > > jtreg jtr and hs_err_pid log files here: https://gist.github.com/1167284 I doubt I can can reproduce that error on Linux or Solaris but could you send detailed instructions how to reproduce this? Does it also reproduce when you just run this test? 
-- Christian

From headius at headius.com Wed Aug 24 09:26:27 2011
From: headius at headius.com (Charles Oliver Nutter)
Date: Wed, 24 Aug 2011 11:26:27 -0500
Subject: JRuby invokedynamic updates
In-Reply-To:
References: <1297695F-CD22-43CB-A971-4CDF85035AFF@oracle.com>
Message-ID:

On Wed, Aug 24, 2011 at 8:50 AM, Christian Thalinger wrote:
> I have some preliminary results for:
>
> 7079673: JSR 292: C1 should inline bytecoded method handle adapters
>
> These numbers are with:
>
> 7078382: JSR 292: don't count method handle adapters against inlining budgets
>
> applied (which I currently have out for review). Without that patch the performance goes back down the toilet. I think the next single most important thing is to add support for calling bytecoded method handle adapters directly.

Wow, these numbers are way better! I didn't expect such a drastic difference right away.

> And here is a redblack tree:

Also excellent!

> @Charlie: Is redblack doing self-verification?

Self-verification as in checking that it's working properly?

FWIW, I have been playing with a JRuby port of @notch's little game (notch is the creator of Minecraft and wrote a tiny dungeon hacking game in Java for some contest). On Java 6 I can get 16-17fps at best. Java 7 on OS X doesn't handle GUI well yet, but if I disable GUI the same slow processing yields 60fps. If it's working like it's supposed to, that's another big win for invokedynamic.

Christian: I would like to apply recent patches to hotspot-comp, but the older unapplied webrevs either don't apply cleanly or conflict with each other. I'll find you on Skype to get recommendations and then try to post an updated build.
- Charlie From stephen.bannasch at deanbrook.org Wed Aug 24 09:40:31 2011 From: stephen.bannasch at deanbrook.org (Stephen Bannasch) Date: Wed, 24 Aug 2011 12:40:31 -0400 Subject: FAILED: java/lang/invoke/ThrowExceptionsTest.java ... fatal error In-Reply-To: <0C3B3919-3F3A-427A-A317-75781281421A@oracle.com> References: <0C3B3919-3F3A-427A-A317-75781281421A@oracle.com> Message-ID: >On Aug 24, 2011, at 6:07 AM, Stephen Bannasch wrote: > >> After I build mlvm I run the running jdk/test/java/lang/invoke tests as well as jdk/test/java/dyn/CoroutineTest.java >> > > I'm getting fatal errors in Java running: java/lang/invoke/ThrowExceptionsTest.java >> >> jtreg jtr and hs_err_pid log files here: https://gist.github.com/1167284 > >I doubt I can can reproduce that error on Linux or Solaris but could you send detailed instructionshow to reproduce this? Does it also reproduce when you just run this test? In the sources directory after a successful build: $ jtreg -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -ignore:quiet -jdk:build/bsd-amd64/j2sdk-image jdk/test/java/lang/invoke/ThrowExceptionsTest.java From mroos at roos.com Wed Aug 24 10:54:01 2011 From: mroos at roos.com (Mark Roos) Date: Wed, 24 Aug 2011 10:54:01 -0700 Subject: Problem with 08/08 BSD port from Stephen, MLVM changes recently? In-Reply-To: <52D426DA-8358-4C31-A0EE-A4520A144897@oracle.com> References: <52D426DA-8358-4C31-A0EE-A4520A144897@oracle.com> Message-ID: I do have a test case that repeats the resource mark error. We are trying a new packaging so you should be able to run it without an external connection. Hopefully later today mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110824/ca8b49c4/attachment.html From tom.rodriguez at oracle.com Wed Aug 24 11:44:46 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 24 Aug 2011 11:44:46 -0700 Subject: Problem with 08/08 BSD port from Stephen, MLVM changes recently? In-Reply-To: References: <52D426DA-8358-4C31-A0EE-A4520A144897@oracle.com> Message-ID: I made one: import java.lang.invoke.*; import static java.lang.invoke.MethodHandles.*; import static java.lang.invoke.MethodType.*; public class RM implements Runnable { public static void main(String... args) throws Throwable { new Thread(new RM()).start(); } public static RM test() { return null; } public void run() { try { MethodHandle m1 = MethodHandles.lookup().findStatic(RM.class, "test", methodType(RM.class)); RM v = (RM)m1.invokeExact(); } catch (Throwable t) { t.printStackTrace(); } } } The problem is that make_invoke_method only does extra allocation when there are non-bcp methods on the stack and there's no ResourceMark above the run() method of a thread. make_invoke_method just needs a ResourceMark. I filed 7082949 for this. Thanks for the report. tom On Aug 24, 2011, at 10:54 AM, Mark Roos wrote: > I do have a test case that repeats the resource mark error. We are trying a new packaging so you should be able to run it > without an external connection. > > Hopefully later today > > mark_______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From christian.thalinger at oracle.com Wed Aug 24 12:41:59 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 24 Aug 2011 21:41:59 +0200 Subject: Problem with 08/08 BSD port from Stephen, MLVM changes recently? 
In-Reply-To: References: <52D426DA-8358-4C31-A0EE-A4520A144897@oracle.com> Message-ID: On Aug 24, 2011, at 8:44 PM, Tom Rodriguez wrote: > I made one: > > import java.lang.invoke.*; > import static java.lang.invoke.MethodHandles.*; > import static java.lang.invoke.MethodType.*; > > public class RM implements Runnable { > public static void main(String... args) throws Throwable { > new Thread(new RM()).start(); > } > > public static RM test() { > return null; > } > > public void run() { > try { > MethodHandle m1 = MethodHandles.lookup().findStatic(RM.class, "test", methodType(RM.class)); > RM v = (RM)m1.invokeExact(); > } catch (Throwable t) { > t.printStackTrace(); > } > } > } > > The problem is that make_invoke_method only does extra allocation when there are non-bcp methods on the stack and there's no ResourceMark above the run() method of a thread. make_invoke_method just needs a ResourceMark. I filed 7082949 for this. Thanks for the report. Thanks for investigating it. -- Christian > > tom > > On Aug 24, 2011, at 10:54 AM, Mark Roos wrote: > >> I do have a test case that repeats the resource mark error. We are trying a new packaging so you should be able to run it >> without an external connection. >> >> Hopefully later today >> >> mark_______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From tom.rodriguez at oracle.com Wed Aug 24 13:58:52 2011 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Wed, 24 Aug 2011 13:58:52 -0700 Subject: Problem with 08/08 BSD port from Stephen, MLVM changes recently? 
In-Reply-To: References: <52D426DA-8358-4C31-A0EE-A4520A144897@oracle.com> Message-ID: <3AEEFCEF-2377-4451-9B30-1821A483C8D5@oracle.com> I didn't expect you to still be working and I was pretty sure how to reproduce it... tom On Aug 24, 2011, at 12:41 PM, Christian Thalinger wrote: > > On Aug 24, 2011, at 8:44 PM, Tom Rodriguez wrote: > >> I made one: >> >> import java.lang.invoke.*; >> import static java.lang.invoke.MethodHandles.*; >> import static java.lang.invoke.MethodType.*; >> >> public class RM implements Runnable { >> public static void main(String... args) throws Throwable { >> new Thread(new RM()).start(); >> } >> >> public static RM test() { >> return null; >> } >> >> public void run() { >> try { >> MethodHandle m1 = MethodHandles.lookup().findStatic(RM.class, "test", methodType(RM.class)); >> RM v = (RM)m1.invokeExact(); >> } catch (Throwable t) { >> t.printStackTrace(); >> } >> } >> } >> >> The problem is that make_invoke_method only does extra allocation when there are non-bcp methods on the stack and there's no ResourceMark above the run() method of a thread. make_invoke_method just needs a ResourceMark. I filed 7082949 for this. Thanks for the report. > > Thanks for investigating it. > > -- Christian > >> >> tom >> >> On Aug 24, 2011, at 10:54 AM, Mark Roos wrote: >> >>> I do have a test case that repeats the resource mark error. We are trying a new packaging so you should be able to run it >>> without an external connection. 
>>>
>>> Hopefully later today
>>>
>>> mark

From forax at univ-mlv.fr Thu Aug 25 11:06:18 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Thu, 25 Aug 2011 20:06:18 +0200
Subject: Bootstrap arguments
Message-ID: <4E568F1A.9050002@univ-mlv.fr>

Hi John, hi all,
several people (2 actually :) asked me how to use a boolean/byte/short/char as a bootstrap argument.

As you know, you can't, because the constant pool can only encode an int constant, not a boolean/byte/short/char, and because asType() doesn't do narrowing primitive conversion.
I think the overview doc of java.lang.invoke could be updated to add a line about that.

cheers,
Rémi

From john.r.rose at oracle.com Thu Aug 25 16:48:22 2011
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 25 Aug 2011 16:48:22 -0700
Subject: Bootstrap arguments
In-Reply-To: <4E568F1A.9050002@univ-mlv.fr>
References: <4E568F1A.9050002@univ-mlv.fr>
Message-ID: <5E74BCFB-07C5-461D-A7D9-536378ADCEED@oracle.com>

That's a good suggestion; thanks.

-- John

On Aug 25, 2011, at 11:06 AM, Rémi Forax wrote:

> Hi John, hi all,
> several people (2 actually :) ask me how to use a boolean/byte/short/char
> as a bootstrap argument.
>
> As you know, you can't because you can't encode a constant
> boolean/byte/short/char
> in the constant pool but only an int and because asType() doesn't do
> narrowing primitive conversion.
> I think the doc of the overview of java.lang.invoke can be updated to add
> a line about that.
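
[Editor's sketch, not part of the original thread.] The asType() point above can be illustrated with a small, self-contained example. It is only a sketch under stated assumptions: the class NarrowingDemo and the method takesByte are hypothetical names standing in for a target that wants a byte, and no real bootstrap method is involved. It uses MethodHandle.asType and MethodHandles.explicitCastArguments from java.lang.invoke to show that asType() refuses an int-to-byte adaptation, while an explicit cast performs the narrowing.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

final class NarrowingDemo {
    // Hypothetical target that wants a byte (stands in for a byte-typed parameter).
    static String takesByte(byte b) {
        return "byte:" + b;
    }

    // Looks up takesByte as a method handle of type (byte)String.
    static MethodHandle target() {
        try {
            return MethodHandles.lookup().findStatic(
                    NarrowingDemo.class, "takesByte",
                    MethodType.methodType(String.class, byte.class));
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // asType() only applies method invocation conversions, so adapting
    // (byte)String to accept an int is rejected.
    static boolean asTypeRejectsNarrowing() {
        try {
            target().asType(MethodType.methodType(String.class, int.class));
            return false;
        } catch (WrongMethodTypeException expected) {
            return true;
        }
    }

    // explicitCastArguments() applies a casting conversion, which includes
    // narrowing int to byte.
    static String callWithExplicitCast(int value) {
        MethodHandle cast = MethodHandles.explicitCastArguments(
                target(), MethodType.methodType(String.class, int.class));
        try {
            return (String) cast.invokeExact(value);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(asTypeRejectsNarrowing());
        System.out.println(callWithExplicitCast(300));
    }
}
```

Given this behaviour, one plausible workaround (an assumption, not something stated in the thread) is to declare the extra bootstrap parameter as int and narrow it inside the bootstrap method itself.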
From mroos at roos.com Thu Aug 25 17:00:26 2011
From: mroos at roos.com (Mark Roos)
Date: Thu, 25 Aug 2011 17:00:26 -0700
Subject: Finally a test case for the ClassNotFound
Message-ID:

The error I see is that at random times when I am executing Smalltalk on the JVM I get an occasional NoClassDefFoundError during an invokeExact. Usually when I am doing a demo. After lots of trying I have a test which fails every time for me. When it fails the stack depth varies from 1 or 2 invokeExacts to lots. And it's never in the same place or aligned with anything I can see. Not always the same method of mine, for instance.

This test seems to cause the following error on server versions of JDK 7 but not on the Windows client VM. It seems to happen when HotSpot gets aggressive on code which was running. Once it happens it continues on that call site.
Tried on Windows XP, 7 and the recent OSX port. Also on the 8/8 BSD port from Stephen, but it has other errors.

It's at ftp.roos.com user=javadev pw=mlvm RtalkTest.zip. The .gob files are the Smalltalk source code and the libs are optional debug agents (not used by this test).

When run ( java -server -cp RtalkTest.jar ri/experiment2/RtalkTest ) it:
    reads and loads the Smalltalk image from the .gobs
    does an initial Hanoi recursion
    does lots of string handling similar to socket read/writes
    lists the GWT depths
    invalidates the call sites
    runs some stuff again to re-bootstrap the sites
    runs the same Hanoi again.

Takes a minute or so on my Mac. Fails about halfway.

Thrashes pretty well all of the facets I am using and generates some pretty long GWT chains.

Other than the trap, it's interesting that the second Hanoi is much slower than the first.
Let me know how I can help regards mark java.lang.NoClassDefFoundError: ri/core/rtalk/RtObject at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at 
java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at rtPbc.r204.block$2(RiRmtViewCode addClasses:toList:indent::62) at ri.core.rtalk.RtPrimitives.primBlockInvokeWithWith(RtPrimitives.java:1178) at rtPbc.r111.invoke(RtBlock2Arg value:value:) at rtPbc.r101.invoke(SortedCollection sort:to::103) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at rtPbc.r100.invoke(SortedCollection reSort:4) at rtPbc.r103.invoke(SortedCollection addAll::13) at rtPbc.r205.invoke(Collection asSortedCollection::14) at rtPbc.r204.invoke(RiRmtViewCode addClasses:toList:indent::72) at rtPbc.r204.block$1(RiRmtViewCode addClasses:toList:indent::47) at ri.core.rtalk.RtPrimitives.primBlockInvokeWith(RtPrimitives.java:1172) at rtPbc.r46.invoke(RtBlock1Arg value:) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at rtPbc.r45.invoke(OrderedCollection do::14) at rtPbc.r204.invoke(RiRmtViewCode addClasses:toList:indent::73) at rtPbc.r184.invoke(RiRmtViewCode classList:48) at rtPbc.r161.invoke(RiRmtViewCode viewerDefBrowserOn::501) at rtPbc.r147.invoke(RiRmtViewCode subscribe::53) at rtPbc.r146.invoke(RiRmtViewCode handleSystemMessage::28) at rtPbc.r85.block$3(RiSystemMessageManager handleMessage::78) at ri.core.rtalk.RtPrimitives.primBlockInvokeWith(RtPrimitives.java:1172) at rtPbc.r46.invoke(RtBlock1Arg value:) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) at rtPbc.r45.invoke(OrderedCollection do::14) at rtPbc.r85.invoke(RiSystemMessageManager handleMessage::85) at rtPbc.r5.invoke(RiSystemMessageManager 
class receiveMessage::10) at ri.core.rtalk.RtCallSite.sendSmToRtalk(RtCallSite.java:456) at ri.experiment2.RtalkTest.testSM(RtalkTest.java:205) at ri.experiment2.RtalkTest.testImageLoadFile(RtalkTest.java:138) at ri.experiment2.RtalkTest.main(RtalkTest.java:25)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110825/7638c3aa/attachment.html

From jeroen at sumatra.nl Thu Aug 25 23:23:59 2011
From: jeroen at sumatra.nl (Jeroen Frijters)
Date: Fri, 26 Aug 2011 06:23:59 +0000
Subject: Bootstrap arguments
In-Reply-To: <4E568F1A.9050002@univ-mlv.fr>
References: <4E568F1A.9050002@univ-mlv.fr>
Message-ID:

Hi all,

I was surprised by this as well (from an implementer's point of view), because the use of asType is an implementation detail. Normally when you call a method taking a boolean/byte/short/char you also load an int onto the stack, so why would this case be any different?

Not that it makes a great deal of difference, of course.

Regards,
Jeroen

> -----Original Message-----
> From: mlvm-dev-bounces at openjdk.java.net [mailto:mlvm-dev-bounces at openjdk.java.net] On Behalf Of Rémi Forax
> Sent: Thursday, August 25, 2011 20:06
> To: Da Vinci Machine Project
> Subject: Bootstrap arguments
>
> Hi John, hi all,
> several people (2 actually :) ask me how to use a
> boolean/byte/short/char as a bootstrap argument.
>
> As you know, you can't because you can't encode a constant
> boolean/byte/short/char in the constant pool but only an int and because
> asType() doesn't do narrowing primitive conversion.
> I think the doc of the overview of java.lang.invoke can be updated to
> add a line about that.
>
> cheers,
> Rémi

From john.r.rose at oracle.com Thu Aug 25 23:30:58 2011
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 25 Aug 2011 23:30:58 -0700
Subject: Bootstrap arguments
In-Reply-To:
References: <4E568F1A.9050002@univ-mlv.fr>
Message-ID: <32E8D882-7C7B-4765-B1FD-A7CB0E361A43@oracle.com>

On Aug 25, 2011, at 11:23 PM, Jeroen Frijters wrote:

> I was surprised by this as well (from an implementers point of view), because the use of asType is an implementation detail. Normally when you call a method taking a boolean/byte/short/char you also load an int onto the stack, so why would this case be any different?

When you call a function of type, say, (B), you are promising that the 32-bit int you loaded onto the stack fits into the declared subrange, say, -128..127.

(This is a little-known invariant of the verifier. Although all primitive arguments and return values are passed in 32 and 64 bit containers, some of the 32-bit values are required/guaranteed to be constrained to a subrange of values.)

So if you start with an arbitrary int value (as an ldc produces) you need an extra dynamic operation to force the value into the required subrange.

-- John
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110825/4037de55/attachment-0001.html From jeroen at sumatra.nl Thu Aug 25 23:49:55 2011 From: jeroen at sumatra.nl (Jeroen Frijters) Date: Fri, 26 Aug 2011 06:49:55 +0000 Subject: Bootstrap arguments In-Reply-To: <32E8D882-7C7B-4765-B1FD-A7CB0E361A43@oracle.com> References: <4E568F1A.9050002@univ-mlv.fr> <32E8D882-7C7B-4765-B1FD-A7CB0E361A43@oracle.com> Message-ID: John Rose wrote: > When you call a function of type, say, (B), you are promising that the > 32-bit int you loaded onto the stack fits into the declared subrange, > say, -128..127. > > (This is a little-known invariant of the verifier. Although all > primitive arguments and return values are passed in 32 and 64 bit > containers, some of the 32-bit values are required/guaranteed to be > constrained to a subrange of values.) I don't think so. See the attached class. It violates both the boolean and byte constraints ("\jdk1.7\bin\java -Xverify test" runs without complaints). Regards, Jeroen -------------- next part -------------- A non-text attachment was scrubbed... Name: test.class Type: application/x-java-applet Size: 786 bytes Desc: test.class Url : http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110826/255b1f45/attachment.bin From jeroen at sumatra.nl Fri Aug 26 02:21:52 2011 From: jeroen at sumatra.nl (Jeroen Frijters) Date: Fri, 26 Aug 2011 09:21:52 +0000 Subject: Finally a test case for the ClassNotFound In-Reply-To: References: Message-ID: Mark Roos wrote: > The error I see is that at random times when I am executing Smalltalk on > jvm I get occasional a ClassDefNotFound during an InvokeExact. Usually > when I am doing a demo. After lots of trying I have a test which fails > every time for me. When it fails the stack depth varies from 1 or 2 > invokeExacts to lots. And its never in the same place or aligned with > anything I can see. Not allways the same method of mine for instance. 
I took the liberty of downloading it as well, to test on IKVM.NET :-) It found a couple of bugs, so thank you! BTW, I can also reproduce the Hotspot Server issue (on Windows x64). > Other than the trap its interesting that the second Hanoi is much slower > than the first. On IKVM.NET both are equally slow (about 20 seconds, vs. 2.7 and 15 on HotSpot). Regards, Jeroen From christian.thalinger at oracle.com Fri Aug 26 04:48:15 2011 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Fri, 26 Aug 2011 13:48:15 +0200 Subject: Finally a test case for the ClassNotFound In-Reply-To: References: Message-ID: I can confirm the NCDFE. This could be a duplicate of: 7055941: JSR 292 method handle invocation causes excessive deoptimization for types not on boot class path The workaround is to put the runtime on the boot class path. I also can confirm that the second run of Hanoi is slower with JDK 7. This is the call site invalidation. A recent version of HotSpot runs both Hanois at the same speed. -- Christian On Aug 26, 2011, at 2:00 AM, Mark Roos wrote: > The error I see is that at random times when I am executing Smalltalk on jvm I get occasional a ClassDefNotFound > during an InvokeExact. Usually when I am doing a demo. After lots of trying I have a test which fails every time > for me. When it fails the stack depth varies from 1 or 2 invokeExacts to lots. And its never in the same place or > aligned with anything I can see. Not allways the same method of mine for instance. > > This test seems to cause the following error on server versions of JDK 7 but not windows client. And seems to happen > when hotspot gets aggressive on code which was running. Once it happens it continues on that call site. > Tried on Windows XP, 7 and recent OSX port. Also on the 8/8 Bsd Port from Stephen but it has other > errors. > > Its at ftp.roos.com user=javadev pw=mlvm RtalkTest.zip. 
The .gob files are the smalltalk source code > and the libs are optional debug agents ( not used by this test). > > when run ( java -server -cp RtalkTest.jar ri/experiment2/RtalkTest ) it: > reads and loads the smalltalk image from the .gobs > does an initial Hanoi recursion > does lots of string handling similar to socket rad writes > lists the GWT depths > invalidates the call sites > runs some stuff again to re bootstrap the sites > runs the same Hanoi again. > > Takes a minute or so on my mac. Fails about half way > > Thrashes pretty well all of the facets I am using and generates some pretty long GWT chains. > > Other than the trap its interesting that the second Hanoi is much slower than the first. > > Let me know how I can help > > regards > mark > > java.lang.NoClassDefFoundError: ri/core/rtalk/RtObject > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at 
java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at rtPbc.r204.block$2(RiRmtViewCode addClasses:toList:indent::62) > at ri.core.rtalk.RtPrimitives.primBlockInvokeWithWith(RtPrimitives.java:1178) > at rtPbc.r111.invoke(RtBlock2Arg value:value:) > at rtPbc.r101.invoke(SortedCollection sort:to::103) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at rtPbc.r100.invoke(SortedCollection reSort:4) > at rtPbc.r103.invoke(SortedCollection addAll::13) > at rtPbc.r205.invoke(Collection asSortedCollection::14) > at rtPbc.r204.invoke(RiRmtViewCode addClasses:toList:indent::72) > at rtPbc.r204.block$1(RiRmtViewCode addClasses:toList:indent::47) > at ri.core.rtalk.RtPrimitives.primBlockInvokeWith(RtPrimitives.java:1172) > at rtPbc.r46.invoke(RtBlock1Arg value:) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at 
java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at rtPbc.r45.invoke(OrderedCollection do::14) > at rtPbc.r204.invoke(RiRmtViewCode addClasses:toList:indent::73) > at rtPbc.r184.invoke(RiRmtViewCode classList:48) > at rtPbc.r161.invoke(RiRmtViewCode viewerDefBrowserOn::501) > at rtPbc.r147.invoke(RiRmtViewCode subscribe::53) > at rtPbc.r146.invoke(RiRmtViewCode handleSystemMessage::28) > at rtPbc.r85.block$3(RiSystemMessageManager handleMessage::78) > at ri.core.rtalk.RtPrimitives.primBlockInvokeWith(RtPrimitives.java:1172) > at rtPbc.r46.invoke(RtBlock1Arg value:) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at java.lang.invoke.MethodHandle.invokeExact(MethodHandle.java) > at rtPbc.r45.invoke(OrderedCollection do::14) > at rtPbc.r85.invoke(RiSystemMessageManager handleMessage::85) > at rtPbc.r5.invoke(RiSystemMessageManager class receiveMessage::10) > at ri.core.rtalk.RtCallSite.sendSmToRtalk(RtCallSite.java:456) > at ri.experiment2.RtalkTest.testSM(RtalkTest.java:205) > at ri.experiment2.RtalkTest.testImageLoadFile(RtalkTest.java:138) > at ri.experiment2.RtalkTest.main(RtalkTest.java:25) > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110826/4509f11f/attachment-0001.html From mroos at roos.com Fri Aug 26 12:24:59 2011 From: mroos at roos.com (Mark Roos) Date: Fri, 26 Aug 2011 12:24:59 -0700 Subject: Finally a test case for the ClassNotFound In-Reply-To: References: Message-ID: Thanks for the quick look. (glad to see it ran for you) By the 'runtime' to add to the bootclass path do you mean rt.jar or my runtime (RtalkTest.jar)? mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110826/86bdabd6/attachment.html

From forax at univ-mlv.fr Fri Aug 26 13:14:17 2011
From: forax at univ-mlv.fr (Rémi Forax)
Date: Fri, 26 Aug 2011 22:14:17 +0200
Subject: Finally a test case for the ClassNotFound
In-Reply-To:
References:
Message-ID: <4E57FE99.3010700@univ-mlv.fr>

On 08/26/2011 09:24 PM, Mark Roos wrote:
> Thanks for the quick look. (glad to see it ran for you)
>
> By the 'runtime' to add to the bootclass path do you mean rt.jar or my
> runtime (RtalkTest.jar)?

Your runtime. rt.jar is already in the bootstrap path.

>
> mark

cheers,
Rémi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110826/04f46969/attachment.html

From mroos at roos.com Fri Aug 26 13:25:04 2011
From: mroos at roos.com (Mark Roos)
Date: Fri, 26 Aug 2011 13:25:04 -0700
Subject: Finally a test case for the ClassNotFound
In-Reply-To:
References:
Message-ID:

From Jeroen:
> I took the liberty of downloading it as well, to test on IKVM.NET :-)

Glad it helped

mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110826/112bbb42/attachment.html

From mroos at roos.com Fri Aug 26 13:25:05 2011
From: mroos at roos.com (Mark Roos)
Date: Fri, 26 Aug 2011 13:25:05 -0700
Subject: Finally a test case for the ClassNotFound
In-Reply-To:
References:
Message-ID:

I put my jar on the boot class path and the problem goes away on the OSX version.

Thanks

mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110826/b92991fb/attachment.html

From mroos at roos.com Fri Aug 26 15:25:17 2011
From: mroos at roos.com (Mark Roos)
Date: Fri, 26 Aug 2011 15:25:17 -0700
Subject: Finally a test case for the ClassNotFound
In-Reply-To: <4E57FE99.3010700@univ-mlv.fr>
References: <4E57FE99.3010700@univ-mlv.fr>
Message-ID:

Thanks Rémi

I tried to put just the offending class on the boot path but that was not enough. The entire app worked, so I guess the solution is somewhere in between.

mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110826/14d9c4fc/attachment.html

From cremes.devlist at mac.com Sat Aug 27 13:17:33 2011
From: cremes.devlist at mac.com (Chuck Remes)
Date: Sat, 27 Aug 2011 15:17:33 -0500
Subject: plea for a Windows build
Message-ID: <17EE0EA3-2957-4D85-B674-19A36F174A08@mac.com>

I'd like to help Charlie Nutter out with testing some of the new invokedynamic support for the JVM on Windows. I hear that the latest fixes (which folks on this list have produced) are *not* part of the openjdk repository and are stored elsewhere.

While researching the steps necessary to build the JVM for myself on Windows, I quickly saw that it was going to be beyond my capability in the short term. The Cygwin dependency isn't such a big deal, but many of the issues that cause the build to fail (having to do with freetype) seemed rather difficult to work around. Just doing a build of the normal openjdk code is difficult enough without also throwing external patches into the mix which may or may not apply cleanly.

So, I am hoping I can get help from someone on this list. I assume someone is building and testing on Windows.
If they are willing to share their 32-bit and 64-bit builds with me, I am more than willing to run some of my "real world" Ruby code under JRuby and help Charlie vet the correctness and performance of the latest fixes. If such a build already exists and is publicly available, please point me to it (my Google searches have been fruitless).

Please respond to me either here on the list or privately. Thank you.

cr

From dalibor.topic at oracle.com Sat Aug 27 14:11:43 2011
From: dalibor.topic at oracle.com (Dalibor Topic)
Date: Sat, 27 Aug 2011 23:11:43 +0200
Subject: plea for a Windows build
In-Reply-To: <17EE0EA3-2957-4D85-B674-19A36F174A08@mac.com>
References: <17EE0EA3-2957-4D85-B674-19A36F174A08@mac.com>
Message-ID: <4E595D8F.80608@oracle.com>

On 8/27/11 10:17 PM, Chuck Remes wrote:
> I hear that the latest fixes (which folks on this list have produced) are *not* part of the openjdk repository and are stored elsewhere.

You likely heard wrong.

If you want the bleeding edge of HotSpot, see for example https://twitter.com/#!/headius/status/104630959068020736 for a pointer to build instructions.

cheers,
dalibor topic
--
Oracle
Dalibor Topic | Java F/OSS Ambassador
Phone: +494023646738 | Mobile: +491772664192
Oracle Java Platform Group

ORACLE Deutschland B.V. & Co. KG | Nagelsweg 55 | 20097 Hamburg

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr.
30143697
Geschäftsführer: Jürgen Kunz, Marcel van de Molen, Alexander van der Ven

Green Oracle
Oracle is committed to developing practices and products that help protect the environment

From mroos at roos.com Sat Aug 27 15:44:59 2011
From: mroos at roos.com (Mark Roos)
Date: Sat, 27 Aug 2011 15:44:59 -0700
Subject: Building trunk hotspot-comp
In-Reply-To:
References:
Message-ID:

Charles

Does this script get all of the latest mlvm patches or is there another step to pull those?

What about the conditional stuff like coro?

thanks

mark

mlvm-dev-bounces at openjdk.java.net wrote on 08/19/2011 12:12:28 PM:

> From: Charles Oliver Nutter
> To: Da Vinci Machine Project
> Date: 08/19/2011 12:21 PM
> Subject: Building trunk hotspot-comp
> Sent by: mlvm-dev-bounces at openjdk.java.net
>
> I've posted a gist with a script and instructions for setting up a
> trunk build environment for the Hotspot compiler team's branch:
>
> https://gist.github.com/1148321
>
> Hotspot guys: let me know if anything else needs to be tweaked here.
> It works for me, and it's surprisingly easy to build.
>
> I'll also try to periodically post builds with in-flight patches like
> the recent invokedynamic optimizations. For now the i586 build (with
> everything 292-related up through Christian's inlining discount patch)
> is available at http://jruby.org/jdk.
>
> Linux only right now. I wish bsd-port would get merged in!
>
> - Charlie
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110827/26dae0b3/attachment.html

From mroos at roos.com Mon Aug 29 00:00:07 2011
From: mroos at roos.com (Mark Roos)
Date: Mon, 29 Aug 2011 00:00:07 -0700
Subject: Debugging a dynamic language on the JVM, some comments
Message-ID:

Things are looking pretty good now that my implementation is stable (thanks for the inputs, Christian). It turns out debugging in Smalltalk is pretty nice (IMHO), as it's a live debugger inside a live application. To accomplish this only a few features are required. For the heap we need to get all instances of a class and all references to an object. For a thread we need to suspend, step and resume. And for the stack we need to inspect and possibly modify stack vars.

While one could argue that the current jdb is just as nice, my goal is to do a full implementation of Smalltalk on the JVM, so I wanted to give it a try. Here are some notes on how it went.

It seemed like the best way to start was to use the JVMTI API provided by the JVM. It appears to have the capabilities required, with the only question being whether we could access it from inside the same process being debugged. To test this we (Oscar) created a C agent which on one side hooks into the JVM and on the other exposes a JNI interface back to code on the same JVM. It's about 300 lines of C++.

Works great (with the caveats below). We have it on Windows and OSX doing everything except the stepping.

So what are the caveats? None are show stoppers for me but they may be of interest to others.

1. JSR 292 broken? features of JVMTI.
Stepping seems to have issues when it hits a stack frame with an invokedynamic instruction. Even so, the callback model seems like it could be complex to support. I plan to try implementing a step operation using the call site logic instead. Also complex, perhaps.
Heap walking, when asked to filter on a class, also returns BoundMethodHandles and primitive arrays of the class.
I solved this via a multipass tagging approach.

2. Deprecated features
Suspend and resume are deprecated but are needed to stop a thread to debug it. But they still work, of course.

3. The need for an environment-specific C agent seems unnecessary and makes it not "run anywhere".
For instance, one can get stack frames now but not the vars via high-level Java. We will do a Linux version as well so all of our needs are covered. I will be glad to be done with multiplatform C.

4. Everything slows down when the agent is attached, even if not used.
About 40% slower. It would be nice to have it available (some folks use all instances in coding) but we can run without it.

In summary, everything is pretty much there to support a Smalltalk-like debugger. If anyone would like to see the C code, let me know.

regards
mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110829/c3bfaca0/attachment.html

From christian.thalinger at oracle.com Mon Aug 29 01:00:32 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Mon, 29 Aug 2011 01:00:32 -0700 (PDT)
Subject: Building trunk hotspot-comp
In-Reply-To:
References:
Message-ID:

No. That script just builds HotSpot as is in the hotspot-comp repository (no patches applied).

-- Christian

On Aug 28, 2011, at 12:44 AM, Mark Roos wrote:

> Charles
>
> Does this script get all of the latest mlvm patches or is there another step to pull those?
>
> What about the conditional stuff like coro?
>
> thanks
>
> mark
>
> mlvm-dev-bounces at openjdk.java.net wrote on 08/19/2011 12:12:28 PM:
>
> > From: Charles Oliver Nutter
> > To: Da Vinci Machine Project
> > Date: 08/19/2011 12:21 PM
> > Subject: Building trunk hotspot-comp
> > Sent by: mlvm-dev-bounces at openjdk.java.net
> >
> > I've posted a gist with a script and instructions for setting up a
> > trunk build environment for the Hotspot compiler team's branch:
> >
> > https://gist.github.com/1148321
> >
> > Hotspot guys: let me know if anything else needs to be tweaked here.
> > It works for me, and it's surprisingly easy to build.
> >
> > I'll also try to periodically post builds with in-flight patches like
> > the recent invokedynamic optimizations. For now the i586 build (with
> > everything 292-related up through Christian's inlining discount patch)
> > is available at http://jruby.org/jdk.
> >
> > Linux only right now. I wish bsd-port would get merged in!
> >
> > - Charlie
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110829/840cdf56/attachment.html

From mroos at roos.com Mon Aug 29 11:50:49 2011
From: mroos at roos.com (Mark Roos)
Date: Mon, 29 Aug 2011 11:50:49 -0700
Subject: Building trunk hotspot-comp
In-Reply-To:
References:
Message-ID:

Thx Christian,

Are there any efforts to update the build instructions on the wiki? It seems as if we no longer need to do the full JDK build, only the hotspot part, in order to test mlvm updates. It also seems like you are close to integrating the OSX/BSD requirements into the standard hotspot build files, making that a standard compile.
It does seem like there would be more interested folks if there were cookbook-like instructions to take an existing JDK build and add the mlvm patches and options to it. We are interested in Windows, OSX and (later) Suse versions and would be willing to act as a newbie testing the recipes or trying out suggestions.

regards
mark

From littlee at linux.vnet.ibm.com Tue Aug 30 00:58:06 2011
From: littlee at linux.vnet.ibm.com (Charles Lee)
Date: Tue, 30 Aug 2011 15:58:06 +0800
Subject: plea for a Windows build
In-Reply-To: <17EE0EA3-2957-4D85-B674-19A36F174A08@mac.com>
References: <17EE0EA3-2957-4D85-B674-19A36F174A08@mac.com>
Message-ID: <4E5C980E.90204@linux.vnet.ibm.com>

On 08/28/2011 04:17 AM, Chuck Remes wrote:
> I'd like to help Charlie Nutter out with testing some of the new invokedynamic support for the JVM on Windows. I hear that the latest fixes (which folks on this list have produced) are *not* part of the openjdk repository and are stored elsewhere.
>
> While researching the steps necessary to build the JVM for myself on Windows, I quickly saw that it was going to be beyond my capability in the short term. The Cygwin dependency isn't such a big deal, but many of the issues that cause the build to fail (having to do with freetype) seemed rather difficult to work around. Just doing a build of the normal openjdk code is difficult enough without also throwing external patches into the mix which may or may not apply cleanly.
>
> So, I am hoping I can get help from someone on this list. I assume someone is building and testing on Windows. If they are willing to share their 32-bit and 64-bit builds with me, I am more than willing to run some of my "real world" Ruby code under JRuby and help Charlie vet the correctness and performance of the latest fixes.
> If such a build already exists and is publicly available, please point me to it (my Google searches have been fruitless).
>
> Please respond to me either here on the list or privately. Thank you.
>
> cr

Hi Chuck,

Could you please share your problems with building on Windows? Are they mlvm-specific problems?

--
Yours
Charles

From fcassia at gmail.com Tue Aug 30 01:13:42 2011
From: fcassia at gmail.com (Fernando Cassia)
Date: Tue, 30 Aug 2011 05:13:42 -0300
Subject: plea for a Windows build
In-Reply-To: <4E595D8F.80608@oracle.com>
References: <17EE0EA3-2957-4D85-B674-19A36F174A08@mac.com> <4E595D8F.80608@oracle.com>
Message-ID:

On Sat, Aug 27, 2011 at 18:11, Dalibor Topic wrote:
> If you want the bleeding edge of hotspot

Speaking of bleeding edge. Is there an ETA for Java 7 update 1? You know, to shut up all those "Java 7 is broken, don't use it" naysayers...

TIA...
FC

From mlists at juma.me.uk Tue Aug 30 01:31:46 2011
From: mlists at juma.me.uk (Ismael Juma)
Date: Tue, 30 Aug 2011 09:31:46 +0100
Subject: plea for a Windows build
In-Reply-To:
References: <17EE0EA3-2957-4D85-B674-19A36F174A08@mac.com> <4E595D8F.80608@oracle.com>
Message-ID:

On Tue, Aug 30, 2011 at 9:13 AM, Fernando Cassia wrote:
> Speaking of bleeding edge. Is there an ETA for Java 7 update 1? You
> know, to shut up all those "Java 7 is broken, don't use it"
> naysayers...

I read in an Eclipse status update that it would be out in October (Eclipse uses Lucene for its Help component and suffers from the highly publicised bug). Not sure if that is the official position though.
Meanwhile you can find a "non-broken" Java 7 build at the following link (it has been confirmed by a Lucene committer that it fixes the problem[1]):

http://jdk7.java.net/download.html

Best,
Ismael

[1] https://twitter.com/#!/thetaph1/status/108359143374594049

From fcassia at gmail.com Tue Aug 30 01:39:58 2011
From: fcassia at gmail.com (Fernando Cassia)
Date: Tue, 30 Aug 2011 05:39:58 -0300
Subject: plea for a Windows build
In-Reply-To:
References: <17EE0EA3-2957-4D85-B674-19A36F174A08@mac.com> <4E595D8F.80608@oracle.com>
Message-ID:

On Tue, Aug 30, 2011 at 05:31, Ismael Juma wrote:
> I read in an Eclipse status update that it would be out in October
> (Eclipse uses Lucene for its Help component and suffers from the
> highly publicised bug). Not sure if that is the official position
> though.

Thanks for the link, Ismael. The "official position" and ETA is what I was looking for; that's why I quoted Dalibor's message... Since Dalibor stopped answering my e-mails (company policy? who knows, my mails were always off-the-record and more from a Java fan standpoint), I'm forced to ask him publicly on mailing lists where he participates. :)

FC

From mlists at juma.me.uk Tue Aug 30 01:43:31 2011
From: mlists at juma.me.uk (Ismael Juma)
Date: Tue, 30 Aug 2011 09:43:31 +0100
Subject: plea for a Windows build
In-Reply-To:
References: <17EE0EA3-2957-4D85-B674-19A36F174A08@mac.com> <4E595D8F.80608@oracle.com>
Message-ID:

On Tue, Aug 30, 2011 at 9:39 AM, Fernando Cassia wrote:
> Thanks for the link, Ismael. The "official position" and
> ETA is what I was looking for

Sure. For whatever it's worth, I would imagine that the Eclipse people got their information from an Oracle person. So, while you wait for answers, you have an idea.
Best,
Ismael

From christian.thalinger at oracle.com Wed Aug 31 07:08:39 2011
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 31 Aug 2011 16:08:39 +0200
Subject: RtalkTest examination
Message-ID: <675EDB5A-6D04-4619-A271-1DA71724B787@oracle.com>

Tom was looking into a bug with his bimorphic inlining patch and said something about massive inlining with RtalkTest, so I thought I'd also give it a shot.

One thing I can say for sure: massive inlining and huge methods! We get a lot like:

    @ 115 java.lang.invoke.MethodHandle::asSpreader (13 bytes) NodeCountInliningCutoff
    @ 60 ri.core.rtalk.RtFixedObjects::getNil (5 bytes) size > DesiredMethodLimit

I think the thing Tom saw was a method like e.g.:

    79216 390 rtPbc.r184::block$1 (49 bytes)

It seems it does recursive inlining of this pattern:

    @ 3 java.lang.invoke.MethodHandle::invokeExact (7 bytes) inline (hot)
    @ 3 ri.core.rtalk.RtCallSite::test (16 bytes) inline (hot)
    @ 1 ri.core.rtalk.RtObject::classField (5 bytes) inline (hot)
    @ 10 java.lang.invoke.MethodHandle::invokeExact (12 bytes) inline (hot)
    @ 5 java.lang.invoke.MethodHandleImpl::selectAlternative (10 bytes) inline (hot)
    @ 23 java.lang.invoke.MethodHandle::invokeExact (30 bytes) inline (hot)
    @ 23 java.lang.invoke.MethodHandle::invokeExact (8 bytes) inline (hot)
    @ 1 rtPbc.r192::invoke (10 bytes) inline (hot)
    @ 4 ri.core.rtalk.RtFixedObjects::getTrue (5 bytes) inline (hot)
    @ 1 ri.core.rtalk.RtFixedObjects::getFixedAsRtObject (8 bytes) inline (hot)
    @ 1 ri.core.rtalk.RtFixedObjects::getFixedObject (8 bytes) inline (hot)
    @ 4 ri.core.container.OrderedMap::getValueAt (9 bytes) inline (hot)
    @ 5 java.util.ArrayList::get (11 bytes) inline (hot)
    @ 2 java.util.ArrayList::rangeCheck (22 bytes) inline (hot)
    @ 7 java.util.ArrayList::elementData (7 bytes) inline (hot)

Is that your GWT chains? If so, these are either a little too long or we have a bug somewhere.
-- Christian

From mroos at roos.com Wed Aug 31 12:23:40 2011
From: mroos at roos.com (Mark Roos)
Date: Wed, 31 Aug 2011 12:23:40 -0700
Subject: RtalkTest examination
In-Reply-To: <675EDB5A-6D04-4619-A271-1DA71724B787@oracle.com>
References: <675EDB5A-6D04-4619-A271-1DA71724B787@oracle.com>
Message-ID:

Very interesting, I see that I need a hint for the class/method name (it is the source file name but that doesn't print here).

getNil/true and false are currently a probe into a hash map but should be a static var. I can see that it could get pretty large to inline. There are some other objects as well that get used a lot as constants shared between Java and Rtalk.

The deep GWTs are rare normally but appear here due to a method like

    Object allSubclasses do:[ :o | o name].

There are about 140 classes, so the name call site will have a GWT depth of 140, and [] resolves to a Java method like block$n. You do see it a lot as I needed it to get the class def not found error.

If you look at the depth list (middle of the console print) you see all of the sites with a depth > 2, and it's only a few out of the 1000.

I'll tweak the code and put up another example.

For Tom, would it be better if I had a test that had only a few GWTs per site?

thanks
mark

From mroos at roos.com Wed Aug 31 12:23:41 2011
From: mroos at roos.com (Mark Roos)
Date: Wed, 31 Aug 2011 12:23:41 -0700
Subject: RtalkTest examination
In-Reply-To: <675EDB5A-6D04-4619-A271-1DA71724B787@oracle.com>
References: <675EDB5A-6D04-4619-A271-1DA71724B787@oracle.com>
Message-ID:

Is that your GWT chains? If so, these are either a little too long or we have a bug somewhere.
I don't think so (I think this is a constant lookup). The test part of the GWT looks like

    public static boolean test(RtObject oldClass, RtObject receiver) {
        //System.out.println("test");
        //if(rtn) {RtCallSite.sends = RtCallSite.sends + 1;
        //}
        //if(rtn == false){
        //    System.out.println("fail");
        //}
        //System.out.println(receiver);
        //System.out.println(oldClass.getCfClassName());
        boolean rtn = receiver.classField() == oldClass;
        return rtn;
    }

mark

From tom.rodriguez at oracle.com Wed Aug 31 12:55:05 2011
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 31 Aug 2011 12:55:05 -0700
Subject: RtalkTest examination
In-Reply-To:
References: <675EDB5A-6D04-4619-A271-1DA71724B787@oracle.com>
Message-ID:

On Aug 31, 2011, at 12:23 PM, Mark Roos wrote:

> Very interesting, I see that I need a hint for the class/method name (it is the source file name but that doesn't print here).
>
> getNil/true and false are currently a probe into a hash map but should be a static var. I can
> see that it could get pretty large to inline. There are some other objects as well that get used
> a lot as constants shared between Java and Rtalk.
>
> The deep GWTs are rare normally but appear here due to a method like
>
>     Object allSubclasses do:[ :o | o name].
>
> There are about 140 classes, so the name call site will have a GWT depth of 140,
> and [] resolves to a Java method like block$n. You do see it a lot as I needed it to get
> the class def not found error.
>
> If you look at the depth list (middle of the console print) you see all of the sites with a depth > 2, and it's
> only a few out of the 1000.
>
> I'll tweak the code and put up another example.
>
> For Tom, would it be better if I had a test that had only a few GWTs per site?

I got what I needed out of it but I think you might want to reconsider this structure.
Using GWT in a chain that long isn't really going to perform that well, even if it does get compiled and inlined.

tom

>
> thanks
>
> mark

From mroos at roos.com Wed Aug 31 14:23:53 2011
From: mroos at roos.com (Mark Roos)
Date: Wed, 31 Aug 2011 14:23:53 -0700
Subject: RtalkTest examination
In-Reply-To:
References: <675EDB5A-6D04-4619-A271-1DA71724B787@oracle.com>
Message-ID:

From Tom:

I got what I needed out of it but I think you might want to reconsider this structure. Using GWT in a chain that long isn't really going to perform that well, even if it does get compiled and inlined.

I have thought of that (Charles converts from GWT to vTable at 5 GWTs deep) but it's very rare to see more than 2 in real code, so the pragmatic part of me says let's see how often it shows up before I 'solve' it.

An interesting note is that even though there are 140 GWTs, they all point to the same methodHandle for the execution path. This happens a lot due to class hierarchy, where the code is shared even though the classes are different. I have been looking at a solution to that as the real solution. Effectively all methods of Object are usually shared code. So what I may want to do is test for the unique classes, not the matching class.

Thanks, the insight on the isNil was very helpful.

mark
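
For readers following the thread, here is a minimal sketch of the kind of guardWithTest (GWT) chain being discussed, written against the standard java.lang.invoke API. It is not Rtalk's code: the class GwtChainSketch and its methods (test, nameOf, miss, buildChain, dispatch) are hypothetical names for illustration, and a plain java.lang.Class comparison stands in for the RtObject classField() check. Every link shares the same target handle, which mirrors the observation above that many receiver classes end up at the same method.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class GwtChainSketch {
    // Guard: true when the receiver's class is the one cached for this link.
    // (Assumption: stands in for the RtCallSite.test class-field comparison.)
    static boolean test(Class<?> cached, Object receiver) {
        return receiver.getClass() == cached;
    }

    // Shared target used by every link in the chain.
    static String nameOf(Object receiver) {
        return receiver.getClass().getSimpleName();
    }

    // Final fallback, reached only when no guard in the chain matches.
    static String miss(Object receiver) {
        return "miss";
    }

    // Builds one guardWithTest link per class; the chain depth equals classes.length.
    static MethodHandle buildChain(Class<?>... classes) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType objToString = MethodType.methodType(String.class, Object.class);
            MethodHandle test = lookup.findStatic(GwtChainSketch.class, "test",
                    MethodType.methodType(boolean.class, Class.class, Object.class));
            MethodHandle target = lookup.findStatic(GwtChainSketch.class, "nameOf", objToString);
            MethodHandle chain = lookup.findStatic(GwtChainSketch.class, "miss", objToString);
            for (Class<?> c : classes) {
                // Bind the cached class, leaving a guard of type (Object)boolean.
                MethodHandle guard = MethodHandles.insertArguments(test, 0, c);
                // The new link tests first and falls through to the older chain.
                chain = MethodHandles.guardWithTest(guard, target, chain);
            }
            return chain;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes the chain; in the worst case every guard is evaluated in turn.
    static String dispatch(MethodHandle chain, Object receiver) {
        try {
            return (String) chain.invokeExact(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle chain = buildChain(String.class, Integer.class, Long.class);
        System.out.println(dispatch(chain, "x"));
        System.out.println(dispatch(chain, 1));
        System.out.println(dispatch(chain, 2.0));
    }
}
```

With 140 classes passed to buildChain, a receiver of the first class added sits behind 139 failing guards, which is the cost Tom is pointing at. Because all links share one target here, a single guard that tests membership in a set of classes would be one way to collapse the chain, though whether that suits Rtalk is an open question in the thread.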