From christian.thalinger at oracle.com Tue May 1 12:24:41 2012 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Tue, 01 May 2012 19:24:41 +0000 Subject: hg: mlvm/mlvm/jdk: meth-lazy: moved to ASM for bytecode generation Message-ID: <20120501192442.244F3470C3@hg.openjdk.java.net> Changeset: 5ec0f6472b4b Author: twisti Date: 2012-05-01 12:24 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/5ec0f6472b4b meth-lazy: moved to ASM for bytecode generation ! meth-lazy-7023639.patch From forax at univ-mlv.fr Tue May 1 14:38:22 2012 From: forax at univ-mlv.fr (Rémi Forax) Date: Tue, 01 May 2012 23:38:22 +0200 Subject: hg: mlvm/mlvm/jdk: meth-lazy: moved to ASM for bytecode generation In-Reply-To: <20120501192442.244F3470C3@hg.openjdk.java.net> References: <20120501192442.244F3470C3@hg.openjdk.java.net> Message-ID: <4FA057CE.2050406@univ-mlv.fr> On 05/01/2012 09:24 PM, christian.thalinger at oracle.com wrote: > Changeset: 5ec0f6472b4b > Author: twisti > Date: 2012-05-01 12:24 -0700 > URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/5ec0f6472b4b > > meth-lazy: moved to ASM for bytecode generation > > ! meth-lazy-7023639.patch > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev Hi Christian, in emitIconstInsn, you forgot ICONST_M1, also ICONST_M1, ICONST_0, ICONST_1, etc. are consecutive values so you can write: switch(i) { case -1: case 0: case 1: case 2: case 3: case 4: case 5: opcode = Opcodes.ICONST_0 + i; ... } For emitLoadInsn, emitStoreInsn and emitReturnType, you can use the same switch that maps 'I', 'J', 'F', 'D', 'A', 'V' to 0...5, because the ISTORE, ILOAD and IRETURN variants are all in the same order. Also nextEIID and nextNFID can use an AtomicInteger and its method getAndIncrement instead of a synchronized method, it should be more efficient.
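A minimal sketch of the two suggestions above, assuming nothing about the actual patch: the class and method names (OpcodeSketch, iconstOpcode, typeIndex, nextId) are hypothetical, and the opcode numbers are written out from the JVM specification instead of being taken from ASM's Opcodes so that the example has no dependencies.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class; the names are illustrative, not the patch's code.
class OpcodeSketch {
    // Opcode numbers as defined by the JVM specification.
    static final int ICONST_0 = 3;   // ICONST_M1 is 2, ICONST_5 is 8
    static final int ILOAD = 21;     // LLOAD, FLOAD, DLOAD, ALOAD follow
    static final int ISTORE = 54;    // LSTORE, FSTORE, DSTORE, ASTORE follow
    static final int IRETURN = 172;  // LRETURN, FRETURN, DRETURN, ARETURN, RETURN follow

    /** Opcode for pushing a small int constant in the range -1..5. */
    static int iconstOpcode(int i) {
        if (i < -1 || i > 5) {
            throw new IllegalArgumentException("no ICONST_<n> for " + i);
        }
        return ICONST_0 + i;
    }

    /** Maps a basic-type character to its position in the opcode families. */
    static int typeIndex(char type) {
        switch (type) {
            case 'I': return 0;
            case 'J': return 1;
            case 'F': return 2;
            case 'D': return 3;
            case 'A': return 4;
            case 'V': return 5;
            default: throw new IllegalArgumentException("unknown type: " + type);
        }
    }

    static int loadOpcode(char type)   { return ILOAD + valueIndex(type); }
    static int storeOpcode(char type)  { return ISTORE + valueIndex(type); }
    static int returnOpcode(char type) { return IRETURN + typeIndex(type); }

    /** Like typeIndex, but rejects 'V': there is no void load or store. */
    private static int valueIndex(char type) {
        int index = typeIndex(type);
        if (index > 4) {
            throw new IllegalArgumentException("void has no load/store opcode");
        }
        return index;
    }

    // Lock-free counter in place of a synchronized "next id" method.
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    static int nextId() {
        return NEXT_ID.getAndIncrement();
    }
}
```

The single typeIndex switch is shared by all three opcode families, which is the point of the suggestion; only the base opcode differs per family.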
cheers, Rémi From christian.thalinger at oracle.com Tue May 1 17:37:46 2012 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Tue, 1 May 2012 17:37:46 -0700 Subject: hg: mlvm/mlvm/jdk: meth-lazy: moved to ASM for bytecode generation In-Reply-To: <4FA057CE.2050406@univ-mlv.fr> References: <20120501192442.244F3470C3@hg.openjdk.java.net> <4FA057CE.2050406@univ-mlv.fr> Message-ID: On May 1, 2012, at 2:38 PM, Rémi Forax wrote: > On 05/01/2012 09:24 PM, christian.thalinger at oracle.com wrote: >> Changeset: 5ec0f6472b4b >> Author: twisti >> Date: 2012-05-01 12:24 -0700 >> URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/5ec0f6472b4b >> >> meth-lazy: moved to ASM for bytecode generation >> >> ! meth-lazy-7023639.patch >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > Hi Christian, > in emitIconstInsn, you forgot ICONST_M1, also We don't need M1 :-) > ICONST_M1, ICONST_0, ICONST_1, etc. are consecutive values so > you can write: > switch(i) { > case -1: > case 0: > case 1: > case 2: > case 3: > case 4: > case 5: > opcode = Opcodes.ICONST_0 + i; > ... > } I could do that, right. > > For emitLoadInsn, emitStoreInsn and emitReturnType, you can use > the same switch that maps 'I', 'J', 'F', 'D', 'A', 'V' to 0...5, > because the ISTORE, ILOAD and IRETURN variants are all in the same order. > > Also nextEIID and nextNFID can use an AtomicInteger and its method > getAndIncrement > instead of a synchronized method, it should be more efficient. These were supposed to be debugging only since the anonymous class loader defines each class uniquely anyway. Not sure if we keep them or not.
-- Chris > > cheers, > Rémi > > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From forax at univ-mlv.fr Wed May 2 03:04:25 2012 From: forax at univ-mlv.fr (Rémi Forax) Date: Wed, 02 May 2012 12:04:25 +0200 Subject: hg: mlvm/mlvm/jdk: meth-lazy: moved to ASM for bytecode generation In-Reply-To: References: <20120501192442.244F3470C3@hg.openjdk.java.net> <4FA057CE.2050406@univ-mlv.fr> Message-ID: <4FA106A9.90408@univ-mlv.fr> On 05/02/2012 02:37 AM, Christian Thalinger wrote: > On May 1, 2012, at 2:38 PM, Rémi Forax wrote: > >> On 05/01/2012 09:24 PM, christian.thalinger at oracle.com wrote: >>> Changeset: 5ec0f6472b4b >>> Author: twisti >>> Date: 2012-05-01 12:24 -0700 >>> URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/5ec0f6472b4b >>> >>> meth-lazy: moved to ASM for bytecode generation >>> >>> ! meth-lazy-7023639.patch >>> >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> Hi Christian, >> in emitIconstInsn, you forgot ICONST_M1, also > We don't need M1 :-) > >> ICONST_M1, ICONST_0, ICONST_1, etc. are consecutive values so >> you can write: >> switch(i) { >> case -1: >> case 0: >> case 1: >> case 2: >> case 3: >> case 4: >> case 5: >> opcode = Opcodes.ICONST_0 + i; >> ... >> } > I could do that, right. > >> For emitLoadInsn, emitStoreInsn and emitReturnType, you can use >> the same switch that maps 'I', 'J', 'F', 'D', 'A', 'V' to 0...5, >> because the ISTORE, ILOAD and IRETURN variants are all in the same order. >> >> Also nextEIID and nextNFID can use an AtomicInteger and its method >> getAndIncrement >> instead of a synchronized method, it should be more efficient. > These were supposed to be debugging only since the anonymous class loader defines each class uniquely anyway. Not sure if we keep them or not.
Yes, right. > > -- Chris Rémi From john.r.rose at oracle.com Wed May 2 22:14:17 2012 From: john.r.rose at oracle.com (john.r.rose at oracle.com) Date: Thu, 03 May 2012 05:14:17 +0000 Subject: hg: mlvm/mlvm/langtools: rebase to current hsx/hotspot-comp Message-ID: <20120503051418.231934710F@hg.openjdk.java.net> Changeset: 23fbf92fa564 Author: jrose Date: 2012-05-02 22:14 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/langtools/rev/23fbf92fa564 rebase to current hsx/hotspot-comp ! series From john.r.rose at oracle.com Wed May 2 22:16:13 2012 From: john.r.rose at oracle.com (john.r.rose at oracle.com) Date: Thu, 03 May 2012 05:16:13 +0000 Subject: hg: mlvm/mlvm/jdk: 2 new changesets Message-ID: <20120503051613.5CBD847110@hg.openjdk.java.net> Changeset: b3ece8b8fef7 Author: jrose Date: 2012-05-02 22:16 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/b3ece8b8fef7 rebase to current hsx/hotspot-comp ! series Changeset: 1d4c63c88970 Author: jrose Date: 2012-05-02 22:16 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/1d4c63c88970 rename Inline to ForceInline ! meth-lazy-7023639.patch From john.r.rose at oracle.com Wed May 2 22:22:23 2012 From: john.r.rose at oracle.com (john.r.rose at oracle.com) Date: Thu, 03 May 2012 05:22:23 +0000 Subject: hg: mlvm/mlvm/hotspot: 3 new changesets Message-ID: <20120503052224.27E1F47111@hg.openjdk.java.net> Changeset: 0623b7f0e9f5 Author: jrose Date: 2012-05-02 22:20 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/0623b7f0e9f5 tweak annotation parsing code ! annot-inline.patch ! annot.patch Changeset: a6195ff712bb Author: jrose Date: 2012-05-02 22:21 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/a6195ff712bb rebase to current hsx/hotspot-comp ! meth-lazy-7023639.patch ! series Changeset: 3590dd8deb64 Author: jrose Date: 2012-05-02 22:22 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/3590dd8deb64 rename Inline to ForceInline ! annot-inline.patch !
meth-lazy-7023639.patch From john.r.rose at oracle.com Wed May 2 22:23:57 2012 From: john.r.rose at oracle.com (john.r.rose at oracle.com) Date: Thu, 03 May 2012 05:23:57 +0000 Subject: hg: mlvm/mlvm: update netbeans-based test setup Message-ID: <20120503052357.CD3A247112@hg.openjdk.java.net> Changeset: 1ecfbcb08379 Author: jrose Date: 2012-05-02 22:23 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/rev/1ecfbcb08379 update netbeans-based test setup ! netbeans/meth/build.xml ! netbeans/meth/nbproject/build-impl.xml ! netbeans/meth/nbproject/project.properties ! netbeans/meth/test/sun/invoke/util/BytecodeNameTest.java From christian.thalinger at oracle.com Thu May 3 14:48:52 2012 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Thu, 03 May 2012 21:48:52 +0000 Subject: hg: mlvm/mlvm/jdk: meth-lazy: fixed CP patches size Message-ID: <20120503214852.E9DE947135@hg.openjdk.java.net> Changeset: ff435bb51f76 Author: twisti Date: 2012-05-03 14:48 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/ff435bb51f76 meth-lazy: fixed CP patches size ! meth-lazy-7023639.patch From henri.gomez at gmail.com Thu May 3 23:43:06 2012 From: henri.gomez at gmail.com (Henri Gomez) Date: Fri, 4 May 2012 08:43:06 +0200 Subject: OSX port In-Reply-To: References: <06E1B1D8-8C6A-409B-BB2C-0313A4765613@gmail.com> Message-ID: Back to this subject. Any date about back port/merge of OSX support in mlvm project ? 2012/4/21 Martijn Verburg : > This would be extremely helpful in being able to organise hack nights with > the Java User Groups globally to beat up on an experimental feature and > provide feedback on it here (such as coroutines). > > Cheers, > Martijn > > On 20 April 2012 00:27, Henri Gomez wrote: >> >> Hi to all, >> >> I'm wondering when OSX support will be merged in mlvm trunk. >> >> I'm releasing OSX packages for stock Java 8, jigsaw and lambda easily now >> but mlvm build is still too tricky and a pain.
>> >> Any chance to get OSX code merged so OSX Java 8 early adopters could play >> with Continuous packages as they do with others Java 8 projects ? >> >> Cheers >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > From headius at headius.com Fri May 4 00:10:49 2012 From: headius at headius.com (Charles Oliver Nutter) Date: Fri, 4 May 2012 07:10:49 +0000 Subject: OSX port In-Reply-To: References: <06E1B1D8-8C6A-409B-BB2C-0313A4765613@gmail.com> Message-ID: I need this to test the updated indy backend on OS X as well... - Charlie On Fri, May 4, 2012 at 6:43 AM, Henri Gomez wrote: > Back to this subject. > > Any date about back port/merge of OSX support in mlvm project ? > > 2012/4/21 Martijn Verburg : >> This would be extremely helpful in being able to organise hack nights with >> the Java User Groups globally to beat up on an experimental feature and >> provide feedback on it here (such as coroutines). >> >> Cheers, >> Martijn >> >> On 20 April 2012 00:27, Henri Gomez wrote: >>> >>> Hi to all, >>> >>> I'm wondering when OSX support will be merged in mlvm trunk. >>> >>> I'm releasing OSX packages for stock Java 8, jigsaw and lambda easily now >>> but mlvm build is still too tricky and a pain. >>> >>> Any chance to get OSX code merged so OSX Java 8 early adopters could play >>> with Continuous packages as they do with others Java 8 projects ?
>>> >>> Cheers >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From christian.thalinger at oracle.com Mon May 7 09:34:10 2012 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 7 May 2012 09:34:10 -0700 Subject: OSX port In-Reply-To: References: <06E1B1D8-8C6A-409B-BB2C-0313A4765613@gmail.com> Message-ID: On May 3, 2012, at 11:43 PM, Henri Gomez wrote: > Back to this subject. > > Any date about back port/merge of OSX support in mlvm project ? What exactly do you think is missing? Does the build fail for you? John updated the mlvm patches a couple of days ago to the recent hotspot-comp repository versions which should include OS X support. -- Chris > > 2012/4/21 Martijn Verburg : >> This would be extremely helpful in being able to organise hack nights with >> the Java User Groups globally to beat up on an experimental feature and >> provide feedback on it here (such as coroutines). >> >> Cheers, >> Martijn >> >> On 20 April 2012 00:27, Henri Gomez wrote: >>> >>> Hi to all, >>> >>> I'm wondering when OSX support will be merged in mlvm trunk. >>> >>> I'm releasing OSX packages for stock Java 8, jigsaw and lambda easily now >>> but mlvm build is still too tricky and a pain. >>> >>> Any chance to get OSX code merged so OSX Java 8 early adopters could play >>> with Continuous packages as they do with others Java 8 projects ? 
>>> >>> Cheers >>> _______________________________________________ >>> mlvm-dev mailing list >>> mlvm-dev at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> >> >> >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From headius at headius.com Mon May 7 14:33:40 2012 From: headius at headius.com (Charles Oliver Nutter) Date: Mon, 7 May 2012 21:33:40 +0000 Subject: OSX port In-Reply-To: References: <06E1B1D8-8C6A-409B-BB2C-0313A4765613@gmail.com> Message-ID: On Mon, May 7, 2012 at 4:34 PM, Christian Thalinger wrote: > What exactly do you think is missing? Does the build fail for you? John updated the mlvm patches a couple of days ago to the recent hotspot-comp repository versions which should include OS X support. I'll try to give it a shot on my end later tonight. I need to do more frequent builds, so I don't have this bootstrapping process every time :) - Charlie From blackdrag at gmx.org Mon May 7 15:18:44 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Tue, 08 May 2012 00:18:44 +0200 Subject: Question about inlining caching by Hotspot and classes implementing an method from an interface Message-ID: <4FA84A44.6020200@gmx.org> Hi all, I have the feeling we have talked about this already, but I couldn't really find it and back then it was independent of invokedynamic... Anyway... Assuming I have an interface Foo with a void method foo(). And assuming I have the classes Bar0, Bar1 to BarN implementing that interface. Let us assume further I have code like this: method(bar0) method(bar1) ...
method(barN) where bar0,bar1,..., barN are instances of the corresponding Bar classes and method is more or less this: void method(Foo f){f.foo();} I was thinking about how I would have to implement the call site checks for this case in invokedynamic in that method and the current approach would be to check for the implementation class, meaning BarX and meaning also to have an invalidation and reselection of the method each time method is called. This is surely not the optimal case. It surely would be better to check that f implements Foo... well or in the case here we can simply forget that check on the receiver... if that were reflection. But I think that does not work with MethodHandles. I guess I will need the reference to the implementation method instead and then also a guard for the receiver. Am I wrong? Is there a way around? This thought led me to another question.... how does the JVM do this for the pure bytecode case? Wouldn't the JVM do mostly the same? I mean add a receiver check for the implementation class, just to invalidate this site for each of the calls to method? Or how is it done there? bye Jochen From miguelalfredo.garcia at epfl.ch Tue May 8 02:11:27 2012 From: miguelalfredo.garcia at epfl.ch (Garcia Gutierrez Miguel Alfredo) Date: Tue, 8 May 2012 09:11:27 +0000 Subject: selective inlining of MH.invokeExact() callsites Message-ID: <7E4228B446372948BBB2916FC53FA49E235DA3EA@REXMB.intranet.epfl.ch> What's the behavior of @ForceInlining , in particular for MethodHandle.invokeExact() ?
The context for this question is ongoing brainstorming on a new compilation scheme for the Scala compiler: http://mail.openjdk.java.net/pipermail/graal-dev/2012-May/000038.html http://mail.openjdk.java.net/pipermail/graal-dev/2012-May/000040.html https://groups.google.com/d/topic/jvm-languages/eEp3Z8tS8wo/discussion Miguel http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/ From forax at univ-mlv.fr Tue May 8 03:43:22 2012 From: forax at univ-mlv.fr (Rémi Forax) Date: Tue, 08 May 2012 12:43:22 +0200 Subject: Question about inlining caching by Hotspot and classes implementing an method from an interface In-Reply-To: <4FA84A44.6020200@gmx.org> References: <4FA84A44.6020200@gmx.org> Message-ID: <4FA8F8CA.7060305@univ-mlv.fr> f.foo() is a megamorphic call, so the inline cache does not work: there are too many possible receiver types, so the inline cache tree degenerates into something like a linked list, which is painful. With a language like Java, the VM will fall back to a vtable dispatch; you get no inlining, but it's faster than either updating the inline cache each time or crawling the linked list. So the idea here is to do exactly the same thing and simulate a vtable. Because it's a dynamic language, you can't construct a vtable but you can construct a dynamic lookup table for this precise call site. I tried to explain this pattern at the JVM Summit last year, but I'm not sure anyone understood me :( Anyway, take a look at the sample bimorphic cache in the cookbook [1]; look for the comment 'bimorphic cache defeated'. The DispatchMap is what you're looking for. Rémi [1] http://code.google.com/p/jsr292-cookbook/source/browse/trunk/bimorphic-cache/src/jsr292/cookbook/bicache/RT.java On 05/08/2012 12:18 AM, Jochen Theodorou wrote: > Hi all, > > I have the feeling we have talked about this already, but I couldn't > really find it and back then it was independent of invokedynamic... > Anyway... > > > Assuming I have an interface Foo with a void method foo().
And assuming > I have the classes Bar0, Bar1 to BarN implementing that interface. Let us > assume further I have code like this: > > > method(bar0) > method(bar1) > ... > method(barN) > > where bar0,bar1,..., barN are instances of the corresponding Bar classes > and method is more or less this: > > void method(Foo f){f.foo();} > > I was thinking about how I would have to implement the call site checks > for this case in invokedynamic in that method and the current approach > would be to check for the implementation class, meaning BarX and meaning > also to have an invalidation and reselection of the method each time > method is called. This is surely not the optimal case. It surely would > be better to check that f implements Foo... well or in the case here we > can simply forget that check on the receiver... if that were > reflection. But I think that does not work with MethodHandles. I > guess I will need the reference to the implementation method instead and > then also a guard for the receiver. Am I wrong? Is there a way around? > > This thought led me to another question.... how does the JVM do > this for the pure bytecode case? Wouldn't the JVM do mostly the same? I > mean add a receiver check for the implementation class, just to > invalidate this site for each of the calls to method? Or how is it done > there? > > bye Jochen > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From henri.gomez at gmail.com Tue May 8 09:00:08 2012 From: henri.gomez at gmail.com (Henri Gomez) Date: Tue, 8 May 2012 18:00:08 +0200 Subject: OSX port In-Reply-To: References: <06E1B1D8-8C6A-409B-BB2C-0313A4765613@gmail.com> Message-ID: >> Any date about back port/merge of OSX support in mlvm project ? > > What exactly do you think is missing? Does the build fail for you?
John updated the mlvm patches a couple of days ago to the recent hotspot-comp repository versions which should include OS X support. Good news. Do we still need the multiple patches process? cf: http://openjdk-osx-build.googlecode.com/svn/trunk/build-openjdk8-64-mlvm.sh From headius at headius.com Tue May 8 15:29:58 2012 From: headius at headius.com (Charles Oliver Nutter) Date: Tue, 8 May 2012 22:29:58 +0000 Subject: OSX port In-Reply-To: References: <06E1B1D8-8C6A-409B-BB2C-0313A4765613@gmail.com> Message-ID: On Tue, May 8, 2012 at 4:00 PM, Henri Gomez wrote: > cf: http://openjdk-osx-build.googlecode.com/svn/trunk/build-openjdk8-64-mlvm.sh I'm trying a build with that on Snow Leopard now. Ran into an issue first time, now trying again with a clean env. - Charlie From christian.thalinger at oracle.com Wed May 9 10:54:29 2012 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 9 May 2012 10:54:29 -0700 Subject: OSX port In-Reply-To: References: <06E1B1D8-8C6A-409B-BB2C-0313A4765613@gmail.com> Message-ID: On May 8, 2012, at 9:00 AM, Henri Gomez wrote: >>> Any date about back port/merge of OSX support in mlvm project ? >> >> What exactly do you think is missing? Does the build fail for you? John updated the mlvm patches a couple of days ago to the recent hotspot-comp repository versions which should include OS X support. > > Good news. > > Do we still need the multiple patches process? I'm not sure I understand what you mean by "multiple patches" but your build script seems to be doing what is required.
-- Chris > > cf: http://openjdk-osx-build.googlecode.com/svn/trunk/build-openjdk8-64-mlvm.sh > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From christian.thalinger at oracle.com Wed May 9 11:02:17 2012 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Wed, 9 May 2012 11:02:17 -0700 Subject: selective inlining of MH.invokeExact() callsites In-Reply-To: <7E4228B446372948BBB2916FC53FA49E235DA3EA@REXMB.intranet.epfl.ch> References: <7E4228B446372948BBB2916FC53FA49E235DA3EA@REXMB.intranet.epfl.ch> Message-ID: <5AAB5618-A25F-47D8-BB03-0C13ADE9306C@oracle.com> On May 8, 2012, at 2:11 AM, Garcia Gutierrez Miguel Alfredo wrote: > > What's the behavior of @ForceInlining , in particular for MethodHandle.invokeExact() ? We introduced that annotation as an experiment for inlining exact invokers for LambdaForms (note: LambdaForm is not directly related to Project Lambda). The generated bytecode versions of these LFs are usually just argument shuffling or binding and calling the target. We know that this code compiles down to almost nothing in native machine code and that's why we force them to inline. The ForceInline annotation is a powerful tool which may produce bad results in the hands of customers. Currently we don't hide it completely from users but that might happen in the future.
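As a rough illustration of what "argument shuffling or binding and calling the target" looks like from the user's side, here is a minimal sketch using only the public java.lang.invoke API. It does not use ForceInline (an internal annotation) and shows nothing about what HotSpot actually inlines; the class AdapterSketch and its methods are hypothetical names chosen for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example class; not part of the mlvm patches.
class AdapterSketch {
    // The "target" that the adapters eventually call.
    static String join(String a, String b) {
        return a + "-" + b;
    }

    private static MethodHandle target() throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(
                AdapterSketch.class, "join",
                MethodType.methodType(String.class, String.class, String.class));
    }

    /** Argument shuffling: calls join(y, x) when invoked with (x, y). */
    static String swapped(String x, String y) {
        try {
            MethodHandle target = target();
            // reorder {1, 0}: target argument 0 comes from incoming argument 1, and vice versa.
            MethodHandle adapter = MethodHandles.permuteArguments(target, target.type(), 1, 0);
            return (String) adapter.invokeExact(x, y);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** Argument binding: fixes the first argument to "fixed". */
    static String bound(String y) {
        try {
            MethodHandle adapter = MethodHandles.insertArguments(target(), 0, "fixed");
            return (String) adapter.invokeExact(y);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Both adapters do no work of their own beyond rearranging or supplying arguments, which is why such code can reduce to very little once the compiler sees through it.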
-- Chris > > The context for this question is ongoing brainstorming on a new compilation scheme for the Scala compiler: > > http://mail.openjdk.java.net/pipermail/graal-dev/2012-May/000038.html > http://mail.openjdk.java.net/pipermail/graal-dev/2012-May/000040.html > > https://groups.google.com/d/topic/jvm-languages/eEp3Z8tS8wo/discussion > > > Miguel > http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/ > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From headius at headius.com Wed May 9 11:07:26 2012 From: headius at headius.com (Charles Oliver Nutter) Date: Wed, 9 May 2012 18:07:26 +0000 Subject: OSX port In-Reply-To: References: <06E1B1D8-8C6A-409B-BB2C-0313A4765613@gmail.com> Message-ID: I managed to get MLVM to build on OS X Snow Leopard with Henri's scripts. Notes: * I get a failure like this at the end of the build, but this appears to be after the jdk/jre have successfully built (or at least it appears to work fine): https://gist.github.com/749b4fe1d3b469644c11 * In order to get the meth-lazy patches to apply (I'm testing the new indy backend) I had to remove -testable from them. I'd like a cleaner way. - Charlie On Wed, May 9, 2012 at 5:54 PM, Christian Thalinger wrote: > > On May 8, 2012, at 9:00 AM, Henri Gomez wrote: > >>>> Any date about back port/merge of OSX support in mlvm project ? >>> >>> What exactly do you think is missing? Does the build fail for you? John updated the mlvm patches a couple of days ago to the recent hotspot-comp repository versions which should include OS X support. >> >> Good news. >> >> Do we still need the multiple patches process? > > I'm not sure I understand what you mean by "multiple patches" but your build script seems to be doing what is required.
> > -- Chris > >> >> cf: http://openjdk-osx-build.googlecode.com/svn/trunk/build-openjdk8-64-mlvm.sh >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev > > _______________________________________________ > mlvm-dev mailing list > mlvm-dev at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev From forax at univ-mlv.fr Wed May 9 11:15:24 2012 From: forax at univ-mlv.fr (Rémi Forax) Date: Wed, 09 May 2012 20:15:24 +0200 Subject: selective inlining of MH.invokeExact() callsites In-Reply-To: <5AAB5618-A25F-47D8-BB03-0C13ADE9306C@oracle.com> References: <7E4228B446372948BBB2916FC53FA49E235DA3EA@REXMB.intranet.epfl.ch> <5AAB5618-A25F-47D8-BB03-0C13ADE9306C@oracle.com> Message-ID: <4FAAB43C.7080104@univ-mlv.fr> On 05/09/2012 08:02 PM, Christian Thalinger wrote: > On May 8, 2012, at 2:11 AM, Garcia Gutierrez Miguel Alfredo wrote: > >> What's the behavior of @ForceInlining , in particular for MethodHandle.invokeExact() ? > We introduced that annotation as an experiment for inlining exact invokers for LambdaForms (note: LambdaForm is not directly related to Project Lambda). The generated bytecode versions of these LFs are usually just argument shuffling or binding and calling the target. We know that this code compiles down to almost nothing in native machine code and that's why we force them to inline. > > The ForceInline annotation is a powerful tool which may produce bad results in the hands of customers. Currently we don't hide it completely from users but that might happen in the future. and Christian, correct me if I'm wrong, but it only forces inlining when the JIT is triggered. So using the annotation on something which is not called enough will do nothing.
As I said in an earlier mail, the golden hammer is the annotation PleaseJITThisCodeWithAPrivateMetaDataObjectNowAndReJITLaterWhenTheMethodOrTheLoopIsHot > > -- Chris Rémi From john.r.rose at oracle.com Wed May 9 12:04:40 2012 From: john.r.rose at oracle.com (John Rose) Date: Wed, 9 May 2012 12:04:40 -0700 Subject: OSX port In-Reply-To: References: <06E1B1D8-8C6A-409B-BB2C-0313A4765613@gmail.com> Message-ID: <30DB491D-6A1B-41D9-B57C-A629716D7E1C@oracle.com> On May 9, 2012, at 11:07 AM, Charles Oliver Nutter wrote: > I managed to get MLVM to build on OS X Snow Leopard with Henri's scripts. Notes: > > * I get a failure like this at the end of the build, but this appears > to be after the jdk/jre have successfully built (or at least it > appears to work fine): https://gist.github.com/749b4fe1d3b469644c11 Yes, I get this too. I don't know where it comes from. There is probably some configuration variable that will make it go away. Like you, I saw that the build was usable and moved on. > * In order to get the meth-lazy patches to apply (I'm testing the new > indy backend) I had to remove -testable from them. I'd like a cleaner > way. Christian Thalinger, Michael Haupt, and I are working hard to get these patches ready for prime time. There are several problems that show up in comprehensive testing. The worst is that TieredCompilation mode shows some failures. Until we get cleaner tests, we're keeping "-testable". (This raises the question of whether the keyword is misleading but I can't think of another candidate meaning for "testable". Keyword means "able to pass reasonable tests", not "able to commence testing"; the latter phrase is the meaning of "buildable".) You could regard the act of removing "-testable" as the equivalent of clicking through the "yes I know what I'm doing" button. You can disable the testable filter by adjusting either the series file (which is managed source code) or the guards file (which is unmanaged and private to the installation). --
John From john.r.rose at oracle.com Wed May 9 12:34:58 2012 From: john.r.rose at oracle.com (John Rose) Date: Wed, 9 May 2012 12:34:58 -0700 Subject: Lazy Method Handle update Message-ID: Lazy Method Handles is a project to defer code generation for method handle behavior, and minimize requirements for support from hand-written assembly code. The idea is to make all method handle behavior visible to the JVM compiler, so that it can be optimized properly. Assembly code cannot be optimized. In JDK 7 FCS a method handle is represented as a chain of argument transformation blocks, ending in a pointer to a methodOop. The argument transformations are assembly coded and work in the interpreter stack. The reason this is not outrageously slow is that we vigorously inline method handle calls whenever we can. But there is a performance cliff you can drop off of, when you are working with non-constant MHs. (BTW, invokedynamic almost always inlines its target.) Project Lambda needs us not to drop off of this cliff. To fix this, we are now representing the argument transformations using a simple AST-like IR, called a LambdaForm. This form can be easily rendered down to bytecodes. (Eventually it may be directly rendered to native code.) The form is *also* interpretable by a Java-coded AST walker. This allows the system to be lazy, and to work hardest on optimizing those method handles that are actually called frequently. The laziness also helps simplify bootstrapping. The remaining assembly code is much smaller, and can be mirrored in the JIT IR and optimized. Here's an update on where we are. Christian Thalinger, Michael Haupt, and I are currently working on the following tasks: A. clean out the compiled method calling path, for non-constant method handles B. flatten the BMH layout (no boxing, linked lists, or arrays) C. make the handling of MethodType checking visible to the compiler (removing more assembly code) D. tuning reuse and compilation of LambdaForm instances E.
profiling MH.LambdaForm values at MH call sites F. tuning optimization of call sites involving LFs For A. the remaining snag is getting the argument register assignments correct for the call to the target method. There is also an issue with representing non-nominal calls in the backend. For B. we are currently working on bootstrap issues. The idea here is that, while we can do escape analysis, etc., a cleaner data structure will make the compiler succeed more often. For C. we have a refactoring in process for moving the MT value out of the methodOop. Chris, Michael, and I are working on A, B, C, respectively. We think a first cut of lazy MHs needs the first three items in order to be reasonably faster than the all-assembly implementation of JDK 7. In order to address the infamous NoClassDefFound error, we are minimizing nominal information in MH adapter code (LambdaForms and their bytecode). Only names on the BCP will be in adapter code. Part C. is an important part of this, since it allows the system to internally "flatten" calls like MH.invokeExact((MyFunnyType)x) to MH.invokeExact((Object)x). The new internal MH methods (invokeBasic, invokeStatic, etc.) all use "denominalized" types, which is to say that all reference types are represented as java.lang.Object. Best wishes, -- John From mroos at roos.com Wed May 9 13:37:20 2012 From: mroos at roos.com (Mark Roos) Date: Wed, 9 May 2012 13:37:20 -0700 Subject: Lazy Method Handle update In-Reply-To: References: Message-ID: Quite interesting John. A few curiosity based questions. we are now representing the argument transformations using a simple AST-like IR, called a LambdaForm would this be something we could inspect, build or manipulate? This form can be easily rendered down to bytecodes Are these jvm bytecodes or internal to hotspot? Again would this be a space one could play in? thanks for the update mark -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120509/a1f7bc23/attachment.html

From christian.thalinger at oracle.com Wed May 9 14:01:19 2012
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 9 May 2012 14:01:19 -0700
Subject: selective inlining of MH.invokeExact() callsites
In-Reply-To: <4FAAB43C.7080104@univ-mlv.fr>
References: <7E4228B446372948BBB2916FC53FA49E235DA3EA@REXMB.intranet.epfl.ch> <5AAB5618-A25F-47D8-BB03-0C13ADE9306C@oracle.com> <4FAAB43C.7080104@univ-mlv.fr>
Message-ID: <62F2B90B-DC09-4F65-B725-6435509AB994@oracle.com>

On May 9, 2012, at 11:15 AM, Rémi Forax wrote:

> On 05/09/2012 08:02 PM, Christian Thalinger wrote:
>> On May 8, 2012, at 2:11 AM, Garcia Gutierrez Miguel Alfredo wrote:
>>
>>> What's the behavior of @ForceInlining , in particular for MethodHandle.invokeExact() ?
>> We introduced that annotation as an experiment for inlining exact invokers for LambdaForms (note: LambdaForm is not directly related to Project Lambda). The generated bytecode versions of these LFs are usually just argument shuffling or binding and calling the target. We know that this code compiles down to almost nothing in native machine code and that's why we force them to inline.
>>
>> The ForceInline annotation is a powerful tool which may produce bad results in the hands of customers. Currently we don't hide it completely from users but that might happen in the future.
>
> and Christian, correct me if I'm wrong, but it only forces inlining when
> the JIT is triggered.
> So using the annotation on something which is not called enough will do
> nothing.

Correct.
>
> As I said in an earlier mail, the golden hammer is the annotation
> PleaseJITThisCodeWithAPrivateMetaDataObjectNowAndReJITLaterWhenTheMethodOrTheLoopIsHot

Also correct :-D

-- Chris

>> -- Chris
>
> Rémi

From john.r.rose at oracle.com Wed May 9 14:14:41 2012
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 9 May 2012 14:14:41 -0700
Subject: Lazy Method Handle update
In-Reply-To:
References:
Message-ID: <79C202C0-A688-4876-9F85-387B7FBC7DF5@oracle.com>

On May 9, 2012, at 1:37 PM, Mark Roos wrote:

> Quite interesting John. A few curiosity based questions.
>
> we are now representing the argument transformations using
> a simple AST-like IR, called a LambdaForm
>
> would this be something we could inspect, build or manipulate?

Only in privileged code. Like the sun.misc.Unsafe API, LFs and the associated non-public MH APIs are designed to be partially type-unsafe, and usable only from trusted code. There may be a type-safe way to use them beyond implementing JSR 292, but that would require further research. (Which this community enjoys!)

For an approximately current sketch of lambda forms, see this file:
http://hg.openjdk.java.net/mlvm/mlvm/hotspot/file/tip/meth-lazy.txt

Especially the section "## form".

A key feature of LFs is that their grammar is regular but not recursive (I can't resist: sub-Chomskian).

> This form can be easily rendered down to bytecodes
>
> Are these jvm bytecodes or internal to hotspot? Again would this be a space one
> could play in?

Vanilla JVM bytecodes. Some of the magic comes from the methods they are able to call, like MethodHandle.invokeBasic. These work (in a sense) like extended bytecodes.

> thanks for the update

My pleasure.

- John

From mroos at roos.com Wed May 9 16:22:14 2012
From: mroos at roos.com (Mark Roos)
Date: Wed, 9 May 2012 16:22:14 -0700
Subject: Lazy Method Handle update
In-Reply-To: <79C202C0-A688-4876-9F85-387B7FBC7DF5@oracle.com>
References: <79C202C0-A688-4876-9F85-387B7FBC7DF5@oracle.com>
Message-ID:

from John
> There may be a type-safe way to use them...

Well since I only have one type I guess I don't have to wait -)

regards
mark

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120509/0506f92a/attachment.html

From headius at headius.com Wed May 9 20:30:59 2012
From: headius at headius.com (Charles Oliver Nutter)
Date: Wed, 9 May 2012 22:30:59 -0500
Subject: Lazy Method Handle update
In-Reply-To:
References:
Message-ID:

Thanks for the update, John! Comments below...

On Wed, May 9, 2012 at 2:34 PM, John Rose wrote:
> In JDK 7 FCS a method handle is represented as a chain of argument transformation blocks, ending in a pointer to a methodOop. The argument transformations are assembly coded and work in the interpreter stack. The reason this is not outrageously slow is that we vigorously inline method handle calls whenever we can. But there is a performance cliff you can drop off of, when you are working with non-constant MHs. (BTW, invokedynamic almost always inlines its target.) Project Lambda needs us not to drop off of this cliff.

And I need you to not drop off that cliff too! It's very easy to trigger...just make a method big enough, and AAAAAAAARRGH into the pit you go.

Luckily, for the ambitious early-access JRuby users running JRuby master + Java 7u2+ in production, the code they're hitting is all small enough to avoid the cliff, but with JRuby 1.7 preview release coming out in a couple weeks more people are going to start trying things out.

> To fix this, we are now representing the argument transformations using a simple AST-like IR, called a LambdaForm.
> This form can be easily rendered down to bytecodes. (Eventually it may be directly rendered to native code.) The form is *also* interpretable by a Java-coded AST walker. This allows the system to be lazy, and to work hardest on optimizing those method handles that are actually called frequently. The laziness also helps simplify bootstrapping. The remaining assembly code is much smaller, and can be mirrored in the JIT IR and optimized.

It also creates some *epic* stack traces when it blows up. Will those fold away in the future?

> Here's an update on where we are. Christian Thalinger, Michael Haupt, and I are currently working on the following tasks:
>
> A. clean out the compiled method calling path, for non-constant method handles
> B. flatten the BMH layout (no boxing, linked lists, or arrays)
> C. make the handling of MethodType checking visible to the compiler (removing more assembly code)
> D. tuning reuse and compilation of LambdaForm instances
> E. profiling MH.LambdaForm values at MH call sites
> F. tuning optimization of call sites involving LFs

I have been tossing numbers and benchmarks back and forth with Christian, and now testing a local build of the meth-lazy stuff myself. Numbers haven't been great, but I think Christian made great progress today (based on an email showing C1 + indy beating C1 without indy and drastically beating C1 + indy in a stock u6 build that falls off the cliff). It's very exciting!

> For A. the remaining snag is getting the argument register assignments correct for the call to the target method. There is also an issue with representing non-nominal calls in the backend.

I assume this is the problem Christian described to me, where it was calling back into the interpreter to fix up the arguments?

> For B. we are currently working on bootstrap issues. The idea here is that, while we can do escape analysis, etc., a cleaner data structure will make the compiler succeed more often.

I will be *thrilled* when EA works across indy call sites. We have started work on our new compiler, which uses a simpler intermediate representation and which will be indy-only from day 1. Already we're seeing gains since we don't have to hand-write all the different call paths we want to represent; we can wire up any combinations of arguments, handles, and target using only method handles. That means we do things that will be ripe for EA like:

* Allocating heap storage for closures right next to the closure creation
* Passing closures as a handle rather than as an opaque, polymorphic structure
* Specializing closure-receiving code in *our* compiler until Hotspot can specialize it for us

I'd be very surprised if we can't approach Java performance for the *general* cases of Ruby code by end of year, and if we can specialize closure-receiving code *and* get EA, we might be able to compete with Java 8 lambda performance for Ruby's closures too.

We also have our own profiling, inlining, and so on...but that's all above the level of bytecode to work around as-yet-unoptimized patterns in Hotspot. :)

> For C. we have a refactoring in process for moving the MT value out of the methodOop.
>
> Chris, Michael, and I are working on A, B, C, respectively. We think a first cut of lazy MHs needs the first three items in order to be reasonably faster than the all-assembly implementation of JDK 7.
>
> In order to address the infamous NoClassDefFound error, we are minimizing nominal information in MH adapter code (LambdaForms and their bytecode). Only names on the BCP will be in adapter code. Part C. is an important part of this, since it allows the system to internally "flatten" calls like MH.invokeExact((MyFunnyType)x) to MH.invokeExact((Object)x). The new internal MH methods (invokeBasic, invokeStatic, etc.) all use "denominalized" types, which is to say that all reference types are represented as java.lang.Object.
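For readers following along, the "denominalized" idea in the quoted paragraph can be approximated with public API only. The following is a minimal sketch, not the meth-lazy implementation: it uses MethodType.erase() and MethodHandle.asType(), and the class name ErasureSketch and its length(String) target are made up for illustration. The internal invokeBasic path is not shown because it is not a public API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ErasureSketch {
    // Hypothetical target with a "nominal" parameter type (String).
    static int length(String s) {
        return s.length();
    }

    // Descriptor of (String)int after erase(): reference types become Object.
    static String erasedDescriptor() {
        MethodType nominal = MethodType.methodType(int.class, String.class);
        return nominal.erase().toString();
    }

    // Calls length(String) through a handle whose type has been erased.
    static int callThroughErasedType(Object x) {
        try {
            MethodType nominal = MethodType.methodType(int.class, String.class);
            MethodHandle mh = MethodHandles.lookup()
                    .findStatic(ErasureSketch.class, "length", nominal);
            // asType adapts (String)int to (Object)int by inserting a cast.
            MethodHandle erased = mh.asType(nominal.erase());
            // The call site's static types are (Object)int, an exact match.
            return (int) erased.invokeExact(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(erasedDescriptor());
        System.out.println(callThroughErasedType("lambda"));
    }
}
```

The point of the sketch is only that a call site typed with java.lang.Object no longer names MyFunnyType-like classes, which is the property the quoted text relies on to avoid class-resolution errors in adapter code.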
I have not been able to stump Chris with any NCDFEs lately, so that's good. But I do have some hacks in place to prevent them that I can't remove until the new logic solidifies a bit.

Now that the logic has started to land, I'm going to do some benchmarking and assembly-reading of my own to help from my end. And hopefully there's a chance I'll be able to help more directly over the summer.

Very exciting stuff...I'm thrilled that dynlangs and indy are being taken so seriously. I told a couple thousand people at JAX 2012 how strongly I believe that indy is the most important work happening on the JVM right now, and I'm looking forward to doing more and more with it :)

- Charlie

From headius at headius.com Wed May 9 21:36:39 2012
From: headius at headius.com (Charles Oliver Nutter)
Date: Wed, 9 May 2012 23:36:39 -0500
Subject: meth-lazy bug in bytecode generation?
Message-ID:

I'm getting a lot of these messages trying to test against meth-lazy, which is preventing me from validating it for larger benchmarks and JRuby's test suite:

Invalid gemspec in [/Users/headius/projects/jruby/lib/ruby/gems/shared/specifications/treetop-1.4.10.gemspec]:
=============== DEBUG MESSAGE: illegal bytecode sequence - method not verified ================
Invalid gemspec in [/Users/headius/projects/jruby/lib/ruby/gems/shared/specifications/trinidad-1.3.5.gemspec]:
=============== DEBUG MESSAGE: illegal bytecode sequence - method not verified ================
Invalid gemspec in [/Users/headius/projects/jruby/lib/ruby/gems/shared/specifications/trinidad_jars-1.0.3.gemspec]:
=============== DEBUG MESSAGE: illegal bytecode sequence - method not verified ================
Invalid gemspec in [/Users/headius/projects/jruby/lib/ruby/gems/shared/specifications/tzinfo-0.3.33.gemspec]:
=============== DEBUG MESSAGE: illegal bytecode sequence - method not verified ================

I tracked this message down to TemplateInterpreterGenerator::set_entry_points in templateInterpreter.cpp. If I'm reading it right, this message is set as a default for several of the various entry points, which are *supposed* to be updated if the given bytecode is valid. If it's not, you get this result when it attempts to proceed.

Now normally JRuby loads itself from boot classloader, to skip verification for code that has been verified a million times in test runs. So I thought perhaps if I forced it to verify I'd see it fail in the actual verifier. No dice, I still get these messages.

If I turn off indy, the messages go away. If I run without meth-lazy, the messages go away.

Christian suggested -Xverify:all, which provides the following...so I guess something is generating bad code here?

Invalid gemspec in [/Users/headius/projects/jruby/lib/ruby/gems/shared/specifications/trinidad_jars-1.0.3.gemspec]:
(class: java/lang/invoke/LambdaForm$LFI, method: invoke signature: (Ljava/lang/invoke/MethodHandle;Ljava/lang/Object;Ljava/lang/Object;)V) Expecting to find integer on stack

- Charlie

From michael.haupt at oracle.com Wed May 9 23:24:32 2012
From: michael.haupt at oracle.com (Michael Haupt)
Date: Thu, 10 May 2012 08:24:32 +0200
Subject: meth-lazy bug in bytecode generation?
In-Reply-To:
References:
Message-ID: <2BF002CA-5D6F-4B6E-9B7A-DF52C74849D8@oracle.com>

Charlie,

On 10.05.2012 at 06:36, Charles Oliver Nutter wrote:
> Christian suggested -Xverify:all, which provides
> the following...so I guess something is generating bad code here?
>
> Invalid gemspec in
> [/Users/headius/projects/jruby/lib/ruby/gems/shared/specifications/trinidad_jars-1.0.3.gemspec]:
> (class: java/lang/invoke/LambdaForm$LFI, method: invoke signature:
> (Ljava/lang/invoke/MethodHandle;Ljava/lang/Object;Ljava/lang/Object;)V)
> Expecting to find integer on stack

hard to tell what's wrong without seeing the bytecode. :-)

Please set InvokerBytecodeGenerator.DUMP_CLASS_FILES to true to have all LFI (NFI, EI) class files serialised. This also enables unique numbering.
The dumped class files contain textual descriptions of the respective LFs as well, so it's easier to see what the problem is all about.

Best,

Michael

--
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Labs
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120510/03b034b1/attachment.html

From jeroen at sumatra.nl Thu May 10 00:16:39 2012
From: jeroen at sumatra.nl (Jeroen Frijters)
Date: Thu, 10 May 2012 07:16:39 +0000
Subject: Lazy Method Handle update
In-Reply-To:
References:
Message-ID:

Thanks for the update. This sounds great. It looks like this would be helpful for me as well. Is the LambdaForm stuff written in Java?

Hopefully my strategy of doing nothing will pay off ;-)

Regards,
Jeroen

From headius at headius.com Thu May 10 09:49:18 2012
From: headius at headius.com (Charles Oliver Nutter)
Date: Thu, 10 May 2012 11:49:18 -0500
Subject: meth-lazy bug in bytecode generation?
In-Reply-To: <2BF002CA-5D6F-4B6E-9B7A-DF52C74849D8@oracle.com>
References: <2BF002CA-5D6F-4B6E-9B7A-DF52C74849D8@oracle.com>
Message-ID:

On Thu, May 10, 2012 at 1:24 AM, Michael Haupt wrote:
> hard to tell what's wrong without seeing the bytecode. :-)

Indeed!

> Please set InvokerBytecodeGenerator.DUMP_CLASS_FILES to true to have all
> LFI (NFI, EI) class files serialised. This also enables unique numbering.
> The dumped class files contain textual descriptions of the respective LFs as
> well, so it's easier to see what the problem is all about.

Ok, I'll see if I can get dumps today. I did show it to Christian last night, so he may have something in progress to investigate...but I'm happy to provide dumps too.

- Charlie

From headius at headius.com Thu May 10 10:33:30 2012
From: headius at headius.com (Charles Oliver Nutter)
Date: Thu, 10 May 2012 12:33:30 -0500
Subject: meth-lazy bug in bytecode generation?
In-Reply-To:
References: <2BF002CA-5D6F-4B6E-9B7A-DF52C74849D8@oracle.com>
Message-ID:

Here's a dump of one of the classes in question. Note that offset 7 in the invoke method is definitely wrong: https://gist.github.com/5a5e49f31bc79ac344b3

Christian so far can't reproduce, but it seems like a strange thing to be a problem only in my build...

- Charlie

From christian.thalinger at oracle.com Thu May 10 11:09:29 2012
From: christian.thalinger at oracle.com (christian.thalinger at oracle.com)
Date: Thu, 10 May 2012 18:09:29 +0000
Subject: hg: mlvm/mlvm/jdk: meth-lazy: removed BytecodeBuffer; other minor changes
Message-ID: <20120510180929.C477147232@hg.openjdk.java.net>

Changeset: 32e044b02c0b
Author: twisti
Date: 2012-05-10 11:09 -0700
URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/32e044b02c0b

meth-lazy: removed BytecodeBuffer; other minor changes

! meth-lazy-7023639.patch

From christian.thalinger at oracle.com Thu May 10 11:11:21 2012
From: christian.thalinger at oracle.com (christian.thalinger at oracle.com)
Date: Thu, 10 May 2012 18:11:21 +0000
Subject: hg: mlvm/mlvm/hotspot: meth-lazy: fixed C1; C2 still not working
Message-ID: <20120510181121.CD40047233@hg.openjdk.java.net>

Changeset: f45d25ab71e1
Author: twisti
Date: 2012-05-10 11:11 -0700
URL: http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/f45d25ab71e1

meth-lazy: fixed C1; C2 still not working

! meth-lazy-7023639.jit.patch

From christian.thalinger at oracle.com Thu May 10 11:13:26 2012
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Thu, 10 May 2012 11:13:26 -0700
Subject: meth-lazy bug in bytecode generation?
In-Reply-To:
References: <2BF002CA-5D6F-4B6E-9B7A-DF52C74849D8@oracle.com>
Message-ID:

On May 10, 2012, at 10:33 AM, Charles Oliver Nutter wrote:

> Here's a dump of one of the classes in question. Note that offset 7 in
> the invoke method is definitely wrong:
> https://gist.github.com/5a5e49f31bc79ac344b3
>
> Christian so far can't reproduce, but it seems like a strange thing to
> be a problem only in my build...

I just pushed my latest changes. Let me know if it still reproduces.

-- Chris

From john.r.rose at oracle.com Thu May 10 11:34:56 2012
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 10 May 2012 11:34:56 -0700
Subject: Lazy Method Handle update
In-Reply-To:
References:
Message-ID: <2E164226-A78E-4A21-8B58-B27F9CF0843B@oracle.com>

On May 10, 2012, at 12:16 AM, Jeroen Frijters wrote:

> Thanks for the update. This sounds great. It looks like this would be helpful for me as well. Is the LambdaForm stuff written in Java?

Yes, all in Java. This refactoring moves a lot of stuff up-stack from assembly to C++ and C++ to Java.

> Hopefully my strategy of doing nothing will pay off ;-)

Hey! Can you DO that?

- John

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120510/ae32f658/attachment.html

From christian.thalinger at oracle.com Thu May 10 13:12:41 2012
From: christian.thalinger at oracle.com (christian.thalinger at oracle.com)
Date: Thu, 10 May 2012 20:12:41 +0000
Subject: hg: mlvm/mlvm/jdk: meth-lazy: fixed bytecode generation bug; enabled EI and NFI caches and predefined invokers
Message-ID: <20120510201241.749F147238@hg.openjdk.java.net>

Changeset: 07e1220e834c
Author: twisti
Date: 2012-05-10 13:12 -0700
URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/07e1220e834c

meth-lazy: fixed bytecode generation bug; enabled EI and NFI caches and predefined invokers

! meth-lazy-7023639.patch

From john.r.rose at oracle.com Thu May 10 23:24:18 2012
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 10 May 2012 23:24:18 -0700
Subject: Lazy Method Handle update
In-Reply-To:
References:
Message-ID: <971BED09-4998-46A8-9BE2-F2FBA765D5B5@oracle.com>

On May 9, 2012, at 8:30 PM, Charles Oliver Nutter wrote:

> Thanks for the update, John! Comments below...
>
> On Wed, May 9, 2012 at 2:34 PM, John Rose wrote:
>> In JDK 7 FCS a method handle is represented as a chain of argument transformation blocks, ending in a pointer to a methodOop. The argument transformations are assembly coded and work in the interpreter stack. The reason this is not outrageously slow is that we vigorously inline method handle calls whenever we can. But there is a performance cliff you can drop off of, when you are working with non-constant MHs. (BTW, invokedynamic almost always inlines its target.) Project Lambda needs us not to drop off of this cliff.
>
> And I need you to not drop off that cliff too! It's very easy to
> trigger...just make a method big enough, and AAAAAAAARRGH into the pit
> you go.

Adapting my favorite grook (Piet Hein):

  There's an art to knowing when:
  Never try to guess.
  Code until it stalls and then,
  Twenty methods less.

(Ref: http://www.archimedes-lab.org/grooks.html )

Seriously, we are adjusting inlining strategy, and using JRuby benchmarks to check.

> Luckily, for the ambitious early-access JRuby users running JRuby
> master + Java 7u2+ in production, the code they're hitting is all
> small enough to avoid the cliff, but with JRuby 1.7 preview release
> coming out in a couple weeks more people are going to start trying
> things out.

Is "richards" big enough to be an interesting benchmark?

> It also creates some *epic* stack traces when it blows up. Will those
> fold away in the future?

Yes, by a combination of hiding and tail-calling.

> I have been tossing numbers and benchmarks back and forth with
> Christian, and now testing a local build of the meth-lazy stuff
> myself. Numbers haven't been great, but I think Christian made great
> progress today (based on an email showing C1 + indy beating C1 without
> indy and drastically beating C1 + indy in a stock u6 build that falls
> off the cliff). It's very exciting!

Thanks for rejoining the adventure. We're doing our best to make it even more exciting.

> I assume this is the problem Christian described to me, where it was
> calling back into the interpreter to fix up the arguments?

Yes. Problem in a nutshell: When dropping from MH.invokeExact(a,b,c) on a direct method handle down to target(a,b,c) (i.e., the target method of the MH), we need to delete MH from the argument list, leaving all other arguments unchanged.

This is easy in the interpreter: Just blindly jump into the target, ignoring the MH deep in the JVM argument stack.

It is hard in the compiled code; as with C calling sequences, leading arguments are given registers and extra trailing arguments can be safely ignored. Removing the leading MH from the compiled calling sequence is tricky, and we were bailing out of this problem by calling a C2I adapter, ignoring the MH (deep in the unpacked interpreter stack), and going back to the I2C adapter of the target method. This caused brutal data motion for many out-of-line MH calls.

We have a fix now, which is to drop the MH in the compiled caller's outgoing argument list (a,b,c), in certain carefully controlled circumstances. With care this can be done even when the target method is a variable. I'll refer to the source code for further details. :-)

>> For B. we are currently working on bootstrap issues. The idea here is that, while we can do escape analysis, etc., a cleaner data structure will make the compiler succeed more often.
>
> I will be *thrilled* when EA works across indy call sites. We have
> started work on our new compiler, which uses a simpler intermediate
> representation and which will be indy-only from day 1. Already we're
> seeing gains since we don't have to hand-write all the different call
> paths we want to represent; we can wire up any combinations of
> arguments, handles, and target using only method handles. That means
> we do things that will be ripe for EA like:
>
> * Allocating heap storage for closures right next to the closure creation
> * Passing closures as a handle rather than as an opaque, polymorphic structure
> * Specializing closure-receiving code in *our* compiler until Hotspot
> can specialize it for us

If there is a convenient moment to do so, write some micro-benchmarks that exercise these code patterns, and we can use them to test our inlining, EA, and constant propagation.

> I'd be very surprised if we can't approach Java performance for the
> *general* cases of Ruby code by end of year, and if we can specialize
> closure-receiving code *and* get EA, we might be able to compete with
> Java 8 lambda performance for Ruby's closures too.
>
> We also have our own profiling, inlining, and so on...but that's all
> above the level of bytecode to work around as-yet-unoptimized patterns
> in Hotspot. :)

We'll get there... My ideal is to have layer-appropriate profiling (not necessarily inlining) at each layer, with enough optimization so that up-stack profiling decisions are exploited as thoroughly as internal JVM decisions.

> I have not been able to stump Chris with any NCDFEs lately, so that's
> good. But I do have some hacks in place to prevent them I can't remove
> until the new logic solidifies a bit.

That's fair. Those NCDFEs are kind of spooky; I will be glad when it's 1 year past the last known occurrence.

> Now that the logic has started to land, I'm going to do some
> benchmarking and assembly-reading of my own to help from my end. And
> hopefully there's a chance I'll be able to help more directly over the
> summer.

Great.

> Very exciting stuff...I'm thrilled that dynlangs and indy are being
> taken so seriously.
> I told a couple thousand people at JAX 2012 how
> strongly I believe that indy is the most important work happening on
> the JVM right now, and I'm looking forward to doing more and more with
> it :)

Your enthusiastic and patient adoption of this technology has helped to foster it; many thanks.

Best,
- John

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120510/b8f3d677/attachment.html

From fyaoxy at gmail.com Fri May 11 09:22:28 2012
From: fyaoxy at gmail.com (向雅)
Date: Sat, 12 May 2012 00:22:28 +0800
Subject: MethodType and others
In-Reply-To:
References:
Message-ID:

Today I checked some messages and browsed the mlvm jdk patches. It seems I have been slow to follow: MethodType has changed to a weak style. Again, this is really not enough. The internal weak set can do more work; since it is there, let the internal self be a KEY, just like this: synch{ internTable.get(type.... types)}. There is no need to create a new MethodType object that just gets gc-ed. Maybe when the change occurred I was busy with my bytecode library, so maybe I missed something.

I found there is something similar in the LF. In my library there is a factory builder, and fewer wrapper objects. I think the VM runtime needs this style more than my library does.

Again, BTW, the MH constant seems to use a full-visibility Lookup; that is my feeling at least until now. If so, you know, bad?

Again, BTW, do people in the jdk project really like cp code? There is too much cp code in rt! Is it really cp?

On 2012/4/24, Qinxian wrote:
> Hi,
> Design of MethodType have problems:
> 1, the immutable have more execution time.
> 2, It will prevent loader from gc-ing.
>
> why?
> 1, in methodType(*) creation methods, every temp rubbish MethodType
> immutable will take more time, There must have more people know more
> about it.
> There need a true factory for MethodType,
> In my implementation, it will be:
> MTFactory extends ArrayList{ Class rtype;
> hashCode(){}
> equals(){}
> methodType(*){
>   rtype=returnType, addAll(types);
>   result=internTable.get(this);
>   if(result!=null) return result;
>   ...
>  }
> }
> And the factory far away the global singleton, so the point 2 not
> problem any more.
> As to how to hold your MTFactory, it's your biz, OK?
>
>
> BTW:
> 1, JVMS7 StackMapTable attribute chapter say: it's can be ignore even
> if >=java7.
> But in 4.10.1 say like this : must verify the stack frame map, if
> not, error reported.
> So who is the stupid PM?
>
> 2, Lambda is just a tool, If the tool changed to the target, the nature
> of tool is lose!
>
> 3, can you redesign the BootstrapMethods? it's just a CallSiteFactory. KISS.
> Since it's a factory, so there are rich factory pattern and factory
> method implementations.
> Why not limit it static or new style? In nature the new is static of Class.
> My advice, when redesign, clear the concepts:
> MethodHandle just a invoke* instruction.
> MethodType is just FunctionType. dont just for OO so OO.
> BootstrapMethod is right the CallSiteFactory, it's nothing.
> So if it factorized, the static limitation will removed.
>
> 4, Again, I hope JDK team not follow the way of MSh.
> Many people must admit, there are some guys in Msh really smarter than
> most people(include myself). But what about the thing? Msh just not
> let the thing worse, but never have capability to make thing better.
> so even if genius, so what?
> And chinan say: The smarter will kill yourself.
> And I known, some one in the lists not like me. Not a matter. The
> idiot think of all as idiot. That make it funny.
>
> OK, If wrong, pls correct me if you will. I will happy to say thanks.
> > Best, > Qinxian From christian.thalinger at oracle.com Fri May 11 17:32:16 2012 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Sat, 12 May 2012 00:32:16 +0000 Subject: hg: mlvm/mlvm/hotspot: meth-lazy: fixed a couple of C1 problems Message-ID: <20120512003216.D49EA472AD@hg.openjdk.java.net> Changeset: e130a7249754 Author: twisti Date: 2012-05-11 17:32 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/e130a7249754 meth-lazy: fixed a couple of C1 problems ! meth-lazy-7023639.jit.patch From christian.thalinger at oracle.com Fri May 11 17:35:44 2012 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Sat, 12 May 2012 00:35:44 +0000 Subject: hg: mlvm/mlvm/jdk: meth-lazy: removed direct-call optimization from LFIs; fixed small problem with primitive float conversions Message-ID: <20120512003545.37160472AE@hg.openjdk.java.net> Changeset: 1a5dff46cc7f Author: twisti Date: 2012-05-11 17:35 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/1a5dff46cc7f meth-lazy: removed direct-call optimization from LFIs; fixed small problem with primitive float conversions ! meth-lazy-7023639.patch From blackdrag at gmx.org Sun May 13 09:04:51 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Sun, 13 May 2012 18:04:51 +0200 Subject: Boxing, still a limit of invokedynamic? Message-ID: <4FAFDBA3.8020405@gmx.org> Hi all, I wanted to ask you of your opinion. If I am going to compile something like a+b-c and a,b,c are all primtives, but I won't know that the results will be really the primtives too, then this means I will most probably compile it like this: invokedynamic("minus", invokedynamic("plus",a,b), c) meaning the result of a+b will be an Object (since I won't know it is a primitive) and then there will be one boxing for that, just to unbox again for the minus and then box again for the result of the minus. 
If now the result is not supposed to be a primitive, then there won't be another unbox, till the next operation done with that value. Now, even if the JIT is able to see through the one boxing and unboxing fro the result of plus, what will stay is the boxing for the result of the minus.... plus the many unboxing actions used when this result is used. Isn't that a conceptual problem? And how do others deal with that? I am asking because I was experimenting with method signatures and for such plus and minus methods and got mixed results. I was expecting the primtive versions would achieve the best results, but that was not the case. a plus(int, int) had worse performance than a plus(int,Integer) or plus(Integer,int) in some cases and sometimes it looked like plus(Integer,Integer) is worse, in other cases not. Well, this is causing me some problems. Why do I get such strange results? I would assume it depends on the JIT and the boxing logic it is able to recognize and not. One more thing I noticed is, that if I have a = b+c, with all of them being int and b+c returning object, then letting the MethodHandle do the conversion from Object to int is actually much worse performance wise, than a cast to integer and calling valueOf. Shouldn't that be at least equal, if not fast considering that the result of b+c was first boxed and then is unboxed? Can anyone give me some advice on how to do these things? bye Jochen -- Jochen "blackdrag" Theodorou - Groovy Project Tech Lead blog: http://blackdragsview.blogspot.com/ german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming sources visit http://groovy-lang.org From headius at headius.com Sun May 13 10:21:11 2012 From: headius at headius.com (Charles Oliver Nutter) Date: Sun, 13 May 2012 12:21:11 -0500 Subject: Boxing, still a limit of invokedynamic? 
In-Reply-To: <4FAFDBA3.8020405@gmx.org> References: <4FAFDBA3.8020405@gmx.org> Message-ID: On Sun, May 13, 2012 at 11:04 AM, Jochen Theodorou wrote: > I wanted to ask you of your opinion. If I am going to compile something > like a+b-c and a,b,c are all primtives, but I won't know that the > results will be really the primtives too, then this means I will most > probably compile it like this: > > invokedynamic("minus", invokedynamic("plus",a,b), c) > > meaning the result of a+b will be an Object (since I won't know it is a > primitive) and then there will be one boxing for that, just to unbox > again for the minus and then box again for the result of the minus. If > now the result is not supposed to be a primitive, then there won't be > another unbox, till the next operation done with that value. You could also encode "a+b-c" as a single invokedynamic operation, but I guess you're looking for a general solution... > Now, even if the JIT is able to see through the one boxing and unboxing > fro the result of plus, what will stay is the boxing for the result of > the minus.... plus the many unboxing actions used when this result is > used. Isn't that a conceptual problem? And how do others deal with that? First of all...how are you expecting that JIT will see through the first boxing? If the return result is going to be Object, it's going to go into an Integer. Perhaps you are hoping for escape analysis to get rid of it? If that's the case, why wouldn't the same expectation apply to the second call? If (a+b) returns an Integer that's immediately passed into (tmp-c) and both calls inline, in theory EA should have enough to eliminate the intermediate. If the result of (tmp-c) is never used as an object and never escapes, then EA should be able to get rid of that too. Of course this is all assuming that EA will be working across indy boundaries in the near future. Currently, it does not. 
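
The boxing round trip under discussion can be sketched with plain method handles. This is only an illustration, assuming ordinary MethodHandle calls stand in for the linked invokedynamic call sites; the class BoxedChain and its plus/minus helpers are hypothetical names, not code from Groovy or JRuby. Each handle is adapted to an all-Object signature, so the result of plus is boxed, unboxed again inside minus, and the final result is boxed once more.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class BoxedChain {
    // Hypothetical runtime helpers working on primitives.
    static int plus(int a, int b) { return a + b; }
    static int minus(int a, int b) { return a - b; }

    // Evaluates a + b - c the way an untyped call site sees it:
    // every argument and result travels as Object.
    static Object eval(int a, int b, int c) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType prim = MethodType.methodType(int.class, int.class, int.class);
            MethodType erased = MethodType.methodType(Object.class, Object.class, Object.class);
            // asType inserts the unboxing of the arguments and the boxing of the result.
            MethodHandle plusMH = lookup.findStatic(BoxedChain.class, "plus", prim).asType(erased);
            MethodHandle minusMH = lookup.findStatic(BoxedChain.class, "minus", prim).asType(erased);
            Object tmp = plusMH.invoke(a, b);   // a + b comes back as a boxed Integer
            return minusMH.invoke(tmp, c);      // tmp is unboxed, the result boxed again
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval(5, 4, 3));
    }
}
```

Whether the intermediate Integer survives in compiled code depends on inlining and escape analysis, which is exactly the open question in this thread.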
In JRuby, where we have no static typing or type hints, we always do these invocations as all reference types. We're banking on JVM helping us out in the future, so my goal is to just use indy as efficiently as possible and keep call protocols simple. A confusing point for me: in your case, where you know they're all ints, how do you not know that + and - also return int? Can't you determine statically that this whole expression will return a primitive int? > I am asking because I was experimenting with method signatures and for > such plus and minus methods and got mixed results. I was expecting the > primtive versions would achieve the best results, but that was not the > case. a plus(int, int) had worse performance than a plus(int,Integer) or > plus(Integer,int) in some cases and sometimes it looked like > plus(Integer,Integer) is worse, in other cases not. Well, this is > causing me some problems. Why do I get such strange results? I would > assume it depends on the JIT and the boxing logic it is able to > recognize and not. What does the assembly look like? In my case, passing int instead of Fixnum where possible (usually only when a literal integer appears in the argument list) definitely helps; I don't have to construct a Fixnum or go to a cache to get it, and on the other side there's no type-checking required to make sure I really have a Fixnum. The int paths should be faster than the Integer paths. And again remember...I don't think the JIT in u4- does anything with the boxing coming out of these calls. It might do something on the other side, but not across the invokedynamic call. > One more thing I noticed is, that if I have a = b+c, with all of them > being int and b+c returning object, then letting the MethodHandle do the > conversion from Object to int is actually much worse performance wise, > than a cast to integer and calling valueOf. 
Shouldn't that be at least > equal, if not fast considering that the result of b+c was first boxed > and then is unboxed? Perhaps doing it in the handles makes the code more opaque? Do the non-handle way and the handle way have exactly the same logic? Bottom line here is that if you need a reference type on LHS, you'll have to create a reference type, and we need the JVM to figure out that it can brush that part away. - Charlie From forax at univ-mlv.fr Sun May 13 10:55:08 2012 From: forax at univ-mlv.fr (=?ISO-8859-1?Q?R=E9mi_Forax?=) Date: Sun, 13 May 2012 19:55:08 +0200 Subject: Boxing, still a limit of invokedynamic? In-Reply-To: References: <4FAFDBA3.8020405@gmx.org> Message-ID: <4FAFF57C.7020305@univ-mlv.fr> On 05/13/2012 07:21 PM, Charles Oliver Nutter wrote: > On Sun, May 13, 2012 at 11:04 AM, Jochen Theodorou wrote: >> I wanted to ask you of your opinion. If I am going to compile something >> like a+b-c and a,b,c are all primtives, but I won't know that the >> results will be really the primtives too, then this means I will most >> probably compile it like this: >> >> invokedynamic("minus", invokedynamic("plus",a,b), c) >> >> meaning the result of a+b will be an Object (since I won't know it is a >> primitive) and then there will be one boxing for that, just to unbox >> again for the minus and then box again for the result of the minus. If >> now the result is not supposed to be a primitive, then there won't be >> another unbox, till the next operation done with that value. > You could also encode "a+b-c" as a single invokedynamic operation, but > I guess you're looking for a general solution... > >> Now, even if the JIT is able to see through the one boxing and unboxing >> fro the result of plus, what will stay is the boxing for the result of >> the minus.... plus the many unboxing actions used when this result is >> used. Isn't that a conceptual problem? And how do others deal with that? 
> First of all...how are you expecting that JIT will see through the > first boxing? If the return result is going to be Object, it's going > to go into an Integer. Perhaps you are hoping for escape analysis to > get rid of it? > > If that's the case, why wouldn't the same expectation apply to the > second call? If (a+b) returns an Integer that's immediately passed > into (tmp-c) and both calls inline, in theory EA should have enough to > eliminate the intermediate. If the result of (tmp-c) is never used as > an object and never escapes, then EA should be able to get rid of that > too. > > Of course this is all assuming that EA will be working across indy > boundaries in the near future. Currently, it does not. > > In JRuby, where we have no static typing or type hints, we always do > these invocations as all reference types. We're banking on JVM helping > us out in the future, so my goal is to just use indy as efficiently as > possible and keep call protocols simple. > > A confusing point for me: in your case, where you know they're all > ints, how do you not know that + and - also return int? Can't you > determine statically that this whole expression will return a > primitive int? I think currently Groovy allows to replace + by a method that will return everything you want. But here, I think the spec of Groovy (if it means something) should be changed to say that when your replace a method by another, the return type must be a subtype of the existing method. > >> I am asking because I was experimenting with method signatures and for >> such plus and minus methods and got mixed results. I was expecting the >> primtive versions would achieve the best results, but that was not the >> case. a plus(int, int) had worse performance than a plus(int,Integer) or >> plus(Integer,int) in some cases and sometimes it looked like >> plus(Integer,Integer) is worse, in other cases not. Well, this is >> causing me some problems. Why do I get such strange results? 
I would >> assume it depends on the JIT and the boxing logic it is able to >> recognize and not. > What does the assembly look like? > > In my case, passing int instead of Fixnum where possible (usually only > when a literal integer appears in the argument list) definitely helps; > I don't have to construct a Fixnum or go to a cache to get it, and on > the other side there's no type-checking required to make sure I really > have a Fixnum. The int paths should be faster than the Integer paths. > > And again remember...I don't think the JIT in u4- does anything with > the boxing coming out of these calls. It might do something on the > other side, but not across the invokedynamic call. > >> One more thing I noticed is, that if I have a = b+c, with all of them >> being int and b+c returning object, then letting the MethodHandle do the >> conversion from Object to int is actually much worse performance wise, >> than a cast to integer and calling valueOf. Shouldn't that be at least >> equal, if not fast considering that the result of b+c was first boxed >> and then is unboxed? > Perhaps doing it in the handles makes the code more opaque? Do the > non-handle way and the handle way have exactly the same logic? Object -> int is not equivalent to Object -> Integer -> int, it can be Object -> Byte -> int by example. You have to chain several calls to asType() see slide 20 of my jvm summit talk last year, http://wiki.jvmlangsummit.com/images/9/93/2011_Forax.pdf > > Bottom line here is that if you need a reference type on LHS, you'll > have to create a reference type, and we need the JVM to figure out > that it can brush that part away. > > - Charlie R?mi From blackdrag at gmx.org Sun May 13 13:14:40 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Sun, 13 May 2012 22:14:40 +0200 Subject: Boxing, still a limit of invokedynamic? 
In-Reply-To: References: <4FAFDBA3.8020405@gmx.org> Message-ID: <4FB01630.4040702@gmx.org> Am 13.05.2012 19:21, schrieb Charles Oliver Nutter: [...] > You could also encode "a+b-c" as a single invokedynamic operation, but > I guess you're looking for a general solution... yes, I am looking for a general solution. I was thinking of making the whole expression as a MethodHandle combination, which then has a,b,c as input arguments... but that's a pretty big step to do. I don't want to spend months in changing the compiler just to find it doesn't give me the performance I am looking for. Plus this approach has its own problems with evaluation order and such. [...] > First of all...how are you expecting that JIT will see through the > first boxing? If the return result is going to be Object, it's going > to go into an Integer. Perhaps you are hoping for escape analysis to > get rid of it? I don't know what part it does, but I assume EA is right. > If that's the case, why wouldn't the same expectation apply to the > second call? If (a+b) returns an Integer that's immediately passed > into (tmp-c) and both calls inline, in theory EA should have enough to > eliminate the intermediate. If the result of (tmp-c) is never used as > an object and never escapes, then EA should be able to get rid of that > too. well.. in my example the result of tmp-c is returned, so it escapes. But even if I only store it in a bytecode slot... I mean I wouldn't EA expect to even optimize these cases.... on further thought though it might be possible. > Of course this is all assuming that EA will be working across indy > boundaries in the near future. Currently, it does not. Indeed, I was kind of assuming that. You telling me it does not makes some results much clearer to me. The question then is... should I wait for EA working across indy boundaries? And when would that be available? [...] 
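
The idea of turning the whole expression into one MethodHandle combination with a, b, c as inputs could look roughly like the sketch below. It is a sketch under stated assumptions: it uses MethodHandles.collectArguments, which only exists from Java 8 on and was not available in the JDK 7 builds discussed here, and the class FusedExpression with its plus/minus helpers is hypothetical. Because the combined handle has type (int,int,int)int, no intermediate value is ever typed as Object.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class FusedExpression {
    // Hypothetical primitive helpers.
    static int plus(int a, int b) { return a + b; }
    static int minus(int a, int b) { return a - b; }

    // Builds one handle computing minus(plus(a, b), c).
    static MethodHandle build() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType binary = MethodType.methodType(int.class, int.class, int.class);
            MethodHandle plus = lookup.findStatic(FusedExpression.class, "plus", binary);
            MethodHandle minus = lookup.findStatic(FusedExpression.class, "minus", binary);
            // The result of plus(a, b) replaces argument 0 of minus.
            return MethodHandles.collectArguments(minus, 0, plus);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static int eval(int a, int b, int c) {
        try {
            return (int) build().invokeExact(a, b, c);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

This only works when the operand types are known at link time; the evaluation-order concern raised above still applies, since all three arguments are evaluated before the combined handle runs.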
> A confusing point for me: in your case, where you know they're all > ints, how do you not know that + and - also return int? Can't you > determine statically that this whole expression will return a > primitive int? I may not have written that part clearly enough. We don't know that + and - return int. You may vagualy remember my JVM talk 2 years ago in which I explained how I plan to make a primitive optimization path. In this path the compiler will indeed assume that a+b will return an int and will then emit iadd instead of using static method calls or any other helpers. This optimized path has basically the same performance as Java in the best case, but it is guarded, which reduces the performance to half of Java speed in the best case. The problem is that prim opts cannot handle more complex cases and it is really easy to turn them off... That plus the problem of almost doubling the method bytecode make them a sub optimal solution. But it is one indy has to compete with. >> I am asking because I was experimenting with method signatures and for >> such plus and minus methods and got mixed results. I was expecting the >> primtive versions would achieve the best results, but that was not the >> case. a plus(int, int) had worse performance than a plus(int,Integer) or >> plus(Integer,int) in some cases and sometimes it looked like >> plus(Integer,Integer) is worse, in other cases not. Well, this is >> causing me some problems. Why do I get such strange results? I would >> assume it depends on the JIT and the boxing logic it is able to >> recognize and not. > > What does the assembly look like? you mean the compiled code? I will try to give examples of this later. But if > And again remember...I don't think the JIT in u4- does anything with > the boxing coming out of these calls. It might do something on the > other side, but not across the invokedynamic call. is right, then it is no wonder, that one time this and another time that is faster. 
But I suspect it is worse. It is not only across indy calls, that the JIT does nothing with boxing, I assume it is even across MethodHandles in the same indy call. To be more exact with my suspecion, I expect a constant int boxed by an MethodHandle and then unboxed by another one in the same indy call to be slower, than just returning the int itself. If I have a+1, then the ideal plus is one that takes int,int and returns Integer, because that way everything can happen inside the invokeddynamic part. if I have a=a+1 (a being an int) then plus(int,int):int is probably better, but using the one from before and unboxing the Integer to int is not. And depending from where your results are coming from you get better performance by using plus(Integer,int) plus(int,Integer) and plus(Integer,Integer)... with different return types probably as well. >> One more thing I noticed is, that if I have a = b+c, with all of them >> being int and b+c returning object, then letting the MethodHandle do the >> conversion from Object to int is actually much worse performance wise, >> than a cast to integer and calling valueOf. Shouldn't that be at least >> equal, if not as fast considering that the result of b+c was first boxed >> and then is unboxed? > > Perhaps doing it in the handles makes the code more opaque? Do the > non-handle way and the handle way have exactly the same logic? the non handle way means to calculate b+c using a handle and then unbox the result using a library function from Groovy... afaik. The handle way uses the abilities of MethodHandles to convert the Integer into an int. I don't know for sure what that part is doing in the end, but normally it shouldn't be slower. bye Jochen From blackdrag at gmx.org Sun May 13 13:21:18 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Sun, 13 May 2012 22:21:18 +0200 Subject: Boxing, still a limit of invokedynamic? 
In-Reply-To: <4FAFF57C.7020305@univ-mlv.fr> References: <4FAFDBA3.8020405@gmx.org> <4FAFF57C.7020305@univ-mlv.fr> Message-ID: <4FB017BE.5060306@gmx.org> Am 13.05.2012 19:55, schrieb R?mi Forax: [...] > I think currently Groovy allows to replace + by a method > that will return everything you want. > But here, I think the spec of Groovy (if it means something) > should be changed to say that when your replace a method > by another, the return type must be a subtype of the > existing method. that helps if the argument types are known and there is an exact match to a method. It helps because in that case we can save the casting and converting plus the checks to see if we even have to do that. Yes. But currently that is not the case. And it won't change for Groovy2, maybe for a later Groovy. [...] > Object -> int is not equivalent to Object -> Integer -> int, > it can be Object -> Byte -> int by example. > You have to chain several calls to asType() > > see slide 20 of my jvm summit talk last year, > http://wiki.jvmlangsummit.com/images/9/93/2011_Forax.pdf so you mean to tell me that I have first to convert the Object to an Integer and that Integer to int, instead of directly converting the Object to an int? I see, I will try that out. bye Jochen From headius at headius.com Sun May 13 13:58:09 2012 From: headius at headius.com (Charles Oliver Nutter) Date: Sun, 13 May 2012 15:58:09 -0500 Subject: Boxing, still a limit of invokedynamic? In-Reply-To: <4FB01630.4040702@gmx.org> References: <4FAFDBA3.8020405@gmx.org> <4FB01630.4040702@gmx.org> Message-ID: Inline... On May 13, 2012 3:15 PM, "Jochen Theodorou" wrote: > > Am 13.05.2012 19:21, schrieb Charles Oliver Nutter: > [...] > > You could also encode "a+b-c" as a single invokedynamic operation, but > > I guess you're looking for a general solution... > > yes, I am looking for a general solution. I was thinking of making the > whole expression as a MethodHandle combination, which then has a,b,c as > input arguments... 
but that's a pretty big step to do. I don't want to > spend months in changing the compiler just to find it doesn't give me > the performance I am looking for. Plus this approach has its own > problems with evaluation order and such. Yeah, that might be a good point, though this particular case of int + int - int has no side effects if you retrieve the values all at once before calculating... > I don't know what part it does, but I assume EA is right. Yeah, if you implemented this statically you would very likely see EA eliminate all boxing, since it is a pretty simple case. Indy is getting in the way here. > well.. in my example the result of tmp-c is returned, so it escapes. But > even if I only store it in a bytecode slot... I mean I wouldn't EA > expect to even optimize these cases.... on further thought though it > might be possible. Even if the tmp-c produces an Integer, it could still EA away if *that* object doesn't escape from this compile unit. Basically, if object construction and all possible code paths that would see an object inline together, EA can potentially eliminate the object (except if the unlined logic involves Indy or MH call paths...for now). > Indeed, I was kind of assuming that. You telling me it does not makes > some results much clearer to me. The question then is... should I wait > for EA working across indy boundaries? And when would that be available? Well, I suppose that is up to you. In JRuby we have worked around the JVM in places, but so far I have not felt like it is worth the compiler complexity to try to optimize math down to primitive speeds right now. My goal for JRuby 1.7 is to utilize Indy as much as possible and be ready for upcoming optimization work, rather than trying to be tricky now. The ideal case for me is that we get Indy fully-integrated into JRuby, and then sit back and wait for (and help) the JVM catch up. > I may not have written that part clearly enough. We don't know that + > and - return int. 
You may vagualy remember my JVM talk 2 years ago in > which I explained how I plan to make a primitive optimization path. In > this path the compiler will indeed assume that a+b will return an int > and will then emit iadd instead of using static method calls or any > other helpers. This optimized path has basically the same performance as > Java in the best case, but it is guarded, which reduces the performance > to half of Java speed in the best case. The problem is that prim opts > cannot handle more complex cases and it is really easy to turn them > off... That plus the problem of almost doubling the method bytecode make > them a sub optimal solution. But it is one indy has to compete with. My position is that the JVM should be doing that for us, so I am wiring up Indy in the logical way and working with JVM guys to make that happen. I am less concerned about short-term math perf than I am about making best possible use of Indy. > > >> I am asking because I was experimenting with method signatures and for > >> such plus and minus methods and got mixed results. I was expecting the > >> primtive versions would achieve the best results, but that was not the > >> case. a plus(int, int) had worse performance than a plus(int,Integer) or > >> plus(Integer,int) in some cases and sometimes it looked like > >> plus(Integer,Integer) is worse, in other cases not. Well, this is > >> causing me some problems. Why do I get such strange results? I would > >> assume it depends on the JIT and the boxing logic it is able to > >> recognize and not. > > > > What does the assembly look like? > > you mean the compiled code? I will try to give examples of this later. Well, I mean the assembly :-) if you really want to see why two pieces of code perform differently, the assembly output for hotspot will show you the answer. > But if > > > And again remember...I don't think the JIT in u4- does anything with > > the boxing coming out of these calls. 
It might do something on the > > other side, but not across the invokedynamic call. > > is right, then it is no wonder, that one time this and another time that > is faster. But I suspect it is worse. It is not only across indy calls, > that the JIT does nothing with boxing, I assume it is even across > MethodHandles in the same indy call. To be more exact with my suspecion, > I expect a constant int boxed by an MethodHandle and then unboxed by > another one in the same indy call to be slower, than just returning the > int itself. I would not at all be surprised if EA does nothing across MH boundaries either. That is (among other things) what the new LambdaForm is supposed to fix by translating method handle graphs into a form the JVM can optimize along with surrounding code. > If I have a+1, then the ideal plus is one that takes int,int and returns > Integer, because that way everything can happen inside the > invokeddynamic part. if I have a=a+1 (a being an int) then > plus(int,int):int is probably better, but using the one from before and > unboxing the Integer to int is not. And depending from where your > results are coming from you get better performance by using > plus(Integer,int) plus(int,Integer) and plus(Integer,Integer)... with > different return types probably as well. I'm sure there's many reasons for variability here. Seek out LogCompilation and PrintAssembly, my son! :-) - Charlie -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120513/1551b2ea/attachment.html From mroos at roos.com Sun May 13 16:16:21 2012 From: mroos at roos.com (Mark Roos) Date: Sun, 13 May 2012 16:16:21 -0700 Subject: Boxing, still a limit of invokedynamic? 
In-Reply-To: <4FAFDBA3.8020405@gmx.org>
References: <4FAFDBA3.8020405@gmx.org>
Message-ID:

Hi Jochen;

I also spent some time being concerned about the integer performance but then decided that this concern is probably premature based on the current state of indy. One big unknown to me is whether EA would ever recognize my holder for the prim int or if it would only recognize java Integer when it looks at boxing. There is also the possibility of fixnums support (prims in pointers) that may or may not be useful in my design.

So I decided that all my ints will be boxed (references), added a cache and some special prim methods for now and then follow Charles' approach of a good indy design to take advantage of the optimization efforts. Based on my large scale benchmarks this is ok so I am focusing on other places to optimize. This also supports my automatic conversion to/from BigInts.

Long term I am guessing that I will follow your path of special paths with guards, unless I can get the performance I want with more cores.

regards
mark

From forax at univ-mlv.fr Sun May 13 23:55:48 2012
From: forax at univ-mlv.fr (=?ISO-8859-1?Q?R=E9mi_Forax?=)
Date: Mon, 14 May 2012 08:55:48 +0200
Subject: Boxing, still a limit of invokedynamic?
In-Reply-To: <4FB017BE.5060306@gmx.org>
References: <4FAFDBA3.8020405@gmx.org> <4FAFF57C.7020305@univ-mlv.fr> <4FB017BE.5060306@gmx.org>
Message-ID: <4FB0AC74.1070107@univ-mlv.fr>

On 05/13/2012 10:21 PM, Jochen Theodorou wrote:
> On 13.05.2012 19:55, Rémi Forax wrote:
> [...]
>> I think currently Groovy allows to replace + by a method
>> that will return everything you want.
>> But here, I think the spec of Groovy (if it means something)
>> should be changed to say that when you replace a method
>> by another, the return type must be a subtype of the
>> existing method.
> that helps if the argument types are known and there is an exact match > to a method. It helps because in that case we can save the casting and > converting plus the checks to see if we even have to do that. Yes. But > currently that is not the case. And it won't change for Groovy2, maybe > for a later Groovy. > > [...] >> Object -> int is not equivalent to Object -> Integer -> int, >> it can be Object -> Byte -> int by example. >> You have to chain several calls to asType() >> >> see slide 20 of my jvm summit talk last year, >> http://wiki.jvmlangsummit.com/images/9/93/2011_Forax.pdf > so you mean to tell me that I have first to convert the Object to an > Integer and that Integer to int, instead of directly converting the > Object to an int? I see, I will try that out. Yes. You should not have to do that because you first check if the parameter is an Integer but currently the VM doesn't propagate this information. (there is a patch for that, but not yet integrated) Also, it should be faster before the JIT is kicked in. > > bye Jochen cheers, R?mi From blackdrag at gmx.org Mon May 14 02:30:47 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Mon, 14 May 2012 11:30:47 +0200 Subject: Boxing, still a limit of invokedynamic? In-Reply-To: References: <4FAFDBA3.8020405@gmx.org> Message-ID: <4FB0D0C7.6060204@gmx.org> Am 14.05.2012 01:16, schrieb Mark Roos: [...] > So I decided that all my ints will be boxed (references), added a cache > and some special prim > methods for now and then follow Charles approach of a good indy design > to take advantage > of the optimization efforts. Based on my large scale benchmarks this is > ok so I am focusing > on other places to optimize. This also supports my automatic conversion > to/from BigInts. > > Log term I am guessing that I will follow your path of special paths > with guards, Unless I can > get the performance I want with more cores. 
the special paths with guards in bytecode is actually a thing I was hoping to get rid of with indy. The current state of the implementation of indy in Groovy is, that it is slightly better than our call site caching and worse than our prim opts. In total that means, unless I combine indy with prim opts the indy versions is in general a tiny bit slower, since even the small advantage over call site caching is not always there. And call site caching in Groovy means we operate with at runtime generated classes, with call sites, that are mostly not inlined and other problems. Indy has the potential to be faster than that. Only in reality I am missing that extra of performance. And that is a bit sad. We had recently another 2.0 beta and a day later we had already people complaining why the indy version is not faster. I mean, if I find other places to optimize, then call site caching will profit from that as well, not giving indy the real advantage here. I am worried about indy getting a bad image here. bye Jochen -- Jochen "blackdrag" Theodorou - Groovy Project Tech Lead blog: http://blackdragsview.blogspot.com/ german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming sources visit http://groovy-lang.org From blackdrag at gmx.org Mon May 14 02:32:51 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Mon, 14 May 2012 11:32:51 +0200 Subject: Boxing, still a limit of invokedynamic? In-Reply-To: <4FB0AC74.1070107@univ-mlv.fr> References: <4FAFDBA3.8020405@gmx.org> <4FAFF57C.7020305@univ-mlv.fr> <4FB017BE.5060306@gmx.org> <4FB0AC74.1070107@univ-mlv.fr> Message-ID: <4FB0D143.9090402@gmx.org> Am 14.05.2012 08:55, schrieb R?mi Forax: [...] >> so you mean to tell me that I have first to convert the Object to an >> Integer and that Integer to int, instead of directly converting the >> Object to an int? I see, I will try that out. > > Yes. 
> You should not have to do that because you first check if the > parameter is an Integer but currently the VM doesn't propagate > this information. > (there is a patch for that, but not yet integrated) > Also, it should be faster before the JIT is kicked in. That it is slower at first is ok. Only I kind of assumed, that such things can be optimized away. The less the JIT can optimize, the more I have to do that and work around the limitations, making my runtime more complex. And with the next JVM update all that work might be for nothing. bye Jochen -- Jochen "blackdrag" Theodorou - Groovy Project Tech Lead blog: http://blackdragsview.blogspot.com/ german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming sources visit http://groovy-lang.org From headius at headius.com Mon May 14 08:09:11 2012 From: headius at headius.com (Charles Oliver Nutter) Date: Mon, 14 May 2012 10:09:11 -0500 Subject: Boxing, still a limit of invokedynamic? In-Reply-To: <4FB0D0C7.6060204@gmx.org> References: <4FAFDBA3.8020405@gmx.org> <4FB0D0C7.6060204@gmx.org> Message-ID: On Mon, May 14, 2012 at 4:30 AM, Jochen Theodorou wrote: > the special paths with guards in bytecode is actually a thing I was > hoping to get rid of with indy. The current state of the implementation > of indy in Groovy is, that it is slightly better than our call site > caching and worse than our prim opts. In total that means, unless I > combine indy with prim opts the indy versions is in general a tiny bit > slower, since even the small advantage over call site caching is not > always there. And call site caching in Groovy means we operate with at > runtime generated classes, with call sites, that are mostly not inlined > and other problems. Indy has the potential to be faster than that. Only > in reality I am missing that extra of performance. And that is a bit > sad. We had recently another 2.0 beta and a day later we had already > people complaining why the indy version is not faster. 
I mean, if I find
> other places to optimize, then call site caching will profit from that
> as well, not giving indy the real advantage here.
>
> I am worried about indy getting a bad image here.

Well, keep the faith :) In JRuby, indy has been truly excellent...significantly better than inline caching and many times better boxed numerics (we do not have primitive optimizations right now).

It is not without its warts, of course. Complex method handle changes or large numbers of indy call sites can cause method bodies to fall off a performance cliff (like John talked about last week). A key goal for JRuby's uses of indy has been to keep the handles as simple as possible. I have also installed several tuning flags to turn off the use of indy for certain cases, for users that run into problems with it. I've tuned the length of polymorphic GWT chains, and made heavy use of SwitchPoint to reduce guard costs.

Here's the red/black bench that's been going around...the compiler-level optimizations are the same in both cases, but the latter numbers are with invokedynamic. (higher is better...iterations/sec)

No indy:

#delete             12.0 (±0.0%) i/s -   60 in 5.014000s
#add                26.3 (±0.0%) i/s -  132 in 5.019000s
#search             47.6 (±6.3%) i/s -  240 in 5.065000s
#inorder_walk      183.7 (±7.6%) i/s -  918 in 5.041000s
#rev_inorder_walk  212.9 (±3.8%) i/s - 1080 in 5.080000s
#minimum            92.4 (±1.1%) i/s -  468 in 5.065000s
#maximum            95.6 (±2.1%) i/s -  486 in 5.086000s

With indy:

#delete             35.1 (±5.7%) i/s -  174 in 5.008000s
#add                69.9 (±2.9%) i/s -  350 in 5.014000s
#search            126.4 (±3.2%) i/s -  640 in 5.069999s
#inorder_walk      711.1 (±6.7%) i/s - 3591 in 5.079000s
#rev_inorder_walk  693.1 (±11.3%) i/s - 3422 in 5.027000s
#minimum           305.3 (±2.0%) i/s - 1530 in 5.013000s
#maximum           282.2 (±1.8%) i/s - 1428 in 5.062000s

So 2-4x improvement on this benchmark *just* by using invokedynamic. This one is not numeric-heavy, so boxing costs don't come into play as much, but to me the results are incredibly promising.
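
[Editorial sketch] The two techniques mentioned above, a guardWithTest (GWT) type check and a SwitchPoint used for cheap global invalidation, can be sketched with the standard java.lang.invoke API. This is not JRuby's code: GuardSketch, fastPath, slowPath and isString are hypothetical names, and a real runtime would install the resulting handle in an invokedynamic call site rather than call it directly.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

public class GuardSketch {
    // Hypothetical specialised path, valid only for String receivers.
    static Object fastPath(Object o) {
        return "fast:" + ((String) o).length();
    }

    // Hypothetical generic fallback.
    static Object slowPath(Object o) {
        return "slow:" + String.valueOf(o).length();
    }

    // The guard: a cheap type test on the receiver.
    static boolean isString(Object o) {
        return o instanceof String;
    }

    // test ? fast : slow, wrapped in a SwitchPoint that can retire the
    // whole fast path at once (for example after a method redefinition).
    static MethodHandle build(SwitchPoint sp) throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType objToObj = MethodType.methodType(Object.class, Object.class);
        MethodHandle fast = lookup.findStatic(GuardSketch.class, "fastPath", objToObj);
        MethodHandle slow = lookup.findStatic(GuardSketch.class, "slowPath", objToObj);
        MethodHandle test = lookup.findStatic(GuardSketch.class, "isString",
                MethodType.methodType(boolean.class, Object.class));
        MethodHandle guarded = MethodHandles.guardWithTest(test, fast, slow);
        return sp.guardWithTest(guarded, slow);
    }

    static String demo() {
        try {
            SwitchPoint sp = new SwitchPoint();
            MethodHandle mh = build(sp);
            Object a = (Object) mh.invokeExact((Object) "hello");                // guard passes
            Object b = (Object) mh.invokeExact((Object) Integer.valueOf(12345)); // guard fails
            SwitchPoint.invalidateAll(new SwitchPoint[] { sp });                 // retire fast path
            Object c = (Object) mh.invokeExact((Object) "hello");                // now always fallback
            return a + " " + b + " " + c;
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "fast:5 slow:5 slow:5"
    }
}
```

The SwitchPoint itself adds no per-call test that the program has to write; invalidating it simply redirects every handle built from it to the fallback.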
We've also had reports from users of large, heterogeneous applications of at least doubled perf running on indy, and in a couple cases improvements as much as 10x over non-indy perf. I'm very happy with the results so far :) - Charlie - Charlie From headius at headius.com Mon May 14 08:19:40 2012 From: headius at headius.com (Charles Oliver Nutter) Date: Mon, 14 May 2012 10:19:40 -0500 Subject: Boxing, still a limit of invokedynamic? In-Reply-To: <4FB0D143.9090402@gmx.org> References: <4FAFDBA3.8020405@gmx.org> <4FAFF57C.7020305@univ-mlv.fr> <4FB017BE.5060306@gmx.org> <4FB0AC74.1070107@univ-mlv.fr> <4FB0D143.9090402@gmx.org> Message-ID: On Mon, May 14, 2012 at 4:32 AM, Jochen Theodorou wrote: > That it is slower at first is ok. Only I kind of assumed, that such > things can be optimized away. The less the JIT can optimize, the more I > have to do that and work around the limitations, making my runtime more > complex. And with the next JVM update all that work might be for nothing. This is a key point you will have to weigh. You want to have Groovy 2.0 released by this fall, with invokedynamic support. It's certainly possible that your uses of invokedynamic have not gotten the optimization love they need, and that your results won't be to your liking until that happens. Getting JRuby + invokedynamic to work well was the product of several months of back-and-forth between me and the Hotspot guys, tossing assembly dumps around, tweaking inlining budgets, trying out new optimization strategies. The initial performance was terrible, but we all persevered and I continued to wire things up and play with handles. Ultimately we (well, mostly the Hotspot guys) worked out all (well, almost all) of the performance issues I saw; that would not have happened without a lot of back-and-forth. 
Even though indy isn't "perfect" yet, we're comfortable putting out a JRuby 1.7 release in the next couple months that has invokedynamic support...because it's looking pretty good now, I know that there's more improvements coming, and I'm working to help find those perf opportunities every day. I *strongly* urge you to dig deeper than running benchmarks while investigating this. It sounds like you may be looking at inlining results, but don't trust that either (inlining might happen early on and then get thrown away). Look into the assembly code to see the actual work being done for your indy calls. If it's a lot more than you get with your old logic, there's an opportunity to improve. And when you get to that point and can't figure out why the assembly for your indy stuff is more complicated than the assembly for the non-indy logic, we can help you decide if it's a JVM issue or a Groovy issue :) My attitude has been this: 1. I assume invokedynamic should be fast. 2. If it's not, either I'm doing something wrong or the JVM's doing something wrong. 3. We figure out which one it is and fix it. - Charlie From blackdrag at gmx.org Mon May 14 08:49:05 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Mon, 14 May 2012 17:49:05 +0200 Subject: Boxing, still a limit of invokedynamic? In-Reply-To: References: <4FAFDBA3.8020405@gmx.org> <4FAFF57C.7020305@univ-mlv.fr> <4FB017BE.5060306@gmx.org> <4FB0AC74.1070107@univ-mlv.fr> <4FB0D143.9090402@gmx.org> Message-ID: <4FB12971.4090302@gmx.org> Am 14.05.2012 17:19, schrieb Charles Oliver Nutter: > On Mon, May 14, 2012 at 4:32 AM, Jochen Theodorou wrote: >> That it is slower at first is ok. Only I kind of assumed, that such >> things can be optimized away. The less the JIT can optimize, the more I >> have to do that and work around the limitations, making my runtime more >> complex. And with the next JVM update all that work might be for nothing. > > This is a key point you will have to weigh. 
> > You want to have Groovy 2.0 released by this fall, with invokedynamic > support. This fall? Ahem... ideally it would have been out already by now ;) If possible 1 month only is left.... that is late enough > It's certainly possible that your uses of invokedynamic have > not gotten the optimization love they need, and that your results > won't be to your liking until that happens. that might be, but I find some quite basic things that I really wonder if they are not really of relevance for your cases. The boxing issues I can understand, but that performance drops so drastically (it halves) once you use a catch guard is something I find strange... you are not using that one? > Getting JRuby + > invokedynamic to work well was the product of several months of > back-and-forth between me and the Hotspot guys, tossing assembly dumps > around, tweaking inlining budgets, trying out new optimization > strategies. The initial performance was terrible, but we all > persevered and I continued to wire things up and play with handles. > Ultimately we (well, mostly the Hotspot guys) worked out all (well, > almost all) of the performance issues I saw; that would not have > happened without a lot of back-and-forth. Yes the question is only if they are as eager on supporting my stuff. Nothing against John and all the other Hotspot guys, really, but their time is at least as limited as mine. [...] > And when you get to that point and can't figure out why the assembly > for your indy stuff is more complicated than the assembly for the > non-indy logic, we can help you decide if it's a JVM issue or a Groovy > issue :) well, if you offer to help, I take you by your word ;) > My attitude has been this: > > 1. I assume invokedynamic should be fast. > 2. If it's not, either I'm doing something wrong or the JVM's doing > something wrong. > 3. We figure out which one it is and fix it. 
yes, that is my approach as well, but you can see on the feedback on my catch exception guard problem, that this doesn't mean I will ever get to a fix in the JVM. It may happen or not... depending on the available time slots. Or maybe I should rant more ? ;) bye Jochen -- Jochen "blackdrag" Theodorou - Groovy Project Tech Lead blog: http://blackdragsview.blogspot.com/ german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming sources visit http://groovy-lang.org From christian.thalinger at oracle.com Tue May 15 18:21:51 2012 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Wed, 16 May 2012 01:21:51 +0000 Subject: hg: mlvm/mlvm/hotspot: meth-lazy: fixed performance regression introduced with last commit Message-ID: <20120516012152.0F23D47335@hg.openjdk.java.net> Changeset: 78cf8e98a006 Author: twisti Date: 2012-05-15 18:21 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/78cf8e98a006 meth-lazy: fixed performance regression introduced with last commit ! meth-lazy-7023639.jit.patch From christian.thalinger at oracle.com Tue May 15 18:34:37 2012 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Wed, 16 May 2012 01:34:37 +0000 Subject: hg: mlvm/mlvm/hotspot: meth-lazy: fixed debug builds Message-ID: <20120516013437.6C4F147338@hg.openjdk.java.net> Changeset: bed66523454f Author: twisti Date: 2012-05-15 18:34 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/bed66523454f meth-lazy: fixed debug builds ! meth-lazy-7023639.jit.patch From per at bothner.com Wed May 16 10:55:06 2012 From: per at bothner.com (Per Bothner) Date: Wed, 16 May 2012 10:55:06 -0700 Subject: using MethodHandle in place of virtual methods? Message-ID: <4FB3E9FA.2060501@bothner.com> I'm pondering how Kawa can make use of MethodHandles without (yet) going the full invokedynamic route. 
Specifically, I'm considering using MethodHandles in place of virtual methods, and I'm hoping someone has some information on the likely costs and benefits.

The attachment ProcUsingVirt.java sketches out the current implementation. This uses virtual apply[01N] methods that are overridden by specific Procedure subclasses. (Kawa actually has apply[01234N], but I've simplified.)
(See also http://www.gnu.org/software/kawa/internals/procedures.html)

The attachment ProcUsingMH.java is a sketch of an alternative implementation where each apply[01N] method has a matching MethodHandle field. The apply[01N] method is now final and just invokes the corresponding MethodHandle.

The first issue is that ProcUsingMH requires more fields, and thus more memory. This is mitigated because we don't need space for the virtual method tables. Because Procedure classes are often singletons, this may be a wash.

Calling a MethodHandle using invokeExact requires more indirection and more checking than just doing an invokevirtual. Would Hotspot be able to optimize this to roughly the same speed? Presumably it might be a little bit slower on less-optimizing VMs.

To avoid one-class-per-function, Kawa will compile a switch-table per module, as sketched by the ProcUsingVirt.Builtins2 class. (See the section named "Closures" in the internals document linked to above.) This is where using MethodHandles seems most winning, since we can do as shown in the ProcUsingMH.Builtins class. Is this likely to have a performance impact?

What if the apply[01N] method is non-final? That might make interoperability easier. For example, it's desirable to support a '--target 6' flag, though supporting both styles in the same runtime may reduce the advantages of either.

Finally, it seems that if Kawa in the future makes use of invokedynamic, having the MethodHandles in the Procedure would be an advantage.

Comments? Advice?
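
[Editorial sketch] The ProcUsingVirt.java attachment was scrubbed from the archive, so the following is only a hypothetical reconstruction of the per-module switch dispatch described above, not Kawa's actual code. The names SwitchDispatchSketch, ModuleProc, HASH and TO_STRING are assumptions chosen for illustration. One class serves a whole module, and a selector field picks the function inside the overridden virtual apply method.

```java
public class SwitchDispatchSketch {
    /** Hypothetical stand-in for a Procedure base class with virtual apply methods. */
    abstract static class Procedure {
        Object apply0() {
            throw new IllegalArgumentException("wrong number of arguments");
        }

        Object apply1(Object arg1) {
            throw new IllegalArgumentException("wrong number of arguments");
        }

        /** Generic entry point: dispatches on the argument count. */
        Object applyN(Object[] args) {
            switch (args.length) {
                case 0: return apply0();
                case 1: return apply1(args[0]);
                default: throw new IllegalArgumentException("wrong number of arguments");
            }
        }
    }

    /** One class per module: the selector plays the role of a "procedure index". */
    static final class ModuleProc extends Procedure {
        static final int HASH = 0;
        static final int TO_STRING = 1;

        final int selector;

        ModuleProc(int selector) {
            this.selector = selector;
        }

        @Override
        Object apply1(Object arg1) {
            switch (selector) {
                case HASH: return arg1.hashCode();
                case TO_STRING: return arg1.toString();
                default: return super.apply1(arg1);
            }
        }
    }

    static String demo() {
        Procedure hash = new ModuleProc(ModuleProc.HASH);
        Procedure str = new ModuleProc(ModuleProc.TO_STRING);
        return hash.apply1("a") + " " + str.applyN(new Object[] { 42 });
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "97 42"
    }
}
```

Every call pays for a virtual dispatch to apply1 plus the switch on the selector; the MethodHandle variant discussed in this thread would replace both with a single handle stored per procedure.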
-- --Per Bothner per at bothner.com http://per.bothner.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: ProcUsingVirt.java Type: text/x-java Size: 1717 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120516/687b21e5/ProcUsingVirt.java -------------- next part -------------- A non-text attachment was scrubbed... Name: ProcUsingMH.java Type: text/x-java Size: 3343 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120516/687b21e5/ProcUsingMH.java From christian.thalinger at oracle.com Wed May 16 11:56:59 2012 From: christian.thalinger at oracle.com (christian.thalinger at oracle.com) Date: Wed, 16 May 2012 18:56:59 +0000 Subject: hg: mlvm/mlvm/jdk: meth-lazy: removed debug statements Message-ID: <20120516185659.DC11F47362@hg.openjdk.java.net> Changeset: 989b69af27d3 Author: twisti Date: 2012-05-16 11:56 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/989b69af27d3 meth-lazy: removed debug statements ! meth-lazy-7023639.patch From headius at headius.com Wed May 16 17:52:03 2012 From: headius at headius.com (Charles Oliver Nutter) Date: Wed, 16 May 2012 19:52:03 -0500 Subject: using MethodHandle in place of virtual methods? In-Reply-To: <4FB3E9FA.2060501@bothner.com> References: <4FB3E9FA.2060501@bothner.com> Message-ID: On Wed, May 16, 2012 at 12:55 PM, Per Bothner wrote: > The attachment ProcUsingVirt.java sketches out the current implementation. > This uses virtual apply[01N] methods that are overridden by specific > Procedure subclasses. ?(Kawa actually has apply[01234N], but I've > simplified.) Very similar to JRuby's implementation, in fact. We generate [0123N] with and without closure argument, generally one class per method, either a handle class pointing at a method in a larger class file, or the actual method body in its own small class. 
> (See also http://www.gnu.org/software/kawa/internals/procedures.html) > > The attachment ProcUsingMH.java is a sketch of an alternative implementation > where each apply[01N] method has a matching MethodHandle field. ?The > apply[01N] > method is now final and just invokes the corresponding MethodHandle. I don't think this would optimize like you're hoping. First off, assuming you're going to use many Procedure object instances with their own handles, that makes the invokeExact calls polymorphic. You're still dispatching through a generic stub every time, and Hotspot can't (currently) inline across it. > Calling an MethodHandle using invokeExact requires more indirection and > more checking than just doing an invokevirtual. ?Would Hotspot be able to > optimize this to roughly the same speed? ?Presumably it might be a little > bit slower on less-optimizing VMs. I'm pretty sure truly constant handles invoked will be inlined, and by truly constant I mean in the constant pool. I doubt final MH fields will optimize the same way, since for optimization purposes Hotspot doesn't (currently) care about final since it can still be tweaked via reflection. There was talk about adding ways to optimize final fields better at last year's JVMLS, but most of the experts agreed it would only really be applicable to non-inlined static finals. Of course things may have changed here, but I think you're still out of luck with this approach. > To avoid one-class-per-function, Kawa will compile a switch-table per > module, > as sketched by the ProcUsingVirt.Builtins2 class. ?(see the section named > "Closures" in the internals document linked to above.) ?This is where using > MethodHandles seems most winning, since we can do as shown in the > ProcUingMH.Builins class. ?Is this likely to have a performance impact? JRuby has had switch-based dispatchers at various times during its life. I believe Jython still uses this mechanism. 
The main problem with switches, in my experience, is that they very quickly get too big for Hotspot to optimize well. In our case we saw the effect in our interpreter and parser. The interpreter used to be a big switch based on the AST node type. When we moved the individual AST nodes' logic into the nodes themselves, and dispatched virtually rather than through the switch, performance increased many times. The parser also used to be a big switch, at first generated with all code inline (as parser generators are wont to do), then with some post-processing to move the bodies out to separate methods (which gave us a boost because the inlined switch often was to large for Hotspot to compile!), and then finally with the current logic where all bodies are in their own classes, and dispatch uses a table of instance of those classes. The last version, even with indirection through an array and virtual dispatch, has been by far the fastest. > What if the apply[01N] method is non-final? ?That might make > interoperability > easier. ?For example, it's desirable to support a '--target 6' flag, though > supporting both styles in the same runtime may reduce the advantages of > either. In JRuby, AOT-compiled methods directly implement the equivalent of your apply method, and in JIT mode the apply methods dispatch to our one-method-per-class. In invokedynamic mode, all dispatches (will eventually) be direct from call site to a chain of handles leading to the actual target method. Obviously nothing has even come close to the invokedynamic approach on performance. > Finally, it seems that if Kawa in the future makes use of invokedynamic, > having the MethodHandles in the Procedure would be an advantage. We do keep a method handle in each method object, but only so we can more easily retrieve it and bind it to the invokedynamic call site. I don't think your'e going to see the performance gain you're hoping for with the Procedure object that aggregates handles. 
You really need the invokedynamic call site for the whole thing to optimize together well. - Charlie From john.r.rose at oracle.com Wed May 16 22:57:59 2012 From: john.r.rose at oracle.com (john.r.rose at oracle.com) Date: Thu, 17 May 2012 05:57:59 +0000 Subject: hg: mlvm/mlvm/hotspot: annot: adjust comments Message-ID: <20120517055800.2580647379@hg.openjdk.java.net> Changeset: ee6b2d85bfdd Author: jrose Date: 2012-05-16 22:57 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/ee6b2d85bfdd annot: adjust comments ! annot.patch ! annot.txt From john.r.rose at oracle.com Thu May 17 01:42:00 2012 From: john.r.rose at oracle.com (john.r.rose at oracle.com) Date: Thu, 17 May 2012 08:42:00 +0000 Subject: hg: mlvm/mlvm/jdk: meth-lazy: ValueConversions.convertPrimitive fixes & unit test; detune a LF assertion; fix Lookup.unreflect Message-ID: <20120517084200.84F814737D@hg.openjdk.java.net> Changeset: eca1358fc6d5 Author: jrose Date: 2012-05-17 01:41 -0700 URL: http://hg.openjdk.java.net/mlvm/mlvm/jdk/rev/eca1358fc6d5 meth-lazy: ValueConversions.convertPrimitive fixes & unit test; detune a LF assertion; fix Lookup.unreflect + meth-lazy-7023639.mtarg.patch ! meth-lazy-7023639.patch ! series From per at bothner.com Fri May 18 23:04:01 2012 From: per at bothner.com (Per Bothner) Date: Fri, 18 May 2012 23:04:01 -0700 Subject: using MethodHandle in place of virtual methods? In-Reply-To: References: <4FB3E9FA.2060501@bothner.com> Message-ID: <4FB737D1.4080401@bothner.com> On 05/16/2012 05:52 PM, Charles Oliver Nutter wrote: Thanks for a detailed and helpful response! > On Wed, May 16, 2012 at 12:55 PM, Per Bothner wrote: >> The attachment ProcUsingMH.java is a sketch of an alternative implementation >> where each apply[01N] method has a matching MethodHandle field. The >> apply[01N] >> method is now final and just invokes the corresponding MethodHandle. > > I don't think this would optimize like you're hoping. I'm not expecting a big performance gain. 
I'm hoping for a slight gain getting rid of using a dispatch-switch, partly because it avoids an extra indirection (and virtual dispatch), and partly because (as you mentioned) HotSpot may have trouble optimizing switches, at least large ones.

To summarize my questions, before I actually try to implement this:
(1) A change to replace a virtual applyX method by a final method that calls an InvokeMethod in a final field: My guess is this would performance-wise be more-or-less a wash, with neither major gain nor loss. Is that likely?
(2) My guess is using the new MethodHandle scheme might be slightly more efficient than using a switch on a "procedure-index". It may be significantly more efficient in cases of a big switch. Is that likely?

I think there may be additional benefits to getting rid of the dispatch-switch: More helpful exception stack traces; plus avoiding the need to generate the dispatch classes.

>> Finally, it seems that if Kawa in the future makes use of invokedynamic,
>> having the MethodHandles in the Procedure would be an advantage.
>
> We do keep a method handle in each method object, but only so we can
> more easily retrieve it and bind it to the invokedynamic call site. I
> don't think you're going to see the performance gain you're hoping for
> with the Procedure object that aggregates handles. You really need the
> invokedynamic call site for the whole thing to optimize together well.

It probably makes sense to not implement a switch to using MethodHandles until I have a design (roadmap) for using invokedynamic, since the benefit of the former change is at best modest and probably not worth the effort unless Kawa also makes use of invokedynamic. Does that match your advice?
--
	--Per Bothner
per at bothner.com http://per.bothner.com/

From blackdrag at gmx.org Sat May 19 00:23:31 2012
From: blackdrag at gmx.org (Jochen Theodorou)
Date: Sat, 19 May 2012 09:23:31 +0200
Subject: performance degeneration from jdk7u2 to jdk7u6?
Message-ID: <4FB74A73.7@gmx.org>

Hi all,

I was about to get a brand new assembly to ask some questions on the list here when I installed the newest available jdk7 update 6. I ran my simple Fibonacci test program and noticed that the time it took was suddenly 4.6s, where it was 3.5s before. This plus 1s doesn't look too good to me. Has there been anything special that causes this? I mean before the indy version was a little faster than our call site caching, now it is a little slower. General java performance seems not to be reduced, so I would assume it is special to indy.

See http://rifers.org/paste/show/1702

... so now... is there somebody who can tell me why my indy version is now even slower than my call site version? Sadly I am not really fit enough in assembler anymore, especially the one produced by hotspot, to really read that output. So I have no idea what is wrong.

Two things I should mention... the call to $getCallSiteArray is surplus, yes, but that doesn't take much time. The only thing I know that really takes a lot of time is the exception guard, but that I have to have until I find a way to avoid it.... and the call site version has both as well of course. They should not count as a satisfying explanation.

bye Jochen

--
Jochen "blackdrag" Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org

From forax at univ-mlv.fr Sat May 19 05:02:49 2012
From: forax at univ-mlv.fr (Rémi Forax)
Date: Sat, 19 May 2012 14:02:49 +0200
Subject: using MethodHandle in place of virtual methods?
In-Reply-To: <4FB737D1.4080401@bothner.com>
References: <4FB3E9FA.2060501@bothner.com> <4FB737D1.4080401@bothner.com>
Message-ID: <4FB78BE9.3090702@univ-mlv.fr>

On 05/19/2012 08:04 AM, Per Bothner wrote:
> On 05/16/2012 05:52 PM, Charles Oliver Nutter wrote:
>
> Thanks for a detailed and helpful response!
> >> On Wed, May 16, 2012 at 12:55 PM, Per Bothner wrote: >>> The attachment ProcUsingMH.java is a sketch of an alternative implementation >>> where each apply[01N] method has a matching MethodHandle field. The >>> apply[01N] >>> method is now final and just invokes the corresponding MethodHandle. >> I don't think this would optimize like you're hoping. > I'm not expecting a big performance gain. > I'm hoping for a slight gain getting rid of using a dispatch-switch, > partly because it avoids an extra indirection (and virtual dispatch), and > partly because (as you mentioned) HotSpot may have trouble > optimizing switches, at least large ones. > > To summarize my questions, before I actually try to implement this: > (1) A change to replace a virtual applyX method by a final method > that calls a InvokeMethod in a final field: My guess is this would > performancewise be more-or-less a wash, with neither major gain > or less. Is that likely? > (2) My guess is using the new MethodHandle scheme might be slighly > more efficient than using a switch on a "procedure-index". It be > be significantly more efficient in cases of big switch. Is that likely? > > I think there may be additional benefits to getting rid of the > dispatch-switch: More helpful exception stack traces; > plus avoiding the need to generate the dispatch classes. > >>> Finally, it seems that if Kawa in the future makes use of invokedynamic, >>> having the MethodHandles in the Procedure would be an advantage. >> We do keep a method handle in each method object, but only so we can >> more easily retrieve it and bind it to the invokedynamic call site. I >> don't think your'e going to see the performance gain you're hoping for >> with the Procedure object that aggregates handles. You really need the >> invokedynamic call site for the whole thing to optimize together well. 
> It probably makes sense to not implement a switch to using MethodHandles
> until I have a design (roadmap) for using invokedynamic, since the
> benefit of the former change is at best modest and probably not worth the
> effort unless Kawa also makes use of invokedynamic. Does that match
> your advice?

As Charles said, you will not see a big gain from using method handles until you use invokedynamic, but I think it is worth a test.

Here is a version of Procedure that doesn't use inheritance at all and that should be more or less backward compatible with code that is already compiled against Procedure.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.lang.invoke.MethodType;

class Procedure {
  public final MethodHandle mh;    // Used for classical call.
  private MethodHandle varargsMH;  // Used for varargs call, allocated lazily

  Procedure(MethodHandle mh) {
    this.mh = mh;
  }

  /** backward compatibility. */
  public final Object apply0() throws Throwable {
    if (mh.type().parameterCount() != 0)
      throw new RuntimeException();
    return mh.invokeExact();
  }

  /** backward compatibility. */
  public final Object apply1(Object arg1) throws Throwable {
    if (mh.type().parameterCount() != 1)
      throw new RuntimeException();
    return mh.invokeExact(arg1);
  }

  /** backward compatibility. */
  public final Object applyN(Object[] args) throws Throwable {
    MethodHandle varargsMH = this.varargsMH;
    if (varargsMH == null) {
      // create the spreader once and cache it for later calls
      this.varargsMH = varargsMH =
          mh.asSpreader(Object[].class, mh.type().parameterCount());
    }
    return varargsMH.invokeExact(args);
  }
}

class Builtins {
  public static final Object hash(Object arg1) {
    return arg1.hashCode();
  }

  private static Procedure createProcedure(Lookup lookup, String name, int parameterCount) {
    MethodHandle mh;
    try {
      mh = lookup.findStatic(Builtins.class, name,
          MethodType.genericMethodType(parameterCount));
    } catch (NoSuchMethodException | IllegalAccessException e) {
      throw (LinkageError)new LinkageError().initCause(e);
    }
    return new Procedure(mh);
  }

  static final Procedure hash;
  static {
    Lookup lookup = MethodHandles.lookup();
    hash = createProcedure(lookup, "hash", 1);
  }
}

class Main {
  public static void main(String[] args) throws Throwable {
    // new way, get the method handle and call
    System.out.println(Builtins.hash.mh.invokeExact((Object)"foo"));

    // old way - apply1
    System.out.println(Builtins.hash.apply1("foo"));

    // old way - applyN
    System.out.println(Builtins.hash.applyN(new Object[]{"foo"}));
  }
}

cheers,
Rémi

From mroos at roos.com Sun May 20 22:34:59 2012
From: mroos at roos.com (Mark Roos)
Date: Sun, 20 May 2012 22:34:59 -0700
Subject: performance degeneration from jdk7u2 to jdk7u6?
In-Reply-To: <4FB74A73.7@gmx.org>
References: <4FB74A73.7@gmx.org>
Message-ID:

Hi Jochen

I ran into a similar issue (between versions within jdk8) where the default compile mode had changed to tieredCompile. This made the benchmark timing inconsistent.

I have learned to look closely at the defaults and have moved to specifying quite a few in my batch files.

FWIW

regards
mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120520/140894ff/attachment.html From blackdrag at gmx.org Mon May 21 01:13:40 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Mon, 21 May 2012 10:13:40 +0200 Subject: performance degeneration from jdk7u2 to jdk7u6? In-Reply-To: References: <4FB74A73.7@gmx.org> Message-ID: <4FB9F934.5040509@gmx.org> Am 21.05.2012 07:34, schrieb Mark Roos: > Hi Jochen > > I ran into a similar issue ( between versions within jdk8 ) where the > default compile > mode had changed to tieredCompile. This made the benchmark timing > inconsistent. > > I have learned to look closely at the defaults and have moved to > specifying quite > a few in my batch files. how can you see the defaults? and how can you change to the "old" hotspot engine? I also should mention that even under jdk7u2 the invokedynamic version was only minimally faster than my call site version. That's not what is to be expected of invokedynamic. bye Jochen -- Jochen "blackdrag" Theodorou - Groovy Project Tech Lead blog: http://blackdragsview.blogspot.com/ german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming sources visit http://groovy-lang.org From mroos at roos.com Mon May 21 11:18:41 2012 From: mroos at roos.com (Mark Roos) Date: Mon, 21 May 2012 11:18:41 -0700 Subject: performance degeneration from jdk7u2 to jdk7u6? In-Reply-To: <4FB9F934.5040509@gmx.org> References: <4FB74A73.7@gmx.org> <4FB9F934.5040509@gmx.org> Message-ID: Hi Jochen Since I am using a Mac I can get a wide range of builds to try (almost daily) http://code.google.com/p/openjdk-osx-build/ For the compiler Christian recommended I try -XX:-TieredCompilation which was my problem on jdk8 versions regards mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20120521/a1f7bf00/attachment.html From blackdrag at gmx.org Tue May 22 00:15:23 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Tue, 22 May 2012 09:15:23 +0200 Subject: performance degeneration from jdk7u2 to jdk7u6? In-Reply-To: References: <4FB74A73.7@gmx.org> <4FB9F934.5040509@gmx.org> Message-ID: <4FBB3D0B.40600@gmx.org> Am 21.05.2012 20:18, schrieb Mark Roos: > Hi Jochen > > Since I am using a Mac I can get a wide range of builds to try (almost > daily) > http://code.google.com/p/openjdk-osx-build/ > > For the compiler Christian recommended I try > -XX:-TieredCompilation > > which was my problem on jdk8 versions ok, I see, I will try that... Though.. my problem is that in Groovy we don't have always full control over the JVM people run. So it is good if thinks can be tweaked by a command line option, but if the default is bad, it is bad for our indy port and ultimatively for invokedynamic itself bye blackdrag -- Jochen "blackdrag" Theodorou - Groovy Project Tech Lead blog: http://blackdragsview.blogspot.com/ german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming sources visit http://groovy-lang.org From blackdrag at gmx.org Wed May 23 10:50:57 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Wed, 23 May 2012 19:50:57 +0200 Subject: performance degeneration from jdk7u2 to jdk7u6? In-Reply-To: <4FB74A73.7@gmx.org> References: <4FB74A73.7@gmx.org> Message-ID: <4FBD2381.9020803@gmx.org> no one helping me on the assembly analysis? Am 19.05.2012 09:23, schrieb Jochen Theodorou: > Hi all, > > I was about to get a brand new assembly to ask some questions on the > list here when I installed the newest available jdk7 update 6. I ran my > simple Fibonacci test program and noticed that the time it took was > sudden 4.6s, where it was 3.5s before. This plus 1s doesn't look too god > to me. Has there been anything special that causes this? 
I mean before > the indy version was a little faster than our call site caching, now it > is a little slower. General java performance seems not to be reduced, so > I would assume it is special to indy. > > See http://rifers.org/paste/show/1702 > > ... so now... is there somebody who can tell me why my indy version is > now even slower than my call site version? Sadly I am not really fit > enough in assembler anymore, especially the one produced by hotspot to > really read that output. So I have no idea what is wrong. > > Two things I should mention... the call to $getCallSiteArray is surplus, > yes, but that doesn't take much time. The only thing I know that really > takes a lot of time is the exception guard, but that I have to have > until I find a way to avoid it.... and the call site version has both as > well of course. They should not count as a satisfying explanation. > > bye Jochen > -- Jochen "blackdrag" Theodorou - Groovy Project Tech Lead blog: http://blackdragsview.blogspot.com/ german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming sources visit http://groovy-lang.org From forax at univ-mlv.fr Wed May 23 14:33:27 2012 From: forax at univ-mlv.fr (Rémi Forax) Date: Wed, 23 May 2012 23:33:27 +0200 Subject: performance degeneration from jdk7u2 to jdk7u6? In-Reply-To: <4FBD2381.9020803@gmx.org> References: <4FB74A73.7@gmx.org> <4FBD2381.9020803@gmx.org> Message-ID: <4FBD57A7.3030908@univ-mlv.fr> On 05/23/2012 07:50 PM, Jochen Theodorou wrote: > no one helping me on the assembly analysis? Wow, you have generated the biggest fib function I have ever seen. About the bytecode you generate, as you said you have to remove $getCallSiteArray() because it seems to have some side effects so the JIT is not able to remove it. I don't understand why when you call fib in the body of fib, you are not able to say that the signature is (Object)I.
You detect that this is a recursive call (you don't use the same BSM) but it seems you think that because fib can be changed using the meta object protocol, you should type it (Object)Object. But because you are already in fib, you already suppose that at least the return type is int. Basically, you can change the method fib when being in the middle of the execution of fib, because at least one call of fib is on the stack, the new method must have a return type which is compatible with fib. Also, you should never use methods like |DefaultTypeTransformation.intUnbox| because you know that the return type is an int, you should back-propagate it and the return type of plus should be (Object;Object)I Now, the generated code, because of getCallSiteArray(), your real code starts at line 168 and here you start to box the two ints to two Integers to be able to call NumberMath.subtract(Number,Number) which calls IntegerMath.substractImpl that unboxes them. The VM is not able to remove calls to box / unbox for j.l.Integer. You should generate a much simpler path here. You should never call a class like g.r.typehandling.*Math because all of these methods take Numbers as parameters. You should create one simple class, with methods like this: static int add(int left, int right) { return left + right; } because it doesn't force you to do the boxing. So you will do the boxing only if it's necessary, i.e. only when the parameter is Object. And to know which method you have to call, instead of relying on getMath() you should use guardWithTest and test only parameters that are Object. cheers, Rémi > > On 19.05.2012 09:23, Jochen Theodorou wrote: >> Hi all, >> >> I was about to get a brand new assembly to ask some questions on the >> list here when I installed the newest available jdk7 update 6. I ran my >> simple Fibonacci test program and noticed that the time it took was >> suddenly 4.6s, where it was 3.5s before. This plus 1s doesn't look too good >> to me.
Has there been anything special that causes this? I mean before >> the indy version was a little faster than our call site caching, now it >> is a little slower. General java performance seems not to be reduced, so >> I would assume it is special to indy. >> >> See http://rifers.org/paste/show/1702 >> >> ... so now... is there somebody who can tell me why my indy version is >> now even slower than my call site version? Sadly I am not really fit >> enough in assembler anymore, especially the one produced by hotspot to >> really read that output. So I have no idea what is wrong. >> >> Two things I should mention... the call to $getCallSiteArray is surplus, >> yes, but that doesn't take much time. The only thing I know that really >> takes a lot of time is the exception guard, but that I have to have >> until I find a way to avoid it.... and the call site version has both as >> well of course. They should not count as a satisfying explanation. >> >> bye Jochen >> > From blackdrag at gmx.org Thu May 24 01:38:11 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Thu, 24 May 2012 10:38:11 +0200 Subject: performance degeneration from jdk7u2 to jdk7u6? In-Reply-To: <4FBD57A7.3030908@univ-mlv.fr> References: <4FB74A73.7@gmx.org> <4FBD2381.9020803@gmx.org> <4FBD57A7.3030908@univ-mlv.fr> Message-ID: <4FBDF373.90102@gmx.org> On 23.05.2012 23:33, Rémi Forax wrote: > On 05/23/2012 07:50 PM, Jochen Theodorou wrote: >> no one helping me on the assembly analysis? > > Wow, you have generated the biggest fib function I have ever seen. > > About the bytecode you generate, as you said you have to remove > $getCallSiteArray() because it seems it does some side effects > so the JIT is not able to remove it. The first time an array is generated, later nothing happens anymore... but yes, this has to go. > I don't understand why when you call fib in the body of fib, > you are not able to say that the signature is (Object)I.
You detect > that this is a recursive call (you don't use the same BSM) > but it seems you think that because fib can be changed > using the meta object protocol, you should type it (Object)Object. > But because you are in already fib, you already suppose that > at least the return type is int. Basically, you can change > the method fib when being in the middle of the execution of fib, > because at least one call of fib is on the stack, > the new method must have a return type which is compatible with fib. the method signature is (I)I, it is called with Object in the body, because I cannot ensure that x-1 and x-2 will return an int. Since I cannot know for sure that fib(I)I is called, the result of the recursive fib call is seen as Object. All I know for sure is that the result of "fib(x-1)+fib(x-2)" will be converted to an int later and will cause an exception if the conversion is not possible. But that is the result of the plus, thus you don't exactly need a compatible return type for fib. In for example: int fib(int x) { if (x<2) return 1 this.metaClass.fib = {int i -> i==1?"Forax":"Remi "} String.metaClass.plus = {String b -> delegate.length()+b.length()} return fib(x-1)+fib(x-2) } assert fib(3)== 10 I replace fib inside fib with, well it returns a String, but signature wise I replace it with a method returning Object. String is not compatible with int. And it does not lead to an exception because I also replace String#plus with a version that simply returns the added length of both Strings. So fib(3) will call fib(2) and fib(1), which has the results "Remi " and "Forax". Then I call plus on those results, leading to String#plus, which returns the added lengths, which is 10 and compatible with int. > Also, you should never use methods like > |DefaultTypeTransformation.intUnbox| > because you know that the return type is an int, you should > back-propagate it and the return type of plus should be (Object;Object)I in the original example that is true, yes. 
That may allow skipping an intUnbox call... but only if I later select a plus method that returns int or Integer. In a different thread I already asked for requirements in that direction and mentioned that I get very differing results depending on what signatures I use. Now I have at least the hint that returning int or Integer might be a good idea. > Now, the generated code, because of getCallSiteArray(), > your real code starts at line 168 and here you start > to box the two ints to two Integers to be able to call > NumberMath.subtract(Number,Number) which call > IntegerMath.substractImpl that unbox them. > The VM is not able to remove calls to box / unbox for j.l.Integer. I see... unfortunate. > You should generate a much simpler path here. > You should never call a class like g.r.typehandling.*Math because > all of these methods takes Numbers as parameters. > You should create one simple class, with methods like this: > static int add(int left, int right) { > return left + right; > } > because it doesn't force you to do the boxing. > So you will do the boxing only if it's necessary, i.e. only > when the parameter is Object. > And to know which method you have to call, instead of > relying on getMath() you should use guardWithTest and > test only parameters that are Object. Object is pretty often the case. Well, if I count boxing and unboxing for (a+b)+(c+d), then now I have to box a to d, unbox for the add, box the result, no boxing for the outer plus call, but two unboxings and a boxing for the result and... if the method returns int, a final unboxing. That makes 7 boxings and 7 unboxings. If I use your add(II)I and backpropagate the return type, then I have one boxing each for the inner pluses, two unboxings for the outer plus. That totals 2 boxings, 2 unboxings. Sounds better... even without backpropagating the call. I will try that out.
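A minimal sketch of the primitive fast path suggested in this thread, written against plain java.lang.invoke rather than the Groovy runtime. The class name GuardedPlus, the guard bothIntegers, the fallback genericPlus and the helper plus are all hypothetical stand-ins, not Groovy's actual implementation; only add(int,int) and the use of guardWithTest come from the suggestion itself. The guard inspects only the arguments that are statically typed Object, and asType inserts the unboxing and boxing at the edges of the call site.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class GuardedPlus {
    // Primitive fast path: takes ints, so the method itself forces no boxing.
    static int add(int left, int right) {
        return left + right;
    }

    // Guard (hypothetical): tests only the arguments statically typed as Object.
    static boolean bothIntegers(Object left, Object right) {
        return left instanceof Integer && right instanceof Integer;
    }

    // Slow path (hypothetical): stands in for a full dynamic dispatch.
    static Object genericPlus(Object left, Object right) {
        if (left instanceof Number && right instanceof Number) {
            return ((Number) left).doubleValue() + ((Number) right).doubleValue();
        }
        return String.valueOf(left) + right;
    }

    private static final MethodHandle PLUS = buildPlus();

    // Builds a handle of type (Object,Object)Object:
    // if both arguments are Integers call add(int,int), otherwise genericPlus.
    private static MethodHandle buildPlus() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType objObj =
                    MethodType.methodType(Object.class, Object.class, Object.class);
            MethodHandle fast = lookup
                    .findStatic(GuardedPlus.class, "add",
                            MethodType.methodType(int.class, int.class, int.class))
                    .asType(objObj); // unboxes both arguments, boxes the result
            MethodHandle test = lookup.findStatic(GuardedPlus.class, "bothIntegers",
                    MethodType.methodType(boolean.class, Object.class, Object.class));
            MethodHandle slow = lookup.findStatic(GuardedPlus.class, "genericPlus", objObj);
            return MethodHandles.guardWithTest(test, fast, slow);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience wrapper so callers do not have to handle Throwable.
    static Object plus(Object left, Object right) {
        try {
            return PLUS.invoke(left, right);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

With this shape, GuardedPlus.plus(2, 3) goes through add(int,int), while GuardedPlus.plus(1.5, 2) or GuardedPlus.plus("a", 1) falls through to the generic path. A call site whose type already carries int parameters or an int return would need correspondingly fewer conversions from asType.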
bye Jochen -- Jochen "blackdrag" Theodorou - Groovy Project Tech Lead blog: http://blackdragsview.blogspot.com/ german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming sources visit http://groovy-lang.org From forax at univ-mlv.fr Thu May 24 04:43:45 2012 From: forax at univ-mlv.fr (Rémi Forax) Date: Thu, 24 May 2012 13:43:45 +0200 Subject: performance degeneration from jdk7u2 to jdk7u6? In-Reply-To: <4FBDF373.90102@gmx.org> References: <4FB74A73.7@gmx.org> <4FBD2381.9020803@gmx.org> <4FBD57A7.3030908@univ-mlv.fr> <4FBDF373.90102@gmx.org> Message-ID: <4FBE1EF1.4060601@univ-mlv.fr> On 05/24/2012 10:38 AM, Jochen Theodorou wrote: > On 23.05.2012 23:33, Rémi Forax wrote: >> On 05/23/2012 07:50 PM, Jochen Theodorou wrote: >>> no one helping me on the assembly analysis? >> Wow, you have generated the biggest fib function I have ever seen. >> >> About the bytecode you generate, as you said you have to remove >> $getCallSiteArray() because it seems it does some side effects >> so the JIT is not able to remove it. > the first time an array is generated, later nothing happens anymore... > but yes, this has to go. [...] > the method signature is (I)I, it is called with Object in the body, > because I cannot ensure that x-1 and x-2 will return an int. Since I > cannot know for sure that fib(I)I is called, the result of the recursive > fib call is seen as Object. All I know for sure is that the result of > "fib(x-1)+fib(x-2)" will be converted to an int later and will cause an > exception if the conversion is not possible. But that is the result of > the plus, thus you don't exactly need a compatible return type for fib.
> In for example: > > int fib(int x) { > if (x<2) return 1 > this.metaClass.fib = {int i -> i==1?"Forax":"Remi "} > String.metaClass.plus = {String b -> delegate.length()+b.length()} > return fib(x-1)+fib(x-2) > } > assert fib(3)== 10 > > I replace fib inside fib with, well it returns a String, but signature > wise I replace it with a method returning Object. String is not > compatible with int. And it does not lead to an exception because I also > replace String#plus with a version that simply returns the added length > of both Strings. So fib(3) will call fib(2) and fib(1), which has the > results "Remi " and "Forax". Then I call plus on those results, leading > to String#plus, which returns the added lengths, which is 10 and > compatible with int. Yes, you're right, I had forgotten that you can change two methods at the same time. > >> Also, you should never use methods like >> |DefaultTypeTransformation.intUnbox| >> because you know that the return type is an int, you should >> back-propagate it and the return type of plus should be (Object;Object)I > in the original example that is true, yes. That may allow skipping an > intUnbox call... but only if I later select a plus method that returns > int or Integer. In a different thread I already asked for requirements > in that direction and that I get very differing results depending on > what signatures I use. Now I have at least the hint that returning int or > Integer might be a good idea. If invokedynamic knows more, you can provide a path with less boxing, so it's usually better. > >> Now, the generated code, because of getCallSiteArray(), >> your real code starts at line 168 and here you start >> to box the two ints to two Integers to be able to call >> NumberMath.subtract(Number,Number) which call >> IntegerMath.substractImpl that unbox them. >> The VM is not able to remove calls to box / unbox for j.l.Integer. > I see... unfortunate. > >> You should generate a much simpler path here.
>> You should never call a class like g.r.typehandling.*Math because >> all of these methods takes Numbers as parameters. >> You should create one simple class, with methods like this: >> static int add(int left, int right) { >> return left + right; >> } >> because it doesn't force you to do the boxing. >> So you will do the boxing only if it's necessary, i.e. only >> when the parameter is Object. >> And to know which method you have to call, instead of >> relying on getMath() you should use guardWithTest and >> test only parameters that are Object. > Object is pretty often the case. Well if I count boxing and unboxing for > (a+b)+(c+d), then now I have to box a to d, unbox for the add, box the > result, no boxing for the outer plus call, but two unboxing and a boxing > for the result and... if the method returns int, a final unboxing. That > makes 7 boxing and 7 unboxing. > > If I use your add(II)I and backpropagate the return type, then I have > one boxing each for the inner pluses, two unboxing for the outer plus. > Totals to 2 boxing, 2 unboxing. Sounds better... even without > backpropagating the call. I will try that out. :) > > bye Jochen > cheers, Rémi From dain at iq80.com Sun May 27 11:12:01 2012 From: dain at iq80.com (Dain Sundstrom) Date: Sun, 27 May 2012 11:12:01 -0700 Subject: Byte code or branching method handle? Message-ID: <5E81A3F2-8874-4A57-87CF-CEB310EE63AD@iq80.com> Hi all, I have a general strategy question about using invoke dynamic. Say I have a dynamic language that has null safe property chaining where "a.b.c.d" results in null if a null is encountered anywhere in the chain. I could implement this two ways, 1) generate byte code that checks for nulls or 2) use a guarded method handle at each step that takes care of the nulls. Either option is ok with me, but I'd rather not write both versions and then try to figure out which one will make the JVM angry.
As a broader question, there are lots of places where I can make this type of decision (e.g., pass by value parameters, copy on assignment value classes), is there a rule of thumb for going with traditional byte code over branching method handles? Thanks, -dain From szegedia at gmail.com Mon May 28 10:45:54 2012 From: szegedia at gmail.com (Attila Szegedi) Date: Mon, 28 May 2012 10:45:54 -0700 Subject: Byte code or branching method handle? In-Reply-To: <5E81A3F2-8874-4A57-87CF-CEB310EE63AD@iq80.com> References: <5E81A3F2-8874-4A57-87CF-CEB310EE63AD@iq80.com> Message-ID: My rule of thumb is: if you know that one concrete code shape will always fit the execution, I'd just emit the bytecode. Invokedynamic is great for cases where the executed code needs to change from time to time. You can also combine them: the invocations of the actual property getters "getA()", "getB()" etc. are linked via invokedynamic, and the branching is in bytecode, i.e. if you know that the sequence at that point in the code will always be "a.b.c.d". Attila. On Sun, May 27, 2012 at 11:12 AM, Dain Sundstrom wrote: > Hi all, > > I have a general strategy question about using invoke dynamic. Say I have a dynamic language that has null safe property chaining where "a.b.c.d" results in null if a null is encountered anywhere in the chain. I could implement this two ways, 1) generate byte code that checks for nulls or 2) use a guarded method handle at each step that takes care of the nulls. Either option is ok with me, but I'd rather not write both versions and then try to figure out which one will make the JVM angry. > > As a broader question, there are lots of places where I can make this type of decision (e.g., pass by value parameters, copy on assignment value classes), is there a rule of thumb for going with traditional byte code over branching method handles?
> > Thanks, > > -dain From mroos at roos.com Mon May 28 12:40:42 2012 From: mroos at roos.com (Mark Roos) Date: Mon, 28 May 2012 12:40:42 -0700 Subject: Byte code or branching method handle? In-Reply-To: <5E81A3F2-8874-4A57-87CF-CEB310EE63AD@iq80.com> References: <5E81A3F2-8874-4A57-87CF-CEB310EE63AD@iq80.com> Message-ID: Hi Dain This sounds similar to a non-local return in Smalltalk where some method in a chain returns to the starting method. I did this with a throw and a catch based on the invoker of the chain. Seems to work for me but I will leave it to the experts to tell you the best way for your specific need. regards mark From blackdrag at gmx.org Wed May 30 14:16:48 2012 From: blackdrag at gmx.org (Jochen Theodorou) Date: Wed, 30 May 2012 23:16:48 +0200 Subject: performance degeneration from jdk7u2 to jdk7u6? In-Reply-To: <4FBE1EF1.4060601@univ-mlv.fr> References: <4FB74A73.7@gmx.org> <4FBD2381.9020803@gmx.org> <4FBD57A7.3030908@univ-mlv.fr> <4FBDF373.90102@gmx.org> <4FBE1EF1.4060601@univ-mlv.fr> Message-ID: <4FC68E40.4040509@gmx.org> On 24.05.2012 13:43, Rémi Forax wrote: [...] > if invokedynamic knows more, you can provide a path with > less boxing so it's usually better. I changed Groovy to get rid of getCallSiteArray and added backpropagation of the return type to the next directly involved method call. So in int fib(int x) { if (x<=1) return 1 return fib(x-1)+fib(x-2) } the plus will now directly have return type int, instead of Object. In a first iteration I made a minus method that takes primitives and also returns int. And nice, runtime is down from 3.5s (on update 2) to 2.5s (no tiered compilation and update 6).
That is not yet really making use of the type back propagation, so in the next iteration I added a (II)I plus method as well. Before the callsite target type was (II)Object, now it is (Object,Object)I, so there is only little I can save on boxing, but let us see... 2.1s! And it stabilizes much faster than before too. That's almost as much gain as before (in percent). Certainly more than I thought. Had I only used the plus method and not the minus method as well, I would have ended up with 4.2s. Only both together make it that fast now. A behaviour I noticed with primitive optimizations as well. An optimization done in isolation can make things slower or show only little gain, but in combination they are suddenly much better. Another interesting aspect is that I no longer see the slowdown through tiered compilation. The times are more or less equal with and without tiered compilation, while before it was always slower when tiered compilation was not disabled (in update 6 it is on by default). So the current state of my fib program is: indy: 2.1s primopts: 1.2s callsite caching: 4s Now I know that using catchException is causing quite a problem for indy, so I removed that guard for the math operations plus and minus. It is legal, since in those cases I can be certain I don't need it. And now indy is at 1.2s! That means primopts and indy are now on par. That's very cool! I mean I expected indy to come near primopts, but to actually get on par with it... I would never have thought that is possible. The assembly is still quite big: http://rifers.org/paste/show/1717 But much better already.
bye Jochen -- Jochen "blackdrag" Theodorou - Groovy Project Tech Lead blog: http://blackdragsview.blogspot.com/ german groovy discussion newsgroup: de.comp.lang.misc For Groovy programming sources visit http://groovy-lang.org From nerdscentral at gmail.com Sun May 27 11:49:35 2012 From: nerdscentral at gmail.com (Alexander Turner) Date: Sun, 27 May 2012 18:49:35 -0000 Subject: Byte code or branching method handle? In-Reply-To: <5E81A3F2-8874-4A57-87CF-CEB310EE63AD@iq80.com> References: <5E81A3F2-8874-4A57-87CF-CEB310EE63AD@iq80.com> Message-ID: Hi Dain, I would suggest using byte code for everything that byte code can do and falling back to invoke dynamic only if there is no alternative. - AJ On 27 May 2012 19:12, Dain Sundstrom wrote: > Hi all, > > I have a general strategy question about using invoke dynamic. Say I have > a dynamic language that has null safe property chaining where "a.b.c.d" > results in null if a null is encountered anywhere in the chain. I could > implement this two ways, 1) generate byte code that checks for nulls or 2) > use a guarded method handle at each step that takes care of the nulls. > Either option is ok with me, but I'd rather not write both versions and > then try to figure out which one will make the JVM angry. > > As a broader question, there are lots of places where I can make this type > of decision (e.g., pass by value parameters, copy on assignment value > classes), is there a rule of thumb for going with traditional byte code > over branching method handles? > > Thanks, > > -dain
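For the null-safe chaining question in the "Byte code or branching method handle?" thread, the two options can be put side by side in a small sketch. Everything here is an assumption made for illustration: the Node type, the getNext() getter and the class NullSafeChain are hypothetical, and the plain Java method viaPlainCode only approximates what a language compiler would emit as bytecode for option 1. Option 2 wraps one getter in guardWithTest and composes three such steps for "a.b.c.d".

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class NullSafeChain {
    // Hypothetical type standing in for the objects behind "a.b.c.d".
    static final class Node {
        final Node next;
        Node(Node next) { this.next = next; }
        Node getNext() { return next; }
    }

    // Option 1: the fixed code shape a compiler could emit directly.
    static Node viaPlainCode(Node a) {
        Node b = (a == null) ? null : a.getNext();
        Node c = (b == null) ? null : b.getNext();
        return (c == null) ? null : c.getNext();
    }

    // Guard used by option 2.
    static boolean isNull(Node n) {
        return n == null;
    }

    private static final MethodHandle CHAIN = buildChain();

    // Option 2: one guarded step, composed three times.
    private static MethodHandle buildChain() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle getter = lookup.findVirtual(Node.class, "getNext",
                    MethodType.methodType(Node.class));                // (Node)Node
            MethodHandle test = lookup.findStatic(NullSafeChain.class, "isNull",
                    MethodType.methodType(boolean.class, Node.class)); // (Node)boolean
            // Ignores its argument and returns null: (Node)Node
            MethodHandle returnNull = MethodHandles.dropArguments(
                    MethodHandles.constant(Node.class, null), 0, Node.class);
            // If the receiver is null return null, otherwise call the getter.
            MethodHandle step = MethodHandles.guardWithTest(test, returnNull, getter);
            // Feed each step's result into the next step.
            return MethodHandles.filterReturnValue(
                    MethodHandles.filterReturnValue(step, step), step);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience wrapper so callers do not have to handle Throwable.
    static Node viaHandles(Node a) {
        try {
            return (Node) CHAIN.invoke(a);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Both methods return the same result for any starting node, including null. The sketch says nothing about which variant the JIT handles better; it only makes the two shapes concrete. In a real language runtime the getter would typically be the dynamically linked part, as in the suggestion to keep the branching in bytecode and link only the property getters through invokedynamic.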